Tuesday, March 18, 2025

Commonly used DataFrame functions

Data Manipulation Functions

1. select(): Selects a subset of columns from the DataFrame.
2. filter(): Filters the DataFrame based on a condition.
3. where(): An alias for filter(); the two behave identically and accept the same conditions.
4. groupBy(): Groups the DataFrame by one or more columns.
5. agg(): Performs aggregation operations on the grouped DataFrame.
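6. join(): Joins two DataFrames on one or more common columns or on a join expression.
7. union(): Appends the rows of another DataFrame, matching columns by position and keeping duplicates.
8. intersect(): Returns the distinct rows that appear in both DataFrames.
9. exceptAll(): Returns the rows of this DataFrame that are not in the other, preserving duplicates.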

Data Transformation Functions

1. withColumn(): Adds a new column to the DataFrame, or replaces an existing column of the same name.
2. withColumnRenamed(): Renames an existing column in the DataFrame.
3. drop(): Drops one or more columns from the DataFrame.
4. cast(): Casts a column to a different data type. This is a Column method, so it is typically used inside withColumn() or select().
5. orderBy(): Sorts the DataFrame by one or more columns.
6. sort(): Equivalent to orderBy(); orderBy() is an alias for sort().
7. repartition(): Repartitions the DataFrame into a specified number of partitions, optionally by one or more columns. This triggers a full shuffle.

Data Analysis Functions

1. count(): Returns the number of rows in the DataFrame.
2. sum(): Aggregate function that returns the sum of a column; used inside agg() or select(), or called on grouped data.
3. avg(): Aggregate function that returns the average of a column.
4. max(): Aggregate function that returns the maximum value of a column.
5. min(): Aggregate function that returns the minimum value of a column.
6. groupBy().pivot(): Pivots the DataFrame by a column and performs aggregation.
7. corr(): Returns the Pearson correlation coefficient between two numeric columns in the DataFrame.


Data Inspection Functions

1. show(): Displays the first rows of the DataFrame as a table (20 rows by default).
2. printSchema(): Prints the schema of the DataFrame.
3. dtypes: Returns the column names and data types of the DataFrame as a list of (name, type) pairs.
4. columns: Returns the column names of the DataFrame.
5. head(): Returns the first n rows of the DataFrame as Row objects, or only the first row when n is omitted.
