Data Manipulation Functions
1. select(): Selects a subset of columns from the DataFrame.
2. filter(): Filters the DataFrame based on a condition.
3. where(): An alias for filter(); the two methods are interchangeable.
4. groupBy(): Groups the DataFrame by one or more columns.
5. agg(): Performs aggregation operations on the grouped DataFrame.
6. join(): Joins two DataFrames based on a common column.
7. union(): Combines two DataFrames into a single DataFrame.
8. intersect(): Returns the intersection of two DataFrames.
9. exceptAll(): Returns the rows of one DataFrame that do not appear in the other, keeping duplicates.
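The sketch below strings these manipulation functions together on small in-memory DataFrames. The employee/department data and column names are made up purely for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("manipulation-demo").getOrCreate()

# Hypothetical sample data
employees = spark.createDataFrame(
    [(1, "Ana", "eng", 90000), (2, "Bo", "eng", 85000), (3, "Cy", "hr", 60000)],
    ["id", "name", "dept", "salary"],
)
new_hires = spark.createDataFrame([(4, "Di", "hr", 62000)], ["id", "name", "dept", "salary"])
departments = spark.createDataFrame(
    [("eng", "Engineering"), ("hr", "Human Resources")], ["dept", "dept_name"]
)

# select(): keep a subset of columns
names = employees.select("name", "dept")

# filter() and where() are interchangeable
well_paid = employees.filter(F.col("salary") > 80000)
eng_only = employees.where(F.col("dept") == "eng")

# groupBy() + agg(): aggregate per department
dept_stats = employees.groupBy("dept").agg(
    F.count("*").alias("headcount"),
    F.avg("salary").alias("avg_salary"),
)

# join(): enrich with department names using the shared "dept" column
enriched = employees.join(departments, on="dept", how="left")

# union(): stack two DataFrames with the same schema
all_people = employees.union(new_hires)

# intersect() / exceptAll(): set-style comparisons between DataFrames
overlap = all_people.intersect(employees)   # rows present in both
added = all_people.exceptAll(employees)     # rows only in all_people (duplicates kept)

dept_stats.show()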
Data Transformation Functions
1. withColumn(): Adds a new column to the DataFrame.
2. withColumnRenamed(): Renames an existing column in the DataFrame.
3. drop(): Drops one or more columns from the DataFrame.
4. cast(): Casts a column to a different data type (a Column method, usually applied inside withColumn() or select()).
5. orderBy(): Sorts the DataFrame by one or more columns.
6. sort(): An alias for orderBy(); the two methods are interchangeable.
7. repartition(): Repartitions the DataFrame into a specified number of partitions.
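A minimal sketch of these transformation functions, assuming a hypothetical orders DataFrame whose amount column arrives as a string and needs casting:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformation-demo").getOrCreate()

# Hypothetical sample data
orders = spark.createDataFrame(
    [(1, "2024-01-05", "19.99"), (2, "2024-01-06", "5.50")],
    ["order_id", "order_date", "amount"],
)

transformed = (
    orders
    # withColumn(): derive a new column from an existing one
    .withColumn("amount_with_tax", F.col("amount").cast("double") * 1.2)
    # cast() applied while overwriting the original column
    .withColumn("amount", F.col("amount").cast("double"))
    # withColumnRenamed(): rename an existing column
    .withColumnRenamed("order_date", "ordered_on")
    # drop(): remove columns that are no longer needed
    .drop("amount_with_tax")
    # orderBy() / sort(): sort by one or more columns
    .orderBy(F.col("amount").desc())
)

# repartition(): change the number of partitions, e.g. before a wide write
repartitioned = transformed.repartition(4)

transformed.show()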
Data Analysis Functions
1. count(): Returns the number of rows in the DataFrame.
2. sum(): Returns the sum of a column in the DataFrame.
3. avg(): Returns the average of a column in the DataFrame.
4. max(): Returns the maximum value of a column in the DataFrame.
5. min(): Returns the minimum value of a column in the DataFrame.
6. groupBy().pivot(): Pivots the DataFrame by a column and performs aggregation.
7. corr(): Returns the correlation between two columns in the DataFrame.
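The following sketch applies these analysis functions to a made-up sales DataFrame; the region, year, revenue, and units columns are illustrative only.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("analysis-demo").getOrCreate()

# Hypothetical sales data
sales = spark.createDataFrame(
    [("east", 2023, 100.0, 10), ("east", 2024, 120.0, 11),
     ("west", 2023, 80.0, 9), ("west", 2024, 95.0, 10)],
    ["region", "year", "revenue", "units"],
)

# count(): number of rows in the DataFrame
n_rows = sales.count()

# sum()/avg()/max()/min(): aggregation functions, typically used inside agg()
summary = sales.agg(
    F.sum("revenue").alias("total_revenue"),
    F.avg("revenue").alias("avg_revenue"),
    F.max("revenue").alias("max_revenue"),
    F.min("revenue").alias("min_revenue"),
)

# groupBy().pivot(): one row per region, one revenue column per year
pivoted = sales.groupBy("region").pivot("year").agg(F.sum("revenue"))

# corr(): Pearson correlation between two numeric columns
revenue_units_corr = sales.corr("revenue", "units")

summary.show()
pivoted.show()
print(n_rows, revenue_units_corr)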
Data Inspection Functions
1. show(): Displays the first rows of the DataFrame (20 by default) in tabular form.
2. printSchema(): Prints the schema of the DataFrame.
3. dtypes: Returns the data types of the columns in the DataFrame.
4. columns: Returns the column names of the DataFrame.
5. head(): Returns the first n rows of the DataFrame as a list of Row objects.
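A quick sketch of the inspection calls on a small hypothetical DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inspection-demo").getOrCreate()

# Hypothetical sample data
df = spark.createDataFrame(
    [(1, "Ana", 90000.0), (2, "Bo", 85000.0)],
    ["id", "name", "salary"],
)

df.show(5)           # show(): prints up to n rows (default 20) to stdout
df.printSchema()     # printSchema(): prints the schema tree with types and nullability
print(df.dtypes)     # dtypes: list of (column name, data type) tuples
print(df.columns)    # columns: list of column names
print(df.head(3))    # head(n): first n rows as a list of Row objects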