DataFrame Operations
1. select(): Select specific columns from a DataFrame.
2. filter(): Filter rows based on conditions.
3. where(): An alias for filter(). Both accept either a Column expression or a SQL expression string.
4. groupBy(): Group rows by one or more columns.
5. agg(): Perform aggregation operations (e.g., sum, count, avg).
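6. join(): Join two DataFrames on one or more common columns or on a join expression; the join type (inner, left, outer, etc.) defaults to inner.
7. union(): Append the rows of one DataFrame to another. Columns are matched by position, not by name, and duplicate rows are kept.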
Data Manipulation
1. withColumn(): Add a new column to a DataFrame, or replace an existing column that has the same name.
2. withColumnRenamed(): Rename an existing column.
3. drop(): Drop one or more columns from a DataFrame.
4. cast(): Cast a column to a different data type. This is a Column method, so it is typically used inside withColumn() or select().
Data Analysis
1. count(): Count the number of rows in a DataFrame.
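2. sum(): Calculate the sum of a column. Like avg(), max(), and min(), this is an aggregate function from pyspark.sql.functions, used inside agg() or select().
3. avg(): Calculate the average of a column.
4. max(): Find the maximum value in a column.
5. min(): Find the minimum value in a column.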
Data Transformation
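1. explode(): Transform an array (or map) column into separate rows, one per element.
2. flatten(): Flatten an array of arrays into a single array. To expand a nested struct into top-level columns, select its fields with "struct_col.*" instead.
3. split(): Split a string column into an array, using a regular expression as the delimiter.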