Data Manipulation
1. withColumn(): Add a new column to a DataFrame.
2. withColumnRenamed(): Rename an existing column.
3. drop(): Drop one or more columns from a DataFrame.
4. cast(): Cast a column to a different data type. This is a Column method, typically called inside withColumn() or select().
Data Analysis
1. count(): Count the number of rows in a DataFrame.
2. sum(): Calculate the sum of a column.
3. avg(): Calculate the average of a column.
4. max(): Find the maximum value in a column.
5. min(): Find the minimum value in a column.
Data Transformation
1. explode(): Transform an array column into separate rows.
2. flatten(): Flatten an array of arrays into a single array.
3. split(): Split a string column into an array.
DataFrame Operations
1. distinct(): Return a DataFrame with unique rows.
2. intersect(): Return a DataFrame with the distinct rows common to two DataFrames.
3. exceptAll(): Return a DataFrame with rows in the first DataFrame but not in the second, preserving duplicates.
4. repartition(): Repartition a DataFrame to increase or decrease the number of partitions. This triggers a full shuffle.
5. coalesce(): Reduce the number of partitions of a DataFrame without a full shuffle.
Data Manipulation
1. orderBy(): Order a DataFrame by one or more columns.
2. sort(): Sort a DataFrame by one or more columns. orderBy() is an alias for sort().
3. limit(): Limit the number of rows in a DataFrame.
4. sample(): Return a sampled subset of a DataFrame.
5. randomSplit(): Split a DataFrame into multiple DataFrames randomly.
Data Analysis
1. corr(): Calculate the correlation between two columns.
2. cov(): Calculate the covariance between two columns.
3. skewness(): Calculate the skewness of a column.
4. kurtosis(): Calculate the kurtosis of a column.
5. approxQuantile(): Calculate an approximate quantile of a column.
Data Transformation
1. udf(): Create a user-defined function (UDF) to transform data.
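2. apply(): Apply a pandas UDF to each group of a grouped DataFrame (GroupedData.apply(), deprecated in favor of applyInPandas()). To apply a regular UDF to a column, call it inside withColumn() or select().
3. transform(): Chain a custom function that takes a DataFrame and returns a DataFrame.
4. map(): Map each row to a new value. In PySpark this is an RDD method (df.rdd.map()); the DataFrame itself has no map().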
String Functions
1. concat(): Concatenate two or more string columns.
2. length(): Calculate the length of a string column.
3. lower(): Convert a string column to lowercase.
4. upper(): Convert a string column to uppercase.
5. trim(): Trim whitespace from a string column.
Date and Time Functions
1. current_date(): Return the current date.
2. current_timestamp(): Return the current timestamp.
3. date_format(): Format a date column.
4. hour(): Extract the hour from a timestamp column.
5. dayofweek(): Extract the day of the week from a date column as an integer (1 = Sunday through 7 = Saturday).