Sunday, February 16, 2025

What is a Medallion Architecture?

The Medallion Architecture is a data design pattern used in Databricks to logically organize data within a Lakehouse. The goal is to incrementally improve the structure and quality of data as it flows through each layer of the architecture [1]. The architecture consists of three layers: Bronze, Silver, and Gold.

Layers of Medallion Architecture:

Bronze Layer: This layer is where raw data is ingested from external source systems. The data is stored "as-is" with minimal processing [2]. The focus is on quick data capture and on providing a historical archive.

Silver Layer: In this layer, the data from the Bronze layer is cleaned, validated, and conformed. It provides an "Enterprise view" of key business entities and transactions [2]. This layer is used for self-service analytics and ad-hoc reporting.

Gold Layer: The Gold layer contains highly refined and enriched data. It is optimized for advanced analytics and machine learning [2]. This layer serves as the source for business intelligence and decision-making.
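To make the flow between the three layers concrete, here is a minimal sketch in plain Python. It is not Databricks or Spark code: the sample records, the field names, and the helper functions `to_silver` and `to_gold` are all hypothetical, chosen only to illustrate how each layer might refine the one before it.

```python
from collections import defaultdict
from datetime import date

# Bronze: raw records kept exactly as received (hypothetical sample data).
bronze_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "20.50", "order_date": "2025-02-01"},
    {"order_id": "2", "customer": "BOB", "amount": "not-a-number", "order_date": "2025-02-01"},
    {"order_id": "1", "customer": " alice ", "amount": "20.50", "order_date": "2025-02-01"},
    {"order_id": "3", "customer": "Alice", "amount": "5.00", "order_date": "2025-02-02"},
]


def to_silver(bronze_rows):
    """Clean, validate, and conform Bronze rows; drop invalid rows and duplicates."""
    seen_ids = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            order_id = int(row["order_id"])
            customer = row["customer"].strip().lower()
            amount = float(row["amount"])
            order_date = date.fromisoformat(row["order_date"])
        except (KeyError, ValueError):
            continue  # row fails validation, so it never reaches Silver
        if order_id in seen_ids:
            continue  # duplicate order, keep only the first occurrence
        seen_ids.add(order_id)
        silver_rows.append(
            {"order_id": order_id, "customer": customer,
             "amount": amount, "order_date": order_date}
        )
    return silver_rows


def to_gold(silver_rows):
    """Aggregate Silver rows into a reporting-ready total per customer."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver_orders = to_silver(bronze_orders)
gold_totals = to_gold(silver_orders)
print(gold_totals)  # {'alice': 25.5}
```

In this sketch the Bronze list is never modified, which mirrors its role as a historical archive: the Silver and Gold results can always be rebuilt from it if the cleaning or aggregation rules change.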

Benefits:

Data Quality: Data quality improves incrementally as data moves through each layer.

Scalability: Supports large-scale data processing and analytics.

Flexibility: Allows for different levels of data processing and usage.

Databricks provides tools such as Delta Live Tables (DLT) to help users build data pipelines with Bronze, Silver, and Gold tables efficiently.

