Azure Data Engineering · End-to-End Pipeline
SQL Server → ADLS Gen2 via Medallion Architecture
A production-style pipeline ingesting operational data
through Bronze, Silver, and Gold layers using ADF watermark-based
CDC, Databricks Auto Loader, and Delta Live Tables.
Azure Data Factory
Databricks
Delta Live Tables
Unity Catalog
Medallion Architecture
Pipeline Architecture
Data Source
Azure SQL Server
Relational OLTP database
Dimension Tables
Fact Tables
Bronze Layer
Raw Ingestion
ADLS Gen2 · Parquet
ForEach Activity
Watermark-based CDC
JSON State Management
Incremental Copy
Full Reload Reset
Silver Layer
Stream Processing
ADLS Gen2 · Delta Tables
Structured Streaming
cloudFiles
Read / Write Checkpoints
trigger(once=True)
Delta Lake
Unity Catalog
DLT Pipeline
CDC Auto Flow
Gold Layer
Analytics Layer
ADLS Gen2 · Delta Tables
Delta Live Tables
dlt.create_auto_cdc_flow()
SCD Type 2
SCD Type 1
DLT Expectations
Data Quality Checks
Jinja Templating
Dynamic SQL
Techniques & Concepts Applied
⚡
Watermark-based CDC
JSON state files track lastLoadTime per table — only
modified rows copied each run
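The pipeline implements this with ADF Lookup and Copy activities; as a minimal sketch of the same pattern, the following pure-Python version shows the state-file mechanics (the file name, table name, and `ModifiedDate` audit column are illustrative assumptions):

```python
import json
from pathlib import Path

EPOCH = "1970-01-01T00:00:00Z"

def load_watermark(state_file: Path, table: str) -> str:
    """Read the last load time for a table; default to epoch on first run."""
    if state_file.exists():
        return json.loads(state_file.read_text()).get(table, EPOCH)
    return EPOCH

def build_incremental_query(table: str, watermark: str) -> str:
    """Only rows modified after the stored watermark are copied."""
    return (f"SELECT * FROM {table} "
            f"WHERE ModifiedDate > '{watermark}'")

def save_watermark(state_file: Path, table: str, new_watermark: str) -> None:
    """Persist the new high-water mark after a successful copy."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    state[table] = new_watermark
    state_file.write_text(json.dumps(state, indent=2))
```

Deleting the state file (or resetting a table's entry to epoch) is what makes a full reload possible without any pipeline changes.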
🌊
Structured Streaming
Auto Loader with isolated read/write checkpoints for
fault-tolerant exactly-once delivery
🔺
Delta Lake
ACID-compliant managed tables in Unity Catalog
across Silver and Gold layers
✅
DLT Expectations
Declarative data quality rules enforced at staging —
invalid records dropped before Gold
🔄
SCD Type 2
Full change history on dimensions tracked via
__START_AT / __END_AT metadata
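DLT maintains the `__START_AT` / `__END_AT` columns automatically; purely to illustrate the semantics (not the pipeline's actual code, and with a hypothetical `customer_id` key), here is how an SCD Type 2 update closes the current version and opens a new one:

```python
def scd2_upsert(history: list[dict], incoming: dict, ts: str) -> list[dict]:
    """Apply one change to an SCD Type 2 history list.

    The current version of a key has __END_AT = None; an update
    closes it at `ts` and appends a new open version.
    """
    key = incoming["customer_id"]
    for row in history:
        if row["customer_id"] == key and row["__END_AT"] is None:
            if all(row[k] == v for k, v in incoming.items()):
                return history          # no change: keep current version open
            row["__END_AT"] = ts        # close the old version
    history.append({**incoming, "__START_AT": ts, "__END_AT": None})
    return history
```

Querying "state as of time T" then reduces to filtering on `__START_AT <= T` and (`__END_AT` is null or `__END_AT > T`).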
🧬
Delta Live Tables
dlt.create_auto_cdc_flow() for declaratively managed,
CDC-driven MERGE pipelines
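A hedged sketch of how expectations and the auto CDC flow fit together in a DLT pipeline (runnable only inside a Databricks DLT pipeline; the table, column, and key names are assumptions, not the project's actual schema):

```python
import dlt  # available only in the Databricks DLT runtime

# Staging view: declarative quality gate before records reach Gold.
@dlt.view
@dlt.expect_or_drop("valid_key", "customer_id IS NOT NULL")
def customers_staging():
    # `spark` is provided by the DLT runtime
    return spark.readStream.table("silver.customers")

# Target table managed by DLT.
dlt.create_streaming_table("gold_dim_customers")

# Declarative CDC flow: DLT generates the MERGE and the
# __START_AT / __END_AT columns for SCD Type 2 history.
dlt.create_auto_cdc_flow(
    target="gold_dim_customers",
    source="customers_staging",
    keys=["customer_id"],
    sequence_by="modified_date",
    stored_as_scd_type=2,
)
```

Switching `stored_as_scd_type` to 1 yields overwrite-in-place dimensions with no history columns.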
📦
Databricks Asset Bundle
Dev and prod deployment targets defined in YAML
config, with source-linked workspace deployment
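A minimal `databricks.yml` sketch of the dev/prod target split; the bundle name and workspace hosts below are placeholders, not the project's actual values:

```yaml
bundle:
  name: medallion-pipeline

targets:
  dev:
    mode: development       # prefixed, source-linked dev deployment
    default: true
    workspace:
      host: https://adb-dev.azuredatabricks.net    # placeholder
  prod:
    mode: production
    workspace:
      host: https://adb-prod.azuredatabricks.net   # placeholder
```

`databricks bundle deploy -t prod` then deploys the same pipeline definition to the production workspace with production semantics.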
🧩
Jinja Templating
Dynamic multi-table SQL JOIN generation at runtime —
no hardcoded query logic
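A small sketch of the idea, assuming a hypothetical metadata-driven list of dimensions (in the pipeline this list would come from configuration, not be hardcoded):

```python
from jinja2 import Template

# Jinja template that expands one SELECT column and one LEFT JOIN
# per dimension entry, so no per-table SQL is hardcoded.
JOIN_SQL = Template("""
SELECT f.*{% for d in dims %}, {{ d.alias }}.{{ d.attr }}{% endfor %}
FROM {{ fact }} f
{% for d in dims %}LEFT JOIN {{ d.table }} {{ d.alias }}
  ON f.{{ d.key }} = {{ d.alias }}.{{ d.key }}
{% endfor %}""".strip())

sql = JOIN_SQL.render(
    fact="gold.fact_sales",
    dims=[
        {"table": "gold.dim_customer", "alias": "c",
         "key": "customer_id", "attr": "customer_name"},
        {"table": "gold.dim_product", "alias": "p",
         "key": "product_id", "attr": "product_name"},
    ],
)
```

Adding a dimension is then a one-line metadata change; the JOIN logic is regenerated at runtime.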
🔐
Access Connector
Credential-free, identity-based authentication
between Databricks and ADLS Gen2