Aman Bhattarai  /  Data Engineering Portfolio
View on GitHub
Azure Data Engineering · End-to-End Pipeline

SQL Server → ADLS Gen2
via Medallion Architecture

A production-style pipeline that ingests operational data from SQL Server through Bronze, Silver, and Gold layers using watermark-based CDC in Azure Data Factory, Databricks Auto Loader, and Delta Live Tables.

Azure Data Factory · Databricks · Delta Live Tables · Unity Catalog · Medallion Architecture
Pipeline Architecture
Data Source
Azure SQL Server
Relational OLTP database
Dimension Tables Fact Tables
ADF · CDC Incremental
Bronze Layer
Raw Ingestion
ADLS Gen2 · Parquet
ForEach Activity · Watermark-based CDC · JSON State Management · Incremental Copy · Full Reload Reset
Auto Loader
cloudFiles
Silver Layer
Stream Processing
ADLS Gen2 · Delta Tables
Structured Streaming · cloudFiles · Read/Write Checkpoints · trigger(once=True) · Delta Lake · Unity Catalog
DLT Pipeline
CDC Auto Flow
Gold Layer
Analytics Layer
ADLS Gen2 · Delta Tables
Delta Live Tables · dlt.create_auto_cdc_flow() · SCD Type 2 · SCD Type 1 · DLT Expectations · Data Quality Checks · Jinja Templating · Dynamic SQL
Watermark-based CDC
JSON state files track a lastLoadTime watermark per table; only rows modified since the last run are copied
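The watermark logic can be sketched in plain Python. This is a hypothetical stand-in for what the real pipeline does with ADF Lookup and Copy activities: the function names, `state` directory, and `ModifiedDate` column are illustrative assumptions, not the project's actual identifiers.

```python
import json
from pathlib import Path

def build_incremental_query(table: str, state_dir: str = "state") -> str:
    """Build an incremental SELECT from the table's JSON watermark file.

    Sketch only: ADF Lookup/Copy activities do this in the real pipeline;
    the ModifiedDate column name is a hypothetical example.
    """
    state_file = Path(state_dir) / f"{table}.json"
    if state_file.exists():
        last_load = json.loads(state_file.read_text())["lastLoadTime"]
        return (f"SELECT * FROM {table} "
                f"WHERE ModifiedDate > '{last_load}'")
    # No watermark yet: fall back to a full reload
    return f"SELECT * FROM {table}"

def save_watermark(table: str, load_time: str, state_dir: str = "state") -> None:
    """Persist the new high-water mark after a successful copy."""
    state_file = Path(state_dir) / f"{table}.json"
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"lastLoadTime": load_time}))
```

Deleting a table's state file gives the "full reload reset" behavior for free: the next run sees no watermark and copies everything.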
🌊
Structured Streaming
Auto Loader with isolated read/write checkpoints for fault-tolerant exactly-once delivery
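A minimal sketch of how a Bronze Parquet folder might be streamed into a Silver table with Auto Loader. It requires a Databricks runtime; the storage paths and three-level table name are hypothetical placeholders, not the project's real locations.

```python
# Sketch only: cloudFiles runs on Databricks; all names below are placeholders.
bronze_path = "abfss://bronze@storageacct.dfs.core.windows.net/dim_customer"
schema_loc  = "abfss://silver@storageacct.dfs.core.windows.net/_schemas/dim_customer"
checkpoint  = "abfss://silver@storageacct.dfs.core.windows.net/_checkpoints/dim_customer"

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaLocation", schema_loc)   # read-side state (schema + file discovery)
      .load(bronze_path))

(df.writeStream
   .option("checkpointLocation", checkpoint)             # write-side state, isolated per table
   .trigger(once=True)                                   # process the backlog, then stop
   .toTable("catalog.silver.dim_customer"))              # Unity Catalog managed Delta table
```

Keeping the schema location and checkpoint location separate per table is what gives each stream its own isolated, replayable state.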
🔺
Delta Lake
ACID-compliant managed tables in Unity Catalog across Silver and Gold layers
DLT Expectations
Declarative data quality rules enforced at the staging layer; invalid records are dropped before they reach Gold
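Expectations in DLT are decorators on the dataset definition. A hedged sketch, runnable only inside a DLT pipeline; the view name, source table, and constraint columns are assumptions for illustration:

```python
import dlt

# Hypothetical staging view: rows failing an expect_or_drop constraint
# are removed (and counted in pipeline metrics) before Gold consumes them.
@dlt.view(name="stg_dim_customer")
@dlt.expect_or_drop("valid_key", "CustomerID IS NOT NULL")
@dlt.expect_or_drop("valid_email", "Email LIKE '%@%'")
def stg_dim_customer():
    return spark.readStream.table("catalog.silver.dim_customer")
```

`expect_or_drop` silently removes failing rows; `expect` would only record the violation, and `expect_or_fail` would stop the update.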
🔄
SCD Type 2
Full change history on dimensions tracked via __START_AT / __END_AT metadata
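With SCD Type 2, the current version of each business key is simply the row whose `__END_AT` is still open. The real tables are Delta and would be queried with `WHERE __END_AT IS NULL`; this pure-Python sketch with hypothetical sample data just illustrates the selection logic:

```python
def current_rows(history):
    """Return the active version of each business key from SCD Type 2 history.

    DLT sets __START_AT/__END_AT on each version; the open-ended row
    (__END_AT is None) is the current one. Sample data is hypothetical.
    """
    return [r for r in history if r["__END_AT"] is None]

history = [
    {"CustomerID": 1, "City": "Kathmandu", "__START_AT": "2023-01-01", "__END_AT": "2024-06-01"},
    {"CustomerID": 1, "City": "Pokhara",   "__START_AT": "2024-06-01", "__END_AT": None},
]
```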
🧬
Delta Live Tables
dlt.create_auto_cdc_flow() applies CDC changes declaratively, replacing hand-written MERGE logic
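A minimal sketch of the CDC flow definition, runnable only inside a DLT pipeline. Target, source, key, and sequencing column names are hypothetical; the project's actual tables may differ:

```python
import dlt

# Hypothetical names; DLT maintains this table from the change feed.
dlt.create_streaming_table("dim_customer")

dlt.create_auto_cdc_flow(
    target="dim_customer",
    source="stg_dim_customer",     # cleansed staging view
    keys=["CustomerID"],           # business key for matching
    sequence_by="ModifiedDate",    # ordering column for out-of-order changes
    stored_as_scd_type=2,          # keep history via __START_AT/__END_AT
)
```

Switching `stored_as_scd_type` to `1` would overwrite in place instead of versioning, which matches the SCD Type 1 path listed above.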
📦
Databricks Asset Bundle
Dev and prod deployment targets with source-linked workspace deployment, defined via YAML config
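The shape of such a bundle config can be sketched as a `databricks.yml` fragment. Bundle name, hosts, and paths below are placeholders, not the project's real values:

```yaml
# Hedged sketch of a databricks.yml with dev/prod targets; names are placeholders.
bundle:
  name: medallion_pipeline

targets:
  dev:
    mode: development          # per-user resource prefixes; source-linked deployment in dev
    default: true
    workspace:
      host: https://adb-0000000000000000.0.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-0000000000000000.0.azuredatabricks.net
      root_path: /Workspace/deploy/${bundle.name}
```

`databricks bundle deploy -t prod` then promotes the same definitions without editing any code.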
🧩
Jinja Templating
Dynamic multi-table SQL JOIN generation at runtime, with no hardcoded query logic
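Metadata-driven JOIN generation with Jinja might look like the sketch below. The fact/dimension metadata and column names are hypothetical; the real pipeline's templates will differ:

```python
from jinja2 import Template

# Hypothetical dimension metadata driving the generated Gold query.
JOIN_SQL = Template("""\
SELECT f.*{% for d in dims %}, {{ d.alias }}.{{ d.attr }}{% endfor %}
FROM {{ fact }} f
{% for d in dims %}LEFT JOIN {{ d.table }} {{ d.alias }}
  ON f.{{ d.key }} = {{ d.alias }}.{{ d.key }}
{% endfor %}""")

dims = [
    {"table": "dim_customer", "alias": "c", "key": "CustomerID", "attr": "CustomerName"},
    {"table": "dim_product",  "alias": "p", "key": "ProductID",  "attr": "ProductName"},
]
sql = JOIN_SQL.render(fact="fact_sales", dims=dims)
```

Adding a dimension then means appending one metadata entry rather than editing any SQL by hand.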
🔐
Access Connector
Credential-free identity-based auth between Databricks and ADLS Gen2
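Once the Access Connector's managed identity is registered in Unity Catalog as a storage credential, external locations grant path access without any keys. A hedged SQL sketch with placeholder names:

```sql
-- Hypothetical names; the storage credential wraps the Access Connector's
-- managed identity, registered in Unity Catalog beforehand.
CREATE EXTERNAL LOCATION IF NOT EXISTS silver_loc
URL 'abfss://silver@storageacct.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL access_connector_cred);

GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION silver_loc TO `data_engineers`;
```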