Azure Data Factory (ADF) is a cloud-based data integration service for creating data-driven workflows (pipelines) that orchestrate and automate data movement and transformation. It is Azure's primary ETL/ELT service for enterprise-scale data integration scenarios.
Data Factory enables you to ingest data from various sources, transform it at scale, and load it into data stores for analytics, reporting, and machine learning.
## Core Concepts
```mermaid
graph TB
    subgraph "Azure Data Factory"
        PIPELINE[Pipeline]
        ACTIVITY[Activities]
        DATASET[Datasets]
        LS[Linked Services]
        TRIGGER[Triggers]
        IR[Integration Runtime]
    end
    TRIGGER --> PIPELINE
    PIPELINE --> ACTIVITY
    ACTIVITY --> DATASET
    DATASET --> LS
    LS --> IR
    style PIPELINE fill:#0078D4,color:#fff
```
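These concepts come together in a pipeline's JSON definition. As a rough sketch (modelled here as a Python dict mirroring the JSON shown in the ADF authoring UI's code view; the pipeline, dataset, and parameter names are hypothetical placeholders):

```python
import json

# Minimal pipeline definition: one Copy activity wiring a source dataset to
# a sink dataset, plus a pipeline parameter. Names are illustrative only.
pipeline = {
    "name": "CopyDailySales",
    "properties": {
        "parameters": {"runDate": {"type": "String"}},
        "activities": [
            {
                "name": "CopySqlToLake",
                "type": "Copy",
                "inputs": [{"referenceName": "SourceSqlDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "LakeParquetDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlServerSource"},
                    "sink": {"type": "ParquetSink"},
                },
            }
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

Each dataset reference in turn points at a linked service, which is where connection details live; the pipeline itself only names the datasets.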
## Pipeline Patterns

### Pattern 1: ETL Pipeline

```mermaid
flowchart LR
    subgraph "Extract"
        SRC1[(SQL Server)]
        SRC2[(Oracle)]
        SRC3[Files]
    end
    subgraph "Transform"
        DF[Data Flow]
    end
    subgraph "Load"
        DW[(Synapse Analytics)]
    end
    SRC1 & SRC2 & SRC3 --> DF --> DW
    style DF fill:#0078D4,color:#fff
```
### Pattern 2: ELT Pipeline
```mermaid
flowchart LR
    subgraph "Extract & Load"
        COPY[Copy Activity]
    end
    subgraph "Source"
        SRC[(Source DB)]
    end
    subgraph "Staging"
        LAKE[(Data Lake)]
    end
    subgraph "Transform"
        DB[Databricks]
        SYN[Synapse]
    end
    SRC --> COPY --> LAKE
    LAKE --> DB --> DW[(Data Warehouse)]
    LAKE --> SYN --> DW
    style COPY fill:#0078D4,color:#fff
```
### Pattern 3: Incremental Load
```mermaid
flowchart TB
    GET[Get Watermark] --> LOOKUP[Lookup New Data]
    LOOKUP --> COPY[Copy New Rows]
    COPY --> UPDATE[Update Watermark]
    style GET fill:#50E6FF
    style COPY fill:#0078D4,color:#fff
```
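The watermark flow above can be sketched in plain Python. In ADF this would typically be a Lookup activity (read the watermark), a Copy activity with a parameterized source query, and a Stored Procedure activity (advance the watermark); the tables and column names below are hypothetical stand-ins:

```python
from datetime import datetime

# Simulated source table, watermark store, and sink. In ADF these would be
# the source database, a small control/watermark table, and the sink store.
source_rows = [
    {"id": 1, "modified": datetime(2024, 1, 1)},
    {"id": 2, "modified": datetime(2024, 1, 5)},
    {"id": 3, "modified": datetime(2024, 1, 9)},
]
watermark = {"value": datetime(2024, 1, 3)}  # last successfully loaded point
sink_rows = []

def run_incremental_load():
    # 1. Get the current watermark (Lookup activity)
    last = watermark["value"]
    # 2. Select only rows modified after the watermark (Copy source query)
    new_rows = [r for r in source_rows if r["modified"] > last]
    # 3. Copy the new rows to the sink (Copy activity)
    sink_rows.extend(new_rows)
    # 4. Advance the watermark to the newest modified date just loaded
    if new_rows:
        watermark["value"] = max(r["modified"] for r in new_rows)
    return new_rows

loaded = run_incremental_load()
print(len(loaded))  # prints 2: only rows newer than the watermark are copied
```

Re-running the load immediately afterwards copies nothing, because the watermark has already advanced past every source row.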
## Copy Activity

### Architecture
```mermaid
flowchart LR
    subgraph "Source"
        SRC[Source Dataset]
        SRC_LS[Linked Service]
    end
    subgraph "Copy Activity"
        DIR[DIU Allocation]
        PARALLEL[Parallel Copy]
        STAGING[Staging if needed]
    end
    subgraph "Sink"
        SINK[Sink Dataset]
        SINK_LS[Linked Service]
    end
    SRC --> SRC_LS --> DIR
    DIR --> PARALLEL --> STAGING --> SINK_LS --> SINK
    style DIR fill:#0078D4,color:#fff
```
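The DIU allocation, parallelism, and staging knobs in the diagram surface as settings under the Copy activity's `typeProperties`. A rough sketch of those settings (values are illustrative, and the staging linked service name is a hypothetical placeholder):

```python
# Performance-related Copy activity settings as they appear in the activity
# JSON. "dataIntegrationUnits" also accepts "Auto", letting the service pick.
copy_type_properties = {
    "source": {"type": "SqlServerSource"},
    "sink": {"type": "SqlDWSink"},
    "dataIntegrationUnits": 8,   # DIU allocation for this copy run
    "parallelCopies": 4,         # degree of parallel copy
    "enableStaging": True,       # stage through Blob storage, e.g. for PolyBase loads
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "StagingBlobLinkedService",  # hypothetical name
            "type": "LinkedServiceReference",
        },
        "path": "staging-container",
    },
}

print(copy_type_properties["dataIntegrationUnits"])
```

More DIUs and parallel copies raise throughput but also cost, which is why right-sizing them is called out under Best Practices below.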
## CI/CD

```mermaid
flowchart TB
    subgraph "Development"
        DEV_ADF[Dev Data Factory]
        DEV_GIT[Feature Branch]
    end
    subgraph "Build"
        ARM[ARM Templates]
        PARAM[Parameter Files]
    end
    subgraph "Release"
        TEST[Test Environment]
        PROD[Production]
    end
    DEV_ADF --> DEV_GIT --> ARM
    ARM --> TEST --> PROD
    PARAM --> TEST & PROD
    style ARM fill:#50E6FF
```
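The parameter-file leg of the diagram can be sketched as follows: publishing exports an ARM template plus a parameter file, and each environment's release stage supplies its own values. A hedged sketch (the `factoryName` parameter follows ADF's exported defaults; the connection-string parameter name is a hypothetical example):

```python
import json

# Per-environment parameterization: take the ARM parameter values that ADF
# generates on publish and override the environment-specific ones.
base_parameters = {
    "factoryName": {"value": "adf-contoso-dev"},
    "SourceSql_connectionString": {"value": "dev-connection"},
}

def for_environment(params, env):
    out = json.loads(json.dumps(params))  # cheap deep copy of the JSON-like dict
    out["factoryName"]["value"] = f"adf-contoso-{env}"
    out["SourceSql_connectionString"]["value"] = f"{env}-connection"
    return out

prod = for_environment(base_parameters, "prod")
print(prod["factoryName"]["value"])  # prints adf-contoso-prod
```

The same ARM template is deployed to test and production; only the parameter values differ, which keeps the environments structurally identical.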
## Best Practices
### Design

| Practice | Description |
|----------|-------------|
| Modular pipelines | Reusable components |
| Use parameters | Environment flexibility |
| Incremental loads | Efficient processing |
| Error handling | Proper failure paths |
| Naming conventions | Consistent naming |
### Performance

| Practice | Description |
|----------|-------------|
| Right-size DIUs | Balance cost/performance |
| Use staging | For Synapse/PolyBase |
| Partition data | Parallel processing |
| Optimize queries | Source optimization |
| Cache lookup results | Reduce round trips |
### Operations

| Practice | Description |
|----------|-------------|
| Enable Git | Version control |
| Monitor pipelines | Proactive alerting |
| Document pipelines | Annotations |
| Test thoroughly | Debug before publish |
## Architecture Patterns

### Pattern: Data Lake Architecture
```mermaid
flowchart TB
    subgraph "Sources"
        SRC1[(Operational DB)]
        SRC2[Files]
        SRC3[APIs]
    end
    subgraph "Data Factory"
        ADF[Orchestration]
    end
    subgraph "Data Lake"
        RAW[Raw Zone]
        CURATED[Curated Zone]
        CONSUME[Consumption Zone]
    end
    subgraph "Analytics"
        SYN[Synapse]
        PBI[Power BI]
    end
    SRC1 & SRC2 & SRC3 --> ADF
    ADF --> RAW --> CURATED --> CONSUME
    CONSUME --> SYN --> PBI
    style ADF fill:#0078D4,color:#fff
```
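A zoned lake like the one above usually relies on a consistent folder convention that pipelines build from parameters. One possible sketch (the zone names match the diagram, but the source/dataset/date layout is a hypothetical convention, not an ADF requirement):

```python
from datetime import date

# Example path convention for the zones shown above. Organizations vary:
# some add containers per zone, environment prefixes, load IDs, etc.
ZONES = ("raw", "curated", "consumption")

def lake_path(zone: str, source: str, dataset: str, run_date: date) -> str:
    """Build a partitioned folder path: <zone>/<source>/<dataset>/YYYY/MM/DD."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{zone}/{source}/{dataset}/{run_date:%Y/%m/%d}"

print(lake_path("raw", "erp", "sales", date(2024, 1, 9)))  # raw/erp/sales/2024/01/09
```

In an ADF pipeline the same convention would typically be expressed with dataset parameters and expressions such as `@formatDateTime(...)`, so every activity writes to a predictable location.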
### Pattern: Hybrid Integration
```mermaid
flowchart LR
    subgraph "On-Premises"
        ONPREM[(On-Prem SQL)]
        SHIR[Self-Hosted IR]
    end
    subgraph "Azure"
        ADF[Data Factory]
        ADLS[(Data Lake)]
        SYN[(Synapse)]
    end
    ONPREM --> SHIR --> ADF
    ADF --> ADLS --> SYN
    style ADF fill:#0078D4,color:#fff
    style SHIR fill:#50E6FF
```