Course
data-engineering-zoomcamp
Question
Why did you choose a layered pipeline (raw → canonical → analytics → mart) instead of directly analyzing the raw data?
Answer
Direct analysis on raw data creates tight coupling between ingestion format and analytical logic.
This leads to:
- repeated cleaning logic across queries
- inconsistent metric definitions
- difficulty scaling transformations
By introducing layers:
- canonical layer enforces schema consistency and data correctness
- analytics layer encapsulates reusable metric logic
- mart layer optimizes for consumption and time-series analysis
This separation mirrors production-grade data architectures (ELT pattern), improves reproducibility, and ensures that downstream insights remain stable even if upstream data sources change.
Checklist
Course
data-engineering-zoomcamp
Question
Why did you choose a layered pipeline (raw → canonical → analytics → mart) instead of directly analyzing the raw data?
Answer
Direct analysis on raw data creates tight coupling between ingestion format and analytical logic.
This leads to:
By introducing layers:
This separation mirrors production-grade data architectures (ELT pattern), improves reproducibility, and ensures that downstream insights remain stable even if upstream data sources change.
Checklist