[FAQ] Why is the pipeline structured into multiple layers instead of directly analyzing raw data? #264

@AsherJD-io

Description

Course

data-engineering-zoomcamp

Question

Why did you choose a layered pipeline (raw → canonical → analytics → mart) instead of directly analyzing the raw data?

Answer

Analyzing raw data directly couples the analytical logic tightly to the ingestion format.

This leads to:

  • cleaning logic repeated across queries
  • inconsistent metric definitions
  • transformations that get harder to extend as sources and queries multiply

Introducing layers gives each concern a single home:

  • the canonical layer enforces schema consistency and data correctness
  • the analytics layer encapsulates reusable metric logic
  • the mart layer optimizes for consumption and time-series analysis

This separation mirrors production-grade data architectures that follow the ELT pattern. It improves reproducibility and helps keep downstream insights stable when upstream data sources change, because a format change usually only has to be absorbed in the canonical layer.
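To make the layering concrete, here is a minimal sketch in plain Python. It is not the project's actual pipeline: the sample records, the field names, the `fare_per_km` metric, and all function names are hypothetical and chosen only for illustration. Each function stands in for one layer, so cleaning happens once and the metric is defined once.

```python
from collections import defaultdict
from datetime import datetime

# Raw layer: records exactly as ingested (hypothetical sample data, all strings).
RAW_TRIPS = [
    {"pickup": "2024-01-01 08:15:00", "fare": "12.50", "distance_km": "3.0"},
    {"pickup": "2024-01-01 09:40:00", "fare": "7.00", "distance_km": "1.4"},
    {"pickup": "2024-01-02 18:05:00", "fare": "n/a", "distance_km": "2.0"},  # invalid fare
    {"pickup": "2024-01-02 19:30:00", "fare": "20.00", "distance_km": "5.0"},
]


def to_canonical(raw_rows):
    """Canonical layer: enforce types once and drop rows that fail validation."""
    canonical = []
    for row in raw_rows:
        try:
            trip = {
                "pickup_at": datetime.strptime(row["pickup"], "%Y-%m-%d %H:%M:%S"),
                "fare": float(row["fare"]),
                "distance_km": float(row["distance_km"]),
            }
            if trip["distance_km"] <= 0:
                raise ValueError("distance must be positive")
        except (KeyError, ValueError):
            continue  # a real pipeline would log or quarantine the row
        canonical.append(trip)
    return canonical


def fare_per_km(trip):
    """Analytics layer: one shared definition of the metric."""
    return trip["fare"] / trip["distance_km"]


def build_daily_mart(trips):
    """Mart layer: one row per day, shaped for time-series consumption."""
    by_day = defaultdict(list)
    for trip in trips:
        by_day[trip["pickup_at"].date().isoformat()].append(fare_per_km(trip))
    return [
        {
            "day": day,
            "trips": len(values),
            "avg_fare_per_km": round(sum(values) / len(values), 2),
        }
        for day, values in sorted(by_day.items())
    ]


daily_mart = build_daily_mart(to_canonical(RAW_TRIPS))
for row in daily_mart:
    print(row)
```

If the source later renames `pickup` or changes its timestamp format, only `to_canonical` needs to change in this sketch; `fare_per_km` and `build_daily_mart` keep working against the canonical field names.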

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links
