End-to-end retail analytics platform on Snowflake + dbt — ingesting multi-source retail data via Fivetran, transforming through a layered dbt project (Staging → Intermediate → Marts), and serving star schema fact/dim tables to BI tools. Achieved 45% reduction in analyst query time and 30% improvement in data accuracy through modular dbt models, automated testing, and CI/CD-driven deployments.
flowchart TD
subgraph Sources["📥 Source Systems"]
A1[E-commerce Platform\nOrders · Customers]
A2[Product Catalogue\nInventory · Pricing]
A3[Marketing Platform\nCampaigns · Events]
A4[Store Operations\nStore · Channel Data]
end
subgraph Ingestion["⬇️ Ingestion — Fivetran"]
B[Fivetran\nManaged Connectors]
C[Snowflake RAW Schema\nAs-received source tables]
end
subgraph Transformation["🔄 Transformation — dbt on Snowflake"]
D[Staging Layer\nViews · Rename · Cast · Clean\nSource freshness checks]
E[Intermediate Layer\nEphemeral · Joins · Business Logic\nLTV · Order enrichment]
F[Marts — Core\nTables · Star Schema\ndim_customers · dim_products\ndim_dates · fct_orders · fct_order_items]
G[Marts — Marketing\nTables · Campaign Performance\nCTR · Conversion Rate · Revenue per Click]
H[Snapshots\nSCD Type 2\ncustomer_snapshot]
end
subgraph Quality["✅ Data Quality"]
I[dbt Tests\nUnique · Not Null · Accepted Values\ndbt_expectations · Custom SQL tests]
J[Source Freshness\nFivetran lag monitoring]
end
subgraph Orchestration["⚙️ Orchestration"]
K[Apache Airflow\nScheduled dbt runs\nDag: dbt_retail_daily]
end
subgraph Serving["📊 Serving"]
L[Tableau\nExecutive Dashboards]
M[Power BI\nSelf-service Analytics]
N[Snowflake\nZero-copy Cloning\nDev/Test Environments]
end
A1 & A2 & A3 & A4 --> B --> C
C --> D --> E --> F & G
C --> H
I -.->|Tests| D & F & G
J -.->|Freshness| C
K -->|Orchestrate| D & E & F & G
F & G --> L & M & N
style D fill:#FF694B,stroke:#333,color:#fff
style F fill:#29B5E8,stroke:#333,color:#fff
style G fill:#29B5E8,stroke:#333,color:#fff
style K fill:#017CEE,stroke:#333,color:#fff
A retail business with 3,000+ stores had analyst teams spending 60%+ of their time writing repetitive SQL joins across raw Fivetran tables in Snowflake — no standardised models, no shared definitions for metrics like LTV or conversion rate, and no data quality tests. Every team had a different number for the same KPI.
A layered dbt project enforcing a single source of truth — staging views clean and rename raw data, intermediate ephemeral models handle complex business logic, and mart tables serve pre-joined star schema to BI tools. dbt tests run on every PR via GitHub Actions, and Snowflake zero-copy cloning enables safe dev/test environments without data duplication.
| Metric | Value |
|---|---|
| Analyst query time reduction | 45% |
| Data accuracy improvement | 30% |
| dbt models | 14 (5 staging · 2 intermediate · 5 core marts · 1 marketing · 1 snapshot) |
| dbt tests | 30+ (schema + custom SQL) |
| Source tables | 5 (orders, customers, products, order_items, store_events) |
| Deployment | Automated via GitHub Actions on every merge to main |
| Dev environments | Snowflake zero-copy cloning — no data duplication cost |
| Layer | Technology |
|---|---|
| Data Warehouse | Snowflake (multi-cluster, zero-copy cloning) |
| Ingestion | Fivetran (managed connectors) |
| Transformation | dbt Core 1.8 |
| Orchestration | Apache Airflow |
| Testing | dbt tests, dbt_expectations, custom SQL assertions |
| SQL Linting | SQLFluff (Snowflake dialect) |
| CI/CD | GitHub Actions |
| BI | Tableau, Power BI |
| Packages | dbt_utils, dbt_expectations, codegen |
snowflake-dbt-retail-analytics/
├── models/
│ ├── staging/ # Views — clean, rename, cast raw sources
│ │ ├── _sources.yml # Source definitions + freshness config
│ │ ├── _stg_models.yml # Staging model schema + tests
│ │ ├── stg_orders.sql
│ │ ├── stg_customers.sql
│ │ ├── stg_products.sql
│ │ ├── stg_order_items.sql
│ │ └── stg_store_events.sql
│ ├── intermediate/ # Ephemeral — joins + business logic
│ │ ├── _int_models.yml
│ │ ├── int_orders_enriched.sql # Orders + customer + item aggregations
│ │ └── int_customer_lifetime_value.sql # LTV + retention segmentation
│ └── marts/
│ ├── core/ # Tables — star schema, BI-ready
│ │ ├── _core_models.yml
│ │ ├── dim_customers.sql # Customer dim with LTV enrichment
│ │ ├── dim_products.sql # Product dim with margin calculations
│ │ ├── dim_dates.sql # Date spine (dbt_utils.date_spine)
│ │ ├── fct_orders.sql # Order fact — grain: one row per order
│ │ └── fct_order_items.sql # Order items fact — grain: one row per line
│ └── marketing/ # Tables — campaign analytics
│ ├── _marketing_models.yml
│ └── mrt_campaign_performance.sql # CTR, CVR, revenue per click
├── snapshots/
│ └── customer_snapshot.sql # SCD Type 2 — customer attribute history
├── macros/
│ ├── cents_to_dollars.sql # Reusable cents → dollars conversion
│ ├── generate_surrogate_key.sql # Project surrogate key wrapper
│ └── safe_divide.sql # Null-safe division
├── tests/
│ └── assert_net_amount_positive.sql # Custom SQL test
├── seeds/ # Reference / lookup data (CSV)
├── analyses/ # Ad-hoc analytical queries
├── dbt_project.yml # Project config + materialisation strategy
├── packages.yml # dbt package dependencies
├── profiles.yml # Snowflake connection template
├── requirements.txt
├── .sqlfluff # SQL linting rules (Snowflake dialect)
└── .github/workflows/
└── dbt_ci.yml # GitHub Actions: compile + test + lint on PR
- Staging as views, marts as tables — staging views always reflect latest source data with zero storage cost; mart tables are pre-aggregated for fast BI query response.
- Intermediate models as ephemeral — complex join logic runs in-query without creating intermediate Snowflake objects, keeping the warehouse schema clean.
- dbt_utils.date_spine for dim_dates — generated date dimension from a single config var, eliminating manual CSV maintenance and enabling dynamic range extension.
- SCD Type 2 via dbt snapshot — customer attribute history (loyalty tier, segment) captured automatically using timestamp strategy — no custom merge logic required.
- Surrogate keys via dbt_utils.generate_surrogate_key — consistent MD5-based surrogate keys across all dims and facts, decoupled from source system natural keys.
- Snowflake zero-copy cloning — dev and test environments cloned from prod instantly with no storage duplication cost, enabling safe experimentation.
# Clone the repo
git clone https://github.com/jesseantony/snowflake-dbt-retail-analytics.git
cd snowflake-dbt-retail-analytics
# Install Python dependencies
pip install -r requirements.txt
# Copy and fill in your Snowflake credentials
cp profiles.yml ~/.dbt/profiles.yml
# Edit ~/.dbt/profiles.yml with your Snowflake account details
# Install dbt packages
dbt deps
# Verify connection
dbt debug
# Run all models
dbt run
# Run tests
dbt test
# Generate and serve docs
dbt docs generate
dbt docs servedbt snowflake data-engineering analytics-engineering star-schema elt python airflow github-actions fivetran data-modelling scd-type-2 sqlfluff