Skip to content

jesseantony/snowflake-dbt-retail-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Snowflake + dbt Retail Analytics Platform

dbt Snowflake Python Apache Airflow GitHub Actions License dbt CI

End-to-end retail analytics platform on Snowflake + dbt — ingesting multi-source retail data via Fivetran, transforming through a layered dbt project (Staging → Intermediate → Marts), and serving star schema fact/dim tables to BI tools. Achieved 45% reduction in analyst query time and 30% improvement in data accuracy through modular dbt models, automated testing, and CI/CD-driven deployments.


📐 Architecture

flowchart TD
    subgraph Sources["📥 Source Systems"]
        A1[E-commerce Platform\nOrders · Customers]
        A2[Product Catalogue\nInventory · Pricing]
        A3[Marketing Platform\nCampaigns · Events]
        A4[Store Operations\nStore · Channel Data]
    end

    subgraph Ingestion["⬇️ Ingestion — Fivetran"]
        B[Fivetran\nManaged Connectors]
        C[Snowflake RAW Schema\nAs-received source tables]
    end

    subgraph Transformation["🔄 Transformation — dbt on Snowflake"]
        D[Staging Layer\nViews · Rename · Cast · Clean\nSource freshness checks]
        E[Intermediate Layer\nEphemeral · Joins · Business Logic\nLTV · Order enrichment]
        F[Marts — Core\nTables · Star Schema\ndim_customers · dim_products\ndim_dates · fct_orders · fct_order_items]
        G[Marts — Marketing\nTables · Campaign Performance\nCTR · Conversion Rate · Revenue per Click]
        H[Snapshots\nSCD Type 2\ncustomer_snapshot]
    end

    subgraph Quality["✅ Data Quality"]
        I[dbt Tests\nUnique · Not Null · Accepted Values\ndbt_expectations · Custom SQL tests]
        J[Source Freshness\nFivetran lag monitoring]
    end

    subgraph Orchestration["⚙️ Orchestration"]
        K[Apache Airflow\nScheduled dbt runs\nDag: dbt_retail_daily]
    end

    subgraph Serving["📊 Serving"]
        L[Tableau\nExecutive Dashboards]
        M[Power BI\nSelf-service Analytics]
        N[Snowflake\nZero-copy Cloning\nDev/Test Environments]
    end

    A1 & A2 & A3 & A4 --> B --> C
    C --> D --> E --> F & G
    C --> H
    I -.->|Tests| D & F & G
    J -.->|Freshness| C
    K -->|Orchestrate| D & E & F & G
    F & G --> L & M & N

    style D fill:#FF694B,stroke:#333,color:#fff
    style F fill:#29B5E8,stroke:#333,color:#fff
    style G fill:#29B5E8,stroke:#333,color:#fff
    style K fill:#017CEE,stroke:#333,color:#fff
Loading

🎯 Problem Statement

A retail business with 3,000+ stores had analyst teams spending 60%+ of their time writing repetitive SQL joins across raw Fivetran tables in Snowflake — no standardised models, no shared definitions for metrics like LTV or conversion rate, and no data quality tests. Every team had a different number for the same KPI.

✅ Solution

A layered dbt project enforcing a single source of truth — staging views clean and rename raw data, intermediate ephemeral models handle complex business logic, and mart tables serve pre-joined star schema to BI tools. dbt tests run on every PR via GitHub Actions, and Snowflake zero-copy cloning enables safe dev/test environments without data duplication.


📊 Key Metrics

Metric Value
Analyst query time reduction 45%
Data accuracy improvement 30%
dbt models 14 (5 staging · 2 intermediate · 5 core marts · 1 marketing · 1 snapshot)
dbt tests 30+ (schema + custom SQL)
Source tables 5 (orders, customers, products, order_items, store_events)
Deployment Automated via GitHub Actions on every merge to main
Dev environments Snowflake zero-copy cloning — no data duplication cost

🛠️ Tech Stack

Layer Technology
Data Warehouse Snowflake (multi-cluster, zero-copy cloning)
Ingestion Fivetran (managed connectors)
Transformation dbt Core 1.8
Orchestration Apache Airflow
Testing dbt tests, dbt_expectations, custom SQL assertions
SQL Linting SQLFluff (Snowflake dialect)
CI/CD GitHub Actions
BI Tableau, Power BI
Packages dbt_utils, dbt_expectations, codegen

📁 Project Structure

snowflake-dbt-retail-analytics/
├── models/
│   ├── staging/                      # Views — clean, rename, cast raw sources
│   │   ├── _sources.yml              # Source definitions + freshness config
│   │   ├── _stg_models.yml           # Staging model schema + tests
│   │   ├── stg_orders.sql
│   │   ├── stg_customers.sql
│   │   ├── stg_products.sql
│   │   ├── stg_order_items.sql
│   │   └── stg_store_events.sql
│   ├── intermediate/                 # Ephemeral — joins + business logic
│   │   ├── _int_models.yml
│   │   ├── int_orders_enriched.sql   # Orders + customer + item aggregations
│   │   └── int_customer_lifetime_value.sql  # LTV + retention segmentation
│   └── marts/
│       ├── core/                     # Tables — star schema, BI-ready
│       │   ├── _core_models.yml
│       │   ├── dim_customers.sql     # Customer dim with LTV enrichment
│       │   ├── dim_products.sql      # Product dim with margin calculations
│       │   ├── dim_dates.sql         # Date spine (dbt_utils.date_spine)
│       │   ├── fct_orders.sql        # Order fact — grain: one row per order
│       │   └── fct_order_items.sql   # Order items fact — grain: one row per line
│       └── marketing/                # Tables — campaign analytics
│           ├── _marketing_models.yml
│           └── mrt_campaign_performance.sql  # CTR, CVR, revenue per click
├── snapshots/
│   └── customer_snapshot.sql         # SCD Type 2 — customer attribute history
├── macros/
│   ├── cents_to_dollars.sql          # Reusable cents → dollars conversion
│   ├── generate_surrogate_key.sql    # Project surrogate key wrapper
│   └── safe_divide.sql               # Null-safe division
├── tests/
│   └── assert_net_amount_positive.sql  # Custom SQL test
├── seeds/                            # Reference / lookup data (CSV)
├── analyses/                         # Ad-hoc analytical queries
├── dbt_project.yml                   # Project config + materialisation strategy
├── packages.yml                      # dbt package dependencies
├── profiles.yml                      # Snowflake connection template
├── requirements.txt
├── .sqlfluff                         # SQL linting rules (Snowflake dialect)
└── .github/workflows/
    └── dbt_ci.yml                    # GitHub Actions: compile + test + lint on PR

⚙️ Key Engineering Decisions

  • Staging as views, marts as tables — staging views always reflect latest source data with zero storage cost; mart tables are pre-aggregated for fast BI query response.
  • Intermediate models as ephemeral — complex join logic runs in-query without creating intermediate Snowflake objects, keeping the warehouse schema clean.
  • dbt_utils.date_spine for dim_dates — generated date dimension from a single config var, eliminating manual CSV maintenance and enabling dynamic range extension.
  • SCD Type 2 via dbt snapshot — customer attribute history (loyalty tier, segment) captured automatically using timestamp strategy — no custom merge logic required.
  • Surrogate keys via dbt_utils.generate_surrogate_key — consistent MD5-based surrogate keys across all dims and facts, decoupled from source system natural keys.
  • Snowflake zero-copy cloning — dev and test environments cloned from prod instantly with no storage duplication cost, enabling safe experimentation.

🚀 Quick Start

# Clone the repo
git clone https://github.com/jesseantony/snowflake-dbt-retail-analytics.git
cd snowflake-dbt-retail-analytics

# Install Python dependencies
pip install -r requirements.txt

# Copy and fill in your Snowflake credentials
cp profiles.yml ~/.dbt/profiles.yml
# Edit ~/.dbt/profiles.yml with your Snowflake account details

# Install dbt packages
dbt deps

# Verify connection
dbt debug

# Run all models
dbt run

# Run tests
dbt test

# Generate and serve docs
dbt docs generate
dbt docs serve

📌 Topics

dbt snowflake data-engineering analytics-engineering star-schema elt python airflow github-actions fivetran data-modelling scd-type-2 sqlfluff

About

End-to-end retail analytics platform on Snowflake + dbt | Staging → Marts · SCD Type 2 · 45% query time reduction · GitHub Actions CI/CD

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors