Dataiku Demo — Customer Analytics Pipeline

End-to-end customer analytics pipeline that ingests Snowflake data into Dataiku DSS, computes RFM scores, CLV estimates, and churn risk, writes results back to Snowflake, and is mirrored on Databricks with validated parity.


Dataiku Flow

[Figure: Dataiku Demo Flow]

Architecture

Snowflake (DEV.DATAIKU_DEMO)
  ├── CUSTOMERS        (1,000 rows)
  └── TRANSACTIONS     (8,000 rows)
          │
          ▼  Dataiku DSS (DEMO project)
  ┌───────────────────────────────────────────┐
  │  [Shaker]  filter STATUS = 'completed'    │
  │      → transactions_completed             │
  │                                           │
  │  [Join]    LEFT JOIN on CUSTOMER_ID       │
  │      → customer_transactions_joined       │
  │                                           │
  │  [Python]  RFM + CLV + Churn analytics    │
  │      → CUSTOMER_ANALYTICS_OUTPUT          │
  └───────────────────────────────────────────┘
          │
          ▼
  Snowflake  DEV.DATAIKU_DEMO.CUSTOMER_ANALYTICS_OUTPUT
  Databricks dev.dataiku_demo.customer_analytics_output  ← migrated, parity verified
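
The three recipe steps in the diagram can be sketched in plain Python so they run without DSS or a warehouse connection. This is only an illustration, not the project's recipe code: `STATUS` and `CUSTOMER_ID` come from the diagram, while the `TXN_DATE` and `AMOUNT` columns, the `customer_analytics` function, and the 90-day churn threshold are assumptions. It produces raw recency, frequency, and monetary values rather than binned RFM scores, and it leaves out CLV because the README does not say how CLV is estimated.

```python
from datetime import date

CHURN_AFTER_DAYS = 90  # hypothetical threshold, not taken from the project


def customer_analytics(customers, transactions, as_of):
    """Return one row per customer with raw RFM values and a churn flag."""
    # [Shaker] keep only completed transactions
    completed = [t for t in transactions if t["STATUS"] == "completed"]

    # [Join] index transactions by CUSTOMER_ID; iterating over customers below
    # keeps customers with no completed purchases, as a LEFT JOIN would
    by_customer = {}
    for t in completed:
        by_customer.setdefault(t["CUSTOMER_ID"], []).append(t)

    rows = []
    for c in customers:
        txns = by_customer.get(c["CUSTOMER_ID"], [])
        # [Python] recency, frequency, monetary (TXN_DATE and AMOUNT are assumed names)
        last = max((t["TXN_DATE"] for t in txns), default=None)
        recency = (as_of - last).days if last else None
        rows.append({
            "CUSTOMER_ID": c["CUSTOMER_ID"],
            "RECENCY_DAYS": recency,
            "FREQUENCY": len(txns),
            "MONETARY": sum(t["AMOUNT"] for t in txns),
            # Assumed rule: at risk if never purchased or inactive too long
            "CHURN_RISK": recency is None or recency > CHURN_AFTER_DAYS,
        })
    return rows
```

Calling `customer_analytics(customers, transactions, date.today())` with lists of dicts returns one dict per customer, including customers that have no completed transactions.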

Parity validation with Datafold

Parity was validated with Datafold, a data reliability platform that runs cross-database diffs at scale using bisection hashing: it checksums key ranges on both sides and subdivides only the ranges whose checksums differ.

The validate_parity.py script uses the same open-source data-diff library that powers Datafold cloud.
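
To make the bisection hashing idea concrete, here is a toy, in-memory sketch. It is not the data-diff implementation and not the contents of `validate_parity.py`: real diffs compute the checksums inside each database, whereas here the two "tables" are Python dicts keyed by an integer ID, and the names `checksum`, `diff_keys`, and `threshold` are made up for illustration.

```python
import hashlib


def checksum(rows, lo, hi):
    """Hash all (key, value) pairs with lo <= key < hi, in key order."""
    h = hashlib.md5()
    for key in sorted(k for k in rows if lo <= k < hi):
        h.update(f"{key}|{rows[key]}".encode())
    return h.hexdigest()


def diff_keys(a, b, lo, hi, threshold=4):
    """Return keys in [lo, hi) whose rows differ between tables a and b."""
    if checksum(a, lo, hi) == checksum(b, lo, hi):
        return []  # segment matches, so none of its rows need inspecting
    if hi - lo <= threshold:
        # Segment is small enough to compare row by row
        return [k for k in range(lo, hi) if a.get(k) != b.get(k)]
    # Checksums differ: split the key range in half and recurse into each half
    mid = (lo + hi) // 2
    return diff_keys(a, b, lo, mid, threshold) + diff_keys(a, b, mid, hi, threshold)
```

When two tables are nearly identical, most segments match on the first checksum and are never read row by row, which is why this approach suits validating a migration where few or no differences are expected.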