Dataiku Demo — Customer Analytics Pipeline

End-to-end customer analytics pipeline. Dataiku DSS ingests customer and transaction data from Snowflake and computes RFM scores, CLV estimates, and churn risk. The results are written back to Snowflake and mirrored on Databricks, and parity between the two copies has been validated.
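
The README does not show the Python recipe's scoring logic. The sketch below illustrates the kind of per-customer computation involved, using only the standard library. Everything beyond `CUSTOMER_ID` is an assumption: the `TRANSACTION_DATE` and `AMOUNT` columns, the 1 to 5 rank-based RFM scores, the naive CLV formula (annualized spend times an assumed 3-year lifespan), and the 90-day churn threshold. It is not the project's actual method.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import date

# Hypothetical input: completed transactions as dicts with CUSTOMER_ID,
# TRANSACTION_DATE (datetime.date) and AMOUNT. All thresholds are assumptions.


@dataclass
class CustomerMetrics:
    customer_id: int
    recency_days: int
    frequency: int
    monetary: float
    r_score: int = 0
    f_score: int = 0
    m_score: int = 0
    clv: float = 0.0
    churn_risk: str = "low"


def rank_score(values, n_bins=5, higher_is_better=True):
    """Score each value 1..n_bins by its rank; tied values share a score."""
    keyed = list(values) if higher_is_better else [-v for v in values]
    ordered = sorted(keyed)
    n = len(keyed)
    return [1 + bisect_left(ordered, v) * n_bins // n for v in keyed]


def compute_metrics(transactions, as_of, observation_days=365,
                    lifespan_years=3.0, churn_days=90):
    """Compute RFM scores, a naive CLV and a churn-risk label per customer."""
    by_customer = {}
    for t in transactions:
        by_customer.setdefault(t["CUSTOMER_ID"], []).append(t)

    metrics = [
        CustomerMetrics(
            customer_id=cid,
            recency_days=(as_of - max(t["TRANSACTION_DATE"] for t in txns)).days,
            frequency=len(txns),
            monetary=sum(t["AMOUNT"] for t in txns),
        )
        for cid, txns in by_customer.items()
    ]

    # Fewer days since the last purchase is better, so recency is inverted.
    r = rank_score([m.recency_days for m in metrics], higher_is_better=False)
    f = rank_score([m.frequency for m in metrics])
    s = rank_score([m.monetary for m in metrics])

    for m, r_s, f_s, m_s in zip(metrics, r, f, s):
        m.r_score, m.f_score, m.m_score = r_s, f_s, m_s
        # Naive CLV (assumption): annualized historical spend x assumed lifespan.
        annual_spend = m.monetary * 365 / observation_days
        m.clv = round(annual_spend * lifespan_years, 2)
        # Recency-based churn label (assumed thresholds).
        if m.recency_days > churn_days:
            m.churn_risk = "high"
        elif m.recency_days > churn_days // 2:
            m.churn_risk = "medium"
    return metrics


if __name__ == "__main__":
    sample = [
        {"CUSTOMER_ID": 1, "TRANSACTION_DATE": date(2024, 6, 1), "AMOUNT": 120.0},
        {"CUSTOMER_ID": 1, "TRANSACTION_DATE": date(2024, 6, 20), "AMOUNT": 80.0},
        {"CUSTOMER_ID": 2, "TRANSACTION_DATE": date(2024, 1, 15), "AMOUNT": 50.0},
        {"CUSTOMER_ID": 3, "TRANSACTION_DATE": date(2024, 5, 10), "AMOUNT": 300.0},
    ]
    for m in compute_metrics(sample, as_of=date(2024, 7, 1)):
        print(m)
```

The rank-based scores adapt to whatever distribution the data has. With 1,000 customers, five bins put roughly 200 customers in each score.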

Dataiku Flow

Architecture

Snowflake (DEV.DATAIKU_DEMO)
  ├── CUSTOMERS        (1,000 rows)
  └── TRANSACTIONS     (8,000 rows)
          │
          ▼  Dataiku DSS (DEMO project)
  ┌───────────────────────────────────────────┐
  │  [Shaker]  filter STATUS = 'completed'    │
  │      → transactions_completed             │
  │                                           │
  │  [Join]    LEFT JOIN on CUSTOMER_ID       │
  │      → customer_transactions_joined       │
  │                                           │
  │  [Python]  RFM + CLV + Churn analytics    │
  │      → CUSTOMER_ANALYTICS_OUTPUT          │
  └───────────────────────────────────────────┘
          │
          ▼
  Snowflake  DEV.DATAIKU_DEMO.CUSTOMER_ANALYTICS_OUTPUT
  Databricks dev.dataiku_demo.customer_analytics_output  ← migrated, parity verified
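
The first two recipes in the flow are a row filter followed by a left join. Below is a plain-Python sketch of the same logic. It assumes `CUSTOMERS` is the left side of the join, so customers with no completed transactions are kept with empty transaction fields. The function names and the `NAME`, `TRANSACTION_ID` and `AMOUNT` columns are hypothetical; only `CUSTOMER_ID` and `STATUS` come from the flow above.

```python
from collections import defaultdict

# Hypothetical rows; only CUSTOMER_ID and STATUS are taken from the flow.


def filter_completed(transactions):
    """Shaker step: keep only rows where STATUS = 'completed'."""
    return [t for t in transactions if t["STATUS"] == "completed"]


def left_join(left, right, key):
    """LEFT JOIN two lists of dict rows on `key`.

    Every left row is kept. Left rows with no match get None for the
    right-hand columns. Right-hand values win on column-name clashes.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Right-hand columns in first-seen order, excluding the join key.
    right_cols = list(dict.fromkeys(c for row in right for c in row if c != key))

    joined = []
    for l_row in left:
        matches = index.get(l_row[key])
        if matches:
            # One output row per matching right row, as in SQL.
            for r_row in matches:
                joined.append({**l_row, **{c: r_row.get(c) for c in right_cols}})
        else:
            joined.append({**l_row, **dict.fromkeys(right_cols)})
    return joined


if __name__ == "__main__":
    customers = [{"CUSTOMER_ID": 1, "NAME": "Ada"}, {"CUSTOMER_ID": 2, "NAME": "Bo"}]
    transactions = [
        {"TRANSACTION_ID": 10, "CUSTOMER_ID": 1, "STATUS": "completed", "AMOUNT": 20.0},
        {"TRANSACTION_ID": 11, "CUSTOMER_ID": 1, "STATUS": "refunded", "AMOUNT": 5.0},
        {"TRANSACTION_ID": 12, "CUSTOMER_ID": 2, "STATUS": "pending", "AMOUNT": 7.0},
    ]
    for row in left_join(customers, filter_completed(transactions), "CUSTOMER_ID"):
        print(row)
```

Filtering before the join matters for a left join. If the filter ran afterwards instead, a customer whose only transactions were not completed would be dropped entirely, because their joined rows would carry a non-completed STATUS.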

Parity validation with Datafold

Parity was validated with Datafold, a data reliability platform that runs cross-database diffs at scale using bisection hashing.

Datadiff run: https://app.datafold.com/datadiffs/13857162
Algorithm: bisection hash on CUSTOMER_ID
Result: 0 differences across all 1,000 rows

The validate_parity.py script uses the same open-source data-diff library that powers Datafold cloud.
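
Bisection hashing avoids pulling every row across the network. Each side checksums a range of CUSTOMER_ID values, and matching ranges are skipped. Mismatching ranges are split and re-checked until they are small enough to compare row by row. The in-memory sketch below illustrates that idea. It is not data-diff's implementation, which computes segment checksums inside each database with SQL. The function names and the `threshold` and `factor` parameters are illustrative.

```python
import hashlib

# Illustrative only: tables are in-memory dicts {CUSTOMER_ID: row_tuple}.
# data-diff instead computes segment checksums inside each database.


def segment_checksum(table, lo, hi):
    """Order-independent (checksum, row_count) for rows with lo <= key < hi."""
    total, count = 0, 0
    for key, row in table.items():
        if lo <= key < hi:
            digest = hashlib.sha256(repr((key, row)).encode()).digest()
            total = (total + int.from_bytes(digest[:8], "big")) % (1 << 64)
            count += 1
    return total, count


def bisect_diff(a, b, lo, hi, threshold=4, factor=2):
    """Return sorted keys in [lo, hi) whose rows differ between a and b."""
    check_a, check_b = segment_checksum(a, lo, hi), segment_checksum(b, lo, hi)
    if check_a == check_b:
        return []  # segment matches (barring a hash collision)
    if max(check_a[1], check_b[1]) <= threshold or hi - lo <= 1:
        # Small enough: compare the rows directly.
        keys = {k for k in a if lo <= k < hi} | {k for k in b if lo <= k < hi}
        return sorted(k for k in keys if a.get(k) != b.get(k))
    # Split the key range into `factor` sub-ranges and recurse into each.
    step = -(-(hi - lo) // factor)  # ceiling division
    diffs = []
    for start in range(lo, hi, step):
        diffs.extend(bisect_diff(a, b, start, min(start + step, hi), threshold, factor))
    return diffs


def diff_tables(a, b, threshold=4, factor=2):
    """Diff two keyed tables over the full key range they cover."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    keys = list(a) + list(b)
    if not keys:
        return []
    return bisect_diff(a, b, min(keys), max(keys) + 1, threshold, factor)


if __name__ == "__main__":
    source = {cid: (f"customer_{cid}", cid * 1.5) for cid in range(1, 1001)}
    target = dict(source)
    target[500] = ("customer_500", 0.0)  # one changed row
    del target[777]                      # one missing row
    print(diff_tables(source, target))   # → [500, 777]
```

Because checksums are compared before any rows are fetched, the work grows with the number of differing segments rather than the table size. When the tables match, as in the 0-difference result above, the top-level comparison succeeds immediately.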
