SQL Product Analytics + Retention/Funnel Dashboard

Production-style product analytics project for internship interviews. It turns a public retail dataset into a SQL-first analytics system that answers practical questions about acquisition, activation, conversion, retention, monetization, and experiment performance through a FastAPI service and a Streamlit dashboard.

Note: revenue, orders, repeat behavior, and cohorts come from real ecommerce transactions. Acquisition, top-of-funnel stages, and experiment readouts are modeled or simulated from that transaction data and are documented as assumptions, not observed product events.

Why This Project Matters

Product teams need more than raw orders. They need a decision layer that explains where users drop off, which segments are healthiest, and what to test next.
Public datasets rarely include clean product-event funnels, so this project shows how to build an honest analytics mart with explicit assumptions instead of hiding the gaps.
The repo is structured like an application, not a notebook dump: ingest, marts, evaluation, API, dashboard, tests, and public-facing project documentation.

Architecture

flowchart LR
    A[UCI Online Retail II] --> B[Ingestion Pipeline]
    B --> C[(DuckDB Warehouse)]
    C --> D[SQL Models and Views]
    D --> E[FastAPI Metrics Service]
    E --> F[Streamlit Dashboard]
    D --> G[Pytest Data and API Tests]
    C --> H[Evaluation Checks]

Product Questions Answered

Where are users dropping off in the funnel?
Which acquisition channels or segments convert best?
How does retention change by signup cohort?
Which user segments are healthiest or most at risk?
If a simulated product change is tested, what is the likely impact?

Stack

Python
DuckDB
SQL-first marts and views
Pandas only where needed for ingestion and modeled event generation
FastAPI
Streamlit
Plotly
pytest
optional Docker support

Data Source And Assumptions

Primary source:

UCI Online Retail II: transactional retail data with invoices, products, timestamps, customers, and countries

Why a modeled analytics mart is necessary:

The dataset contains orders, not full product events like page visits, signups, activation, or controlled experiments.
To support real product analytics questions, the pipeline creates a reproducible modeled top-of-funnel layer with documented assumptions.

Observed facts:

orders
revenue
customer purchase timing
repeat behavior
geography

Modeled assumptions:

acquisition_channel is inferred from first-order characteristics and timing, not directly observed
top-of-funnel events like visited_site, signed_up, and activated_account are generated deterministically from observed customer cohorts plus synthetic non-converting prospects
experiment results are simulated from the same cohort mix to demonstrate a realistic A/B readout without pretending the raw dataset contains one

This tradeoff is deliberate and interview-friendly: the repo is honest about what is observed vs inferred.

Repo Structure

src/sql_product_analytics/
  pipeline/      # ingest, marts, evaluation
  analytics/     # shared service logic
  api/           # FastAPI app and schemas
  dashboard/     # Streamlit client app
sql/             # actual SQL model files
tests/           # metrics, API, CLI coverage
data/            # raw, staging, warehouse

Key SQL Models

Example:

SELECT
    cohort_date,
    retention_day,
    cohort_size,
    active_customers,
    ROUND(
        CAST(active_customers AS DOUBLE) / NULLIF(cohort_size, 0),
        4
    ) AS retention_rate
FROM cohort_retention
ORDER BY cohort_date, retention_day;

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Run the full pipeline:

sql-product-analytics run-all

Run from the included fixture instead of downloading the public dataset:

sql-product-analytics run-all --source-path tests/fixtures/sample_online_retail.csv

Serve the API:

sql-product-analytics serve-api --host 127.0.0.1 --port 8000

Serve the dashboard:

sql-product-analytics serve-dashboard --api-base-url http://127.0.0.1:8000

Run tests:

pytest

CLI Commands

ingest
build-marts
evaluate
run-all
serve-api
serve-dashboard

API Endpoints

GET /health
GET /kpis/overview
GET /kpis/trends
GET /funnel
GET /funnel/by-segment
GET /retention/cohorts
GET /segments/health
GET /experiments/summary
GET /insights/recommendations

Example:

curl "http://127.0.0.1:8000/kpis/overview"

Example overview response excerpt from the included fixture:

{
  "total_orders": 10,
  "purchasing_customers": 5,
  "total_revenue": 389.0,
  "avg_order_value": 38.9,
  "repeat_customer_rate": 0.8,
  "signup_rate": 0.7714,
  "activation_rate": 0.8148,
  "purchase_conversion_rate": 0.1852,
  "strongest_segment": "Email / Core"
}

Sample Business Insights

From the fixture-backed warehouse:

Funnel volume falls from 35 modeled visitors to 27 signups, then to 22 activated users, with only 5 observed purchasers. That makes purchase conversion the biggest downstream bottleneck.
The strongest sample segment is Email / Core, while Email / At Risk New and Email / Emerging are flagged as fragile due to low D30 retention or high inactivity share.
The simulated onboarding experiment improves activation meaningfully, but purchase lift is weak in the sample. That supports more testing, not a blind rollout.

Dashboard Pages

Executive Overview: KPIs, revenue trends, strongest segments
Funnel Explorer: stage counts and conversion by segment type
Retention & Cohorts: cohort heatmap plus D1/D7/D30 summaries
Experiment / Insights: simulated experiment readout and business recommendations

Evaluation Checks

The evaluate command runs:

raw row count sanity
customer dimension integrity
null-customer checks
funnel monotonicity
retention bounds
experiment variant coverage

Limitations

Acquisition channels are inferred, not observed.
The source is transactional, so top-of-funnel and experiment layers are modeled.
Retention is purchase-based retention, not session-level engagement.
This is designed for local reproducibility and interview clarity, not warehouse-scale orchestration.

Interview Talking Points

Why SQL-first marts are the center of the project
How you handled missing event-level data without pretending it existed
How cohort definitions and repeat behavior were implemented
Why the API-backed dashboard is better than reading raw files directly in Streamlit
Where the business recommendations come from and where they stop being causal claims

Deployment Notes

Optional Docker files are included via the lightweight Dockerfile, but they should be treated as manual setup and not as a verified deployment path from this repo pass. For production-style hosting, run the API and dashboard as separate services against the same DuckDB volume or swap DuckDB for a Postgres-backed adapter in v2.

Screenshots

Real browser screenshots captured from the running local app:

To refresh them locally, capture the Executive Overview and Retention & Cohorts pages after starting the API and dashboard.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
data		data
sql		sql
src/sql_product_analytics		src/sql_product_analytics
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SQL Product Analytics + Retention/Funnel Dashboard

Why This Project Matters

Architecture

Product Questions Answered

Stack

Data Source And Assumptions

Repo Structure

Key SQL Models

Quickstart

CLI Commands

API Endpoints

Sample Business Insights

Dashboard Pages

Evaluation Checks

Limitations

Interview Talking Points

Deployment Notes

Screenshots

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SQL Product Analytics + Retention/Funnel Dashboard

Why This Project Matters

Architecture

Product Questions Answered

Stack

Data Source And Assumptions

Repo Structure

Key SQL Models

Quickstart

CLI Commands

API Endpoints

Sample Business Insights

Dashboard Pages

Evaluation Checks

Limitations

Interview Talking Points

Deployment Notes

Screenshots

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages