
This repository documents my complete 40-day structured journey learning SQL — from writing my very first SELECT query to building production-grade analytics on real BigQuery public datasets. Every phase contains real-world business problems, hands-on SQL exercises written from scratch, and AI-assisted workflows using ChatGPT, Claude, and Gemini.
The goal: be fully capable of working as a confident, interview-ready Data Analyst by Day 41.
3 Portfolio Projects | 7 Phases
| Phase |
Days |
Focus |
Level |
| Phase 1 — Foundations |
Days 1–5 |
SELECT, WHERE, ORDER BY, DISTINCT, NULL Handling |
Beginner → Informed |
| Phase 2 — Core SQL Mechanics |
Days 6–10 |
Aggregations, GROUP BY, HAVING, Date & String Functions |
Informed → Functional |
| Phase 3 — JOINs & Subqueries |
Days 11–15 |
INNER/LEFT/FULL JOINs, Subqueries, EXISTS, Correlated |
Functional → Competent |
| Phase 4 — CTEs & Advanced Filtering |
Days 16–18 |
WITH, CASE WHEN, COALESCE, UNION, INTERSECT, EXCEPT |
Competent → Proficient |
| Phase 5 — Window Functions |
Days 19–25 |
ROW_NUMBER, RANK, LAG/LEAD, NTILE, Running Totals |
Proficient → Advanced |
| Phase 6 — Business Analytics |
Days 26–30 |
Cohort Analysis, Funnels, Retention, RFM, Revenue |
Advanced → Professional |
| Phase 7 — Portfolio Projects |
Days 31–35 |
3 End-to-End Projects, GitHub READMEs |
Professional → Portfolio-Ready |
| Phase |
Status |
Progress |
| Phase 1 — Foundations |
🔄 In Progress |
2 / 5 days |
| Phase 2 — Core SQL Mechanics |
⏳ Upcoming |
0 / 5 days |
| Phase 3 — JOINs & Subqueries |
⏳ Upcoming |
0 / 5 days |
| Phase 4 — CTEs & Advanced Filtering |
⏳ Upcoming |
0 / 3 days |
| Phase 5 — Window Functions |
⏳ Upcoming |
0 / 7 days |
| Phase 6 — Business Analytics |
⏳ Upcoming |
0 / 5 days |
| Phase 7 — Portfolio Projects |
⏳ Upcoming |
0 / 5 days |
Phase 1 — Foundations & Environment (Days 1–5)
- BigQuery Sandbox setup and public dataset access
SELECT, FROM, LIMIT — basic row retrieval
WHERE with all comparison operators: =, !=, <, >, <=, >=
- Logic operators:
AND, OR, NOT with correct operator precedence
BETWEEN for range filtering, IN for set filtering
DISTINCT — single column and multi-column unique values
ORDER BY — multi-column sorting with mixed ASC/DESC
COUNT(*) vs COUNT(column) — critical NULL distinction
LIKE with % and _ for pattern matching
- Arithmetic in
SELECT for calculated columns
- SQL's 9-step logical execution order (the most misunderstood concept)
- Business-to-SQL translation mental model
Phase 2 — Core SQL Mechanics (Days 6–10)
COUNT, SUM, AVG, MIN, MAX with NULL edge cases
GROUP BY — single-column and multi-column grouping
HAVING vs WHERE — when each runs and why it matters
DATE_TRUNC, EXTRACT, DATE_DIFF for time-based analytics
UPPER, LOWER, TRIM, CONCAT, SUBSTR, REPLACE for string manipulation
IS NULL, IS NOT NULL, COALESCE for safe null handling
- Monthly revenue trends, category-level aggregations
Phase 3 — JOINs & Subqueries (Days 11–15)
INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
LEFT JOIN + IS NULL to find unmatched records
- Multi-table JOIN chains across 3+ tables
SELF JOIN for hierarchical and comparative queries
CROSS JOIN for combination generation
- Scalar subqueries in
SELECT
- Derived table subqueries in
FROM
- Subqueries in
WHERE with IN and NOT IN
- Correlated subqueries referencing the outer query
EXISTS and NOT EXISTS — and when to prefer them over IN
Phase 4 — CTEs & Advanced Filtering (Days 16–18)
WITH Common Table Expressions — single and chained
- Recursive CTEs for hierarchical data
CASE WHEN for conditional labeling and tiering
CASE WHEN inside aggregate functions (pivot-style)
COALESCE for null-safe output
NULLIF to prevent division-by-zero
UNION ALL, UNION — stacking result sets
INTERSECT, EXCEPT — set algebra on query results
Phase 5 — Window Functions (Days 19–25)
ROW_NUMBER() — deduplication, exact top-N per group
RANK() and DENSE_RANK() — tie-handling differences explained
LAG() and LEAD() — period-over-period calculations
NTILE(n) — quartile and percentile segmentation
SUM() OVER() — cumulative running totals
AVG() OVER() with ROWS BETWEEN — moving averages
PARTITION BY + ORDER BY + frame specification combined
- Top-N per group pattern — most common window function interview question
Phase 6 — Business Analytics with SQL (Days 26–30)
- Cohort analysis — monthly cohort construction from raw data
- Multi-step conversion funnel with drop-off rates
- Day-7 and Day-30 user retention using date arithmetic
- RFM scoring and customer segment labeling
- Revenue MoM and YoY comparison using
LAG()
NULLIF in all percentage calculations (division-by-zero protection)
- Translating a stakeholder's question into a complete analytics query
Phase 7 — Portfolio Projects (Days 31–35)
- End-to-end project scoping from a business question
- Independent dataset, table, and query strategy selection
- Professional GitHub README writing for non-technical readers
- 3 distinct business domains across 3 published projects
- Findings written in plain English for stakeholder presentation
| # |
Project |
Domain |
Key SQL Techniques |
| ⭐ |
E-Commerce Analytics Deep Dive |
E-Commerce |
Cohort, Retention, CTEs, Window Functions |
| ⭐ |
Urban Mobility Analytics |
Transportation |
EXTRACT, RANK(), Aggregation, Trends |
| ⭐ |
Customer Segmentation & RFM |
Marketing Analytics |
NTILE, LAG, RFM Scoring, Multi-CTE |
| Dataset |
Domain |
Phases |
bigquery-public-data.thelook_ecommerce |
E-Commerce |
Phases 1–3, Projects 1 & 3 |
bigquery-public-data.chicago_taxi_trips |
Transportation |
Phase 2, Project 2 |
bigquery-public-data.austin_bikeshare |
Urban Mobility |
Phase 1, Project 2 |
bigquery-public-data.chicago_crime |
Public Safety |
Phase 1 |
All datasets are 100% free via Google BigQuery Sandbox — no credit card required.

Built with 💙 | Google BigQuery · Standard SQL · AI-Assisted Learning · 2026