Skip to content

viveksharma-data-analyst/sql-data-analyst-portfolio

Repository files navigation

Typing SVG






🧭 About This Repository

This repository documents my complete 40-day structured journey learning SQL — from writing my very first SELECT query to building production-grade analytics on real BigQuery public datasets. Every phase contains real-world business problems, hands-on SQL exercises written from scratch, and AI-assisted workflows using ChatGPT, Claude, and Gemini.

The goal: be fully capable of working as a confident, interview-ready Data Analyst by Day 41.


📐 Learning Structure

3 Portfolio Projects  |  7 Phases
Phase Days Focus Level
Phase 1 — Foundations Days 1–5 SELECT, WHERE, ORDER BY, DISTINCT, NULL Handling Beginner → Informed
Phase 2 — Core SQL Mechanics Days 6–10 Aggregations, GROUP BY, HAVING, Date & String Functions Informed → Functional
Phase 3 — JOINs & Subqueries Days 11–15 INNER/LEFT/FULL JOINs, Subqueries, EXISTS, Correlated Functional → Competent
Phase 4 — CTEs & Advanced Filtering Days 16–18 WITH, CASE WHEN, COALESCE, UNION, INTERSECT, EXCEPT Competent → Proficient
Phase 5 — Window Functions Days 19–25 ROW_NUMBER, RANK, LAG/LEAD, NTILE, Running Totals Proficient → Advanced
Phase 6 — Business Analytics Days 26–30 Cohort Analysis, Funnels, Retention, RFM, Revenue Advanced → Professional
Phase 7 — Portfolio Projects Days 31–35 3 End-to-End Projects, GitHub READMEs Professional → Portfolio-Ready

📊 Progress Tracker

Phase Status Progress
Phase 1 — Foundations 🔄 In Progress 2 / 5 days
Phase 2 — Core SQL Mechanics ⏳ Upcoming 0 / 5 days
Phase 3 — JOINs & Subqueries ⏳ Upcoming 0 / 5 days
Phase 4 — CTEs & Advanced Filtering ⏳ Upcoming 0 / 3 days
Phase 5 — Window Functions ⏳ Upcoming 0 / 7 days
Phase 6 — Business Analytics ⏳ Upcoming 0 / 5 days
Phase 7 — Portfolio Projects ⏳ Upcoming 0 / 5 days

🔬 Skills Covered

Phase 1 — Foundations & Environment (Days 1–5)
  • BigQuery Sandbox setup and public dataset access
  • SELECT, FROM, LIMIT — basic row retrieval
  • WHERE with all comparison operators: =, !=, <, >, <=, >=
  • Logic operators: AND, OR, NOT with correct operator precedence
  • BETWEEN for range filtering, IN for set filtering
  • DISTINCT — single column and multi-column unique values
  • ORDER BY — multi-column sorting with mixed ASC/DESC
  • COUNT(*) vs COUNT(column) — critical NULL distinction
  • LIKE with % and _ for pattern matching
  • Arithmetic in SELECT for calculated columns
  • SQL's 9-step logical execution order (the most misunderstood concept)
  • Business-to-SQL translation mental model
Phase 2 — Core SQL Mechanics (Days 6–10)
  • COUNT, SUM, AVG, MIN, MAX with NULL edge cases
  • GROUP BY — single-column and multi-column grouping
  • HAVING vs WHERE — when each runs and why it matters
  • DATE_TRUNC, EXTRACT, DATE_DIFF for time-based analytics
  • UPPER, LOWER, TRIM, CONCAT, SUBSTR, REPLACE for string manipulation
  • IS NULL, IS NOT NULL, COALESCE for safe null handling
  • Monthly revenue trends, category-level aggregations
Phase 3 — JOINs & Subqueries (Days 11–15)
  • INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
  • LEFT JOIN + IS NULL to find unmatched records
  • Multi-table JOIN chains across 3+ tables
  • SELF JOIN for hierarchical and comparative queries
  • CROSS JOIN for combination generation
  • Scalar subqueries in SELECT
  • Derived table subqueries in FROM
  • Subqueries in WHERE with IN and NOT IN
  • Correlated subqueries referencing the outer query
  • EXISTS and NOT EXISTS — and when to prefer them over IN
Phase 4 — CTEs & Advanced Filtering (Days 16–18)
  • WITH Common Table Expressions — single and chained
  • Recursive CTEs for hierarchical data
  • CASE WHEN for conditional labeling and tiering
  • CASE WHEN inside aggregate functions (pivot-style)
  • COALESCE for null-safe output
  • NULLIF to prevent division-by-zero
  • UNION ALL, UNION — stacking result sets
  • INTERSECT, EXCEPT — set algebra on query results
Phase 5 — Window Functions (Days 19–25)
  • ROW_NUMBER() — deduplication, exact top-N per group
  • RANK() and DENSE_RANK() — tie-handling differences explained
  • LAG() and LEAD() — period-over-period calculations
  • NTILE(n) — quartile and percentile segmentation
  • SUM() OVER() — cumulative running totals
  • AVG() OVER() with ROWS BETWEEN — moving averages
  • PARTITION BY + ORDER BY + frame specification combined
  • Top-N per group pattern — most common window function interview question
Phase 6 — Business Analytics with SQL (Days 26–30)
  • Cohort analysis — monthly cohort construction from raw data
  • Multi-step conversion funnel with drop-off rates
  • Day-7 and Day-30 user retention using date arithmetic
  • RFM scoring and customer segment labeling
  • Revenue MoM and YoY comparison using LAG()
  • NULLIF in all percentage calculations (division-by-zero protection)
  • Translating a stakeholder's question into a complete analytics query
Phase 7 — Portfolio Projects (Days 31–35)
  • End-to-end project scoping from a business question
  • Independent dataset, table, and query strategy selection
  • Professional GitHub README writing for non-technical readers
  • 3 distinct business domains across 3 published projects
  • Findings written in plain English for stakeholder presentation

🏆 Portfolio Projects

# Project Domain Key SQL Techniques
E-Commerce Analytics Deep Dive E-Commerce Cohort, Retention, CTEs, Window Functions
Urban Mobility Analytics Transportation EXTRACT, RANK(), Aggregation, Trends
Customer Segmentation & RFM Marketing Analytics NTILE, LAG, RFM Scoring, Multi-CTE

🗃️ Datasets Used

Dataset Domain Phases
bigquery-public-data.thelook_ecommerce E-Commerce Phases 1–3, Projects 1 & 3
bigquery-public-data.chicago_taxi_trips Transportation Phase 2, Project 2
bigquery-public-data.austin_bikeshare Urban Mobility Phase 1, Project 2
bigquery-public-data.chicago_crime Public Safety Phase 1

All datasets are 100% free via Google BigQuery Sandbox — no credit card required.


🛠️ Tools & Technologies

BigQuery PostgreSQL MySQL GitHub ChatGPT Claude Gemini LeetCode HackerRank Notion


💼 Connect With Me

Built with 💙 | Google BigQuery · Standard SQL · AI-Assisted Learning · 2026

About

40-day SQL learning journey — from zero to job-ready. BigQuery, business analytics, window functions, and real projects.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors