|
1 | | - # MLOps SQL Project 🚀 |
| 1 | +# Business SQL Analytics — PostgreSQL · Python · Tableau · Excel |
2 | 2 |
|
3 | | -      |
| 3 | + |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | + |
| 8 | + |
| 9 | + |
4 | 10 |
|
5 | 11 | --- |
6 | 12 |
|
7 | | -## Overview |
| 13 | +## What This Project Does |
8 | 14 |
|
9 | | -This repository provides a **structured MLOps project** that integrates **SQL, Python, Tableau**, and **Excel-based analysis**. |
10 | | -It includes **SQL data queries**, **Python scripts for automation**, and **Tableau dashboards for visualization**. |
| 15 | +End-to-end business analytics pipeline built on a simulated retail dataset: **customers, orders, products, and transactions** across 4 normalized tables. |
| 16 | + |
| 17 | +**26+ SQL queries** progress from data validation → aggregations → transaction analysis → multi-table joins. Python scripts automate data generation, and Tableau dashboards cover stakeholder reporting.
| 18 | + |
| 19 | +**Pipeline:** `PostgreSQL → SQL Analytics → Python Export → Tableau + Excel` |
11 | 20 |
|
12 | 21 | --- |
13 | 22 |
|
14 | | -## Project Structure |
| 23 | +## Business Questions & Findings |
15 | 24 |
|
16 | | -```plaintext |
17 | | -mlops_sql_project/ |
18 | | -├── env/ # Environment Configuration |
19 | | -│ ├── .env # Environment variables (e.g. DB credentials) |
20 | | -│ ├── db_config.yaml # Database configuration (host, port, user, etc.) |
21 | | -│ ├── logging_config.yaml # Logging configuration (for Python logs) |
22 | | -│ └── settings.json # General project settings |
23 | | -│ |
24 | | -├── Excel/ # Excel-based Analysis (exported reports or manual exploration) |
25 | | -│ ├── transactions_overview.xlsx |
26 | | -│ ├── sales_summary.xlsx |
27 | | -│ ├── customer_behavior.xlsx |
28 | | -│ |
29 | | -├── Tableau/ # Tableau Dashboards & Reports |
30 | | -│ ├── sales_dashboard.twb # KPI dashboard for sales |
31 | | -│ ├── transaction_analysis.twb # Visualization of transaction flow |
32 | | -│ ├── customer_insights.twb # Customer segmentation & behavior |
33 | | -│ |
34 | | -├── python/ # Data Generation & Automation |
35 | | -│ ├── generate_customers.py # Generates random customer data |
36 | | -│ ├── generate_orders.py # Generates random orders |
37 | | -│ ├── generate_products.py # Generates product catalog |
38 | | -│ ├── generate_transactions.py # Simulates transaction history |
39 | | -│ ├── transactions_overview.py # Generates Excel summary |
40 | | -│ ├── sales_summary.py # KPIs for Tableau dashboard |
41 | | -│ ├── customer_behavior.py # Behavior analysis summary |
42 | | -│ |
43 | | -├── sql/ # Structured SQL Queries |
44 | | -│ ├── ddl/ # Schema definition (Create, Constraints) |
45 | | -│ │ ├── 01_create_database.sql |
46 | | -│ │ ├── 02_create_tables.sql |
47 | | -│ │ ├── 03_constraints.sql |
48 | | -│ │ |
49 | | -│ ├── dml/ # Data Manipulation (Insert, Update, Delete) |
50 | | -│ │ ├── 00_truncate_tables.sql |
51 | | -│ │ |
52 | | -│ ├── dql/ # Queries & Analysis |
53 | | -│ │ ├── a_checks/ # Data Validation & Structure |
54 | | -│ │ │ ├── 01_check_constraints.sql |
55 | | -│ │ │ ├── 02_check_all_foreign_keys.sql |
56 | | -│ │ │ ├── 03_check_table_dependencies.sql |
57 | | -│ │ │ ├── 04_check_indexes_primary_keys.sql |
58 | | -│ │ │ ├── 05_check_privileges.sql |
59 | | -│ │ │ └── 06_null_value_check.sql # Check for NULLs in key columns |
60 | | -│ │ │ |
61 | | -│ │ ├── b_aggregations/ # Aggregation & Statistical Analysis |
62 | | -│ │ │ ├── 06_table_counts.sql |
63 | | -│ │ │ ├── 07_check_total_records.sql |
64 | | -│ │ │ ├── 08_counts_the_number_of_products.sql |
65 | | -│ │ │ ├── 09_min_max_and_average_price.sql |
66 | | -│ │ │ ├── 10_stock_statistics.sql |
67 | | -│ │ │ └── 11_sales_by_category.sql |
68 | | -│ │ │ |
69 | | -│ │ ├── c_transactions/ # Orders & Transactions |
70 | | -│ │ │ ├── 11_top_expensive_orders.sql |
71 | | -│ │ │ ├── 12_orders_by_month.sql |
72 | | -│ │ │ ├── 13_random_orders_check.sql |
73 | | -│ │ │ ├── 14_customers_orders_join.sql |
74 | | -│ │ │ ├── 15_transaction_amount_summary.sql |
75 | | -│ │ │ ├── 16_transactions_by_payment.sql |
76 | | -│ │ │ ├── 17_daily_transaction_volume.sql |
77 | | -│ │ │ ├── 18_top_10_biggest_transactions.sql |
78 | | -│ │ │ ├── 19_top_10_biggest_customers.sql |
79 | | -│ │ │ └── 20_avg_transaction_per_customer.sql |
80 | | -│ │ │ |
81 | | -│ │ ├── d_joins/ # Multi-table Joins & Relationships |
82 | | -│ │ │ ├── 20_join_customers_orders_products.sql |
83 | | -│ │ │ ├── 21_join_orders_transactions.sql |
84 | | -│ │ │ ├── 22_top_10_customers_by_spent.sql |
85 | | -│ │ │ ├── 23_avg_order_value.sql |
86 | | -│ │ │ ├── 24_returning_customers.sql |
87 | | -│ │ │ ├── 25_bonus.sql |
88 | | -│ │ │ ├── 26_cleaned_bonus.sql |
89 | | -│ |
90 | | -├── environment.yaml # Conda environment setup |
91 | | -├── requirements.txt # pip packages (for production or alt install) |
92 | | -├── .gitignore # Git exclusions |
93 | | -├── LICENSE # Project License (e.g., MIT) |
94 | | -├── README.md # Project Documentation |
| 25 | +| Question | Approach | |
| 26 | +|----------|----------| |
| 27 | +| Who are the top 10 customers by revenue? | Multi-table JOIN + ORDER BY total spend | |
| 28 | +| Which payment methods dominate transactions? | GROUP BY payment_method with % share | |
| 29 | +| What is the monthly transaction volume trend? | DATE_TRUNC + LAG window function | |
| 30 | +| Which product categories drive most sales? | JOIN products × transactions + aggregation | |
| 31 | +| How many customers are returning vs one-time? | Subquery filtering order count > 1 | |
| 32 | + |
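Two of the approaches in the table (payment-method share and returning vs one-time customers) can be tried on a toy dataset. The sketch below is illustrative only: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table layouts, column names, and rows are assumptions, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical miniature of the orders and transactions tables; the column
# names are assumptions, and SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        order_id INTEGER,
        payment_method TEXT,
        amount REAL
    );
    INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20), (4, 30);
    INSERT INTO transactions VALUES
        (1, 1, 'card', 50.0), (2, 2, 'card', 20.0),
        (3, 3, 'paypal', 30.0), (4, 4, 'card', 10.0);
""")

# Share of transactions per payment method (GROUP BY plus a scalar subquery).
payment_share = conn.execute("""
    SELECT payment_method,
           COUNT(*) AS n_transactions,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM transactions), 1) AS pct_share
    FROM transactions
    GROUP BY payment_method
    ORDER BY n_transactions DESC
""").fetchall()

# Returning vs one-time customers: count orders per customer in a subquery,
# then bucket customers by whether that count exceeds 1.
customer_types = conn.execute("""
    SELECT CASE WHEN order_count > 1 THEN 'returning' ELSE 'one-time' END AS customer_type,
           COUNT(*) AS n_customers
    FROM (
        SELECT customer_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
    ) AS per_customer
    GROUP BY customer_type
    ORDER BY customer_type
""").fetchall()

print(payment_share)
print(customer_types)
```

Both statements use only standard SQL constructs (`GROUP BY`, `CASE`, subqueries), so they should carry over to PostgreSQL with little or no change, although that has not been checked against the project's schema.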
| 33 | +--- |
| 34 | + |
| 35 | +## SQL — Advanced Patterns |
| 36 | + |
| 37 | +### Window Function: Customer Revenue Ranking |
| 38 | + |
| 39 | +```sql |
| 40 | +SELECT |
| 41 | + c.customer_name, |
| 42 | + SUM(t.amount) AS total_spent, |
| 43 | + RANK() OVER (ORDER BY SUM(t.amount) DESC) AS revenue_rank, |
| 44 | + ROUND(SUM(t.amount) / SUM(SUM(t.amount)) OVER () * 100, 1) AS pct_of_total |
| 45 | +FROM customers c |
| 46 | +JOIN transactions t ON c.customer_id = t.customer_id |
| 47 | +GROUP BY c.customer_id, c.customer_name
| 48 | +ORDER BY revenue_rank; |
95 | 49 | ``` |
96 | 50 |
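To make the semantics of `RANK()` and the `pct_of_total` expression concrete, here is a plain-Python sketch of the same calculation. The rows are hypothetical and the code is not part of the project. It only mirrors what the window functions compute, including how `RANK()` treats ties.

```python
from collections import defaultdict

# Hypothetical (customer_id, amount) rows; not data from the project.
transactions = [(1, 60.0), (2, 40.0), (3, 40.0), (1, 40.0), (4, 20.0)]

# Equivalent of SUM(t.amount) ... GROUP BY customer.
totals = defaultdict(float)
for customer_id, amount in transactions:
    totals[customer_id] += amount

grand_total = sum(totals.values())  # SUM(SUM(t.amount)) OVER ()
ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))

ranking = []
for position, (customer_id, total) in enumerate(ordered, start=1):
    # RANK(): tied totals share the rank of the first tied row, and the
    # following rank is skipped (1, 2, 2, 4 rather than 1, 2, 2, 3).
    if ranking and ranking[-1][1] == total:
        rank = ranking[-1][2]
    else:
        rank = position
    pct_of_total = round(total / grand_total * 100, 1)
    ranking.append((customer_id, total, rank, pct_of_total))

for row in ranking:
    print(row)
```

If gaps after ties are unwanted in a report, `DENSE_RANK()` is the usual alternative in PostgreSQL.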
|
97 | | -## Features |
| 51 | +### CTE: Monthly Revenue with Month-over-Month Change |
| 52 | + |
| 53 | +```sql |
| 54 | +WITH monthly_revenue AS ( |
| 55 | + SELECT |
| 56 | + DATE_TRUNC('month', transaction_date) AS month, |
| 57 | + SUM(amount) AS revenue, |
| 58 | + COUNT(DISTINCT customer_id) AS active_customers |
| 59 | + FROM transactions |
| 60 | + GROUP BY 1 |
| 61 | +) |
| 62 | +SELECT |
| 63 | + month, |
| 64 | + revenue, |
| 65 | + active_customers, |
| 66 | + ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 0) AS mom_change |
| 67 | +FROM monthly_revenue |
| 68 | +ORDER BY month; |
| 69 | +``` |
98 | 70 |
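The month-over-month logic can be traced by hand with a small Python sketch. The dates and amounts below are hypothetical. The point is to show what `DATE_TRUNC` plus `LAG` produce, in particular that the first month has no previous row and therefore gets a NULL (`None`) change.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (transaction_date, amount) rows; not data from the project.
transactions = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 3), 120.0),
    (date(2024, 3, 9), 200.0),
]

# DATE_TRUNC('month', ...) equivalent: snap each date to the first of its month.
monthly_revenue = defaultdict(float)
for transaction_date, amount in transactions:
    monthly_revenue[transaction_date.replace(day=1)] += amount

rows = []
previous_revenue = None  # LAG() yields NULL for the first row
for month in sorted(monthly_revenue):
    revenue = monthly_revenue[month]
    mom_change = None if previous_revenue is None else round(revenue - previous_revenue)
    rows.append((month.isoformat(), revenue, mom_change))
    previous_revenue = revenue

for row in rows:
    print(row)
```

One caveat that applies to the SQL as well: a month with no transactions produces no row at all, so `LAG` then compares against the most recent month that does have data rather than the calendar month before.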
|
99 | | -✅ **SQL-Powered Analytics** with structured queries and joins. |
100 | | -✅ **Excel + Tableau** for reporting and visualization. |
101 | | -✅ **Python Automation** for data generation and preprocessing. |
102 | | -✅ **Scalable Architecture** for BI & Data Analysis. |
| 71 | +### Multi-Table JOIN: Full Customer Order Profile |
| 72 | + |
| 73 | +```sql |
| 74 | +SELECT |
| 75 | + c.customer_name, |
| 76 | + COUNT(DISTINCT o.order_id) AS total_orders, |
| 77 | + SUM(t.amount) AS total_spent, |
| 78 | + ROUND(AVG(t.amount), 2) AS avg_transaction, |
| 79 | + MAX(t.transaction_date) AS last_purchase |
| 80 | +FROM customers c |
| 81 | +JOIN orders o ON c.customer_id = o.customer_id |
| 82 | +JOIN transactions t ON o.order_id = t.order_id |
| 83 | +GROUP BY c.customer_id, c.customer_name
| 84 | +ORDER BY total_spent DESC |
| 85 | +LIMIT 10; |
| 86 | +``` |
103 | 87 |
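The `COUNT(DISTINCT o.order_id)` in this query matters because an order with several transactions appears once per transaction after the join. The sketch below illustrates that on toy data. It uses Python's built-in `sqlite3` in place of PostgreSQL, and the schema and rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for PostgreSQL and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, order_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
    -- Order 100 is paid in two transactions, so it appears twice after the join.
    INSERT INTO transactions VALUES (1, 100, 30.0), (2, 100, 20.0), (3, 101, 10.0), (4, 102, 40.0);
""")

profile = conn.execute("""
    SELECT c.customer_name,
           COUNT(o.order_id)          AS joined_rows,
           COUNT(DISTINCT o.order_id) AS total_orders,
           SUM(t.amount)              AS total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN transactions t ON o.order_id = t.order_id
    GROUP BY c.customer_id, c.customer_name
    ORDER BY total_spent DESC
""").fetchall()

for row in profile:
    print(row)
```

Here `joined_rows` overstates the first customer's order count while `total_orders` does not. Note also that inner joins drop customers who have no orders or no transactions, so a `LEFT JOIN` would be needed if those customers should appear in the profile.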
|
104 | 88 | --- |
105 | 89 |
|
106 | | -## Tech Stack |
| 90 | +## Project Structure |
107 | 91 |
|
108 | | -- **PostgreSQL** – SQL-based data storage & queries |
109 | | -- **Tableau** – Interactive dashboards and reporting |
110 | | -- **Excel** – Static reports and aggregated insights |
111 | | -- **Python** – Data automation and preprocessing |
112 | | -- **GitHub Actions** – CI/CD for automation |
| 92 | +``` |
| 93 | +mlops_sql_project/ |
| 94 | +├── sql/ |
| 95 | +│ ├── ddl/ # schema, tables, constraints |
| 96 | +│ ├── dml/ # data inserts & resets |
| 97 | +│ └── dql/ |
| 98 | +│ ├── a_checks/ # data validation (nulls, FK, indexes) |
| 99 | +│ ├── b_aggregations/ # counts, min/max, category stats |
| 100 | +│ ├── c_transactions/ # order & payment analysis |
| 101 | +│ └── d_joins/ # multi-table joins, top customers |
| 102 | +├── python/ |
| 103 | +│ ├── generate_customers.py |
| 104 | +│ ├── generate_orders.py |
| 105 | +│ ├── generate_products.py |
| 106 | +│ ├── generate_transactions.py |
| 107 | +│ ├── sales_summary.py |
| 108 | +│ └── customer_behavior.py |
| 109 | +├── Tableau/ |
| 110 | +│ ├── sales_dashboard.twb |
| 111 | +│ ├── transaction_analysis.twb |
| 112 | +│ └── customer_insights.twb |
| 113 | +├── excel/ |
| 114 | +│ ├── sales_summary.xlsx |
| 115 | +│ ├── transactions_overview.xlsx |
| 116 | +│ └── customer_behavior.xlsx |
| 117 | +└── env/ # DB config, logging, settings |
| 118 | +``` |
113 | 119 |
|
114 | 120 | --- |
115 | 121 |
|
116 | | -## Setup & Installation |
| 122 | +## Architecture |
117 | 123 |
|
118 | | -### 1️⃣ Clone the repository |
119 | | - |
120 | | -```bash |
121 | | -git clone https://github.com/your-username/mlops_sql_project.git |
122 | | -cd mlops_sql_project |
| 124 | +``` |
| 125 | +PostgreSQL (4 tables: customers, orders, products, transactions) |
| 126 | + └── SQL layer: DDL → DML → DQL (26+ queries) |
| 127 | + └── Python automation (data generation + export) |
| 128 | + ├── Tableau dashboards (sales, transactions, customers) |
| 129 | + └── Excel reports (summary exports) |
123 | 130 | ``` |
124 | 131 |
|
| 132 | +--- |
125 | 133 |
|
| 134 | +## How to Run |
126 | 135 |
|
127 | | -2️⃣ Create a virtual environment (Optional) |
128 | 136 | ```bash |
| 137 | +# 1. Clone and set up environment |
| 138 | +git clone https://github.com/evgeniimatveev/business-sql-analytics.git |
| 139 | +cd business-sql-analytics |
129 | 140 | conda env create -f environment.yaml |
130 | 141 | conda activate mlops_env |
131 | | -``` |
132 | | -```bash |
133 | | -python -m venv venv |
134 | | -source venv/bin/activate # On macOS/Linux |
135 | | -venv\Scripts\activate # On Windows |
136 | | -pip install -r requirements.txt |
137 | | -``` |
138 | | - |
139 | | ---- |
140 | | -## Future Plans |
141 | | -✅ Advanced SQL optimization |
142 | | -✅ Improved Tableau dashboards |
143 | | -✅ CI/CD for SQL workflow automation |
144 | 142 |
|
| 143 | +# 2. Configure DB credentials |
| 144 | +cp env/.env.example env/.env |
145 | 145 |
|
146 | | ---- |
| 146 | +# 3. Generate synthetic data |
| 147 | +python python/generate_customers.py |
| 148 | +python python/generate_orders.py |
| 149 | +python python/generate_products.py |
| 150 | +python python/generate_transactions.py |
147 | 151 |
|
148 | | -## 📜 License |
149 | | -This project is distributed under the **MIT License**. Feel free to use the code! 🚀 |
| 152 | +# 4. Run SQL analytics |
| 153 | +# Open sql/dql/ queries in DBeaver or psql |
| 154 | +``` |
150 | 155 |
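For readers who want a feel for step 3 before running the repository scripts, the following is a minimal, standard-library sketch of seeded synthetic data generation. It is not the code in `python/generate_customers.py`: the function names, the `city` column, and the sample values are all hypothetical, and the real scripts may rely on Faker instead.

```python
import csv
import io
import random


def generate_customers(n, seed=42):
    """Return n hypothetical customer rows; the schema here is an assumption."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    first_names = ["Alex", "Sam", "Jordan", "Taylor"]
    cities = ["Austin", "Denver", "Seattle"]
    return [
        {
            "customer_id": customer_id,
            "customer_name": f"{rng.choice(first_names)} {customer_id:04d}",
            "city": rng.choice(cities),
        }
        for customer_id in range(1, n + 1)
    ]


def to_csv(rows):
    """Serialize rows to CSV text that a database bulk loader could ingest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


customers = generate_customers(5)
csv_text = to_csv(customers)
print(csv_text)
```

Seeding the generator makes the dataset reproducible, which helps when query results are compared across runs or shared in dashboards.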
|
151 | 156 | --- |
152 | 157 |
|
153 | | -## 📢 Stay Connected! |
154 | | -💻 **GitHub Repository:** [Evgenii Matveev](https://github.com/evgeniimatveev) |
155 | | -🌐 **Portfolio:** [Data Science Portfolio](https://www.datascienceportfol.io/evgeniimatveevusa) |
156 | | -📌 **LinkedIn:** [Evgenii Matveev](https://www.linkedin.com/in/evgenii-matveev-510926276/) |
| 158 | +## Stack |
157 | 159 |
|
| 160 | +| Layer | Technology | |
| 161 | +|-------|-----------| |
| 162 | +| Database | PostgreSQL | |
| 163 | +| Analytics | SQL (CTEs, Window Functions, Multi-table JOINs) | |
| 164 | +| Automation | Python (Faker, pandas, psycopg2) | |
| 165 | +| Visualization | Tableau | |
| 166 | +| Reporting | Excel | |
| 167 | +| CI/CD | GitHub Actions | |
158 | 168 |
|
159 | 169 | --- |
160 | 170 |
|
161 | | -🔥 **If you like this project, don't forget to star ⭐ the repository!** 🔥 |
| 171 | +## Connect |
| 172 | + |
| 173 | +- GitHub: [evgeniimatveev](https://github.com/evgeniimatveev) |
| 174 | +- Portfolio: [datascienceportfol.io/evgeniimatveevusa](https://www.datascienceportfol.io/evgeniimatveevusa) |
| 175 | +- LinkedIn: [Evgenii Matveev](https://www.linkedin.com/in/evgenii-matveev-510926276/) |