Commit d3719ad

docs: rewrite README with advanced SQL examples, findings table, clean structure
1 parent 3c5d45b commit d3719ad

1 file changed

Lines changed: 138 additions & 124 deletions

README.md

@@ -1,161 +1,175 @@
-# MLOps SQL Project 🚀
+# Business SQL Analytics — PostgreSQL · Python · Tableau · Excel
 
-![SQL](https://img.shields.io/badge/SQL-PostgreSQL-blue) ![Tableau](https://img.shields.io/badge/Tableau-Visualization-orange) ![Excel](https://img.shields.io/badge/Excel-Reports-green) ![Python](https://img.shields.io/badge/Python-Automation-yellow) ![Status](https://img.shields.io/badge/Status-Active-brightgreen) ![License](https://img.shields.io/badge/License-MIT-lightgrey)
+![SQL](https://img.shields.io/badge/SQL-PostgreSQL-blue?logo=postgresql&logoColor=white)
+![Python](https://img.shields.io/badge/Python-Automation-yellow?logo=python&logoColor=black)
+![Tableau](https://img.shields.io/badge/Tableau-Visualization-orange?logo=tableau&logoColor=white)
+![Excel](https://img.shields.io/badge/Excel-Reports-green?logo=microsoftexcel&logoColor=white)
+![CI/CD](https://img.shields.io/badge/GitHub_Actions-CI%2FCD-black?logo=githubactions&logoColor=white)
+![License](https://img.shields.io/badge/License-MIT-lightgrey)
+![Status](https://img.shields.io/badge/Status-Active-brightgreen)
 
 ---
 
-## Overview
+## What This Project Does
 
-This repository provides a **structured MLOps project** that integrates **SQL, Python, Tableau**, and **Excel-based analysis**.
-It includes **SQL data queries**, **Python scripts for automation**, and **Tableau dashboards for visualization**.
+End-to-end business analytics pipeline built on a simulated retail dataset: **customers, orders, products, and transactions** across 4 normalized tables.
+
+**26+ SQL queries** organized from data validation → aggregations → transaction analysis → multi-table joins, with Python automation for data generation and Tableau dashboards for stakeholder reporting.
+
+**Pipeline:** `PostgreSQL → SQL Analytics → Python Export → Tableau + Excel`
 
 ---
 
-## Project Structure
+## Business Questions & Findings
 
-```plaintext
-mlops_sql_project/
-├── env/ # Environment Configuration
-│ ├── .env # Environment variables (e.g. DB credentials)
-│ ├── db_config.yaml # Database configuration (host, port, user, etc.)
-│ ├── logging_config.yaml # Logging configuration (for Python logs)
-│ └── settings.json # General project settings
-
-├── Excel/ # Excel-based Analysis (exported reports or manual exploration)
-│ ├── transactions_overview.xlsx
-│ ├── sales_summary.xlsx
-│ ├── customer_behavior.xlsx
-
-├── Tableau/ # Tableau Dashboards & Reports
-│ ├── sales_dashboard.twb # KPI dashboard for sales
-│ ├── transaction_analysis.twb # Visualization of transaction flow
-│ ├── customer_insights.twb # Customer segmentation & behavior
-
-├── python/ # Data Generation & Automation
-│ ├── generate_customers.py # Generates random customer data
-│ ├── generate_orders.py # Generates random orders
-│ ├── generate_products.py # Generates product catalog
-│ ├── generate_transactions.py # Simulates transaction history
-│ ├── transactions_overview.py # Generates Excel summary
-│ ├── sales_summary.py # KPIs for Tableau dashboard
-│ ├── customer_behavior.py # Behavior analysis summary
-
-├── sql/ # Structured SQL Queries
-│ ├── ddl/ # Schema definition (Create, Constraints)
-│ │ ├── 01_create_database.sql
-│ │ ├── 02_create_tables.sql
-│ │ ├── 03_constraints.sql
-│ │
-│ ├── dml/ # Data Manipulation (Insert, Update, Delete)
-│ │ ├── 00_truncate_tables.sql
-│ │
-│ ├── dql/ # Queries & Analysis
-│ │ ├── a_checks/ # Data Validation & Structure
-│ │ │ ├── 01_check_constraints.sql
-│ │ │ ├── 02_check_all_foreign_keys.sql
-│ │ │ ├── 03_check_table_dependencies.sql
-│ │ │ ├── 04_check_indexes_primary_keys.sql
-│ │ │ ├── 05_check_privileges.sql
-│ │ │ └── 06_null_value_check.sql # Check for NULLs in key columns
-│ │ │
-│ │ ├── b_aggregations/ # Aggregation & Statistical Analysis
-│ │ │ ├── 06_table_counts.sql
-│ │ │ ├── 07_check_total_records.sql
-│ │ │ ├── 08_counts_the_number_of_products.sql
-│ │ │ ├── 09_min_max_and_average_price.sql
-│ │ │ ├── 10_stock_statistics.sql
-│ │ │ └── 11_sales_by_category.sql
-│ │ │
-│ │ ├── c_transactions/ # Orders & Transactions
-│ │ │ ├── 11_top_expensive_orders.sql
-│ │ │ ├── 12_orders_by_month.sql
-│ │ │ ├── 13_random_orders_check.sql
-│ │ │ ├── 14_customers_orders_join.sql
-│ │ │ ├── 15_transaction_amount_summary.sql
-│ │ │ ├── 16_transactions_by_payment.sql
-│ │ │ ├── 17_daily_transaction_volume.sql
-│ │ │ ├── 18_top_10_biggest_transactions.sql
-│ │ │ ├── 19_top_10_biggest_customers.sql
-│ │ │ └── 20_avg_transaction_per_customer.sql
-│ │ │
-│ │ ├── d_joins/ # Multi-table Joins & Relationships
-│ │ │ ├── 20_join_customers_orders_products.sql
-│ │ │ ├── 21_join_orders_transactions.sql
-│ │ │ ├── 22_top_10_customers_by_spent.sql
-│ │ │ ├── 23_avg_order_value.sql
-│ │ │ ├── 24_returning_customers.sql
-│ │ │ ├── 25_bonus.sql
-│ │ │ ├── 26_cleaned_bonus.sql
-
-├── environment.yaml # Conda environment setup
-├── requirements.txt # pip packages (for production or alt install)
-├── .gitignore # Git exclusions
-├── LICENSE # Project License (e.g., MIT)
-├── README.md # Project Documentation
+| Question | Approach |
+|----------|----------|
+| Who are the top 10 customers by revenue? | Multi-table JOIN + ORDER BY total spend |
+| Which payment methods dominate transactions? | GROUP BY payment_method with % share |
+| What is the monthly transaction volume trend? | DATE_TRUNC + LAG window function |
+| Which product categories drive most sales? | JOIN products × transactions + aggregation |
+| How many customers are returning vs one-time? | Subquery filtering order count > 1 |
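For the payment-method row, here is a minimal sketch of the "GROUP BY payment_method with % share" approach, run against a throwaway in-memory SQLite database rather than the project's PostgreSQL instance. The table layout, the `payment_method` values and the amounts are hypothetical. The total comes from a scalar subquery, and that form of the query also runs unchanged in PostgreSQL.

```python
import sqlite3

# Hypothetical, minimal transactions table (the real schema lives in sql/ddl/).
SCHEMA = """
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    payment_method TEXT NOT NULL,
    amount REAL NOT NULL
);
"""

# Count transactions per payment method and express each count as a % of all rows.
PAYMENT_SHARE_SQL = """
SELECT
    payment_method,
    COUNT(*) AS n_transactions,
    ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM transactions), 1) AS pct_share
FROM transactions
GROUP BY payment_method
ORDER BY n_transactions DESC, payment_method;
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up transactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO transactions (payment_method, amount) VALUES (?, ?)",
        [("card", 120.0), ("card", 35.5), ("cash", 20.0), ("paypal", 64.0)],
    )
    return conn


def payment_share(conn):
    """Return (payment_method, n_transactions, pct_share) rows."""
    return conn.execute(PAYMENT_SHARE_SQL).fetchall()


if __name__ == "__main__":
    for row in payment_share(build_demo_db()):
        print(row)
```

To measure dominance by value instead of by count, replace `COUNT(*)` with `SUM(amount)` in both the numerator and the subquery.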
+
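The returning-vs-one-time row describes a subquery that filters on order count. One way to sketch it, assuming a hypothetical `orders` table with one row per order, is to count orders per customer in a derived table and then bucket those counts in the outer query.

```python
import sqlite3

# Hypothetical orders table: one row per order, keyed to a customer.
SCHEMA = """
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
);
"""

# Inner query: orders per customer. Outer query: bucket customers into
# returning (more than one order) and one-time (exactly one order).
RETURNING_SQL = """
SELECT
    SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) AS returning_customers,
    SUM(CASE WHEN order_count = 1 THEN 1 ELSE 0 END) AS one_time_customers
FROM (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
) AS per_customer;
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Customers 1 and 3 order more than once; customers 2, 4 and 5 order once.
    conn.executemany(
        "INSERT INTO orders (customer_id) VALUES (?)",
        [(1,), (1,), (1,), (2,), (3,), (3,), (4,), (5,)],
    )
    return conn


def returning_vs_one_time(conn):
    """Return (returning_customers, one_time_customers)."""
    return conn.execute(RETURNING_SQL).fetchone()


if __name__ == "__main__":
    print(returning_vs_one_time(build_demo_db()))
```

Customers with no orders never reach the derived table. Reporting them as a third group would require starting from `customers` with a `LEFT JOIN` to `orders`.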
+---
+
+## SQL — Advanced Patterns
+
+### Window Function: Customer Revenue Ranking
+
+```sql
+SELECT
+c.customer_name,
+SUM(t.amount) AS total_spent,
+RANK() OVER (ORDER BY SUM(t.amount) DESC) AS revenue_rank,
+ROUND(SUM(t.amount) / SUM(SUM(t.amount)) OVER () * 100, 1) AS pct_of_total
+FROM customers c
+JOIN transactions t ON c.customer_id = t.customer_id
+GROUP BY c.customer_name
+ORDER BY revenue_rank;
 ```
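You can exercise the ranking query above locally with Python's built-in `sqlite3` module. Window functions require SQLite 3.25 or later. This is a sketch under stated assumptions. The schema and rows are hypothetical. The per-customer totals are computed in a CTE first, so `RANK()` and `SUM(...) OVER ()` run over already-aggregated rows. The grouping key also includes `customer_id`, which keeps two customers who share a name from being merged into one row, as grouping by `c.customer_name` alone would do.

```python
import sqlite3

# Hypothetical two-table schema with only the columns the ranking query uses.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    amount REAL NOT NULL
);
"""

# Step 1 (CTE): one row per customer, grouped by id so namesakes stay separate.
# Step 2: window functions rank the customers and compute each one's share of the total.
RANKING_SQL = """
WITH per_customer AS (
    SELECT c.customer_id, c.customer_name, SUM(t.amount) AS total_spent
    FROM customers c
    JOIN transactions t ON c.customer_id = t.customer_id
    GROUP BY c.customer_id, c.customer_name
)
SELECT
    customer_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS revenue_rank,
    ROUND(100.0 * total_spent / SUM(total_spent) OVER (), 1) AS pct_of_total
FROM per_customer
ORDER BY revenue_rank, customer_name;
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Ben"), (3, "Cy")],
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, 1, 50.0), (2, 1, 30.0), (3, 2, 80.0), (4, 3, 40.0)],
    )
    return conn


def rank_customers(conn):
    """Return (customer_name, total_spent, revenue_rank, pct_of_total) rows."""
    return conn.execute(RANKING_SQL).fetchall()


if __name__ == "__main__":
    for row in rank_customers(build_demo_db()):
        print(row)
```

In the sample data, Ada and Ben tie at 80.0 and both receive rank 1, so Cy is ranked 3. `RANK()` leaves a gap after ties, whereas `DENSE_RANK()` would rank Cy 2.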
 
-## Features
+### CTE: Monthly Revenue with Month-over-Month Change
+
+```sql
+WITH monthly_revenue AS (
+SELECT
+DATE_TRUNC('month', transaction_date) AS month,
+SUM(amount) AS revenue,
+COUNT(DISTINCT customer_id) AS active_customers
+FROM transactions
+GROUP BY 1
+)
+SELECT
+month,
+revenue,
+active_customers,
+ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 0) AS mom_change
+FROM monthly_revenue
+ORDER BY month;
+```
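Here is a sketch of the month-over-month CTE in SQLite, again using a hypothetical table and data. SQLite has no `DATE_TRUNC`, so this version substitutes `strftime('%Y-%m', transaction_date)`, which buckets ISO-8601 date strings by calendar month. The `LAG()` step matches the PostgreSQL version. The first month's `mom_change` is `NULL` because that month has no predecessor.

```python
import sqlite3

# Hypothetical transactions table; dates are stored as ISO-8601 text.
SCHEMA = """
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    transaction_date TEXT NOT NULL,
    amount REAL NOT NULL
);
"""

# CTE: one row per calendar month ('YYYY-MM' text sorts chronologically).
# Outer query: LAG() fetches the previous month's revenue for the difference.
MONTHLY_SQL = """
WITH monthly_revenue AS (
    SELECT
        strftime('%Y-%m', transaction_date) AS month,
        SUM(amount) AS revenue,
        COUNT(DISTINCT customer_id) AS active_customers
    FROM transactions
    GROUP BY strftime('%Y-%m', transaction_date)
)
SELECT
    month,
    revenue,
    active_customers,
    ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 0) AS mom_change
FROM monthly_revenue
ORDER BY month;
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO transactions (customer_id, transaction_date, amount) VALUES (?, ?, ?)",
        [
            (1, "2024-01-05", 100.0),
            (2, "2024-01-20", 50.0),
            (1, "2024-02-03", 120.0),
            (1, "2024-03-11", 90.0),
            (3, "2024-03-28", 60.0),
        ],
    )
    return conn


def monthly_revenue(conn):
    """Return (month, revenue, active_customers, mom_change) rows."""
    return conn.execute(MONTHLY_SQL).fetchall()


if __name__ == "__main__":
    for row in monthly_revenue(build_demo_db()):
        print(row)
```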
 
-**SQL-Powered Analytics** with structured queries and joins.
-**Excel + Tableau** for reporting and visualization.
-**Python Automation** for data generation and preprocessing.
-**Scalable Architecture** for BI & Data Analysis.
+### Multi-Table JOIN: Full Customer Order Profile
+
+```sql
+SELECT
+c.customer_name,
+COUNT(DISTINCT o.order_id) AS total_orders,
+SUM(t.amount) AS total_spent,
+ROUND(AVG(t.amount), 2) AS avg_transaction,
+MAX(t.transaction_date) AS last_purchase
+FROM customers c
+JOIN orders o ON c.customer_id = o.customer_id
+JOIN transactions t ON o.order_id = t.order_id
+GROUP BY c.customer_name
+ORDER BY total_spent DESC
+LIMIT 10;
+```
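The customer order profile can be sketched the same way. Apart from the hypothetical schema and rows, this version differs from the query above in two small ways: the `LIMIT` is a bound parameter, and the grouping uses `c.customer_id, c.customer_name` so that namesakes stay separate. Order 10 carries two transactions, which shows why `COUNT(DISTINCT o.order_id)` is needed. A plain `COUNT(o.order_id)` would report three orders for Ada instead of two.

```python
import sqlite3

# Hypothetical customers -> orders -> transactions chain.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders (order_id),
    transaction_date TEXT NOT NULL,
    amount REAL NOT NULL
);
"""

# COUNT(DISTINCT ...) prevents an order with several transactions from being counted twice.
PROFILE_SQL = """
SELECT
    c.customer_name,
    COUNT(DISTINCT o.order_id) AS total_orders,
    SUM(t.amount) AS total_spent,
    ROUND(AVG(t.amount), 2) AS avg_transaction,
    MAX(t.transaction_date) AS last_purchase
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN transactions t ON o.order_id = t.order_id
GROUP BY c.customer_id, c.customer_name
ORDER BY total_spent DESC
LIMIT ?;
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Ben")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?)",
        [
            (100, 10, "2024-01-02", 40.0),
            (101, 10, "2024-01-03", 20.0),  # second payment on order 10
            (102, 11, "2024-02-10", 30.0),
            (103, 12, "2024-01-15", 25.0),
        ],
    )
    return conn


def top_customer_profiles(conn, limit=10):
    """Return up to `limit` customer profiles, highest total spend first."""
    return conn.execute(PROFILE_SQL, (limit,)).fetchall()


if __name__ == "__main__":
    for row in top_customer_profiles(build_demo_db()):
        print(row)
```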
 
 ---
 
-## Tech Stack
+## Project Structure
 
-- **PostgreSQL** – SQL-based data storage & queries
-- **Tableau** – Interactive dashboards and reporting
-- **Excel** – Static reports and aggregated insights
-- **Python** – Data automation and preprocessing
-- **GitHub Actions** – CI/CD for automation
+```
+mlops_sql_project/
+├── sql/
+│ ├── ddl/ # schema, tables, constraints
+│ ├── dml/ # data inserts & resets
+│ └── dql/
+│ ├── a_checks/ # data validation (nulls, FK, indexes)
+│ ├── b_aggregations/ # counts, min/max, category stats
+│ ├── c_transactions/ # order & payment analysis
+│ └── d_joins/ # multi-table joins, top customers
+├── python/
+│ ├── generate_customers.py
+│ ├── generate_orders.py
+│ ├── generate_products.py
+│ ├── generate_transactions.py
+│ ├── sales_summary.py
+│ └── customer_behavior.py
+├── Tableau/
+│ ├── sales_dashboard.twb
+│ ├── transaction_analysis.twb
+│ └── customer_insights.twb
+├── excel/
+│ ├── sales_summary.xlsx
+│ ├── transactions_overview.xlsx
+│ └── customer_behavior.xlsx
+└── env/ # DB config, logging, settings
+```
 
 ---
 
-## Setup & Installation
+## Architecture
 
-### 1️⃣ Clone the repository
-
-```bash
-git clone https://github.com/your-username/mlops_sql_project.git
-cd mlops_sql_project
+```
+PostgreSQL (4 tables: customers, orders, products, transactions)
+└── SQL layer: DDL → DML → DQL (26+ queries)
+└── Python automation (data generation + export)
+├── Tableau dashboards (sales, transactions, customers)
+└── Excel reports (summary exports)
 ```
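The architecture includes a Python export step that feeds Tableau and Excel. A minimal, hypothetical version writes any query result to CSV, a format both tools can open, using only the standard library. The `sales` table and the `export_query_to_csv` helper are illustrative names, not files from this repository. The project's own scripts may write `.xlsx` files through pandas instead.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, sql, fh, params=()):
    """Run `sql`, write a header row plus every result row to `fh` as CSV.

    Returns the number of data rows written (header excluded).
    """
    cursor = conn.execute(sql, params)
    header = [col[0] for col in cursor.description]  # column names from the query
    writer = csv.writer(fh)
    writer.writerow(header)
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def build_demo_db():
    """Hypothetical sales table used only for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("books", 12.5), ("games", 40.0), ("books", 7.5)],
    )
    return conn


if __name__ == "__main__":
    buf = io.StringIO()
    n = export_query_to_csv(
        build_demo_db(),
        "SELECT category, SUM(amount) AS revenue FROM sales "
        "GROUP BY category ORDER BY category",
        buf,
    )
    print(n)
    print(buf.getvalue())
```

When writing to a real file, open it with `open(path, "w", newline="")`, as the `csv` module documentation recommends, so that the writer controls line endings.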
 
+---
 
+## How to Run
 
-2️⃣ Create a virtual environment (Optional)
 ```bash
+# 1. Clone and set up environment
+git clone https://github.com/evgeniimatveev/business-sql-analytics.git
+cd business-sql-analytics
 conda env create -f environment.yaml
 conda activate mlops_env
-```
-```bash
-python -m venv venv
-source venv/bin/activate # On macOS/Linux
-venv\Scripts\activate # On Windows
-pip install -r requirements.txt
-```
-
----
-## Future Plans
-✅ Advanced SQL optimization
-✅ Improved Tableau dashboards
-✅ CI/CD for SQL workflow automation
 
+# 2. Configure DB credentials
+cp env/.env.example env/.env
 
----
+# 3. Generate synthetic data
+python python/generate_customers.py
+python python/generate_orders.py
+python python/generate_products.py
+python python/generate_transactions.py
 
-## 📜 License
-This project is distributed under the **MIT License**. Feel free to use the code! 🚀
+# 4. Run SQL analytics
+# Open sql/dql/ queries in DBeaver or psql
+```
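Step 3 runs the `generate_*.py` scripts, which, according to the Stack table, rely on Faker, pandas and psycopg2 against PostgreSQL. The sketch below is a hypothetical standard-library stand-in (`random` plus an in-memory SQLite database), not the project's code. It illustrates two properties worth preserving in such generators. First, parent rows are inserted before child rows, so every foreign key resolves. Second, a fixed seed makes the data reproducible. The table layouts, the `generate` helper and the one-transaction-per-order rule are all assumptions.

```python
import random
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders (order_id),
    payment_method TEXT NOT NULL,
    amount REAL NOT NULL
);
"""

PAYMENT_METHODS = ["card", "cash", "paypal"]  # hypothetical category values


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def generate(conn, n_customers=5, n_orders=12, seed=42):
    """Insert parents before children so every foreign key points at a real row."""
    rng = random.Random(seed)  # fixed seed -> identical data on every run
    customers = [(i, f"customer_{i}") for i in range(1, n_customers + 1)]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    # Each order references an existing customer id.
    orders = [(o, rng.randint(1, n_customers)) for o in range(1, n_orders + 1)]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

    # One transaction per order, with a random payment method and amount.
    transactions = [
        (order_id, order_id, rng.choice(PAYMENT_METHODS), round(rng.uniform(5, 500), 2))
        for order_id, _ in orders
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", transactions)
    conn.commit()


if __name__ == "__main__":
    db = new_db()
    generate(db)
    print(db.execute("SELECT * FROM transactions LIMIT 3").fetchall())
```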
 
 ---
 
-## 📢 Stay Connected!
-💻 **GitHub Repository:** [Evgenii Matveev](https://github.com/evgeniimatveev)
-🌐 **Portfolio:** [Data Science Portfolio](https://www.datascienceportfol.io/evgeniimatveevusa)
-📌 **LinkedIn:** [Evgenii Matveev](https://www.linkedin.com/in/evgenii-matveev-510926276/)
+## Stack
 
+| Layer | Technology |
+|-------|-----------|
+| Database | PostgreSQL |
+| Analytics | SQL (CTEs, Window Functions, Multi-table JOINs) |
+| Automation | Python (Faker, pandas, psycopg2) |
+| Visualization | Tableau |
+| Reporting | Excel |
+| CI/CD | GitHub Actions |
 
 ---
 
-🔥 **If you like this project, don't forget to star ⭐ the repository!** 🔥
+## Connect
+
+- GitHub: [evgeniimatveev](https://github.com/evgeniimatveev)
+- Portfolio: [datascienceportfol.io/evgeniimatveevusa](https://www.datascienceportfol.io/evgeniimatveevusa)
+- LinkedIn: [Evgenii Matveev](https://www.linkedin.com/in/evgenii-matveev-510926276/)
