End-to-End Data Science Project (Linux + MySQL + Python)

📌 Overview

This project demonstrates an end-to-end data science workflow run entirely from the Ubuntu Linux command line.

Instead of relying solely on notebooks, I built an automated pipeline that:

1. Ingests raw CSV data using Linux CLI tools (`awk`, `sed`).
2. Cleans and normalizes data using Python scripts.
3. Loads structured data into a MySQL database.
4. Analyzes business KPIs using complex SQL queries.
5. Visualizes results using `matplotlib`.
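
The clean-and-load steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual ETL: it uses the standard library's `csv` and `sqlite3` modules as stand-ins for `pandas` and MySQL so that it runs anywhere, and the column names (`Order ID`, `Order Date`, `Sales`), the `MM/DD/YYYY` date format, the sample rows, and the `orders` table are all assumptions.

```python
import csv
import io
import sqlite3
from datetime import datetime
from typing import Dict, List

# Hypothetical sample standing in for the raw CSV in data/raw/.
RAW_CSV = """Order ID,Order Date,Sales
CA-1001, 11/08/2016 ,261.96
CA-1002,06/12/2016,14.62
"""


def normalize_header(name: str) -> str:
    """Turn 'Order ID' into 'order_id' so columns are SQL-friendly."""
    return name.strip().lower().replace(" ", "_")


def clean_rows(text: str) -> List[Dict[str, object]]:
    """Strip whitespace, convert dates to ISO format, and cast sales to float."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for row in reader:
        record = {normalize_header(k): v.strip() for k, v in row.items()}
        # Assumed input format: MM/DD/YYYY.
        parsed = datetime.strptime(record["order_date"], "%m/%d/%Y")
        record["order_date"] = parsed.date().isoformat()
        record["sales"] = float(record["sales"])
        cleaned.append(record)
    return cleaned


def load_rows(conn: sqlite3.Connection, rows: List[Dict[str, object]]) -> None:
    """Create the (assumed) orders table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, order_date TEXT, sales REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :order_date, :sales)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
load_rows(conn, clean_rows(RAW_CSV))
print(conn.execute("SELECT order_id, order_date, sales FROM orders").fetchall())
```

Keeping cleaning (`clean_rows`) separate from loading (`load_rows`) means the same cleaned records could be handed to a different database driver without touching the parsing logic.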

📂 Project Structure

linux-data-science-project/
│
├── data/
│   ├── raw/                  # Original CSV file
│   └── processed/            # (Generated) Cleaned artifacts
│
├── scripts/
│   ├── 01_setup_env.sh       # Virtual env automation
│   └── 02_etl_to_mysql.ipynb # Python ETL (CSV -> MySQL)
│
├── notebooks/
│   └── 03_analysis.ipynb        # Generates charts from SQL data
│
├── sql/
│   ├── schema.sql            # Database creation scripts
│   └── queries.sql           # 10+ Business analytical queries
│
├── output_plots/             # Generated visualizations
├── LINUX_COMMANDS.md         # Documentation of CLI data exploration
├── SQL_SCENARIOS.md          # Business questions & SQL results
├── Makefile                  # Build automation commands
└── README.md                 # Project documentation

🛠 Architecture & Skills Demonstrated

| Component | Tools Used | Skills Demonstrated |
| --- | --- | --- |
| Data Exploration | Linux Terminal (`grep`, `awk`, `wc`) | CLI proficiency, stream processing |
| ETL Pipeline | Python (`pandas`, `sqlalchemy`) | Data cleaning, database connectors, automation |
| Database | MySQL | Schema design, relational modeling |
| Analytics | SQL | Window functions, aggregations, subqueries |
| Automation | GNU Make (`Makefile`) | Build automation, reproducible workflows |
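
To make the "window functions, aggregations, subqueries" entry concrete, here is a hedged sketch of one such query: ranking categories by total profit. It is not taken from `sql/queries.sql`. The `orders` table, its columns, and the sample rows are hypothetical, and SQLite (3.25 or newer) stands in for MySQL 8, which accepts the same `WITH` and `RANK() OVER` syntax.

```python
import sqlite3

# Hypothetical rows: (category, profit). The real data lives in MySQL.
ROWS = [
    ("Technology", 100.0),
    ("Technology", 50.0),
    ("Furniture", 30.0),
    ("Furniture", -10.0),
    ("Office Supplies", 40.0),
    ("Office Supplies", 20.0),
]

# Aggregate per category in a CTE, then rank the totals with a window function.
RANK_QUERY = """
WITH totals AS (
    SELECT category, SUM(profit) AS total_profit
    FROM orders
    GROUP BY category
)
SELECT category,
       total_profit,
       RANK() OVER (ORDER BY total_profit DESC) AS profit_rank
FROM totals
ORDER BY profit_rank
"""

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
conn.execute("CREATE TABLE orders (category TEXT, profit REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

ranked = conn.execute(RANK_QUERY).fetchall()
for category, total_profit, profit_rank in ranked:
    print(profit_rank, category, total_profit)
```

Unlike a plain `ORDER BY ... LIMIT`, `RANK()` keeps every category in the result and gives tied totals the same rank.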

🚀 How to Run

Prerequisites

- Ubuntu/Linux OS (or WSL)
- MySQL Server installed and running
- Python 3.8+

Setup

1. Create a MySQL database named `superstore_db`:
```
CREATE DATABASE superstore_db;
```
2. Update the database credentials in `scripts/02_etl_to_mysql.ipynb`:
```
DB_USER = 'your_username'
DB_PASS = 'your_password'
```
3. Run the pipeline. A Makefile automates the entire process:
```
# Sets up environment, cleans data, loads DB, and runs analysis
make pipeline
```
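
As an alternative to hardcoding the credentials from the Setup steps in the notebook, they could be read from environment variables and assembled into a SQLAlchemy-style connection URL. The sketch below is one possible approach, not what the project currently does: the `DB_HOST` and `DB_NAME` variable names, the `localhost` default, and the `pymysql` driver are assumptions.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ) -> str:
    """Assemble a SQLAlchemy-style MySQL URL from environment variables."""
    user = env.get("DB_USER", "your_username")
    password = env.get("DB_PASS", "your_password")
    host = env.get("DB_HOST", "localhost")      # assumed variable name
    name = env.get("DB_NAME", "superstore_db")  # assumed variable name
    # quote_plus escapes characters such as '@' or '/' that would
    # otherwise be misread as URL delimiters.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{name}"
    )


# Example with a password containing URL-special characters.
print(build_mysql_url({"DB_USER": "analyst", "DB_PASS": "p@ss/word"}))
# → mysql+pymysql://analyst:p%40ss%2Fword@localhost/superstore_db
```

The resulting string could then be passed to `sqlalchemy.create_engine`, which keeps passwords out of version control.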

📊 Key Insights

Full analysis can be found in SQL_SCENARIOS.md.

- Top Profit Center: The Technology category yields the highest profit margin (17%).
- Shipping Efficiency: "Standard Class" shipping averages 5.0 days, versus 0.04 days for "Same Day".
- Seasonality: Sales volume consistently spikes by 30% in November/December.
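
For readers unfamiliar with how such KPIs are derived, the arithmetic behind a profit margin (total profit divided by total sales) and an average shipping time (mean of ship date minus order date) can be sketched as follows. The rows are made up for illustration and do not reproduce the figures above; the field names are assumptions about the dataset's columns.

```python
from datetime import date
from typing import List, NamedTuple


class Order(NamedTuple):
    category: str
    ship_mode: str
    order_date: date
    ship_date: date
    sales: float
    profit: float


# Hypothetical rows, not taken from the project's dataset.
ORDERS = [
    Order("Technology", "Standard Class", date(2016, 11, 1), date(2016, 11, 6), 200.0, 40.0),
    Order("Technology", "Same Day", date(2016, 11, 3), date(2016, 11, 3), 100.0, 20.0),
    Order("Furniture", "Standard Class", date(2016, 6, 10), date(2016, 6, 14), 300.0, 15.0),
]


def profit_margin(orders: List[Order], category: str) -> float:
    """Total profit divided by total sales for one category."""
    rows = [o for o in orders if o.category == category]
    return sum(o.profit for o in rows) / sum(o.sales for o in rows)


def avg_ship_days(orders: List[Order], ship_mode: str) -> float:
    """Mean number of days between order date and ship date."""
    days = [(o.ship_date - o.order_date).days for o in orders if o.ship_mode == ship_mode]
    return sum(days) / len(days)


print(f"Technology margin: {profit_margin(ORDERS, 'Technology'):.0%}")
print(f"Standard Class avg days: {avg_ship_days(ORDERS, 'Standard Class'):.1f}")
```

Note that the margin is computed from summed totals rather than by averaging per-order margins, which would weight small orders as heavily as large ones.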

📄 Documentation

- Linux Command Logs (`LINUX_COMMANDS.md`): how I explored the data using only the terminal.
- SQL Business Scenarios (`SQL_SCENARIOS.md`): a detailed breakdown of 10 business questions and their queries.
