SQL_Automated_Data_Cleaning_Demo.mp4
This project completely automates the data cleaning process for a large-scale e-commerce dataset containing over 500.000+ rows.
The goal was to eliminate manual data preparation, handle data type errors (like #N/A strings), and provide a clean, analysis-ready dataset updated daily.
- Database: SQL Server
- Automation: SQL Server Agent
- Language: T-SQL (Stored Procedures, CTEs)
- Future Integration: Ready for Power BI / Tableau visualization
Working with the raw e-commerce dataset presented several challenges:
- The
item_idandCustomer_IDcolumns contained#N/Atext values, causing type conversion failures. - Null values in critical categorical and financial columns.
- Duplicate records distorting sales metrics.
I developed a robust Stored Procedure (sp_DailySalesCleaning) that executes a "Full Refresh" ETL logic:
- Error Handling: Used
TRY_CASTto safely convert messy text columns intoINTwithout breaking the process. - Data Imputation: Handled
NULLvalues usingISNULL()for categories, prices, and quantities. - Deduplication: Utilized Common Table Expressions (CTEs) with
ROW_NUMBER()to identify and remove duplicate rows.
To ensure the data is always up-to-date, I configured a SQL Server Agent Job scheduled to run automatically at 02:00 AM every night. It drops the old cleaned table and recreates a fresh one from the latest raw data.