In this project, we developed an ETL pipeline with Apache Airflow that processes delivery data and tracks delayed shipments. The pipeline downloads the data from an AWS S3 bucket, cleans it with Spark/Spark SQL to identify deliveries that missed their deadlines, and uploads the cleaned dataset back to S3, where it can be used to track delivery performance.
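As a rough illustration of the cleaning step, the sketch below flags deliveries that arrived after their deadline using a SQL filter. It is not the project's code: it runs the query through Python's built-in `sqlite3` rather than Spark SQL so that it works without a cluster, and the table name `deliveries`, the columns `order_id`, `deadline`, and `delivered_on`, the function `find_missed_deadlines`, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (order_id, deadline, delivered_on).
# Dates are ISO-8601 strings, so plain text comparison orders them correctly.
SAMPLE_ROWS = [
    ("A1", "2024-01-10", "2024-01-09"),  # delivered early
    ("A2", "2024-01-10", "2024-01-12"),  # delivered after the deadline
    ("A3", "2024-01-15", None),          # not delivered yet, so not flagged here
    ("A4", "2024-01-20", "2024-01-20"),  # delivered on the deadline day
]


def find_missed_deadlines(rows):
    """Return the order ids whose delivery date falls after the deadline."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE deliveries (order_id TEXT, deadline TEXT, delivered_on TEXT)"
        )
        conn.executemany("INSERT INTO deliveries VALUES (?, ?, ?)", rows)
        # Assumed rule: a shipment counts as late only if it has a delivery
        # date and that date is strictly later than the deadline.
        cursor = conn.execute(
            """
            SELECT order_id
            FROM deliveries
            WHERE delivered_on IS NOT NULL
              AND delivered_on > deadline
            ORDER BY order_id
            """
        )
        return [order_id for (order_id,) in cursor]
    finally:
        conn.close()


late_orders = find_missed_deadlines(SAMPLE_ROWS)
print(late_orders)
```

In the pipeline described above, a filter of this kind would presumably run in Spark between the S3 download and upload tasks. How undelivered orders and deliveries on the deadline day are treated is a choice made for this sketch only.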
ManoharVit/ECommerce-Dive-Deep-Sales-Analysis