This project demonstrates a simple ETL (Extract, Transform, Load) pipeline using Python and PostgreSQL.
The project consists of the following files:

- `task1_d.json`: A raw data file that is not valid JSON; it uses Ruby hash syntax.
- `db_connect.py`: Handles the connection to the PostgreSQL database.
- `ingest_data.py`: A Python script that extracts the data, uses regular expressions to clean it into valid JSON, and loads it into the `raw_books` table.
- `transformation.sql`: A pure SQL script that standardizes currency symbols and generates a `book_summary` table.
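The regex cleanup step in `ingest_data.py` can be pictured roughly as follows. This is a minimal sketch, not the project's actual script: the exact contents of `task1_d.json` are not shown here, so the sample record `RAW` and the function name `ruby_hash_to_json` are hypothetical, and the sketch assumes the file uses `=>` separators, `:symbol` or `"string"` keys, and `nil` for missing values.

```python
import json
import re

# Hypothetical sample in Ruby hash syntax; the real task1_d.json may be laid out differently.
RAW = '[{:id=>1, :title=>"Dune", :price=>"$9.99", :author=>nil}]'


def ruby_hash_to_json(text: str) -> str:
    """Rewrite common Ruby hash constructs into JSON (a sketch, not a full parser)."""
    # Symbol keys such as :title=> become "title":
    text = re.sub(r':(\w+)\s*=>', r'"\1":', text)
    # String keys such as "title"=> become "title":
    text = re.sub(r'"\s*=>', '":', text)
    # Ruby nil values become JSON null
    text = re.sub(r':\s*nil(?=\s*[,}\]])', ':null', text)
    return text


records = json.loads(ruby_hash_to_json(RAW))
print(records)  # [{'id': 1, 'title': 'Dune', 'price': '$9.99', 'author': None}]
```

Because the substitutions are purely textual, they can misfire if string values themselves contain sequences such as `=>` or `:nil`; a real cleanup script may need stricter patterns for the data it actually receives.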
To run the pipeline:

- Ensure PostgreSQL is installed and running.
- Update the database credentials in `db_connect.py`.
- Run `python ingest_data.py` to populate the database.
- Execute `transformation.sql` in pgAdmin or via `psql` to generate the summary table.
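The credentials step assumes `db_connect.py` holds the connection settings. One possible shape for such a module is sketched below; it is an assumption, not the project's actual file. It reads settings from the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`) so credentials need not be edited in the source. The function names `get_db_config` and `get_connection`, the fallback values, and the use of the `psycopg2` driver are all hypothetical choices.

```python
import os


def get_db_config() -> dict:
    """Collect connection settings, preferring environment variables.

    The fallback values are placeholders (assumptions), not settings from this project.
    """
    return {
        "host": os.environ.get("PGHOST", "localhost"),
        "port": int(os.environ.get("PGPORT", "5432")),
        "dbname": os.environ.get("PGDATABASE", "postgres"),
        "user": os.environ.get("PGUSER", "postgres"),
        "password": os.environ.get("PGPASSWORD", ""),
    }


def get_connection():
    """Open a PostgreSQL connection using the settings above."""
    # psycopg2 is an assumed driver; it is imported lazily so get_db_config() works without it.
    import psycopg2

    return psycopg2.connect(**get_db_config())
```

Keeping the configuration lookup separate from the actual connection makes it easy to check which settings will be used before touching the database.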