Python to PostgreSQL ETL Pipeline

This project demonstrates a simple ETL (Extract, Transform, Load) pipeline using Python and PostgreSQL.

Files Included

  • task1_d.json: A raw data file that is not valid JSON; despite the .json extension, it is written in Ruby hash syntax.
  • db_connect.py: Handles the connection to the PostgreSQL database.
  • ingest_data.py: A Python script that extracts the raw data, uses regular expressions to convert it into valid JSON, and loads it into the raw_books table.
  • transformation.sql: A pure SQL script that standardizes currency symbols and generates a book_summary table.
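
The cleaning step in ingest_data.py is not reproduced here. As a rough sketch of how regular expressions can turn Ruby hash syntax into JSON, the example below assumes records use the :key => value form and that "=>" and "nil" never appear inside string values; the real contents of task1_d.json may differ. The function name ruby_hash_to_json is hypothetical.

```python
import json
import re


def ruby_hash_to_json(text: str) -> str:
    """Convert simple Ruby hash syntax to JSON text (illustrative only)."""
    # Assumption: symbol keys look like `:name =>`; rewrite them as `"name":`.
    text = re.sub(r':(\w+)\s*=>', r'"\1":', text)
    # Assumption: any remaining `=>` follows an already-quoted string key.
    text = text.replace("=>", ":")
    # Ruby's nil has no JSON equivalent other than null.
    text = re.sub(r'\bnil\b', 'null', text)
    return text


# Hypothetical sample record, not taken from task1_d.json.
sample = '[{:id => 1, :title => "Dune", :price => "$9.99", :year => nil}]'
records = json.loads(ruby_hash_to_json(sample))
```

Calling json.loads on the result is a useful check in its own right: if the cleaned text is still not valid JSON, it raises json.JSONDecodeError before anything reaches the database.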

How to Run

  1. Ensure PostgreSQL is installed and running.
  2. Update the database credentials in db_connect.py.
  3. Run python ingest_data.py to populate the database.
  4. Execute transformation.sql in pgAdmin or via psql to generate the summary table.
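
Step 2 asks for credentials to be edited in db_connect.py, which is not shown in this README. One possible shape for such a module is sketched below, reading settings from the standard libpq environment variables instead of hardcoding them. The function names get_db_params and get_connection are hypothetical, and the use of psycopg2 is an assumption; the project may use a different driver.

```python
import os


def get_db_params() -> dict:
    """Collect connection settings, falling back to local defaults."""
    return {
        "host": os.environ.get("PGHOST", "localhost"),
        "port": int(os.environ.get("PGPORT", "5432")),
        "dbname": os.environ.get("PGDATABASE", "postgres"),
        "user": os.environ.get("PGUSER", "postgres"),
        "password": os.environ.get("PGPASSWORD", ""),
    }


def get_connection():
    """Open a PostgreSQL connection (assumes psycopg2 is installed)."""
    # Imported lazily so the settings helper works without the driver.
    import psycopg2

    return psycopg2.connect(**get_db_params())
```

Keeping the password in an environment variable rather than in the source file avoids committing credentials to the repository.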