Docker and PostgreSQL: Data Engineering Workshop

In this workshop, we will explore Docker fundamentals and data engineering workflows using Docker containers. This workshop is part of Module 1 of the Data Engineering Zoomcamp.

Data Engineering is the design and development of systems for collecting, storing and analyzing data at scale.

Prerequisites

  • Basic understanding of Python
  • Basic SQL knowledge (helpful but not required)
  • Docker and Python installed on your machine
  • Git (optional)
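
To confirm the tools above are available before starting, a quick check along these lines may help. This is only a sketch: the helper names `check_prerequisites` and `missing_required` are hypothetical and not part of the workshop materials, and it assumes that the `docker` and `git` executables are discoverable on your `PATH`.

```python
import shutil
import sys


def check_prerequisites(required=("docker",), optional=("git",)):
    """Map each tool name to its resolved path on PATH, or None if it is not found.

    The default tool names mirror the prerequisites list; adjust them as needed.
    """
    return {tool: shutil.which(tool) for tool in (*required, *optional)}


def missing_required(found, required=("docker",)):
    """Return the required tools that were not found on PATH."""
    return [tool for tool in required if found.get(tool) is None]


if __name__ == "__main__":
    # Python itself is running this script, so just report which interpreter it is.
    print(f"python: {sys.executable} (version {sys.version.split()[0]})")

    found = check_prerequisites()
    for tool, path in found.items():
        print(f"{tool}: {path or 'not found on PATH'}")

    missing = missing_required(found)
    if missing:
        print("Missing required tools:", ", ".join(missing))
```

The check only verifies that an executable exists on `PATH`; it does not confirm that the Docker daemon is actually running.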

Workshop Contents

  1. Introduction to Docker - What is Docker, why use it, basic commands
  2. Virtual Environments and Data Pipelines - Setting up Python environments with uv
  3. Dockerizing the Pipeline - Creating a Dockerfile for a simple pipeline
  4. Running PostgreSQL with Docker - Running a PostgreSQL database in a container
  5. NY Taxi Dataset and Data Ingestion - Working with real data, pandas, SQLAlchemy
  6. Creating the Data Ingestion Script - Converting the notebook to a Python script
  7. pgAdmin - Database Management Tool - Web-based database management
  8. Dockerizing the Ingestion Script - Containerizing the pipeline
  9. Docker Compose - Multi-container orchestration
  10. SQL Refresher - SQL joins, aggregations, and queries
  11. Cleanup - Cleaning up Docker resources