Now let's containerize the ingestion script so we can run it in Docker. The `pipeline/Dockerfile` looks like this:
```dockerfile
FROM python:3.13.11-slim

# Copy the uv binary from the official uv image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/

WORKDIR /code
ENV PATH="/code/.venv/bin:$PATH"

# Copy dependency files first for better layer caching
COPY pyproject.toml .python-version uv.lock ./
RUN uv sync --locked

COPY ingest_data.py .
ENTRYPOINT ["uv", "run", "python", "ingest_data.py"]
```

- `FROM python:3.13.11-slim`: start from the slim Python 3.13 image for a smaller footprint
- `COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/`: copy the `uv` binary from the official uv image
- `WORKDIR /code`: set the working directory inside the container
- `ENV PATH="/code/.venv/bin:$PATH"`: add the virtual environment to `PATH`
- `COPY pyproject.toml .python-version uv.lock ./`: copy the dependency files first (better layer caching)
- `RUN uv sync --locked`: install all dependencies from the lock file (ensures reproducible builds)
- `COPY ingest_data.py .`: copy the ingestion script
- `ENTRYPOINT ["uv", "run", "python", "ingest_data.py"]`: run the ingestion script when the container starts
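Because of the `ENTRYPOINT`, anything you append after the image name in `docker run` is passed straight to `ingest_data.py` as command-line arguments. A minimal sketch of the argument parsing the script might use — the flag names come from the run command in this section, but the parser itself is an assumption, not the course's exact code:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI matching the flags passed to the container;
    # argparse turns "--pg-user" into the attribute "pg_user", etc.
    parser = argparse.ArgumentParser(description="Ingest NY taxi data into Postgres")
    parser.add_argument("--pg-user", required=True, help="Postgres user name")
    parser.add_argument("--pg-pass", required=True, help="Postgres password")
    parser.add_argument("--pg-host", required=True,
                        help="Postgres host (the container name on the Docker network)")
    parser.add_argument("--pg-port", type=int, default=5432, help="Postgres port")
    parser.add_argument("--pg-db", required=True, help="Target database name")
    parser.add_argument("--target-table", required=True,
                        help="Table the trip data is written into")
    return parser.parse_args(argv)


# Example: the flags from the docker run command below map to these attributes
# args = parse_args(["--pg-user=root", "--pg-pass=root", "--pg-host=pgdatabase",
#                    "--pg-db=ny_taxi", "--target-table=yellow_taxi_trips"])
```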
Build the image:

```bash
cd pipeline
docker build -t taxi_ingest:v001 .
```

Then run it, passing the script's arguments after the image name:

```bash
docker run -it \
  --network=pg-network \
  taxi_ingest:v001 \
    --pg-user=root \
    --pg-pass=root \
    --pg-host=pgdatabase \
    --pg-port=5432 \
    --pg-db=ny_taxi \
    --target-table=yellow_taxi_trips
```

- We need to provide the network so Docker can find the Postgres container; the `--network` flag goes before the name of the image.
- Since Postgres is running in a separate container, the host argument has to point to the Postgres container's name (`pgdatabase`).
- You can drop the table in pgAdmin beforehand if you want, but the script will automatically replace the pre-existing table.
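For the run command to work, the `pg-network` network must already exist and the Postgres container must be attached to it under the name `pgdatabase`. A sketch of that setup, assuming it was done in an earlier step (the Postgres image tag here is an assumption):

```shell
# Create a user-defined bridge network (errors harmlessly if it already exists)
docker network create pg-network

# Start Postgres on that network; --name pgdatabase becomes the hostname
# that other containers on pg-network use to reach it
docker run -d \
  --network=pg-network \
  --name pgdatabase \
  -e POSTGRES_USER=root \
  -e POSTGRES_PASSWORD=root \
  -e POSTGRES_DB=ny_taxi \
  -p 5432:5432 \
  postgres:17
```

On a user-defined network, Docker's built-in DNS resolves container names, which is why `--pg-host=pgdatabase` works from inside the ingestion container.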