Dockerfile Standards and Guidelines

This document defines the standardized patterns and best practices for all Dockerfiles in the Discogsography project.

🎯 Overview

All Dockerfiles must follow these standards to ensure consistency, security, and maintainability across the project.

📐 Structure Template

The standard template uses a two-stage build (builder + final). Services that require a build-time asset pipeline add a third stage before the builder stage.

Optional: CSS Build Stage (dashboard only)

The dashboard uses a dedicated Node stage to run the Tailwind v4 CLI and produce a minified stylesheet at image build time, eliminating any CDN dependency at runtime:

# ── CSS build stage ────────────────────────────────────────────────────────────
FROM node:24-slim AS css-builder

WORKDIR /build

# Copy only the files the CLI needs
COPY dashboard/tailwind.input.css ./
COPY dashboard/static/index.html ./static/index.html

# Install Tailwind v4 CLI + forms plugin, emit minified stylesheet
# hadolint ignore=DL3016
RUN npm install @tailwindcss/cli@^4 @tailwindcss/forms --save-dev && \
    ./node_modules/.bin/tailwindcss \
        --input tailwind.input.css \
        --output tailwind.css \
        --minify

The generated tailwind.css is copied into the final stage:

COPY --from=css-builder --chown=discogsography:discogsography /build/tailwind.css /app/dashboard/static/tailwind.css

Standard Two-Stage Template

# syntax=docker/dockerfile:1

# Build arguments
ARG PYTHON_VERSION=3.13
ARG UID=1000
ARG GID=1000

# === BUILDER STAGE ===
FROM python:${PYTHON_VERSION}-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:0.11.8 /uv /bin/uv

# Set environment for build
ENV UV_SYSTEM_PYTHON=1 \
    UV_CACHE_DIR=/tmp/.cache/uv \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /app

# Copy dependency files first for better caching
COPY pyproject.toml uv.lock README.md ./
COPY common/pyproject.toml ./common/
COPY <service>/pyproject.toml ./<service>/

# Install dependencies
RUN --mount=type=cache,target=/tmp/.cache/uv \
    uv sync --frozen --no-dev --extra <service>

# Copy source files
COPY common/ ./common/
COPY <service>/ ./<service>/

# === FINAL STAGE ===
FROM python:${PYTHON_VERSION}-slim

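# Build arguments for labels
# PYTHON_VERSION is re-declared without a value because an ARG defined
# before the first FROM is not visible inside a build stage
ARG PYTHON_VERSION
ARG BUILD_DATE
ARG BUILD_VERSION
ARG VCS_REF
ARG UID=1000
ARG GID=1000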

# OCI Image Spec Annotations
# [Labels section - see below]

# Install security updates and service-specific packages
# [Package installation section - see below]

# Create user and directories
# [User creation section - see below]

WORKDIR /app

# Copy from builder
COPY --from=builder --chown=discogsography:discogsography /app /app

# Install uv for runtime
COPY --from=ghcr.io/astral-sh/uv:0.11.8 /uv /bin/uv

# Create startup script
# [Startup script section - see below]

# Health check
# [Health check section - see below]

USER discogsography:discogsography

# Environment variables
# [Environment section - see below]

# Expose ports (if applicable)
# [Port exposure section - see below]

# Declare volumes
VOLUME ["/logs"]

# Security comment
# Security: This container should be run with:
# docker run --cap-drop=ALL --security-opt=no-new-privileges:true ...

CMD ["/app/start.sh"]

📋 Section Standards

1. Build Arguments

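Any stage that uses one of these values must re-declare it with ARG, as the final stage does for UID and GID. Always define the defaults at the top of the Dockerfile, before the first FROM: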

ARG PYTHON_VERSION=3.13
ARG UID=1000
ARG GID=1000
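The scoping rule behind this section can be sketched as follows. An ARG declared before the first FROM is available to FROM lines only, so a stage that needs the value re-declares it without a default. The LABEL line is just one illustrative consumer of the value, not a complete label set.

```dockerfile
# Declared before the first FROM: usable in FROM lines only
ARG PYTHON_VERSION=3.13

FROM python:${PYTHON_VERSION}-slim

# Re-declare without a value to bring the default into this stage's scope
ARG PYTHON_VERSION

# Without the re-declaration above, this would expand to an empty string
LABEL com.discogsography.python.version="${PYTHON_VERSION}"
```

A value passed at build time with --build-arg overrides the default in both places.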

2. OCI Labels

Standardized format with service-specific variations:

LABEL org.opencontainers.image.title="Discogsography <Service>" \
      org.opencontainers.image.description="<Service description>" \
      org.opencontainers.image.authors="Robert Wlodarczyk <robert@simplicityguy.com>" \
      org.opencontainers.image.url="https://github.com/SimplicityGuy/discogsography" \
      org.opencontainers.image.documentation="https://github.com/SimplicityGuy/discogsography/blob/main/README.md" \
      org.opencontainers.image.source="https://github.com/SimplicityGuy/discogsography" \
      org.opencontainers.image.vendor="SimplicityGuy" \
      org.opencontainers.image.licenses="MIT" \
      org.opencontainers.image.version="${BUILD_VERSION:-0.1.0}" \
      org.opencontainers.image.revision="${VCS_REF}" \
      org.opencontainers.image.created="${BUILD_DATE}" \
      org.opencontainers.image.base.name="docker.io/library/python:${PYTHON_VERSION}-slim" \
      com.discogsography.service="<service>" \
      com.discogsography.dependencies="<comma-separated-list>" \
      com.discogsography.python.version="${PYTHON_VERSION}"

Additional labels for database services:

  • com.discogsography.database="postgresql" (tableinator)
  • com.discogsography.database="neo4j" (graphinator)

3. Package Installation

Base template:

# Install security updates and curl for healthcheck
# hadolint ignore=DL3008
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
        curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Service-specific additions:

  • tableinator, brainztableinator: Add libpq5 for PostgreSQL client libraries
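As an illustration, the tableinator variant of the base template might look like the sketch below. It simply appends libpq5 to the package list and leaves the rest of the step unchanged. The comment wording is an assumption.

```dockerfile
# Install security updates, curl for healthcheck, and PostgreSQL client libraries
# hadolint ignore=DL3008
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
        curl \
        libpq5 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```

Keeping update, install, and cleanup in a single RUN instruction means the apt lists never persist in an image layer.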

4. User and Directory Creation

Standard format for all services:

# Create user and directories with configurable UID/GID
RUN groupadd -r -g ${GID} discogsography && \
    useradd -r -l -u ${UID} -g discogsography -m -s /bin/bash discogsography && \
    mkdir -p /tmp /app /logs && \
    chown -R discogsography:discogsography /tmp /app /logs

Additional directories:

  • extractor: Add /discogs-data directory
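For the extractor, the additional directory could be folded into the same step, as in this sketch. It assumes the extractor's final stage provides groupadd and useradd like the Debian-based Python images, and that UID and GID have been declared with ARG in that stage.

```dockerfile
# Create user and directories with configurable UID/GID,
# including the extractor's data directory
RUN groupadd -r -g ${GID} discogsography && \
    useradd -r -l -u ${UID} -g discogsography -m -s /bin/bash discogsography && \
    mkdir -p /tmp /app /logs /discogs-data && \
    chown -R discogsography:discogsography /tmp /app /logs /discogs-data
```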

5. Startup Script

Standard format:

# Create startup script
# hadolint ignore=SC2016
RUN printf '#!/bin/sh\nset -e\nsleep "${STARTUP_DELAY:-0}"\nexec /app/.venv/bin/python -m <service>.<service> "$@"\n' > /app/start.sh && \
    chmod +x /app/start.sh
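To make the generated script easier to read, here is the same content written with a BuildKit here-document. This is an alternative illustration, not the project's standard form: the printf version above remains the standard. It relies on the `# syntax=docker/dockerfile:1` directive from the template and has not been checked against the project's hadolint configuration.

```dockerfile
# Quoting the delimiter ('EOF') keeps ${STARTUP_DELAY:-0} and "$@" literal at build time
COPY --chmod=755 <<'EOF' /app/start.sh
#!/bin/sh
set -e
sleep "${STARTUP_DELAY:-0}"
exec /app/.venv/bin/python -m <service>.<service> "$@"
EOF
```

In both forms, the script waits for the optional STARTUP_DELAY and then uses exec so the Python process replaces the shell and receives container signals directly.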

6. Health Check

HTTP-based (default):

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:<port>/health || exit 1

Process-based (graphinator only):

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD pgrep -f "python.*graphinator" > /dev/null || exit 1

7. Environment Variables

Base environment (all services):

ENV HOME=/home/discogsography \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    UV_SYSTEM_PYTHON=1 \
    UV_NO_CACHE=1 \
    PATH="/app/.venv/bin:$PATH" \
    RABBITMQ_HOST="" \
    RABBITMQ_USERNAME="" \
    RABBITMQ_PASSWORD=""

Service-specific additions:

  • dashboard: All database connections
  • extractor: DISCOGS_ROOT="/discogs-data" and PERIODIC_CHECK_DAYS="15"
  • graphinator: Neo4j connections
  • tableinator: PostgreSQL connections
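As a small example of a service-specific addition, the extractor values listed above could be declared in their own ENV instruction after the base block. Only the two variable names and values come from this document. Placing them in a separate instruction is an assumption.

```dockerfile
# Extractor-specific configuration, added after the base environment
ENV DISCOGS_ROOT="/discogs-data" \
    PERIODIC_CHECK_DAYS="15"
```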

8. Port Exposure

Only expose ports for services with HTTP endpoints or health checks:

  • api: EXPOSE 8004 8005
  • dashboard: EXPOSE 8003
  • explore: EXPOSE 8006 8007
  • insights: EXPOSE 8008 8009
  • extractor: EXPOSE 8000
  • graphinator: EXPOSE 8001
  • tableinator: EXPOSE 8002
  • brainztableinator: EXPOSE 8010
  • brainzgraphinator: EXPOSE 8011
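For services with a separate health port, the port exposure and health check sections fit together as in the sketch below. It uses the api service and assumes its health endpoint is served at /health on port 8005, following the HTTP-based template.

```dockerfile
# api: service traffic on 8004, health endpoint on 8005
EXPOSE 8004 8005

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8005/health || exit 1
```

EXPOSE only documents the ports. Publishing them to the host is still done at run time or in docker-compose.yml.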

9. Volume Declaration

Standard volume:

VOLUME ["/logs"]

Additional volumes:

  • extractor: Add "/discogs-data"
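For the extractor, the declaration would then list both paths in one instruction, as in this minimal sketch:

```dockerfile
# Declare volumes: logs plus the extractor's data directory
VOLUME ["/logs", "/discogs-data"]
```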

🔧 Service-Specific Requirements

Schema-Init

  • One-shot init container: no health check port and no restart: unless-stopped policy
  • Set restart: "no" in Docker Compose so the container is not restarted after it completes
  • Neo4j and PostgreSQL connection environment variables
  • Read-only filesystem with /tmp tmpfs mount
  • cap_drop: ALL (no Linux capabilities needed)

Dashboard

  • Three-stage build: css-builder (Node) → builder (Python) → final
  • css-builder stage runs Tailwind v4 CLI to produce dashboard/static/tailwind.css
  • Expose port 8003
  • All database connections in environment
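The stage order for the dashboard can be outlined as below. This is a skeleton only: elided steps are marked with comments and correspond to the CSS build stage and standard template sections earlier in this document, so it is not buildable as written.

```dockerfile
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.13

# Stage 1: build the minified stylesheet with the Tailwind CLI
FROM node:24-slim AS css-builder
WORKDIR /build
# ... COPY inputs, npm install, run tailwindcss ...

# Stage 2: install Python dependencies and copy sources
FROM python:${PYTHON_VERSION}-slim AS builder
WORKDIR /app
# ... uv sync and source COPY steps ...

# Stage 3: runtime image
FROM python:${PYTHON_VERSION}-slim
# ... labels, packages, user creation ...
WORKDIR /app
COPY --from=builder --chown=discogsography:discogsography /app /app
COPY --from=css-builder --chown=discogsography:discogsography /build/tailwind.css /app/dashboard/static/tailwind.css
EXPOSE 8003
```

Only the final stage ends up in the image, so Node and the npm packages from the first stage add nothing to its size.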

API

  • Expose ports 8004 (service) and 8005 (health)
  • All database connections in environment

Extractor

  • Rust-based container using multi-stage build
  • Create /discogs-data directory
  • Add /discogs-data volume
  • Special environment variables for Discogs configuration

Graphinator

  • Process-based health check (no HTTP endpoint)
  • Neo4j connection environment variables

Tableinator

  • Install libpq5 for PostgreSQL
  • PostgreSQL connection environment variables

Insights

  • Expose ports 8008 (service) and 8009 (health)
  • Fetches data from API internal endpoints over HTTP
  • Uses Redis for caching
  • API_BASE_URL and REDIS_HOST environment variables

Brainzgraphinator

  • MusicBrainz data enrichment into Neo4j
  • Neo4j and RabbitMQ connection environment variables
  • Health check on port 8011

Brainztableinator

  • MusicBrainz data into PostgreSQL musicbrainz schema
  • Install libpq5 for PostgreSQL
  • PostgreSQL and RabbitMQ connection environment variables
  • Health check on port 8010

MCP Server

  • Exposes knowledge graph to AI assistants via API (no direct DB access)
  • API_BASE_URL environment variable

✅ Quality Checklist

Before committing any Dockerfile:

  1. Structure: Follows the standard template order
  2. Comments: Includes all standard comments and hadolint ignores
  3. Labels: All OCI labels present with correct values
  4. Security: Security comment present at bottom
  5. Health Check: Appropriate health check for service type
  6. Environment: All required environment variables defined
  7. Volumes: /logs volume declared (plus service-specific)
  8. User: Runs as discogsography user
  9. Caching: Uses BuildKit cache mounts
  10. Linting: Passes hadolint validation

🛡️ Security Standards

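  1. Non-root execution: All containers run as the non-root discogsography user (UID/GID 1000 by default, configurable through the UID and GID build arguments)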
  2. Minimal packages: Only install what's needed
  3. Security updates: Always run apt-get upgrade
  4. Clean up: Remove apt lists after installation
  5. Capability dropping: Document in security comment
  6. Read-only root: The root filesystem can be mounted read-only when writable paths such as /tmp are backed by tmpfs mounts

📝 Maintenance

When updating Dockerfiles:

  1. Update this document if adding new patterns
  2. Apply changes consistently across all services
  3. Test builds for all services
  4. Update docker-compose.yml if needed
  5. Verify health checks still function