Commit a9845c6

feat: Python version matrix for Docker images + error-masking fix (#76)
* feat: add Python version matrix for Docker images

  Parameterize all 4 Dockerfiles with PYTHON_VERSION build arg. GPU images also accept PYTORCH_BASE to select the correct PyTorch base image per Python version.

  Add versioned Makefile targets:
  - build-all-versioned: builds 10 images (GPU 3.11/3.12, CPU 3.10-3.12)
  - build-wip-versioned: multi-platform push with latest alias
  - smoketest-versioned: verify Python version in each image

  GPU base image mapping:
  - Python 3.11: pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime
  - Python 3.12: pytorch/pytorch:2.10.0-cuda12.8-cudnn9-runtime

* fix: stop masking deployment errors by falling back to Live Serverless

  Previously, _load_generated_handler() silently returned None on any failure (missing file, import error, syntax error), causing deployed endpoints to fall back to the FunctionRequest/Live Serverless handler. This masked real deployment issues like Python version mismatches.

  Now deployed mode (FLASH_RESOURCE_NAME set) treats handler loading failures as fatal RuntimeError. Live Serverless mode skips the generated handler entirely since it only uses FunctionRequest protocol.

* fix: update handler tests for RuntimeError + sync uv.lock

  Tests expected None returns but handler.py now raises RuntimeError in deployed mode. Updated all 8 TestLoadGeneratedHandler tests to use pytest.raises(RuntimeError). Also synced uv.lock to pick up latest runpod-flash version.

* feat: add Python version to startup banner

  Include platform.python_version() in the worker boot banner for runtime version visibility during E2E testing.

* fix(review): address PR #76 feedback

  - Add build-time Python version validation to GPU Dockerfiles
  - Restructure build-all-versioned to run setup once via internal targets
  - Add version assertion to smoketest-versioned (fail on mismatch)

* fix(review): address PR #76 round 2 feedback

  - Add default case to pytorch_base() shell function in all Makefile targets
  - Guard test_handler.py import against FLASH_RESOURCE_NAME env var

* fix: preserve base image torch by removing Python symlink and targeting running interpreter

  GPU Dockerfiles symlinked /usr/local/bin/python to /usr/bin/python3.X, switching to the system Python which lacks torch and other base image packages. Removed the symlink to preserve the pytorch base image's environment. Changed dependency_installer to use sys.executable instead of --system so runtime package installs go into the same site-packages as torch.

* feat: versioned Docker image builds with single pytorch base image

  - CI matrix builds py3.10/3.11/3.12 for all image types
  - Removed per-version pytorch base mapping (single runpod/pytorch image)
  - GPU/LB builds amd64-only (pytorch base has no arm64 manifest)
  - Added --break-system-packages to CPU Dockerfiles
  - Auto-detect local Python version in Makefile for build-wip
  - Bumped runpod-flash dependency to >=1.7.0
  - Removed .python-version (version comes from build args)

* fix: format dependency_installer.py

* docs: document Python version constraints and base image layout

  GPU workers are pinned to Python 3.12 (torch/CUDA only installed for 3.12 in base image). CPU workers support 3.10-3.12. Added base image Python layout details to architecture doc.

* feat: pin GPU images to Python 3.12, default CPU to 3.12

  - GPU Dockerfiles: remove PYTHON_VERSION ARG (base image is 3.12), add numpy install (excluded from tarballs by flash build)
  - CPU Dockerfiles: default PYTHON_VERSION from 3.11 to 3.12
  - Makefile: GPU_PYTHON_VERSIONS reduced to 3.12 only, remove --build-arg PYTHON_VERSION from GPU/LB build targets

* chore: update lockfile

* fix: auto-detect local Python version for CPU image builds

  PYTHON_VERSION was hardcoded to 3.12 default, ignoring the user's local Python. Restore auto-detection so `make build-wip` with Python 3.10 produces py3.10-wip CPU images as expected. GPU images remain fixed at 3.12.

* fix(review): address PR #76 feedback -- CI matrix, Dockerfiles, pip safety

  - Fix CI matrix duplication: use include-only for GPU jobs (3.12 only)
  - Remove duplicate python-version list from CPU job matrices
  - Add Python version validation to GPU Dockerfiles
  - Use python -m pip instead of bare pip in Dockerfiles
  - Use sys.executable for pip fallback in dependency_installer.py
  - Update docs to match uv pip --python sys.executable implementation
  - Correct image count comment in Makefile (8 not 10)

* chore: bump runpod-flash dependency to 1.8.0
1 parent 4499808 commit a9845c6
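
The error-masking fix described in the commit message can be illustrated with a minimal sketch. handler.py is not part of the diff shown on this page, so everything below is an assumption except the FLASH_RESOURCE_NAME variable and the RuntimeError behavior: the function signature, the `env` parameter, the module name, and the `handler` attribute are hypothetical.

```python
import importlib.util
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def load_generated_handler(
    handler_path: Path, env: Optional[Mapping[str, str]] = None
) -> Optional[Callable]:
    """Hypothetical sketch: fail fast in deployed mode, skip in Live Serverless mode."""
    env = os.environ if env is None else env
    resource_name = env.get("FLASH_RESOURCE_NAME")
    if not resource_name:
        # Live Serverless mode: the generated handler is not used at all.
        return None
    try:
        spec = importlib.util.spec_from_file_location(
            "generated_handler", str(handler_path)
        )
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot build an import spec for {handler_path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # raises on missing file or syntax error
        return module.handler  # assumed attribute name
    except Exception as exc:
        # Deployed mode: surface the failure instead of silently falling back.
        raise RuntimeError(
            f"Failed to load generated handler for {resource_name!r} "
            f"from {handler_path}: {exc}"
        ) from exc
```

With this shape, a Python version mismatch that breaks the import stops the worker at boot rather than producing an endpoint that quietly serves the wrong protocol.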

16 files changed

Lines changed: 746 additions & 1005 deletions

.github/workflows/ci.yml

Lines changed: 99 additions & 54 deletions
@@ -88,6 +88,8 @@ jobs:
           platforms: linux/amd64
           push: false
           tags: flash-cpu:test
+          build-args: |
+            PYTHON_VERSION=3.11
           cache-from: type=gha
           cache-to: type=gha,mode=max
           load: true
@@ -115,6 +117,8 @@ jobs:
           platforms: linux/amd64
           push: false
           tags: flash-lb-cpu:test
+          build-args: |
+            PYTHON_VERSION=3.11
           cache-from: type=gha
           cache-to: type=gha,mode=max
           load: true
@@ -163,6 +167,11 @@ jobs:
     runs-on: ubuntu-latest
     needs: [release]
     if: needs.release.outputs.release_created
+    strategy:
+      matrix:
+        include:
+          - python-version: "3.12"
+            is-default: true
     steps:
       - name: Clear Space
         run: |
@@ -188,14 +197,18 @@ jobs:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}
 
-      - name: Extract GPU metadata
-        id: meta-gpu
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
-          tags: |
-            type=semver,pattern={{version}},value=${{ needs.release.outputs.tag_name }}
-            type=raw,value=latest,enable={{is_default_branch}}
+      - name: Build versioned tags
+        id: tags
+        run: |
+          VERSION="${{ needs.release.outputs.tag_name }}"
+          VERSION="${VERSION#v}"
+          PYVER="${{ matrix.python-version }}"
+          IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}"
+          TAGS="${IMAGE}:py${PYVER}-${VERSION},${IMAGE}:py${PYVER}-latest"
+          if [ "${{ matrix.is-default }}" = "true" ]; then
+            TAGS="${TAGS},${IMAGE}:${VERSION},${IMAGE}:latest"
+          fi
+          echo "tags=${TAGS}" >> "$GITHUB_OUTPUT"
 
       - name: Set up uv
         uses: astral-sh/setup-uv@v4
@@ -210,24 +223,29 @@ jobs:
         with:
           context: .
           file: ./Dockerfile
-          platforms: linux/amd64,linux/arm64
+          platforms: linux/amd64
           push: true
-          tags: ${{ steps.meta-gpu.outputs.tags }}
-          labels: ${{ steps.meta-gpu.outputs.labels }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
+          tags: ${{ steps.tags.outputs.tags }}
+          cache-from: type=gha,scope=gpu
+          cache-to: type=gha,mode=max,scope=gpu
 
   docker-prod-cpu:
     runs-on: ubuntu-latest
     needs: [release]
     if: needs.release.outputs.release_created
+    strategy:
+      matrix:
+        include:
+          - python-version: "3.10"
+            is-default: false
+          - python-version: "3.11"
+            is-default: true
+          - python-version: "3.12"
+            is-default: false
     steps:
       - name: Clear Space
         run: |
-          rm -rf /usr/share/dotnet
-          rm -rf /opt/ghc
-          rm -rf "/usr/local/share/boost"
-          rm -rf "$AGENT_TOOLSDIRECTORY"
+          rm -rf /usr/share/dotnet /opt/ghc /usr/local/share/boost "$AGENT_TOOLSDIRECTORY"
           docker system prune -af
           df -h
 
@@ -249,14 +267,18 @@ jobs:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}
 
-      - name: Extract CPU metadata
-        id: meta-cpu
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-cpu
-          tags: |
-            type=semver,pattern={{version}},value=${{ needs.release.outputs.tag_name }}
-            type=raw,value=latest,enable={{is_default_branch}}
+      - name: Build versioned tags
+        id: tags
+        run: |
+          VERSION="${{ needs.release.outputs.tag_name }}"
+          VERSION="${VERSION#v}"
+          PYVER="${{ matrix.python-version }}"
+          IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-cpu"
+          TAGS="${IMAGE}:py${PYVER}-${VERSION},${IMAGE}:py${PYVER}-latest"
+          if [ "${{ matrix.is-default }}" = "true" ]; then
+            TAGS="${TAGS},${IMAGE}:${VERSION},${IMAGE}:latest"
+          fi
+          echo "tags=${TAGS}" >> "$GITHUB_OUTPUT"
 
       - name: Set up uv
         uses: astral-sh/setup-uv@v4
@@ -273,15 +295,21 @@ jobs:
           file: ./Dockerfile-cpu
           platforms: linux/amd64,linux/arm64
           push: true
-          tags: ${{ steps.meta-cpu.outputs.tags }}
-          labels: ${{ steps.meta-cpu.outputs.labels }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
+          tags: ${{ steps.tags.outputs.tags }}
+          build-args: |
+            PYTHON_VERSION=${{ matrix.python-version }}
+          cache-from: type=gha,scope=cpu-py${{ matrix.python-version }}
+          cache-to: type=gha,mode=max,scope=cpu-py${{ matrix.python-version }}
 
   docker-prod-lb:
     runs-on: ubuntu-latest
     needs: [release]
     if: needs.release.outputs.release_created
+    strategy:
+      matrix:
+        include:
+          - python-version: "3.12"
+            is-default: true
     steps:
       - name: Clear Space
         run: |
@@ -307,14 +335,18 @@ jobs:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}
 
-      - name: Extract Load Balancer metadata
-        id: meta-lb
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-lb
-          tags: |
-            type=semver,pattern={{version}},value=${{ needs.release.outputs.tag_name }}
-            type=raw,value=latest,enable={{is_default_branch}}
+      - name: Build versioned tags
+        id: tags
+        run: |
+          VERSION="${{ needs.release.outputs.tag_name }}"
+          VERSION="${VERSION#v}"
+          PYVER="${{ matrix.python-version }}"
+          IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-lb"
+          TAGS="${IMAGE}:py${PYVER}-${VERSION},${IMAGE}:py${PYVER}-latest"
+          if [ "${{ matrix.is-default }}" = "true" ]; then
+            TAGS="${TAGS},${IMAGE}:${VERSION},${IMAGE}:latest"
+          fi
+          echo "tags=${TAGS}" >> "$GITHUB_OUTPUT"
 
       - name: Set up uv
         uses: astral-sh/setup-uv@v4
@@ -329,17 +361,25 @@ jobs:
         with:
           context: .
           file: ./Dockerfile-lb
-          platforms: linux/amd64,linux/arm64
+          platforms: linux/amd64
           push: true
-          tags: ${{ steps.meta-lb.outputs.tags }}
-          labels: ${{ steps.meta-lb.outputs.labels }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
+          tags: ${{ steps.tags.outputs.tags }}
+          cache-from: type=gha,scope=lb
+          cache-to: type=gha,mode=max,scope=lb
 
   docker-prod-lb-cpu:
     runs-on: ubuntu-latest
     needs: [release]
     if: needs.release.outputs.release_created
+    strategy:
+      matrix:
+        include:
+          - python-version: "3.10"
+            is-default: false
+          - python-version: "3.11"
+            is-default: true
+          - python-version: "3.12"
+            is-default: false
     steps:
       - name: Clear Space
         run: |
@@ -365,14 +405,18 @@ jobs:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}
 
-      - name: Extract CPU Load Balancer metadata
-        id: meta-lb-cpu
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-lb-cpu
-          tags: |
-            type=semver,pattern={{version}},value=${{ needs.release.outputs.tag_name }}
-            type=raw,value=latest,enable={{is_default_branch}}
+      - name: Build versioned tags
+        id: tags
+        run: |
+          VERSION="${{ needs.release.outputs.tag_name }}"
+          VERSION="${VERSION#v}"
+          PYVER="${{ matrix.python-version }}"
+          IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-lb-cpu"
+          TAGS="${IMAGE}:py${PYVER}-${VERSION},${IMAGE}:py${PYVER}-latest"
+          if [ "${{ matrix.is-default }}" = "true" ]; then
+            TAGS="${TAGS},${IMAGE}:${VERSION},${IMAGE}:latest"
+          fi
+          echo "tags=${TAGS}" >> "$GITHUB_OUTPUT"
 
       - name: Set up uv
         uses: astral-sh/setup-uv@v4
@@ -389,7 +433,8 @@ jobs:
           file: ./Dockerfile-lb-cpu
           platforms: linux/amd64,linux/arm64
           push: true
-          tags: ${{ steps.meta-lb-cpu.outputs.tags }}
-          labels: ${{ steps.meta-lb-cpu.outputs.labels }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
+          tags: ${{ steps.tags.outputs.tags }}
+          build-args: |
+            PYTHON_VERSION=${{ matrix.python-version }}
+          cache-from: type=gha,scope=lb-cpu-py${{ matrix.python-version }}
+          cache-to: type=gha,mode=max,scope=lb-cpu-py${{ matrix.python-version }}
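
The "Build versioned tags" step repeated in the four release jobs above follows one rule: every matrix entry gets `py<version>-<release>` and `py<version>-latest`, and only the entry marked `is-default` also gets the unprefixed `<release>` and `latest` tags. The Python sketch below restates that rule for illustration only. The workflow itself uses the shell shown in the diff; `build_tags` and the image name in the usage lines are hypothetical.

```python
def build_tags(image: str, release_tag: str, python_version: str, is_default: bool) -> str:
    """Mirror the shell step: return a comma-separated tag list for one matrix entry."""
    # Equivalent of VERSION="${VERSION#v}": strip a single leading "v".
    version = release_tag[1:] if release_tag.startswith("v") else release_tag
    tags = [
        f"{image}:py{python_version}-{version}",
        f"{image}:py{python_version}-latest",
    ]
    if is_default:
        # Only the default Python version owns the unprefixed aliases.
        tags += [f"{image}:{version}", f"{image}:latest"]
    return ",".join(tags)


if __name__ == "__main__":
    # "example/flash-cpu" is a placeholder image name, not taken from the repository.
    for entry in build_tags("example/flash-cpu", "v1.2.3", "3.11", True).split(","):
        print(entry)
```

Because exactly one entry per job sets `is-default: true`, the `latest` alias is pushed by a single matrix job and the others cannot overwrite it.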

.python-version

Lines changed: 0 additions & 1 deletion
This file was deleted.

Dockerfile

Lines changed: 25 additions & 3 deletions
@@ -1,5 +1,16 @@
-FROM pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime
-# Python 3.12 included in this PyTorch image
+# Base image provides Python 3.12 (from runpod/pytorch:1.0.3-cu1281-torch291-ubuntu2204)
+FROM runpod/pytorch:1.0.3-cu1281-torch291-ubuntu2204
+
+# Use the base image's Python as-is to preserve pre-installed packages (torch, cuda libs).
+# The pytorch base image provides its own Python with torch already installed.
+# Symlinking to /usr/bin/python3.X would switch to a bare system Python without torch.
+# Validate that the base image provides the expected Python version.
+ARG EXPECTED_PYTHON_VERSION=3.12
+RUN python --version && \
+    actual=$(python -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')") && \
+    if [ "$actual" != "$EXPECTED_PYTHON_VERSION" ]; then \
+        echo "ERROR: Expected Python $EXPECTED_PYTHON_VERSION but base image provides $actual" && exit 1; \
+    fi
 
 WORKDIR /app

@@ -30,9 +41,20 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y --no-ins
     && rm -rf /var/lib/apt/lists/*
 
 # Copy app code and install dependencies
+# Use --python to target the base image's Python (preserves torch in its site-packages)
 COPY README.md pyproject.toml uv.lock ./
 COPY src/ ./
 RUN uv export --format requirements-txt --no-dev --no-hashes > requirements.txt \
-    && uv pip install --system -r requirements.txt
+    && uv pip install --python $(which python) --break-system-packages -r requirements.txt
+
+# Install numpy for the base image's Python version.
+# The runpod/pytorch image ships torch but not numpy. Flash build excludes numpy
+# from tarballs (BASE_IMAGE_PACKAGES) to save tarball space (~30 MB), so numpy
+# must be provided here in the base image.
+RUN python -m pip install --no-cache-dir numpy
+
+# Verify torch and numpy are available from the base image
+RUN python -c "import torch; print(f'torch {torch.__version__} CUDA {torch.cuda.is_available()}')" \
+    && python -c "import numpy; print(f'numpy {numpy.__version__}')"
 
 CMD ["python", "handler.py"]
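
The build-time check added at the top of this Dockerfile compares the interpreter's `major.minor` string with EXPECTED_PYTHON_VERSION and aborts on a mismatch. A minimal Python sketch of the same comparison follows. The function name and the `version_info` parameter are hypothetical and exist only so the check can be exercised without a specific interpreter.

```python
import sys
from typing import Sequence


def check_python_version(expected: str, version_info: Sequence[int] = sys.version_info) -> str:
    """Return the running major.minor string, or exit if it differs from `expected`."""
    actual = f"{version_info[0]}.{version_info[1]}"
    if actual != expected:
        # Same message shape as the Dockerfile's echo; SystemExit gives a non-zero exit code.
        raise SystemExit(
            f"ERROR: Expected Python {expected} but base image provides {actual}"
        )
    return actual


if __name__ == "__main__":
    # Passes trivially: compares the running interpreter against itself.
    print(check_python_version(f"{sys.version_info[0]}.{sys.version_info[1]}"))
```

Comparing only `major.minor` keeps the check stable across patch releases of the base image.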

Dockerfile-cpu

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,5 @@
-FROM python:3.12-slim
+ARG PYTHON_VERSION=3.12
+FROM python:${PYTHON_VERSION}-slim
 
 WORKDIR /app

@@ -27,6 +28,6 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y --no-ins
 COPY README.md pyproject.toml uv.lock ./
 COPY src/ ./
 RUN uv export --format requirements-txt --no-dev --no-hashes > requirements.txt \
-    && uv pip install --system -r requirements.txt
+    && uv pip install --system --break-system-packages -r requirements.txt
 
 CMD ["python", "handler.py"]

Dockerfile-lb

Lines changed: 23 additions & 3 deletions
@@ -1,5 +1,14 @@
-FROM pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime
-# Python 3.12 included in this PyTorch image
+# Base image provides Python 3.12 (from runpod/pytorch:1.0.3-cu1281-torch291-ubuntu2204)
+FROM runpod/pytorch:1.0.3-cu1281-torch291-ubuntu2204
+
+# Use the base image's Python as-is to preserve pre-installed packages (torch, cuda libs).
+# Validate that the base image provides the expected Python version.
+ARG EXPECTED_PYTHON_VERSION=3.12
+RUN python --version && \
+    actual=$(python -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')") && \
+    if [ "$actual" != "$EXPECTED_PYTHON_VERSION" ]; then \
+        echo "ERROR: Expected Python $EXPECTED_PYTHON_VERSION but base image provides $actual" && exit 1; \
+    fi
 
 WORKDIR /app

@@ -30,10 +39,21 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y --no-ins
     && rm -rf /var/lib/apt/lists/*
 
 # Copy app code and install dependencies
+# Use --python to target the base image's Python (preserves torch in its site-packages)
 COPY README.md pyproject.toml uv.lock ./
 COPY src/ ./
 RUN uv export --format requirements-txt --no-dev --no-hashes > requirements.txt \
-    && uv pip install --system -r requirements.txt
+    && uv pip install --python $(which python) --break-system-packages -r requirements.txt
+
+# Install numpy for the base image's Python version.
+# The runpod/pytorch image ships torch but not numpy. Flash build excludes numpy
+# from tarballs (BASE_IMAGE_PACKAGES) to save tarball space (~30 MB), so numpy
+# must be provided here in the base image.
+RUN python -m pip install --no-cache-dir numpy
+
+# Verify torch and numpy are available from the base image
+RUN python -c "import torch; print(f'torch {torch.__version__} CUDA {torch.cuda.is_available()}')" \
+    && python -c "import numpy; print(f'numpy {numpy.__version__}')"
 
 EXPOSE 80
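
The commit message says dependency_installer now targets the running interpreter (`uv pip --python sys.executable`, with a `python -m pip` fallback) so that runtime installs land in the same site-packages as torch. dependency_installer.py is not shown in this diff, so the following is only a sketch of how such a command could be assembled. The function name, its parameters, and the availability check via `shutil.which` are assumptions.

```python
import shutil
import sys
from typing import List, Optional, Sequence


def build_install_command(
    packages: Sequence[str],
    python: str = sys.executable,
    uv_available: Optional[bool] = None,
) -> List[str]:
    """Build an install command that targets a specific interpreter (hypothetical helper)."""
    if uv_available is None:
        # Assumption: uv is considered usable if it is found on PATH.
        uv_available = shutil.which("uv") is not None
    if uv_available:
        # --python pins uv to the given interpreter instead of a system default.
        return ["uv", "pip", "install", "--python", python, *packages]
    # Fallback: invoking pip as a module ties it to the same interpreter.
    return [python, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print(build_install_command(["numpy"]))
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. Whether extra flags such as `--break-system-packages` are needed depends on the image and is not covered here.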

Dockerfile-lb-cpu

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
1-
FROM python:3.12-slim
1+
ARG PYTHON_VERSION=3.12
2+
FROM python:${PYTHON_VERSION}-slim
23

34
WORKDIR /app
45

@@ -27,7 +28,7 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y --no-ins
2728
COPY README.md pyproject.toml uv.lock ./
2829
COPY src/ ./
2930
RUN uv export --format requirements-txt --no-dev --no-hashes > requirements.txt \
30-
&& uv pip install --system -r requirements.txt
31+
&& uv pip install --system --break-system-packages -r requirements.txt
3132

3233
EXPOSE 80
3334
