[Enhancement] Multi-Image Support for Runner Environment Variants #4

@manascb1344

Summary

Currently, the runner uses a single, monolithic image built from ubuntu:22.04 with all tools baked in. This issue proposes multi-image support: the ability to define multiple runner image variants and select one per job based on workflow job labels, similar to how GitHub-hosted runners offer ubuntu-latest, windows-latest, etc.

Motivation

  • Different workflows need different tools: A Python ML workflow needs CUDA + PyTorch, a Node.js workflow needs npm/yarn, a Go workflow needs the Go toolchain
  • Image size vs. startup time trade-off: A one-size-fits-all image becomes bloated. Smaller, specialized images cold-start faster
  • GitHub parity: GitHub offers ubuntu-latest, ubuntu-22.04, windows-latest. Users expect label-based image selection
  • GPU workflows: CPU-only jobs shouldn't pay the cost of GPU driver libraries in the image

Proposed Approaches

Option A: Label-Based Image Selection (Recommended)

Workflows specify the desired image via runs-on labels:

jobs:
  python-job:
    runs-on: [self-hosted, modal, image:python]
  node-job:
    runs-on: [self-hosted, modal, image:node]
  ml-job:
    runs-on: [self-hosted, modal, image:ml, gpu:a100]

The webhook handler parses image:<name> from labels and selects the matching image definition.
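
As a rough illustration of the parsing step, the sketch below collects `key:value` labels into a dictionary. The helper name `parse_label_options`, the choice to skip plain labels such as `self-hosted`, and the first-occurrence-wins rule are all assumptions, not part of the existing runner code.

```python
def parse_label_options(labels: list[str]) -> dict[str, str]:
    """Collect `key:value` labels (e.g. `image:python`, `gpu:a100`) into a dict.

    Hypothetical helper: labels without a colon are skipped, and the first
    occurrence of a key wins if the key is repeated.
    """
    options: dict[str, str] = {}
    for label in labels:
        # partition splits on the first colon only, so values may contain colons
        key, sep, value = label.partition(":")
        if sep and key and value:
            options.setdefault(key, value)
    return options


labels = ["self-hosted", "modal", "image:ml", "gpu:a100"]
print(parse_label_options(labels))
```

Splitting on the first colon only keeps values that themselves contain a colon intact, which matters if image references with tags are ever allowed as label values.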

Option B: Environment Variable Registry

Define available images via environment variables:

modal secret create github-secret \
  GITHUB_TOKEN=ghp_xxx \
  RUNNER_IMAGE_DEFAULT=ubuntu-22.04 \
  RUNNER_IMAGE_PYTHON=python-3.11 \
  RUNNER_IMAGE_ML=pytorch-2.3-cuda-12.1
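
Values stored in a Modal secret are exposed to the function as environment variables, so the registry could be rebuilt at runtime by scanning for the `RUNNER_IMAGE_` prefix. The sketch below is one possible way to do that. The function name `load_image_registry` and the lowercasing of names (so that `RUNNER_IMAGE_PYTHON` matches an `image:python` label) are assumptions.

```python
import os


def load_image_registry(environ=None, prefix="RUNNER_IMAGE_"):
    """Map `RUNNER_IMAGE_<NAME>` variables to `{name: image_reference}`.

    Assumption: names are lowercased to match `image:<name>` labels.
    Variables with an empty name or empty value are ignored.
    """
    environ = os.environ if environ is None else environ
    registry = {}
    for key, value in environ.items():
        if key.startswith(prefix) and value:
            name = key[len(prefix):].lower()
            if name:
                registry[name] = value
    return registry


example_env = {
    "GITHUB_TOKEN": "ghp_xxx",
    "RUNNER_IMAGE_DEFAULT": "ubuntu-22.04",
    "RUNNER_IMAGE_PYTHON": "python-3.11",
    "RUNNER_IMAGE_ML": "pytorch-2.3-cuda-12.1",
}
print(load_image_registry(example_env))
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.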

Option C: External Registry Pull

Support pulling arbitrary images from Docker Hub / ECR / GCR at sandbox spawn time:

runs-on: [self-hosted, modal, image:docker-hub/catthehacker/ubuntu:runner-22.04]
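
Because this option lets any workflow author choose the image, the handler would probably want to validate the reference before pulling it. The following is a minimal sketch of that check, assuming a label shape of `image:<alias>/<repository>:<tag>` and an allowlist of registry aliases. Both the `ALLOWED_REGISTRY_ALIASES` set and the `parse_external_image` helper are hypothetical.

```python
from typing import Optional

# Hypothetical allowlist: registry aliases the runner is willing to pull from.
ALLOWED_REGISTRY_ALIASES = {"docker-hub"}


def parse_external_image(label: str) -> Optional[str]:
    """Return `<repository>:<tag>` for an allowed external image label.

    Returns None when the label is not an `image:` label, has no registry
    alias, or uses an alias that is not on the allowlist.
    """
    if not label.startswith("image:"):
        return None
    reference = label[len("image:"):]
    alias, sep, remainder = reference.partition("/")
    if not sep or alias not in ALLOWED_REGISTRY_ALIASES or not remainder:
        return None
    return remainder


print(parse_external_image("image:docker-hub/catthehacker/ubuntu:runner-22.04"))
```

The returned reference could then be handed to `modal.Image.from_registry`, which the image definitions in this issue already use. Named variants such as `image:python` yield None here and would fall through to the built-in registry.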

Implementation Sketch

1. Define Image Registry in Code

# runner/services/sandbox_service.py

RUNNER_IMAGES = {
    "default": build_default_image(),      # Current ubuntu-based image
    "python": build_python_image(),        # + Python 3.10/3.11/3.12, pip, poetry
    "node": build_node_image(),            # + Node 18/20, npm, yarn, pnpm
    "go": build_go_image(),                # + Go 1.22, golangci-lint
    "rust": build_rust_image(),            # + Rust, cargo, clippy
    "ml": build_ml_image(),                # + PyTorch, CUDA, transformers
    "docker": build_docker_image(),        # Heavy Docker-in-Docker support
    "minimal": build_minimal_image(),      # Just runner binary, no extras
}

2. Image Resolution in Job Service

# runner/services/job_service.py

def resolve_image(job_labels: list[str]) -> modal.Image:
    for label in job_labels:
        if label.startswith("image:"):
            image_name = label.split(":", 1)[1]
            return RUNNER_IMAGES.get(image_name, RUNNER_IMAGES["default"])
    return RUNNER_IMAGES["default"]
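
Note that `resolve_image` returns the default image without any signal when a label names an unknown variant, which could hide a typo such as `image:pyhton`. One possible refinement, sketched below, resolves names only and logs a warning on fallback. The function `resolve_image_name` and its parameters are assumptions. Working on names rather than `modal.Image` objects also lets the logic be tested without building any images.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_image_name(job_labels, known_names, default="default"):
    """Return the image name requested by the first `image:` label.

    Hypothetical variant of the resolution step: falls back to `default`
    when no `image:` label is present or the requested name is unknown,
    and logs a warning in the unknown case.
    """
    for label in job_labels:
        if label.startswith("image:"):
            requested = label.split(":", 1)[1]
            if requested in known_names:
                return requested
            logger.warning(
                "Unknown runner image %r; falling back to %r", requested, default
            )
            return default
    return default


known = {"default", "python", "go"}
print(resolve_image_name(["self-hosted", "modal", "image:go"], known))
print(resolve_image_name(["self-hosted", "modal", "image:pyhton"], known))
```

Whether an unknown name should fall back or fail the job outright is a design decision this issue leaves open. Failing fast would surface typos sooner at the cost of a stuck job.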

3. Spawn Sandbox with Selected Image

# In spawn_sandbox, accept image parameter
async def spawn_sandbox(
    app: modal.App,
    jit_config: str,
    job_id: int | str,
    image: modal.Image,  # New parameter
    gpu_config=None,
) -> modal.Sandbox:
    sandbox_kwargs = dict(
        image=image,  # Use selected image instead of global runner_image
        # ... remaining sandbox arguments unchanged
    )
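
To connect the webhook to `resolve_image` and `spawn_sandbox`, the handler needs the job's labels. GitHub's `workflow_job` webhook payload carries them as a list of strings under `workflow_job.labels`. A defensive extraction might look like the sketch below. The helper name `extract_job_labels` is an assumption, and the payload shown is trimmed to the fields used here.

```python
def extract_job_labels(payload: dict) -> list[str]:
    """Return the `runs-on` labels from a `workflow_job` webhook payload.

    Hypothetical helper: missing or malformed fields yield an empty list,
    which downstream resolution treats as "use the default image".
    """
    job = payload.get("workflow_job")
    if not isinstance(job, dict):
        return []
    labels = job.get("labels")
    if not isinstance(labels, list):
        return []
    return [label for label in labels if isinstance(label, str)]


payload = {
    "action": "queued",
    "workflow_job": {
        "id": 1,
        "labels": ["self-hosted", "modal", "image:python"],
    },
}
print(extract_job_labels(payload))
```

The resulting list can be passed to `resolve_image`, and the returned image to the new `image` parameter of `spawn_sandbox`.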

Image Definitions (Examples)

Minimal Image

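def build_minimal_image() -> modal.Image:
    # RUNNER_VERSION is assumed to be defined elsewhere in the module.
    return (
        modal.Image.from_registry("ubuntu:22.04")
        .apt_install("curl", "ca-certificates")
        .run_commands(
            # tar -C fails if the target directory does not exist yet
            "mkdir -p /actions-runner",
            f"curl -L https://github.com/actions/runner/releases/download/v{RUNNER_VERSION}/... | tar -xz -C /actions-runner",
        )
    )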

Python Image

def build_python_image() -> modal.Image:
    return (
        build_minimal_image()
        .apt_install("python3", "python3-pip", "python3-venv")
        .pip_install("poetry", "pipenv")
    )

ML Image

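def build_ml_image() -> modal.Image:
    return (
        build_python_image()
        .pip_install("torch", "transformers", "datasets", "accelerate")
        # NOTE: these package names are placeholders. They are likely not
        # installable from the stock ubuntu:22.04 apt sources, and the PyTorch
        # wheels on PyPI typically bundle the CUDA runtime, so this step may
        # turn out to be unnecessary.
        .apt_install("libcuda1", "cuda-toolkit")
    )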

Trade-offs to Consider

Aspect            | Single Image (Current) | Multi-Image
------------------|------------------------|-----------------------
Cold start        | Consistent             | Varies by image size
Cache efficiency  | One cache entry        | Multiple cache entries
Maintenance       | One image to update    | N images to update
Disk usage        | One image stored       | N images stored
User flexibility  | Low                    | High
Complexity        | Low                    | Medium

Next Steps

  • Decide on image registry approach (hardcoded, env vars, or both)
  • Define initial set of image variants
  • Implement label parsing in job_service.py
  • Add image selection to spawn_sandbox()
  • Update documentation with available images and labels
  • Add tests for image resolution logic

Labels: enhancement, feature-request, architecture
Priority: P3 (nice-to-have, future roadmap)
