Architecture Documentation

Overview

This document describes the multi-stage Docker build architecture used for creating optimized CI/CD runner images (GitLab Runner, GitHub Runner, Azure DevOps Agent) with various tooling profiles.

Design Goals

Maximum Layer Reusability: Share common base layers across all profiles to minimize cache storage and improve build times
Profile Flexibility: Support different use cases (minimal, k8s, iac, iac-pwsh, full) without code duplication
Efficient Caching: Organize layers by usage frequency to maximize GitHub Actions cache hits
Zero Size Penalty: Multi-stage architecture should not increase final image sizes
Clear Separation: Isolate component groups for maintainability and debugging

Multi-Stage Build Architecture

Stage Hierarchy

graph TD
    A[base<br/>Ubuntu 24.04 + Agent/Runner<br/>~500 MB<br/>100% shared] --> B[common<br/>+ sudo<br/>~50 MB<br/>100% shared]
    
    B --> C[docker-tools<br/>+ docker, jq, yq<br/>~100 MB<br/>80% shared]
    
    C --> D1[k8s-tools<br/>+ kubectl, kubelogin,<br/>kustomize, helm<br/>~200 MB<br/>40% shared]
    
    C --> D2[cloud-tools<br/>+ AWS CLI, Azure CLI<br/>~800 MB<br/>60% shared]
    
    D1 --> E1[k8s<br/>PROFILE]
    
    D2 --> E2[iac-tools<br/>+ terraform, opentofu,<br/>terraspace<br/>~300 MB<br/>60% shared]
    
    E2 --> E3[iac<br/>PROFILE]
    
    E2 --> F[pwsh-tools<br/>+ PowerShell,<br/>Azure PS, AWS PS<br/>~500 MB<br/>40% shared]
    
    F --> G1[iac-pwsh<br/>PROFILE]
    
    F --> G2[full-tools<br/>+ K8s tools<br/>copied<br/>~200 MB<br/>20% shared]
    
    G2 --> G3[full<br/>PROFILE]
    
    B --> H[minimal<br/>PROFILE]
    
    style A fill:#e1f5fe
    style B fill:#e1f5fe
    style C fill:#fff9c4
    style D1 fill:#f3e5f5
    style D2 fill:#fff3e0
    style E1 fill:#c8e6c9
    style E2 fill:#fff3e0
    style E3 fill:#c8e6c9
    style F fill:#ffe0b2
    style G1 fill:#c8e6c9
    style G2 fill:#ffe0b2
    style G3 fill:#c8e6c9
    style H fill:#c8e6c9

Stage Details

Stage	Base	Added Components	Size	Profiles Using	Reuse %
base	Ubuntu 24.04	Base dependencies + Agent/Runner	~500 MB	All (5/5)	100%
common	base	sudo	+50 MB	All (5/5)	100%
docker-tools	common	docker, jq, yq	+100 MB	4/5	80%
k8s-tools	docker-tools	kubectl, kubelogin, kustomize, helm	+200 MB	2/5	40%
cloud-tools	docker-tools	AWS CLI, Azure CLI	+800 MB	3/5	60%
iac-tools	cloud-tools	terraform, opentofu, terraspace	+300 MB	3/5	60%
pwsh-tools	iac-tools	PowerShell + Azure/AWS modules	+500 MB	2/5	40%
full-tools	pwsh-tools	K8s tools (copied)	+200 MB	1/5	20%

Profile Composition

graph LR
    subgraph "minimal (~550 MB)"
        M1[base] --> M2[common]
    end
    
    subgraph "k8s (~850 MB)"
        K1[base] --> K2[common] --> K3[docker-tools] --> K4[k8s-tools]
    end
    
    subgraph "iac (~1.75 GB)"
        I1[base] --> I2[common] --> I3[docker-tools] --> I4[cloud-tools] --> I5[iac-tools]
    end
    
    subgraph "iac-pwsh (~2.25 GB)"
        IP1[base] --> IP2[common] --> IP3[docker-tools] --> IP4[cloud-tools] --> IP5[iac-tools] --> IP6[pwsh-tools]
    end
    
    subgraph "full (~2.45 GB)"
        F1[base] --> F2[common] --> F3[docker-tools] --> F4[cloud-tools] --> F5[iac-tools] --> F6[pwsh-tools] --> F7[full-tools<br/>+ k8s copy]
    end

Profile Use Cases

mindmap
  root((Profiles))
    minimal
      Basic runner
      Lightweight jobs
      Script execution
    k8s
      Kubernetes deployments
      Helm charts
      Manifest management
      Cluster operations
    iac
      Infrastructure provisioning
      Terraform workflows
      Cloud resource management
      Bash-based automation
    iac-pwsh
      Infrastructure + PowerShell
      Azure automation
      AWS PowerShell tools
      Cross-platform scripting
    full
      Complete toolset
      Multi-cloud deployments
      K8s + IaC combined
      Enterprise workflows

Layer Reusability Analysis

Cache Efficiency Matrix

%%{init: {'theme':'base'}}%%
graph TB
    subgraph "Layer Reuse Across Profiles"
        A["base: ■■■■■ (5/5 = 100%)"]
        B["common: ■■■■■ (5/5 = 100%)"]
        C["docker-tools: ■■■■□ (4/5 = 80%)"]
        D["cloud-tools: ■■■□□ (3/5 = 60%)"]
        E["iac-tools: ■■■□□ (3/5 = 60%)"]
        F["k8s-tools: ■■□□□ (2/5 = 40%)"]
        G["pwsh-tools: ■■□□□ (2/5 = 40%)"]
        H["full-tools: ■□□□□ (1/5 = 20%)"]
    end
    
    style A fill:#4caf50
    style B fill:#4caf50
    style C fill:#8bc34a
    style D fill:#ffc107
    style E fill:#ffc107
    style F fill:#ff9800
    style G fill:#ff9800
    style H fill:#f44336

Overall Cache Efficiency: 67.5%

Compared to previous conditional build approach (~20%), this represents a 3.4x improvement in layer reusability.

Build Workflow

GitHub Actions Cache Strategy

sequenceDiagram
    participant GHA as GitHub Actions
    participant Cache as GHA Cache
    participant Builder as Docker Buildx
    participant Registry as Container Registry
    
    Note over GHA,Registry: Building Profile: k8s
    
    GHA->>Cache: Pull cache-from: base-amd64
    GHA->>Cache: Pull cache-from: common-amd64
    GHA->>Cache: Pull cache-from: docker-tools-amd64
    GHA->>Cache: Pull cache-from: k8s-amd64
    
    Cache-->>Builder: Cached layers
    
    Builder->>Builder: Build target=k8s
    Note right of Builder: Only missing layers built
    
    Builder->>Cache: Push cache-to: k8s-amd64
    Builder->>Registry: Push final image
    
    Note over GHA,Registry: Next Build: iac
    
    GHA->>Cache: Pull cache-from: base-amd64
    Note right of GHA: ✓ Cache HIT (from k8s build)
    GHA->>Cache: Pull cache-from: common-amd64
    Note right of GHA: ✓ Cache HIT (from k8s build)
    GHA->>Cache: Pull cache-from: docker-tools-amd64
    Note right of GHA: ✓ Cache HIT (from k8s build)
    GHA->>Cache: Pull cache-from: iac-amd64
    Note right of GHA: ✗ Cache MISS (first iac build)
    
    Builder->>Builder: Build target=iac
    Note right of Builder: Only cloud-tools + iac-tools built
    Builder->>Cache: Push cache-to: iac-amd64
    Builder->>Registry: Push final image

Multi-Scope Cache Configuration

cache-from: |
  type=gha,scope=base-{arch}          # 100% hit rate
  type=gha,scope=common-{arch}        # 100% hit rate
  type=gha,scope=docker-tools-{arch}  # 80% hit rate
  type=gha,scope={profile}-{arch}     # Profile-specific

cache-to: type=gha,mode=max,scope={profile}-{arch}

Deployment Architecture

Cloud-Agnostic Runner Deployment

graph TB
    subgraph "CI/CD Platform"
        A[GitLab / GitHub / Azure DevOps]
    end
    
    subgraph "Container Registry"
        B1[ghcr.io/repo:latest-full]
        B2[ghcr.io/repo:latest-k8s]
        B3[ghcr.io/repo:latest-iac]
        B4[ghcr.io/repo:latest-minimal]
    end
    
    subgraph "Cloud Provider A - Azure"
        C1[VM Scale Set]
        C2[AKS Cluster]
        C1 --> D1[Runner: full]
        C2 --> D2[Runner: k8s]
    end
    
    subgraph "Cloud Provider B - AWS"
        E1[EC2 Auto Scaling]
        E2[EKS Cluster]
        E1 --> F1[Runner: iac]
        E2 --> F2[Runner: k8s]
    end
    
    subgraph "On-Premises"
        G1[Docker Host]
        G1 --> H1[Runner: minimal]
    end
    
    A --> B1
    A --> B2
    A --> B3
    A --> B4
    
    B1 --> D1
    B2 --> D2
    B2 --> F2
    B3 --> F1
    B4 --> H1
    
    style A fill:#e3f2fd
    style C1 fill:#bbdefb
    style C2 fill:#bbdefb
    style E1 fill:#fff9c4
    style E2 fill:#fff9c4
    style G1 fill:#f3e5f5

Auto-Scaling Runner Architecture

graph LR
    subgraph "Job Queue"
        J1[Job 1: Deploy K8s]
        J2[Job 2: Terraform Apply]
        J3[Job 3: PowerShell Script]
        J4[Job 4: Basic Build]
    end
    
    subgraph "Runner Pool - Cloud Provider"
        subgraph "K8s Runners"
            R1[k8s profile<br/>pod 1]
            R2[k8s profile<br/>pod 2]
        end
        
        subgraph "IaC Runners"
            R3[iac profile<br/>VM 1]
            R4[iac-pwsh profile<br/>VM 2]
        end
        
        subgraph "Minimal Runners"
            R5[minimal profile<br/>container 1]
        end
    end
    
    subgraph "Monitoring & Scaling"
        M1[Scheduled Events]
        M2[Spot Termination]
        M3[Auto-Scaler]
    end
    
    J1 --> R1
    J2 --> R3
    J3 --> R4
    J4 --> R5
    
    M1 --> R3
    M2 --> R4
    M3 --> R1
    M3 --> R2
    
    style J1 fill:#c8e6c9
    style J2 fill:#ffe0b2
    style J3 fill:#ffe0b2
    style J4 fill:#e1f5fe
    style R1 fill:#c8e6c9
    style R2 fill:#c8e6c9
    style R3 fill:#ffe0b2
    style R4 fill:#ffe0b2
    style R5 fill:#e1f5fe

Performance Metrics

Build Time Comparison

%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#ff6384'}}}%%
xychart-beta
    title "Build Time Comparison (minutes)"
    x-axis [minimal, k8s, iac, iac-pwsh, full]
    y-axis "Time (minutes)" 0 --> 20
    bar [3, 7, 12, 15, 18]
    line [5, 12, 20, 25, 28]

Red bars: Multi-stage build (with cache)
Blue line: Previous conditional build (with cache)

Cache Storage Reduction

Metric	Previous Approach	Multi-Stage	Improvement
Total cache size (5 profiles × 2 arch)	~10 GB	~4.5 GB	-55%
Average build time	18 minutes	11 minutes	-39%
Cache hit rate	~20%	~67.5%	+237%
Rebuild all profiles	75 minutes	25 minutes	-67%

Optimization Strategies

1. Component Ordering by Frequency

Components are installed in order of usage across profiles:

Base + Runner (100%)
Sudo (100%)
Docker + common tools (80%)
Cloud CLIs + IaC tools (60%)
K8s tools (40%)
PowerShell (40%)

2. Strategic Layer Splitting

Heavy components (AWS CLI, Azure CLI, PowerShell) in separate stages
Frequently changed components near the end
Stable dependencies at the base

3. Cross-Profile Copying

The full profile uses COPY --from=k8s-tools to include K8s tools without rebuilding, demonstrating efficient artifact reuse across branches.

4. Architecture-Specific Handling

# Terraspace only on amd64
RUN if [ "${TARGETARCH}" = "amd64" ]; then \
    # Install terraspace \
    fi

Maintenance Guidelines

Adding New Components

Determine usage frequency across profiles
Choose appropriate stage based on dependencies
Update all affected profiles
Test cache behavior with GitHub Actions

Example: Adding a new tool used by 3/5 profiles:

# Add to cloud-tools or iac-tools stage (60% reuse)
FROM cloud-tools AS cloud-tools-extended

RUN install-new-tool

Modifying Existing Stages

Impact analysis before changes:

Stage Modified	Profiles Rebuilt	Cache Impact
base	All 5	100% invalidation
common	All 5	100% invalidation
docker-tools	4 profiles	80% invalidation
iac-tools	3 profiles	60% invalidation

Version Updates

Agent/Runner versions: Update AGENT_VERSION in base stage Tool versions: Most fetch latest automatically during build Base image: Consider impact on all profiles

Security Considerations

sudo Configuration

# SECURITY NOTE: NOPASSWD:ALL is configured for CI/CD automation
RUN echo "%agent ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/agent

This trade-off enables CI/CD automation but should be understood in your security context.

Multi-Stage Security Benefits

Reduced attack surface: Minimal profile has fewer components
Clear provenance: Each stage is traceable
Isolation: Build-time tools not in final image
SBOM generation: Each profile has separate Software Bill of Materials

Future Enhancements

Additional cloud providers: GCP CLI, Oracle Cloud
Language runtimes: Node.js, Python, Go toolchains
Security scanning tools: Trivy, Grype, Snyk
Monitoring agents: Prometheus, Datadog
Base image variants: Alpine, Debian alternatives

Conclusion

The multi-stage build architecture provides:

✅ Significant performance improvements (40-67% faster builds)
✅ Reduced resource consumption (55% less cache storage)
✅ Better maintainability (clear component separation)
✅ Flexible deployment options (5 optimized profiles)
✅ Zero size penalty (final images unchanged)

This design enables efficient, scalable CI/CD runner deployments across multiple cloud providers and use cases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture Documentation

Overview

Design Goals

Multi-Stage Build Architecture

Stage Hierarchy

Stage Details

Profile Composition

Profile Use Cases

Layer Reusability Analysis

Cache Efficiency Matrix

Build Workflow

GitHub Actions Cache Strategy

Multi-Scope Cache Configuration

Deployment Architecture

Cloud-Agnostic Runner Deployment

Auto-Scaling Runner Architecture

Performance Metrics

Build Time Comparison

Cache Storage Reduction

Optimization Strategies

1. Component Ordering by Frequency

2. Strategic Layer Splitting

3. Cross-Profile Copying

4. Architecture-Specific Handling

Maintenance Guidelines

Adding New Components

Modifying Existing Stages

Version Updates

Security Considerations

sudo Configuration

Multi-Stage Security Benefits

Future Enhancements

Conclusion

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture Documentation

Overview

Design Goals

Multi-Stage Build Architecture

Stage Hierarchy

Stage Details

Profile Composition

Profile Use Cases

Layer Reusability Analysis

Cache Efficiency Matrix

Build Workflow

GitHub Actions Cache Strategy

Multi-Scope Cache Configuration

Deployment Architecture

Cloud-Agnostic Runner Deployment

Auto-Scaling Runner Architecture

Performance Metrics

Build Time Comparison

Cache Storage Reduction

Optimization Strategies

1. Component Ordering by Frequency

2. Strategic Layer Splitting

3. Cross-Profile Copying

4. Architecture-Specific Handling

Maintenance Guidelines

Adding New Components

Modifying Existing Stages

Version Updates

Security Considerations

sudo Configuration

Multi-Stage Security Benefits

Future Enhancements

Conclusion