This document describes the internal architecture of github-runner, including its component structure, data flow, concurrency model, and shutdown behavior.
github-runner is a single-binary process that manages one or more runner
pools. Each pool maps to a [[runners]] entry in the config file and
operates independently with its own executor type, concurrency level, and
GitHub API credentials.
flowchart TB
CLI["CLI (cobra)"] --> Manager["Runner Manager (1 per process)"]
subgraph Pools["Runner pools (one per [[runners]] entry)"]
W1["Worker 1 (job)"]
W2["Worker 2 (idle)"]
WN["Worker N (job)"]
end
Manager --> Pools
subgraph Layers["Execution + services"]
Executor["Executor layer<br/>shell / docker / kubernetes / firecracker"]
Cache["Cache layer<br/>local / s3 / gcs"]
Artifact["Artifact manager"]
GitHubAPI["GitHub API client"]
Metrics["Metrics"]
Health["Health"]
Logger["Logger"]
end
Pools --> Executor
Pools --> Cache
Pools --> Artifact
Manager --> GitHubAPI
Manager --> Metrics
Manager --> Health
Manager --> Logger
graph TD
CLI["CLI (cobra)"] --> Manager["Runner Manager"]
Manager --> Pool1["Runner Pool: docker-fast"]
Manager --> Pool2["Runner Pool: shell-local"]
Manager --> MetricsSrv["Metrics Server :9252"]
Manager --> HealthSrv["Health Server :8484"]
Manager --> SignalHandler["Signal Handler"]
Pool1 --> Poller1["Poller"]
Pool1 --> Worker1A["Worker 1"]
Pool1 --> Worker1B["Worker 2"]
Pool1 --> Worker1N["Worker N"]
Poller1 --> GitHubAPI["GitHub API Client"]
Worker1A --> Executor["Executor (docker)"]
Worker1A --> Heartbeat["Heartbeat Reporter"]
Worker1A --> SecretMasker["Secret Masker"]
Worker1A --> Hooks["Hook Chain"]
Executor --> Docker["Docker Engine API"]
Pool2 --> Poller2["Poller"]
Pool2 --> Worker2A["Worker 1"]
Worker2A --> ShellExec["Executor (shell)"]
GitHubAPI --> RateLimit["Rate Limiter"]
GitHubAPI --> Retry["Retry + Backoff"]
Manager --> ConfigWatcher["Config Watcher (fsnotify)"]
The top-level orchestrator. One instance per process. Responsibilities:
- Creates and supervises runner pools based on config
- Starts metrics and health HTTP servers
- Handles OS signals (SIGTERM/SIGINT for shutdown, SIGHUP for reload)
- Coordinates graceful shutdown with timeout enforcement
One pool per [[runners]] config entry. Each pool:
- Runs a Poller goroutine that queries the GitHub API for available jobs
- Maintains a buffered channel of jobs (buffer size = concurrency)
- Spawns Worker goroutines that pull from the channel
- Tracks active job count via atomic.Int64
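
The pool mechanics above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Pool`, `Job`, `NewPool`, `Submit`, `Close`, `ActiveJobs`, and `runAll` names are hypothetical, and the Poller is replaced by a plain loop that submits jobs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Job is a hypothetical stand-in for a claimed GitHub job.
type Job struct{ ID int }

// Pool is an illustrative runner pool, not the project's actual type.
type Pool struct {
	jobs   chan Job     // buffered; capacity equals the pool's concurrency
	active atomic.Int64 // number of jobs currently executing
	wg     sync.WaitGroup
}

// NewPool starts `concurrency` workers that pull jobs from the channel.
func NewPool(concurrency int, handle func(Job)) *Pool {
	p := &Pool{jobs: make(chan Job, concurrency)}
	for i := 0; i < concurrency; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for job := range p.jobs {
				p.active.Add(1)
				handle(job)
				p.active.Add(-1)
			}
		}()
	}
	return p
}

// Submit blocks once the buffer is full and every worker is busy.
func (p *Pool) Submit(j Job) { p.jobs <- j }

// Close stops accepting jobs and waits for in-flight jobs to finish.
func (p *Pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

// ActiveJobs is a lock-free read, suitable for a metrics gauge.
func (p *Pool) ActiveJobs() int64 { return p.active.Load() }

// runAll pushes n jobs through a pool and reports how many were handled
// and how many are still marked active after the pool drains.
func runAll(concurrency, n int) (handled, activeAfter int64) {
	var count atomic.Int64
	p := NewPool(concurrency, func(Job) { count.Add(1) })
	for i := 0; i < n; i++ {
		p.Submit(Job{ID: i})
	}
	p.Close()
	return count.Load(), p.ActiveJobs()
}

func main() {
	handled, active := runAll(3, 10)
	fmt.Println(handled, active)
}
```

Sizing the buffer to the concurrency level means a blocked `Submit` doubles as backpressure: the producer stalls instead of claiming jobs that no worker can take.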
A worker executes a single job through its lifecycle:
- Registers secrets for log masking
- Starts heartbeat reporter in a background goroutine
- Transitions through the lifecycle state machine
- Runs pre-job hooks
- Calls Executor.Prepare() to set up the environment
- Executes steps sequentially via Executor.Run()
- Runs post-job hooks
- Reports final status to GitHub
- Calls Executor.Cleanup() (always, even on failure)
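
The "always clean up" guarantee is the part of this lifecycle that is easiest to get wrong, so a small sketch may help. The method names `Prepare`, `Run`, and `Cleanup` come from the list above, but their signatures here are assumptions, and `runJob`, `recorder`, and `traceJob` are hypothetical helpers. Hooks, heartbeats, secret masking, and status reporting are left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Executor mirrors the three calls named above; the signatures are assumed.
type Executor interface {
	Prepare(ctx context.Context) error
	Run(ctx context.Context, step string) error
	Cleanup(ctx context.Context) error
}

// runJob prepares the environment, runs steps in order, and always cleans up.
func runJob(ctx context.Context, ex Executor, steps []string) (err error) {
	defer func() {
		// Cleanup runs on every path. A fresh context is used so that a
		// cancelled job can still release its resources.
		if cerr := ex.Cleanup(context.Background()); cerr != nil && err == nil {
			err = fmt.Errorf("cleanup: %w", cerr)
		}
	}()
	if err = ex.Prepare(ctx); err != nil {
		return fmt.Errorf("prepare: %w", err)
	}
	for _, step := range steps {
		if err = ex.Run(ctx, step); err != nil {
			return fmt.Errorf("step %q: %w", step, err)
		}
	}
	return nil
}

// recorder is a fake executor that logs calls and can fail one step.
type recorder struct {
	calls  []string
	failOn string
}

func (r *recorder) Prepare(context.Context) error {
	r.calls = append(r.calls, "prepare")
	return nil
}

func (r *recorder) Run(_ context.Context, step string) error {
	r.calls = append(r.calls, "run:"+step)
	if step == r.failOn {
		return errors.New("step failed")
	}
	return nil
}

func (r *recorder) Cleanup(context.Context) error {
	r.calls = append(r.calls, "cleanup")
	return nil
}

// traceJob runs steps against a recorder and returns the call order.
func traceJob(steps []string, failOn string) string {
	r := &recorder{failOn: failOn}
	_ = runJob(context.Background(), r, steps)
	return strings.Join(r.calls, ",")
}

func main() {
	fmt.Println(traceJob([]string{"build", "test"}, ""))
	fmt.Println(traceJob([]string{"build", "test"}, "build"))
}
```

In the second call the `test` step never runs because `build` fails, yet `cleanup` still appears at the end of the trace.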
Polls the GitHub API at a configured interval. Features:
- Exponential backoff on consecutive errors (up to 5 minutes)
- Automatic interval reset after successful poll
- Blocks on the job channel when all workers are busy
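
A minimal sketch of the backoff behavior, under stated assumptions: the delay doubles on each consecutive error (the source says only "exponential", so the factor is a guess), the 10-second base interval is made up, and `pollDelay`, `newPollDelay`, `Failure`, and `Success` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseInterval = 10 * time.Second // assumed poll interval, for illustration
	maxBackoff   = 5 * time.Minute  // upper bound stated in the list above
)

// pollDelay tracks how long the poller waits before its next request.
type pollDelay struct {
	base, current time.Duration
}

func newPollDelay(base time.Duration) *pollDelay {
	return &pollDelay{base: base, current: base}
}

// Failure doubles the delay after an error, capped at maxBackoff.
func (d *pollDelay) Failure() time.Duration {
	d.current *= 2
	if d.current > maxBackoff {
		d.current = maxBackoff
	}
	return d.current
}

// Success resets the delay to the configured interval.
func (d *pollDelay) Success() time.Duration {
	d.current = d.base
	return d.current
}

func main() {
	d := newPollDelay(baseInterval)
	for i := 0; i < 6; i++ {
		fmt.Println("after error:", d.Failure())
	}
	fmt.Println("after success:", d.Success())
}
```

Because the delay is clamped after every doubling, it cannot overflow no matter how long an outage lasts.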
A state machine that enforces valid job state transitions:
stateDiagram-v2
[*] --> Queued
Queued --> Claimed
Claimed --> Preparing
Preparing --> Running
Running --> PostExec
PostExec --> Completed
PostExec --> Failed
Completed --> Cleanup
Failed --> Cleanup
Cleanup --> [*]
Queued --> Cancelled
Claimed --> Cancelled
Claimed --> Failed
Preparing --> Failed
Preparing --> Cancelled
Running --> Failed
Running --> Cancelled
PostExec --> Cancelled
Cancelled --> Cleanup
Every transition is validated, logged, and reported to GitHub.
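
One way to enforce the diagram is a transition table keyed by the current state. The sketch below encodes exactly the edges shown above; the `State` string values and the `Lifecycle`, `NewLifecycle`, and `Transition` names are assumptions, and the logging and GitHub reporting that accompany each transition are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a job lifecycle state; the string values are illustrative.
type State string

const (
	Queued    State = "queued"
	Claimed   State = "claimed"
	Preparing State = "preparing"
	Running   State = "running"
	PostExec  State = "post_exec"
	Completed State = "completed"
	Failed    State = "failed"
	Cancelled State = "cancelled"
	Cleanup   State = "cleanup"
)

// allowed lists the outgoing edges of the state diagram.
// Cleanup has no entry because it is terminal.
var allowed = map[State][]State{
	Queued:    {Claimed, Cancelled},
	Claimed:   {Preparing, Failed, Cancelled},
	Preparing: {Running, Failed, Cancelled},
	Running:   {PostExec, Failed, Cancelled},
	PostExec:  {Completed, Failed, Cancelled},
	Completed: {Cleanup},
	Failed:    {Cleanup},
	Cancelled: {Cleanup},
}

// Lifecycle guards one job's state with an RWMutex.
type Lifecycle struct {
	mu    sync.RWMutex
	state State
}

func NewLifecycle() *Lifecycle { return &Lifecycle{state: Queued} }

// State returns the current state.
func (l *Lifecycle) State() State {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.state
}

// Transition moves to the target state or rejects the move, leaving
// the current state unchanged.
func (l *Lifecycle) Transition(to State) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	for _, next := range allowed[l.state] {
		if next == to {
			l.state = to
			return nil
		}
	}
	return fmt.Errorf("invalid transition %s -> %s", l.state, to)
}

func main() {
	l := NewLifecycle()
	fmt.Println(l.Transition(Claimed))   // valid edge
	fmt.Println(l.Transition(Completed)) // rejected: Claimed cannot jump to Completed
	fmt.Println(l.State())
}
```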
main goroutine
└── Manager.Start()
├── signal handler (1 goroutine)
├── metrics server (1 goroutine)
├── health server (1 goroutine)
│
├── Pool "docker-fast" (1 goroutine)
│ ├── Poller (1 goroutine)
│ ├── Worker 0 (1 goroutine per active job)
│ │ ├── heartbeat (1 goroutine)
│ │ └── executor I/O (managed by executor)
│ ├── Worker 1
│ └── ...Worker N
│
└── Pool "shell-local" (1 goroutine)
├── Poller (1 goroutine)
└── Worker 0..M
| Resource | Mechanism | Notes |
|---|---|---|
| Job dispatch channel | Buffered channel | Size = pool concurrency |
| Active job count | atomic.Int64 | Lock-free reads for metrics |
| Config state | sync.RWMutex | Writer: config watcher. Readers: pools. |
| Lifecycle state | sync.RWMutex | Per-job, no sharing between workers |
| Shutdown coordination | context.Context + sync.WaitGroup | Cancel propagates to all goroutines |
| Secret masker patterns | sync.RWMutex | Writers: AddSecret. Readers: Write/MaskString. |
| Cache index (local) | sync.RWMutex + file lock | Mutex for in-process safety, flock for multi-process |
| Metrics counters | Prometheus client internals | Atomic internally, no additional sync |
| Rate limit state | sync.RWMutex | Updated from response headers |
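
To illustrate the reader/writer split in the table, here is a minimal sketch of the secret masker row. `AddSecret` and `MaskString` are the names used in the table; the `Masker` type, the method signatures, and the `***` replacement token are assumptions, and the `Write` path is not shown.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Masker replaces registered secrets in log output. Illustrative only.
type Masker struct {
	mu      sync.RWMutex
	secrets []string
}

// AddSecret is the writer path: it takes the exclusive lock.
func (m *Masker) AddSecret(s string) {
	if s == "" {
		return // an empty pattern would match at every position
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.secrets = append(m.secrets, s)
}

// MaskString is the reader path: many log lines can be masked
// concurrently under the shared lock.
func (m *Masker) MaskString(in string) string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	for _, s := range m.secrets {
		in = strings.ReplaceAll(in, s, "***")
	}
	return in
}

func main() {
	m := &Masker{}
	m.AddSecret("hunter2")
	fmt.Println(m.MaskString("login with password=hunter2"))
}
```

This naive version masks secrets in registration order, so if one secret is a substring of another, part of the longer one may remain visible. A production masker would need to handle that case.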
- Workers never share mutable state. Each worker gets its own masker, executor instance, and lifecycle tracker.
- All inter-goroutine communication uses channels or context cancellation.
- Every goroutine respects ctx.Done() for clean shutdown.
- defer is used for all cleanup to ensure resources are released on every code path.
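
As an example of these rules in practice, a background goroutine such as the heartbeat reporter might be structured like the sketch below. `startHeartbeat` and `countBeats` are hypothetical names, and the `beat` callback stands in for whatever call reports liveness to GitHub.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// startHeartbeat calls beat on every tick until the context is cancelled
// or the returned stop function is called. stop waits for the goroutine
// to exit, so no beat can fire after it returns.
func startHeartbeat(ctx context.Context, interval time.Duration, beat func()) (stop func()) {
	ctx, cancel := context.WithCancel(ctx)
	done := make(chan struct{})
	go func() {
		defer close(done) // signal exit on every code path
		ticker := time.NewTicker(interval)
		defer ticker.Stop() // release the ticker on every code path
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				beat()
			}
		}
	}()
	return func() {
		cancel()
		<-done
	}
}

// countBeats runs a heartbeat for runMs milliseconds and reports how many
// beats fired and whether the count stayed fixed after stop returned.
func countBeats(intervalMs, runMs int) (beats int64, stable bool) {
	var n atomic.Int64
	interval := time.Duration(intervalMs) * time.Millisecond
	stop := startHeartbeat(context.Background(), interval, func() { n.Add(1) })
	time.Sleep(time.Duration(runMs) * time.Millisecond)
	stop()
	beats = n.Load()
	time.Sleep(5 * interval)
	return beats, n.Load() == beats
}

func main() {
	beats, stable := countBeats(5, 100)
	fmt.Println(beats > 0, stable)
}
```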
1. SIGTERM or SIGINT received
2. Root context cancelled → propagates to all pools and workers
3. Pollers stop accepting new jobs immediately
4. Health server reports not-ready (/readyz returns 503)
5. In-flight workers:
a. Current step completes (bounded by shutdown_timeout)
b. Executor.Cleanup() called
c. Job status reported to GitHub as cancelled
6. WaitGroup.Wait() blocks until all workers finish
7. Metrics and health servers shut down
8. Process exits with code 0
If shutdown_timeout expires:
- Warning logged with list of still-running jobs
- Process exits with code 1
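
The wait-with-timeout portion of this sequence (steps 2, 6, and the timeout branch) can be sketched as follows. The `shutdown` and `simulate` functions are hypothetical; signal handling, the health and metrics servers, status reporting, and the final `os.Exit` call are left out.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// shutdown cancels the root context, then waits for workers up to timeout.
// It returns the exit code described above: 0 on a clean drain, 1 on timeout.
func shutdown(cancel context.CancelFunc, wg *sync.WaitGroup, timeout time.Duration) int {
	cancel()

	// WaitGroup.Wait cannot be used in a select, so bridge it to a channel.
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return 0
	case <-timer.C:
		return 1
	}
}

// simulate starts one worker that needs finishMs milliseconds to wrap up
// its current step after cancellation, then runs shutdown with timeoutMs.
func simulate(finishMs, timeoutMs int) int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done()
		time.Sleep(time.Duration(finishMs) * time.Millisecond)
	}()
	return shutdown(cancel, &wg, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	fmt.Println("fast worker exit code:", simulate(10, 500))
	fmt.Println("slow worker exit code:", simulate(500, 20))
}
```

In the timeout case the waiting goroutine is simply abandoned, which is acceptable here because the process is about to exit.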
sequenceDiagram
participant GH as GitHub API
participant R as Runner
participant E as Executor
R->>GH: Poll jobs (GET /jobs)
GH-->>R: Job payload
R->>E: Prepare()
R->>GH: Status in_progress
loop For each workflow step
R->>E: Run(step)
E-->>R: StepResult
R->>GH: Step status + heartbeat
end
R->>GH: Status completed
R->>E: Cleanup()
cmd/github-runner
└── internal/cli
├── internal/config
├── internal/runner
│ ├── internal/executor
│ ├── internal/github
│ ├── internal/hook
│ └── internal/secret
├── internal/metrics
├── internal/health
├── internal/log
└── internal/version
internal/executor
├── internal/executor/shell
├── internal/executor/docker
├── internal/executor/kubernetes
└── internal/executor/firecracker
internal/cache (standalone)
internal/artifact (standalone)
internal/job (depends on executor, api)
pkg/api (no internal dependencies)
No circular dependencies exist. pkg/api is the leaf package that all others
import. Internal packages import downward only.