The Admin UI is a Next.js app that talks to the worker admin gateway over
HTTP/JSON with mTLS. The gateway is a thin translation layer that fronts the
gRPC AdminService — all data flows through it. The UI does not access
Redis directly, so the durable backend remains pluggable.
./scripts/gen-admin-certs.sh
docker compose -f compose.admin.yaml up --buildThe compose file spins up Redis, the worker service, the admin gateway, and
the Next.js UI on https://localhost:3000 (with mTLS).
All endpoints are under /admin/v1/. Content type is application/json for
request/response bodies. Errors come back with a
{requestId, error: {code, message}} envelope.
| Method | Path | Purpose |
|---|---|---|
| GET | /health |
Service health + build info |
| GET | /overview |
Stats, coordination state, admin action counters |
| GET | /queues |
List all queues |
| GET | /queues/{name} |
Queue summary |
| POST | /queues/{name}/weight |
Update queue weight |
| DELETE | /queues/{name}/weight |
Reset queue weight to default |
| POST | /queues/{name}/pause |
Pause/resume one queue (approval) |
| GET | /schedules |
List schedules |
| POST | /schedules |
Create schedule |
| DELETE | /schedules/{name} |
Delete schedule |
| POST | /schedules/{name}/pause |
Pause/resume schedule (approval) |
| POST | /schedules/{name}/run |
Trigger immediate run (approval) |
| POST | /schedules/pause |
Pause/resume all schedules (approval) |
| GET | /schedules/events |
Schedule execution history |
| GET | /schedules/factories |
Registered schedule factories |
| GET | /jobs |
List jobs |
| POST | /jobs |
Upsert job |
| GET | /jobs/{name} |
Job definition |
| DELETE | /jobs/{name} |
Delete job |
| POST | /jobs/{name}/run |
Trigger one-off run (approval) |
| GET | /jobs/{name}/artifact/meta |
Tarball size/SHA256 metadata |
| GET | /jobs/{name}/artifact |
Stream tarball (or 302 to remote) |
| GET | /jobs/events |
Job execution history |
| GET | /dlq |
List DLQ entries (limit, offset, queue, handler, query) |
| GET | /dlq/{id} |
DLQ entry detail |
| POST | /dlq/replay |
Replay DLQ entries up to limit (approval) |
| POST | /dlq/replay/ids |
Replay specific IDs (approval) |
| POST | /pause |
Pause global dequeue |
| POST | /resume |
Resume global dequeue |
| GET | /audit |
Audit events (action, target, limit) |
| GET | /audit/export |
Export audit log as JSON / JSONL / CSV |
| GET | /metrics |
JSON metrics snapshot |
| GET | /metrics/prometheus |
Prometheus exposition format |
| GET | /events |
Server-Sent Events stream |
Endpoints marked approval propagate the X-Admin-Approval header to the
backend as the gRPC metadata key x-admin-approval; the backend may reject
the call if the approval token is missing or invalid when approval is
required. Other endpoints ignore the header.
The SSE stream emits an event every 10 seconds while the client is connected. Each tick flushes five event types in order:
| Event name | Payload |
|---|---|
overview |
{stats, coordination, actions} — same shape as GET /overview |
schedule_events |
{events: [...]} — latest schedule executions |
job_events |
{events: [...]} — latest job executions |
audit_events |
{events: [...]} — latest audit entries |
heartbeat |
{t: <ms>} — liveness tick for proxy/keep-alive |
Each event carries a monotonic id: header. Clients may resume after a
disconnect by sending Last-Event-ID: <n> (or the ?lastEventId= query
param); the gateway replays events from an in-memory ring of the last 300
events and then resumes live emission. SSE requires a
flush-capable transport; when behind a proxy that buffers, the UI falls back
to polling every 15 seconds.
X-Request-Id— optional. If present it is sanitized to[A-Za-z0-9_.-]and truncated to 128 chars before being forwarded to the backend as the gRPC metadatax-request-id. If absent or empty after sanitization the gateway generates a UUID. The effective value is echoed back in the response headers and included in every error envelope for end-to-end correlation.X-Admin-Approval— forwarded asx-admin-approvalgRPC metadata on the destructive endpoints flagged above. Format is opaque to the gateway; the backend decides what constitutes a valid approval token.
WORKER_ADMIN_API_URL(e.g.https://127.0.0.1:8081)WORKER_ADMIN_MTLS_CERT,WORKER_ADMIN_MTLS_KEY,WORKER_ADMIN_MTLS_CAWORKER_ADMIN_PASSWORD(required)WORKER_ADMIN_ALLOW_MOCK=falseNEXT_PUBLIC_WORKER_ADMIN_ORIGIN(optional override for SSR fetch)
WORKER_ADMIN_GRPC_ADDR— gRPC backend address (default127.0.0.1:50052)WORKER_ADMIN_HTTP_ADDR— HTTP listen address (default127.0.0.1:8081)WORKER_ADMIN_TLS_CERT,WORKER_ADMIN_TLS_KEY,WORKER_ADMIN_TLS_CA— mTLS server cert/key and client CA used to verify the UI.WORKER_ADMIN_JOB_TARBALL_DIR— local root for artifact downloads; must match the worker-service setting when gateway and worker are split.WORKER_ADMIN_AUDIT_EXPORT_MAX— upper bound on/audit/exportrows (default 5000).WORKER_ADMIN_READ_TIMEOUT,WORKER_ADMIN_WRITE_TIMEOUT— HTTP read/write timeouts.
WORKER_JOB_REPO_ALLOWLIST(comma-separated;*to allow all)WORKER_JOB_TARBALL_ALLOWLIST(comma-separated hostnames for HTTPS tarballs)WORKER_JOB_TARBALL_DIR(root for local tarballs, default/tmp)WORKER_JOB_TARBALL_MAX_BYTES(default 64MiB)WORKER_JOB_TARBALL_TIMEOUT(default 30s)WORKER_JOB_OUTPUT_BYTES(max combined stdout+stderr)
./scripts/gen-admin-certs.sh generates a local CA plus server and client
certs under ./certs/. The script writes:
ca.crt,ca.key— local CA (keep out of production)server.crt,server.key— gateway server certclient.crt,client.key— UI client cert
Wiring:
| Side | File / env var |
|---|---|
| Gateway server cert | WORKER_ADMIN_TLS_CERT=server.crt, WORKER_ADMIN_TLS_KEY=server.key |
| Gateway verifies client | WORKER_ADMIN_TLS_CA=ca.crt |
| UI client cert | WORKER_ADMIN_MTLS_CERT=client.crt, WORKER_ADMIN_MTLS_KEY=client.key |
| UI verifies server | WORKER_ADMIN_MTLS_CA=ca.crt |
For production, replace the self-signed CA with your organization's PKI and mount the files as secrets. The gateway refuses to start if the cert files are unreadable; no plaintext HTTP mode is supported.
- Overview + coordination health
- Queue list, weight update, pause/resume
- DLQ list, detail, replay (bulk or by ID)
- Schedules list, create/pause/run, event log
- Jobs list, create/run, job event log + artifact download
- Audit trail with filters and export (JSONL/JSON/CSV)
- Weights control relative scheduling share across queues.
- Pausing a queue stops durable dequeue for that queue only.
- Creating a queue sets weight + metadata; it does not enqueue tasks.
- Cron specs accept 5 fields (min hour dom mon dow) or
@everysyntax. - Run now triggers a single execution without changing the schedule.
- Pause keeps the schedule registered but stops new runs.
- Jobs can pull from a Git tag, an HTTPS tarball URL, or a local tarball path.
- Tarball URLs must be allowlisted with
WORKER_JOB_TARBALL_ALLOWLIST. - Tarball paths are resolved relative to
WORKER_JOB_TARBALL_DIR. - Provide
tarball_sha256to validate archive integrity. - Use
/jobs/{name}/artifact/metato fetch size + SHA256 without downloading;/jobs/{name}/artifactstreams the tarball or 302-redirects to a remote URL when the source is off-host. - Use the Crash-test preset in the Jobs form to populate a local tarball job
that fails by default for validation. Build the tarball with
__examples/job_runner_dummy/create-tarball.shso the Dockerfile is at the root. - Command overrides the image entrypoint; env keys come from the worker
service. Use
KEY=VALUEto pass explicit overrides per job. - Output is truncated to the configured max bytes for safety.
By default, job events are held in memory. To retain job events across restarts, configure the worker-service file store:
WORKER_JOB_EVENT_DIR(required, e.g./var/lib/go-worker/job-events)WORKER_JOB_EVENT_MAX_ENTRIES(per key; default 10000)WORKER_JOB_EVENT_CACHE_TTL(e.g.5s; default 10s)
Mount a persistent volume at WORKER_JOB_EVENT_DIR in production so restarts
do not wipe the history. The storage interface is pluggable; future backends
can target object storage (S3) without changing the UI contract.
Schedule events follow a different path: when the worker uses a durable backend, schedule execution history is recorded in that backend and is therefore shared across worker instances. The in-memory fallback only applies when no durable admin backend is available.
Audit events are durable-backend-first (Redis LPUSH + LTRIM) and can be
archived to disk via WORKER_ADMIN_AUDIT_ARCHIVE_DIR when enabled on the
worker service.
- Replay is at-least-once; ensure handlers are idempotent.
- Use filters to narrow by queue, handler, or search text.
- DLQ detail exposes payload size, metadata, and last error.
- Bulk replay (
/dlq/replay) and replay-by-ID (/dlq/replay/ids) both require theX-Admin-Approvalheader to be forwarded to the backend.
- Gateway unreachable: confirm the gateway URL and mTLS cert paths in
the container. Curl the gateway with
--cert,--key,--cacertto isolate whether the issue is TLS or routing. - 403 / Unauthenticated responses: verify the client cert is signed by
the CA that the gateway trusts (
WORKER_ADMIN_TLS_CA). - No data: ensure the worker service is running and handlers are registered.
- Events stale: SSE blocked by proxy; UI falls back to polling every
15s. Check
X-Accel-Buffering: nois preserved end-to-end and that no proxy is bufferingtext/event-stream. - Last-Event-ID replay empty: the in-memory ring holds only the last 300 events; clients offline longer than that cannot fully catch up via replay and will see current state only.
- Job events missing after restart: set
WORKER_JOB_EVENT_DIRso events persist. - Schedule events only show local runs: use a durable backend; shared schedule history is persisted through the durable admin backend, not the local process.
- Artifact download 404: the gateway's
WORKER_ADMIN_JOB_TARBALL_DIRmust match the worker-service value when the two run in separate processes; otherwise the gateway cannot locate tarballs produced by the worker. - Approval rejected: the backend expects
X-Admin-Approvalon every destructive endpoint marked above: DLQ replay, DLQ replay-by-ID,/jobs/{name}/run,/schedules/{name}/run,/queues/{name}/pause,/schedules/pause, and/schedules/{name}/pause. Make sure the UI and any direct API clients forward the header; the gateway propagates it to the backend as gRPC metadatax-admin-approvalbut does not generate one on its own.