This contributor guide explains the current Control Plane client protocol for building a web dashboard, VS Code extension, CI/CD integration, or other client.
Enforced client-side constraints live in cli-tui-rules.md and the relevant ADRs. This guide explains the protocol and expected usage patterns.
All client communication uses the Control Plane HTTP API exclusively. There is no SDK, no shared-memory bus, and no direct file-system access. The control plane is always the single source of truth.
1. Prerequisites
2. Authentication
3. Startup and Connection Flow
4. Submitting a Job
5. Bootstrap: Loading Initial State
6. Three Live Data Channels
7. Reconnect Protocol
8. Job Lifecycle and Management
9. Data Schemas
10. Complete Example Flows
11. Error Handling
12. Implementation Notes
Every request targets the Control Plane. There is no /api/v1 prefix.
| Deployment | Base URL |
|---|---|
| Standalone (zip install, single machine) | http://localhost:5100 |
| Dev / source build (Aspire-managed) | http://localhost:5100 |
| Cloud | Your deployment's HTTPS endpoint |
| Header | Required | Value |
|---|---|---|
| Authorization | Yes (Entra ID) | Bearer <token> |
| Accept | No | Default application/json; use text/event-stream for SSE |
| Last-Event-ID | Conditional | Required on SSE reconnect; see §7 |
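As an illustration of the header table, the sketch below assembles the headers for a plain JSON request or an SSE subscription. The function name `build_headers` and its parameters are hypothetical; the sketch also assumes that `Last-Event-ID` is left out when the sequence is 0 or unknown, as the bootstrap field rules in §5 describe.

```python
from typing import Optional


def build_headers(token: Optional[str] = None,
                  sse: bool = False,
                  last_event_id: Optional[int] = None) -> dict:
    """Assemble request headers following the header table (illustrative helper)."""
    headers = {"Accept": "text/event-stream" if sse else "application/json"}
    if token is not None:
        # Entra ID deployments only; Windows AD clients pass token=None.
        headers["Authorization"] = f"Bearer {token}"
    if sse and last_event_id:
        # Assumption: omit the header when the sequence is 0 or unknown.
        headers["Last-Event-ID"] = str(last_event_id)
    return headers
```

Keeping header construction in one place makes it harder to open an SSE stream without the sequence obtained from bootstrap.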
Two schemes are supported. Which is active is a deployment configuration choice (Auth:Scheme); the client cannot choose.
Rules (Entra ID):
- MUST acquire a Bearer token scoped to the control plane's Entra App Registration (Auth:ClientId).
- MUST include Authorization: Bearer <token> on every request.
- MUST refresh the token before expiry. Tokens expire after ~1 hour.
- MUST NOT cache tokens across users.
Supported OAuth 2.0 flows: device code, authorization code, client credentials, managed identity.
Rules (Windows AD):
- MUST enable Negotiate (Kerberos/NTLM) on the HTTP client (UseDefaultCredentials = true).
- MUST NOT send an Authorization header manually.
- The client machine MUST be domain-joined.
The control plane derives identity from validated token claims. These drive job visibility — a client never sends a filter for its own identity.
| Scheme | tenant_id | User identity |
|---|---|---|
| Entra ID | tid claim (GUID) | oid claim (object ID) |
| Windows AD | AD domain FQDN | AD SID |
IF caller is ControlPlaneAdmin:
return all jobs
(optional: filter by ?tenantId=)
ELSE:
return jobs WHERE tenant_id = caller.tid
AND (visibility = 'Tenant'
OR submitted_by_oid = caller.oid)
A job that exists but is not visible to the caller returns 403 Forbidden — not 404. Do not treat 403 as "not found".
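The visibility rule and the 403/404 distinction can be mirrored in a few lines, which is useful when writing client-side tests against a fake control plane. This is only a sketch: the `Caller` and `Job` classes, the `is_admin` flag (standing in for the ControlPlaneAdmin role) and the visibility value `"Private"` used in the usage example are hypothetical, and the real check runs server-side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caller:
    tid: str                # tenant identity derived from the token
    oid: str                # user identity derived from the token
    is_admin: bool = False  # hypothetical flag for ControlPlaneAdmin


@dataclass
class Job:
    job_id: str
    tenant_id: str
    submitted_by_oid: str
    visibility: str         # "Tenant", or a narrower submitter-only value


def is_visible(caller: Caller, job: Job) -> bool:
    """Apply the visibility rule described above."""
    if caller.is_admin:
        return True
    return job.tenant_id == caller.tid and (
        job.visibility == "Tenant" or job.submitted_by_oid == caller.oid
    )


def status_for(caller: Caller, job: Optional[Job]) -> int:
    """Status a job lookup would produce: 404 only when no such job exists."""
    if job is None:
        return 404
    return 200 if is_visible(caller, job) else 403
```

For example, `status_for(Caller("t1", "b"), Job("j2", "t1", "a", "Private"))` evaluates to 403, because the job exists in the caller's tenant but was submitted by someone else.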
This sequence is the current client contract. Execute it in this order for every job view to avoid stale or incomplete UI state.
sequenceDiagram
participant C as Client
participant CP as Control Plane
C->>CP: 1. GET /jobs/{jobId}
CP-->>C: 200 job state (or 403 / 404)
note over C: check: job exists and caller has access
loop Poll until Running (2 s, max 60 s)
C->>CP: 2. GET /jobs/{jobId}
CP-->>C: state == Queued / Leased
end
C->>CP: 3. GET /jobs/{jobId}/bootstrap
CP-->>C: 200 { Tasks, Snapshot, Metrics, LastEventSequence }
note over CP: always 200 — fields may be null
note over C: 4. Apply bootstrap data (see §5)
C->>CP: 5. GET /jobs/{jobId}/progress?follow=true<br/>Last-Event-ID: LastEventSequence
CP-->>C: SSE stream open (Channel 1 — live events)
loop Poll every ~5 s
C->>CP: 6. GET /jobs/{jobId}/telemetry
CP-->>C: 200 JobMetrics or 204 No Content
end
CP-->>C: event: job-ended
note over C: 7. Cancel both channels<br/>GET /jobs/{jobId} → terminal state
Step 1 — Verify access
GET /jobs/{jobId}
- 200 → proceed to step 2
- 403 → show "access denied"; stop
- 404 → show "job not found"; stop
Step 2 — Wait for Running
Poll GET /jobs/{jobId} with 2 s interval, max 60 s, until state is Running. If the job reaches a terminal state (Completed, Failed, Cancelled) without entering Running, render terminal state immediately.
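A minimal sketch of this bounded polling loop follows. The `get_state` callable is a hypothetical stand-in for a function that performs GET /jobs/{jobId} and returns the state string; `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Completed", "Failed", "Cancelled"}


def wait_for_running(get_state: Callable[[], str],
                     interval_s: float = 2.0,
                     timeout_s: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job is Running or terminal; return the last state seen."""
    waited = 0.0
    while True:
        state = get_state()
        if state == "Running" or state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            # Never poll indefinitely: surface the timeout to the caller.
            raise TimeoutError(f"job still {state} after {timeout_s:.0f} s")
        sleep(interval_s)
        waited += interval_s
```

The caller renders the terminal view when the returned state is not Running, and continues with the bootstrap fetch otherwise.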
Step 3 — Fetch bootstrap (mandatory)
GET /jobs/{jobId}/bootstrap
Always returns 200. Fields inside the response body are nullable — see §5 for how to handle each. This call MUST happen before opening the SSE stream.
Step 4 — Apply bootstrap data (mandatory before SSE)
Populate task list, project table, and counter display from the bootstrap response. See §5.
Step 5 — Open SSE stream
GET /jobs/{jobId}/progress?follow=true
Accept: text/event-stream
Last-Event-ID: <bootstrap.LastEventSequence>
The Last-Event-ID header causes the control plane to replay all ring-buffered events with sequence > LastEventSequence, then stream new ones. This fills any gap between the bootstrap fetch and the SSE connect. MUST be included on every initial connect, not only on reconnect.
Step 6 — Begin metrics polling
GET /jobs/{jobId}/telemetry
Poll every ~5 s. 204 No Content = metrics not yet available; display zeros and retry. Stop polling when SSE emits event: job-ended.
Step 7 — Handle terminal state
On event: job-ended: cancel Channel 1 (SSE) and Channel 2 (polling), call GET /jobs/{jobId} once to read the final state, and render the terminal view.
{
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"configVersion": "2.0",
"kind": "Inventory",
"connectors": ["AzureDevOps"],
"package": {
"packageUri": "file:///D:/exports/run-001",
"createPackage": false
},
"diagnostics": {
"minimumLevel": "Information"
},
"resume": {
"mode": "Auto"
},
"configPayload": "{ ...raw JSON of migration-config.json... }"
}

| Field | Type | Required | Notes |
|---|---|---|---|
| jobId | UUID v4 | Yes | Generated by the client. Must be globally unique. |
| configVersion | string | Yes | Always "2.0" for new jobs. |
| kind | enum | Yes | Inventory, Export, Prepare, Import, Migrate, Dependencies |
| connectors | string[] | Yes | AzureDevOps, TeamFoundationServer, Simulated. Empty array = any agent. |
| package.packageUri | URI string | Yes | Local: file:///D:/path. Cloud: https://<account>.blob.core.windows.net/.... MUST be a URI; bare paths are rejected. |
| package.createPackage | bool | No | true = pack after export / unpack before import. Default false. |
| diagnostics.minimumLevel | string | No | Trace, Debug, Information, Warning, Error, Critical. Default Information. |
| resume.mode | enum | No | Auto (default) = resume from cursor. ForceFresh = delete module cursors and restart. |
| configPayload | string | No | Raw JSON of migration-config.json. Carries all source/target endpoints and credentials. |
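The following sketch builds a job body from the required fields and rejects a bare path before the request is sent. `build_job` is a hypothetical helper, and accepting only the `file` and `https` schemes is an assumption based on the two examples in the table; the control plane may accept others.

```python
import json
import uuid
from typing import Optional
from urllib.parse import urlparse

KINDS = {"Inventory", "Export", "Prepare", "Import", "Migrate", "Dependencies"}


def build_job(kind: str, connectors: list, package_uri: str,
              config_payload: Optional[str] = None) -> dict:
    """Return a job body with a fresh client-generated UUID v4."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    # A bare path such as D:/exports parses with scheme "d" or no scheme at all.
    if urlparse(package_uri).scheme not in ("file", "https"):
        raise ValueError("packageUri must be a file:// or https:// URI")
    job = {
        "jobId": str(uuid.uuid4()),
        "configVersion": "2.0",
        "kind": kind,
        "connectors": list(connectors),
        "package": {"packageUri": package_uri, "createPackage": False},
    }
    if config_payload is not None:
        job["configPayload"] = config_payload  # raw JSON text, sent as a string
    return job
```

The result can be serialised with `json.dumps(job)` and used as the body of POST /jobs.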
kind semantics:
| Kind | Requires Source | Requires Target | Package Access |
|---|---|---|---|
| Inventory | Yes | No | Writes to |
| Export | Yes | No | Writes to |
| Prepare | No | Yes | Reads from, writes validation artefacts |
| Import | No | Yes | Reads from |
| Migrate | Yes | Yes | Writes then reads |
| Dependencies | Yes | No | Writes to |
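A client can encode the source/target requirements from the table above and check a configuration before submitting. This is a sketch; `missing_endpoints` and its boolean parameters are hypothetical, and how a client detects that a source or target is configured depends on its own config handling.

```python
KIND_REQUIREMENTS = {
    # kind: (requires source, requires target)
    "Inventory": (True, False),
    "Export": (True, False),
    "Prepare": (False, True),
    "Import": (False, True),
    "Migrate": (True, True),
    "Dependencies": (True, False),
}


def missing_endpoints(kind: str, has_source: bool, has_target: bool) -> list:
    """List the endpoints a job of this kind needs but the config lacks."""
    needs_source, needs_target = KIND_REQUIREMENTS[kind]
    missing = []
    if needs_source and not has_source:
        missing.append("source")
    if needs_target and not has_target:
        missing.append("target")
    return missing
```

An empty list means the configuration satisfies the kind's requirements.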
POST /jobs
Content-Type: application/json
Authorization: Bearer <token>
{ ...job body... }
Response: 201 Created, Location: /jobs/{jobId}.
The job enters Queued state immediately. An available Migration Agent picks it up within seconds.
After receiving 201, execute the startup flow in §3.
The bootstrap response is the single most important call for correctly initialising a live view. It provides a consistent atomic snapshot that prevents empty tables and zero counters on first render — including for jobs that are resuming from a previous run.
{
"tasks": {
"tasks": [ ...JobTask[] ... ],
"pushedAt": "2026-05-06T10:00:00Z",
"forKind": "Inventory"
},
"snapshot": {
"timestamp": "2026-05-06T10:05:00Z",
"organisations": [
{
"url": "https://dev.azure.com/myorg",
"name": "myorg",
"projects": [
{
"name": "ProjectA",
"status": "Completed",
"discovery": {
"inventory": {
"workItemsTotal": 1500,
"revisionsTotal": 12000,
"repositoriesTotal": 3
}
},
"migration": null
}
]
}
]
},
"metrics": {
"timestamp": "2026-05-06T10:05:00Z",
"scope": { "elapsedMs": 300000 },
"discovery": {
"inventory": {
"workItemsTotal": 1500,
"revisionsTotal": 12000,
"repositoriesTotal": 3,
"checkpointsSaved": 47
}
},
"migration": null
},
"lastEventSequence": 4823
}

| Field | Rule |
|---|---|
| tasks | MUST populate the task/module list once from bootstrap. Use status, knownTotal, completedCount as the initial task-row state. After bootstrap, treat later SSE ProgressEvent records carrying taskId + taskStatus as partial patches to the stored task row; merge knownTotal and completedCount when present. If tasks is null: wait for the first bootstrap response that includes them. |
| snapshot | MUST pre-populate per-project rows (org → project → counters + status). MUST be applied as insert-only: never overwrite a row already updated by a live SSE event. If snapshot is null or snapshot.Organisations is empty: leave the project table empty; rows appear as SSE events arrive. |
| metrics | MUST seed aggregate counters. Treat as "last known" until the first successful Channel 2 poll. If null: display zeros. |
| lastEventSequence | MUST pass as Last-Event-ID on the SSE subscription. If 0 or absent: open SSE with no Last-Event-ID (stream starts from current position). MUST NOT pass a stale value from a previous session. |
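The field rules can be sketched as a single merge function. The `view` dictionary and its keys (`tasks`, `projects`, `metrics`, `last_event_sequence`) are a hypothetical client-side model, not part of the protocol; project rows are keyed by (organisation URL, project name), which is also an assumption.

```python
def apply_bootstrap(view: dict, bootstrap: dict) -> dict:
    """Merge a bootstrap response into `view`, tolerating null fields."""
    tasks = bootstrap.get("tasks")
    if tasks is not None:
        # Initial task-row state, keyed by task id for later SSE patches.
        view["tasks"] = {t["id"]: dict(t) for t in tasks["tasks"]}

    snapshot = bootstrap.get("snapshot") or {}
    rows = view.setdefault("projects", {})
    for org in snapshot.get("organisations") or []:
        for project in org["projects"]:
            # Insert-only: setdefault leaves rows already patched by SSE untouched.
            rows.setdefault((org["url"], project["name"]), dict(project))

    metrics = bootstrap.get("metrics")
    if metrics is not None:
        view["metrics"] = metrics         # "last known" until the first poll
    else:
        view.setdefault("metrics", None)  # the UI renders zeros for None

    view["last_event_sequence"] = bootstrap.get("lastEventSequence") or 0
    return view
```

Because every branch accepts null, the same function works for a job that has only just been leased and for one that is resuming with a full snapshot.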
Any bootstrap field may be null under these conditions:
| Condition | Likely null fields |
|---|---|
| Job is Queued or Leased (agent not yet running) | tasks, snapshot, metrics, lastEventSequence = 0 |
| Agent started but has not yet completed the plan build | tasks, snapshot, metrics |
| Agent started but no project has completed yet | snapshot |
| Control plane restarted (in-memory state cleared) | tasks, snapshot, metrics |
null means "data not yet available". Treat it as empty initial state and wait for SSE events to fill in the gaps. MUST NOT treat null as an error.
Maintain these channels for the lifetime of a job view. They are independent and MUST run concurrently.
| Channel | Endpoint | Mechanism | What it carries |
|---|---|---|---|
| Channel 1 | GET /jobs/{jobId}/progress?follow=true | SSE push | Real-time ProgressEvent records: stage transitions, cursor positions, task status updates |
| Channel 2 | GET /jobs/{jobId}/telemetry | HTTP polling (~5 s) | Aggregate JobMetrics: all counter values shown in the UI |
| Channel 3 | GET /jobs/{jobId}/snapshot | HTTP polling (~5 min) or on-demand | Per-org/project JobSnapshot: high-cardinality breakdown |
Channel 3 is optional. The bootstrap response already contains the latest snapshot; only use Channel 3 if your UI maintains a persistent per-project table that needs periodic refresh.
GET /jobs/{jobId}/progress?follow=true
Accept: text/event-stream
Last-Event-ID: <lastEventSequence>
id: 4824
data: {"module":"WorkItems","stage":"ExportRevisions","message":"Exported 312/1500","timestamp":"2026-05-06T10:05:01Z","eventSequence":4824,"taskId":"capture.workitems.myorg.projecta","taskStatus":"Running","knownTotal":1500,"completedCount":312,"lastCheckpointAt":"2026-05-06T10:04:55Z","metrics":null}
: heartbeat
event: job-ended
data: {"state":"Completed"}
- id: line = eventSequence. MUST store as the new lastEventSequence on every received event.
- : heartbeat lines have no id: or data:. MUST be discarded silently. Do not treat as an error or reset the reconnect timer.
- event: job-ended = terminal state signal. Parse data as {"state":"<value>"}. Cancel both channels and do a final GET /jobs/{jobId}.
- All other events are data-only (no explicit event: field). Deserialise data as ProgressEvent.
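The parsing rules above can be sketched as a small line-oriented parser. It assumes standard SSE framing, in which a blank line terminates each event, and it labels events without an `event:` field as `"message"` (the SSE default). `parse_sse` is a hypothetical name; a production client would normally rely on an SSE library.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[int], dict]]:
    """Yield (event_name, sequence, payload) for each dispatched SSE event."""
    name, seq, data = "message", None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line such as ": heartbeat"; discard silently
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield name, seq, json.loads("\n".join(data))
            name, seq, data = "message", None, []
            continue
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "event":
            name = value
        elif field == "id":
            seq = int(value)  # store as the new lastEventSequence
        elif field == "data":
            data.append(value)
```

A consumer would update its stored sequence whenever `seq` is not None, treat `"message"` payloads as ProgressEvent records, and stop both channels on `"job-ended"`.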
| Field | UI action |
|---|---|
| module + stage | Update the module row's stage label |
| taskId + taskStatus | Update the matching task's status indicator |
| knownTotal | Patch the matching task row's knownTotal when present |
| completedCount | Update task progress bar (completedCount / task.knownTotal) |
| lastCheckpointAt | Show "last saved" timestamp |
| message | Append to the live log / diagnostic panel |
| metrics | See critical note below; MUST NOT use for counter display |
ProgressEvent.Metrics is only populated by the TFS subprocess (.NET 4.8.1). For all .NET 10 jobs it is always null. Reading counters from this field silently displays zeros for every .NET 10 job. MUST use Channel 2 (GET /jobs/{jobId}/telemetry) for all counter values.
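A sketch of how a ProgressEvent might be applied as a partial patch to a stored task row, deliberately ignoring the `metrics` field. The helper names and the dictionary-based task store are hypothetical.

```python
from typing import Optional


def apply_progress_event(tasks: dict, event: dict) -> None:
    """Patch the task row named by event['taskId'] in place; metrics is ignored."""
    task_id = event.get("taskId")
    if task_id is None or task_id not in tasks:
        return  # not a task-scoped event, or the task list is not loaded yet
    row = tasks[task_id]
    for src, dst in (("taskStatus", "status"),
                     ("knownTotal", "knownTotal"),
                     ("completedCount", "completedCount")):
        if event.get(src) is not None:  # merge only the fields that are present
            row[dst] = event[src]


def progress_fraction(row: dict) -> Optional[float]:
    """completedCount / knownTotal, or None while the total is unknown or zero."""
    total, done = row.get("knownTotal"), row.get("completedCount")
    if not total or done is None:
        return None
    return min(done / total, 1.0)
```

Returning None rather than 0 lets the UI show an indeterminate progress bar until the total is known.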
GET /jobs/{jobId}/telemetry
- 200 + JobMetrics → update all counter displays
- 204 No Content → display zeros, retry on next poll cycle; MUST NOT treat as an error
Poll every ~5 seconds. Stop when Channel 1 emits event: job-ended.
GET /jobs/{jobId}/snapshot
- 200 + JobSnapshot → refresh per-project rows
- 204 No Content → no snapshot yet; retain existing rows
Poll every ~5 minutes, or on-demand when a user navigates to a project detail view. MUST apply as insert-or-update (not delete-and-replace) to avoid flickering.
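The insert-or-update rule for Channel 3 can be sketched as follows. `upsert_snapshot` is a hypothetical helper; it assumes rows are keyed by (organisation URL, project name) and that the caller passes None when the endpoint answers 204.

```python
from typing import Optional


def upsert_snapshot(rows: dict, snapshot: Optional[dict]) -> dict:
    """Insert or update project rows from a JobSnapshot; never delete rows."""
    if snapshot is None:  # 204 No Content: keep whatever is on screen
        return rows
    for org in snapshot.get("organisations") or []:
        for project in org.get("projects") or []:
            key = (org["url"], project["name"])
            # Update in place so existing rows are not removed and re-added.
            rows.setdefault(key, {}).update(project)
    return rows
```

Rows missing from a later snapshot stay in the table, which avoids the flicker a delete-and-replace refresh would cause.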
The reconnect sequence is the same as the startup sequence in §3 with one extra rule: if any in-memory state is being reused, apply bootstrap as insert-only.
When a client reconnects (browser tab returns, network resumes, explicit user action), follow this decision tree:
flowchart TD
A[Client reconnects] --> B[GET /jobs/jobId]
B --> C{Response?}
C -->|403 or 404| D[Show error — stop]
C -->|state = Completed\nFailed or Cancelled| E[Render terminal state — stop]
C -->|state = Running\nLeased or Paused| F[GET /jobs/jobId/bootstrap]
F --> G{In-memory state\nstill valid?}
G -->|Yes — reuse| H[Apply bootstrap insert-only\nnever overwrite SSE-sourced rows]
G -->|No — cleared| I[Apply bootstrap fully]
H --> J[Open SSE\nLast-Event-ID = new LastEventSequence]
I --> J
J --> K[Resume Channel 2 metrics polling]
The Last-Event-ID tells the control plane the highest sequence the client has already processed. The control plane replays all ring-buffered events with sequence > Last-Event-ID. This fills the gap that occurred during disconnection.
Ring buffer capacity: 1000 events. If the client was disconnected long enough for the buffer to have wrapped, some events are permanently gone from the stream. The bootstrap snapshot provides the current authoritative state regardless.
delay = min(1s × 2^attempt, 30s)
Reset attempt to 0 when a reconnect succeeds and the first event or heartbeat is received.
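The back-off formula translates directly into code. In this sketch `reconnect_delay` is a hypothetical name, and clamping the exponent is an added safeguard against overflow on very long outages rather than part of the protocol.

```python
def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """delay = min(base * 2^attempt, cap), with attempt counted from 0."""
    exponent = min(max(attempt, 0), 30)  # clamp so 2**exponent stays small
    return min(base_s * (2 ** exponent), cap_s)
```

Successive attempts 0 through 6 give delays of 1, 2, 4, 8, 16, 30 and 30 seconds; the caller resets `attempt` to 0 once the stream delivers its first event or heartbeat.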
stateDiagram-v2
[*] --> Queued : POST /jobs
Queued --> Leased : Agent picks up
Queued --> Cancelled : POST /cancel
Leased --> Running : Agent starts executing
Running --> Completed : POST /lease/{id}/complete
Running --> Failed : POST /lease/{id}/fail
Running --> Paused : POST /jobs/{id}/pause\n(agent checkpoints + releases lease)
Paused --> Queued : POST /jobs/{id}/resume
Paused --> Cancelled : POST /jobs/{id}/cancel
Completed --> [*]
Failed --> [*]
Cancelled --> [*]
| State | Channels needed | Suggested UI |
|---|---|---|
| Queued | None (poll /jobs/{jobId}) | Spinner: waiting for agent |
| Leased | None (poll /jobs/{jobId}) | Spinner: agent starting |
| Running | Channel 1 + Channel 2 | Live progress table and counters |
| Paused | None | Static view + Resume button |
| Completed | None | Final counters; Download logs button |
| Failed | None | Error detail from GET /jobs/{jobId} |
| Cancelled | None | Cancelled message |
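The state diagram and the channel table can be captured as two lookup tables, which a client might use to decide when to start or stop its live channels. This is a sketch: the transition map only reflects the edges drawn in the diagram, and the channel labels `"progress-sse"` and `"telemetry-poll"` are hypothetical names for Channel 1 and Channel 2.

```python
TRANSITIONS = {
    "Queued": {"Leased", "Cancelled"},
    "Leased": {"Running"},
    "Running": {"Completed", "Failed", "Paused"},
    "Paused": {"Queued", "Cancelled"},
    "Completed": set(),
    "Failed": set(),
    "Cancelled": set(),
}

# Only Running needs live channels; every other state is polled or static.
LIVE_CHANNELS = {"Running": ("progress-sse", "telemetry-poll")}


def can_transition(current: str, new: str) -> bool:
    """True when the diagram has an edge from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())


def is_terminal(state: str) -> bool:
    """Terminal states have no outgoing edges."""
    return state in TRANSITIONS and not TRANSITIONS[state]


def channels_for(state: str) -> tuple:
    """Channels the client should keep open while the job is in `state`."""
    return LIVE_CHANNELS.get(state, ())
```

A view that observes a state change can compare `channels_for(old)` with `channels_for(new)` to decide what to cancel or open.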
| Action | Method | Path | Auth |
|---|---|---|---|
| List jobs | GET | /jobs | Caller's jobs + Tenant-visible jobs |
| Get job | GET | /jobs/{jobId} | Submitter or admin |
| Pause | POST | /jobs/{jobId}/pause | Submitter or admin |
| Resume | POST | /jobs/{jobId}/resume | Submitter or admin |
| Cancel | POST | /jobs/{jobId}/cancel | Submitter or admin |
| Download logs | GET | /jobs/{jobId}/logs/download | Submitter or admin |
| Diagnostics (snapshot) | GET | /jobs/{jobId}/diagnostics | Submitter or admin |
| Diagnostics (stream) | GET | /jobs/{jobId}/diagnostics?follow=true | Submitter or admin |
GET /jobs accepts optional query parameters:
- ?tenantId=<guid>: admin-only, filter by tenant
- ?state=<value>: filter by job state
{
module: string // Module name, e.g. "WorkItems"
stage: string // Stage label, e.g. "ExportRevisions"
message: string | null // Human-readable status
timestamp: ISO8601 // UTC event time
eventSequence: long // Monotonic sequence; use as Last-Event-ID
lastCheckpointAt: ISO8601 | null // UTC time of last checkpoint save
nextCheckpointDueAt: ISO8601 | null // Estimated next checkpoint time; null = per-item (always safe)
taskId: string | null // JobTask.Id this event is attributed to
taskStatus: JobTaskStatus | null // New task status; null if not a task-lifecycle event
knownTotal: long | null // Authoritative total for the task when runtime/package state knows it
completedCount: long | null // Running completed count for the task
metrics: JobMetrics | null // ⚠️ MUST NOT use for display — see §6 Channel 1
}

{
timestamp: ISO8601
scope: {
elapsedMs: long // Elapsed job time in milliseconds
}
discovery: { // Non-null for Inventory / Dependencies jobs
inventory: {
workItemsTotal: long
revisionsTotal: long
repositoriesTotal: long
checkpointsSaved: long
} | null
} | null
migration: { // Non-null for Export / Import / Migrate jobs
workItems: { ... } | null
attachments: { ... } | null
// ... other migration counters
} | null
}

Exactly one of discovery or migration is non-null per job.
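Since only one counter group is populated per job, a client can select it generically instead of branching on the job kind. A minimal sketch, with `active_counters` as a hypothetical helper name:

```python
from typing import Optional, Tuple


def active_counters(metrics: Optional[dict]) -> Optional[Tuple[str, dict]]:
    """Return (group_name, counters) for the non-null group, or None."""
    if metrics is None:
        return None  # 204 or null bootstrap metrics: the UI shows zeros
    for group in ("discovery", "migration"):
        if metrics.get(group) is not None:
            return group, metrics[group]
    return None
```

Rendering only the returned group avoids showing migration counters for a discovery job, or discovery counters for a migration job.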
{
timestamp: ISO8601
organisations: OrgSnapshot[]
}
OrgSnapshot {
url: string // e.g. "https://dev.azure.com/myorg"
name: string // Display name
projects: ProjectSnapshot[]
}
ProjectSnapshot {
name: string
status: "Pending" | "InProgress" | "Completed" | "Failed"
discovery: DiscoveryCounters | null // Non-null for discovery jobs
migration: MigrationCounters | null // Non-null for migration jobs
}

{
tasks: JobTaskList | null // null until agent pushes plan
snapshot: JobSnapshot | null // null until agent pushes first snapshot
metrics: JobMetrics | null // null until agent pushes first metrics
lastEventSequence: long // 0 if no events yet; pass as Last-Event-ID
}

{
tasks: JobTask[]
pushedAt: ISO8601
forKind: "Inventory" | "Export" | "Prepare" | "Import" | "Migrate" | "Dependencies" | null
}

{
id: string // "capture.workitems.myorg.projecta"
name: string // "WorkItems Inventory"
taskKind: TaskKind
phase: string | null // Display hint: "export", "import", etc.
organisationUrl: string | null
projectName: string | null
order: int // 0-based execution order
status: JobTaskStatus
knownTotal: long | null // Total items if known from bootstrap or later task-scoped SSE patches
completedCount: long | null // Running count; updated via task-scoped SSE patches
startedAt: ISO8601 | null
completedAt: ISO8601 | null
skipReason: string | null // Non-null only when status = Skipped
dependsOn: string[] | null // Task IDs that must complete first
}

TaskKind values: Capture, Analyse, Export, Prepare, Import, Validate
JobTaskStatus values: Pending, Running, Completed, Failed, Skipped
1. Build Job: kind=Inventory, connectors=[AzureDevOps], jobId=<new UUID>
2. POST /jobs → 201 Created
3. Poll GET /jobs/{jobId} every 2 s → wait until state==Running (max 60 s)
4. GET /jobs/{jobId}/bootstrap → tasks=null, snapshot=null, metrics=null,
lastEventSequence=0 (agent just starting)
5. Open SSE: GET /jobs/{jobId}/progress?follow=true (Last-Event-ID: 0)
6. Begin polling GET /jobs/{jobId}/telemetry every 5 s
7. First SSE events: taskId set, taskStatus=Running, stage advancing
8. Channel 2 returns non-null JobMetrics as agent pushes first metrics burst
9. SSE event: job-ended → cancel channels → GET /jobs/{jobId} → state==Completed
1. GET /jobs/{jobId} → state==Running
2. GET /jobs/{jobId}/bootstrap → tasks (updated statuses), snapshot (latest per-project
state), metrics (latest counters), lastEventSequence=9201
3. Rebuild project table from snapshot (insert-only), update task list, seed counters from metrics
4. Open SSE: Last-Event-ID: 9201
→ control plane replays buffered events 9202..9300 (if still in ring buffer), then live stream
5. Resume Channel 2 polling
1. GET /jobs/{jobId} → state==Running
2. GET /jobs/{jobId}/bootstrap → snapshot shows projects completed before pause,
tasks show pre-pause statuses,
lastEventSequence=4823 (no new events yet since resume)
3. Apply bootstrap — project table pre-populated from snapshot (no blank rows)
4. Open SSE: Last-Event-ID: 4823
→ new events (4824+) stream in as agent continues from cursor
5. Begin Channel 2 polling
| HTTP status | Meaning | Required client action |
|---|---|---|
| 400 Bad Request | Invalid jobId format or malformed request body | Fix the request; do not retry |
| 401 Unauthorized | Missing or expired token | Refresh token; retry once |
| 403 Forbidden | Job exists but caller cannot see it | Show "access denied"; stop; MUST NOT retry with different credentials silently |
| 404 Not Found | Job does not exist | Show "not found"; stop |
| 204 No Content | Telemetry / snapshot not yet available | Not an error; display zeros and retry on next poll cycle |
| 503 Service Unavailable | Control plane starting up | Retry with exponential back-off (start 1 s) |
| SSE connection drop | Network blip | Reconnect with exponential back-off; pass Last-Event-ID — see §7 |
| Heartbeat only, no events for >60 s | Job is Queued or Leased, no activity yet | Normal; do not reconnect |
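The status rows of the table can be reduced to a small dispatch function. This is a sketch only: the action labels are hypothetical, and stopping after a second 401 is an assumption drawn from the "retry once" rule.

```python
def client_action(status: int, token_refreshed: bool = False) -> str:
    """Map an HTTP status to the client action from the error table."""
    if status == 204:
        return "show-zeros-and-poll-again"  # not an error
    if 200 <= status < 300:
        return "proceed"
    if status == 400:
        return "fix-request"                # do not retry
    if status == 401:
        # Retry once after a refresh; assumption: give up on a second 401.
        return "stop" if token_refreshed else "refresh-token-and-retry"
    if status == 403:
        return "show-access-denied"         # never treated as "not found"
    if status == 404:
        return "show-not-found"
    if status == 503:
        return "retry-with-backoff"
    return "unhandled"
```

Centralising the mapping keeps 403 and 404 handling distinct everywhere in the client.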
For human-readable diagnostic logs (not structured counters), subscribe to:
GET /jobs/{jobId}/diagnostics?follow=true
Accept: text/event-stream
Accepts ?level= filter: Trace, Debug, Information, Warning, Error, Critical. Default: Information. This is independent of Channel 1 — a separate SSE stream carrying unstructured log records. The TUI uses this stream for its Log Panel. It is optional for other clients.
Common client mistakes to avoid:
- opening SSE before calling GET /jobs/{jobId}/bootstrap
- reading counter values from ProgressEvent.Metrics instead of GET /jobs/{jobId}/telemetry
- treating 204 No Content from /telemetry or /snapshot as a failure instead of an empty current snapshot
- treating 403 Forbidden as if the job does not exist
- polling GET /jobs/{jobId} indefinitely without a timeout or terminal-state check
- overwriting live SSE-sourced project rows with a later bootstrap fetch
- inventing a fourth live-data channel instead of using bootstrap, telemetry polling, and SSE
- attempting to connect directly to an in-process sink or read package files instead of using the Control Plane API
- reusing a stale lastEventSequence from an earlier session without first fetching bootstrap
- showing JobMetrics.migration values for discovery jobs or JobMetrics.discovery values for migration jobs
| Document | Purpose |
|---|---|
| docs/control-plane.md | Full API surface, authentication, data store, and authorisation rules |
| docs/cli-guide.md | CLI implementation — reference client using this flow |
| docs/tui-guide.md | TUI implementation — reference client using this flow |
| .agents/30-context/domains/job-lifecycle.md | Job definition schema and field semantics |
| .agents/30-context/domains/telemetry-model.md | Three-channel model — agent-side contract |