Skip to content

Commit 4727c9c

Browse files
nielsweistraCopilot
andcommitted
feat: add CLI for monitoring vending jobs and events
- Implemented a new CLI tool in `monitor.py` for managing and monitoring vending jobs and events. - Added commands for listing jobs, peeking dead-letter queue, showing job stats, and watching for new messages. - Integrated remote mode to interact with the vending API and local mode for direct Azure Storage access. feat: create jobs handler package - Introduced a new handler package for jobs with routes for `/jobs/stats`, `/jobs/list`, and `/jobs/dlq`. - Added controller logic for handling job-related operations including peeking queues and purging the dead-letter queue. feat: implement job models and responses - Defined Pydantic models for job requests and responses in `models.py`. - Created structured responses for job listing, stats, and job lookup. refactor: update workflow engine usage - Refactored the workflow engine to remove the deprecated `run_provisioning_workflow` function. - Updated all references in tests and other modules to use the new `WorkflowEngine` directly. test: update tests for new workflow engine - Modified tests to accommodate changes in the workflow engine and ensure proper functionality. - Ensured that all tests reflect the new structure and logic of the job handling and workflow execution. Co-authored-by: Copilot <copilot@github.com>
1 parent cbb019a commit 4727c9c

21 files changed

Lines changed: 1708 additions & 82 deletions

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@ handlers/ (FastAPI routers — driving adapters)
119119
120120
retry/dispatcher (strategy: none / queue / dead_letter)
121121
122-
workflow.py::run_provisioning_workflow()
122+
WorkflowEngine(settings).run()
123123
├── Gate steps (pre-flight checks, executed first, abort on failure)
124124
└── Workflow steps (provisioning, topologically ordered by depends_on)
125125
@@ -144,7 +144,7 @@ extensions/ (auto-discovered plugins: webhook, API notify, ServiceNow)
144144
| `handlers/` | FastAPI routers (webhook, worker, preflight, replay, mock). |
145145
| `retry/` | Retry strategy: inline, Azure Storage Queue, or dead-letter. |
146146
| `config.py` | All settings via `Pydantic BaseSettings` + `get_settings()` singleton. |
147-
| `workflow.py` | Orchestrator: built-in steps 1–6 + `run_provisioning_workflow()`. Re-exports domain / registry symbols for backward compatibility. |
147+
| `workflow/engine.py` | Orchestrator: `WorkflowEngine` class — `run()` executes built-in steps 1–6. Re-exports domain / registry symbols for backward compatibility. |
148148

149149
---
150150

docs/api.md

Lines changed: 177 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -298,6 +298,183 @@ Triggers the provisioning workflow directly without a real Event Grid event. Onl
298298

299299
---
300300

301+
## Configuration
302+
303+
### `GET /config`
304+
305+
Returns the active service configuration with all secret fields redacted. Useful for verifying which settings are active without exposing credentials.
306+
307+
The fields `azure_client_secret`, `worker_secret`, and `event_grid_sas_key` are replaced with `"***"` when non-empty.
308+
309+
**Response** `200 OK`
310+
311+
```json
312+
{
313+
"azure_tenant_id": "00000000-0000-0000-0000-000000000001",
314+
"azure_client_id": "my-client-id",
315+
"azure_client_secret": "***",
316+
"retry_strategy": "queue",
317+
"provisioning_queue_name": "vending-jobs",
318+
"provisioning_dlq_name": "vending-jobs-dlq",
319+
"worker_secret": "***",
320+
"event_grid_sas_key": "***",
321+
"mock_mode": false
322+
}
323+
```
324+
325+
---
326+
327+
## Jobs API
328+
329+
The `/jobs/*` endpoints provide visibility into the Azure Storage Queue used by the `queue` retry strategy. They are always registered but are most useful when `VENDING_RETRY_STRATEGY=queue`.
330+
331+
All `/jobs/*` endpoints connect to Azure Storage using the same credentials as the main service (`VENDING_STORAGE_CONNECTION_STRING` or `DefaultAzureCredential` + `VENDING_STORAGE_ACCOUNT_NAME`).
332+
333+
---
334+
335+
### `GET /jobs/stats`
336+
337+
Returns approximate message counts for both the provisioning queue and the dead-letter queue.
338+
339+
**Response** `200 OK`
340+
341+
```json
342+
{
343+
"provisioning": {
344+
"queue": "vending-jobs",
345+
"approximate_message_count": 3
346+
},
347+
"dead_letter": {
348+
"queue": "vending-jobs-dlq",
349+
"approximate_message_count": 1
350+
}
351+
}
352+
```
353+
354+
If a queue cannot be reached, the response includes `"error": "<message>"` in place of `approximate_message_count`.
355+
356+
---
357+
358+
### `GET /jobs/list`
359+
360+
Peeks the top N messages in the provisioning queue without removing them.
361+
362+
#### Query parameters
363+
364+
| Parameter | Default | Description |
365+
|-----------|---------|-------------|
366+
| `count` | `5` | Number of messages to peek (1–32) |
367+
368+
**Response** `200 OK`
369+
370+
```json
371+
{
372+
"queue": "vending-jobs",
373+
"count": 1,
374+
"messages": [
375+
{
376+
"job_id": "abc123",
377+
"subscription_id": "00000000-0000-0000-0000-000000000001",
378+
"subscription_name": "my-subscription",
379+
"management_group_id": "ITL-Development",
380+
"attempt": 1
381+
}
382+
]
383+
}
384+
```
385+
386+
---
387+
388+
### `GET /jobs/dlq`
389+
390+
Peeks the top N messages in the dead-letter queue. Response shape is identical to `/jobs/list`.
391+
392+
#### Query parameters
393+
394+
| Parameter | Default | Description |
395+
|-----------|---------|-------------|
396+
| `count` | `5` | Number of messages to peek |
397+
398+
---
399+
400+
### `DELETE /jobs/dlq`
401+
402+
Clears **all** messages from the dead-letter queue. This is a destructive, irreversible operation.
403+
404+
**Response** `200 OK`
405+
406+
```json
407+
{"queue": "vending-jobs-dlq", "deleted": 3}
408+
```
409+
410+
---
411+
412+
### `POST /jobs/enqueue`
413+
414+
Enqueues a provisioning job directly to the provisioning queue, bypassing the Event Grid webhook. Useful for manual re-queuing or testing without a real Event Grid event.
415+
416+
**Response** `202 Accepted`
417+
418+
#### Request body
419+
420+
```json
421+
{
422+
"subscription_id": "00000000-0000-0000-0000-000000000001",
423+
"subscription_name": "my-subscription",
424+
"management_group_id": "ITL-Development",
425+
"job_id": "",
426+
"attempt": 1
427+
}
428+
```
429+
430+
| Field | Type | Required | Default | Description |
431+
|-------|------|----------|---------|-------------|
432+
| `subscription_id` | `string` | Yes || Azure subscription ID |
433+
| `subscription_name` | `string` | Yes || Display name |
434+
| `management_group_id` | `string` | No | `""` | Target management group |
435+
| `job_id` | `string` | No | auto | Custom job ID; a UUID is generated when empty |
436+
| `attempt` | `integer` | No | `1` | Attempt counter (informational) |
437+
438+
**Response** `202 Accepted`
439+
440+
```json
441+
{
442+
"job_id": "abc123",
443+
"message_id": "d3b07384-d113-4ec6-b7b7-d85b72a2f51b",
444+
"queue": "vending-jobs"
445+
}
446+
```
447+
448+
---
449+
450+
### `GET /jobs/{job_id}`
451+
452+
Looks up a specific job by ID, peeking across both the provisioning queue and the dead-letter queue (up to 32 messages each).
453+
454+
**Response** `200 OK` — found
455+
456+
```json
457+
{
458+
"found": true,
459+
"queue": "vending-jobs",
460+
"job": {
461+
"job_id": "abc123",
462+
"subscription_id": "00000000-0000-0000-0000-000000000001",
463+
"subscription_name": "my-subscription",
464+
"management_group_id": "ITL-Development",
465+
"attempt": 1
466+
}
467+
}
468+
```
469+
470+
**Response** `200 OK` — not found
471+
472+
```json
473+
{"found": false, "queue": null, "job": null}
474+
```
475+
476+
---
477+
301478
## OpenAPI / Swagger UI
302479

303480
When the service is running locally, open the following URLs in your browser:

docs/architecture.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@ extensions/ auto-discovered plugins
7474
| `extensions/` | Auto-discovered plugins. Each module self-registers at import. Only modules starting with `__` are excluded from discovery. |
7575
| `handlers/` | FastAPI routers, each as a sub-package (`event_grid/`, `worker/`, `preflight/`, `replay/`, `mock/`). |
7676
| `core/config.py` | All settings via `Pydantic BaseSettings`. Use `get_settings()` — never instantiate `Settings()` directly. |
77-
| `workflow/` | Package: `engine.py` hosts `WorkflowEngine` and the backward-compat `run_provisioning_workflow` wrapper; `steps.py` defines built-in steps 1–6. |
77+
| `workflow/` | Package: `engine.py` hosts `WorkflowEngine`; `steps.py` defines built-in steps 1–6. |
7878

7979
---
8080

@@ -95,7 +95,7 @@ extensions/ auto-discovered plugins
9595
| `core/protocols.py` | Port contracts: `ManagementGroupPort`, `RbacPort`, `PolicyPort`, `NotificationPort`, `TagReaderPort` — all `@runtime_checkable Protocol` |
9696
| `core/exceptions.py` | Typed exception hierarchy: `AppError → ProvisioningError` (`GateCheckFailed`, `StepFailed`) `\| AzureIntegrationError` (`ManagementGroupError`, `RbacError`, `PolicyError`, `NotificationError`) `\| ConfigurationError \| AuthorizationError` |
9797
| `schemas/event_grid.py` | HTTP surface contracts: `EventGridEvent`, `EventGridEventData`, `WebhookResponse`, `HealthResponse` |
98-
| `workflow/engine.py` | `WorkflowEngine` class — `run()` method orchestrates gates + built-in steps + custom steps + lifecycle events. Also exports backward-compat `run_provisioning_workflow()` wrapper. |
98+
| `workflow/engine.py` | `WorkflowEngine` class — `run()` method orchestrates gates + built-in steps + custom steps + lifecycle events. |
9999
| `workflow/steps.py` | Built-in provisioning steps 1–6, each decorated with `@register_step` |
100100
| `handlers/event_grid/` | `POST /webhook/` — receives Event Grid deliveries, validates SAS key, dispatches to `WorkflowEngine` |
101101
| `handlers/worker/` | `POST /worker/process-job` — dequeues and processes a `ProvisioningJob` from Azure Storage Queue |

docs/demo.html

Lines changed: 43 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
<head>
44
<meta charset="UTF-8" />
55
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
6-
<title>Automated Demo ITL Subscription Vending</title>
6+
<title>Automated Demo &mdash; ITL Subscription Vending</title>
77
<style>
88
:root {
99
--bg: #0d1117;
@@ -342,8 +342,8 @@
342342
<svg width="9" height="9" viewBox="0 0 24 24" fill="currentColor"><circle cx="12" cy="12" r="10"/></svg>
343343
Automated Walkthrough
344344
</div>
345-
<h1>Subscription Vending Live Demo</h1>
346-
<p>Watch the full end-to-end provisioning workflow run automatically across five real-world scenarios. The inbound Event Grid webhook, all seven provisioning steps, and the HTTP response are shown in real time.</p>
345+
<h1>Subscription Vending &mdash; Live Demo</h1>
346+
<p>Watch the full end-to-end provisioning workflow run automatically across six real-world scenarios. The inbound Event Grid webhook, all seven provisioning steps, and the HTTP response are shown in real time.</p>
347347
<div class="hero-controls">
348348
<div class="status-badge">
349349
<span class="idle-dot" id="status-dot"></span>
@@ -371,7 +371,7 @@ <h1>Subscription Vending � Live Demo</h1>
371371
<div class="panel-header">
372372
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><polyline points="22 12 18 12 15 21 9 3 6 12 2 12"/></svg>
373373
Scenarios
374-
<span class="panel-header-right" id="scenario-counter">0 / 5</span>
374+
<span class="panel-header-right" id="scenario-counter">0 / 6</span>
375375
</div>
376376
<div class="scenario-progress" id="scenario-progress"></div>
377377
</div>
@@ -539,28 +539,35 @@ <h1>Subscription Vending � Live Demo</h1>
539539
},
540540
{
541541
label: "No budget tag",
542-
desc: "Step 5 skipped budget alert omitted",
542+
desc: "Step 5 skipped \u2014 budget alert omitted",
543543
sub: { name: "dev-sandbox-01", env: "sandbox", budget: "", owner: "devteam@itlusions.com", aks: "false" },
544544
scenario: "no-budget",
545545
},
546546
{
547547
label: "Authorization service down",
548-
desc: "Step 2 fails 503 from auth endpoint",
548+
desc: "Step 2 fails 503 from auth endpoint",
549549
sub: { name: "logistics-staging", env: "staging", budget: "500", owner: "ops@itlusions.com", aks: "false" },
550550
scenario: "auth-fail",
551551
},
552552
{
553553
label: "RBAC partial failure",
554-
desc: "Step 3 partial 3 of 4 assignments applied",
554+
desc: "Step 3 partial 3 of 4 assignments applied",
555555
sub: { name: "platform-development", env: "development", budget: "1000", owner: "platform@itlusions.com", aks: "true" },
556556
scenario: "rbac-partial",
557557
},
558558
{
559559
label: "Cascading failures",
560-
desc: "Steps 1�5 fail worst-case scenario",
560+
desc: "Steps 1\u20135 fail \u2014 worst-case scenario",
561561
sub: { name: "legacy-production", env: "production", budget: "5000", owner: "legacy@itlusions.com", aks: "false" },
562562
scenario: "all-fail",
563563
},
564+
{
565+
label: "Queue strategy — async worker",
566+
desc: "Webhook routes to Storage Queue; worker processes the job",
567+
sub: { name: "analytics-development", env: "development", budget: "750", owner: "data@itlusions.com", aks: "false" },
568+
scenario: "queue",
569+
queueMode: true,
570+
},
564571
];
565572

566573
function getOutcomes(scenario, hasBudget) {
@@ -572,15 +579,16 @@ <h1>Subscription Vending � Live Demo</h1>
572579
return ({
573580
happy: [ok(), ok(), ok(), ok(), ok(), bud, ok()],
574581
"no-budget": [ok(), ok(), ok(), ok(), ok(), skip("itl-budget tag not set"), ok()],
575-
"auth-fail": [ok(), ok(), fail("503 Service Unavailable authorization service unreachable"), ok(), ok(), bud, ok()],
576-
"rbac-partial":[ok(), ok(), ok(), { result: "partial", delay: 460, note: "3/4 succeeded ops-group assignment failed (403 Forbidden)" }, ok(), bud, ok()],
582+
"auth-fail": [ok(), ok(), fail("503 Service Unavailable authorization service unreachable"), ok(), ok(), bud, ok()],
583+
"rbac-partial":[ok(), ok(), ok(), { result: "partial", delay: 460, note: "3/4 succeeded ops-group assignment failed (403 Forbidden)" }, ok(), bud, ok()],
577584
"all-fail": [ok(),
578-
fail("ResourceNotFound management group 'ITL-Production' does not exist"),
579-
fail("503 authorization service unreachable"),
580-
fail("AuthorizationFailed insufficient permissions"),
581-
fail("PolicyAssignmentFailed scope locked"),
582-
hasBudget ? fail("ConsumptionQuotaExceeded budget limit reached") : skip("itl-budget tag not set"),
585+
fail("ResourceNotFound management group 'ITL-Production' does not exist"),
586+
fail("503 authorization service unreachable"),
587+
fail("AuthorizationFailed insufficient permissions"),
588+
fail("PolicyAssignmentFailed scope locked"),
589+
hasBudget ? fail("ConsumptionQuotaExceeded budget limit reached") : skip("itl-budget tag not set"),
583590
ok()],
591+
queue: [ok(), ok(), ok(), ok(), ok(), bud, ok()],
584592
})[scenario] || [ok(),ok(),ok(),ok(),ok(),ok(),ok()];
585593
}
586594

@@ -625,7 +633,7 @@ <h1>Subscription Vending � Live Demo</h1>
625633
<div class="sp-desc">${esc(s.desc)}</div>`;
626634
}).join("");
627635
document.getElementById("scenario-counter").textContent =
628-
`${Object.keys(doneMap).length} / 5`;
636+
`${Object.keys(doneMap).length} / 6`;
629637
}
630638

631639
function showWebhook(sub, subId, eventId) {
@@ -701,8 +709,8 @@ <h1>Subscription Vending � Live Demo</h1>
701709
? `<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="#3fb950" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round"><polyline points="20 6 9 17 4 12"/></svg>`
702710
: `<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="#d29922" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round"><path d="M10.29 3.86L1.82 18a2 2 0 0 0 1.71 3h16.94a2 2 0 0 0 1.71-3L13.71 3.86a2 2 0 0 0-3.42 0z"/><line x1="12" y1="9" x2="12" y2="13"/><line x1="12" y1="17" x2="12.01" y2="17"/></svg>`;
703711
title.textContent = success
704-
? "Provisioning completed all steps succeeded"
705-
: `Provisioning completed ${errors.length} error${errors.length > 1 ? "s" : ""} recorded`;
712+
? "Provisioning completed all steps succeeded"
713+
: `Provisioning completed ${errors.length} error${errors.length > 1 ? "s" : ""} recorded`;
706714

707715
body.innerHTML = [
708716
["subscription_id", subId, ""],
@@ -793,6 +801,23 @@ <h1>Subscription Vending � Live Demo</h1>
793801
tlog("l-muted", ` sub_name ${sub.name}`);
794802
tlog("l-muted", ` scenario ${def.label}`);
795803

804+
// Queue mode: show dispatch + worker pickup
805+
if (def.queueMode) {
806+
tlog("l-info", `${ts()} retry_strategy=queue — dispatching to Azure Storage Queue`);
807+
tlog("l-muted", ` queue vending-jobs`);
808+
tlog("l-muted", ` job_id job-${uuid().slice(0,8)}`);
809+
tlog("l-muted", ` attempt 1`);
810+
tlog("l-muted", ` message_id ${uuid()}`);
811+
tlog("l-ok", ` enqueued — returning 200 OK to Event Grid`);
812+
await sleep(620);
813+
if (adAbort) { adRunning = false; return; }
814+
tlog("l-info", `${ts()} Worker: dequeued message`);
815+
tlog("l-muted", ` delivery_count 1`);
816+
tlog("l-info", `${ts()} Worker: starting provisioning`);
817+
await sleep(280);
818+
if (adAbort) { adRunning = false; return; }
819+
}
820+
796821
// -- Steps -------------------------------------------------
797822
const outcomes = getOutcomes(def.scenario, hasBudget);
798823
const errors = [];

0 commit comments

Comments
 (0)