Commit aac3f8c

docs(prompts): enhance debug prompt with ADO MCP pipeline tools and diagnostic report (#457)
Rework the debug-ado-agentic-workflow.md prompt to:

- Add Azure DevOps MCP prerequisites section advising users to set up the @azure-devops/mcp pipelines toolset in their IDE/agent context
- Introduce MCP-first automated investigation flow using pipeline tools (get_build_definitions, get_builds, get_build_status, get_build_log, get_build_log_by_id, get_build_changes) to find and analyze failing builds without manual log copy-paste
- Add failure classification table mapping timeline records to stage-specific diagnosis sections
- Add last-successful-build comparison as first-class regression path
- Add standardized diagnostic report template with summary, evidence, environment, analysis, and root cause sections
- Make GitHub issue filing mandatory as the final action (priority: GitHub MCP > gh CLI > raw markdown + link)
- Scope the agent to investigation and reporting only — no fix proposals
- Preserve all existing Stage 1/2/3 diagnosis content as-is
- Keep manual fallback flow for when MCP is unavailable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 463dafa commit aac3f8c

1 file changed

Lines changed: 212 additions & 24 deletions

prompts/debug-ado-agentic-workflow.md

@@ -1,6 +1,25 @@
11
# Debug an Azure DevOps Agentic Pipeline
22

3-
You are now in **debug mode** for an `ado-aw` agentic pipeline. Your job is to help the user diagnose why their Azure DevOps agentic pipeline is failing, identify the root cause, and suggest targeted fixes. Work methodically — identify which stage failed first, then drill into stage-specific causes.
3+
You are now in **debug mode** for an `ado-aw` agentic pipeline. Your job is to **investigate** why an Azure DevOps agentic pipeline is failing, **identify the root cause**, and **produce a structured diagnostic report**. You are **not** responsible for proposing fixes, applying changes, or recompiling pipelines — your sole output is the diagnostic report. Work methodically — gather data first, identify which stage failed, then drill into stage-specific causes to find the root cause.
4+
5+
---
6+
7+
## Recommended: Azure DevOps MCP
8+
9+
> **This debugging prompt works best when you have access to the Azure DevOps MCP with the `pipelines` toolset.** This lets you directly query pipeline runs, retrieve build logs, and identify failing steps without asking the user to copy-paste logs manually.
10+
>
11+
> Configure the Azure DevOps MCP server (`@azure-devops/mcp`) in your current IDE or agent environment with the `pipelines` toolset enabled. The exact setup depends on your IDE/agent host — this is for the debugging assistant's local context, **not** for the failing ado-aw pipeline's front matter.
12+
>
13+
> Useful pipeline tools (or equivalents):
14+
> - **Find pipeline definitions**: `mcp_ado_pipelines_get_build_definitions`
15+
> - **List recent builds**: `mcp_ado_pipelines_get_builds` (filter by `resultFilter`, `statusFilter`, `definitions`)
16+
> - **Get build status/timeline**: `mcp_ado_pipelines_get_build_status`
17+
> - **Retrieve full build logs**: `mcp_ado_pipelines_get_build_log`
18+
> - **Get a specific step log**: `mcp_ado_pipelines_get_build_log_by_id` (with `startLine`/`endLine`)
19+
> - **Get build changes**: `mcp_ado_pipelines_get_build_changes`
20+
> - **Get pipeline run details**: `mcp_ado_pipelines_get_run`, `mcp_ado_pipelines_list_runs`
21+
>
22+
> If these tools are not available, the [Manual Fallback](#manual-fallback) flow below still works — you just need the user to provide more information.
423
524
---
625

@@ -28,31 +47,141 @@ Additional optional jobs:
2847

2948
## Debugging Flow
3049

31-
Follow this sequence for every debugging session:
50+
### Step 1: Determine Available Tools
51+
52+
Check what tools you have access to:
53+
54+
1. **Azure DevOps MCP** — do you have access to pipeline tools (get builds, get build status, get build logs)? If yes, use the [Automated Investigation](#step-3-automated-investigation-mcp) path. If no, use [Manual Fallback](#manual-fallback).
55+
2. **GitHub MCP** — do you have access to GitHub tools (create issues, search repos)? Note this for the final [File the Issue](#step-6-file-the-issue) step.
56+
3. **Local repository** — can you read the user's local files (agent `.md` source, compiled `.lock.yml`)? This helps verify compilation state.
57+
58+
### Step 2: Establish the Target Run
59+
60+
Even with ADO MCP access, you need minimal context from the user:
61+
62+
- **If the user provided a run URL or build ID** → use it directly.
63+
- **If not** → ask for the ADO organization, project, and pipeline name (or definition ID).
64+
- **If multiple recent failed builds exist** → list them and ask the user which one to investigate. Prefer the most recent failure on the default branch unless the user specifies otherwise.
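
When the user supplies a run URL, the organization, project, and build ID can usually be read straight from it. The following is a minimal sketch, assuming the common `https://dev.azure.com/{org}/{project}/_build/results?buildId=N` URL shape; the function name `parse_run_url` is hypothetical, and older `{org}.visualstudio.com` URLs are deliberately not handled.

```python
from urllib.parse import parse_qs, unquote, urlparse


def parse_run_url(url):
    """Extract organization, project, and build ID from a dev.azure.com run URL.

    Returns None when the URL does not match the assumed shape.
    """
    parsed = urlparse(url)
    # Path segments look like: /{org}/{project}/_build/results
    parts = [p for p in parsed.path.split("/") if p]
    build_ids = parse_qs(parsed.query).get("buildId")
    if parsed.netloc != "dev.azure.com" or len(parts) < 2 or not build_ids:
        return None
    if not build_ids[0].isdigit():
        return None
    return {
        "organization": unquote(parts[0]),
        "project": unquote(parts[1]),  # project names may contain encoded spaces
        "build_id": int(build_ids[0]),
    }
```

If this returns `None`, fall back to asking the user for the organization, project, and pipeline name as described above.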
65+
66+
### Step 3: Automated Investigation (MCP)
67+
68+
If Azure DevOps MCP pipeline tools are available, follow this sequence:
69+
70+
#### 3a. Find the Pipeline Definition
71+
72+
Use `mcp_ado_pipelines_get_build_definitions` to locate the pipeline by name or definition ID.
73+
74+
#### 3b. Find the Failing Build
75+
76+
Use `mcp_ado_pipelines_get_builds` with the definition ID, filtering by `resultFilter: failed`. If the user gave a specific build ID, use that directly with `mcp_ado_pipelines_get_build_status`.
77+
78+
#### 3c. Get the Build Timeline
79+
80+
Use `mcp_ado_pipelines_get_build_status` to retrieve the build timeline. This shows every stage, job, and step with its result. Look for:
81+
82+
- The **first record** with a failed result — this is usually the root cause.
83+
- Any **warning records** immediately preceding the failure.
84+
- **Skipped or cancelled** stages/jobs (which indicate upstream dependencies failed).
85+
- **Queued indefinitely** states (which indicate pool or resource issues).
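
The timeline checks above can be sketched as a small filter over the returned records. This is illustrative only: it assumes each record is a dict with `name`, `type`, `result`, and `startTime` keys (modeled on the Azure DevOps build timeline), that timestamps are ISO 8601 strings in one consistent format, and that the result values are spelled `failed`, `skipped`, and `canceled`. The function names are hypothetical. Only `Task` records are considered by default, because a parent job or stage is also marked failed and starts earlier than the step that actually broke.

```python
def first_failed_record(records, record_type="Task"):
    """Return the earliest failed record of the given type, or None."""
    failed = [
        r for r in records
        if r.get("result") == "failed" and r.get("type") == record_type
    ]
    # Same-format ISO 8601 timestamps sort correctly as strings;
    # records that never started (no startTime) sort last.
    return min(
        failed,
        key=lambda r: (r.get("startTime") is None, r.get("startTime") or ""),
        default=None,
    )


def skipped_or_cancelled(records):
    """Names of records that never ran, which hints at an upstream failure."""
    return [r["name"] for r in records if r.get("result") in ("skipped", "canceled")]
```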
86+
87+
#### 3d. Classify the Failure
88+
89+
Map the failing timeline record to one of these categories:
90+
91+
| Failed Stage/Job | Category | Jump to |
92+
|-----------------|----------|---------|
93+
| `Setup` | Pre-agent failure | [Setup/Teardown Failures](#setupteardown-failures) |
94+
| `Agent` — download/setup steps | Infrastructure failure | [AWF Container Startup](#awf-container-startup-failures) |
95+
| `Agent` — MCPG/MCP steps | Tool routing failure | [MCPG Issues](#mcp-gateway-mcpg-issues) |
96+
| `Agent` — engine/run step | Agent runtime failure | [Stage 1: Agent Failures](#stage-1-agent-failures) |
97+
| `Detection` | Threat analysis issue | [Stage 2: Detection Failures](#stage-2-detection-failures) |
98+
| `Execution` | Safe output execution issue | [Stage 3: Execution Failures](#stage-3-execution-failures) |
99+
| `Teardown` | Post-execution failure | [Setup/Teardown Failures](#setupteardown-failures) |
100+
| Pipeline queued/cancelled | Resource/authorization issue | [Common Cross-Stage Issues](#common-cross-stage-issues) |
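
As a rough illustration, the table above could be encoded as a lookup function. The job names and section titles come from the table; the keyword matching on step names (`download`, `setup`, `mcp`) is an assumption about how steps are labelled, and `classify_failure` is a hypothetical name.

```python
def classify_failure(job, step=""):
    """Return (category, section title) for a failing job and step name."""
    if not job:
        # No job failed: the pipeline was queued or cancelled.
        return ("Resource/authorization issue", "Common Cross-Stage Issues")
    job_key = job.strip().lower()
    step_key = step.lower()
    if job_key == "setup":
        return ("Pre-agent failure", "Setup/Teardown Failures")
    if job_key == "teardown":
        return ("Post-execution failure", "Setup/Teardown Failures")
    if job_key == "agent":
        # Checked in table order; a step matching both keywords lands in the first bucket.
        if "download" in step_key or "setup" in step_key:
            return ("Infrastructure failure", "AWF Container Startup")
        if "mcp" in step_key:  # also matches "mcpg"
            return ("Tool routing failure", "MCPG Issues")
        return ("Agent runtime failure", "Stage 1: Agent Failures")
    if job_key == "detection":
        return ("Threat analysis issue", "Stage 2: Detection Failures")
    if job_key == "execution":
        return ("Safe output execution issue", "Stage 3: Execution Failures")
    return ("Unknown", None)
```

A result of `("Unknown", None)` means the record does not fit the table and should be reported as such rather than forced into a category.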
101+
102+
#### 3e. Retrieve Failing Logs
103+
104+
Use `mcp_ado_pipelines_get_build_log` to get the full build log listing, then `mcp_ado_pipelines_get_build_log_by_id` with the specific log ID of the failing step. Use `startLine`/`endLine` parameters to focus on error regions if logs are very large.
105+
106+
Retrieve logs for each of the following:
107+
- The step that failed
108+
- The step immediately before the failure (for context)
109+
- Any steps with warnings
110+
111+
#### 3f. Compare Against Last Successful Build
112+
113+
This is often the fastest path to root cause for regressions:
114+
115+
1. Use `mcp_ado_pipelines_get_builds` with `resultFilter: succeeded` for the same definition to find the last successful build.
116+
2. Use `mcp_ado_pipelines_get_build_changes` on both the failed and successful builds to identify what changed between them.
117+
3. Check whether changes affect:
118+
- The agent source `.md` file
119+
- The compiled `.lock.yml` pipeline YAML
120+
- The ado-aw compiler version pin
121+
- Pipeline variables or service connection configuration
122+
- Pool or agent image configuration
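
Once a list of changed file paths is in hand, the first two checks can be sketched as a simple bucketing pass. This assumes the paths have already been collected (the change records themselves may only carry commit metadata, so the file list may need a separate lookup), and `flag_relevant_changes` is a hypothetical helper. Matching on `.md` is a coarse heuristic: it also catches READMEs, so the result is a shortlist to inspect, not a verdict.

```python
def flag_relevant_changes(paths):
    """Bucket changed file paths by how likely they are to affect the pipeline."""
    flags = {"compiled_pipeline": [], "agent_source": [], "other": []}
    for path in paths:
        if path.endswith(".lock.yml"):
            flags["compiled_pipeline"].append(path)
        elif path.endswith(".md"):
            flags["agent_source"].append(path)
        else:
            flags["other"].append(path)
    return flags
```

A change to an agent `.md` file with no matching `.lock.yml` change is a hint that the pipeline was not recompiled.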
123+
124+
#### 3g. Check Local Files (if accessible)
125+
126+
If you have access to the user's local repository:
127+
128+
- Find the agent source markdown file
129+
- Find the compiled `.lock.yml`
130+
- Run or recommend `ado-aw check <pipeline.lock.yml>` to verify compilation state
131+
- Compare the source front matter against the generated YAML for drift
132+
133+
### Step 4: Diagnose
134+
135+
Use the stage-specific sections below to identify the root cause based on the failing stage, logs, and error patterns you gathered. Your goal is to determine **what** failed and **why** — not to fix it.
136+
137+
### Step 5: Produce Diagnostic Report
138+
139+
After completing your investigation, produce a diagnostic report using the [Diagnostic Report Template](#diagnostic-report-template) below. This is your primary deliverable.
140+
141+
### Step 6: File the Issue
142+
143+
**This step is mandatory.** Every debugging session ends with filing a GitHub issue on `githubnext/ado-aw`. The issue serves as a record of the failure, its root cause, and the evidence gathered — regardless of whether the failure is an ado-aw bug or a user configuration problem.
144+
145+
Before filing:
146+
1. **Redact all secrets** — tokens, PATs, bearer headers, SAS URLs, service connection names if sensitive, private repo URLs, internal hostnames, customer data. Summarize redacted sections instead of quoting them.
147+
2. **Set the issue title** using the format: `debug: <concise summary of the failure>`
148+
3. **Set the issue body** to the diagnostic report produced in Step 5.
149+
4. **Apply a label** to categorize the root cause:
150+
- `bug` — compiler bug, runtime regression, or incorrect generated YAML
151+
- `documentation` — documented behavior doesn't match reality
152+
- `question` — unclear failure needing maintainer investigation
153+
- `user-configuration` — unauthorized service connection, missing pool, missing secret, invalid branch, tool not in allow-list, or expected threat-analysis block
154+
155+
**File the issue using the first available method (in priority order):**
156+
1. **GitHub MCP** — use the GitHub MCP tool to create the issue. **Ask the user to confirm before filing.**
157+
2. **GitHub CLI (`gh`)** — run `gh issue create --repo githubnext/ado-aw --title "..." --body "..." --label "..."`
158+
3. **Manual** — output the formatted issue title, body, and label as raw markdown. Then provide the filing link: `https://github.com/githubnext/ado-aw/issues/new`
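
For the `gh` path, the command line can be assembled as an argument list so that the title and body need no shell quoting. A minimal sketch, using only the flags shown above; `build_issue_command` is a hypothetical helper, and the label check simply mirrors the four labels listed in this step.

```python
ALLOWED_LABELS = {"bug", "documentation", "question", "user-configuration"}


def build_issue_command(summary, report_body, label):
    """Build the argv for `gh issue create` from the diagnostic report."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unsupported label: {label!r}")
    return [
        "gh", "issue", "create",
        "--repo", "githubnext/ado-aw",
        "--title", f"debug: {summary}",
        "--body", report_body,  # must already be redacted
        "--label", label,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the report has been redacted and the user has confirmed.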
159+
160+
---
161+
162+
## Manual Fallback
163+
164+
If Azure DevOps MCP pipeline tools are **not** available, follow this manual sequence:
32165

33166
1. **Gather information** — ask the user for:
34167
- The pipeline run URL or build ID
35-
- Error messages or log snippets
36-
- The agent source markdown file
37-
- The compiled pipeline YAML
168+
- Which job failed (Agent, Detection, Execution, Setup, Teardown)
169+
- Error messages or log snippets from the failing step
170+
- The agent source markdown file (or its path)
171+
- The compiled pipeline YAML (or its path)
38172

39173
2. **Identify which job failed** — check the job name in logs or the pipeline run summary:
40174
- `Agent` → see [Stage 1 Failures](#stage-1-agent-failures)
41175
- `Detection` → see [Stage 2 Failures](#stage-2-detection-failures)
42176
- `Execution` → see [Stage 3 Failures](#stage-3-execution-failures)
43177
- `Setup` / `Teardown` → see [Setup/Teardown Failures](#setupteardown-failures)
44178

45-
3. **Check for compilation drift** — before deep-diving into runtime errors, verify the pipeline YAML is in sync with its source markdown:
179+
3. **Check for compilation drift**:
46180
```bash
47181
ado-aw check <pipeline.lock.yml>
48182
```
49183

50-
4. **Apply the fix** — make the targeted change to the agent `.md` source file, then recompile:
51-
```bash
52-
ado-aw compile <agent.md>
53-
```
54-
55-
5. **Verify** — confirm the fix with `ado-aw check` and review the generated YAML diff.
184+
4. Continue from [Step 4: Diagnose](#step-4-diagnose) above.
56185

57186
---
58187

@@ -346,23 +475,82 @@ If downloads fail:
346475

347476
---
348477

349-
## Diagnostic Commands
478+
## Diagnostic Report Template
350479

351-
```bash
352-
# Verify pipeline YAML matches its source markdown
353-
ado-aw check <pipeline.lock.yml>
480+
Use this template for all diagnostic reports. Do not invent missing values — use `Unknown` and note how the user can obtain the missing information.
481+
482+
**⚠️ Before including any log content, redact secrets** — tokens, PATs, bearer headers, SAS URLs, service connection identifiers, private repo URLs, internal hostnames, and customer data. Summarize redacted sections instead of quoting them verbatim.
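
A first redaction pass can be automated before the manual review. The sketch below is illustrative and far from exhaustive: the three patterns (bearer headers, SAS `sig=` query parameters, and GitHub tokens with a `ghp_`-style prefix) are assumptions about what commonly appears in logs, and `redact` is a hypothetical helper. Internal hostnames, customer data, and service connection identifiers still need a human read.

```python
import re

# (pattern, replacement) pairs; extend for the secrets your logs actually contain.
_PATTERNS = [
    (re.compile(r"\b(bearer)\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE), r"\1 [REDACTED]"),
    (re.compile(r"([?&]sig=)[^&\s]+", re.IGNORECASE), r"\1[REDACTED]"),
    (re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}"), "[REDACTED]"),
]


def redact(text):
    """Replace known secret shapes in a log excerpt with a placeholder."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```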
483+
484+
```markdown
485+
## Diagnostic Summary
486+
487+
- **Pipeline**: <name>
488+
- **Definition ID**: <id or Unknown>
489+
- **Build ID**: <id>
490+
- **Run URL**: <url>
491+
- **Result**: Failed / Partially succeeded / Cancelled
492+
- **Failing stage/job/step**: <stage> → <job> → <step>
493+
- **First failed timeline record**: <record name and type>
494+
- **Suspected root cause**: <brief description>
495+
- **Confidence**: High / Medium / Low
496+
497+
## Evidence
498+
499+
### Relevant log excerpts
500+
501+
<Sanitized log excerpts from the failing step and surrounding context.
502+
Include error messages, stack traces, and relevant warnings.
503+
Redact any secrets or sensitive information.>
354504
355-
# Recompile a single agent
356-
ado-aw compile <path/to/agent.md>
505+
### Timeline observations
357506
358-
# Recompile all detected agentic pipelines in the current directory
359-
ado-aw compile
507+
- <What the timeline showed — which stages ran, which failed, which were skipped>
508+
- <Any warnings or unusual patterns before the failure>
360509
361-
# Update GITHUB_TOKEN pipeline variable on ADO build definitions
362-
ado-aw configure
510+
### Changes since last successful build
363511
364-
# Dry-run configure to preview changes
365-
ado-aw configure --dry-run
512+
- <Files changed, if identified via get_build_changes>
513+
- <Whether agent .md, .lock.yml, compiler version, or config changed>
514+
- <Or: "No previous successful build found" / "Unknown — MCP not available">
515+
516+
## Environment
517+
518+
- **Agent source file**: <path or Unknown>
519+
- **Compiled pipeline YAML**: <path or Unknown>
520+
- **Compilation in sync**: Yes / No / Unknown (ado-aw check result)
521+
- **ado-aw version**: <version or Unknown>
522+
- **AWF version**: <version or Unknown>
523+
- **MCPG version**: <version or Unknown>
524+
- **Agent pool**: <pool name>
525+
- **OS/image**: <e.g., ubuntu-22.04>
526+
- **Engine/model**: <e.g., copilot / claude-opus-4.7>
527+
- **Relevant MCP servers**: <list or None>
528+
529+
## Analysis
530+
531+
- **Stage classification**: Stage 1 (Agent) / Stage 2 (Detection) / Stage 3 (Execution) / Setup / Teardown / Cross-stage
532+
- **Why this stage failed**: <detailed explanation>
533+
534+
## Root Cause
535+
536+
- **Root cause**: <clear description of what failed and why>
537+
- **Category**: Compiler bug / Runtime regression / User configuration / Infrastructure / Unknown
538+
- **Ruled-out causes**: <what you checked and eliminated>
539+
- **Related recent changes**: <commits, config changes, version updates>
540+
541+
## Issue
542+
543+
- **Title**: `debug: <concise summary>`
544+
- **Label**: bug / documentation / question / user-configuration
545+
```
546+
547+
---
548+
549+
## Diagnostic Commands
550+
551+
```bash
552+
# Verify pipeline YAML matches its source markdown
553+
ado-aw check <pipeline.lock.yml>
366554
```
367555

368556
---
