[Claude] Improve troubleshoot-ci-build skill with scripts and heuristics #8181

Merged: lucaspimentel merged 26 commits into master from lpimentel/ai-skills/troubleshoot-azure-devops-2 on Feb 24, 2026
@lucaspimentel lucaspimentel commented Feb 9, 2026

Member

Summary of changes

Improves the /troubleshoot-ci-build skill by extracting analysis logic into a standalone PowerShell script, adding better failure heuristics, and detecting snapshot mismatches and build errors.

Reason for change

  • Standalone script: Lets developers run build analysis without the AI agent and supports automation via JSON output
  • Remove build comparison: All tests are assumed to pass in master, so the comparison didn't add value
  • Single-runtime flake detection: A test that fails on one runtime but passes on others is a better indicator of flakiness
  • Snapshot mismatch detection: Proactively suggests UpdateSnapshotsFromBuild when snapshot diffs are detected (as opposed to span count mismatches, which indicate deeper issues)
  • Build error extraction: Surface compilation errors (CS/NU/NETSDK) and Nuke target exceptions directly
  • Security: Fix command injection vulnerability in Azure CLI invocation
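The single-runtime heuristic above can be sketched as follows. This is a minimal Python illustration, not the PowerShell in the script: the function name `find_single_runtime_failures` and the `(test_name, runtime, passed)` tuple shape are assumptions made for the example.

```python
from collections import defaultdict


def find_single_runtime_failures(results):
    """Return {test_name: runtime} for tests that failed on exactly one
    runtime while passing on at least one other runtime.

    `results` is an iterable of (test_name, runtime, passed) tuples
    (a hypothetical shape chosen for this sketch).
    """
    failed = defaultdict(set)
    passed = defaultdict(set)
    for test_name, runtime, ok in results:
        (passed if ok else failed)[test_name].add(runtime)

    candidates = {}
    for test_name, failed_runtimes in failed.items():
        passed_elsewhere = passed[test_name] - failed_runtimes
        # One failing runtime plus a pass somewhere else hints at flakiness.
        if len(failed_runtimes) == 1 and passed_elsewhere:
            candidates[test_name] = next(iter(failed_runtimes))
    return candidates


results = [
    ("SubmitsTraces", "net6.0", False),
    ("SubmitsTraces", "net8.0", False),
    ("HttpClientTests", "net6.0", True),
    ("HttpClientTests", "net8.0", False),
    ("HttpClientTests", "net9.0", True),
]
print(find_single_runtime_failures(results))
```

A test that fails on every runtime (like `SubmitsTraces` in the sample data) is deliberately not reported, since that pattern looks more like a real failure.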

Implementation details

PowerShell Script (tracer/tools/Get-AzureDevOpsBuildAnalysis.ps1):

  • Parameter sets: -BuildId or -PullRequest (mutually exclusive)
  • Optional -IncludeLogs switch; output formats: table (default) or json
  • Timeout/cancellation detection, single-runtime flake indicators
  • Extract-BuildErrors: Captures compilation errors (CS, NU, NETSDK, etc.) and Nuke target exceptions from logs
  • Extract-FailedTests: Expanded patterns for xUnit, crash blocks, comma-separated profiler results, snapshot failures
  • Safe Azure CLI invocation: argument array + call operator (no Invoke-Expression)
  • Structured error messages with area, resource, exit code, and full command for debugging
  • Route parameters split into separate arguments (fixes misleading TF400813 auth errors)
  • #Requires -Version 5.1 directive for native version enforcement
  • Pure PowerShell — no jq/grep/sed dependencies
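As a rough illustration of the kind of log scanning Extract-BuildErrors is described as doing, here is a minimal Python sketch. The regular expressions and the sample log lines are assumptions; the actual patterns in the PowerShell script are not shown on this page.

```python
import re

# Assumed log shapes: "error CS0103: message" and the Nuke message
# 'Target "Name" threw an exception' (the latter appears in the example output).
ERROR_CODE = re.compile(r"\berror\s+((?:CS|NU|NETSDK)\d+)\s*:\s*(.+)")
NUKE_TARGET = re.compile(r'Target "([^"]+)" threw an exception')


def extract_build_errors(log_text):
    """Collect compiler/NuGet/SDK errors and Nuke target exceptions, in order."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_CODE.search(line)
        if match:
            errors.append(f"{match.group(1)}: {match.group(2).strip()}")
            continue
        match = NUKE_TARGET.search(line)
        if match:
            errors.append(f'Target "{match.group(1)}" threw an exception')
    # dict.fromkeys keeps first-seen order while dropping repeated lines.
    return list(dict.fromkeys(errors))


sample_log = """\
Program.cs(10,5): error CS0103: The name 'foo' does not exist in the current context
warning CS0168: The variable 'e' is declared but never used
error NU1101: Unable to find package Example.Package
Target "RunWindowsAzureFunctionsTests" threw an exception
Program.cs(10,5): error CS0103: The name 'foo' does not exist in the current context
"""
for error in extract_build_errors(sample_log):
    print(error)
```

Warnings are skipped on purpose, and repeated error lines (common when several projects build the same file) are reported once.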

Skill changes (SKILL.md, failure-patterns.md):

  • Calls script instead of inline bash commands (~670 → ~500 lines)
  • Snapshot mismatch detection: distinguishes snapshot content diffs from span count mismatches
  • Suggests UpdateSnapshotsFromBuild Nuke target when appropriate
  • PR link and source branch included in skill output header
  • Unicode emoji requirement for output formatting
  • Slack alerting guidance for persistent CI failures (#apm-dotnet)
  • Updated decision tree and quick reference table
  • Added scripts-reference.md for script documentation

Example output

Running /troubleshoot-ci-build build 195754 produces:

# CI Failure Analysis for Build 195754

Status: ❌ Failed
Build: https://dev.azure.com/datadoghq/a51c4863-3eb4-4c5d-878a-58b41a049e4e/_build/results?buildId=195754

PR: https://github.com/DataDog/dd-trace-dotnet/pull/7628
Branch: lpimentel/APMSVLS-58-azfunc-host-parenting
Commit: 674e5d0c

## Quick Overview

Failed Tasks (7):
- Run Azure Functions tests (5 occurrences across Windows net6.0, net7.0, net8.0, net9.0, net10.0)
- docker-compose run IntegrationTests (Group 2) (2 occurrences)

Failed Jobs (7):
- windows net6.0 / windows net7.0 / windows net8.0 / windows net9.0 / windows net10.0 — Azure Functions stage
- DockerTest debian_net10.0_group2 — Linux integration tests
- DockerTest debian_netcoreapp3.1_group2 — Linux integration tests

Failed Stages (3):
- integration_tests_azure_functions
- integration_tests_arm64_debugger
- integration_tests_linux

Timed Out Jobs (1, canceled after ~60 min):
- Test alpine_net9.0_true (60.2 min)

Collateral Cancellations (2, < 5 min):
- linux ddtrace_ubuntu_9_0-noble
- linux ddtrace_ubuntu_9_0-bookworm-slim

Failed Tests (5 — all Azure Functions SubmitsTraces):
- AzureFunctionsTests+IsolatedRuntimeV4AspNetCore.SubmitsTraces
- AzureFunctionsTests+IsolatedRuntimeV4AspNetCoreV1.SubmitsTraces
- AzureFunctionsTests+IsolatedRuntimeV4HostLogsDisabled.SubmitsTraces
- AzureFunctionsTests+IsolatedRuntimeV4.SubmitsTraces
- AzureFunctionsTests+IsolatedRuntimeV4SdkV1.SubmitsTraces

Build Errors (1):
- Target "RunWindowsAzureFunctionsTests" threw an exception

### Initial Assessment

- 🔴 Azure Functions tests — All 5 SubmitsTraces test variants failed across all Windows
  runtimes. These are likely real failures related to the PR changes.
- 🟡 ARM64 debugger timeout — Test alpine_net9.0_true timed out at 60.2 min — likely
  flaky infra (single-runtime ARM64 timeout pattern).
- 🔵 Linux Group 2 failures (debian net10.0 + netcoreapp3.1) — Need more info to
  categorize; could be infra or unrelated.

## 🔍 What would you like to investigate?

1. Categorize failures — Deeper analysis of failure types
2. View specific logs — Download logs for failed tasks
3. Show Azure Functions test details — Focus on the AzFunc SubmitsTraces failures
4. Update snapshots — If failures are snapshot mismatches (needs verification first)

Test coverage

Manual verification:

  • Get-Help Get-AzureDevOpsBuildAnalysis.ps1 -Full parses correctly
  • Parameter sets validated (mutually exclusive -BuildId/-PullRequest)
  • Test failure patterns verified against real CI builds
  • Tested on Windows with PowerShell 7.5.x

Other details

Internal development tooling for engineering teams. Not included in any customer-facing artifacts.

"A flaky test walks into a bar. Sometimes." — Claude 🤖

@lucaspimentel changed the title from "Refactor Claude skill, extract standalone scripts" to "Extract Get-AzureDevOpsBuildAnalysis.ps1 from CI troubleshooting skill" on Feb 9, 2026
@lucaspimentel force-pushed the lpimentel/ai-skills/troubleshoot-azure-devops-2 branch 4 times, most recently from 5ebe4df to 0d0113b, on February 11, 2026 23:45
@lucaspimentel changed the title from "Extract Get-AzureDevOpsBuildAnalysis.ps1 from CI troubleshooting skill" to "[Claude] Refactor CI troubleshooting skill to extract re-usable script" on Feb 12, 2026
@lucaspimentel force-pushed the lpimentel/ai-skills/troubleshoot-azure-devops-2 branch 2 times, most recently from d432fad to 35dc707, on February 13, 2026 21:24
@lucaspimentel changed the title from "[Claude] Refactor CI troubleshooting skill to extract re-usable script" to "Extract CI troubleshooting logic into standalone PowerShell script" on Feb 17, 2026
@lucaspimentel changed the title from "Extract CI troubleshooting logic into standalone PowerShell script" to "[Claude] Extract CI troubleshooting logic into standalone PowerShell script" on Feb 17, 2026
@lucaspimentel force-pushed the lpimentel/ai-skills/troubleshoot-azure-devops-2 branch from 35dc707 to 697bf66 on February 17, 2026 19:17
@lucaspimentel changed the title from "[Claude] Extract CI troubleshooting logic into standalone PowerShell script" to "[CI Skill] Improve troubleshoot-ci-build with scripts and heuristics" on Feb 17, 2026
@lucaspimentel changed the title from "[CI Skill] Improve troubleshoot-ci-build with scripts and heuristics" to "[Claude] Improve troubleshoot-ci-build skill with scripts and heuristics" on Feb 17, 2026
@lucaspimentel added the AI Generated label (Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos) on Feb 17, 2026
@lucaspimentel marked this pull request as ready for review on February 19, 2026 22:14
@lucaspimentel requested review from a team as code owners on February 19, 2026 22:14
@lucaspimentel changed the title from "[Claude] Improve troubleshoot-ci-build skill with scripts and heuristics" to "[Claude] Improve troubleshoot-ci-build skill with scripts and heuristics" on Feb 19, 2026
@lucaspimentel changed the title from "[Claude] Improve troubleshoot-ci-build skill with scripts and heuristics" to "[Claude] Improve troubleshoot-ci-build skill with scripts and heuristics" on Feb 19, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0054b306bc

Comment thread tracer/tools/Get-AzureDevOpsBuildAnalysis.ps1
Comment thread tracer/tools/Get-AzureDevOpsBuildAnalysis.ps1 Outdated
@NachoEchevarria

Collaborator

Nice! Not for this PR, but I was thinking that we should have a single skill for CI analysis. Right now, we have this one for Azure and the one from the DD marketplace for GitLab and GH Actions. It would be nice to have one for everything in our pipeline.

I actually asked Claude to generate one yesterday, but I haven't tested or even reviewed it: master...nacho/analyzeCiSkill

@lucaspimentel force-pushed the lpimentel/ai-skills/troubleshoot-azure-devops-2 branch from eb560f0 to 7264897 on February 20, 2026 16:45
@pr-commenter

pr-commenter Bot commented Feb 20, 2026


Benchmarks

Benchmark execution time: 2026-02-24 04:28:47

Comparing candidate commit 8b3c42a in PR branch lpimentel/ai-skills/troubleshoot-azure-devops-2 with baseline commit caa8d05 in branch master.

Found 5 performance improvements and 15 performance regressions! Performance is the same for 156 metrics, 16 unstable metrics.

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

  • 🟩 execution_time [-82.840ms; -82.784ms] or [-40.796%; -40.769%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1

  • 🟩 throughput [+37619.762op/s; +54632.366op/s] or [+5.595%; +8.125%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorMoreComplexBody netcoreapp3.1

  • 🟥 execution_time [+11.510ms; +14.900ms] or [+5.816%; +7.529%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

  • 🟩 execution_time [-22.793ms; -16.985ms] or [-10.475%; -7.806%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net472

  • 🟥 throughput [-925.810op/s; -903.411op/s] or [-10.006%; -9.764%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net472

  • 🟥 throughput [-62982194.246op/s; -62051718.904op/s] or [-31.521%; -31.055%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472

  • 🟩 throughput [+105.468op/s; +132.534op/s] or [+9.973%; +12.532%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

  • 🟥 throughput [-205.496op/s; -133.815op/s] or [-12.644%; -8.233%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472

  • 🟥 execution_time [+107.690µs; +112.110µs] or [+5.681%; +5.914%]
  • 🟥 throughput [-29.501op/s; -28.321op/s] or [-5.592%; -5.368%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1

  • 🟥 execution_time [+350.998µs; +381.709µs] or [+12.541%; +13.638%]
  • 🟥 throughput [-42.894op/s; -39.762op/s] or [-12.005%; -11.129%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice netcoreapp3.1

  • 🟥 execution_time [+351.290µs; +378.030µs] or [+9.014%; +9.700%]
  • 🟥 throughput [-22.688op/s; -21.207op/s] or [-8.842%; -8.265%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch netcoreapp3.1

  • 🟩 throughput [+29008.840op/s; +40302.430op/s] or [+6.228%; +8.652%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0

  • 🟥 throughput [-4326.809op/s; -2818.917op/s] or [-18.625%; -12.134%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark netcoreapp3.1

  • 🟥 throughput [-3560.158op/s; -1776.665op/s] or [-17.906%; -8.936%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog netcoreapp3.1

  • 🟥 execution_time [+25.066ms; +29.153ms] or [+14.316%; +16.649%]

scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net6.0

  • 🟥 throughput [-17959.211op/s; -13668.435op/s] or [-7.565%; -5.757%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net472

  • 🟥 throughput [-67312.849op/s; -64229.568op/s] or [-6.037%; -5.761%]

Add PowerShell installation instructions and version requirements:
- Minimum: PowerShell 5.1 (Windows built-in)
- Recommended: PowerShell 7+ (cross-platform)
- Always prefer pwsh over powershell.exe when available

Add runtime version check to script with helpful error messages.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Remove -CompareWithMaster and -CompareWithBuild from the build
analysis script and all skill docs. All tests are assumed to pass
in master. Add single-runtime failure as a flaky test indicator
(test fails on one runtime but passes on others).

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
When CI failures persist after retry, engineers should alert #apm-dotnet.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Replace Invoke-Expression with safe argument array and call operator
to prevent command injection vulnerabilities in Azure CLI invocation.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
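The commit above replaces a string passed to Invoke-Expression with an argument array and the call operator. The same idea, sketched in Python rather than PowerShell: pass arguments as a list so no shell ever parses them. The `run_cli` helper is hypothetical, and a child Python process stands in for the Azure CLI so the example runs anywhere.

```python
import subprocess
import sys


def run_cli(executable, args):
    """Run a CLI with an argument list; no shell interprets the arguments."""
    completed = subprocess.run(
        [executable, *args], capture_output=True, text=True, check=False
    )
    return completed.returncode, completed.stdout


# Stand-in for the real CLI: a child process that prints each argument it received.
echo_args = ["-c", "import sys; print('\\n'.join(sys.argv[1:]))"]

# A hostile value that would run a second command if a shell parsed it.
hostile = "12345; echo INJECTED"
code, out = run_cli(sys.executable, echo_args + ["--build-id", hostile])
print(out)
```

The hostile value arrives at the child process as one literal argument; nothing after the semicolon is executed.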
Add structured context (Area, Resource, Exit Code) to API failures
for easier debugging when calls fail.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Add regex patterns to detect additional test failure formats:
- xUnit format: [xUnit.net ...] TestName [FAIL]
- Stack traces: at Namespace.Class.Method()
- Span count mismatches: Expected N spans but got M
- Snapshot verification failures

Improves test name extraction from CI error messages.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
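A minimal Python sketch of two of the patterns listed in the commit above (the xUnit `[FAIL]` line and the span count mismatch). The exact regular expressions and the sample log are assumptions for illustration; the script's own patterns may differ.

```python
import re

# Assumed shapes, based on the formats named in the commit message.
XUNIT_FAIL = re.compile(r"\[xUnit\.net[^\]]*\]\s+(\S.*?)\s+\[FAIL\]")
SPAN_COUNT = re.compile(r"Expected (\d+) spans but got (\d+)")


def scan_test_log(log_text):
    """Return failed test names and (expected, actual) span count pairs."""
    failed_tests = []
    span_mismatches = []
    for line in log_text.splitlines():
        match = XUNIT_FAIL.search(line)
        if match and match.group(1) not in failed_tests:
            failed_tests.append(match.group(1))
        match = SPAN_COUNT.search(line)
        if match:
            span_mismatches.append((int(match.group(1)), int(match.group(2))))
    return {"failed_tests": failed_tests, "span_mismatches": span_mismatches}


sample_log = """\
[xUnit.net 00:02:11.07]     Samples.Tests.HttpTests.SubmitsTraces [FAIL]
  Expected 4 spans but got 3
[xUnit.net 00:02:15.90]     Samples.Tests.HttpTests.OtherTest [PASS]
   at Samples.Tests.HttpTests.SubmitsTraces()
"""
print(scan_test_log(sample_log))
```

The stack trace line is intentionally not matched here: it yields a method name rather than a test name.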
Add the complete command line to error output for easier
debugging and manual reproduction of API failures.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
- Rename $args to $azArgs to avoid PowerShell automatic variable conflict
- Split route parameters by whitespace into separate arguments

The Azure CLI --route-parameters flag expects each parameter as a
separate argument, not a single space-separated string. Passing
"project=dd-trace-dotnet buildId=12345" as one argument caused
authentication errors. Now splits into individual arguments:
project=dd-trace-dotnet and buildId=12345

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Azure CLI's --route-parameters requires each key=value pair as a
separate argument. Passing them as a single space-separated string
causes misleading TF400813 authorization errors.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
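The route parameter fix described above amounts to splitting the space-separated string before it reaches the argument list. A small Python sketch of that step, assuming the `az devops invoke` flags `--area`, `--resource`, and `--route-parameters`; the helper name `build_az_args` and the `builds` resource are illustrative choices, not taken from the script.

```python
def build_az_args(area, resource, route_parameters):
    """Build the argument list for `az devops invoke`.

    `route_parameters` is a space-separated string such as
    "project=dd-trace-dotnet buildId=12345"; each key=value pair must
    become its own argument.
    """
    args = ["devops", "invoke", "--area", area, "--resource", resource]
    pairs = route_parameters.split()
    if pairs:
        args.append("--route-parameters")
        args.extend(pairs)
    return args


args = build_az_args("build", "builds", "project=dd-trace-dotnet buildId=12345")
print(args)
```

Passing `"project=dd-trace-dotnet buildId=12345"` as a single list element would reproduce the original bug, because the CLI would see one malformed parameter instead of two.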
Based on analysis of recent failed builds (196072, 195867, 195830, etc.):
- Add comma-separated xUnit pattern for profiler tests
- Add crash block detection for bare test names after host crash
- Remove stack trace pattern (extracted method names, not test names)

Verified against builds with each pattern type.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Replace manual version validation with #Requires -Version 5.1
directive, which PowerShell enforces natively before execution.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Add Extract-BuildErrors function to capture compilation errors (CS, NU,
NETSDK, etc.) and Nuke target exceptions. Add test host crash and
framework-specific failure patterns to Extract-FailedTests. Wire new
BuildErrors field into result object and table output.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Always return PSCustomObject to the pipeline and display
the human-readable summary via Write-Host. Callers can
pipe to ConvertTo-Json for JSON output.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
Redirect stderr to a temp file instead of merging with
stdout via 2>&1. Prevents az/gh CLI warnings from
corrupting JSON output before ConvertFrom-Json parsing.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
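Why the stderr redirect matters, sketched in Python: if warnings are merged into stdout, the JSON parser sees non-JSON text first and fails. Sending stderr to a temporary file keeps stdout clean. The `run_json_command` helper and the child process that stands in for the az/gh CLI are assumptions made for this example.

```python
import json
import subprocess
import sys
import tempfile


def run_json_command(argv):
    """Run a command, parse stdout as JSON, and return (data, stderr_text)."""
    with tempfile.TemporaryFile() as err_file:
        completed = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=err_file, check=False
        )
        err_file.seek(0)
        stderr_text = err_file.read().decode("utf-8", errors="replace")
    if completed.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {completed.returncode}: {stderr_text.strip()}"
        )
    return json.loads(completed.stdout.decode("utf-8")), stderr_text


# Stand-in for a CLI that prints a warning to stderr and JSON to stdout.
child_code = (
    "import sys, json; "
    "sys.stderr.write('WARNING: stand-in warning\\n'); "
    "print(json.dumps({'id': 195754}))"
)
data, warnings = run_json_command([sys.executable, "-c", child_code])
print(data)
```

The warning text is still available for error messages, but it never reaches the JSON parser.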
Extract API reference and Windows CLI pitfalls to
references/cli-reference.md, reducing SKILL.md from
634 to 413 lines. Remove redundant README.md and
duplicate Supporting Files section. Make description
more trigger-friendly. Remove disable-model-invocation.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
- Add Bash(pwsh:*) to allowed-tools for auto-approval
- Replace inline categorization rules with pointer to failure-patterns.md
- Sharpen "load when" guidance for reference files
- Condense output format template from ~70 to ~15 lines
- Add table of contents to failure-patterns.md

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
- Remove stale scratchpad references in cli-reference.md
- Remove redundant "Key Learnings" header in SKILL.md
- Remove redundant curl/grep commands from failure-patterns.md

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
@lucaspimentel force-pushed the lpimentel/ai-skills/troubleshoot-azure-devops-2 branch from 5bb3c9d to 8b3c42a on February 24, 2026 03:46
@lucaspimentel merged commit 2622ed1 into master on Feb 24, 2026
98 of 101 checks passed
@lucaspimentel deleted the lpimentel/ai-skills/troubleshoot-azure-devops-2 branch on February 24, 2026 18:23
@github-actions (Bot) added this to the vNext-v3 milestone on Feb 24, 2026
@lucaspimentel added the for-ai-agents label (🤖 files used by AI agents, not humans) on Feb 27, 2026

Labels

AI Generated: Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos
for-ai-agents: 🤖 files used by AI agents, not humans

2 participants