Commit 39c4dd6
AI-256: analyze-root-cause runs TSA first when an incident UUID is present (#79)
## Summary
Updates the `analyze-root-cause` skill so that, when intake yields a
Monte Carlo incident UUID, it auto-invokes the Troubleshooting Agent
(TSA, `run_troubleshooting_agent`) asynchronously right after intake and
merges its findings into the existing interactive investigation. Today
the skill never calls TSA at all (it is purely a manual investigation
walker), so invoking it via MCP does not consume MC credits the same way
the UI Troubleshooting Agent does. This PR closes that gap.
Docs-only change to one skill. No code, no new MCP tools —
`alert_assessment`, `run_troubleshooting_agent`, and
`get_troubleshooting_agent_results` already ship in the default toolset.
## What this PR enables
- **Auto-invoke TSA at a new Step 1.5** — when intake produces an
incident UUID and the user hasn't opted out, the skill kicks off
`run_troubleshooting_agent(incident_id=..., async_mode=True)` before any
manual investigation begins. The tool's built-in idempotency means an
existing successful run is reused; `force_rerun=True` is reserved for
explicit user request.
- **Three explicit skip conditions** — TSA is intentionally not invoked
when (1) intake produced no incident UUID (the no-incident reference
path), (2) the user is asking a narrow scoped question like "is X stale
right now?", or (3) the user explicitly opts out ("skip TSA", "manual
only").
- **Parallel manual + TSA flow** — Step 2 onwards continues the
interactive investigation while TSA runs in the background. Two poll
points (Step 4 ~30s in, Step 7 ~60–90s after) gather TSA results without
blocking. If TSA hasn't returned by Step 7, the skill presents the
manual findings and tells the user TSA is still working.
- **Findings-merge guidance in Step 7** — covers four cases: TSA agrees,
TSA contradicts, TSA returns low-signal, TSA failed. Each case has a
presentation pattern so the agent doesn't have to improvise.
- **Credit-cost note in the MCP Tools table** — calls out that
`alert_assessment` and `run_troubleshooting_agent` consume MC credits
the same way the UI TSA does. This is the question that originally
motivated the ticket (Slack thread linked on AI-256).
- **Two new Important rules** — never invoke TSA without an incident
UUID; honor explicit user opt-outs.
- **Reference + README touch-ups** — `intake-no-incident.md` now
explicitly notes TSA is skipped on that path (and how to rejoin the main
flow if an alert is found). `README.md` flow diagram shows the TSA
branch with poll points and the skip conditions.
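The Step 1.5 gate described in the list above can be sketched in Python. This is an illustration only, not the skill's implementation: the skill is markdown guidance and relies on agent judgment. `IntakeResult`, its fields, and `tsa_call_args` are hypothetical names; only `incident_id`, `async_mode`, and `force_rerun` come from the PR text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntakeResult:
    """Hypothetical summary of what intake learned; not part of the skill."""
    incident_id: Optional[str]      # Monte Carlo incident UUID, if intake found one
    narrow_question: bool = False   # agent's judgment, e.g. "is X stale right now?"
    user_opted_out: bool = False    # e.g. "skip TSA", "manual only"
    user_asked_rerun: bool = False  # explicit user request to re-run TSA


def tsa_call_args(intake: IntakeResult) -> Optional[dict]:
    """Return kwargs for run_troubleshooting_agent, or None when TSA is skipped."""
    # Skip condition 1: the no-incident reference path.
    if intake.incident_id is None:
        return None
    # Skip condition 2: a narrow scoped question.
    if intake.narrow_question:
        return None
    # Skip condition 3: explicit user opt-out.
    if intake.user_opted_out:
        return None
    # Default: async run, relying on the tool's built-in idempotency.
    args = {"incident_id": intake.incident_id, "async_mode": True}
    # force_rerun is reserved for an explicit user request.
    if intake.user_asked_rerun:
        args["force_rerun"] = True
    return args
```

Returning the arguments instead of calling the tool keeps the sketch independent of any particular MCP client.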
## Key Decisions
See
[AI-256](https://linear.app/montecarlodata/issue/AI-256/update-analyze-root-cause-skill-to-run-ta-first)
for the originating ask and [the parent Slack
thread](https://montecarloai.slack.com/archives/C0AM84B7F0D/p1778022753347149)
for the credit-consumption question that motivated it.
- **Skip `alert_assessment` in the auto-flow.** The `automated-triage`
skill uses `alert_assessment` as a cheap gate on every alert before
deciding whether to escalate to TSA. `analyze-root-cause` is a different
shape — it's invoked on a single incident the user already cares about,
so the user-facing latency cost of the extra ~2-min scoring step
outweighs the savings of skipping TSA on LOW-confidence alerts. Surfaced
`alert_assessment` in the tools table as available, but the auto-flow
goes straight to TSA. Revisit if cost/latency feedback says otherwise.
- **Async + parallel over sync.** TSA takes 4–8 minutes. Sync
(`async_mode=False`) would block the conversation for that long; "async
+ wait" would block silently. Async-with-parallel-manual-investigation
gives the user findings either way and lets TSA's deeper analysis fold
in when it lands.
- **No structured "narrow check" classifier.** The skill relies on agent
judgment with example user phrasings ("is X stale right now?", "what's
the row count of Y?"). If this proves too fuzzy in practice, follow-up
work could tighten it (e.g. require an explicit "investigate" verb to
gate TSA), but a classifier is overkill for v1.
- **Idempotency via tool default, not new flag.**
`run_troubleshooting_agent` already returns existing results when status
is `success` or `running`. The skill leans on that and explicitly
instructs the agent not to set `force_rerun=True` unless the user asks —
protecting against accidental billable re-runs.
- **No version bump.** Patterned after PR #76 (docs-only changes to a
single skill don't bump the plugin version). A version bump for
visibility would be easy to add if preferred.
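As a rough illustration of how the async-plus-parallel decision plays out at the Step 7 poll point, the sketch below sorts a poll result into the merge cases the skill covers, plus the "still working" case. The names `MergeCase` and `classify_merge` are hypothetical, and treating any status other than `success` or `running` as a failure is an assumption. The string comparison is only a placeholder for the agent's judgment about whether TSA and the manual investigation agree.

```python
from enum import Enum
from typing import Optional


class MergeCase(Enum):
    STILL_RUNNING = "still_running"  # present manual findings, say TSA is still working
    FAILED = "failed"
    LOW_SIGNAL = "low_signal"
    AGREES = "agrees"
    CONTRADICTS = "contradicts"


def classify_merge(tsa_status: str,
                   tsa_root_cause: Optional[str],
                   manual_root_cause: str) -> MergeCase:
    """Pick a presentation case for Step 7 from the latest TSA poll."""
    if tsa_status == "running":
        return MergeCase.STILL_RUNNING
    # Assumption: anything other than "success" or "running" counts as a failure.
    if tsa_status != "success":
        return MergeCase.FAILED
    # A successful run with no usable root cause is treated as low-signal.
    if not tsa_root_cause or not tsa_root_cause.strip():
        return MergeCase.LOW_SIGNAL
    # Placeholder comparison; the real skill leaves this to agent judgment.
    if tsa_root_cause.strip().lower() == manual_root_cause.strip().lower():
        return MergeCase.AGREES
    return MergeCase.CONTRADICTS
```

In every case the manual findings are available to present, which is the point of running the manual investigation in parallel with TSA.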
## Test plan
- [x] Diff between the marketplace-installed copy
(`~/.claude/plugins/marketplaces/mc-marketplace/skills/analyze-root-cause/`)
and the updated source shows only intentional additions — no drift, no
accidental edits.
- [x] Walked through both the incident-UUID path and the no-incident
path mentally — flow is consistent in both directions.
- [ ] Smoke-test the skill in Claude Code against a real Monte Carlo
incident UUID — confirm the agent kicks off TSA at Step 1.5 and merges
findings in Step 7.
- [ ] Smoke-test the no-incident path — confirm TSA is never invoked.
- [ ] Smoke-test explicit opt-out ("skip TSA, just investigate
manually") — confirm TSA is not invoked.
## Checklist
- [x] Docs updated
- [na] Tests added (markdown-only change, no code)
- [na] Version bumped (docs-only single-skill change; matches PR #76
pattern)
- [na] Migration notes
- [x] No secrets or credentials in changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent c9a68e2 commit 39c4dd6
3 files changed
Lines changed: 66 additions & 11 deletions
Lines changed: 2 additions & 0 deletions