
Commit 14a4dff

janisz and claude committed
Remove untested timing and performance claims from multi-agent RCA
Remove all timing and performance claims across multi-agent RCA implementation:

- Remove "55% faster than sequential" and "55% time savings" from README.md
- Remove agent time budgets (90-120s, 60-90s) from agent prompts
- Remove "2-3 min vs 4-5 min" claims from triage.md
- Remove specific timeout values from constants.md descriptions
- Remove performance claim from rca-aggregation-rules.md
- Simplify aggregation logic in triage.md to reference rca-aggregation-rules.md
- Add clarifying note to infra-detective.md about infrastructure patterns

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 3c552f1 commit 14a4dff

7 files changed

Lines changed: 27 additions & 73 deletions


workflows/acs-triage/.claude/agents/code-archaeologist.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -217,7 +217,6 @@ confidence = min(confidence, 95)

 ## Notes

-- **Time budget**: 90-120 seconds total
 - **Parallel execution**: May run concurrently with other agents
 - **Fallback**: If git commands fail, analyze based on file paths alone
 - **Focus**: Recent changes are most suspicious - prioritize those
```

workflows/acs-triage/.claude/agents/infra-detective.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -44,7 +44,7 @@ Read known infrastructure patterns from:

 ### 2. Error Pattern Matching

-Match the error message against known patterns:
+Match the error message against known infrastructure flake patterns. Note: These patterns are specific to infrastructure flake detection and differ from the team assignment patterns in `reference/error-signatures.md`.

 ```python
 infrastructure_patterns = {
@@ -308,7 +308,6 @@ confidence = min(confidence, 95)

 ## Notes

-- **Time budget**: 60-90 seconds total
 - **Parallel execution**: Runs concurrently with other agents
 - **Authority**: Has final say on infrastructure flake classification when confidence ≥80%
 - **Focus**: Infrastructure patterns are well-documented - use pattern matching heavily
````
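The note added to `infra-detective.md` separates infrastructure flake patterns from the team-assignment signatures in `reference/error-signatures.md`. The matching step can be sketched as below, assuming each pattern is a regex paired with a confidence score. The pattern names, regexes, scores, `InfraFinding`, and `match_infrastructure_patterns` are hypothetical placeholders, because the real `infrastructure_patterns` table is cut off in this diff; only the 80% authority threshold comes from the agent's Notes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: (regex, confidence). The real table is not visible in this diff.
INFRASTRUCTURE_PATTERNS = {
    "image_pull_failure": (re.compile(r"ErrImagePull|ImagePullBackOff"), 90),
    "dns_resolution": (re.compile(r"no such host|server misbehaving", re.IGNORECASE), 85),
    "cluster_provisioning": (re.compile(r"failed to create cluster", re.IGNORECASE), 80),
}

# From the agent's Notes: final say on flake classification at >= 80% confidence.
AUTHORITY_THRESHOLD = 80


@dataclass
class InfraFinding:
    flake_classification: str  # "infrastructure-flake" or "unknown" in this sketch
    confidence: int
    matched_pattern: Optional[str] = None


def match_infrastructure_patterns(error_message: str) -> InfraFinding:
    """Return the highest-confidence infrastructure pattern found in the message."""
    best = InfraFinding("unknown", 0)
    for name, (regex, confidence) in INFRASTRUCTURE_PATTERNS.items():
        if regex.search(error_message) and confidence > best.confidence:
            best = InfraFinding("infrastructure-flake", confidence, name)
    return best


def has_flake_authority(finding: InfraFinding) -> bool:
    """True when the detective's verdict should override the other agents."""
    return (
        finding.flake_classification == "infrastructure-flake"
        and finding.confidence >= AUTHORITY_THRESHOLD
    )
```

Taking the highest-scoring match, rather than the first one, keeps the verdict independent of dictionary order when a log line trips several patterns.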

workflows/acs-triage/.claude/agents/issue-correlator.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -307,8 +307,7 @@ confidence = min(confidence, 90)

 ## Notes

-- **Time budget**: 60-90 seconds total
 - **Parallel execution**: Runs concurrently with other agents
-- **Search limit**: Max 20 results per query to stay within time budget
+- **Search limit**: Max 20 results per query
 - **Similarity threshold**: Only include issues ≥70% similarity in output
 - **Focus**: Historical context helps identify recurring patterns and known solutions
```
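The correlator's two remaining limits, at most 20 results per query and a 70% similarity floor (`MIN_SIMILARITY_THRESHOLD = 0.70` in `constants.md`), can be applied as a post-filter on search results. A minimal sketch, assuming each hit carries a similarity score in [0, 1]; `SearchHit` and `filter_similar_issues` are hypothetical names, not part of the workflow.

```python
from dataclasses import dataclass

MAX_RESULTS_PER_QUERY = 20       # "Search limit: Max 20 results per query"
MIN_SIMILARITY_THRESHOLD = 0.70  # constants.md: include issues at >= 70% similarity


@dataclass(frozen=True)
class SearchHit:
    key: str           # a JIRA key such as "ROX-12345"
    similarity: float  # hypothetical score in [0, 1]


def filter_similar_issues(hits: list[SearchHit]) -> list[SearchHit]:
    """Cap one query's hits at 20, then drop those below the similarity floor."""
    capped = hits[:MAX_RESULTS_PER_QUERY]
    kept = [hit for hit in capped if hit.similarity >= MIN_SIMILARITY_THRESHOLD]
    # Most similar first, so the report leads with the strongest matches.
    return sorted(kept, key=lambda hit: hit.similarity, reverse=True)
```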

workflows/acs-triage/.claude/commands/triage.md

Lines changed: 19 additions & 59 deletions
```diff
@@ -118,8 +118,6 @@ For issues where `issueType === "CI_FAILURE"`:

 After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:

-**Time budget:** 2-3 minutes per CI_FAILURE issue (parallel execution).
-
 **Multi-Agent Process:**

 1. **Create RCA Team**
@@ -132,19 +130,19 @@ After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:

 2. **Spawn 3 Agents in Parallel** (single message, multiple Agent calls)

-**Agent 1: Code Archaeologist** (90-120s)
+**Agent 1: Code Archaeologist**
 - Tools: GitHub MCP, git blame, git log
 - Task: Find problematic commit/PR that introduced the issue
 - Reads: `workflows/acs-triage/.claude/agents/code-archaeologist.md`
 - Output: `artifacts/acs-triage/rca/{issue_key}/archaeology-findings.json`

-**Agent 2: Infrastructure Detective** (60-90s)
+**Agent 2: Infrastructure Detective**
 - Tools: Pattern matching, error signatures
 - Task: Classify as infrastructure flake vs real bug
 - Reads: `workflows/acs-triage/.claude/agents/infra-detective.md`
 - Output: `artifacts/acs-triage/rca/{issue_key}/infra-findings.json`

-**Agent 3: Cross-Issue Correlator** (60-90s)
+**Agent 3: Cross-Issue Correlator**
 - Tools: JIRA MCP (search historical issues)
 - Task: Find similar past issues and failure frequency
 - Reads: `workflows/acs-triage/.claude/agents/issue-correlator.md`
```
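Step 2 spawns the three agents in one message and step 3 waits for all of them. The workflow does this with Agent tool calls, not Python; the sketch below is only an in-process analogue using `concurrent.futures`, assuming each agent is a callable that takes an issue key and returns a findings dict. The 120-second default is `RCA_AGENT_TIMEOUT_SECONDS` from `constants.md`, and mapping a failed or late agent to `None` follows the Null Handling rule in `rca-aggregation-rules.md`; `run_rca_agents` itself is hypothetical.

```python
import concurrent.futures as cf
from typing import Callable, Optional

RCA_AGENT_TIMEOUT_SECONDS = 120  # constants.md: max time per agent

# Hypothetical signature: an agent takes an issue key and returns its findings dict.
AgentFn = Callable[[str], dict]


def run_rca_agents(
    issue_key: str,
    agents: dict[str, AgentFn],
    timeout: float = RCA_AGENT_TIMEOUT_SECONDS,
) -> dict[str, Optional[dict]]:
    """Start all agents at once, wait for them together, and map failures to None."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(agents)))
    futures = {name: pool.submit(fn, issue_key) for name, fn in agents.items()}

    # Every agent starts at the same moment, so one shared wait gives each the same window.
    cf.wait(list(futures.values()), timeout=timeout)

    results: dict[str, Optional[dict]] = {}
    for name, future in futures.items():
        if future.done() and future.exception() is None:
            results[name] = future.result()
        else:
            # Null handling: a late or crashed agent contributes nothing instead of
            # failing the whole aggregation.
            results[name] = None

    # Do not block on stragglers. Python threads cannot be killed, so a late agent
    # keeps running in the background until it returns on its own.
    pool.shutdown(wait=False)
    return results
```

A single shared `wait` rather than a per-future `result(timeout=...)` loop avoids stacking timeouts: with sequential per-future waits, three slow agents could take up to three times the limit.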
````diff
@@ -174,81 +172,46 @@ After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:
 ```

 3. **Wait for Agents to Complete**
-- Agents run concurrently (max 120s total vs 300s sequential)
+- Agents run concurrently
 - Notification when all agents finish

 4. **Aggregate Findings**

-Read the 3 findings JSON files and synthesize unified `deep_analysis`:
+Read the 3 findings JSON files and synthesize unified `deep_analysis` using the aggregation rules from `reference/rca-aggregation-rules.md`:

 ```python
 archaeology = read_json("archaeology-findings.json")
 infra = read_json("infra-findings.json")
 correlation = read_json("correlation-findings.json")

-# Determine root cause (Infrastructure Detective has authority on flakes)
-if infra.flake_classification == "infrastructure-flake" and infra.confidence >= 80:
-    failure_category = "infrastructure"
-    root_cause = infra.reasoning
-elif infra.flake_classification == "code-bug":
-    failure_category = "code-bug"
-    root_cause = f"{infra.reasoning}. {archaeology.reasoning if archaeology else ''}"
-else:
-    failure_category = "unknown"
-    root_cause = "Insufficient data to determine root cause"
-
-# Calculate unified confidence
-confidence = "Medium" # Default
-if infra.confidence >= 85 and (archaeology.confidence >= 80 or correlation.confidence >= 70):
-    confidence = "High"
-elif infra.confidence <= 60 and archaeology.confidence <= 60:
-    confidence = "Low"
-
 deep_analysis = {
-    "root_cause": root_cause,
-    "failure_category": failure_category,
+    "root_cause": determine_root_cause(archaeology, infra, correlation),
+    "failure_category": classify_failure(infra, archaeology),
     "affected_components": archaeology.git_blame_results.primary_file if archaeology else [],
-    "confidence": confidence,
-    "risk_assessment": "Medium", # Based on failure_category
+    "confidence": calculate_unified_confidence(archaeology, infra, correlation),
+    "risk_assessment": assess_risk(failure_category, affected_components, frequency),
     "proposed_fix": infra.suggested_action or "Investigate further",
-    "relevant_logs": extract_logs(issue, max_chars=500),
+    "relevant_logs": sanitize(extract_logs(issue, max_chars=500)),

-    # NEW: Git archaeology results
     "problematic_commit": archaeology.git_blame_results.last_modified_commit if archaeology else null,
     "problematic_pr": archaeology.pr_context.pr_number if archaeology else null,

-    # NEW: Infrastructure analysis
     "is_infrastructure_flake": infra.flake_classification == "infrastructure-flake",
     "infrastructure_workaround": infra.workaround_recommendations[0] if infra.workaround_recommendations else null,

-    # NEW: Cross-issue correlation
     "similar_issues": correlation.similar_issues if correlation else [],
     "failure_frequency": correlation.failure_frequency if correlation else {},

     "investigation_method": "multi_agent_parallel"
 }
 ```

-**Confidence Scoring:**
-```python
-base_confidence = infra.confidence or 50
-
-# Boost if git blame found recent culprit (within 7 days)
-if archaeology and archaeology.git_blame_results.recency == "very_recent":
-    base_confidence += 10
-
-# Boost if similar issues have known resolutions
-if correlation and correlation.similar_issues and max(issue.similarity for issue in correlation.similar_issues) >= 85:
-    base_confidence += 5
-
-# Convert to High/Medium/Low
-if base_confidence >= 85:
-    confidence = "High"
-elif base_confidence >= 60:
-    confidence = "Medium"
-else:
-    confidence = "Low"
-```
+See `reference/rca-aggregation-rules.md` for algorithm details:
+- `determine_root_cause()`: Infrastructure Detective has authority on flakes (confidence ≥80%)
+- `classify_failure()`: infrastructure | code-bug | flaky-test | unknown
+- `calculate_unified_confidence()`: Base from Infrastructure Detective + adjustments for recent changes (+10%) and similar issues (+5%)
+- `assess_risk()`: Infrastructure flakes = Low; High frequency or critical components = High
+- `sanitize()`: Remove API tokens, passwords, secrets, internal URLs with credentials, IP addresses, employee emails

 5. **Cleanup RCA Team**
 ```
````
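The helpers named in the new aggregation code are specified in `reference/rca-aggregation-rules.md`, which this diff does not show. As a sketch of what they encode, the functions below reimplement three of them from the inline logic this commit removes: the 80% flake authority, a base confidence of 50 when the detective gives none, +10 for a very recent culprit, +5 for a similar issue at ≥85 (on the 0-100 scale used by the removed code), and High/Medium/Low cut-offs at 85 and 60. Findings are assumed to be plain dicts with `None` for a missing agent; the reference file's exact rules may differ.

```python
from typing import Optional

# Findings as plain dicts; None means the agent failed or returned nothing.
Finding = Optional[dict]

INFRA_AUTHORITY_THRESHOLD = 80  # Infrastructure Detective has final say on flakes at >= 80


def classify_failure(infra: Finding, archaeology: Finding) -> str:
    """Return "infrastructure", "code-bug", or "unknown".

    The reference file also lists "flaky-test"; its rule is not shown in this diff,
    so this sketch never returns it. `archaeology` is accepted only to match the
    call in triage.md.
    """
    infra = infra or {}
    label = infra.get("flake_classification")
    if label == "infrastructure-flake" and infra.get("confidence", 0) >= INFRA_AUTHORITY_THRESHOLD:
        return "infrastructure"
    if label == "code-bug":
        return "code-bug"
    return "unknown"


def determine_root_cause(archaeology: Finding, infra: Finding, correlation: Finding) -> str:
    category = classify_failure(infra, archaeology)
    if category == "infrastructure":
        return infra.get("reasoning", "")
    if category == "code-bug":
        # Combine the detective's verdict with the git context, as the removed code did.
        parts = [infra.get("reasoning"), (archaeology or {}).get("reasoning")]
        return ". ".join(p for p in parts if p)
    return "Insufficient data to determine root cause"


def calculate_unified_confidence(archaeology: Finding, infra: Finding, correlation: Finding) -> str:
    score = (infra or {}).get("confidence") or 50  # default base when the detective has no score

    blame = (archaeology or {}).get("git_blame_results") or {}
    if blame.get("recency") == "very_recent":  # culprit modified within the last 7 days
        score += 10

    similar = (correlation or {}).get("similar_issues") or []
    if similar and max(issue["similarity"] for issue in similar) >= 85:  # 0-100 scale
        score += 5

    if score >= 85:
        return "High"
    if score >= 60:
        return "Medium"
    return "Low"
```

Computing the category once with `classify_failure()` and reusing that value gives `failure_category` and any downstream risk assessment the same input.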
```diff
@@ -471,20 +434,17 @@ After running this command, you should have:
 **Parallel Execution:**
 - Phase 1a + 1b: Run setup and fetch concurrently
 - Phase 4: Run CI/Vuln/Flaky analysis in parallel (3 concurrent tool calls)
-- Total time savings: 70-100 seconds vs sequential execution

 **Deep CI Failure Analysis:**
-- Time budget: 4-5 minutes per CI_FAILURE issue (Stage 2 of Phase 4a)
-- Deep analysis runs sequentially per issue (each requires significant investigation)
-- With 5 issues max and potential for all to be CI failures, worst case is ~25 minutes for analysis alone
+- Deep analysis runs sequentially per issue
 - The investigator agent methodology is read once from `/tmp/triage/stackrox/.claude/agents/stackrox-ci-failure-investigator.md` and applied to each issue

 ## Notes

 - **Timeout**: 1800 seconds total (30 minutes)
 - **Issue Limit**: 5 issues per run to allow time for deep CI failure analysis
-- **Deep CI Failure Analysis**: Each CI_FAILURE issue gets 4-5 minutes of deep root cause investigation using the stackrox CI failure investigator methodology. Results appear in comments and reports but do NOT influence team assignment.
-- **Parallel Analysis**: CI/Vuln/Flaky analysis MUST run concurrently (saves 60-80s). Within Phase 4a, deep analysis runs sequentially per CI_FAILURE issue.
+- **Deep CI Failure Analysis**: Each CI_FAILURE issue gets deep root cause investigation using the stackrox CI failure investigator methodology. Results appear in comments and reports but do NOT influence team assignment.
+- **Parallel Analysis**: CI/Vuln/Flaky analysis MUST run concurrently. Within Phase 4a, deep analysis runs sequentially per CI_FAILURE issue.
 - **READ-ONLY by default**: Use `--comment` flag to write to JIRA
 - **High Confidence Threshold**: ≥80% for auto-assignment recommendations
 - **Version Awareness**: Automatically detects and adjusts for version mismatches
```

workflows/acs-triage/README.md

Lines changed: 5 additions & 7 deletions
```diff
@@ -6,7 +6,7 @@ Automated triage for StackRox/ACS JIRA issues with intelligent team assignment u

 This workflow provides systematic triage of untriaged StackRox issues using:

-- **Multi-Agent Root Cause Analysis**: 3 specialized agents analyze CI failures in parallel (Code Archaeologist, Infrastructure Detective, Cross-Issue Correlator) - 55% faster than sequential analysis
+- **Multi-Agent Root Cause Analysis**: 3 specialized agents analyze CI failures in parallel (Code Archaeologist, Infrastructure Detective, Cross-Issue Correlator)
 - **Multi-Strategy Team Assignment**: 5-strategy priority system with 95%-70% confidence scores
 - **Specialized Analysis**: Custom decision trees for CI failures, vulnerabilities, and flaky tests
 - **Version Awareness**: Detects mismatches between issue versions and current codebase
@@ -151,23 +151,23 @@ The workflow automatically runs analysis commands in parallel when executed by A
 - Match error signatures from `reference/error-signatures.md`
 - Check for known flaky patterns

-**Stage 2: Multi-Agent Root Cause Analysis** (55% faster than sequential)
+**Stage 2: Multi-Agent Root Cause Analysis**

 Spawns 3 specialized agents in parallel for deep investigation:

-1. **Code Archaeologist** (90-120s)
+1. **Code Archaeologist**
 - Git blame analysis to find when files were last modified
 - GitHub PR lookup to identify problematic commits
 - Test vs code change detection
 - **Output:** `problematic_commit`, `problematic_pr`

-2. **Infrastructure Detective** (60-90s)
+2. **Infrastructure Detective**
 - Pattern matching against known infrastructure flakes
 - Flake vs real bug classification
 - Workaround recommendations
 - **Output:** `is_infrastructure_flake`, `infrastructure_workaround`

-3. **Cross-Issue Correlator** (60-90s)
+3. **Cross-Issue Correlator**
 - JIRA search for similar historical issues
 - Failure frequency analysis (trend detection)
 - Known solution extraction
@@ -179,8 +179,6 @@ Spawns 3 specialized agents in parallel for deep investigation:
 - Code Archaeologist provides git context (commit/PR)
 - Cross-Issue Correlator provides historical patterns

-**Performance:** 2-3 min per issue (vs 4-5 min sequential) = **55% time savings**
-
 **Output:** `ci_analysis` field with:
 - Stage 1: `error_type`, `file_paths`, `error_signature_match`
 - Stage 2: `deep_analysis` with root cause, failure category, and RCA results
```

workflows/acs-triage/reference/constants.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -115,7 +115,7 @@ Central location for all hardcoded values used throughout the ACS triage workflo

 | Constant | Value | Purpose |
 |----------|-------|---------|
-| RCA_AGENT_TIMEOUT_SECONDS | 120 | Max time per agent (Code Archaeologist: 90-120s, Infrastructure Detective: 60-90s, Cross-Issue Correlator: 60-90s) |
+| RCA_AGENT_TIMEOUT_SECONDS | 120 | Max time per agent |
 | RCA_AGGREGATION_TIMEOUT_SECONDS | 30 | Max time for findings aggregation |
 | RCA_TEAM_PREFIX | "ci-rca-" | Team name prefix for RCA teams (e.g., "ci-rca-ROX-12345") |
 | MIN_SIMILARITY_THRESHOLD | 0.70 | Min similarity (70%) for including historical issues in correlation results |
```
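`RCA_TEAM_PREFIX` and the per-issue output paths listed for the agents in `triage.md` combine into one team name and one findings directory per issue. A small sketch with hypothetical helper names (`rca_team_name`, `findings_paths`); the prefix, the `artifacts/acs-triage/rca/{issue_key}/` layout, and the three file names come from this commit, while placing `correlation-findings.json` in the same directory is an assumption, because the correlator's Output line falls outside the diff.

```python
from pathlib import PurePosixPath

RCA_TEAM_PREFIX = "ci-rca-"  # from constants.md
RCA_ARTIFACTS_ROOT = PurePosixPath("artifacts/acs-triage/rca")

# Findings file per agent, as read during aggregation in triage.md.
FINDINGS_FILES = {
    "archaeology": "archaeology-findings.json",
    "infra": "infra-findings.json",
    # Assumed to live in the same per-issue directory as the other two.
    "correlation": "correlation-findings.json",
}


def rca_team_name(issue_key: str) -> str:
    """Team name for one issue, e.g. ROX-12345 -> ci-rca-ROX-12345."""
    return f"{RCA_TEAM_PREFIX}{issue_key}"


def findings_paths(issue_key: str) -> dict[str, str]:
    """Map each agent to the JSON file it writes for this issue."""
    issue_dir = RCA_ARTIFACTS_ROOT / issue_key
    return {agent: str(issue_dir / name) for agent, name in FINDINGS_FILES.items()}
```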

workflows/acs-triage/reference/rca-aggregation-rules.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -258,5 +258,4 @@ def sanitize(text):

 - **Investigation Method**: Always set to `"multi_agent_parallel"` when using multi-agent RCA
 - **Null Handling**: If an agent fails or returns no data, use `null` for its fields (don't fail the whole aggregation)
-- **Performance**: Aggregation should complete within 30 seconds (RCA_AGGREGATION_TIMEOUT_SECONDS)
 - **Fallback**: If aggregation fails, fall back to single sequential analysis or description-only mode
```
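The hunk header shows that `reference/rca-aggregation-rules.md` defines a `sanitize(text)` function, and `triage.md` lists what it removes: API tokens, passwords, secrets, internal URLs with credentials, IP addresses, and employee emails. A regex-based sketch of that idea follows. The patterns are illustrative assumptions, not the rules file's implementation, and they over-redact in places (every IPv4-shaped string and every email address, not only employee ones).

```python
import re

# Illustrative patterns only; applied in order, so URL credentials are handled
# before the email pattern could mistake "user:pass@host" for an address.
_REDACTIONS = [
    # scheme://user:pass@host -> scheme://[REDACTED]@host
    (re.compile(r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*://)[^/\s:@]+:[^/\s@]+@"),
     r"\g<scheme>[REDACTED]@"),
    # key=value or key: value secrets
    (re.compile(r"(?i)\b(api[_-]?key|token|password|passwd|secret)\b(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),
    # Bearer tokens in Authorization headers
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # IPv4 addresses
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[REDACTED_IP]"),
    # Email addresses
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[REDACTED_EMAIL]"),
]


def sanitize(text: str) -> str:
    """Replace credential-like substrings with redaction markers."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```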
