Commit dd79eff

Replace authority hierarchy with synthesis approach for RCA
Change root cause determination from strict authority hierarchy to synthesizing all three agent reports into one plausible narrative.

Key changes:
- Combine all agent findings instead of using Infrastructure Detective as final authority
- Add minority_report field to highlight dissenting perspectives (≥50% confidence)
- Weight evidence by confidence but don't exclude lower-confidence insights
- Update examples to show both consensus and conflicting scenarios
- Rename 'Agent Authority Hierarchy' to 'Agent Contribution Weighting'

This allows presenting the most plausible explanation while preserving alternative interpretations that may be valuable for investigation.
1 parent 4385938 commit dd79eff

2 files changed

Lines changed: 95 additions & 20 deletions

workflows/acs-triage/.claude/commands/triage.md

Lines changed: 7 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -196,12 +196,16 @@ After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:
196196

197197
**Aggregation Process:**
198198
- Load findings from the three JSON files: archaeology-findings.json, infra-findings.json, and correlation-findings.json
199-
- Determine root cause by following the authority hierarchy (Infrastructure Detective has final authority on flakes with ≥80% confidence)
200-
- Classify failure category based on Infrastructure Detective's analysis (if confidence ≥80%), or use archaeology signals as fallback
199+
- **Synthesize root cause** by combining all three agents' findings into one coherent narrative that sounds plausible:
200+
- Weight evidence by agent confidence but don't exclude lower-confidence insights that add context
201+
- If agents agree, state the consensus
202+
- If agents provide complementary information, integrate it (e.g., "Infrastructure pattern X triggered by recent code change Y")
203+
- If agents disagree with reasonable confidence (≥50%), include main finding in root_cause and dissenting view in minority_report
204+
- Classify failure category based on Infrastructure Detective's analysis (weighted by confidence ≥80%), with archaeology providing supporting signals
201205
- Extract affected components from archaeology's git blame results if available
202206
- Calculate unified confidence starting from Infrastructure Detective's base score, adding +10% for very recent changes and +5% for high-similarity matches
203207
- Assess risk based on failure category (infrastructure flakes = Low, high frequency or critical components = High, otherwise Medium)
204-
- Extract proposed fix from Infrastructure Detective's suggested action
208+
- Extract proposed fix from Infrastructure Detective's suggested action or archaeology context
205209
- Sanitize and extract relevant logs (max 500 chars, removing tokens, passwords, internal URLs, IPs, and employee emails)
206210
- Include problematic commit and PR from archaeology if available
207211
- Flag infrastructure flakes and include workaround recommendations
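
The confidence and risk bullets in the aggregation process above can be sketched in Python. This is a minimal illustration, not the workflow's actual implementation: the names `ScoreInputs`, `unified_confidence`, and `assess_risk` are hypothetical, the cap at 100 is an assumption, and the ordering of the risk checks (flake first) is one reading of the bullet rather than something the document states.

```python
from dataclasses import dataclass


@dataclass
class ScoreInputs:
    """Hypothetical container for the signals the confidence bullet mentions."""
    infra_confidence: int        # Infrastructure Detective's base score, 0-100
    very_recent_change: bool     # archaeology found a very recent change
    high_similarity_match: bool  # correlator found a high-similarity match


def unified_confidence(inputs: ScoreInputs) -> int:
    """Start from the Infrastructure Detective's score and apply the bonuses."""
    score = inputs.infra_confidence
    if inputs.very_recent_change:
        score += 10  # +10% for very recent changes
    if inputs.high_similarity_match:
        score += 5   # +5% for high-similarity matches
    return min(score, 100)  # assumption: the score is capped at 100


def assess_risk(is_infra_flake: bool, high_frequency: bool,
                critical_component: bool) -> str:
    """Map the failure category and frequency signals to a risk label."""
    if is_infra_flake:
        return "Low"  # assumption: the flake rule is checked first
    if high_frequency or critical_component:
        return "High"
    return "Medium"
```

The sketch returns a numeric score only; how that score maps to the `High | Medium | Low` labels is not shown in this diff, so it is left out.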

workflows/acs-triage/reference/rca-aggregation-rules.md

Lines changed: 88 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -2,24 +2,37 @@
22

33
This document defines how findings from the three RCA agents (Code Archaeologist, Infrastructure Detective, Cross-Issue Correlator) are aggregated into a unified `deep_analysis` object.
44

5-
## Agent Authority Hierarchy
5+
## Agent Contribution Weighting
66

7-
When agents disagree on classification, use this authority hierarchy:
7+
When synthesizing findings from multiple agents:
88

9-
1. **Infrastructure Detective** - Has final authority on infrastructure flake classification when confidence ≥80%
10-
2. **Code Archaeologist** - Has authority on identifying problematic commits/PRs
11-
3. **Cross-Issue Correlator** - Provides supporting evidence, does not override other agents
9+
1. **Infrastructure Detective** - Primary source for pattern-based classification and infrastructure flake detection; weight increases with confidence ≥80%
10+
2. **Code Archaeologist** - Primary source for commit/PR attribution and code change context; weight increases with recency of changes
11+
3. **Cross-Issue Correlator** - Primary source for frequency trends and historical patterns; weight increases with high-similarity matches (≥85%)
12+
13+
**Integration Principle:** Combine all three perspectives rather than using strict hierarchy. When agents disagree, include the majority finding in the main root cause and note dissenting views in the minority report.
1214

1315
## Root Cause Determination
1416

1517
**Algorithm:**
1618

17-
1. If Infrastructure Detective classified this as an infrastructure flake with confidence ≥80%, use the Infrastructure Detective's reasoning as the root cause
18-
2. If Infrastructure Detective classified this as a code bug:
19-
- Start with Infrastructure Detective's reasoning
20-
- If Code Archaeologist found git blame results, append the archaeology reasoning
21-
- Return the combined root cause
22-
3. Otherwise (unknown classification), return "Insufficient data to determine root cause"
19+
Synthesize findings from all three agents into a single coherent root cause narrative:
20+
21+
1. **Combine all available evidence:**
22+
- Start with the most concrete findings (Infrastructure Detective's pattern analysis, Code Archaeologist's git blame results)
23+
- Incorporate frequency and historical context from Cross-Issue Correlator
24+
- Weight evidence by agent confidence levels, but don't exclude low-confidence insights that add context
25+
26+
2. **Generate unified root cause:**
27+
- Integrate all perspectives into one narrative that sounds plausible
28+
- If agents agree, state the consensus
29+
- If agents provide complementary information, weave it together (e.g., "Infrastructure pattern X triggered by recent code change Y")
30+
- If insufficient data across all agents, state "Insufficient data to determine root cause"
31+
32+
3. **Add minority report (if applicable):**
33+
- If agents disagree or provide alternative explanations with reasonable confidence (≥50%), include a "minority_report" field
34+
- Format: Brief statement of the alternative perspective with attribution (e.g., "Code Archaeologist suggests recent refactor in PR #123 may be a contributing factor")
35+
- This highlights uncertainty without committing to a single explanation when evidence is mixed
2336

2437
## Failure Category Classification
2538

@@ -63,7 +76,8 @@ When agents disagree on classification, use this authority hierarchy:
6376

6477
```json
6578
{
66-
"root_cause": "<unified root cause from determine_root_cause()>",
79+
"root_cause": "<unified narrative synthesizing all three agents' findings>",
80+
"minority_report": "<alternative perspectives from dissenting agents, if any; null if consensus>",
6781
"failure_category": "<from classify_failure()>",
6882
"affected_components": ["<from archaeology or existing ci_analysis>"],
6983
"confidence": "<High | Medium | Low from calculate_unified_confidence()>",
@@ -88,12 +102,13 @@ When agents disagree on classification, use this authority hierarchy:
88102

89103
### Scenario: Archaeology says code-bug, Infrastructure says flake
90104

91-
**Resolution:** Infrastructure Detective wins if confidence ≥80%
105+
**Resolution:** Synthesize both perspectives
92106

93107
**Decision Logic:**
94-
- If Infrastructure Detective confidence ≥80% → use Infrastructure Detective's classification
95-
- Otherwise, if Code Archaeologist confidence ≥70% → use "code-bug" (archaeology found recent code change)
96-
- Otherwise → use "unknown" (conflicting low-confidence signals)
108+
- **Primary finding (root_cause):** If Infrastructure Detective has confidence ≥80%, lead with their infrastructure flake classification but note the code change context from archaeology
109+
- Example: "Infrastructure timeout pattern detected (intermittent test runner issues). Recent code change in PR #123 may have increased susceptibility to timing issues."
110+
- **Minority report:** If Code Archaeologist has confidence ≥70%, note: "Code Archaeologist identifies recent change in PR #123 as potential root cause rather than infrastructure"
111+
- **If both have low confidence (<70%):** State "Conflicting signals - infrastructure pattern suggests flake, but recent code changes warrant investigation" and mark confidence as Low
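
A small Python sketch of the decision logic above may help when reviewing the thresholds. It is illustrative only: `resolve_flake_vs_code_bug` and the `Resolution` type are hypothetical names, and the rules leave the case where Infrastructure Detective is below 80% while one agent is at or above 70% unspecified, so the sketch treats that gap the same way as the low-confidence case (an assumption).

```python
from typing import Optional, TypedDict

CONFLICT_TEXT = (
    "Conflicting signals - infrastructure pattern suggests flake, "
    "but recent code changes warrant investigation"
)


class Resolution(TypedDict):
    root_cause: str
    minority_report: Optional[str]
    confidence_override: Optional[str]  # "Low" when signals conflict


def resolve_flake_vs_code_bug(infra_conf: int, arch_conf: int,
                              infra_finding: str, code_change: str) -> Resolution:
    """Archaeology says code-bug, Infrastructure says flake."""
    if infra_conf >= 80:
        # Lead with the flake classification, but keep the code change context.
        root_cause = (f"{infra_finding} "
                      f"Code change context from archaeology: {code_change}.")
        minority = None
        if arch_conf >= 70:
            minority = (f"Code Archaeologist identifies {code_change} as "
                        "potential root cause rather than infrastructure")
        return {"root_cause": root_cause, "minority_report": minority,
                "confidence_override": None}
    # Both below 70% is the documented low-confidence case; the remaining
    # gap (infra 70-79%, or archaeology >= 70% alone) is handled the same
    # way here as an assumption.
    return {"root_cause": CONFLICT_TEXT, "minority_report": None,
            "confidence_override": "Low"}
```

For instance, calling it with `infra_conf=85` and `arch_conf=75` yields a root cause that leads with the infrastructure finding and a minority report attributed to Code Archaeologist.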
97112

98113
### Scenario: Multiple similar issues with different root causes
99114

@@ -142,6 +157,8 @@ Before writing to `deep_analysis`, sanitize all text fields:
142157

143158
## Example Aggregation
144159

160+
### Example 1: Consensus Scenario
161+
145162
**Inputs:**
146163

147164
- **Archaeology**: Found commit `abc123` in PR #12345, 4 days ago (very recent), code-under-test changed
@@ -152,7 +169,8 @@ Before writing to `deep_analysis`, sanitize all text fields:
152169

153170
```json
154171
{
155-
"root_cause": "GraphQL schema validation error - template emits Boolean placeholders without resolvers. Recent code change (4 days ago) in PR #12345 likely introduced this regression.",
172+
"root_cause": "GraphQL schema validation error - template emits Boolean placeholders without resolvers. Recent code change (4 days ago) in PR #12345 likely introduced this regression, similar to previously resolved issue ROX-11111.",
173+
"minority_report": null,
156174
"failure_category": "code-bug",
157175
"affected_components": ["central/graphql/generator/codegen/codegen.go.tpl"],
158176
"confidence": "High",
@@ -185,6 +203,57 @@ Before writing to `deep_analysis`, sanitize all text fields:
185203
}
186204
```
187205

206+
### Example 2: Conflicting Perspectives
207+
208+
**Inputs:**
209+
210+
- **Archaeology**: No recent code changes in affected files (last change 3 months ago), confidence 60%
211+
- **Infrastructure**: Classified as infrastructure-flake with 85% confidence, intermittent test runner timeout pattern
212+
- **Correlation**: Found 3 similar issues with same timeout pattern in last 30 days, all resolved by retry
213+
214+
**Aggregated Output:**
215+
216+
```json
217+
{
218+
"root_cause": "Intermittent test runner timeout pattern consistent with infrastructure instability. Multiple similar failures in past 30 days (3 occurrences) all resolved by retry without code changes.",
219+
"minority_report": "Code Archaeologist notes no recent changes in affected test files (last change 3 months ago), suggesting this is not a regression but could indicate latent timing sensitivity in the test itself.",
220+
"failure_category": "infrastructure",
221+
"affected_components": ["qa/test/integration/sensor_test.go"],
222+
"confidence": "High",
223+
"risk_assessment": "Low",
224+
"proposed_fix": "Retry build - infrastructure timeout pattern",
225+
"relevant_logs": "timeout: test exceeded 10m deadline",
226+
227+
"problematic_commit": null,
228+
"problematic_pr": null,
229+
230+
"is_infrastructure_flake": true,
231+
"infrastructure_workaround": "Retry test execution with increased timeout threshold",
232+
233+
"similar_issues": [
234+
{
235+
"key": "ROX-12001",
236+
"similarity": 88,
237+
"root_cause": "Infrastructure timeout",
238+
"solution": "Retry succeeded"
239+
},
240+
{
241+
"key": "ROX-12055",
242+
"similarity": 85,
243+
"root_cause": "Test runner timeout",
244+
"solution": "Retry succeeded"
245+
}
246+
],
247+
"failure_frequency": {
248+
"count_30d": 3,
249+
"classification": "Medium",
250+
"trend": "increasing"
251+
},
252+
253+
"investigation_method": "multi_agent_parallel"
254+
}
255+
```
256+
188257
**Confidence Calculation:**
189258
- Base: 90 (Infrastructure Detective)
190259
- +10 (very recent code change)
@@ -196,3 +265,5 @@ Before writing to `deep_analysis`, sanitize all text fields:
196265
- **Investigation Method**: Always set to `"multi_agent_parallel"` when using multi-agent RCA
197266
- **Null Handling**: If an agent fails or returns no data, use `null` for its fields (don't fail the whole aggregation)
198267
- **Fallback**: If aggregation fails, fall back to single sequential analysis or description-only mode
268+
- **Minority Report**: Include dissenting perspectives with confidence ≥50% to highlight uncertainty; set to `null` when agents reach consensus
269+
- **Synthesis Over Hierarchy**: Combine all agent findings into a coherent narrative rather than strictly following authority hierarchy
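
To make the synthesis and minority-report notes concrete, here is a minimal Python sketch. It is one possible reading, not the project's code: `Finding`, `synthesize`, and `MINORITY_THRESHOLD` are hypothetical names, and picking the lead explanation by summed confidence per category is an assumed interpretation of "weight evidence by confidence".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

MINORITY_THRESHOLD = 50  # dissenting views at or above this confidence are reported


@dataclass
class Finding:
    agent: str       # e.g. "Code Archaeologist"
    category: str    # e.g. "code-bug" or "infrastructure-flake"
    summary: str     # the agent's one-sentence explanation
    confidence: int  # 0-100


def synthesize(findings: list[Finding]) -> dict[str, Optional[str]]:
    """Combine agent findings into root_cause plus an optional minority_report."""
    if not findings:
        return {"root_cause": "Insufficient data to determine root cause",
                "minority_report": None}

    # Assumption: the lead category is the one with the highest summed confidence.
    weight = defaultdict(int)
    for f in findings:
        weight[f.category] += f.confidence
    lead = max(weight, key=weight.get)

    # Agreeing agents contribute to the narrative, most confident first,
    # so lower-confidence context is kept rather than dropped.
    agreeing = sorted((f for f in findings if f.category == lead),
                      key=lambda f: f.confidence, reverse=True)
    dissenting = [f for f in findings
                  if f.category != lead and f.confidence >= MINORITY_THRESHOLD]

    root_cause = " ".join(f.summary for f in agreeing)
    minority = "; ".join(f"{f.agent} suggests: {f.summary}" for f in dissenting)
    return {"root_cause": root_cause, "minority_report": minority or None}
```

When all agents land on the same category, `minority_report` comes back as `None`, matching the consensus example; a dissenting agent below the 50% threshold is omitted from the minority report in this sketch.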
