Replace authority hierarchy with synthesis approach for RCA

Change root cause determination from strict authority hierarchy to
synthesizing all three agent reports into one plausible narrative.

Key changes:
- Combine all agent findings instead of using Infrastructure Detective
  as final authority
- Add minority_report field to highlight dissenting perspectives (≥50%
  confidence)
- Weight evidence by confidence but don't exclude lower-confidence
  insights
- Update examples to show both consensus and conflicting scenarios
- Rename 'Agent Authority Hierarchy' to 'Agent Contribution Weighting'

This allows presenting the most plausible explanation while preserving
alternative interpretations that may be valuable for investigation.
workflows/acs-triage/.claude/commands/triage.md
+7 -3 (7 additions & 3 deletions)
@@ -196,12 +196,16 @@ After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:

 **Aggregation Process:**
 - Load findings from the three JSON files: archaeology-findings.json, infra-findings.json, and correlation-findings.json
-- Determine root cause by following the authority hierarchy (Infrastructure Detective has final authority on flakes with ≥80% confidence)
-- Classify failure category based on Infrastructure Detective's analysis (if confidence ≥80%), or use archaeology signals as fallback
+- **Synthesize root cause** by combining all three agents' findings into one coherent narrative that sounds plausible:
+  - Weight evidence by agent confidence but don't exclude lower-confidence insights that add context
+  - If agents agree, state the consensus
+  - If agents provide complementary information, integrate it (e.g., "Infrastructure pattern X triggered by recent code change Y")
+  - If agents disagree with reasonable confidence (≥50%), include main finding in root_cause and dissenting view in minority_report
+- Classify failure category based on Infrastructure Detective's analysis (weighted by confidence ≥80%), with archaeology providing supporting signals
 - Extract affected components from archaeology's git blame results if available
 - Calculate unified confidence starting from Infrastructure Detective's base score, adding +10% for very recent changes and +5% for high-similarity matches
 - Assess risk based on failure category (infrastructure flakes = Low, high frequency or critical components = High, otherwise Medium)
-- Extract proposed fix from Infrastructure Detective's suggested action
+- Extract proposed fix from Infrastructure Detective's suggested action or archaeology context
 - Sanitize and extract relevant logs (max 500 chars, removing tokens, passwords, internal URLs, IPs, and employee emails)
 - Include problematic commit and PR from archaeology if available
 - Flag infrastructure flakes and include workaround recommendations
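
The confidence and risk bullets in the hunk above are arithmetic rules, so a small sketch may help reviewers check them. The function names, the integer-percent representation, the cap at 100, and the order in which the risk conditions are tested are all assumptions made for illustration; the command file itself only states the bonuses and the three risk levels.

```python
def unified_confidence(infra_confidence_pct: int,
                       very_recent_change: bool,
                       high_similarity_match: bool) -> int:
    """Start from Infrastructure Detective's base score and add the stated bonuses."""
    score = infra_confidence_pct
    if very_recent_change:
        score += 10  # +10% for very recent changes
    if high_similarity_match:
        score += 5   # +5% for high-similarity matches
    return min(score, 100)  # capping at 100 is an assumption, not stated in the rules


def assess_risk(failure_category: str,
                high_frequency: bool,
                critical_component: bool) -> str:
    """Map a failure category to Low / High / Medium as described in the bullet list."""
    # Assumption: the infrastructure-flake check takes precedence over the others.
    if failure_category == "infrastructure-flake":
        return "Low"
    if high_frequency or critical_component:
        return "High"
    return "Medium"
```

For example, a base score of 80 with both bonuses yields 95, and a non-flake failure touching a critical component is rated High.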
workflows/acs-triage/reference/rca-aggregation-rules.md
+88 -17 (88 additions & 17 deletions)
@@ -2,24 +2,37 @@

 This document defines how findings from the three RCA agents (Code Archaeologist, Infrastructure Detective, Cross-Issue Correlator) are aggregated into a unified `deep_analysis` object.

-## Agent Authority Hierarchy
+## Agent Contribution Weighting

-When agents disagree on classification, use this authority hierarchy:
+When synthesizing findings from multiple agents:

-1. **Infrastructure Detective** - Has final authority on infrastructure flake classification when confidence ≥80%
-2. **Code Archaeologist** - Has authority on identifying problematic commits/PRs
-3. **Cross-Issue Correlator** - Provides supporting evidence, does not override other agents
+1. **Infrastructure Detective** - Primary source for pattern-based classification and infrastructure flake detection; weight increases with confidence ≥80%
+2. **Code Archaeologist** - Primary source for commit/PR attribution and code change context; weight increases with recency of changes
+3. **Cross-Issue Correlator** - Primary source for frequency trends and historical patterns; weight increases with high-similarity matches (≥85%)
+
+**Integration Principle:** Combine all three perspectives rather than using strict hierarchy. When agents disagree, include the majority finding in the main root cause and note dissenting views in the minority report.

 ## Root Cause Determination

 **Algorithm:**

-1. If Infrastructure Detective classified this as an infrastructure flake with confidence ≥80%, use the Infrastructure Detective's reasoning as the root cause
-2. If Infrastructure Detective classified this as a code bug:
-   - Start with Infrastructure Detective's reasoning
-   - If Code Archaeologist found git blame results, append the archaeology reasoning
-   - Return the combined root cause
-3. Otherwise (unknown classification), return "Insufficient data to determine root cause"
+Synthesize findings from all three agents into a single coherent root cause narrative:
+
+1. **Combine all available evidence:**
+   - Start with the most concrete findings (Infrastructure Detective's pattern analysis, Code Archaeologist's git blame results)
+   - Incorporate frequency and historical context from Cross-Issue Correlator
+   - Weight evidence by agent confidence levels, but don't exclude low-confidence insights that add context
+
+2. **Generate unified root cause:**
+   - Integrate all perspectives into one narrative that sounds plausible
+   - If agents agree, state the consensus
+   - If agents provide complementary information, weave it together (e.g., "Infrastructure pattern X triggered by recent code change Y")
+   - If insufficient data across all agents, state "Insufficient data to determine root cause"
+
+3. **Add minority report (if applicable):**
+   - If agents disagree or provide alternative explanations with reasonable confidence (≥50%), include a "minority_report" field
+   - Format: Brief statement of the alternative perspective with attribution (e.g., "Code Archaeologist suggests recent refactor in PR #123 may be a contributing factor")
+   - This highlights uncertainty without committing to a single explanation when evidence is mixed

 ## Failure Category Classification

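
The three-step Root Cause Determination algorithm added in the hunk above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Finding` type, its `conclusion` label, the `synthesize` function, and the choice to pick the main finding by summed confidence are all hypothetical and do not appear in the rules document. Dissent below the 50% threshold is simply dropped here, which is a simplification.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

MINORITY_THRESHOLD = 50  # percent; the "reasonable confidence" bar from the rules


@dataclass
class Finding:
    agent: str        # e.g. "Infrastructure Detective"
    conclusion: str   # hypothetical label used to detect agreement
    summary: str      # the agent's reasoning, as a sentence or clause
    confidence: int   # percent, 0-100


def synthesize(findings: list[Finding]) -> dict:
    """Combine agent findings into root_cause plus an optional minority_report."""
    if not findings:
        return {"root_cause": "Insufficient data to determine root cause",
                "minority_report": None}

    # Assumption: weight each conclusion by the summed confidence of its supporters.
    weight: dict[str, int] = defaultdict(int)
    for f in findings:
        weight[f.conclusion] += f.confidence
    main = max(weight, key=weight.get)

    # Most confident evidence leads; lower-confidence support is kept as context.
    ranked = sorted(findings, key=lambda f: f.confidence, reverse=True)
    supporting = [f for f in ranked if f.conclusion == main]
    dissenting = [f for f in ranked
                  if f.conclusion != main and f.confidence >= MINORITY_THRESHOLD]

    root_cause = " ".join(f.summary for f in supporting)
    minority = " ".join(f"{f.agent} suggests {f.summary}" for f in dissenting)
    return {"root_cause": root_cause, "minority_report": minority or None}
```

Returning `None` for `minority_report` when no agent dissents mirrors the "null if consensus" wording used for that field in the output schema.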
@@ -63,7 +76,8 @@ When agents disagree on classification, use this authority hierarchy:

 ```json
 {
-  "root_cause": "<unified root cause from determine_root_cause()>",
+  "root_cause": "<unified narrative synthesizing all three agents' findings>",
+  "minority_report": "<alternative perspectives from dissenting agents, if any; null if consensus>",
   "failure_category": "<from classify_failure()>",
   "affected_components": ["<from archaeology or existing ci_analysis>"],
   "confidence": "<High | Medium | Low from calculate_unified_confidence()>",
@@ -88,12 +102,13 @@ When agents disagree on classification, use this authority hierarchy:
-**Resolution:** Infrastructure Detective wins if confidence ≥80%
+**Resolution:** Synthesize both perspectives

 **Decision Logic:**
-- If Infrastructure Detective confidence ≥80% → use Infrastructure Detective's classification
-- Otherwise, if Code Archaeologist confidence ≥70% → use "code-bug" (archaeology found recent code change)
-- Otherwise → use "unknown" (conflicting low-confidence signals)
+- **Primary finding (root_cause):** If Infrastructure Detective has confidence ≥80%, lead with their infrastructure flake classification but note the code change context from archaeology
+  - Example: "Infrastructure timeout pattern detected (intermittent test runner issues). Recent code change in PR #123 may have increased susceptibility to timing issues."
+- **Minority report:** If Code Archaeologist has confidence ≥70%, note: "Code Archaeologist identifies recent change in PR #123 as potential root cause rather than infrastructure"
+- **If both have low confidence (<70%):** State "Conflicting signals - infrastructure pattern suggests flake, but recent code changes warrant investigation" and mark confidence as Low

 ### Scenario: Multiple similar issues with different root causes

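
The new Decision Logic for this conflict scenario can be read as a small branching function. The sketch below is one possible reading, not the workflow's actual code: `resolve_conflict` and its parameters are hypothetical names. The rules leave a gap (Infrastructure Detective between 70% and 79%, or below 70% while Code Archaeologist is at 70% or above), and falling back to the "Conflicting signals" text in that gap is an assumption.

```python
from __future__ import annotations

CONFLICT_TEXT = ("Conflicting signals - infrastructure pattern suggests flake, "
                 "but recent code changes warrant investigation")


def resolve_conflict(infra_conf: int, arch_conf: int,
                     infra_summary: str, arch_context: str, pr: str) -> dict:
    """Infrastructure says flake, archaeology points at a code change."""
    result = {"root_cause": None, "minority_report": None, "confidence": None}

    # Both agents below 70%: state the conflict and mark confidence Low.
    if infra_conf < 70 and arch_conf < 70:
        result["root_cause"] = CONFLICT_TEXT
        result["confidence"] = "Low"
        return result

    if infra_conf >= 80:
        # Lead with the flake classification, then note the code change context.
        result["root_cause"] = f"{infra_summary} {arch_context}"
    else:
        # Assumption: cases the rules do not cover fall back to the conflict text.
        result["root_cause"] = CONFLICT_TEXT

    if arch_conf >= 70:
        result["minority_report"] = (
            f"Code Archaeologist identifies recent change in {pr} "
            "as potential root cause rather than infrastructure"
        )
    return result
```

The `confidence` key is only set in the low-confidence branch; in the other branches it is left for the separate unified confidence calculation.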
@@ -142,6 +157,8 @@ Before writing to `deep_analysis`, sanitize all text fields:

 ## Example Aggregation

+### Example 1: Consensus Scenario
+
 **Inputs:**

 - **Archaeology**: Found commit `abc123` in PR #12345, 4 days ago (very recent), code-under-test changed
@@ -152,7 +169,8 @@ Before writing to `deep_analysis`, sanitize all text fields:

 ```json
 {
-  "root_cause": "GraphQL schema validation error - template emits Boolean placeholders without resolvers. Recent code change (4 days ago) in PR #12345 likely introduced this regression.",
+  "root_cause": "GraphQL schema validation error - template emits Boolean placeholders without resolvers. Recent code change (4 days ago) in PR #12345 likely introduced this regression, similar to previously resolved issue ROX-11111.",
@@ -185,6 +203,57 @@ Before writing to `deep_analysis`, sanitize all text fields:
 }
 ```

+### Example 2: Conflicting Perspectives
+
+**Inputs:**
+
+- **Archaeology**: No recent code changes in affected files (last change 3 months ago), confidence 60%
+- **Infrastructure**: Classified as infrastructure-flake with 85% confidence, intermittent test runner timeout pattern
+- **Correlation**: Found 3 similar issues with same timeout pattern in last 30 days, all resolved by retry
+
+**Aggregated Output:**
+
+```json
+{
+  "root_cause": "Intermittent test runner timeout pattern consistent with infrastructure instability. Multiple similar failures in past 30 days (3 occurrences) all resolved by retry without code changes.",
+  "minority_report": "Code Archaeologist notes no recent changes in affected test files (last change 3 months ago), suggesting this is not a regression but could indicate latent timing sensitivity in the test itself.",