Remove untested timing and performance claims from multi-agent RCA

janisz · claude · janisz · commit 14a4dffb6d28 · 2026-05-07T16:57:43.000+02:00
Remove all timing and performance claims across multi-agent RCA implementation:
- Remove "55% faster than sequential" and "55% time savings" from README.md
- Remove agent time budgets (90-120s, 60-90s) from agent prompts
- Remove "2-3 min vs 4-5 min" claims from triage.md
- Remove per-agent time ranges (90-120s, 60-90s) from the RCA_AGENT_TIMEOUT_SECONDS description in constants.md
- Remove performance claim from rca-aggregation-rules.md
- Simplify aggregation logic in triage.md to reference rca-aggregation-rules.md
- Add clarifying note to infra-detective.md about infrastructure patterns
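
For reviewers, the inline confidence scoring removed from triage.md can be sketched roughly as below. This is only a reconstruction from the deleted lines, assuming the findings are plain dicts and that similarity is on a 0-100 scale as in the removed snippet; the definition in `reference/rca-aggregation-rules.md` is not part of this diff and may differ.

```python
from typing import Optional


def calculate_unified_confidence(
    archaeology: Optional[dict],
    infra: Optional[dict],
    correlation: Optional[dict],
) -> str:
    """Map agent findings to "High", "Medium", or "Low" (illustrative only)."""
    # Base score comes from the Infrastructure Detective; default to 50 if absent.
    base = (infra or {}).get("confidence") or 50

    # +10 when git blame points at a very recent change.
    blame = (archaeology or {}).get("git_blame_results") or {}
    if blame.get("recency") == "very_recent":
        base += 10

    # +5 when at least one historical issue is >= 85 similar
    # (assumed 0-100 scale, matching the removed snippet).
    similar = (correlation or {}).get("similar_issues") or []
    if similar and max(issue["similarity"] for issue in similar) >= 85:
        base += 5

    # Convert the numeric score to the High/Medium/Low label.
    if base >= 85:
        return "High"
    if base >= 60:
        return "Medium"
    return "Low"
```

Missing agent output is treated as "no adjustment" rather than an error, in line with the null-handling note in rca-aggregation-rules.md.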

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
diff --git a/workflows/acs-triage/.claude/agents/code-archaeologist.md b/workflows/acs-triage/.claude/agents/code-archaeologist.md
@@ -217,7 +217,6 @@ confidence = min(confidence, 95)
 
 ## Notes
 
-- **Time budget**: 90-120 seconds total
 - **Parallel execution**: May run concurrently with other agents
 - **Fallback**: If git commands fail, analyze based on file paths alone
 - **Focus**: Recent changes are most suspicious - prioritize those
diff --git a/workflows/acs-triage/.claude/agents/infra-detective.md b/workflows/acs-triage/.claude/agents/infra-detective.md
@@ -44,7 +44,7 @@ Read known infrastructure patterns from:
 
 ### 2. Error Pattern Matching
 
-Match the error message against known patterns:
+Match the error message against known infrastructure flake patterns. Note: These patterns are specific to infrastructure flake detection and differ from the team assignment patterns in `reference/error-signatures.md`.
 
 ```python
 infrastructure_patterns = {
@@ -308,7 +308,6 @@ confidence = min(confidence, 95)
 
 ## Notes
 
-- **Time budget**: 60-90 seconds total
 - **Parallel execution**: Runs concurrently with other agents
 - **Authority**: Has final say on infrastructure flake classification when confidence ≥80%
 - **Focus**: Infrastructure patterns are well-documented - use pattern matching heavily
diff --git a/workflows/acs-triage/.claude/agents/issue-correlator.md b/workflows/acs-triage/.claude/agents/issue-correlator.md
@@ -307,8 +307,7 @@ confidence = min(confidence, 90)
 
 ## Notes
 
-- **Time budget**: 60-90 seconds total
 - **Parallel execution**: Runs concurrently with other agents
-- **Search limit**: Max 20 results per query to stay within time budget
+- **Search limit**: Max 20 results per query
 - **Similarity threshold**: Only include issues ≥70% similarity in output
 - **Focus**: Historical context helps identify recurring patterns and known solutions
diff --git a/workflows/acs-triage/.claude/commands/triage.md b/workflows/acs-triage/.claude/commands/triage.md
@@ -118,8 +118,6 @@ For issues where `issueType === "CI_FAILURE"`:
 
 After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:
 
-**Time budget:** 2-3 minutes per CI_FAILURE issue (parallel execution).
-
 **Multi-Agent Process:**
 
 1. **Create RCA Team**
@@ -132,19 +130,19 @@ After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:
 
 2. **Spawn 3 Agents in Parallel** (single message, multiple Agent calls)
 
-   **Agent 1: Code Archaeologist** (90-120s)
+   **Agent 1: Code Archaeologist**
    - Tools: GitHub MCP, git blame, git log
    - Task: Find problematic commit/PR that introduced the issue
    - Reads: `workflows/acs-triage/.claude/agents/code-archaeologist.md`
    - Output: `artifacts/acs-triage/rca/{issue_key}/archaeology-findings.json`
 
-   **Agent 2: Infrastructure Detective** (60-90s)
+   **Agent 2: Infrastructure Detective**
    - Tools: Pattern matching, error signatures
    - Task: Classify as infrastructure flake vs real bug
    - Reads: `workflows/acs-triage/.claude/agents/infra-detective.md`
    - Output: `artifacts/acs-triage/rca/{issue_key}/infra-findings.json`
 
-   **Agent 3: Cross-Issue Correlator** (60-90s)
+   **Agent 3: Cross-Issue Correlator**
    - Tools: JIRA MCP (search historical issues)
    - Task: Find similar past issues and failure frequency
    - Reads: `workflows/acs-triage/.claude/agents/issue-correlator.md`
@@ -174,81 +172,46 @@ After Stage 1, spawn specialized RCA agents for each CI_FAILURE issue:
    ```
 
 3. **Wait for Agents to Complete**
-   - Agents run concurrently (max 120s total vs 300s sequential)
+   - Agents run concurrently
    - Notification when all agents finish
 
 4. **Aggregate Findings**
 
-   Read the 3 findings JSON files and synthesize unified `deep_analysis`:
+   Read the 3 findings JSON files and synthesize unified `deep_analysis` using the aggregation rules from `reference/rca-aggregation-rules.md`:
 
    ```python
    archaeology = read_json("archaeology-findings.json")
    infra = read_json("infra-findings.json")
    correlation = read_json("correlation-findings.json")
 
-   # Determine root cause (Infrastructure Detective has authority on flakes)
-   if infra.flake_classification == "infrastructure-flake" and infra.confidence >= 80:
-       failure_category = "infrastructure"
-       root_cause = infra.reasoning
-   elif infra.flake_classification == "code-bug":
-       failure_category = "code-bug"
-       root_cause = f"{infra.reasoning}. {archaeology.reasoning if archaeology else ''}"
-   else:
-       failure_category = "unknown"
-       root_cause = "Insufficient data to determine root cause"
-
-   # Calculate unified confidence
-   confidence = "Medium"  # Default
-   if infra.confidence >= 85 and (archaeology.confidence >= 80 or correlation.confidence >= 70):
-       confidence = "High"
-   elif infra.confidence <= 60 and archaeology.confidence <= 60:
-       confidence = "Low"
-
    deep_analysis = {
-       "root_cause": root_cause,
-       "failure_category": failure_category,
+       "root_cause": determine_root_cause(archaeology, infra, correlation),
+       "failure_category": classify_failure(infra, archaeology),
        "affected_components": archaeology.git_blame_results.primary_file if archaeology else [],
-       "confidence": confidence,
-       "risk_assessment": "Medium",  # Based on failure_category
+       "confidence": calculate_unified_confidence(archaeology, infra, correlation),
+       "risk_assessment": assess_risk(failure_category, affected_components, frequency),
        "proposed_fix": infra.suggested_action or "Investigate further",
-       "relevant_logs": extract_logs(issue, max_chars=500),
+       "relevant_logs": sanitize(extract_logs(issue, max_chars=500)),
 
-       # NEW: Git archaeology results
        "problematic_commit": archaeology.git_blame_results.last_modified_commit if archaeology else null,
        "problematic_pr": archaeology.pr_context.pr_number if archaeology else null,
 
-       # NEW: Infrastructure analysis
        "is_infrastructure_flake": infra.flake_classification == "infrastructure-flake",
        "infrastructure_workaround": infra.workaround_recommendations[0] if infra.workaround_recommendations else null,
 
-       # NEW: Cross-issue correlation
        "similar_issues": correlation.similar_issues if correlation else [],
        "failure_frequency": correlation.failure_frequency if correlation else {},
 
        "investigation_method": "multi_agent_parallel"
    }
    ```
 
-   **Confidence Scoring:**
-   ```python
-   base_confidence = infra.confidence or 50
-
-   # Boost if git blame found recent culprit (within 7 days)
-   if archaeology and archaeology.git_blame_results.recency == "very_recent":
-       base_confidence += 10
-
-   # Boost if similar issues have known resolutions
-   if correlation and correlation.similar_issues and max(issue.similarity for issue in correlation.similar_issues) >= 85:
-       base_confidence += 5
-
-   # Convert to High/Medium/Low
-   if base_confidence >= 85:
-       confidence = "High"
-   elif base_confidence >= 60:
-       confidence = "Medium"
-   else:
-       confidence = "Low"
-   ```
+   See `reference/rca-aggregation-rules.md` for algorithm details:
+   - `determine_root_cause()`: Infrastructure Detective has authority on flakes (confidence ≥80%)
+   - `classify_failure()`: infrastructure | code-bug | flaky-test | unknown
+   - `calculate_unified_confidence()`: Base from Infrastructure Detective + adjustments for recent changes (+10%) and similar issues (+5%)
+   - `assess_risk()`: Infrastructure flakes = Low; High frequency or critical components = High
+   - `sanitize()`: Remove API tokens, passwords, secrets, internal URLs with credentials, IP addresses, employee emails
 
 5. **Cleanup RCA Team**
    ```
@@ -471,20 +434,17 @@ After running this command, you should have:
 **Parallel Execution:**
 - Phase 1a + 1b: Run setup and fetch concurrently
 - Phase 4: Run CI/Vuln/Flaky analysis in parallel (3 concurrent tool calls)
-- Total time savings: 70-100 seconds vs sequential execution
 
 **Deep CI Failure Analysis:**
-- Time budget: 4-5 minutes per CI_FAILURE issue (Stage 2 of Phase 4a)
-- Deep analysis runs sequentially per issue (each requires significant investigation)
-- With 5 issues max and potential for all to be CI failures, worst case is ~25 minutes for analysis alone
+- Deep analysis runs sequentially per issue
 - The investigator agent methodology is read once from `/tmp/triage/stackrox/.claude/agents/stackrox-ci-failure-investigator.md` and applied to each issue
 
 ## Notes
 
 - **Timeout**: 1800 seconds total (30 minutes)
 - **Issue Limit**: 5 issues per run to allow time for deep CI failure analysis
-- **Deep CI Failure Analysis**: Each CI_FAILURE issue gets 4-5 minutes of deep root cause investigation using the stackrox CI failure investigator methodology. Results appear in comments and reports but do NOT influence team assignment.
-- **Parallel Analysis**: CI/Vuln/Flaky analysis MUST run concurrently (saves 60-80s). Within Phase 4a, deep analysis runs sequentially per CI_FAILURE issue.
+- **Deep CI Failure Analysis**: Each CI_FAILURE issue gets deep root cause investigation using the stackrox CI failure investigator methodology. Results appear in comments and reports but do NOT influence team assignment.
+- **Parallel Analysis**: CI/Vuln/Flaky analysis MUST run concurrently. Within Phase 4a, deep analysis runs sequentially per CI_FAILURE issue.
 - **READ-ONLY by default**: Use `--comment` flag to write to JIRA
 - **High Confidence Threshold**: ≥80% for auto-assignment recommendations
 - **Version Awareness**: Automatically detects and adjusts for version mismatches
diff --git a/workflows/acs-triage/README.md b/workflows/acs-triage/README.md
@@ -6,7 +6,7 @@ Automated triage for StackRox/ACS JIRA issues with intelligent team assignment u
 
 This workflow provides systematic triage of untriaged StackRox issues using:
 
-- **Multi-Agent Root Cause Analysis**: 3 specialized agents analyze CI failures in parallel (Code Archaeologist, Infrastructure Detective, Cross-Issue Correlator) - 55% faster than sequential analysis
+- **Multi-Agent Root Cause Analysis**: 3 specialized agents analyze CI failures in parallel (Code Archaeologist, Infrastructure Detective, Cross-Issue Correlator)
 - **Multi-Strategy Team Assignment**: 5-strategy priority system with 95%-70% confidence scores
 - **Specialized Analysis**: Custom decision trees for CI failures, vulnerabilities, and flaky tests
 - **Version Awareness**: Detects mismatches between issue versions and current codebase
@@ -151,23 +151,23 @@ The workflow automatically runs analysis commands in parallel when executed by A
 - Match error signatures from `reference/error-signatures.md`
 - Check for known flaky patterns
 
-**Stage 2: Multi-Agent Root Cause Analysis** (55% faster than sequential)
+**Stage 2: Multi-Agent Root Cause Analysis**
 
 Spawns 3 specialized agents in parallel for deep investigation:
 
-1. **Code Archaeologist** (90-120s)
+1. **Code Archaeologist**
    - Git blame analysis to find when files were last modified
    - GitHub PR lookup to identify problematic commits
    - Test vs code change detection
    - **Output:** `problematic_commit`, `problematic_pr`
 
-2. **Infrastructure Detective** (60-90s)
+2. **Infrastructure Detective**
    - Pattern matching against known infrastructure flakes
    - Flake vs real bug classification
    - Workaround recommendations
    - **Output:** `is_infrastructure_flake`, `infrastructure_workaround`
 
-3. **Cross-Issue Correlator** (60-90s)
+3. **Cross-Issue Correlator**
    - JIRA search for similar historical issues
    - Failure frequency analysis (trend detection)
    - Known solution extraction
@@ -179,8 +179,6 @@ Spawns 3 specialized agents in parallel for deep investigation:
 - Code Archaeologist provides git context (commit/PR)
 - Cross-Issue Correlator provides historical patterns
 
-**Performance:** 2-3 min per issue (vs 4-5 min sequential) = **55% time savings**
-
 **Output:** `ci_analysis` field with:
 - Stage 1: `error_type`, `file_paths`, `error_signature_match`
 - Stage 2: `deep_analysis` with root cause, failure category, and RCA results
diff --git a/workflows/acs-triage/reference/constants.md b/workflows/acs-triage/reference/constants.md
@@ -115,7 +115,7 @@ Central location for all hardcoded values used throughout the ACS triage workflo
 
 | Constant | Value | Purpose |
 |----------|-------|---------|
-| RCA_AGENT_TIMEOUT_SECONDS | 120 | Max time per agent (Code Archaeologist: 90-120s, Infrastructure Detective: 60-90s, Cross-Issue Correlator: 60-90s) |
+| RCA_AGENT_TIMEOUT_SECONDS | 120 | Max time per agent |
 | RCA_AGGREGATION_TIMEOUT_SECONDS | 30 | Max time for findings aggregation |
 | RCA_TEAM_PREFIX | "ci-rca-" | Team name prefix for RCA teams (e.g., "ci-rca-ROX-12345") |
 | MIN_SIMILARITY_THRESHOLD | 0.70 | Min similarity (70%) for including historical issues in correlation results |
diff --git a/workflows/acs-triage/reference/rca-aggregation-rules.md b/workflows/acs-triage/reference/rca-aggregation-rules.md
@@ -258,5 +258,4 @@ def sanitize(text):
 
 - **Investigation Method**: Always set to `"multi_agent_parallel"` when using multi-agent RCA
 - **Null Handling**: If an agent fails or returns no data, use `null` for its fields (don't fail the whole aggregation)
-- **Performance**: Aggregation should complete within 30 seconds (RCA_AGGREGATION_TIMEOUT_SECONDS)
 - **Fallback**: If aggregation fails, fall back to single sequential analysis or description-only mode