Paper Reference
- Title: ErrorProbe: Towards Self-Improving Error Diagnosis in Multi-Agent Systems
- Authors: Jiazheng Li, Emine Yilmaz, Bei Chen, Dieu-Thu Le
- Year: 2026
- URL: https://arxiv.org/abs/2604.17658
- Venue: ACL 2026 Findings
Paper Summary
ErrorProbe is a self-improving framework for semantic failure attribution that identifies the responsible agent and the step where an error originated. It operates as a three-stage pipeline: anomaly detection → symptom-driven backward tracing → multi-agent validation through tool-grounded execution. It maintains an episodic memory of verified diagnoses without expert annotation.
Proposed Feature
Implement backward failure attribution that traces from a failure symptom back to its root cause:
Core Capabilities
- Symptom Detection: Automatically identify failure symptoms in agent output (errors, unexpected outputs, quality drops)
- Backward Tracing: Walk the causal chain backward from the failure symptom to identify the originating decision step
- Multi-Agent Attribution: When multiple agents are involved, identify which agent's decision caused the failure
- Learning Loop: Accumulate verified diagnoses as an episodic memory to improve future failure attribution
Technical Approach
- Add failure symptom classification to the SDK's event types
- Implement a backward causal tracing algorithm over stored session data
- Build an attribution visualization showing the causal chain from symptom to root cause
- Add episodic memory storage for verified diagnoses
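As a rough sketch of the first and last items, symptom classification could be modelled as an enum on the event, and the episodic memory as an append-only JSONL file that accepts verified diagnoses only. Every name here (`SymptomType`, `Diagnosis`, `EpisodicMemory`, and the JSONL layout) is an assumption made for illustration; none of them is an existing SDK type.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from pathlib import Path


class SymptomType(str, Enum):
    """Hypothetical symptom classes, mirroring the Symptom Detection bullet."""
    ERROR = "error"
    UNEXPECTED_OUTPUT = "unexpected_output"
    QUALITY_DROP = "quality_drop"


@dataclass
class Diagnosis:
    session_id: str
    symptom: SymptomType
    root_step_id: str
    responsible_agent: str
    verified: bool = False


class EpisodicMemory:
    """Append-only JSONL store that keeps verified diagnoses only."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def add(self, diagnosis: Diagnosis) -> bool:
        """Persist the diagnosis if verified; return whether it was stored."""
        if not diagnosis.verified:
            return False
        record = asdict(diagnosis)
        record["symptom"] = diagnosis.symptom.value  # store the plain string
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return True

    def recall(self, symptom: SymptomType) -> list[Diagnosis]:
        """Return stored diagnoses whose symptom type matches."""
        if not self.path.exists():
            return []
        matches = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                record["symptom"] = SymptomType(record["symptom"])
                if record["symptom"] is symptom:
                    matches.append(Diagnosis(**record))
        return matches
```

Rejecting unverified diagnoses at write time keeps the memory limited to confirmed cases, so that later attribution runs can use `recall` to look up earlier root causes for the same symptom type.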
Impact
Transforms Peaky Peek from a "see what happened" tool into a "tell me why it failed" tool. According to user research, this is the most requested capability for agent debuggers.
Labels
enhancement, paper-inspired, high-priority, analytics