feat: Intelligent backward failure attribution (ErrorProbe) #186

@acailic

Paper Reference

  • Title: ErrorProbe: Towards Self-Improving Error Diagnosis in Multi-Agent Systems
  • Authors: Jiazheng Li, Emine Yilmaz, Bei Chen, Dieu-Thu Le
  • Year: 2026
  • URL: https://arxiv.org/abs/2604.17658
  • Venue: ACL 2026 Findings

Paper Summary

ErrorProbe is a self-improving framework for semantic failure attribution: it identifies the responsible agent and the step where the error originated. It operates as a three-stage pipeline: anomaly detection → symptom-driven backward tracing → multi-agent validation through tool-grounded execution. It maintains an episodic memory of verified diagnoses without requiring expert annotation.

Proposed Feature

Implement intelligent backward failure attribution that traces from failure symptoms back to root causes:

Core Capabilities

  • Symptom Detection: Automatically identify failure symptoms in agent output (errors, unexpected outputs, quality drops)
  • Backward Tracing: Walk the causal chain backward from the failure symptom to identify the originating decision step
  • Multi-Agent Attribution: When multiple agents are involved, identify which agent's decision caused the failure
  • Learning Loop: Accumulate verified diagnoses as an episodic memory to improve future failure attribution
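
A minimal sketch of the backward tracing and attribution capabilities above, assuming each recorded step stores a pointer to the step whose output it consumed. The `Step` shape, the `suspicious` flag, and the `trace_back` / `attribute` helpers are hypothetical names for illustration, not the SDK's schema or the paper's algorithm; the "earliest suspicious step wins" rule is a simple stand-in heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One recorded step in a session (hypothetical shape, not the SDK's schema)."""
    step_id: str
    agent: str
    parent_id: Optional[str]   # step whose output this step consumed
    suspicious: bool = False   # assumed to be set by an upstream anomaly check


def trace_back(steps: dict[str, Step], symptom_id: str) -> list[Step]:
    """Walk parent pointers from the symptom step back to the earliest ancestor."""
    chain: list[Step] = []
    seen: set[str] = set()
    current: Optional[str] = symptom_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        step = steps.get(current)
        if step is None:  # dangling parent reference: stop tracing
            break
        chain.append(step)
        current = step.parent_id
    return chain


def attribute(steps: dict[str, Step], symptom_id: str) -> Optional[Step]:
    """Return the earliest suspicious step on the chain as the candidate root cause."""
    chain = trace_back(steps, symptom_id)  # ordered symptom-first
    flagged = [s for s in chain if s.suspicious]
    return flagged[-1] if flagged else None


# Illustrative session: the writer fails, but the planner's step is the origin.
steps = {
    "s1": Step("s1", "planner", None, suspicious=True),
    "s2": Step("s2", "retriever", "s1"),
    "s3": Step("s3", "writer", "s2", suspicious=True),
}
root = attribute(steps, "s3")
print(root.agent, root.step_id)
```

In this toy session the symptom surfaces at the writer's step, while the attribution points at the planner's earlier step. A real implementation would likely need a richer causal graph than single parent pointers, since one step can depend on several earlier ones.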

Technical Approach

  • Add failure symptom classification to the SDK's event types
  • Implement backward causal tracing algorithm on stored session data
  • Build attribution visualization showing the causal chain from symptom to root cause
  • Add episodic memory storage for verified diagnoses
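
The first and last items above could take roughly the following shape. This is a sketch under stated assumptions: `SymptomKind`, `Diagnosis`, and `EpisodicMemory` are hypothetical names, the three symptom categories are taken from the Symptom Detection bullet, and an in-memory dict stands in for whatever storage the SDK actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class SymptomKind(Enum):
    """Hypothetical symptom classes, mirroring the Symptom Detection bullet."""
    ERROR = "error"
    UNEXPECTED_OUTPUT = "unexpected_output"
    QUALITY_DROP = "quality_drop"


@dataclass(frozen=True)
class Diagnosis:
    """A verified attribution: which agent and step caused which symptom."""
    symptom: SymptomKind
    agent: str
    step_id: str
    summary: str


class EpisodicMemory:
    """In-memory store of verified diagnoses, keyed by symptom kind."""

    def __init__(self) -> None:
        self._by_symptom: dict[SymptomKind, list[Diagnosis]] = {}

    def add_verified(self, diagnosis: Diagnosis) -> None:
        """Record a diagnosis that has already been verified."""
        self._by_symptom.setdefault(diagnosis.symptom, []).append(diagnosis)

    def recall(self, symptom: SymptomKind, limit: int = 3) -> list[Diagnosis]:
        """Return up to `limit` past diagnoses for this symptom, most recent first."""
        past = self._by_symptom.get(symptom, [])
        return list(reversed(past))[:limit]


memory = EpisodicMemory()
memory.add_verified(
    Diagnosis(SymptomKind.ERROR, "planner", "s1", "plan omitted a required tool call")
)
memory.add_verified(
    Diagnosis(SymptomKind.ERROR, "retriever", "s7", "query returned stale documents")
)
for past in memory.recall(SymptomKind.ERROR):
    print(past.agent, past.step_id, past.summary)
```

Keying by symptom kind keeps lookup trivial; retrieving by similarity to the current failure would be a natural refinement but is outside this sketch.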

Impact

This feature would transform Peaky Peek from a "see what happened" tool into a "tell me why it failed" tool. According to user research, this is the most requested capability for agent debuggers.

Labels

enhancement, paper-inspired, high-priority, analytics
