The framework converts backend troubleshooting into a constrained execution loop:
- accept incident context
- select a runbook
- collect normalized evidence through adapters
- evaluate decision rules
- emit a structured report
The incident input accepts an identifier plus a user-facing description of the symptom.
Typical context locators:
- trace_id
- request_id
- order_id
- task_id
- message_id
Important constraint: the framework should not depend on trace_id as the only entrypoint. In real systems, traces may be sampled, missing, or incomplete for async flows.
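A minimal sketch of an incident context that accepts any of the locators above, so trace_id is one option rather than the only entrypoint. The class name, the validation rules and the locator precedence (business identifiers first, trace_id last) are illustrative assumptions, not part of the framework's stated design.

```python
from dataclasses import dataclass, field

# Hypothetical precedence: stable business identifiers first, trace_id last,
# because traces may be sampled out or broken across async hops.
LOCATOR_PRIORITY = ("order_id", "task_id", "message_id", "request_id", "trace_id")


@dataclass(frozen=True)
class IncidentContext:
    symptom: str                                   # user-facing description
    locators: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        if not self.symptom.strip():
            raise ValueError("a symptom description is required")
        unknown = set(self.locators) - set(LOCATOR_PRIORITY)
        if unknown:
            raise ValueError(f"unsupported locators: {sorted(unknown)}")
        if not any(self.locators.values()):
            raise ValueError("at least one non-empty locator is required")

    def primary_locator(self) -> tuple[str, str]:
        """Return the highest-priority locator that has a value."""
        self.validate()
        for name in LOCATOR_PRIORITY:
            value = self.locators.get(name)
            if value:
                return name, value
        raise AssertionError("unreachable: validate() guarantees a locator")
```

With this shape, an incident that carries only an order_id (or an empty trace_id) is still a valid entrypoint.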
Adapters expose observability systems through stable read-only contracts.
Typical adapters:
- trace
- db readonly
- redis inspect
- mq inspect
- log search
- dependency call lookup
Adapters are responsible for:
- request validation
- access control
- source-specific normalization
- raw payload reference generation
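One way to express the read-only adapter contract is a base class that runs access control and request validation before any query, then returns normalized facts plus a reference to the raw payload. `ReadOnlyAdapter`, `AdapterResult` and the `scheme://key=value` reference format are hypothetical; `FakeRedisInspect` is backed by a dict rather than a real Redis client.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AdapterResult:
    source: str              # adapter name, e.g. "redis_inspect"
    facts: dict[str, Any]    # source-specific fields after normalization
    raw_ref: str             # pointer to the raw payload, not the payload itself


class ReadOnlyAdapter(ABC):
    """Hypothetical contract: validate and authorize before touching the source."""

    name: str = "adapter"
    allowed_params: frozenset[str] = frozenset()

    def __init__(self, allowed_roles: set[str]):
        self.allowed_roles = allowed_roles

    def query(self, role: str, params: dict[str, Any]) -> AdapterResult:
        if role not in self.allowed_roles:                 # access control
            raise PermissionError(f"{role!r} may not use {self.name}")
        extra = set(params) - self.allowed_params          # request validation
        if extra:
            raise ValueError(f"unsupported params: {sorted(extra)}")
        raw = self._fetch(params)
        return AdapterResult(self.name, self._normalize(raw), self._raw_ref(params))

    def _raw_ref(self, params: dict[str, Any]) -> str:
        key = ",".join(f"{k}={params[k]}" for k in sorted(params))
        return f"{self.name}://{key}"

    @abstractmethod
    def _fetch(self, params: dict[str, Any]) -> Any: ...

    @abstractmethod
    def _normalize(self, raw: Any) -> dict[str, Any]: ...


class FakeRedisInspect(ReadOnlyAdapter):
    """Stand-in adapter that reads from a dict instead of Redis."""

    name = "redis_inspect"
    allowed_params = frozenset({"key"})

    def __init__(self, allowed_roles: set[str], data: dict[str, Any]):
        super().__init__(allowed_roles)
        self.data = data

    def _fetch(self, params: dict[str, Any]) -> Any:
        return self.data.get(params["key"])

    def _normalize(self, raw: Any) -> dict[str, Any]:
        return {"exists": raw is not None, "value": raw}
```

Because the parameter allowlist is closed, a request such as `{"key": "x", "cmd": "DEL"}` is rejected before it reaches the source.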
The evidence layer turns raw adapter output into a common shape that the agent can reason over.
Without this layer, the project becomes a tool bundle rather than a debugging framework.
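To make the common shape concrete, here is a sketch of an evidence item and two normalizers that map differently shaped sources onto it. The field names, evidence kinds and input dict layouts are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class EvidenceItem:
    kind: str                   # hypothetical taxonomy, e.g. "span_error", "row_state"
    subject: str                # locator or entity the evidence is about
    summary: str                # one-line, human-readable statement
    attributes: dict[str, Any]  # normalized key/value facts
    source: str                 # adapter that produced it
    raw_ref: str                # where to find the original payload
    observed_at: datetime


def from_trace_span(span: dict[str, Any], raw_ref: str) -> EvidenceItem:
    """Normalize a hypothetical trace-span dict."""
    failed = span.get("status") == "ERROR"
    outcome = "failed" if failed else "succeeded"
    return EvidenceItem(
        kind="span_error" if failed else "span_ok",
        subject=span["trace_id"],
        summary=f"{span['service']}.{span['operation']} {outcome} in {span['duration_ms']} ms",
        attributes={"service": span["service"], "duration_ms": span["duration_ms"]},
        source="trace",
        raw_ref=raw_ref,
        observed_at=datetime.fromtimestamp(span["start_unix"], tz=timezone.utc),
    )


def from_db_row(table: str, row: dict[str, Any], raw_ref: str) -> EvidenceItem:
    """Normalize a hypothetical read-only DB row into the same shape."""
    return EvidenceItem(
        kind="row_state",
        subject=str(row["id"]),
        summary=f"{table}.{row['id']} status={row['status']}",
        attributes={"table": table, "status": row["status"]},
        source="db_readonly",
        raw_ref=raw_ref,
        observed_at=row["updated_at"],
    )
```

The agent then reasons over a list of `EvidenceItem` values regardless of which adapter produced each one.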
Runbooks define:
- prerequisites
- execution steps
- branch conditions
- minimum evidence requirements
- output expectations
Runbooks should constrain investigation order, not fully replace model reasoning.
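A sketch of a runbook as data, covering the five elements listed above. Branch conditions only choose the next step, so the runbook fixes the investigation order while interpretation of the evidence stays with the agent. The `stuck_order` runbook, its steps and its evidence kinds are invented examples.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def no_branch(kinds: set[str]) -> Optional[str]:
    return None


@dataclass(frozen=True)
class Step:
    name: str
    tool: str   # adapter this step calls
    # Branch condition: inspect the evidence kinds seen so far and return the
    # name of the step to jump to, or None to continue in declared order.
    branch: Callable[[set[str]], Optional[str]] = no_branch


@dataclass(frozen=True)
class Runbook:
    name: str
    prerequisites: frozenset[str]       # locators that must be present
    steps: tuple[Step, ...]             # execution steps, in order
    required_evidence: frozenset[str]   # minimum evidence before concluding
    report_fields: tuple[str, ...]      # output expectations

    def is_applicable(self, locators: set[str]) -> bool:
        return self.prerequisites <= locators

    def next_step(self, current: str, kinds: set[str]) -> Optional[str]:
        names = [s.name for s in self.steps]
        i = names.index(current)
        target = self.steps[i].branch(kinds)
        if target is not None:
            return target
        return names[i + 1] if i + 1 < len(names) else None

    def missing_evidence(self, kinds: set[str]) -> set[str]:
        return set(self.required_evidence - kinds)


STUCK_ORDER = Runbook(
    name="stuck_order",
    prerequisites=frozenset({"order_id"}),
    steps=(
        # If the order is already paid, skip the lock check and inspect the queue.
        Step("load_order", "db_readonly",
             branch=lambda kinds: "check_queue" if "order_paid" in kinds else None),
        Step("check_lock", "redis_inspect"),
        Step("check_queue", "mq_inspect"),
    ),
    required_evidence=frozenset({"row_state", "queue_state"}),
    report_fields=("observed_symptom", "likely_root_cause", "evidence_references"),
)
```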
The output should be a structured incident report with:
- observed symptom
- confirmed facts
- likely root cause
- alternative hypotheses
- evidence references
- recommended next actions
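The report fields above map directly onto a small schema. This sketch adds one assumed rule: every hypothesis may cite only evidence references that appear in the report, and the likely root cause must cite at least one. Class names and the 0 to 1 confidence scale are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Hypothesis:
    statement: str
    confidence: float                      # 0.0-1.0, assigned by the agent
    evidence_refs: list[str] = field(default_factory=list)


@dataclass
class IncidentReport:
    observed_symptom: str
    confirmed_facts: list[str]
    likely_root_cause: Hypothesis
    alternative_hypotheses: list[Hypothesis]
    evidence_references: list[str]
    recommended_next_actions: list[str]

    def validate(self) -> None:
        known = set(self.evidence_references)
        if not self.likely_root_cause.evidence_refs:
            raise ValueError("the likely root cause must cite evidence")
        for h in [self.likely_root_cause, *self.alternative_hypotheses]:
            if not 0.0 <= h.confidence <= 1.0:
                raise ValueError(f"confidence out of range: {h.statement!r}")
            dangling = set(h.evidence_refs) - known
            if dangling:
                raise ValueError(f"unknown evidence refs: {sorted(dangling)}")

    def to_json(self) -> str:
        """Validate, then serialize (asdict recurses into nested dataclasses)."""
        self.validate()
        return json.dumps(asdict(self), indent=2)
```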
Recommended loop:
- Validate the incident input
- Choose the best matching runbook
- Execute required steps in order
- Normalize every tool result into evidence items
- Evaluate decision rules
- Stop when the confidence threshold or a runbook terminal state is reached
- Produce the incident report
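The recommended loop can be sketched as a single function. To stay self-contained it uses deliberately simplified stand-ins: each step's `run` callable represents an adapter call plus normalization, `score` represents decision-rule evaluation, the first matching runbook wins, and the "report" is a plain dict. All of these names and choices are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    kind: str
    raw_ref: str


@dataclass(frozen=True)
class Step:
    name: str
    run: Callable[[dict], list[Evidence]]  # adapter call + normalization
    terminal: bool = False                 # runbook terminal state after this step


@dataclass(frozen=True)
class Runbook:
    name: str
    matches: Callable[[dict], bool]
    steps: tuple
    max_steps: int = 10


def investigate(incident: dict, runbooks: list[Runbook],
                score: Callable[[list[Evidence]], float], threshold: float = 0.8) -> dict:
    # 1. Validate the incident input.
    if not incident.get("symptom") or not incident.get("locators"):
        raise ValueError("incident needs a symptom and at least one locator")
    # 2. Choose a runbook (simplification: first match wins).
    runbook = next((rb for rb in runbooks if rb.matches(incident)), None)
    if runbook is None:
        raise LookupError("no runbook matches this incident")
    evidence: list[Evidence] = []
    confidence = 0.0
    stop_reason = "steps_exhausted"
    # 3. Execute steps in declared order, bounded by the runbook's step limit.
    for i, step in enumerate(runbook.steps):
        if i >= runbook.max_steps:
            stop_reason = "step_limit"
            break
        evidence.extend(step.run(incident))  # 4. collect normalized evidence
        confidence = score(evidence)         # 5. evaluate decision rules
        # 6. Stop on the confidence threshold or a terminal state.
        if confidence >= threshold:
            stop_reason = "confidence_threshold"
            break
        if step.terminal:
            stop_reason = "terminal_state"
            break
    # 7. Produce a (heavily simplified) report.
    return {
        "runbook": runbook.name,
        "stop_reason": stop_reason,
        "confidence": confidence,
        "evidence_refs": [e.raw_ref for e in evidence],
    }
```

Recording `stop_reason` alongside the evidence makes it visible whether the investigation concluded or simply ran out of budget.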
No adapter should mutate state in the MVP.
The agent can summarize and rank hypotheses, but the collection path should be largely inspectable and repeatable.
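One assumed way to keep the collection path inspectable and repeatable is an append-only log of every adapter call, with a canonical hash so two runs can be compared. `CollectionLog` and its entry layout are hypothetical; the hash uses only the standard `json` and `hashlib` modules.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CollectionLog:
    """Hypothetical append-only record of every adapter call in one investigation."""

    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, step: str, tool: str, params: dict[str, Any], raw_ref: str) -> None:
        self.entries.append(
            {"step": step, "tool": tool, "params": dict(params), "raw_ref": raw_ref}
        )

    def replay_plan(self) -> list[tuple[str, dict[str, Any]]]:
        """The exact (tool, params) sequence needed to re-run the collection."""
        return [(e["tool"], e["params"]) for e in self.entries]

    def fingerprint(self) -> str:
        """Stable hash of the collection path; key order inside dicts does not matter."""
        canonical = json.dumps(self.entries, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The agent's ranking of hypotheses can vary between runs, but identical fingerprints show that the evidence it ranked was gathered the same way.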
Runbooks should declare tool and step limits to avoid unbounded searching.
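A sketch of how declared limits might be enforced at run time: a budget object charged once per step, with a cap on total steps and a per-tool call cap. Treating undeclared tools as having zero budget is an assumed default, not something the design specifies.

```python
from collections import Counter
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class InvestigationBudget:
    max_steps: int
    max_calls_per_tool: dict[str, int]
    steps_used: int = 0
    calls: Counter = field(default_factory=Counter)

    def charge(self, tool: str) -> None:
        """Record one step that calls `tool`; raise before exceeding either limit."""
        if self.steps_used >= self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} reached")
        limit = self.max_calls_per_tool.get(tool, 0)  # undeclared tools get no budget
        if self.calls[tool] >= limit:
            raise BudgetExceeded(f"tool {tool!r} limit {limit} reached")
        self.steps_used += 1
        self.calls[tool] += 1
```

A rejected charge leaves the counters unchanged, so the loop can record why it stopped and still report the evidence gathered so far.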
Adapters should support masking, allowlists, and query budgets.
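For a db readonly adapter, the three controls could look like the sketch below: a table and column allowlist, value masking, and a row budget pushed into the query as `LIMIT`. It uses the standard-library `sqlite3` module so it runs anywhere; `DbPolicy`, `readonly_lookup` and the `***` mask are hypothetical. It issues only `SELECT` statements but does not itself enforce read-only access (a real deployment would also rely on something like a read-only database role).

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DbPolicy:
    allowed_columns: dict[str, frozenset[str]]  # table -> readable columns (allowlist)
    masked_columns: frozenset[str]              # values redacted in results
    max_rows: int                               # per-query row budget


def readonly_lookup(conn: sqlite3.Connection, policy: DbPolicy, table: str,
                    columns: list[str], key_column: str, key: str) -> list[dict]:
    allowed = policy.allowed_columns.get(table)
    if allowed is None:
        raise PermissionError(f"table {table!r} is not allowlisted")
    denied = (set(columns) | {key_column}) - allowed
    if denied:
        raise PermissionError(f"columns not allowlisted: {sorted(denied)}")
    # Identifiers are interpolated only because they passed the allowlist;
    # the lookup value and the row budget are bound as parameters.
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {key_column} = ? LIMIT ?"
    rows = conn.execute(sql, (key, policy.max_rows)).fetchall()
    return [
        {c: ("***" if c in policy.masked_columns else v) for c, v in zip(columns, row)}
        for row in rows
    ]
```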