Make AgentCI diffs easier to debug at the step level
AgentCI could tell users that an episode changed, but, as the public
backlog correctly noted, its regression diffs were too flat and too
shallow to debug from. This change adds structured step-level diff
items, carries them through regression results and JSON output, and
exposes field-level step changes in the HTML report so users can see
exactly where a candidate run starts to diverge.
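As a rough illustration of the idea (not the code in this commit), a
structured step-level diff item might look like the sketch below. All
names here (StepDiffItem, diff_steps, apply_ignore_rules,
first_divergence) and the "step:<index>:<field>:" text prefix are
assumptions made for the example. The point is that each item keeps a
flat text form next to its structured fields, so prefix-based ignore
rules can still match it.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any, Optional


@dataclass
class StepDiffItem:
    """Hypothetical structured diff item; not AgentCI's actual schema."""

    step_index: int
    field_name: str
    baseline: Any
    candidate: Any

    @property
    def text(self) -> str:
        # Flat text form (assumed prefix layout) kept alongside the
        # structured fields so glob-style ignore rules still apply.
        return (
            f"step:{self.step_index}:{self.field_name}: "
            f"{self.baseline!r} -> {self.candidate!r}"
        )


def diff_steps(baseline: list[dict], candidate: list[dict]) -> list[StepDiffItem]:
    """Compare two episodes step by step and report field-level changes."""
    items = []
    for i in range(max(len(baseline), len(candidate))):
        # A step missing on one side is treated as an empty step.
        old = baseline[i] if i < len(baseline) else {}
        new = candidate[i] if i < len(candidate) else {}
        for key in sorted(old.keys() | new.keys()):
            if old.get(key) != new.get(key):
                items.append(StepDiffItem(i, key, old.get(key), new.get(key)))
    return items


def apply_ignore_rules(items: list[StepDiffItem], patterns: list[str]) -> list[StepDiffItem]:
    """Drop items whose flat text matches any glob-style ignore pattern."""
    return [
        item
        for item in items
        if not any(fnmatchcase(item.text, pattern) for pattern in patterns)
    ]


def first_divergence(items: list[StepDiffItem]) -> Optional[int]:
    """Return the index of the earliest step that differs, if any."""
    return min((item.step_index for item in items), default=None)


# Example: latency noise is ignored, so the first real divergence is step 1.
baseline_run = [
    {"tool": "search", "latency_ms": 10},
    {"tool": "read", "latency_ms": 5},
]
candidate_run = [
    {"tool": "search", "latency_ms": 12},
    {"tool": "write", "latency_ms": 5},
]
all_items = diff_steps(baseline_run, candidate_run)
kept_items = apply_ignore_rules(all_items, ["step:*:latency_ms:*"])
```

Because the structured fields are additive and the text form is
unchanged, existing consumers that only read the flat strings would be
unaffected in a design like this.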
Constraint: Preserve existing text diff prefixes so ignore rules such as metric:* keep working
Rejected: Replace flat diff items entirely with structured output | too disruptive for current CLI and regression consumers
Rejected: Limit the enhancement to HTML only | lower value than fixing the core compare pipeline once
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future diff enrichments additive and compatibility-aware; downstream tooling may already depend on diff item prefixes
Tested: AgentCI unittest suite; direct CLI validation for diff JSON, assert-regression output, and generated HTML report field-level change rendering
Not-tested: Very large deeply nested payload diffs across long episodes