Evaluation

veronica-core includes a reproducible evaluation of runtime containment across four canonical runaway failure modes (retry amplification, recursive tools, multi-agent loops, and WebSocket runaway):

  • Technical paper -- system design, threat model, formal safety guarantees (G1-G6), evaluation
  • Baseline comparison -- no containment vs veronica across four scenarios (avg 78.8% call reduction)
  • Ablation study -- incremental component contribution (BudgetEnforcer, AgentStepGuard, CircuitBreaker, RetryContainer)
  • Real incident reproduction -- five real-world failure scenarios with before/after comparison
  • Scale simulation -- 1 to 1000 concurrent agent chains (~83.1% reduction, ~12.63 µs/chain overhead)
  • Reproducibility guide -- environment, commands, expected output, verification against paper claims
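
To make the "call reduction" figures quoted above concrete, here is a minimal sketch of how such a percentage could be computed from call counts measured with and without containment. The function name `call_reduction`, the scenario keys, and the call counts are hypothetical and chosen only for illustration; they are not taken from the paper or from the veronica-core API. The sketch also assumes the reported average is the mean of per-scenario reductions, which the linked documents would need to confirm.

```python
def call_reduction(baseline_calls: int, contained_calls: int) -> float:
    """Percentage of calls avoided relative to the uncontained baseline."""
    if baseline_calls <= 0:
        raise ValueError("baseline_calls must be positive")
    return 100.0 * (baseline_calls - contained_calls) / baseline_calls


# Hypothetical (baseline, contained) call counts, for illustration only.
# These are NOT the measurements reported in the evaluation.
scenarios = {
    "retry_amplification": (100, 20),
    "recursive_tools": (50, 10),
}

# Per-scenario reduction, then an unweighted mean across scenarios
# (assumed aggregation; the paper may aggregate differently).
reductions = {name: call_reduction(b, c) for name, (b, c) in scenarios.items()}
average = sum(reductions.values()) / len(reductions)

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% fewer calls")
print(f"average: {average:.1f}%")
```

An unweighted mean treats every scenario equally regardless of how many calls it generates. A call-weighted aggregate would give a different number when scenarios differ widely in volume, so the reproducibility guide remains the authority on which definition applies.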

Supporting theory: