Commit 51e6a98
refactor(gatekeeper): async eval runner with structured classes and stats tracking
- Make check_run_script async (acompletion) and add check_run_script_with_stats,
  which returns GatekeeperStats (tokens, cost, latency) alongside results
- Introduce EvalSuite and FileEval classes in run-eval.py, replacing global
  variables and eliminating duplicate single-file/multi-file code paths
- Load all test cases up front so progress output shows global numbering
  (e.g. [3/121]) as individual evals complete
- Add StatsAggregator dataclass for cleaner inference statistics reporting
- Add GatekeeperException with optional stats for timeout, output limit,
  and parse failures
- Add gatekeeper.cost config option for custom per-token cost accounting
- Switch summary/stats tables from plain text to rich.Table; fix a latent bug
  where file=sys.stderr was accepted but ignored
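
The stats-carrying pieces listed above could fit together roughly as follows. This is a minimal sketch, not the code from this commit: the fields of `GatekeeperStats` and `StatsAggregator`, the signature of `check_run_script_with_stats`, the `cost_per_token` parameter (standing in for the `gatekeeper.cost` option), and the helpers `fake_model_call` and `run_all` are all assumptions made for illustration. The real async completion call is replaced by a stub.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GatekeeperStats:
    """Per-call usage. Field names are assumed from the commit message."""
    tokens: int = 0
    cost: float = 0.0
    latency: float = 0.0


class GatekeeperException(Exception):
    """Failure that may still carry stats for work done before the error."""

    def __init__(self, message: str, stats: Optional[GatekeeperStats] = None):
        super().__init__(message)
        self.stats = stats


@dataclass
class StatsAggregator:
    """Running totals across calls (hypothetical layout)."""
    calls: int = 0
    tokens: int = 0
    cost: float = 0.0
    latency: float = 0.0

    def add(self, stats: Optional[GatekeeperStats]) -> None:
        if stats is None:  # a failure that recorded no usage
            return
        self.calls += 1
        self.tokens += stats.tokens
        self.cost += stats.cost
        self.latency += stats.latency


async def fake_model_call(script: str) -> Tuple[str, int]:
    """Hypothetical stand-in for the real async completion call."""
    await asyncio.sleep(0)
    if not script.strip():
        return "", 5  # empty reply, which the caller cannot parse
    return "allow", 10 + len(script.split())


async def check_run_script_with_stats(
    script: str, cost_per_token: float = 0.0
) -> Tuple[bool, GatekeeperStats]:
    start = time.perf_counter()
    reply, tokens = await fake_model_call(script)
    stats = GatekeeperStats(
        tokens=tokens,
        cost=tokens * cost_per_token,
        latency=time.perf_counter() - start,
    )
    if reply not in ("allow", "deny"):
        # Attach the stats so the caller can still account for the tokens.
        raise GatekeeperException("could not parse verdict", stats=stats)
    return reply == "allow", stats


async def run_all(
    scripts: List[str], cost_per_token: float = 0.0
) -> Tuple[List[Optional[bool]], StatsAggregator]:
    agg = StatsAggregator()
    total = len(scripts)  # known up front, so progress can be numbered globally
    done = 0

    async def run_one(index: int, script: str) -> Optional[bool]:
        nonlocal done
        try:
            allowed, stats = await check_run_script_with_stats(script, cost_per_token)
        except GatekeeperException as exc:
            allowed, stats = None, exc.stats
        agg.add(stats)
        done += 1  # counts completions, not submission order
        print(f"[{done}/{total}] case {index}: {allowed}")
        return allowed

    results = await asyncio.gather(*(run_one(i, s) for i, s in enumerate(scripts)))
    return list(results), agg
```

Calling `asyncio.run(run_all(["echo hi", ""], cost_per_token=0.5))` runs both cases concurrently. The empty script fails to parse, yet its token count still reaches the aggregator through the exception's `stats` attribute, which is the point of making the stats optional on the exception.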
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 9ffa5e3 commit 51e6a98
3 files changed
Lines changed: 396 additions & 182 deletions