Commit 51e6a98

owtaylor and claude committed
refactor(gatekeeper): async eval runner with structured classes and stats tracking
- Make check_run_script async (acompletion) and add check_run_script_with_stats returning GatekeeperStats (tokens, cost, latency) alongside results
- Introduce EvalSuite and FileEval classes in run-eval.py, replacing global variables and eliminating duplicate single-file/multi-file code paths
- Load all test cases up front so progress output shows global numbering (e.g. [3/121]) as individual evals complete
- Add StatsAggregator dataclass for cleaner inference statistics reporting
- Add GatekeeperException with optional stats for timeout, output limit, and parse failures
- Add gatekeeper.cost config option for custom per-token cost accounting
- Switch summary/stats tables from plain text to rich.Table, fix latent bug where file=sys.stderr was accepted but ignored

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
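The pattern described above (async checks that return stats alongside results, a global [n/total] progress counter, and a stats aggregator) can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the field names on GatekeeperStats and StatsAggregator, and the helpers fake_check and run_all, are assumptions made up for the example, and the model call is replaced by a stub.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class GatekeeperStats:
    # Hypothetical fields; the commit only says "tokens, cost, latency".
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost: float = 0.0
    latency: float = 0.0


@dataclass
class StatsAggregator:
    # Hypothetical shape: running totals over all completed evals.
    count: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0
    total_latency: float = 0.0

    def add(self, stats: GatekeeperStats) -> None:
        self.count += 1
        self.total_tokens += stats.prompt_tokens + stats.completion_tokens
        self.total_cost += stats.cost
        self.total_latency += stats.latency

    @property
    def mean_latency(self) -> float:
        return self.total_latency / self.count if self.count else 0.0


async def fake_check(script: str) -> tuple[bool, GatekeeperStats]:
    # Stand-in for an async model call; always reports the same stats.
    await asyncio.sleep(0)
    allowed = "rm -rf" not in script
    return allowed, GatekeeperStats(10, 5, cost=0.001, latency=0.5)


async def run_all(scripts: list[str]) -> tuple[StatsAggregator, list[str]]:
    # All cases are known up front, so progress can be numbered globally.
    aggregator = StatsAggregator()
    total = len(scripts)
    progress: list[str] = []

    async def run_one(script: str) -> None:
        allowed, stats = await fake_check(script)
        aggregator.add(stats)
        # Number by completion order, not submission order.
        progress.append(f"[{aggregator.count}/{total}] {'PASS' if allowed else 'FAIL'}")

    await asyncio.gather(*(run_one(s) for s in scripts))
    return aggregator, progress


if __name__ == "__main__":
    agg, lines = asyncio.run(run_all(["ls", "rm -rf /", "echo hi"]))
    print("\n".join(lines))
    print(f"evals={agg.count} tokens={agg.total_tokens} cost={agg.total_cost:.4f}")
```

Numbering by completion order keeps the counter monotonic even when evals finish out of order, which appears to be the intent of loading every test case before starting.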
1 parent 9ffa5e3 commit 51e6a98

3 files changed

Lines changed: 396 additions & 182 deletions
