Commit 51e6a98


refactor(gatekeeper): async eval runner with structured classes and stats tracking

- Make check_run_script async (acompletion) and add check_run_script_with_stats, returning GatekeeperStats (tokens, cost, latency) alongside results
- Introduce EvalSuite and FileEval classes in run-eval.py, replacing global variables and eliminating duplicate single-file/multi-file code paths
- Load all test cases up front so progress output shows global numbering (e.g. [3/121]) as individual evals complete
- Add StatsAggregator dataclass for cleaner inference statistics reporting
- Add GatekeeperException with optional stats for timeout, output limit, and parse failures
- Add gatekeeper.cost config option for custom per-token cost accounting
- Switch summary/stats tables from plain text to rich.Table; fix latent bug where file=sys.stderr was accepted but ignored

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
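
As a rough illustration of the pattern the message describes (per-call stats collected into an aggregator while evals complete asynchronously with global numbering), here is a minimal sketch. It is not the committed code: only the names GatekeeperStats and StatsAggregator come from the commit message, while their fields and methods, along with fake_check and run_all, are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GatekeeperStats:
    # Assumed fields; the commit message only names tokens, cost, and latency.
    tokens: int = 0
    cost: float = 0.0
    latency: float = 0.0


@dataclass
class StatsAggregator:
    # Assumed shape: stores per-call stats and derives totals on demand.
    samples: list[GatekeeperStats] = field(default_factory=list)

    def add(self, stats: GatekeeperStats) -> None:
        self.samples.append(stats)

    @property
    def total_tokens(self) -> int:
        return sum(s.tokens for s in self.samples)

    @property
    def total_cost(self) -> float:
        return sum(s.cost for s in self.samples)

    @property
    def mean_latency(self) -> float:
        if not self.samples:
            return 0.0
        return sum(s.latency for s in self.samples) / len(self.samples)


async def fake_check(case: str) -> tuple[str, GatekeeperStats]:
    # Hypothetical stand-in for the real async model call.
    await asyncio.sleep(0)
    stats = GatekeeperStats(tokens=len(case), cost=0.001 * len(case), latency=0.5)
    return case, stats


async def run_all(cases: list[str]) -> tuple[list[str], StatsAggregator]:
    # All cases are known up front, so the total is fixed before any finish.
    aggregator = StatsAggregator()
    progress: list[str] = []
    total = len(cases)
    tasks = [asyncio.create_task(fake_check(c)) for c in cases]
    # Number results in completion order, e.g. "[3/121] ...".
    for done, finished in enumerate(asyncio.as_completed(tasks), start=1):
        case, stats = await finished
        aggregator.add(stats)
        progress.append(f"[{done}/{total}] {case}")
    return progress, aggregator
```

Calling `asyncio.run(run_all(cases))` returns the progress lines and the populated aggregator. Counting in completion order rather than submission order keeps the numbering monotonic even when evals finish out of order.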

1 parent 9ffa5e3 commit 51e6a98

3 files changed

eval/gatekeeper
- run-eval.py
src/linux_mcp_server
- config.py
- gatekeeper
  - check_run_script.py
