Context
Follow-up to #1912, which migrated Waza eval references to Vally. That PR deliberately wired .github/workflows/eval.yml to a single hardcoded --eval-spec path to keep scope small and get CI green.
Goal
Replace the single --eval-spec flag with --suite <name> driven by .vally.yaml suite config, so CI runs a curated set of eval specs (not just one).
Target
- Inner loop (local dev): vally-cli eval --eval-spec <one>, for fast iteration on a single spec.
- Medium loop (CI): vally-cli eval --suite pr, a curated set targeting a 5–10 minute wall-clock runtime.
- Outer loop (nightly / manual): vally-cli eval --suite full, or individual dispatches, for full coverage.
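Prereqs
- cost: free now reflects reality, so suite filters behave correctly.
- pr and full suites in .vally.yaml with the right tag filters.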
Acceptance
- .vally.yaml defines pr and full suites with explicit tag filters
- The "Run evaluations" step in .github/workflows/eval.yml uses --suite pr
- CI wall-clock time for the pr suite is 5–10 minutes
- Results artifact still uploads at the standard path
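A minimal sketch of how the workflow might look once the acceptance criteria above are met. It assumes vally-cli is already installed on the runner and that --suite is accepted as proposed in this issue. The workflow name, trigger, job name, timeout value, and artifact name are illustrative rather than taken from the current eval.yml, and the results path is left as a placeholder.

```yaml
# Sketch of .github/workflows/eval.yml after the change (assumptions noted inline).
name: eval            # hypothetical workflow name
on:
  pull_request:       # assumption: the pr suite runs on pull requests

jobs:
  eval:               # hypothetical job name
    runs-on: ubuntu-latest
    # Assumption: a hard cap somewhat above the 5-10 minute target, so a
    # runaway suite fails the job instead of quietly drifting past the budget.
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4

      # Assumption: vally-cli is installed by a setup step not shown here.
      - name: Run evaluations
        run: vally-cli eval --suite pr

      - name: Upload results
        if: always()  # upload even when the eval step fails
        uses: actions/upload-artifact@v4
        with:
          name: eval-results      # hypothetical artifact name
          path: <results-path>    # placeholder: keep the path the workflow uses today
```

The job-level timeout is one way to make the wall-clock criterion enforceable in CI rather than checked by hand; the exact cap is a judgment call.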
Related