feat(cli): add suite runner for scenario and trace directories (#85) #154
Open
nidheesh-p wants to merge 1 commit into
Add an `agent-harness suite` subcommand that runs a directory of scenarios against trace files and emits one aggregate summary plus optional per-scenario result JSON. Single-scenario `run` is unchanged.

- Map each scenario to `<trace-dir>/<scenario_id>.json` by id.
- Constrain scenario ids to a filename-safe charset and add a path containment check so a trace lookup can never escape `--trace-dir`.
- Detect duplicate scenario ids; record per-scenario errors (missing trace, malformed trace, invalid scenario, duplicate id) without aborting the suite.
- Fail an empty match / missing trace dir rather than passing vacuously.
- Emit per-status counts and provenance (trace path, severity, category) in the summary, validated against `schemas/suite_result.schema.json`.
- `--exit-on-fail` gates on any fail/error, composing with `run`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
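For reviewers, the id constraint, containment check, and exit gating described in the commit message above could look roughly like the sketch below. It is illustrative only and not the code in this PR: the function names `resolve_trace_path` and `suite_exit_code` are hypothetical, the `"pass"` status string is an assumption (only `fail`, `error`, and `not_run` appear in this description), and `Path.is_relative_to` assumes Python 3.9+.

```python
import re
from pathlib import Path

# Filename-safe charset quoted in this PR: [A-Za-z0-9._-]
_SAFE_ID = re.compile(r"[A-Za-z0-9._-]+")


def resolve_trace_path(trace_dir: Path, scenario_id: str) -> Path:
    """Map a scenario id to <trace-dir>/<scenario_id>.json (hypothetical helper).

    Raises ValueError if the id is outside the safe charset or if the
    resolved path would land outside trace_dir.
    """
    # First gate: reject separators, whitespace, and empty ids outright.
    if not _SAFE_ID.fullmatch(scenario_id):
        raise ValueError(f"unsafe scenario id: {scenario_id!r}")

    root = trace_dir.resolve()
    candidate = (root / f"{scenario_id}.json").resolve()

    # Second gate (defense-in-depth): resolve() follows symlinks, so a
    # link pointing outside the trace dir is caught here.
    if not candidate.is_relative_to(root):
        raise ValueError(f"trace path escapes trace dir: {candidate}")
    return candidate


def suite_exit_code(statuses: list[str], exit_on_fail: bool) -> int:
    """Return the process exit code for a finished suite (hypothetical helper)."""
    # An empty suite exits 1 rather than passing vacuously.
    if not statuses:
        return 1
    # With --exit-on-fail, any fail or error status gates the run.
    if exit_on_fail and any(s in {"fail", "error"} for s in statuses):
        return 1
    return 0
```

Keeping the charset check and the containment check separate means a regression in one does not silently disable the other.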
Closes #85.
Summary
Adds an `agent-harness suite` subcommand that runs a directory of scenarios against a directory of trace files and produces one aggregate result. Single-scenario `run` behavior is unchanged.
- … `validate`.
- Each scenario maps to `<trace-dir>/<scenario_id>.json` by id.
- … `--out-dir` plus an aggregate `summary.json`, and the summary is always printed to stdout. Validated against the new `schemas/suite_result.schema.json`.
- `--exit-on-fail` exits 1 on any `fail`/`error`, composing the same way as `run`. An empty match or missing `--trace-dir` exits 1 rather than passing vacuously.
Acceptance criteria (#85)
- … (`suite` subcommand; `run` untouched)
- … `--exit-on-fail`
- … (`docs/ci-github-actions.md`)
Security / robustness hardening
This went through a security/governance review before implementation:
- Scenario ids are constrained to `[A-Za-z0-9._-]` (Python validator + `scenario.schema.json`), with a path-containment check in the runner as defense-in-depth. A trace lookup can never escape `--trace-dir`.
- … `error` (with an `error_reason`) and the suite continues; one broken input never hides the rest.
- … `not_run` counts are surfaced so a green suite cannot hide a suite that tested nothing.
Testing
- … `suite_result.schema.json` contract test).
- `ruff check` and `mypy` clean.
- … `--exit-on-fail` exit codes, out-dir writes, and missing-trace resilience.
AI-assisted contribution disclosure
- … `suite` command, result models, schema, tests, and docs.
🤖 Generated with Claude Code