feat(cli): add suite runner for scenario and trace directories (#85) #154

Open
nidheesh-p wants to merge 1 commit into OWASP:main from nidheesh-p:feature/suite-runner
@nidheesh-p

Closes #85.

Summary

Adds an agent-harness suite subcommand that runs a directory of scenarios against a directory of trace files and produces one aggregate result. Single-scenario run behavior is unchanged.

agent-harness suite scenarios/ --trace-dir traces/ --out-dir results/ --exit-on-fail
  • Discovery: reuses the same files/directories/globs logic as validate.
  • Mapping: each scenario maps to <trace-dir>/<scenario_id>.json by id.
  • Output: writes per-scenario result JSON into --out-dir plus an aggregate summary.json. The summary is always printed to stdout and is validated against the new schemas/suite_result.schema.json.
  • Gating: --exit-on-fail exits 1 on any fail/error, composing the same way as run. An empty match or missing --trace-dir exits 1 rather than passing vacuously.
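The mapping and gating rules above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function names `trace_path_for` and `suite_exit_code` and the status strings in `FAILING` are assumptions made for the example.

```python
from pathlib import Path

# Assumed status names; the PR text mentions pass, fail, error and not_run.
FAILING = {"fail", "error"}


def trace_path_for(trace_dir: Path, scenario_id: str) -> Path:
    """Map a scenario to <trace-dir>/<scenario_id>.json by id."""
    return trace_dir / f"{scenario_id}.json"


def suite_exit_code(statuses: list[str], exit_on_fail: bool) -> int:
    """Return the process exit code for a finished suite."""
    # An empty match exits 1 rather than passing vacuously.
    if not statuses:
        return 1
    # With --exit-on-fail, any fail or error gates the run.
    if exit_on_fail and any(s in FAILING for s in statuses):
        return 1
    return 0
```

Without `exit_on_fail`, a suite containing failures still returns 0 in this sketch, which mirrors how the description says the flag composes with `run`.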

Acceptance criteria (#85)

  • Minimal suite command that keeps single-scenario behavior intact (new suite subcommand; run untouched)
  • Trace-file based suites
  • Per-scenario result JSON and an aggregate summary
  • Exit behavior composes with --exit-on-fail
  • Tests for partial failures and missing trace mappings
  • Documented directory conventions (docs/ci-github-actions.md)

Security / robustness hardening

This went through a security/governance review before implementation:

  • Path traversal: scenario ids are used as path components, so ids are now constrained to [A-Za-z0-9._-] (Python validator + scenario.schema.json), with a path-containment check in the runner as defense-in-depth. A trace lookup can never escape --trace-dir.
  • Duplicate ids, invalid scenario YAML, malformed trace JSON, and missing traces are each recorded as a per-scenario error (with an error_reason) and the suite continues, so one broken input never hides the rest.
  • The summary carries per-status counts and provenance (trace path, severity, category) so it is a self-contained audit record; not_run counts are surfaced so a green suite cannot hide a suite that tested nothing.
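A minimal sketch of how these three properties might fit together, assuming Python 3.9+ (for `Path.is_relative_to`). The names `SAFE_ID`, `resolve_trace`, and `collect`, and the shape of the returned dictionary, are hypothetical and not taken from the PR's code; only the charset, the `error_reason` field, and the `not_run` status come from the description. The sketch loads traces but does not evaluate them, so a readable trace stays `not_run`.

```python
import json
import re
from collections import Counter
from pathlib import Path

SAFE_ID = re.compile(r"[A-Za-z0-9._-]+")  # charset named in the PR description


def resolve_trace(trace_dir: Path, scenario_id: str) -> Path:
    """Return the trace path for an id, refusing anything outside trace_dir."""
    if not SAFE_ID.fullmatch(scenario_id):
        raise ValueError(f"unsafe scenario id: {scenario_id!r}")
    root = trace_dir.resolve()
    candidate = (root / f"{scenario_id}.json").resolve()
    # Defense-in-depth: also catches a symlink that points outside the root.
    if not candidate.is_relative_to(root):
        raise ValueError(f"trace path escapes {root}")
    return candidate


def collect(trace_dir: Path, scenario_ids: list[str]) -> dict:
    """Record one entry per scenario and keep going after a bad input."""
    seen: set[str] = set()
    results = []
    for sid in scenario_ids:
        entry = {"id": sid, "status": "not_run"}
        try:
            if sid in seen:
                raise ValueError("duplicate id")
            seen.add(sid)
            path = resolve_trace(trace_dir, sid)
            # Missing files raise OSError; malformed JSON raises a ValueError subclass.
            json.loads(path.read_text(encoding="utf-8"))
            entry["trace_path"] = str(path)
        except (ValueError, OSError) as exc:
            entry.update(status="error", error_reason=str(exc))
        results.append(entry)
    counts = Counter(r["status"] for r in results)
    return {"counts": dict(counts), "results": results}
```

Catching errors per scenario, rather than around the whole loop, is what lets one broken input be reported without hiding the others.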

Testing

  • 368 tests pass (new CLI integration tests, runner unit tests, and a suite_result.schema.json contract test).
  • ruff check and mypy clean.
  • Manual end-to-end run confirmed mixed pass/fail/error aggregation, --exit-on-fail exit codes, out-dir writes, and missing-trace resilience.

AI-assisted contribution disclosure

  • Tool: Claude Code (Opus 4.8).
  • AI-assisted parts: implementation of the suite command, result models, schema, tests, and docs.
  • Review: the author reviewed all output, ran the full test suite, ruff, mypy, and a manual end-to-end smoke test; the design was shaped by an explicit security/governance review pass.

🤖 Generated with Claude Code

Add an `agent-harness suite` subcommand that runs a directory of scenarios
against trace files and emits one aggregate summary plus optional
per-scenario result JSON. Single-scenario `run` is unchanged.

- Map each scenario to `<trace-dir>/<scenario_id>.json` by id.
- Constrain scenario ids to a filename-safe charset and add a path
  containment check so a trace lookup can never escape `--trace-dir`.
- Detect duplicate scenario ids; record per-scenario errors (missing
  trace, malformed trace, invalid scenario, duplicate id) without
  aborting the suite.
- Fail an empty match / missing trace dir rather than passing vacuously.
- Emit per-status counts and provenance (trace path, severity, category)
  in the summary, validated against schemas/suite_result.schema.json.
- `--exit-on-fail` gates on any fail/error, composing with `run`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@nidheesh-p nidheesh-p requested a review from mertsatilmaz as a code owner June 19, 2026 21:20
Development

Successfully merging this pull request may close these issues.

Add suite-level runner for scenario and trace directories