OWASP · nidheesh-p · Jun 19, 2026
@@ -14,7 +14,7 @@ pip install -e . && agent-harness run scenarios/goal_hijack/basic.yaml --dry-run
 
 ```python
 src/agent_harness/
-  cli.py          # Entry point. argparse-based. Subcommands: version, validate, run
+  cli.py          # Entry point. argparse-based. Subcommands: version, validate, run, suite
   scenario.py     # Loads & validates YAML scenarios (Scenario dataclass)
   trace.py        # Trace dataclass (messages, tool_calls, events)
   assertions.py   # Evaluates assertions against traces. Each assertion = one function

@@ -9,6 +9,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- **`suite` subcommand** — `agent-harness suite <paths...> --trace-dir <dir>`
+  runs the matched scenarios against trace files (each mapped by scenario id to
+  `<trace-dir>/<scenario_id>.json`) and emits one aggregate summary plus
+  optional per-scenario result JSON via `--out-dir`. Scenarios that cannot run
+  (missing trace, malformed trace, invalid scenario, duplicate id) are recorded
+  as per-scenario `error`s without aborting the suite, and `--exit-on-fail`
+  composes the same way as `run`. Output validates against the new
+  `schemas/suite_result.schema.json`. Single-scenario `run` is unchanged.
 - **`--junit-out` flag** — write assertion results as JUnit XML for CI
   systems while preserving the existing result JSON output.
 - **MCP host CLI wiring** — add `agent-harness run --mcp-host-target ...`
@@ -26,6 +34,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   HTTP targets via `agent-harness run --live` (default 30). 
 - **`version` field on `schemas/scenario.schema.json` and `schemas/result.schema.json`** — the authoritative numeric state of each schema, per the versioning policy in `docs/schema-versioning.md`. Both schemas now carry `"version": 1`.
 
+### Changed
+
+- **Scenario `id` charset** — scenario ids are now constrained to
+  `[A-Za-z0-9._-]` (enforced by both the Python validator and
+  `schemas/scenario.schema.json`). Ids are used as filesystem path components
+  by the new `suite` runner, so this prevents an id from traversing paths
+  outside the configured trace or output directory. All bundled scenarios
+  already comply.
+
 ## [0.1.0] — 2026-05-17
 
 First packaged release. Consolidates the v0.0.x development series into

@@ -98,6 +98,76 @@ correctly, so both gate steps treat it as a CI failure.
 harness writes JSON → gate (flag or post-scan) decides exit code → job pass/fail
 ```
 
+## Running a whole suite at once
+
+`agent-harness suite` runs many scenarios against a directory of trace files in
+one invocation and emits a single aggregate summary. It keeps single-scenario
+`run` unchanged — use `suite` when you have a folder of scenarios to gate on.
+
+```bash
+agent-harness suite scenarios/ \
+  --trace-dir traces/ \
+  --out-dir results/ \
+  --exit-on-fail
+```
+
+### Directory conventions
+
+- **Scenarios**: the positional arguments accept scenario files, directories
+  (searched recursively for `.yaml`/`.yml`), and glob patterns — the same
+  discovery rules as `agent-harness validate`.
+- **Traces**: each scenario is mapped to a trace file by its **scenario id**:
+  `<trace-dir>/<scenario_id>.json`. For a scenario whose id is
+  `goal_hijack.basic_001`, the suite looks for
+  `<trace-dir>/goal_hijack.basic_001.json`. Mapping by id (rather than by file
+  path) keeps the mapping stable when scenario files move, and scenario ids are
+  constrained to a filename-safe charset (`[A-Za-z0-9._-]`) so a trace lookup
+  can never escape `--trace-dir`.
+
+> Note: this id-based convention is specific to `suite`. The example traces
+> under `examples/traces/` use descriptive names and are not laid out this way;
+> to use them with `suite`, copy or rename each to `<scenario_id>.json`.
+
+### Output
+
+- `--out-dir` writes one `<scenario_id>.json` per scenario that ran (the same
+  shape as `agent-harness run`), plus an aggregate `summary.json`.
+- The aggregate summary JSON is always printed to stdout (a one-line count
+  summary goes to stderr). It holds the overall `result`, per-status `counts`
+  (`total`, `pass`, `fail`, `error`, `not_run`), and one `scenarios` entry per
+  scenario: always `scenario_path` and `result`, plus, where available, its id,
+  category, severity, trace path, `error_reason`, `evidence`, and full `detail`.
+  This self-contained audit record validates against `schemas/suite_result.schema.json`.
+
+### Resilience and gating
+
+The suite never lets one broken input hide the rest. A scenario that cannot run
+is recorded as a per-scenario `error` (with an `error_reason`) and the suite
+continues:
+
+| `error_reason` | Cause |
+|----------------|-------|
+| `missing_trace` | No `<scenario_id>.json` under `--trace-dir` |
+| `invalid_trace` | The trace file exists but is malformed JSON |
+| `invalid_scenario` | The scenario YAML failed validation |
+| `duplicate_scenario_id` | Two discovered scenarios share an id |
+
+Exit behavior composes with CI the same way as `run`:
+
+- Without `--exit-on-fail`, a suite that runs exits 0 whatever the per-scenario
+  results are, and the summary JSON is the source of truth.
+- With `--exit-on-fail`, `suite` exits 1 if **any** scenario is `fail` or
+  `error` — so a missing trace mapping or an unparseable scenario fails the
+  build rather than silently reducing coverage.
+- If the scenario arguments match nothing, or `--trace-dir` is not an existing
+  directory, `suite` exits 1 immediately, with or without `--exit-on-fail`. An
+  empty match is treated as an error, not a vacuous pass.
+
+A suite where every scenario comes back `not_run` (for example, only
+recognized-but-unimplemented assertions) aggregates to `not_run` and does **not**
+fail under `--exit-on-fail`. Watch the `not_run` count in the summary so a
+green suite does not hide a suite that tested nothing.
+
 ## A note on `not_run`
 
 Some assertions are recognized by the harness but not fully implemented yet.

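A minimal sketch of the `<trace-dir>/<scenario_id>.json` lookup described in the suite docs above, covering the two trace-related `error_reason` values from the table. The helper name `locate_trace` is hypothetical and not taken from the harness. The real runner presumably validates the trace shape through `load_trace`, whereas this sketch only checks that the file parses as JSON.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def locate_trace(trace_dir: Path, scenario_id: str) -> Tuple[Path, Optional[str]]:
    """Return (trace_path, error_reason); error_reason is None if the trace parses."""
    # Mapping is by scenario id, not by the scenario file's location.
    trace_path = trace_dir / f"{scenario_id}.json"
    if not trace_path.is_file():
        return trace_path, "missing_trace"
    try:
        json.loads(trace_path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # ASSUMPTION: any unparseable file is reported as invalid_trace.
        return trace_path, "invalid_trace"
    return trace_path, None
```

Returning the path even on failure lets a caller record which file it looked for, which matches the `trace_path` field in the summary entries.
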
@@ -18,7 +18,8 @@
   "properties": {
     "id": {
       "type": "string",
-      "minLength": 1
+      "minLength": 1,
+      "pattern": "^[A-Za-z0-9._-]+$"
     },
     "title": {
       "type": "string",

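For reference, the same character class can be checked in Python roughly as follows. This is a sketch only: the PR states that the Python validator enforces the charset but does not show how, so `SCENARIO_ID_RE` and `is_valid_scenario_id` are hypothetical names.

```python
import re

# Same character class as the "pattern" added to schemas/scenario.schema.json.
SCENARIO_ID_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_scenario_id(value: object) -> bool:
    """Return True for a non-empty string made only of filename-safe characters."""
    # fullmatch is used instead of match with "^...$" because Python's "$"
    # also matches just before a trailing newline, which would let "abc\n" pass.
    return isinstance(value, str) and SCENARIO_ID_RE.fullmatch(value) is not None
```

Ids made only of dots, such as `..`, still match this pattern. They stay inside the target directory only because no path separator is allowed and `.json` is appended to form the filename.
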
@@ -0,0 +1,111 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://owasp.org/schemas/agent-security-regression-harness/suite_result.schema.json",
+  "title": "OWASP Agent Security Regression Harness Suite Result",
+  "version": 1,
+  "type": "object",
+  "required": [
+    "result",
+    "counts",
+    "scenarios"
+  ],
+  "additionalProperties": false,
+  "properties": {
+    "result": {
+      "type": "string",
+      "enum": [
+        "pass",
+        "fail",
+        "error",
+        "not_run"
+      ]
+    },
+    "counts": {
+      "type": "object",
+      "required": [
+        "total",
+        "pass",
+        "fail",
+        "error",
+        "not_run"
+      ],
+      "additionalProperties": false,
+      "properties": {
+        "total": {
+          "type": "integer",
+          "minimum": 0
+        },
+        "pass": {
+          "type": "integer",
+          "minimum": 0
+        },
+        "fail": {
+          "type": "integer",
+          "minimum": 0
+        },
+        "error": {
+          "type": "integer",
+          "minimum": 0
+        },
+        "not_run": {
+          "type": "integer",
+          "minimum": 0
+        }
+      }
+    },
+    "scenarios": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "required": [
+          "scenario_path",
+          "result"
+        ],
+        "additionalProperties": false,
+        "properties": {
+          "scenario_path": {
+            "type": "string",
+            "minLength": 1
+          },
+          "scenario_id": {
+            "type": "string",
+            "minLength": 1
+          },
+          "category": {
+            "type": "string"
+          },
+          "severity": {
+            "type": "string"
+          },
+          "trace_path": {
+            "type": "string"
+          },
+          "result": {
+            "type": "string",
+            "enum": [
+              "pass",
+              "fail",
+              "error",
+              "not_run"
+            ]
+          },
+          "error_reason": {
+            "type": "string",
+            "enum": [
+              "missing_trace",
+              "invalid_scenario",
+              "invalid_trace",
+              "duplicate_scenario_id"
+            ]
+          },
+          "evidence": {
+            "type": "string"
+          },
+          "detail": {
+            "type": "object"
+          }
+        }
+      }
+    }
+  }
+}
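
A sketch of how the `counts` object and the overall `result` in this schema could be derived from per-scenario results, together with the `--exit-on-fail` gate from the CLI hunk. The function names are hypothetical, and the precedence used for the overall result is an assumption: the PR only states that any `fail` or `error` trips the gate and that an all-`not_run` suite aggregates to `not_run`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

STATUSES = ("pass", "fail", "error", "not_run")


def summarize(results: Iterable[str]) -> Tuple[str, Dict[str, int]]:
    """Return (overall_result, counts) for a list of per-scenario results."""
    tally = Counter(results)
    unknown = sorted(set(tally) - set(STATUSES))
    if unknown:
        raise ValueError(f"unknown result value(s): {unknown}")

    counts = {"total": sum(tally.values())}
    for status in STATUSES:
        counts[status] = tally[status]  # Counter returns 0 for absent keys

    # ASSUMPTION: error outranks fail, fail outranks pass, and a mix of
    # pass and not_run counts as pass. Only the all-not_run case and the
    # "fail or error gates the build" rule come from the PR text.
    if counts["error"]:
        overall = "error"
    elif counts["fail"]:
        overall = "fail"
    elif counts["pass"]:
        overall = "pass"
    else:
        overall = "not_run"
    return overall, counts


def exit_code(overall: str, exit_on_fail: bool) -> int:
    """Mirror the CLI gate: only fail or error trips --exit-on-fail."""
    return 1 if exit_on_fail and overall in {"fail", "error"} else 0
```

Under these assumptions an all-`not_run` suite returns exit code 0 even with `--exit-on-fail`, which is why the docs recommend watching the `not_run` count.
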
@@ -19,6 +19,7 @@
     run_scenario_with_openai_agent,
     run_scenario_with_python_target,
     run_scenario_with_trace,
+    run_suite,
 )
 from agent_harness.scenario import ScenarioValidationError, load_scenario
 from agent_harness.trace import TraceValidationError, load_trace
@@ -103,6 +104,40 @@ def build_parser() -> argparse.ArgumentParser:
         help="Scenario YAML file, directory, or glob pattern to validate.",
     )
 
+    suite_parser = subparsers.add_parser(
+        "suite",
+        help="Run a directory of scenarios against trace files and aggregate results.",
+    )
+    suite_parser.add_argument(
+        "scenario_paths",
+        nargs="+",
+        help="Scenario YAML files, directories, or glob patterns to run.",
+    )
+    suite_parser.add_argument(
+        "--trace-dir",
+        required=True,
+        help=(
+            "Directory of trace JSON files. Each scenario is matched to "
+            "'<trace-dir>/<scenario_id>.json'."
+        ),
+    )
+    suite_parser.add_argument(
+        "--out-dir",
+        help=(
+            "Optional directory to write per-scenario result JSON "
+            "('<scenario_id>.json') plus an aggregate 'summary.json'."
+        ),
+    )
+    suite_parser.add_argument(
+        "--exit-on-fail",
+        action="store_true",
+        help=(
+            "Exit with code 1 if any scenario's result is 'fail' or 'error' "
+            "(including missing trace mappings). Without this flag, 'suite' "
+            "exits 0 and the aggregate summary JSON is the source of truth."
+        ),
+    )
+
     run_parser = subparsers.add_parser(
         "run",
         help="Run a scenario file.",
@@ -248,6 +283,53 @@ def main() -> int:
         print(f"summary: {valid_count} valid, {invalid_count} invalid")
         return 1 if invalid_count else 0
 
+    if args.command == "suite":
+        scenario_files = _discover_scenario_files(args.scenario_paths)
+        if not scenario_files:
+            print("invalid: no scenario files matched", file=sys.stderr)
+            return 1
+
+        trace_dir = Path(args.trace_dir)
+        if not trace_dir.is_dir():
+            print(
+                f"invalid: trace directory does not exist: {trace_dir}",
+                file=sys.stderr,
+            )
+            return 1
+
+        suite_result = run_suite(scenario_files, trace_dir)
+
+        if args.out_dir:
+            out_dir = Path(args.out_dir)
+            out_dir.mkdir(parents=True, exist_ok=True)
+            for entry in suite_result.entries:
+                if entry.scenario_id is None or entry.detail is None:
+                    continue  # did not run: no per-scenario result to write
+                result_path = out_dir / f"{entry.scenario_id}.json"
+                result_path.write_text(
+                    entry.detail.to_json() + "\n", encoding="utf-8"
+                )
+            (out_dir / "summary.json").write_text(
+                suite_result.to_json() + "\n", encoding="utf-8"
+            )
+
+        print(suite_result.to_json())
+
+        counts = suite_result.counts
+        print(
+            "summary: "
+            f"{counts['total']} scenarios, "
+            f"{counts['pass']} pass, "
+            f"{counts['fail']} fail, "
+            f"{counts['error']} error, "
+            f"{counts['not_run']} not_run",
+            file=sys.stderr,
+        )
+
+        if args.exit_on_fail and suite_result.result in {"fail", "error"}:
+            return 1
+
+        return 0
 
     if args.command == "run":
         selected_modes = [