Last updated: 2026-02-24
- Measure Sentinel proxy overhead versus direct upstream for identical request shape.
- Keep benchmark reproducible and scriptable in CI.
- Publish machine-readable artifacts for regression gates.
Commands:
npm run benchmark -- --duration 3 --connections 16 --pipelining 1
npm run benchmark:gate
npm run benchmark:datasets
Primary output artifacts:
metrics/benchmark-YYYY-MM-DD.json
docs/benchmarks/results/standard-datasets.json
Canonical release snapshot used for comparison page:
docs/benchmarks/results/sentinel-v4.json (sourced from metrics/benchmark-2026-02-23.json)
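
To illustrate how a machine-readable artifact can feed a regression gate, here is a minimal sketch. It assumes a gate compares the percentiles of a fresh run against a stored baseline with a percentage tolerance. The `BenchmarkSummary` shape, the `findRegressions` name, and the tolerance parameter are hypothetical; the actual logic behind `npm run benchmark:gate` and the real JSON schema may differ.

```typescript
// Hypothetical summary shape (latencies in milliseconds).
// The real benchmark JSON schema is not shown in this document.
interface BenchmarkSummary {
  p50: number;
  p95: number;
  p99: number;
}

// Returns one message per percentile that regressed beyond the tolerance.
// An empty array means the gate passes.
function findRegressions(
  baseline: BenchmarkSummary,
  current: BenchmarkSummary,
  tolerancePct: number
): string[] {
  const failures: string[] = [];
  for (const key of ["p50", "p95", "p99"] as const) {
    // Allowed ceiling: baseline plus the configured percentage.
    const limit = baseline[key] * (1 + tolerancePct / 100);
    if (current[key] > limit) {
      failures.push(
        `${key}: ${current[key]} ms exceeds limit ${limit.toFixed(2)} ms`
      );
    }
  }
  return failures;
}
```

A CI wrapper could exit non-zero whenever the returned array is non-empty, which keeps the pass/fail decision scriptable.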
- Endpoint: /v1/chat/completions
- Request type: OpenAI-compatible chat completion payload
- Modes: direct upstream and through Sentinel proxy
- Sentinel profile for baseline overhead: monitor-first, injection + PII baseline toggles as defined by benchmark harness
- Same request payload and concurrency across compared paths.
- Same host machine and Node runtime for direct-vs-sentinel runs.
- No cherry-picked percentile: report at least p50/p95/p99.
- If a competitor metric is unavailable from reproducible local execution, report it as not_measured explicitly.
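
The percentile rule above can be sketched as follows. This is an illustrative example that assumes raw per-request latencies are available as an array of milliseconds and uses the nearest-rank method; the harness may compute percentiles differently (for example via histogram buckets). The `percentile`, `summarize`, and `overhead` names are hypothetical.

```typescript
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Always report the full p50/p95/p99 set, never a single percentile.
function summarize(samples: number[]): LatencySummary {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Proxy overhead per percentile: proxied latency minus direct latency.
function overhead(direct: LatencySummary, proxied: LatencySummary): LatencySummary {
  return {
    p50: proxied.p50 - direct.p50,
    p95: proxied.p95 - direct.p95,
    p99: proxied.p99 - direct.p99,
  };
}
```

Subtracting percentile by percentile only makes sense when both runs used the same payload, concurrency, host, and runtime, which is why those constraints are listed alongside the percentile rule.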
Competitor coverage and setup data is tracked in:
docs/benchmarks/results/competitor-coverage.json
Current state:
- OWASP coverage mapping is based on each tool's public documentation.
- Latency and setup timings are only claimed when reproducible in this repo.
- Unknown/unreproduced metrics are intentionally left null to avoid unverifiable claims.
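
As a rough illustration of the null-handling policy, the sketch below models a coverage entry whose timing fields may be null and renders them as not_measured in a comparison table. The `CompetitorMetrics` fields and the `renderMetric` helper are hypothetical and are not taken from the actual competitor-coverage.json schema.

```typescript
// Hypothetical entry shape; the real competitor-coverage.json may differ.
interface CompetitorMetrics {
  tool: string;
  p95LatencyMs: number | null; // null = not reproduced locally
  setupMinutes: number | null; // null = not reproduced locally
}

// Render a metric for a comparison table without inventing a value.
function renderMetric(value: number | null, unit: string): string {
  return value === null ? "not_measured" : `${value} ${unit}`;
}

// Build one plain-text table row for an entry.
function renderRow(entry: CompetitorMetrics): string {
  return [
    entry.tool,
    renderMetric(entry.p95LatencyMs, "ms"),
    renderMetric(entry.setupMinutes, "min"),
  ].join(" | ");
}
```

Keeping null in the stored data and converting it only at render time means a later reproducible measurement can replace the null without touching the reporting code.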
Sentinel ships redistribution-safe mini benchmark fixtures derived from standard adversarial families:
docs/benchmarks/datasets/advbench-mini.json
docs/benchmarks/datasets/trojai-mini.json
Execution path:
npm run benchmark:datasets
Detection model used in this harness:
- deterministic InjectionScanner score gate (>= 0.45)
- deterministic tool-forgery heuristic for tool_calls abuse payloads
- no network calls and no LLM API usage
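
The score gate can be sketched as a pure threshold check. This example assumes each fixture has already been assigned a numeric score; it does not reproduce the InjectionScanner API or the tool-forgery heuristic, neither of which is shown in this document. The `isFlagged` and `detectionRate` names are hypothetical; only the 0.45 threshold comes from the text above.

```typescript
// Threshold stated in this document: a score >= 0.45 counts as a detection.
const SCORE_GATE = 0.45;

// Deterministic gate: the same score always gives the same verdict.
function isFlagged(score: number, gate: number = SCORE_GATE): boolean {
  return score >= gate;
}

// Fraction of fixtures flagged by the gate (0 for an empty dataset).
function detectionRate(scores: number[], gate: number = SCORE_GATE): number {
  if (scores.length === 0) {
    return 0;
  }
  const flagged = scores.filter((s) => isFlagged(s, gate)).length;
  return flagged / scores.length;
}
```

Because the check involves no network calls or model sampling, repeated runs over the same fixtures yield identical rates, which is what makes the result usable as a regression baseline.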
Scope boundary:
- this is not a claim of parity with full upstream benchmark corpora.
- this is a reproducible, in-repo regression baseline aligned with standard attack families.
- Re-run npm run benchmark and npm run benchmark:gate.
- Update docs/benchmarks/results/sentinel-v4.json from the latest stable metric file.
- Re-verify competitor docs and refresh competitor-coverage.json with last_verified.
- Update the comparison table date in docs/benchmarks/COMPETITOR_COMPARISON.md.