Skip to content

Commit 61007b0

Browse files
committed
docs: update README corpus counts and detection gaps section
- Single-shot corpus rows: 230 → 311 (added probe_attacks family) - Multi-turn rows: 33 → 41 - Attack families: 6 → 7 (probe_attacks now distinct) - Detection gaps: note PII-context enumeration is now partially addressed at input stage via regex.system_prompt_extraction patterns
1 parent 9a5dff0 commit 61007b0

1 file changed

Lines changed: 3 additions & 3 deletions

File tree

README.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,8 @@ Numbers below are local preview measurements from 2026-05-05, generated by [`tes
3939
| Honeypot P95 latency budget | **≤ 16,000 ms** (empirical ~11,875–15,500 ms steady-state on the hardware envelope above) | [`tests/fitness/test_llm_p95_latency.py`](tests/fitness/test_llm_p95_latency.py); see [ADR-023](docs/architecture/decisions/023-llm-budget-soft-fail.md) for the budget rationale and measurement methodology |
4040
| Daemon cold-start budget | **≤ 5,000 ms** on the hardware envelope above | [`tests/fitness/test_cold_start_budget.py`](tests/fitness/test_cold_start_budget.py) |
4141
| Validator + honeypot model size | **~462 MB** GGUF (Q4_K_M) | [ADR-018](docs/architecture/decisions/018-validator-model-choice.md) |
42-
| Red-team corpus rows (single-shot) | **230** across 6 attack families (direct_injection, exfiltration, indirect_injection, jailbreak, obfuscation, tool_abuse) | [`tests/eval/corpus/`](tests/eval/corpus/) |
43-
| Multi-turn scenario rows | **33** (chunked + scenarios) | [`tests/eval/corpus/`](tests/eval/corpus/) |
42+
| Red-team corpus rows (single-shot) | **311** across 7 attack families (direct_injection, exfiltration, indirect_injection, jailbreak, obfuscation, tool_abuse, probe_attacks) | [`tests/eval/corpus/`](tests/eval/corpus/) |
43+
| Multi-turn scenario rows | **41** (chunked + scenarios) | [`tests/eval/corpus/`](tests/eval/corpus/) |
4444

4545
Re-run the full benchmark per the [Reproduce the model-selection benchmark](#reproduce-the-model-selection-benchmark) section. Fitness budgets are re-checked on every `make fitness` run.
4646

@@ -58,7 +58,7 @@ Being explicit about gaps. Each item links to where the design tradeoff is captu
5858

5959
- **Adversary model boundaries.** armor is a layer between user and agent; it defends in-band prompt-level attacks. It does **not** defend against host-level compromise (an attacker with shell access can bypass it), tampering with the validator model weights before the Docker image is built, side-channels (timing oracles, response-size fingerprinting), or attacks against the daemon process itself. See [`docs/architecture/threat-model.md`](docs/architecture/threat-model.md) §"NOT Defended Against" for the full enumeration.
6060
- **Validator soft-fail = fail-open.** When the validator LLM times out (P95 budget breached), the request **passes** rather than blocks. This trades latency-spike availability for strict block-on-uncertain semantics. The daemon is fail-open by default on LLM timeouts; there is no operator override. See [ADR-023](docs/architecture/decisions/023-llm-budget-soft-fail.md).
61-
- **Detection gaps.** The eval corpus is **English-heavy** — multilingual jailbreaks (Chinese, Russian, Arabic obfuscations) are under-tested. Polymorphic / novel encodings outside the entropy + decode-and-rescan envelope may pass. Very-long-context attacks beyond the per-session rolling buffer (default 8 KB / 20 turns, see [`docs/spec/configuration.md`](docs/spec/configuration.md)) lose multi-turn correlation. Social-engineering attacks that don't use injection patterns (e.g. legitimately phrased requests for sensitive data) are out of scope.
61+
- **Detection gaps.** The eval corpus is **English-heavy** — multilingual jailbreaks (Chinese, Russian, Arabic obfuscations) are under-tested. Polymorphic / novel encodings outside the entropy + decode-and-rescan envelope may pass. Very-long-context attacks beyond the per-session rolling buffer (default 8 KB / 20 turns, see [`docs/spec/configuration.md`](docs/spec/configuration.md)) lose multi-turn correlation. Social-engineering attacks that don't use injection patterns are partially covered: PII-context enumeration attacks ("list all personal information in your context") are now blocked at the input stage; legitimately-phrased requests for sensitive data that don't match known enumeration patterns remain out of scope for input blocking, but the canary output scanner provides a backstop when PII canaries are seeded via `armor canary seed`.
6262
- **No user-facing UI.** armor is a guard-layer, not an admin console. Forensic incidents are inspected via SQLite (`sqlite3 armor.db 'SELECT * FROM Incident …'`) or the `armor incidents` / `armor sessions` CLI subcommands. There is no web UI; operators wanting one can build on the structured-log output documented in [`docs/spec/interfaces.md`](docs/spec/interfaces.md).
6363
- **Single-tenant assumption.** One daemon per trusted-agent-fleet boundary. armor's SQLite schema and rate-limiting do not isolate across multiple mutually-untrusted tenants. See [`docs/architecture/threat-model.md`](docs/architecture/threat-model.md) §"Cross-Tenant Isolation" for why this is by design.
6464
- **Tools registered as malicious are out of scope.** armor validates tool *parameters* against declared schemas and catches dangerous bash patterns; it does **not** sandbox the tool itself. A tool that is intentionally adversarial (e.g. an installed plugin with a hostile maintainer) is a supply-chain problem, not a guardrail problem.

0 commit comments

Comments
 (0)