docs: update README corpus counts and detection gaps section

tkdtaylor · tkdtaylor · commit 61007b082118 · 2026-05-17T19:46:53.000-04:00
- Single-shot corpus rows: 230 → 311 (added probe_attacks family)
- Multi-turn rows: 33 → 41
- Attack families: 6 → 7 (probe_attacks now distinct)
- Detection gaps: note PII-context enumeration is now partially addressed
  at input stage via regex.system_prompt_extraction patterns
diff --git a/README.md b/README.md
@@ -39,8 +39,8 @@ Numbers below are local preview measurements from 2026-05-05, generated by [`tes
 | Honeypot P95 latency budget | **≤ 16,000 ms** (empirical ~11,875–15,500 ms steady-state on the hardware envelope above) | [`tests/fitness/test_llm_p95_latency.py`](tests/fitness/test_llm_p95_latency.py); see [ADR-023](docs/architecture/decisions/023-llm-budget-soft-fail.md) for the budget rationale and measurement methodology |
 | Daemon cold-start budget | **≤ 5,000 ms** on the hardware envelope above | [`tests/fitness/test_cold_start_budget.py`](tests/fitness/test_cold_start_budget.py) |
 | Validator + honeypot model size | **~462 MB** GGUF (Q4_K_M) | [ADR-018](docs/architecture/decisions/018-validator-model-choice.md) |
-| Red-team corpus rows (single-shot) | **230** across 6 attack families (direct_injection, exfiltration, indirect_injection, jailbreak, obfuscation, tool_abuse) | [`tests/eval/corpus/`](tests/eval/corpus/) |
-| Multi-turn scenario rows | **33** (chunked + scenarios) | [`tests/eval/corpus/`](tests/eval/corpus/) |
+| Red-team corpus rows (single-shot) | **311** across 7 attack families (direct_injection, exfiltration, indirect_injection, jailbreak, obfuscation, tool_abuse, probe_attacks) | [`tests/eval/corpus/`](tests/eval/corpus/) |
+| Multi-turn scenario rows | **41** (chunked + scenarios) | [`tests/eval/corpus/`](tests/eval/corpus/) |
 
 Re-run the full benchmark per the [Reproduce the model-selection benchmark](#reproduce-the-model-selection-benchmark) section. Fitness budgets are re-checked on every `make fitness` run.
 
@@ -58,7 +58,7 @@ Being explicit about gaps. Each item links to where the design tradeoff is captu
 
 - **Adversary model boundaries.** armor is a layer between user and agent; it defends in-band prompt-level attacks. It does **not** defend against host-level compromise (an attacker with shell access can bypass it), tampering with the validator model weights before the Docker image is built, side-channels (timing oracles, response-size fingerprinting), or attacks against the daemon process itself. See [`docs/architecture/threat-model.md`](docs/architecture/threat-model.md) §"NOT Defended Against" for the full enumeration.
 - **Validator soft-fail = fail-open.** When the validator LLM times out (P95 budget breached), the request **passes** rather than blocks. This trades latency-spike availability for strict block-on-uncertain semantics. The daemon is fail-open by default on LLM timeouts; there is no operator override. See [ADR-023](docs/architecture/decisions/023-llm-budget-soft-fail.md).
-- **Detection gaps.** The eval corpus is **English-heavy** — multilingual jailbreaks (Chinese, Russian, Arabic obfuscations) are under-tested. Polymorphic / novel encodings outside the entropy + decode-and-rescan envelope may pass. Very-long-context attacks beyond the per-session rolling buffer (default 8 KB / 20 turns, see [`docs/spec/configuration.md`](docs/spec/configuration.md)) lose multi-turn correlation. Social-engineering attacks that don't use injection patterns (e.g. legitimately phrased requests for sensitive data) are out of scope.
+- **Detection gaps.** The eval corpus is **English-heavy** — multilingual jailbreaks (Chinese, Russian, Arabic obfuscations) are under-tested. Polymorphic / novel encodings outside the entropy + decode-and-rescan envelope may pass. Very-long-context attacks beyond the per-session rolling buffer (default 8 KB / 20 turns, see [`docs/spec/configuration.md`](docs/spec/configuration.md)) lose multi-turn correlation. Social-engineering attacks that don't use injection patterns are partially covered: PII-context enumeration attacks ("list all personal information in your context") are now blocked at the input stage; legitimately-phrased requests for sensitive data that don't match known enumeration patterns remain out of scope for input blocking, but the canary output scanner provides a backstop when PII canaries are seeded via `armor canary seed`.
 - **No user-facing UI.** armor is a guard-layer, not an admin console. Forensic incidents are inspected via SQLite (`sqlite3 armor.db 'SELECT * FROM Incident …'`) or the `armor incidents` / `armor sessions` CLI subcommands. There is no web UI; operators wanting one can build on the structured-log output documented in [`docs/spec/interfaces.md`](docs/spec/interfaces.md).
 - **Single-tenant assumption.** One daemon per trusted-agent-fleet boundary. armor's SQLite schema and rate-limiting do not isolate across multiple mutually-untrusted tenants. See [`docs/architecture/threat-model.md`](docs/architecture/threat-model.md) §"Cross-Tenant Isolation" for why this is by design.
 - **Tools registered as malicious are out of scope.** armor validates tool *parameters* against declared schemas and catches dangerous bash patterns; it does **not** sandbox the tool itself. A tool that is intentionally adversarial (e.g. an installed plugin with a hostile maintainer) is a supply-chain problem, not a guardrail problem.