UX + observability patch release (0.3.1) by Ar9av · Pull Request #21 · PrismorSec/immunity-agent

Ar9av · 2026-04-24T15:43:43Z

Summary

UX and observability patch release based on the end-to-end benchmark (120-attack / 539-benign / 1,000-call latency) run against tier1-ux-coverage. All nine changes are behaviour-compatible for existing policy authors; SDK embedders gain two new entry points.

Fixes

#	Issue found in bench	Fix
1	`install-hooks` appended a duplicate when re-run with a different `--mode`, leaving both `observe` and `enforce` commands in `settings.json`	Install path now strips any existing Warden-owned hook (by `warden/cli.py` marker) before merging, matching uninstall semantics. Idempotent by construction.
2	Session-log events wrote `"ts": null` for every agent (Claude/Cursor/Windsurf/Hermes/Openclaw/Copilot) because payloads don't include `timestamp`. SIEM correlation broke.	`_make_ts(payload)` helper returns ISO 8601 UTC stamp on receipt when the payload lacks one. Applied to all six `_normalize_*` functions.
3	`hook-dispatch` was silent on `action: warn` rules — exit 0, empty stderr, HIGH-severity warnings invisible to the user and the agent.	Warn findings now emit `[warden WARN] [SEV] rule-id: title` to stderr regardless of mode. `action: block` behaviour is unchanged.
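
The strip-then-merge behaviour described for fix 1 can be sketched as follows. This is a minimal illustration, not Warden's actual code: the `install_hook` function, the flat `{"hooks": {event: [...]}}` layout, and the sample command strings are assumptions made for the example; only the `warden/cli.py` marker comes from the table above.

```python
WARDEN_MARKER = "warden/cli.py"  # marker named in the PR; everything else is hypothetical


def install_hook(settings: dict, event: str, command: str) -> dict:
    """Strip any Warden-owned hook for `event`, then append the new one."""
    hooks = settings.setdefault("hooks", {})
    # Keep only entries Warden does not own (assumed flat list of {"command": ...}).
    kept = [h for h in hooks.get(event, []) if WARDEN_MARKER not in h.get("command", "")]
    kept.append({"type": "command", "command": command})
    hooks[event] = kept
    return settings


# Re-running with a different mode replaces the Warden entry instead of appending.
settings = {"hooks": {"PreToolUse": [{"type": "command", "command": "./lint.sh"}]}}
install_hook(settings, "PreToolUse", "python warden/cli.py hook-dispatch --mode observe")
install_hook(settings, "PreToolUse", "python warden/cli.py hook-dispatch --mode enforce")
warden_cmds = [
    h["command"] for h in settings["hooks"]["PreToolUse"] if WARDEN_MARKER in h["command"]
]
```

Because the strip step runs on every install, the result does not depend on how many times or in which mode the command was run before, which is what makes the operation idempotent.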

Additions (new entry points)

#	What	Where
4	`PolicyEngine.evaluate` accepts CLI-style type aliases (`command`→`shell`, `read`→`file_read`, `write`→`file_write`) so an embedder can pass whichever vocabulary is natural for its layer	`warden/policy_engine.py`
5	`PolicyEngine.evaluate(event)` — `index` and `session_id` now default, so the common SDK call-site works without boilerplate	`warden/policy_engine.py`
6	Subcommand-scoped argparse errors — wrong flag inside a subcommand now shows the subcommand's usage and a `warden <sub>: error: ...` line instead of dumping the top-level help wall	`warden/cli.py::build_parser`
7	`PolicyEngine.from_defaults(workspace=...)` classmethod as the explicit SDK factory	`warden/policy_engine.py`
8	`warden install-hooks` prints scope + mode and, when at project scope, a one-line hint about `--scope user`	`warden/cli.py`
9	`warden policy show --json` emits the rule list as structured JSON (workspace, project-policy path, per-rule id/severity/category/action/event-types/fields, allowlists) for scripts	`warden/cli.py`
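
Items 4 and 5 can be illustrated with a toy engine. The alias table and the default values (`index=0`, `session_id=""`) are taken from this PR's commit message; `canonical_type` and `SketchEngine` are hypothetical names used only for this sketch and do not reflect the real `PolicyEngine` internals.

```python
from typing import Any

# Alias table from item 4; the function and class below are hypothetical.
TYPE_ALIASES = {"command": "shell", "read": "file_read", "write": "file_write"}


def canonical_type(event_type: str) -> str:
    """Map a CLI-style alias to its canonical event type; pass others through."""
    return TYPE_ALIASES.get(event_type, event_type)


class SketchEngine:
    """Toy stand-in showing defaulted index/session_id (item 5)."""

    def evaluate(
        self, event: dict[str, Any], index: int = 0, session_id: str = ""
    ) -> dict[str, Any]:
        # Normalize the type first so rules only ever see canonical names.
        normalized = {**event, "type": canonical_type(event.get("type", ""))}
        return {"event": normalized, "index": index, "session_id": session_id}


# The common SDK call-site: no index or session_id boilerplate.
result = SketchEngine().evaluate({"type": "command", "command": "ls"})
```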

Test plan

pytest -x -q — 236/236 pass (227 existing + 9 new regression tests covering idempotency, timestamp, and the embedder API).
Bench detection run on Lightsail — coverage unchanged vs 0.3.0 (55/97 hard-block, 82/97 detect, 15 silent miss; 0.19% hard-block FPR, 2.04% total-finding FPR on 539 benign cases). The nine fixes do not regress detection.
E2E verification on Lightsail (ubuntu@44.214.4.208):
- Fix #1: re-running install-hooks --mode observe then --mode enforce leaves exactly one Warden command per hook event (3 total in Claude's three-event install), only --mode enforce persists.
- Fix #2: session.jsonl last entry has "ts": "2026-04-24T15:42:04.005098Z" instead of null.
- Fix #3: cat /etc/shadow now prints [warden WARN] [HIGH] path-traversal: ... to stderr, exit 0.
- Fix #4/#5/#7: PolicyEngine.from_defaults().evaluate({"type":"command","command":"rm -rf /"}) returns destructive-command finding.
- Fix #6: warden check --command foo now prints the check subparser usage and warden check: error: unrecognized arguments: --command.
- Fix #8: warden install-hooks --agent claude --mode enforce prints the scope hint.
- Fix #11: warden policy show --json emits parseable JSON with activeRuleCount: 54.
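
The timestamp fallback checked under Fix #2 might look roughly like this. It is a sketch under stated assumptions: the PR names a `_make_ts(payload)` helper but does not show its body, so the function below (`make_ts`) and its exact formatting choices are illustrative. The output shape is chosen to match the `"2026-04-24T15:42:04.005098Z"` example in the verification list.

```python
from datetime import datetime, timezone


def make_ts(payload: dict) -> str:
    """Return the payload's own timestamp, or an ISO 8601 UTC stamp taken on receipt."""
    ts = payload.get("timestamp")
    if ts:
        return ts
    now = datetime.now(timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap in the "Z" suffix used in the session log.
    return now.isoformat(timespec="microseconds").replace("+00:00", "Z")


stamp = make_ts({})  # payload without a timestamp, as sent by the agents listed in fix 2
```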

Bench artefacts

The pre-0.3.1 run of the 120-attack / 539-benign / 1,000-call benchmark is in /Users/ar9av/Downloads/warden-bench/. The write-up is in /Users/ar9av/Downloads/warden-bench-report.md and /Users/ar9av/Downloads/warden-paper/warden_benchmark.pdf. All nine fixes map to issues disclosed in §7 of the bench report ("What we found that we didn't know before").

Not in this PR

#9 cold-start latency (344 ms p50 per hook call, dominated by CPython start-up). Needs a persistent-daemon integration; out of scope for a patch release, roadmapped for 0.4.x.
#12 sessions roll-up (warden sessions tail --limit N across all session files). New CLI feature, not a fix.
Detection-rule gaps surfaced by the bench (claude-credential-access missing top-level ~/.claude.json, no rule for scp -r ~/.ssh, --registry=URL equals form, etc.) — tracked as a separate detection-coverage patch, distinct from UX.

Tighten destructive-command regex so it only matches genuinely destructive rm targets, so false positives like `rm -rf ./node_modules` and `rm -rf /tmp/build` no longer block legitimate cleanup. New patterns cover `rm -rf /`, system dirs, `$HOME`/`~`, `*`, `..`, and `sudo rm -rf`.

Expand policy coverage:
- persistence rules for /etc/cron.d/, /etc/crontab, /etc/systemd/system/, /etc/init.d/, /etc/rc.local, /etc/profile.d/
- tls-verification-disabled rule (git sslVerify false, GIT_SSL_NO_VERIFY, StrictHostKeyChecking=no, curl -k, NODE_TLS_REJECT_UNAUTHORIZED, etc.)
- npm/yarn/pnpm --registry and git+ supply-chain patterns
- shell-obfuscation rule for base64/xxd/printf hex decode-and-execute

Workspace resolution: argparse subparsers declaring --workspace were clobbering the top-level value with None. CLI now scans argv directly and falls back to PRISMOR_WARDEN_WORKSPACE env var. This also fixes egress allowlist enforcement in `warden check` and `policy init --workspace` writing to the wrong directory, both of which depended on workspace flowing through correctly.

Scanner now reads project-level .claude/settings.json, .mcp.json, and .claude/settings.local.json from the workspace. Previously workspace MCP configs were ignored entirely.

should_block() loads default categories from the policy when block_categories is None so the public API works for direct callers (3 previously-failing unit tests now pass).

uninstall-hooks removes cloaking hooks too (userprompt-guard.sh, decloak.sh, recloak-mcp.sh) so the claude settings.json ends up clean instead of partially stripped.

sweep --clean / --restore catch RuntimeError when stdin has no TTY and exit 1 with a clean message pointing at PRISMOR_SWEEP_PASS, instead of raising an unhandled Python traceback.

upgrade_feed.py is now idempotent (no-op if no changes) so the signature isn't invalidated on every run. When it does write, it auto re-signs via pipeline/sign_feed.sh if PRISMOR_SIGNING_PRIVATE_KEY is set; otherwise it prints a loud warning.

SARIF output now populates rules[] with all 53 policy rules including titles, severities, categories, and helpUri, giving GitHub Code Scanning and other consumers full rule metadata.

Minor UX: analyze and session accept positional args in addition to --input / --session-id, and `check` no longer prints the resolved symlink path as a trailing line that breaks parsers.

All 227 tests pass.
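
The workspace-resolution change in this commit (scan argv directly, then fall back to the environment) could be sketched like this. The `resolve_workspace` function and its handling of the `--workspace=VALUE` form are assumptions for illustration; only the `--workspace` flag and the `PRISMOR_WARDEN_WORKSPACE` variable are named in the commit message.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_workspace(
    argv: Sequence[str], environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Find --workspace anywhere in argv; otherwise fall back to the env var."""
    environ = os.environ if environ is None else environ
    for i, arg in enumerate(argv):
        # Space-separated form: --workspace PATH
        if arg == "--workspace" and i + 1 < len(argv):
            return argv[i + 1]
        # Equals form: --workspace=PATH
        if arg.startswith("--workspace="):
            return arg.split("=", 1)[1]
    return environ.get("PRISMOR_WARDEN_WORKSPACE")


# The flag is honoured whether it appears before or after the subcommand.
before = resolve_workspace(["--workspace", "/srv/app", "check"], {})
after = resolve_workspace(["check", "--workspace", "/srv/app"], {})
```

Scanning argv outside argparse sidesteps the clobbering problem entirely, since no subparser default can overwrite the value.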

Three edge-case fixes surfaced by remote variation testing:
1. rm -rf ../build was incorrectly blocked because the parent-dir pattern matched .. followed by any /. Tightened to .. or ../.. or ../../.. chains (dir-escape), not relative build paths.
2. rm -r -f /etc (and rm -f -r /etc) weren't caught because the lookahead consumed the inter-flag space. Switched to non-consuming (?=\s|$) lookahead so [^\n]* can backtrack onto the -f token.
3. rm --recursive --force / and --force --recursive are now matched via long-form alternation in the same lookahead; bare --recursive without --force is left lenient (consistent with bare -r).
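
The flag-matching technique from these fixes can be shown with a reduced pattern. This is not the rule shipped in the policy: the `FLAG_R`, `FLAG_F`, and `TARGET` fragments and the short list of system directories are assumptions chosen to keep the example small. It demonstrates the two ideas named above: a non-consuming `(?=\s|$)` after each flag, and long-form alternation inside the same lookahead.

```python
import re

# Hypothetical fragments; the real policy pattern covers more targets.
FLAG_R = r"(?:-[a-zA-Z]*[rR][a-zA-Z]*|--recursive)"
FLAG_F = r"(?:-[a-zA-Z]*f[a-zA-Z]*|--force)"
TARGET = r"""["']?(?:/(?:etc|usr|bin|var)?|~|\$HOME)/?["']?"""

DESTRUCTIVE_RM = re.compile(
    r"\brm\s+"
    # Each lookahead finds its flag anywhere on the line without consuming
    # the whitespace after it, so the other lookahead can still reach its flag.
    rf"(?=[^\n]*(?<=\s){FLAG_R}(?=\s|$))"
    rf"(?=[^\n]*(?<=\s){FLAG_F}(?=\s|$))"
    r"(?:-\S+\s+)+"            # consume the flag tokens themselves
    rf"{TARGET}(?=\s|$)"       # then require a dangerous, optionally quoted target
)


def is_destructive(cmd: str) -> bool:
    return DESTRUCTIVE_RM.search(cmd) is not None
```

With this sketch, `rm -r -f /etc`, `rm --force --recursive /`, and `rm -rf "/etc"` match, while `rm -rf ./node_modules`, `rm -rf /tmp/build`, and a bare `rm -r /etc` do not.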

Old pattern 'nc\s+.*-[a-z]*l[a-z]*\s*.*-p' required -l and -p as separate flags, so nc -lvp 4444 -e /bin/bash (very common reverse shell one-liner) slipped past. Three patterns now cover: nc listener with explicit -e/--exec, nc with combined -lvp/-lp flags, and nc with separate -l and -p flags.
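
The gap in the old pattern is easy to reproduce. In the sketch below, `OLD` is the pattern quoted above; the three replacement patterns are hypothetical reconstructions of the three cases the commit lists, not the expressions that were actually committed.

```python
import re

OLD = re.compile(r"nc\s+.*-[a-z]*l[a-z]*\s*.*-p")  # pattern quoted in the commit message

# Hypothetical replacements, one per case described in the commit.
NC_EXEC = re.compile(r"\bnc\s+(?:[^\n]*\s)?(?:-e|--exec)(?=\s|$)")
NC_COMBINED = re.compile(r"\bnc\s+(?:[^\n]*\s)?-[a-z]*l[a-z]*p[a-z]*(?=\s|$)")
NC_SEPARATE = re.compile(r"\bnc\s+(?:[^\n]*\s)?-[a-z]*l[a-z]*(?=\s)[^\n]*\s-p(?=\s|$)")


def is_nc_listener(cmd: str) -> bool:
    """True if any of the three sketch patterns matches the command."""
    return any(p.search(cmd) for p in (NC_EXEC, NC_COMBINED, NC_SEPARATE))


one_liner = "nc -lvp 4444 -e /bin/bash"
old_missed = OLD.search(one_liner) is None  # no literal "-p" token, so the old pattern fails
new_caught = is_nc_listener(one_liner)
```

The old pattern needs a literal `-p` after the `-l` token; in `-lvp` the `p` is folded into the combined flag, so the match never completes.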

- rm -rf "/" and rm -rf '/' now blocked (quoted target)
- /dev/tcp/<hostname>/port now caught (was digit-only)
- git -c http.sslVerify=false clone caught
- curl -sk, -ksL, -Lk (combined flag) caught
- npm/yarn/pnpm --registry flag caught regardless of position (npm i short form, registry flag before or after install/i/add)

Previously only the root / case accepted quoted targets, so rm -rf "/etc" slipped past while rm -rf "/" was blocked. Extended optional quote match to the system-critical-path, top-level-dir, and $HOME/~ branches for consistency.

Implements every Tier-1 item from IMPROVEMENT_PLAN.md:

New subsystems
- warden canary plant|list|remove|status: honey-token credentials with webhook beacons. First AI-agent-specific canary impl.
- warden/sinks.py: webhook, syslog (UDP/TCP), and file (JSON/CEF) SIEM sinks configured via settings.outputs in policy.yaml.
- warden/policy_test.py: declarative test runner; bundled OWASP LLM Top 10 + Agentic Top 10 + MITRE ATLAS starter pack (28 cases).

Detection coverage
- agent-instruction-tampering rule: CLAUDE.md, AGENTS.md, .cursorrules, .windsurfrules, .github/copilot-instructions.md.
- MCP schema auditor: overbroad allowlists, any-typed params on exec-capable tools, missing input schemas, fs+net+exec capability combination, risky description language.
- Unicode homoglyph detection: Cyrillic / Greek / Latin-extended confusables, fullwidth letters, zero-width joiners.
- Lockfile integrity: non-registry sources, missing integrity hashes, lockfile-injection (deps in lock but not in package.json).

check ergonomics
- --explain prints matched rule + full pattern.
- --from-log PATH replays a JSONL session log through current policy.
- --suggest-allowlist emits a ready-to-paste allowlist entry.

Bump to 0.3.0; CHANGELOG added. 227/227 unit tests pass; 28/28 OWASP starter cases pass.
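
As a rough illustration of the homoglyph check listed under detection coverage, the sketch below flags characters by their Unicode name. It is an assumption-laden simplification: `suspicious_chars`, the prefix list, and the zero-width set are invented for this example, it does not cover Latin-extended confusables, and the shipped detector may work quite differently.

```python
import unicodedata

# Hypothetical classification: script prefixes taken from Unicode character names.
SUSPICIOUS_PREFIXES = ("CYRILLIC", "GREEK", "FULLWIDTH")
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def suspicious_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, kind) for each character that could disguise an ASCII command."""
    found = []
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            found.append((i, "zero-width"))
            continue
        name = unicodedata.name(ch, "")
        if name.startswith(SUSPICIOUS_PREFIXES):
            found.append((i, name.split()[0].lower()))
    return found


# "\u0441" is CYRILLIC SMALL LETTER ES, visually identical to Latin "c".
spoofed = suspicious_chars("\u0441url https://example.com")
```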

Surfaces six issues that the end-to-end bench (120-attack / 539-benign / 1,000-call latency) flagged. All changes are behaviour-compatible for existing policy authors; SDK embedders gain two new entry points.

Fixed:
- install-hooks idempotent: re-run with different --mode now replaces the existing Warden hook instead of appending a duplicate entry.
- Session-log events carry ISO 8601 UTC timestamps (was null, broke SIEM correlation).
- hook-dispatch surfaces WARN-level findings to stderr as [warden WARN] [SEV] rule-id: title (was silent; exit 0 with no feedback meant HIGH-severity warns could fire invisibly).

Added:
- PolicyEngine.from_defaults() classmethod for embedders.
- PolicyEngine.evaluate(event) default args (index=0, session_id="").
- Event-type aliases: command→shell, read→file_read, write→file_write.
- warden policy show --json for scripting.
- Subcommand-scoped argparse errors (shows the subcommand's usage, not the top-level wall).
- install-hooks prints scope+mode and hints at --scope user for global coverage.

Tests: 236/236 pass (227 existing + 9 new regression tests for idempotency, timestamp, embedder API).

Ar9av added 8 commits April 24, 2026 09:09

shell-obfuscation: match perl pack with q{H*} alternate quotes

e11e3a1

Ar9av wants to merge 8 commits into main from ux-fixes-0.3.1
