UX + observability patch release (0.3.1) #21

Open
Ar9av wants to merge 8 commits into main from ux-fixes-0.3.1

Ar9av (Contributor) commented Apr 24, 2026

Summary

UX and observability patch release based on the end-to-end benchmark (120-attack / 539-benign / 1,000-call latency) run against tier1-ux-coverage. All nine changes are behaviour-compatible for existing policy authors; SDK embedders gain two new entry points.

Fixes

| # | Issue found in bench | Fix |
|---|----------------------|-----|
| 1 | install-hooks appended a duplicate when re-run with a different --mode, leaving both observe and enforce commands in settings.json. | Install path now strips any existing Warden-owned hook (by warden/cli.py marker) before merging, matching uninstall semantics. Idempotent by construction. |
| 2 | Session-log events wrote `"ts": null` for every agent (Claude/Cursor/Windsurf/Hermes/Openclaw/Copilot) because payloads don't include a timestamp. SIEM correlation broke. | `_make_ts(payload)` helper returns an ISO 8601 UTC stamp on receipt when the payload lacks one. Applied to all six `_normalize_*` functions. |
| 3 | hook-dispatch was silent on `action: warn` rules: exit 0, empty stderr, HIGH-severity warnings invisible to the user and the agent. | Warn findings now emit `[warden WARN] [SEV] rule-id: title` to stderr regardless of mode. `action: block` behaviour is unchanged. |
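Fix 2 amounts to a fallback at normalization time. A minimal sketch, assuming the payload is a dict and that an agent-supplied stamp would arrive under a `timestamp` key; the PR does not show the real `_make_ts` body, so `make_ts_sketch` is illustrative only:

```python
from datetime import datetime, timezone


def make_ts_sketch(payload: dict) -> str:
    """Return the payload's own timestamp if present, else the receipt time in UTC."""
    supplied = payload.get("timestamp")  # key name is an assumption
    if supplied:
        return str(supplied)
    # ISO 8601 with an explicit +00:00 offset so downstream tools can order events.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


event = {"tool": "Bash"}  # hypothetical payload with no timestamp
record = {"ts": make_ts_sketch(event), "tool": event["tool"]}
```

Stamping on receipt means the value reflects when Warden saw the event, not when the agent produced it, which is usually close enough for correlation.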

Additions (new entry points)

| # | What | Where |
|---|------|-------|
| 4 | PolicyEngine.evaluate accepts CLI-style type aliases (command→shell, read→file_read, write→file_write) so an embedder can pass whichever vocabulary is natural for its layer | warden/policy_engine.py |
| 5 | PolicyEngine.evaluate(event): index and session_id now have defaults, so the common SDK call-site works without boilerplate | warden/policy_engine.py |
| 6 | Subcommand-scoped argparse errors: a wrong flag inside a subcommand now shows the subcommand's usage and a `warden <sub>: error: ...` line instead of dumping the top-level help wall | warden/cli.py::build_parser |
| 7 | PolicyEngine.from_defaults(workspace=...) classmethod as the explicit SDK factory | warden/policy_engine.py |
| 8 | warden install-hooks prints scope + mode and, when at project scope, a one-line hint about --scope user | warden/cli.py |
| 9 | warden policy show --json emits the rule list as structured JSON (workspace, project-policy path, per-rule id/severity/category/action/event-types/fields, allowlists) for scripts | warden/cli.py |
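Items 4 and 5 can be pictured with a small stand-in. This is a sketch under assumptions, not the real `PolicyEngine`: the `SketchEngine` and `Verdict` names and the `type` key on the event dict are hypothetical. Only the alias pairs and the default values (`index=0`, `session_id=""`) come from the commit message.

```python
from dataclasses import dataclass

# Alias pairs as listed in the PR; canonical names pass through unchanged.
TYPE_ALIASES = {"command": "shell", "read": "file_read", "write": "file_write"}


def canonical_type(event_type: str) -> str:
    """Map a CLI-style alias to the canonical event type."""
    return TYPE_ALIASES.get(event_type, event_type)


@dataclass
class Verdict:  # hypothetical result type, for illustration only
    event_type: str
    index: int
    session_id: str


class SketchEngine:  # stand-in, not warden.policy_engine.PolicyEngine
    def evaluate(self, event: dict, index: int = 0, session_id: str = "") -> Verdict:
        # Normalize the type first so rules only ever see canonical names.
        return Verdict(canonical_type(event.get("type", "")), index, session_id)


verdict = SketchEngine().evaluate({"type": "command", "command": "ls"})
```

With the defaults in place, the one-argument call above is the whole call-site; callers that track sessions can still pass `index` and `session_id` explicitly.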

Test plan

Bench artefacts

The pre-0.3.1 run of the 120-attack / 539-benign / 1,000-call benchmark is in /Users/ar9av/Downloads/warden-bench/. The write-up is in /Users/ar9av/Downloads/warden-bench-report.md and /Users/ar9av/Downloads/warden-paper/warden_benchmark.pdf. All nine fixes map to issues disclosed in §7 of the bench report ("What we found that we didn't know before").

Not in this PR

  • #9 cold-start latency (344 ms p50 per hook call, dominated by CPython start-up). Needs a persistent-daemon integration, which is out of scope for a patch release; roadmapped for 0.4.x.
  • #12 sessions roll-up (warden sessions tail --limit N across all session files). New CLI feature, not a fix.
  • Detection-rule gaps surfaced by the bench (claude-credential-access missing top-level ~/.claude.json, no rule for scp -r ~/.ssh, --registry=URL equals form, etc.). These are tracked as a separate detection-coverage patch, distinct from UX.

Ar9av added 8 commits April 24, 2026 09:09
Tighten destructive-command regex so it only matches genuinely
destructive rm targets — false positives like `rm -rf ./node_modules`
and `rm -rf /tmp/build` no longer block legitimate cleanup. New patterns
cover `rm -rf /`, system dirs, `$HOME`/`~`, `*`, `..`, and `sudo rm -rf`.

Expand policy coverage:
- persistence rules for /etc/cron.d/, /etc/crontab, /etc/systemd/system/,
  /etc/init.d/, /etc/rc.local, /etc/profile.d/
- tls-verification-disabled rule (git sslVerify false, GIT_SSL_NO_VERIFY,
  StrictHostKeyChecking=no, curl -k, NODE_TLS_REJECT_UNAUTHORIZED, etc.)
- npm/yarn/pnpm --registry and git+ supply-chain patterns
- shell-obfuscation rule for base64/xxd/printf hex decode-and-execute

Workspace resolution: argparse subparsers declaring --workspace were
clobbering the top-level value with None. CLI now scans argv directly
and falls back to PRISMOR_WARDEN_WORKSPACE env var. This also fixes
egress allowlist enforcement in `warden check` and `policy init
--workspace` writing to the wrong directory, both of which depended on
workspace flowing through correctly.
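The resolution order described above (explicit flag anywhere in argv, then the environment variable) might look like the following. It is a sketch only: the function name, the "last occurrence wins" choice, and the `"."` fallback are assumptions; `--workspace` and `PRISMOR_WARDEN_WORKSPACE` are taken from the commit message.

```python
import os


def resolve_workspace(argv, environ=None, default="."):
    """Find --workspace in argv regardless of position, else fall back to the env var."""
    environ = os.environ if environ is None else environ
    value = None
    for i, arg in enumerate(argv):
        if arg == "--workspace" and i + 1 < len(argv):
            value = argv[i + 1]           # space-separated form
        elif arg.startswith("--workspace="):
            value = arg.split("=", 1)[1]  # equals form
    if value:
        return value
    return environ.get("PRISMOR_WARDEN_WORKSPACE") or default
```

Scanning argv directly sidesteps the subparser default overwriting the top-level value, because the result no longer depends on which parser's namespace was written last.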

Scanner now reads project-level .claude/settings.json, .mcp.json, and
.claude/settings.local.json from the workspace — previously workspace
MCP configs were ignored entirely.

should_block() loads default categories from the policy when
block_categories is None so the public API works for direct callers
(3 previously-failing unit tests now pass).

uninstall-hooks removes cloaking hooks too (userprompt-guard.sh,
decloak.sh, recloak-mcp.sh) so the claude settings.json ends up clean
instead of partially stripped.

sweep --clean / --restore catch RuntimeError when stdin has no TTY
and exit 1 with a clean message pointing at PRISMOR_SWEEP_PASS,
instead of raising an unhandled Python traceback.

upgrade_feed.py is now idempotent (no-op if no changes) so the
signature isn't invalidated on every run. When it does write, it
auto re-signs via pipeline/sign_feed.sh if PRISMOR_SIGNING_PRIVATE_KEY
is set; otherwise it prints a loud warning.
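The decision flow in that paragraph reduces to three outcomes. A minimal sketch, assuming the feed can be compared as text; `plan_feed_update` and its return labels are hypothetical, and the actual signing step (pipeline/sign_feed.sh) is not modelled:

```python
def plan_feed_update(current_text, new_text, environ):
    """Decide what an idempotent feed upgrade should do, without touching disk."""
    if current_text == new_text:
        return "noop"                # nothing changed, existing signature stays valid
    if environ.get("PRISMOR_SIGNING_PRIVATE_KEY"):
        return "write-and-resign"    # key available: write, then re-sign
    return "write-and-warn"          # no key: write, but warn loudly about the stale signature
```

Keeping the comparison ahead of any write is what makes repeated runs safe: an unchanged feed is never rewritten, so its signature is never invalidated.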

SARIF output now populates rules[] with all 53 policy rules including
titles, severities, categories, and helpUri — giving GitHub Code
Scanning and other consumers full rule metadata.
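For readers unfamiliar with the SARIF layout, rule metadata lives under `runs[].tool.driver.rules[]`. The sketch below shows one plausible mapping from a policy rule to a SARIF 2.1.0 rule descriptor. The input field names (`help_uri` and so on), the severity-to-level table, and the example URL are assumptions rather than Warden's actual schema.

```python
# Assumed mapping from policy severity to SARIF result level.
LEVELS = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif_rule(rule: dict) -> dict:
    """Build a SARIF reportingDescriptor from a (hypothetical) policy rule dict."""
    return {
        "id": rule["id"],
        "shortDescription": {"text": rule["title"]},
        "helpUri": rule["help_uri"],
        "defaultConfiguration": {"level": LEVELS.get(rule["severity"].lower(), "warning")},
        "properties": {"category": rule["category"], "severity": rule["severity"]},
    }


def sarif_log(rules, results):
    """Wrap rule descriptors and results in a minimal SARIF 2.1.0 log."""
    driver = {"name": "warden", "rules": [to_sarif_rule(r) for r in rules]}
    return {"version": "2.1.0", "runs": [{"tool": {"driver": driver}, "results": results}]}


example_rule = {
    "id": "tls-verification-disabled",
    "title": "TLS verification disabled",
    "severity": "HIGH",
    "category": "network",
    "help_uri": "https://example.com/rules/tls-verification-disabled",  # placeholder URL
}
log = sarif_log([example_rule], results=[])
```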

Minor UX: analyze and session accept positional args in addition to
--input / --session-id, and `check` no longer prints the resolved
symlink path as a trailing line that breaks parsers.

All 227 tests pass.

Three edge-case fixes surfaced by remote variation testing:

1. rm -rf ../build was incorrectly blocked because the parent-dir
   pattern matched .. followed by any /.  Tightened to .. or ../..
   or ../../.. chains (dir-escape), not relative build paths.

2. rm -r -f /etc (and rm -f -r /etc) weren't caught because the
   lookahead consumed the inter-flag space.  Switched to non-consuming
   (?=\s|$) lookahead so [^\n]* can backtrack onto the -f token.

3. rm --recursive --force / and --force --recursive are now matched
   via long-form alternation in the same lookahead; bare --recursive
   without --force is left lenient (consistent with bare -r).
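The "both flags present, in any order or spelling" requirement can be expressed with two independent lookaheads. This is an illustrative pattern, not the regex shipped in the policy; the system-directory list is a deliberately tiny assumption.

```python
import re

RM_RF_SYSTEM = re.compile(
    r"""
    \brm\b
    (?=[^\n]*?\s(?:-[a-zA-Z]*[rR][a-zA-Z]*|--recursive)(?=\s|$))   # a recursive flag somewhere
    (?=[^\n]*?\s(?:-[a-zA-Z]*f[a-zA-Z]*|--force)(?=\s|$))          # a force flag somewhere
    [^\n]*?\s/(?:etc|usr|bin|var)?(?=\s|$)                         # target is / or a system dir
    """,
    re.VERBOSE,
)


def is_destructive_rm(command: str) -> bool:
    """True when rm has both recursive and force flags and targets / or a listed system dir."""
    return RM_RF_SYSTEM.search(command) is not None
```

Because each lookahead ends in `(?=\s|$)` instead of consuming the trailing space, `rm -r -f /etc` and `rm -f -r /etc` both satisfy the two flag checks, while `rm -rf ../build`, `rm -rf /tmp/build`, and a bare `rm -r /etc` are left alone.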
Old pattern 'nc\s+.*-[a-z]*l[a-z]*\s*.*-p' required -l and -p as
separate flags, so nc -lvp 4444 -e /bin/bash (very common reverse
shell one-liner) slipped past. Three patterns now cover: nc listener
with explicit -e/--exec, nc with combined -lvp/-lp flags, and nc
with separate -l and -p flags.
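To see why the old pattern missed the one-liner and how three narrower patterns can cover it, here is a hedged sketch. `OLD` is copied from the commit message; the three replacement patterns are my own approximations of the cases described, not the rules in the policy file.

```python
import re

OLD = re.compile(r"nc\s+.*-[a-z]*l[a-z]*\s*.*-p")  # quoted from the commit message

NC_PATTERNS = [
    # nc with an explicit -e / --exec
    re.compile(r"\bnc\b[^\n]*\s(?:-[a-zA-Z]*e\b|--exec\b)"),
    # nc with l and p combined in one short-flag cluster, e.g. -lvp or -lp
    re.compile(r"\bnc\b[^\n]*\s-[a-zA-Z]*l[a-zA-Z]*p[a-zA-Z]*\b"),
    # nc with -l and -p given as separate flags, in either order
    re.compile(
        r"\bnc\b(?=[^\n]*\s-[a-zA-Z]*l[a-zA-Z]*(?:\s|$))"
        r"(?=[^\n]*\s-[a-zA-Z]*p[a-zA-Z]*(?:\s|$))"
    ),
]


def is_nc_listener(command: str) -> bool:
    """True if any of the sketch patterns matches the command."""
    return any(p.search(command) for p in NC_PATTERNS)
```

The old pattern needs a literal `-p` substring, which `nc -lvp 4444 -e /bin/bash` never contains, so it returns no match there; the sketch patterns flag it while leaving a port probe such as `nc -zv example.com 443` alone.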
- rm -rf "/" and rm -rf '/' now blocked (quoted target)
- /dev/tcp/<hostname>/port now caught (was digit-only)
- git -c http.sslVerify=false clone caught
- curl -sk, -ksL, -Lk (combined flag) caught
- npm/yarn/pnpm --registry flag caught regardless of position
  (npm i short form, registry flag before or after install/i/add)
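The combined-flag case for curl comes down to looking for `k` anywhere inside a single-dash flag cluster. A minimal sketch, not the shipped rule; `--insecure` is curl's long form of `-k`, and the function name is hypothetical:

```python
import re

CURL_INSECURE = re.compile(
    r"\bcurl\b[^\n]*\s(?:-[a-zA-Z]*k[a-zA-Z]*|--insecure)(?=\s|$)"
)


def curl_disables_tls_verification(command: str) -> bool:
    """True if curl is invoked with -k, alone or inside a combined short-flag cluster."""
    return CURL_INSECURE.search(command) is not None
```

Requiring whitespace before the dash keeps long options such as `--keepalive-time` and hostnames containing the letter k from matching.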
Previously only the root / case accepted quoted targets, so
rm -rf "/etc" slipped past while rm -rf "/" was blocked.
Extended optional quote match to the system-critical-path,
top-level-dir, and $HOME/~ branches for consistency.

Implements every Tier-1 item from IMPROVEMENT_PLAN.md:

New subsystems
  - warden canary plant|list|remove|status — honey-token credentials
    with webhook beacons.  First AI-agent-specific canary impl.
  - warden/sinks.py — webhook, syslog (UDP/TCP), and file (JSON/CEF)
    SIEM sinks configured via settings.outputs in policy.yaml.
  - warden/policy_test.py — declarative test runner; bundled OWASP LLM
    Top 10 + Agentic Top 10 + MITRE ATLAS starter pack (28 cases).

Detection coverage
  - agent-instruction-tampering rule: CLAUDE.md, AGENTS.md,
    .cursorrules, .windsurfrules, .github/copilot-instructions.md.
  - MCP schema auditor: overbroad allowlists, any-typed params on
    exec-capable tools, missing input schemas, fs+net+exec
    capability combination, risky description language.
  - Unicode homoglyph detection: Cyrillic / Greek / Latin-extended
    confusables, fullwidth letters, zero-width joiners.
  - Lockfile integrity: non-registry sources, missing integrity hashes,
    lockfile-injection (deps in lock but not in package.json).

check ergonomics
  - --explain prints matched rule + full pattern.
  - --from-log PATH replays a JSONL session log through current policy.
  - --suggest-allowlist emits a ready-to-paste allowlist entry.

Bump to 0.3.0; CHANGELOG added.  227/227 unit tests pass; 28/28 OWASP
starter cases pass.

Surfaces six issues that the end-to-end bench (120-attack / 539-benign /
1,000-call latency) flagged. All changes are behaviour-compatible for
existing policy authors; SDK embedders gain two new entry points.

Fixed:
- install-hooks idempotent: re-run with different --mode now replaces
  the existing Warden hook instead of appending a duplicate entry.
- Session-log events carry ISO 8601 UTC timestamps (was null, broke
  SIEM correlation).
- hook-dispatch surfaces WARN-level findings to stderr as
  [warden WARN] [SEV] rule-id: title (was silent; exit 0 with no
  feedback meant HIGH-severity warns could fire invisibly).
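The install-hooks fix follows a strip-then-merge shape. The sketch below uses a flat list of hook dicts as a simplification of the real settings.json layout. The `warden/cli.py` marker comes from the PR description, while the command string and the `install_hook` name are hypothetical.

```python
WARDEN_MARKER = "warden/cli.py"  # marker named in the PR description


def install_hook(hooks, mode):
    """Drop any Warden-owned hook, then add exactly one for the requested mode."""
    kept = [h for h in hooks if WARDEN_MARKER not in h.get("command", "")]
    # Hypothetical command line; the real one is not shown in the PR.
    kept.append({"command": f"python3 warden/cli.py hook-dispatch --mode {mode}"})
    return kept


hooks = [{"command": "echo other-tool"}]  # a hook owned by some other tool
hooks = install_hook(hooks, "observe")
hooks = install_hook(hooks, "enforce")    # re-run with a different mode
```

After the second call the list still holds one Warden hook, now in enforce mode, and the unrelated hook is untouched. Since removal always precedes insertion, re-running can never accumulate duplicates.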

Added:
- PolicyEngine.from_defaults() classmethod for embedders.
- PolicyEngine.evaluate(event) default args (index=0, session_id="").
- Event-type aliases: command→shell, read→file_read, write→file_write.
- warden policy show --json for scripting.
- Subcommand-scoped argparse errors (shows the subcommand's usage,
  not the top-level wall).
- install-hooks prints scope+mode and hints at --scope user for
  global coverage.

Tests: 236/236 pass (227 existing + 9 new regression tests for
idempotency, timestamp, embedder API).