Add adapters/ with Claude Code, Cursor, and NAT reference implementations #22
bar-capsule wants to merge 21 commits into
…ions
Introduces a top-level adapters/ directory holding reference implementations that wire popular agent frameworks to an ACS Guardian through configuration only, with no agent code changes.
Three working adapters:
- adapters/claude-code/: shell-stdin adapter wired via Claude Code's settings.json. 13 unit/integration tests + 2 automated live tests that spawn `claude --print` against a project-level settings.json and exercise ALLOW and DENY paths end-to-end.
- adapters/cursor/: shell-stdin adapter wired via Cursor's hooks.json. Schema sourced from Cursor's bundled create-hook skill docs. 13 unit tests against the shared example Guardian. Manual live-verification procedure documented in tests/live_verification.md (Cursor is a desktop app with no documented headless mode).
- adapters/nat/: in-process Python FunctionMiddleware for NVIDIA Agent Toolkit (nvidia-nat-core 1.7.0). Registered via @register_middleware, inherits FunctionMiddlewareBaseConfig with name="acs_guardian". 7 integration tests against real NAT types + 5 live workflow tests exercising the actual function_middleware_invoke orchestration path.
Shared infrastructure:
- adapters/example-guardian/: minimal deterministic Guardian used by all three adapters' integration tests. Stdlib-only Python. Documented as a teaching artifact, not a production Guardian.
- adapters/README.md: framework -> adapter -> Guardian flow diagram, 6-step walkthrough with concrete JSON payloads at every step, cross-adapter comparison table, behavior-contract explanation.
Total tests: 40 automated tests passing across all adapters (13 + 2 claude-code, 13 cursor, 7 + 5 nat). Schema gaps between docs and reality were closed via the live tests for Claude Code and NAT; Cursor was manually verified through a reproduction procedure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
98470b4 to
8e65380
Compare
Normalize directory layout so all three adapters have identical
structure (file naming differs only where the framework's own naming
convention dictates, e.g. config file extensions):
adapters/<framework>/
├── README.md
├── acs_adapter.py (was: cursor_adapter.py, acs_middleware.py)
├── mapping.md
├── <config>.example (settings.json / hooks.json / workflow.yml)
└── tests/
├── __init__.py
├── test_adapter.py
├── test_live.py (was: test_live_claude_code.py, test_live_nat_workflow.py)
├── example_payloads.md (NEW: masked real-world payload examples)
└── live_verification.md (Cursor only — manual procedure)
Renames:
- adapters/cursor/cursor_adapter.py -> acs_adapter.py
- adapters/nat/acs_middleware.py -> acs_adapter.py
- adapters/claude-code/tests/test_live_claude_code.py -> test_live.py
- adapters/nat/tests/test_live_nat_workflow.py -> test_live.py
New config example files (parallels claude-code/settings.json.example):
- adapters/cursor/hooks.json.example
- adapters/nat/workflow.yml.example
New tests/example_payloads.md per adapter — masked real-world payload
examples sourced from actual sessions, with identifying fields replaced
by placeholders. Lets adopters see the actual schema each framework
emits (including fields not in the public docs for Claude Code and
Cursor) without committing any real session data. Each file uses a
consistent masking convention table at the bottom.
Cursor gets a placeholder tests/test_live.py that skips with a pointer
to live_verification.md, so the file naming stays identical across all
three adapters.
Top-level adapters/README.md now shows the consistent directory layout
as a code block so reviewers can see the parallel structure at a glance.
All 40 automated tests still passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rocklambros
left a comment
@bar-capsule I went deep on this branch before writing anything. Checked it out, ran the three suites, read each adapter against the v0.1.0 schemas. The config-only pattern is the right call, and the per-adapter docs are honest about what's deferred. I want to flag a gap before "reference implementation" sticks, because people copy reference adapters line for line, and a few of these would ride along into their deployments. Marking this request-changes so the wire-format and fail-open items get a look before merge, not as a veto on the direction.
I checked all of this against the open issues first (#10 through #19). Those are spec-level: capability resolution, HMAC key management, the conformance program. Nothing below is a dupe. This is adapter-implementation ground. Three of mine have a protocol-level twin and I'll point those out.
The one that matters most: the adapters don't emit the v0.1.0 wire format. In claude-code/acs_adapter.py the request puts acs_version, request_id, timestamp, and metadata at the top level, but request-envelope.json wants them inside params and sets additionalProperties: false. A schema-validating Guardian rejects every request. timestamp goes out as epoch millis (int(time.time() * 1000)) where the schema asks for an ISO-8601 string. The payload uses tool / name / arguments where the hook schema wants the payload wrapper with arguments as {value, provenance}. Same shape in the cursor and nat adapters. The tests stay green because example_guardian.py reads params.get("tool") too, so both sides agree with each other and disagree with the spec. Nothing validates an emitted envelope against acs_schema.json. One test that does would have caught all of it, and it's the highest-leverage thing you can add here.
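A minimal sketch of the nesting described above, using only the field names mentioned in this comment. The `build_envelope` helper, the `tool_name` key, the `provenance` value, and reusing request_id as the JSON-RPC id are assumptions for illustration; request-envelope.json and the hook schemas are the authority on the real keys.

```python
import json
import uuid
from datetime import datetime, timezone


def build_envelope(method: str, payload: dict, agent_id: str, session_id: str) -> dict:
    """Nest the ACS fields under params instead of the envelope's top level."""
    request_id = str(uuid.uuid4())
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # assumption: the JSON-RPC id reuses request_id
        "method": method,
        "params": {
            "acs_version": "0.1.0",
            "request_id": request_id,
            # ISO-8601 string rather than int(time.time() * 1000)
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "metadata": {"agent_id": agent_id, "session_id": session_id},
            "payload": payload,
        },
    }


envelope = build_envelope(
    "steps/toolCallRequest",
    # "tool_name" and the provenance value are placeholders, not schema facts.
    {"tool_name": "Bash", "arguments": {"value": {"command": "ls"}, "provenance": "agent"}},
    agent_id="example-agent",
    session_id=str(uuid.uuid4()),
)
print(json.dumps(envelope, indent=2))
```

A schema-validation test would feed an envelope like this to the canonical schema; the sketch only shows where each field sits.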
The deny path fails open on anything it doesn't recognize. In translate_response, an unknown or empty decision returns {}, which Claude Code and Cursor both read as "proceed." ACS_DEFAULT_DENY only kicks in on an exception (Guardian unreachable), not on a Guardian that answers with a verdict the map doesn't know. So a v0.2 disposition, a typo, even a trailing space surviving .lower(), all proceed. NAT already does the right thing and blocks under default_deny. The other two should match it. One line each.
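The fail-closed behavior can be sketched as below. It assumes the verdict arrives under a `decision` key and that "allow" is the only value that lets a call proceed. `should_proceed` is a hypothetical helper, not the adapters' translate_response, and a real adapter would handle the other dispositions explicitly before falling through to the block path.

```python
def should_proceed(response: dict) -> bool:
    """Fail closed: proceed only on an exact, recognized allow verdict."""
    decision = response.get("decision")  # assumption: verdict lives under "decision"
    # No .lower()/.strip() normalization here: "Allow", "allow ", None, and
    # unknown future dispositions all fall through to the block path.
    return decision == "allow"


for verdict in ("allow", "deny", "allow ", "quarantine", None):
    outcome = "proceed" if should_proceed({"decision": verdict}) else "block"
    print(repr(verdict), "->", outcome)
```

The point of the exact comparison is that the adapter never has to enumerate what it doesn't know; anything outside the recognized set blocks.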
No signing at all (this one's adjacent to #11, not the same). The adapters don't HMAC the envelope, and the example Guardian neither verifies a signature nor checks for replays. The READMEs call this "deferred to transport," but conformance.md:28 lists the baseline signature as a Core MUST, and :67 says transport doesn't satisfy it. #11 is about key distribution and rotation once you're signing. This is the step before: the reference ships with no signing, so every copy starts from an unauthenticated channel. The default http://127.0.0.1:8787/acs in the config keeps that invisible until someone repoints the URL at another host.
Delegation walks around the gate. SubagentStart isn't in HOOK_MAP, Claude Code can't block on it anyway, and example_guardian.py allows Task by default. A subagent spawn isn't evaluated before it acts. That's the adapter-level version of #16, and it lands on the exact confused-deputy path the subagent hooks were promoted to cover. At minimum I'd surface it in mapping.md's "not mapped" list instead of leaving it silent.
On the tests: "40 tests, all passing" is true on your machine, not in CI. NAT's 12 tests skip when nvidia-nat-core isn't installed (I got Ran 12 tests in 0.000s, OK (skipped=12)), Cursor's live test is a skip placeholder, and no workflow runs the adapter tests at all (only sync_version.yml). Skips read as passes, so a regression that lets a denied call through lands green. There's no requirements.txt under adapters/, and the NAT install is unpinned (pip install nvidia-nat-core, no ==). The NAT deny test also catches ACSGuardianDenied and returns without asserting the call was actually aborted, so it passes as long as something raised. This cluster worries me second-most, because green tests on a security control are worse than no tests. Pin the dep, run the unit tests in CI, and have the deny tests assert a real side effect didn't happen (the file wasn't written, the counter stayed at 0).
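A deny test that asserts the side effect didn't happen might look roughly like this. `GuardianDenied` and `guarded_call` are stand-ins invented for the sketch (the NAT adapter's real exception is ACSGuardianDenied, and its invoke path is more involved); the shape of the final assertion is what matters.

```python
import unittest


class GuardianDenied(Exception):
    """Stand-in for the adapter's deny exception (hypothetical)."""


def guarded_call(tool, decision: str):
    """Run the tool only when the verdict is exactly 'allow'."""
    if decision != "allow":
        raise GuardianDenied(decision)
    return tool()


class DenyBlocksSideEffect(unittest.TestCase):
    def test_denied_tool_never_runs(self):
        calls = []

        def tool():
            calls.append("ran")  # the observable side effect
            return "ok"

        with self.assertRaises(GuardianDenied):
            guarded_call(tool, "deny")
        # The assertion that matters: nothing ran, not merely "something raised".
        self.assertEqual(calls, [])

    def test_allowed_tool_runs_once(self):
        calls = []
        guarded_call(lambda: calls.append("ran"), "allow")
        self.assertEqual(calls, ["ran"])
```

Run it with `python -m unittest`. If the guard were changed to call the tool before raising, the first test would fail on the `calls` assertion even though the exception is still raised.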
Smaller stuff, and I'm less sure these are worth blocking on:
- example_guardian.py's regex misses `rm -fr /`, `rm --recursive --force /`, `rm -rf ~`, and `find / -delete`. It's labeled illustrative so I won't die on this hill, but it's the only thing a newcomer can run on day one, so I'd make it harder to fool or louder about being a toy.
- A PostToolUse deny can't undo a side effect that already ran (Claude Code's own hook docs say PostToolUse can't block the action). Cursor's beforeReadFile returns {}, so a denied file read still happens. The pre-hooks are the only real gate, and the docs should say so plainly.
- NAT's _build_request isn't inside the try/except, so a non-serializable kwarg throws before default_deny can catch it. And post_invoke ignores a result-side deny.
- mapping.md and the code disagree on the deny shape for non-PreToolUse hooks. The doc says {"continue": false, "stopReason": ...}, the code emits {"decision": "block", "reason": ...}. One is wrong against Claude Code's contract.
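One way to make the example Guardian harder to fool on the `rm` variants listed above is to tokenize the command instead of matching a regex. This is a swapped-in technique, not what example_guardian.py does. `is_recursive_force_rm` is a hypothetical helper, and it only inspects a command whose first token is `rm`, so `find / -delete`, `sudo rm`, and pipelines would still need their own checks.

```python
import shlex


def is_recursive_force_rm(command: str) -> bool:
    """Flag `rm` carrying both a recursive and a force flag, in any spelling or order."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes etc.: treat unparseable input as suspicious
    if not tokens or tokens[0] != "rm":
        return False
    recursive = force = False
    for tok in tokens[1:]:
        if tok == "--":  # everything after "--" is an operand, not a flag
            break
        if tok == "--recursive":
            recursive = True
        elif tok == "--force":
            force = True
        elif tok.startswith("-") and not tok.startswith("--"):
            flags = tok[1:]  # bundled short flags, e.g. "-fr"
            recursive = recursive or "r" in flags or "R" in flags
            force = force or "f" in flags
    return recursive and force


for cmd in ("rm -fr /", "rm --recursive --force /", "rm -rf ~", "rm notes.txt"):
    print(cmd, "->", is_recursive_force_rm(cmd))
```

This flags any recursive-force `rm` regardless of target, which is stricter than a path-aware rule; that trade-off seems reasonable for a teaching Guardian but is a judgment call.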
None of this changes my read on the direction. I like where this is going, and the cross-adapter table in the README is genuinely useful. I'd hold the "reference implementation" label until the envelope matches the schema and the deny paths fail closed, since those are the parts people copy without reading the footnotes. Happy to send a PR with the schema-validation test, or pair on the envelope fix if that's faster.
Tracking: I cross-linked the delegation gap onto #16 and the no-signing gap onto #11 so the protocol-level and adapter-level views sit together. The rest of the findings here are adapter-specific with no matching issue.
(cc @afogel since the envelope and signing points touch conformance.md.)
Merge-order note: this should land after #21, and after the change-request items above are addressed.
No git conflict with #20 or #21 (this only touches |
Rock's PR #22 review caught that the three reference adapters and the example Guardian shared a wire format that diverged from specification/v0.1.0/request-envelope.json: acs_version / request_id / timestamp / metadata at the envelope's top level instead of inside params, timestamp as epoch milliseconds instead of ISO-8601 string, tool payload missing the required payload wrapper, arguments not wrapped per tool-call-request.json. Tests passed because the adapter and the example Guardian agreed with each other; the canonical spec was outside the test loop.
This commit:
- Restructures every adapter's envelope to nest the AcsParams fields inside params, ISO-8601 timestamps, metadata.{agent_id, session_id} populated, payload wrapped per the relevant hook schema, arguments wrapped as {value: ...} per tool-call-request.json:26-37.
- Updates example_guardian.py to read from params.payload, gate the Task subagent tool by default, and expand the destructive-Bash regex set (rm -fr, --recursive --force, ~, --no-preserve-root, find / -delete / -exec rm, chmod 777 on system paths).
- Fixes the fail-open-on-unknown-disposition bug in claude-code and cursor translate_response; NAT pre_invoke and post_invoke now default-deny on unknown verdicts.
- NAT post_invoke now honors a Guardian deny verdict by clearing context.output and setting acs_post_invoke_redacted, matching Specification §6.4's output-redaction gate.
- NAT _build_request is now inside the try/except in both pre_invoke and post_invoke so build errors apply the same fail posture as transport errors.
- Adds tests/test_envelope_schema.py to each adapter. These validate every adapter-emitted envelope and per-hook payload against the canonical v0.1.0 JSON schemas loaded from $ACS_SPEC_DIR. They hard-FAIL (not skip) if the schemas are missing, because spec validation is non-negotiable.
- Adds .github/workflows/adapter_tests.yml to run the schema + round-trip + live tests per adapter on every push and PR, with the spec schemas pulled from upstream Agent-Control-Standard/ACS:main.
- Pins nvidia-nat-core==1.7.0 (adapters/nat/requirements.txt) and jsonschema>=4.20,<5 (adapters/requirements-test.txt).
- Updates each adapter's README conformance table to be MUST-honest against docs/spec/conformance.md: handshake, baseline HMAC-SHA256 integrity, replay nonce, system/ping, wrapped MCP are now marked ✗ not implemented, with citations. The previous "deferred to transport layer" claim for baseline integrity was inconsistent with conformance.md:28 and :67.
Test counts (all pass, zero hidden skips):
claude-code: 17 schema + 13 round-trip
cursor: 36 schema + 13 round-trip
nat: 6 schema + 7 round-trip + 5 live (NAT 1.7.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ke, ping, audit
Critical-review pass against the spec, MUST by MUST. Closes the gaps the previous commit (fbd5d7c) acknowledged but did not address. Adapters now exercise the full ACS-Core wire surface, not just the envelope shape:
- HMAC-SHA256 baseline signing (§10). HKDF-SHA256 derives a per-session key from ACS_HMAC_SECRET and session_id. Adapter signs every request; Guardian verifies every signed request and signs every response. JCS canonicalization with the signature field removed is the signed input, per §10. Unsigned/tampered → SIGNATURE_INVALID (-32004).
- Real rolling chain in example_guardian per §8.2. Per-session previous_hash tracking; entry_hash = SHA-256(JCS(entry) || prev_hash). Each session's chain head is computed and published on the response, covered by the signature per §8.6.
- Replay rejection (§10.3): duplicate request_id within a session → REPLAY_DETECTED (-32005). Per-session seen-id set.
- Timestamp skew rejection (§10.3): timestamps outside the negotiated skew window (default 300_000ms) → TIMESTAMP_OUT_OF_WINDOW (-32006).
- handshake/hello (§4): adapter sends ClientHello on first session call; ServerHello is cached in ~/.cache/acs-adapter-handshake/ keyed by (session_id, guardian_url). ServerHello carries on_decision_failure ("proceed", spec default per §6.4), skew_window_ms, profiles_accepted. Version mismatch → UNSUPPORTED_VERSION (-32001).
- system/ping (§13): Guardian always returns allow, doesn't enter the chain, doesn't require a signature, doesn't consume replay state.
- Fail-open audit events per §6.4: every adapter that proceeds without a decision emits a structured ACS_AUDIT JSON line so the bypass is visible, not silent. Deployments redirect or parse the line to feed a real audit sink.
- Default fail posture flipped to fail-open with audit (spec default per §6.4). Fail-closed is now explicit opt-in via ACS_DEFAULT_DENY=1. The previous default was the opposite of the spec default.
- request_id_ref in tool-call-result payloads (tool-call-result.json:19-23) is now populated by all three adapters via a deterministic uuid5 derivation, so toolCallResult correlates to its originating toolCallRequest on the Guardian side.
Testing methodology gaps closed:
- format_checker added to every schema-validation test so format: uuid and format: date-time constraints are enforced. Previously they were annotation-only and a malformed value would pass.
- Response-envelope schema validation added: every Guardian response shape (allow, deny, handshake, ping, error) is validated against response-envelope.json.
- New tests/test_spec_compliance.py (20 tests) targets the Core MUSTs: rolling chain, replay, skew, signing (sign/verify/tamper-detect), handshake (success + version mismatch), system/ping (always-allow + chain-bypass), response-envelope validation.
Honesty fixes in the conformance documentation:
- NAT downgraded from "Reference implementation" to "Partial reference" in adapters/README.md and adapters/nat/README.md. NAT alone emits steps/toolCallRequest + steps/toolCallResult only, not the full 6-hook minimum. A NAT deployment using only this adapter is not ACS-Core conformant on its own; documented explicitly.
- Cursor per-hook honesty table added: subagentStart, subagentStop, preCompact payloads are schema-valid but contain synthetic uuid5/sha256 placeholders because Cursor does not expose the required fields. The synthetic values satisfy the schema but do not carry the meaning the spec requires.
- Conformance tables across all three adapter READMEs now mark items ✓ that are actually verified by a test, with the test name cited. Items previously marked ✗ (handshake, baseline integrity, system/ping, replay enforcement, fail-open audit) are now ✓ with citation.
Shared helpers in adapters/_common/acs_common.py:
- jcs_canonicalize, derive_session_key (HKDF-SHA256)
- sign_envelope, verify_signature (HMAC-SHA256)
- iso8601_now, coerce_uuid, parse_iso8601
- audit_event (structured ACS_AUDIT line emitter)
- do_handshake (cached per-session ClientHello/ServerHello)
- ping (system/ping helper)
Test counts after this commit:
claude-code: 13 round-trip + 17 envelope-schema = 30 + 2 fail-posture
cursor: 13 round-trip + 36 envelope-schema = 49 + 1 fail-posture (1 manual skip is intentional)
nat: 18 (7 unit + 5 live + 6 envelope-schema; require nvidia-nat-core==1.7.0 for the first 12)
guardian: 20 spec-compliance (handshake, ping, chain, replay, skew, signing, response-envelope validation)
All pass. Zero hidden skips.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
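The signing and chaining steps in this commit can be sketched with the standard library alone. This is not the code in acs_common.py: the function names are hypothetical, canonicalization is the sorted-keys approximation rather than full RFC 8785, and three details are assumptions the commit message does not pin down (session_id fed to HKDF as `info`, the signature stored as a top-level hex `signature` field, and prev_hash appended as ASCII hex).

```python
import hashlib
import hmac
import json


def canonicalize(obj) -> bytes:
    """Sorted keys + compact separators; an approximation of RFC 8785 JCS."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def hkdf_sha256(secret: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) extract-then-expand with SHA-256."""
    prk = hmac.new(salt or b"\x00" * 32, secret, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def sign(envelope: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical form with the signature field removed."""
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    return hmac.new(key, canonicalize(unsigned), hashlib.sha256).hexdigest()


def verify(envelope: dict, key: bytes) -> bool:
    """Constant-time comparison of the claimed and recomputed signatures."""
    claimed = str(envelope.get("signature", "")).encode("utf-8")
    return hmac.compare_digest(sign(envelope, key).encode("ascii"), claimed)


def chain_entry_hash(entry: dict, prev_hash: str) -> str:
    """entry_hash = SHA-256(JCS(entry) || prev_hash)."""
    return hashlib.sha256(canonicalize(entry) + prev_hash.encode("ascii")).hexdigest()


session_key = hkdf_sha256(b"demo-secret", info=b"session-1234")
request = {"jsonrpc": "2.0", "method": "steps/toolCallRequest", "params": {"request_id": "r1"}}
request["signature"] = sign(request, session_key)
print(verify(request, session_key))
print(chain_entry_hash(request["params"], "0" * 64))
```

Deriving the key per session means a signature lifted from one session does not verify in another, which is the property the conformance suite later checks under Core07.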
…ail-without-secret, full JCS
Responds to the post-fbd5d7c critical review by closing the gaps that
were documented as "still not compliant":
- NAT lifecycle middleware (A). ACSMiddleware now subscribes to NAT's
IntermediateStepManager on first pre_invoke. WORKFLOW_START fires
steps/sessionStart + steps/userMessage; WORKFLOW_END fires
steps/agentResponse + steps/sessionEnd. NAT alone now satisfies the
ACS-Core 6-hook taxonomy minimum (conformance.md:19); the previous
"partial reference" status is gone. Verified in nat/tests/test_lifecycle.py.
- Cursor subagent + preCompact, honest (B). Stops fabricating UUIDs and
hashes. Adapter now keeps a per-session state file (~/.cache/acs-adapter-session/)
recording each step's request_id. subagentStart populates three of four
required fields from real data — subagent_session_id (uuid5 of
parent_session+subagent_id, stable across hooks), parent_session_id
(the envelope's session_id, real), parent_step_id (last step the
adapter has actually seen). Only intent_derivation stays a defensible
hardcoded default ("derived_from_parent" for IDE-spawned subagents).
preCompact's entries_to_compact comes from real seen step_ids.
subagentStop is dropped from HOOK_MAP entirely: final_chain_hash is
unknowable from Cursor (no chain on its side), and fabricating it is
worse than omitting the hook.
- Guardian refuses to start without a signing secret (C). On
startup, the Guardian requires ACS_HMAC_SECRET or ACS_HMAC_SECRET_FILE.
ACS_DEV_MODE=1 overrides with a stderr warning that §10 baseline
integrity is not satisfied. README points operators at
`openssl rand -hex 32 > /etc/acs/hmac.key && chmod 600 /etc/acs/hmac.key`. All test
harnesses opt into ACS_DEV_MODE explicitly.
- Full RFC 8785 JCS via rfc8785 package (D). acs_common.jcs_canonicalize
uses the rfc8785 PyPI package when installed (full RFC compliance
including number edge cases) and falls back to the sorted-keys +
compact-separators implementation otherwise. requirements-test.txt
lists rfc8785>=0.1,<5.
- Chain entries use the client's timestamp (E). append_to_chain takes
the request's params.timestamp (already skew-validated upstream)
instead of stamping its own iso8601_now(). External observers can
now fully recompute the chain from the request stream and the
published chain_hash.
- ACS_HMAC_SECRET_FILE support (F). load_hmac_secret() resolves to a
file path first, env var second. File path is preferred for
production deployments (no exposure in ps eauxw, child-process
envs, core dumps). Trailing whitespace stripped.
- Per-session state helper. acs_common.load_session_state /
save_session_state / record_step provide a small JSON file in
~/.cache/acs-adapter-session/ for adapters that need to accumulate
state across separate hook-process invocations (Cursor uses this
for last_step_id and seen_step_ids; the same primitive is available
to claude-code and NAT if they need it).
- Handshake re-reads env on every call. The Guardian's
signature_algorithms_supported in ServerHello reflects the current
ACS_HMAC_SECRET / _FILE value at handshake time, not at process
start. Operators can rotate or set the secret without a restart.
Test counts (all pass, zero hidden skips):
claude-code: 32 round-trip + envelope-schema
cursor: 48 (47 + 1 intentional manual-Cursor skip)
guardian: 20 spec-compliance (handshake, ping, chain, replay, skew, signing, response-envelope validation)
nat: 20 (test_adapter 7 + test_live 5 + test_envelope_schema 6 + test_lifecycle 2)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
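A rough sketch of the secret-resolution order described in items (C) and (F): file path first, env var second, trailing whitespace stripped, and a refusal to start unless ACS_DEV_MODE=1. The environment variable names come from the commit message; `require_secret`, the explicit `env` parameter, and the warning text are assumptions, and the real load_hmac_secret() may differ.

```python
import os
import sys
import tempfile
from typing import Mapping, Optional


def load_hmac_secret(env: Mapping[str, str]) -> Optional[bytes]:
    """Resolve the signing secret: ACS_HMAC_SECRET_FILE first, ACS_HMAC_SECRET second."""
    path = env.get("ACS_HMAC_SECRET_FILE")
    if path:
        with open(path, "rb") as fh:
            return fh.read().rstrip()  # drop the trailing newline a shell redirect adds
    value = env.get("ACS_HMAC_SECRET")
    if value:
        return value.rstrip().encode("utf-8")
    return None


def require_secret(env: Mapping[str, str]) -> Optional[bytes]:
    """Refuse to start without a secret unless dev mode is explicitly enabled."""
    secret = load_hmac_secret(env)
    if secret is not None:
        return secret
    if env.get("ACS_DEV_MODE") == "1":
        print("WARNING: no signing secret; baseline integrity is not satisfied", file=sys.stderr)
        return None
    raise SystemExit("refusing to start: set ACS_HMAC_SECRET or ACS_HMAC_SECRET_FILE")


# Demo: the file wins when both are set.
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "hmac.key")
    with open(key_path, "w") as fh:
        fh.write("deadbeef\n")
    file_secret = load_hmac_secret({"ACS_HMAC_SECRET_FILE": key_path, "ACS_HMAC_SECRET": "from-env"})
print(file_secret)
```

In a real process the caller would pass `os.environ`; taking the mapping as a parameter just keeps the sketch testable without mutating the environment.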
Two passes folded into one commit because they share helper code.
## Pass A: Security review (SECURITY.md threat model + tests + fixes)
12-threat model documented in adapters/SECURITY.md. 5 code fixes for high-priority threats; 7 out-of-scope items documented honestly.
- T5 SSRF via ACS_GUARDIAN_URL: validate_guardian_url() allowlist (http/https only); called from every adapter's call_guardian and from do_handshake / ping in _common.
- T6 Guardian DoS via oversized request body: MAX_REQUEST_BODY_BYTES (1 MiB) cap, matching the handshake's max_payload_size_bytes advertisement. Refuses Content-Length > cap with HTTP 413.
- T7 leaky HMAC secret file: _check_secret_file_perms() refuses mode & 0o077 != 0, wrong owner, or symlink. Raises SecretFilePermissionsError instead of silently using a leaked key.
- T8 cache poisoning: save_session_state and do_handshake create cache dirs mode 0700 and files mode 0600 via os.open(O_CREAT, 0o600) so other local users can't read or poison adapter state.
- T9 regex DoS via huge command: scan_destructive_bash_safely() refuses inputs > 8 KiB, emits audit event, returns "input_too_large" sentinel; caller MUST treat as suspicious.
16 new tests in _common/tests/test_security.py, each named for the specific attack it falsifies.
## Pass B: Harsh-reviewer audit + 3 production failure modes + fixes
Stepped back from the work, audited as a hostile reviewer. Identified the 3 most-likely production failure modes; wrote tests that fail on the current code; fixed the bugs.
- BUG #1 NAT _correlation_request_id collision: the old uuid5-from-(session, function, kwargs-hash) was deterministic. Two calls to the same tool with the same args (list_files(), get_status(), any idempotent tool; very common) produced identical request_ids and the Guardian's replay protection rejected the second with REPLAY_DETECTED. Fix: stash a fresh uuid4 on the context in pre_invoke; post_invoke reads it back. Per-call uniqueness + pre/post correlation both preserved.
- BUG #2 Guardian state lost on restart: GuardianState was RAM-only. A Guardian restart (deploy / OOM / autoscaling roll) wiped seen_request_ids, opening a replay window for every previously-sent envelope. §10.3 doesn't pause for the duration of a deploy. Fix: per-session state persisted to a JSON file under ACS_GUARDIAN_STATE_DIR (default ~/.cache/acs-guardian-state/), mode 0700/0600. State loads on first session-touch, persists on every mutation in check_replay and append_to_chain.
- BUG #3 lifecycle subscription race: _ensure_lifecycle_subscribed was a check-then-set with no lock. Two parallel pre_invoke calls (normal in NAT) both saw _lifecycle_subscribed=False and both subscribed; every WORKFLOW event then fired its ACS lifecycle hook twice. Fix: threading.Lock around the check-then-set, with re-check inside the lock.
4 new tests in nat/tests/test_failure_modes.py: 3 for the failure modes above, 1 regression guard ensuring the BUG #1 fix preserves pre/post correlation (post_invoke's request_id_ref must equal pre_invoke's request_id, per tool-call-result.json:19-23).
## Test-strengthening: catching 2 mutations that previously slipped
Two mutation tests passed previously because of weaknesses in the tests themselves:
- RollingChain::test_chain_hash_links_consecutive_requests only asserted hashes differed. Dropping previous_hash from the chain still produced different per-request hashes, so the mutation slipped. Strengthened test_chain_is_recomputable now EXTERNALLY recomputes the expected chain hash across 3 entries (now possible because the Guardian uses the client's timestamp) and asserts byte-equality. Also asserts the published hash does NOT match the "no previous_hash" computation.
- Cursor envelope-schema fixtures all used SESSION_UUID, so a skip-coercion mutation slipped. Added UuidCoercionForNonUuidCursorIds (2 tests) with conv-abc123 / chat_xyz / test-cc-session inputs, asserting the adapter coerces them to valid UUIDs and that the coercion is deterministic.
## Test counts after this commit (all green, zero hidden skips)
_common: 16 security
claude-code: 32 round-trip + envelope-schema
cursor: 50 round-trip + envelope-schema + uuid-coercion (1 intentional manual-Cursor skip)
example-guardian: 20 spec-compliance
nat: 24 (test_adapter 7 + test_live 5 + test_envelope_schema 6 + test_lifecycle 2 + test_failure_modes 4)
Total: 142 tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
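The lifecycle-subscription fix is a double-checked lock, which can be sketched as follows. `LifecycleSubscription` and its `subscribe` callback are hypothetical stand-ins; the real middleware subscribes to NAT's IntermediateStepManager, which is not modeled here.

```python
import threading


class LifecycleSubscription:
    """Subscribe exactly once, even when many threads race on the first call."""

    def __init__(self, subscribe):
        self._subscribe = subscribe  # callable that registers the lifecycle handlers
        self._subscribed = False
        self._lock = threading.Lock()

    def ensure_subscribed(self) -> None:
        if self._subscribed:  # fast path once the subscription exists
            return
        with self._lock:
            if self._subscribed:  # re-check: another thread may have won the race
                return
            self._subscribe()
            self._subscribed = True


subscribe_calls = []
subscription = LifecycleSubscription(lambda: subscribe_calls.append(1))
threads = [threading.Thread(target=subscription.ensure_subscribed) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(subscribe_calls))  # 1
```

Without the re-check inside the lock, two threads that both passed the first test would each subscribe after acquiring the lock in turn, which is the double-firing described in BUG #3.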
Each item has a falsifying test in adapters/_common/tests/test_edge_cases.py (17 tests total). Items not requiring code changes still have tests that codify the safe behavior so a future regression would be caught.
## Items 1-12
#1 rfc8785 JCS consistency: test confirms fallback matches the rfc8785 package byte-for-byte on every ACS envelope shape we ship. No code change needed; a mixed-install signature mismatch would surface as test failure.
#2 Guardian regex DoS, server-side: _matches_destructive_bash now returns "too_large" for inputs > DESTRUCTIVE_SCAN_MAX_LEN (8 KiB). The Guardian denies with reason_codes=["input_too_large"] (fail-safe direction). Previously, _common had the cap but the Guardian iterated patterns directly, leaving the server unprotected.
#3 HA Guardian replay window: persist() now takes an exclusive flock on a .lock sidecar, re-reads on-disk state, merges (union of seen_request_ids / seen_nonces with earliest-timestamp wins), and atomically writes. check_replay re-reads the state on every call so Guardian A's writes are visible to Guardian B within one request. Cross-instance replay window closed under shared ACS_GUARDIAN_STATE_DIR.
#4 Unbounded seen_request_ids: switched to dict {rid: timestamp}. New evict_old_request_ids() drops entries older than 2 × skew window (replay impossible past skew anyway). check_replay calls eviction opportunistically every 100 inserts. Memory bound is now O(skew_window / inter-request-time), not unbounded. Backwards-compat for list-format state files preserved.
#5 Handshake cache TTL: do_handshake skips cache files older than ACS_HANDSHAKE_CACHE_TTL_SECONDS (default 3600s). Operator config changes propagate within the TTL.
#6 NAT id(context) collision: WeakKeyDictionary fallback for contexts that reject attribute assignment. Last-resort path (object isn't weak-referenceable either) returns a fresh uuid4 per call and emits an audit event; pre→post correlation is lost in that path, but no silent collision.
#7 Unicode / NULL / surrogate round-trip: emoji, NULL bytes, multi-plane unicode all sign+verify cleanly. JCS handles them via UTF-8 encoding; no code change needed.
#8 ISO 8601 parse resilience: parse_iso8601 already accepts Z suffix, timezone offsets, millisecond + microsecond precision. Test codifies the accepted shapes + asserts garbage is rejected.
#9 ACS_GUARDIAN_HOST_ALLOWLIST: optional env-var allowlist that restricts validate_guardian_url to specific hostnames in addition to the http/https scheme check. Defense in depth against env-var attacks that smuggle a valid http:// URL to internal services.
#10 Cursor session-state file collision: _session_state_path now accepts an optional workspace parameter folded into the hash key. Cursor adapter passes the workspace_path / cwd so two Cursor windows with the same non-UUID conversation_id can't share state.
#11 Guardian envelope schema validation: if jsonschema + ACS_SPEC_DIR are available, every incoming envelope is validated against request-envelope.json before policy evaluation. Malformed envelopes rejected with -32600 Invalid Request. system/ping and handshake/hello exempt because their payload shapes differ.
#12 State-file hash length: bumped _session_state_path and _handshake_cache_path hashes from sha256[:16] (64-bit) to full sha256 (256-bit). Eliminates birthday collisions over deployment lifetime.
## Test counts after this commit (all green, 1 intentional manual skip)
_common: 33 (16 security + 17 edge-cases)
claude-code: 32
cursor: 50
example-guardian: 20
nat: 24
Total: 159 tests.
## Side-effects of the fixes
- Round-trip test fixtures updated to use real UUID session_ids (claude-code/test_adapter.py). Old "test-cc-session" fails the new Guardian-side envelope-schema check, which is correct: non-UUID session_ids never reached the Guardian from real Claude Code.
- Cursor adapter wires workspace through to load/save/record session state for #10 (new _workspace helper).
- example_guardian.py imports DESTRUCTIVE_SCAN_MAX_LEN from acs_common to keep the cap in one place.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
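The bounded replay set from item #4 can be sketched in memory as below. The error codes, the 300_000 ms default, the 2 × skew cutoff, and the every-100-inserts eviction come from the commit messages; `ReplayTracker`, its method names, and the symmetric `abs()` skew check are assumptions, and the file locking and cross-instance merge from item #3 are left out.

```python
from typing import Dict, Optional

REPLAY_DETECTED = -32005
TIMESTAMP_OUT_OF_WINDOW = -32006


class ReplayTracker:
    """Per-session replay state: one instance per session_id."""

    def __init__(self, skew_window_ms: int = 300_000, evict_every: int = 100):
        self.skew_window_ms = skew_window_ms
        self.evict_every = evict_every
        self.seen: Dict[str, int] = {}  # request_id -> request timestamp (ms)
        self._inserts = 0

    def check(self, request_id: str, timestamp_ms: int, now_ms: int) -> Optional[int]:
        """Return an error code, or None when the request is acceptable."""
        if abs(now_ms - timestamp_ms) > self.skew_window_ms:
            return TIMESTAMP_OUT_OF_WINDOW
        if request_id in self.seen:
            return REPLAY_DETECTED
        self.seen[request_id] = timestamp_ms
        self._inserts += 1
        if self._inserts % self.evict_every == 0:  # opportunistic eviction
            self.evict_old(now_ms)
        return None

    def evict_old(self, now_ms: int) -> None:
        """Drop ids older than 2 x skew window; the skew check already rejects them."""
        cutoff = now_ms - 2 * self.skew_window_ms
        self.seen = {rid: ts for rid, ts in self.seen.items() if ts >= cutoff}


tracker = ReplayTracker()
now = 1_700_000_000_000
print(tracker.check("req-1", now, now))            # None
print(tracker.check("req-1", now, now))            # -32005
print(tracker.check("req-2", now - 600_000, now))  # -32006
```

Because anything older than the skew window is rejected on timestamp alone, forgetting ids past twice that window does not reopen a replay path, which is why the map can stay bounded.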
Stops the 'we keep finding bugs' pattern by enumerating the contract
once and testing it. One file (adapters/test_acs_core_conformance.py),
one command, one MUST per test method, each docstring quoting the
spec line it falsifies.
Coverage maps directly to docs/spec/conformance.md ACS-Core (lines 13-26):
Core01_Handshake (§4)
  handshake/hello returns ServerHello with required keys
  version mismatch -> UNSUPPORTED_VERSION -32001
Core02_EnvelopeShape (§3, request-envelope.json)
  valid envelope passes canonical schema validation (with format-checker)
  jsonrpc must be literal "2.0"
  no additional top-level fields allowed
  params required {acs_version, request_id, timestamp, metadata, payload}
  metadata required {agent_id, session_id}
  request_id format: uuid
  timestamp format: date-time
  acs_version pattern semver
  method matches namespace pattern
Core03_HookTaxonomyMinimum (conformance.md:19)
  all six of sessionStart, userMessage, toolCallRequest,
    toolCallResult, agentResponse, sessionEnd accepted
Core04_Dispositions (§6, response-envelope.json conditionals)
  allow response shape
  deny requires reasoning
  modify requires reasoning + modifications (schema-level)
  ask requires reasoning + ask_details
  defer requires reasoning + defer_details
Core05_SessionContext (§8)
  response carries chain_hash
  chain_hash is lowercase 64-hex SHA-256
  consecutive entries chain
  distinct sessions have independent chain heads
  chain externally recomputable from request stream
Core06_ReplayProtection (§10.3)
  duplicate request_id -> -32005 REPLAY_DETECTED
  timestamp outside skew -> -32006 TIMESTAMP_OUT_OF_WINDOW
  same request_id across sessions is fine (per-session scope)
Core07_BaselineIntegrity (§10)
  signed request accepted
  unsigned request rejected when secret configured -> -32004
  tampered request -> -32004 SIGNATURE_INVALID
  response is signed and verifies with HKDF key
  per-session HKDF distinct keys for distinct sessions, same for same
  signature covers session_id (cross-session signature lift rejected)
Core08_DecisionHonoring (§6.4)
  Guardian responds within negotiated timeout (wire-level)
  fail-open emits ACS_AUDIT event (adapter-level, exercised via subprocess)
Core09_SystemPing (§13)
  ping returns allow regardless of policy / signature / session state
  ping does not require signature
  ping payload includes {status, echo, server_timestamp}
  ping does not consume replay slot
Core10_WrappedMcp (conformance.md:26)
  protocols/MCP/* method namespace validates
  Guardian returns structured envelope for MCP methods (partial: full
    MCP wrapping is documented as a follow-up; namespace + envelope OK)
44/44 pass on the current reference implementation. Spec source defaults
to /tmp/acs-spec-source/specification/v0.1.0/; set ACS_SPEC_DIR to
override. Hard-fails (does not skip) if schemas are missing — spec
validation is non-negotiable.
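For orientation, the Core02 field requirements listed above can be sketched as a small builder. This is an illustration only, not the repo's `make_envelope`: the function name `build_envelope`, the reuse of `request_id` as the top-level JSON-RPC `id`, and the `"0.1.0"` version string are assumptions; only the field names come from the coverage list.

```python
import uuid
from datetime import datetime, timezone


def build_envelope(method: str, payload: dict, agent_id: str, session_id: str) -> dict:
    """Build a request envelope carrying the fields Core02 lists as required."""
    request_id = str(uuid.uuid4())  # request_id format: uuid
    return {
        "jsonrpc": "2.0",  # must be the literal "2.0"
        "id": request_id,  # assumption: the JSON-RPC id reuses request_id
        "method": method,  # e.g. "steps/toolCallRequest"
        "params": {
            "acs_version": "0.1.0",  # assumption: semver of the spec in use
            "request_id": request_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),  # date-time
            "metadata": {"agent_id": agent_id, "session_id": session_id},
            "payload": payload,
        },
    }
```

A real check would still validate the result against the canonical request-envelope.json; the sketch only shows which keys have to be present.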
Why this matters: previous test layers (per-adapter, schema, security,
edge-cases, failure-modes) covered properties but didn't give an
adopter a single yes/no on Core conformance. test_acs_core_conformance
is the contract. If an adopter forks this and modifies it, they run
the file and either keep their Core claim or know exactly which MUST
they broke.
adapters/README.md now leads with this command.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
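The Core06 replay-protection properties listed in this commit can be sketched as follows. This is a hedged illustration, not the reference Guardian's code: `ReplayGuard`, `ReplayError`, the 300-second default skew, and the order of the two checks are assumptions; the error codes and the per-session scope come from the coverage list.

```python
import time


class ReplayError(Exception):
    """Carries the JSON-RPC error code the Guardian would return."""

    def __init__(self, code: int, label: str):
        super().__init__(f"{label} ({code})")
        self.code = code
        self.label = label


class ReplayGuard:
    def __init__(self, max_skew_seconds: float = 300.0):  # assumed default
        self.max_skew_seconds = max_skew_seconds
        self._seen = {}  # session_id -> set of request_ids already accepted

    def check(self, session_id: str, request_id: str, timestamp: float, now=None):
        now = time.time() if now is None else now
        # Timestamps too far in the past OR the future are rejected.
        if abs(now - timestamp) > self.max_skew_seconds:
            raise ReplayError(-32006, "TIMESTAMP_OUT_OF_WINDOW")
        seen = self._seen.setdefault(session_id, set())
        if request_id in seen:
            raise ReplayError(-32005, "REPLAY_DETECTED")
        seen.add(request_id)  # scope is per session, not global
```

Keying the seen-set by `session_id` is what makes "same request_id across sessions is fine" hold.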
Self-audit caught six tests that were positive-only (asserted the
happy path but had no falsifier). A no-op validator or a non-chaining
chain would have passed several of them. Mutation-tested the
strengthened versions: each named bug is now caught with a clear
failure message citing the spec.
Strengthening per group:
Core02 Envelope shape
+ test_contradiction_validator_actually_works — a deliberately
broken envelope ({"jsonrpc": "2.0"} with no method/id/params)
MUST be rejected. Without this, a no-op validator passes every
drop-required-field test silently.
Core03 Hook taxonomy minimum
+ Reframed each of 6 hook tests as parametrized over a table of
(method, valid_payload, broken_payload, schema_file). Asserts
both that the valid case produces a KNOWN disposition (not
garbage) AND that the broken_payload is rejected by the
canonical hook payload schema. Falsifier-per-hook.
Core04 Dispositions
+ test_allow_response_without_required_envelope_fields_rejected —
synthesizes 5 broken allow responses (missing type, acs_version,
request_id, decision; bogus decision enum) and asserts each
fails response-envelope.json. Without this, the positive
"allow validates" test is tautological.
Core05 SessionContext + chain
* test_chain_externally_recomputable extended to THREE entries.
Old version only checked entry 1, which doesn't have a
previous_hash to fold in — a "chain that doesn't chain"
mutation produced the right value for the root and passed.
Now: entry 2's recomputed hash with previous_hash MUST match
what the Guardian published; AND must NOT match the computation
that ignores previous_hash. Entry 3 verifies transitive chaining.
Core08 Decision honoring
* Replaced wire-level "Guardian responds fast" check (wrong
property) with three adapter-side tests of the actual MUST:
- test_adapter_actually_applies_guardian_deny — Guardian
returns DENY, adapter MUST translate to deny, not allow.
- test_adapter_waits_for_a_slow_guardian — Guardian sleeps
1s, adapter MUST take at least 1s; an adapter that proceeds
without waiting is caught by elapsed-time check.
- test_fail_open_emits_audit_event — unchanged.
Core10 Wrapped MCP
* Strengthened the "no crash" test to also require the response
validates against response-envelope.json. A no-op Guardian
returning {} would no longer pass.
+ test_mcp_method_namespace_rejects_garbage_namespaces —
contradiction: methods outside the reserved namespaces (e.g.,
"arbitrary/method", "step/typo", "PROTOCOLS/upper") MUST be
rejected by the schema. Without this, "namespace pattern works"
is unverified.
Mutation-tested the strengthened suite. Four representative bugs were
injected one at a time; each was caught:
No-op _validate_request_envelope → 9 Core02 tests fail
compute_entry_hash ignores previous_hash → Core05 3-entry test fails at entry 2
Adapter silently fails open (no audit) → Core08 audit test fails
Adapter proceeds without waiting → Core08 deny + slow-guardian tests fail
Net: still 44 tests, but every one now has a verifiable falsifier.
Adopter who forks and breaks a Core MUST gets a precise failure with
spec citation, not a passing suite that papered over their bug.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
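The Core05 three-entry check described in this commit can be illustrated with a small recomputation sketch. The hashing recipe here (SHA-256 over the previous hex digest followed by canonical JSON of the entry) is an assumption made for the example; the real `compute_entry_hash` may fold in different fields.

```python
import hashlib
import json


def compute_entry_hash(entry: dict, previous_hash=None) -> str:
    """Hash one entry, folding in the previous hash when there is one."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256()
    if previous_hash is not None:
        digest.update(previous_hash.encode("ascii"))
    digest.update(canonical)
    return digest.hexdigest()  # lowercase 64-hex


def recompute_chain(entries: list) -> list:
    """Recompute every chain hash from the request stream alone."""
    hashes, previous = [], None
    for entry in entries:
        previous = compute_entry_hash(entry, previous)
        hashes.append(previous)
    return hashes
```

Entry 1 has no `previous_hash`, so a mutation that ignores chaining still produces the right root. Only from entry 2 onward does the chained value differ from the unchained one, which is why the strengthened test needs at least two entries and uses three.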
… tests

Bundle of cleanup and feature work accumulated since the last push.

## Code structure

- Delete `adapters/example-guardian/tests/test_spec_compliance.py` (-485 lines). It tested the same Guardian behaviors as the conformance suite. The 4 uniquely-covered checks (future-timestamp rejection, response-envelope validation on handshake / ping / replay-error) were folded into the conformance suite where they belong.
- New `adapters/_common/test_harness.py`. One canonical home for `free_port`, `wait_port`, `make_envelope`, `validate_request_envelope`, `validate_response_envelope`, `validate_hook_payload`, `claude_code_event`, `post_envelope`, `spawn_guardian` context manager, and `ProgrammableGuardian`. Eleven test files migrated from inline copies to imports from here. Net ~250 fewer lines of duplicated code across the test corpus.
- Add `GUARDIAN_ERROR_CAUSE` + `guardian_error_cause(code)` to `_common/acs_common.py`. JSON-RPC error code → audit cause label taxonomy lives in one place; Cursor/NAT can reuse.
- Refactor `adapters/claude-code/acs_adapter.py`. Dispatch tables for payload-building and response-translation (one function per hook); `_emit`, `_pretool_response`, `_block_response` writers used by both `translate_response` and `_fail`'s fail-closed branch. Same behavior, no duplicated JSON shapes, easier to extend (new hook = two dict entries). Code is longer but each function does one thing and the deny shapes have a single definition.

## End-to-end check that drives REAL Claude (`adapters/claude-code/e2e_check.py`)

Rewrote from synthetic to real. Spawns a recording, signing ProgrammableGuardian on localhost, writes a temp `.claude/settings.json` wiring all 7 hooks at the adapter, then runs `claude --print --model claude-haiku-4-5` through 4 scenarios:

1. ALLOW: benign Bash; marker appears in toolCallResult envelope
2. DENY: Guardian denies all Bash; marker absent from results
3. READ TOOL: different tool, same wire contract
4. HANDSHAKE: multi-tool prompt → exactly 1 handshake/hello

Prints every envelope on the wire plus PASS/FAIL per scenario. 4/4 passes against the current adapter. Requires `claude` CLI on PATH; documented in `adapters/claude-code/README.md`.

## `adapters/claude-code/wire.py`: wiring CLI

Operator-driven `settings.json` editor. Dry-run by default with a unified diff; `--write` to apply (timestamped backup). Reversible via `--unwire`.

Per-hook fail posture: gate hooks (`PreToolUse`, `UserPromptSubmit`) default to `ACS_DEFAULT_DENY=1` (fail-closed); observational hooks (`PostToolUse`, `Notification`, `SessionStart`, `SessionEnd`) default to fail-open per §6.4 spec default. Operators see the per-hook posture in the dry-run output. Override with `--default-deny` (all-closed) or `--all-fail-open`.

Other guardrails: URL allowlist, file-perm checks on `--secret-file`, schema-conformant output, won't touch settings.json without explicit `--write`.

## Audit cause differentiation (footgun fix surfaced by a Claude probe)

The original adapter audit log used `fail_open_bypass` for every decision-failure. A probe in a separate Claude session walked into the gap: an unsigned envelope to a signing-required Guardian gets SIGNATURE_INVALID (-32004), and the adapter logged it as a generic bypass indistinguishable from an unreachable Guardian.

Three changes:

- Audit events now carry a `cause` field: `transport_failure`, `signature_invalid_response`, `malformed_envelope_response`, `replay_detected_response`, `timestamp_out_of_window_response`, `response_signature_invalid`, `adapter_build_failed`, `adapter_exception`, `invalid_stdin_json`. Disposition stays `fail_open_bypass` / `decision_failure_fail_closed` (back-compat).
- 4 new regression tests in `Core08_DecisionHonoringAdapter`: `test_fail_open_emits_audit_event` (extended to assert cause=transport_failure), `test_guardian_error_response_carries_distinct_cause`, `test_guardian_error_under_fail_closed_emits_deny`, `test_malformed_envelope_under_fail_closed_emits_deny`.
- Documents the operator triage path: client bugs (signature, malformed envelope) vs ops issues (Guardian down) now distinguishable by `grep cause=... acs-audit.log`.

## Counts

conformance: 45 → 48 tests
claude-code: 32 tests (unchanged)
cursor: 50 tests (unchanged, 1 intentional skip)
_common: 33 tests (unchanged)

Plus: `e2e_check.py` (4/4 against real Claude) and `wire.py` (manual dry-run + --unwire round-trip verified on Bar's machine).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
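The audit-cause differentiation described in this commit can be sketched as a lookup plus an event builder. Only the three code-to-label pairs that this PR states explicitly are included; the fallback label `unknown_guardian_error` and the `audit_event` helper are hypothetical, and the shared module's real table covers more codes.

```python
# Error codes and cause labels quoted in this PR; the real table is larger.
GUARDIAN_ERROR_CAUSE = {
    -32004: "signature_invalid_response",
    -32005: "replay_detected_response",
    -32006: "timestamp_out_of_window_response",
}


def guardian_error_cause(code: int) -> str:
    """Map a Guardian JSON-RPC error code to an audit cause label."""
    # Hypothetical fallback label for codes this sketch does not know.
    return GUARDIAN_ERROR_CAUSE.get(code, "unknown_guardian_error")


def audit_event(code: int, default_deny: bool) -> dict:
    """Build an audit record: disposition says what happened, cause says why."""
    disposition = "decision_failure_fail_closed" if default_deny else "fail_open_bypass"
    return {"disposition": disposition, "cause": guardian_error_cause(code)}
```

Keeping `disposition` unchanged and adding `cause` alongside it is what lets existing log consumers keep working while operators gain the finer-grained triage signal.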
…cope §6.4 per-adapter
Consolidates the README work from this PR into one commit, with all
filesystem paths parameterized as $ACS_REPO (no machine-specific
paths leak into the docs).
## adapters/claude-code/README.md — full rewrite
- 5-step install (generate HMAC secret, run Guardian, wire.py, restart
Claude Code, verify with e2e_check.py). All paths use $ACS_REPO so
adopters set one env var and the rest works regardless of where they
cloned the repo.
- 5 smoke tests ordered broadest-to-most-specific (automated suite,
e2e_check.py, in-session manual, audit-cause differentiation,
pre-flight inventory). Each shows the expected output so adopters
know what success looks like.
- Configuration table reflects current env vars (drops the deprecated
ACS_SESSION_ID, adds ACS_HMAC_SECRET_FILE, ACS_HANDSHAKE,
ACS_HANDSHAKE_CACHE, ACS_GUARDIAN_HOST_ALLOWLIST).
- On-disk state section (handshake cache + Guardian state files).
- Troubleshooting table for the current failure modes: SecretFile-
PermissionsError, SIGNATURE_INVALID, TIMESTAMP_OUT_OF_WINDOW,
invalid session_id, transport failures.
- Conformance table updated: HMAC signing is ✓; the stale "no signing"
and "no handshake" 'What this is not' bullets removed.
- New '§6.4 Decision honoring' subsection — framework-specific story
of how Claude Code provides the wait-and-apply guarantee.
## adapters/README.md — prune to general concepts
- Drop the Status table — PR-only stat content, irrelevant to reviewers.
- Drop adapter-specific top-level sections ('How NAT is wired',
'How spec-schema validation runs') — those belong in per-adapter
READMEs or fold into the conformance section.
- Collapse 'Why the in-process NAT adapter blocks differently' (moved
to nat/README.md) and the multi-paragraph 'behavior contract' into a
one-paragraph 'Decision honoring is a framework property' under the
'How adapters work' section — the general invariant stays in common,
the per-framework mechanics move per-adapter.
- Drop orphan --- separator left over by the Status removal.
- Walkthrough now flags that it shows envelope SHAPES and omits HMAC
signatures + handshake/hello for clarity; points readers to
e2e_check.py for verbatim envelopes.
- Stale '44/44' → '48/48' (4 audit-cause regression tests landed).
## adapters/{nat,cursor}/README.md — per-adapter §6.4 paragraph
Each gets a short 'Decision honoring (§6.4)' subsection naming how
that specific framework provides the wait-and-apply guarantee
(Cursor's blocking subprocess + exit-2; NAT's pre_invoke → call_next
ordering inside function_middleware_invoke).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bar's manual test surfaced that `rm -rfv /tmp/...` slipped past the example Guardian's destructive-Bash policy. Any single-letter extension after `-rf` (or `-fr`) defeated the regex, a trivial evasion for a security control.

Root cause: the pattern was

    -[a-zA-Z]*r[a-zA-Z]*f \b ...

The `\b` (word boundary) after the second required flag letter only matched when the very next character was non-word. `rm -rfv` matched the alternation as `-rf`, but then `\b` had to fire between `f` and `v` (both word chars), so the whole pattern failed.

Fix: append `[a-zA-Z]*` after the second required flag letter so trailing letters are consumed by the alternation before `\b` fires. Also switched the bridge between flags and path from greedy `.*` to non-greedy `.*?` for cleaner matching.

    -[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]* \b .*? \s+ (/|~|$HOME)

Regression coverage: new `Item13_DestructiveRmFlagVariants` class in test_security.py with 5 tests covering canonical, trailing-letter, middle-letter, and leading-letter variants, plus the exact shape Claude generated (`rm -rfv /tmp/path/ ; echo done`). Benign rm (without both r and f) still passes through; rmdir and naked-rm not flagged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
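A minimal reproduction of the evasion and the fix described in this commit. Both patterns are reconstructions from the fragments quoted above, not the exact regex in `example_guardian.py`: the `\brm\s+` prefix, the spelled-out `-fr` alternative, and the names `OLD_PATTERN` / `NEW_PATTERN` are assumptions.

```python
import re

# Flag group requiring both r and f, in either order (assumed alternation).
_FLAGS = r"(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)"

# Before: \b must fire right after the second required letter.
OLD_PATTERN = re.compile(r"\brm\s+-" + _FLAGS + r"\b.*\s+(?:/|~|\$HOME)")

# After: trailing flag letters are consumed first, then \b, then a lazy bridge.
NEW_PATTERN = re.compile(r"\brm\s+-" + _FLAGS + r"[a-zA-Z]*\b.*?\s+(?:/|~|\$HOME)")


def is_destructive(command: str, pattern=NEW_PATTERN) -> bool:
    """Return True when the command looks like a recursive forced rm on a root-ish path."""
    return pattern.search(command) is not None


if __name__ == "__main__":
    for cmd in ("rm -rf /tmp/x", "rm -rfv /tmp/path/ ; echo done", "rm -r /tmp/x", "rmdir /tmp/x"):
        print(f"{cmd!r}: old={is_destructive(cmd, OLD_PATTERN)} new={is_destructive(cmd)}")
```

In `rm -rfv`, the old pattern can only stop the flag group at `f`, and there is no word boundary between `f` and `v`, so the match fails. The new pattern consumes `v` before the boundary check.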
…t this is not'
Common README now opens with a 'What is a Guardian?' section that
anchors readers before they see the word a dozen times in the rest of
the doc. Three things made explicit:
- Production Guardian (real policy engine — OPA/Rego, Cedar, vendor SDK)
vs example_guardian.py (teaching artifact in this repo)
- example_guardian.py implements the full wire protocol but a
deliberately tiny policy — not for production
- Running the Guardian is operator responsibility (terminal, launchd,
systemd, container); the adapter expects it reachable at
$ACS_GUARDIAN_URL
claude-code/README.md drops the 'What this is not' section — its three
bullets were either now-redundant ("not a production Guardian" + "not
a service manager" both covered by the common Guardian section) or
already in the Conformance status table ("not a full MCP wrapping
implementation" — explicit ✗ row with citation). The conformance
table is the structured, authoritative version of "what's implemented
and what isn't"; the loose prose duplicated it less precisely.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…licy e2e
Brings Cursor to feature parity with the Claude Code adapter and
catches a class of real-world bugs the unit suite misses.
Adapter (acs_adapter.py):
- Refactor to dispatch-table pattern with shared _emit /
_permission_response / _post_tool_response writers
- Audit-cause taxonomy on every fail-closed/fail-open path
(§6.4): transport_failure, signature_invalid_response,
malformed_envelope_response, replay_detected_response,
timestamp_out_of_window_response, response_signature_invalid,
adapter_build_failed, adapter_exception, invalid_stdin_json
- beforeSubmitPrompt allow path now emits "{}" instead of empty
stdout, since Cursor with failClosed: true reads "exit 0 + no
output" as hook failure and blocks the prompt
wire.py (new):
- Idempotent installer for ~/.cursor/hooks.json
- Sets BOTH Cursor's native failClosed: true AND the adapter's
ACS_DEFAULT_DENY=1 on gate hooks (defense-in-depth)
- Re-entrant merge/unwire via "# acs-adapter-wired" marker
e2e_check.py (new):
- Semi-automated end-to-end against a recording Guardian wired
to the REAL example_guardian policy (not synthetic per-scenario
handlers, which created an operator footgun where running rm
during the wrong prompt would silently succeed)
- 5 scenarios + setup: ALLOW, READ-TOOL, DESTRUCTIVE, USER-MESSAGE,
HANDSHAKE-ONCE — every scenario sees the same shipping policy
- DESTRUCTIVE drives the actual _matches_destructive_bash regex
end-to-end through Cursor, so policy regressions (like the
rm -rfv evasion fixed in 9713703) get caught at integration
level, not just at the regex unit-test level
.gitignore:
- Exclude adapters/*/.cursor/, .claude/, .acs-handshake-cache/ —
scratch dirs from manual probing that carry absolute machine
paths and must never reach commits
README rewrite + test_adapter.py assertion update for the new
audit-cause stderr format. 50/50 cursor tests pass; 48/48
conformance tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
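The marker-based idempotent merge/unwire that this commit describes for `wire.py` can be sketched on an in-memory config. The dict layout (`{"hooks": {event: [{"command": ...}]}}`), the placement of the marker as a trailing shell comment on the command, and the function names `wire` / `unwire` are assumptions for illustration; only the marker string comes from the commit.

```python
import copy

MARKER = "# acs-adapter-wired"


def _is_ours(entry: dict) -> bool:
    """An entry belongs to the installer only if it carries the marker."""
    return MARKER in entry.get("command", "")


def wire(config: dict, event: str, command: str) -> dict:
    """Return a copy with exactly one marked entry for `event`; hand-wired entries stay."""
    out = copy.deepcopy(config)
    entries = out.setdefault("hooks", {}).setdefault(event, [])
    entries[:] = [e for e in entries if not _is_ours(e)]  # drop stale marked entries
    entries.append({"command": f"{command} {MARKER}"})
    return out


def unwire(config: dict) -> dict:
    """Return a copy with every marked entry removed and empty events dropped."""
    out = copy.deepcopy(config)
    hooks = out.get("hooks", {})
    for event in list(hooks):
        kept = [e for e in hooks[event] if not _is_ours(e)]
        if kept:
            hooks[event] = kept
        else:
            del hooks[event]
    return out
```

Running `wire` twice yields the same config as running it once, and `unwire` removes only what carries the marker, which is the re-entrancy property the commit names.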
Brings NAT to feature parity with Cursor/Claude (wire.py CLI, real-policy
e2e_check, audit-cause taxonomy) and lands four correctness fixes for
bugs that a real Vertex/Gemini LLM-driven react_agent exposed and that
the synthetic test suite would have let ship.
NAT structural port:
- wire.py: comment-preserving YAML installer + linter via ruamel.yaml.
--check / --write / --unwire; walks every attachment point
(workflow / function_groups / functions with override middleware)
and idempotently inserts acs_guardian. Marker-tagged so unwire
removes exactly what was added; hand-wired entries stay.
- e2e_check.py: 5-scenario automated check (ALLOW, DESTRUCTIVE,
READ-TOOL, HANDSHAKE-ONCE, LIFECYCLE) against a recording
Guardian wired to the REAL example_guardian.evaluate_step
policy. Canary-file pattern on DESTRUCTIVE proves enforcement
instead of relying on a side-effect counter that can false-pass.
- README: full rewrite to match Cursor/Claude structure (5-step
install, smoke tests, configuration table, conformance table,
troubleshooting). Documents the YAML-only discipline + dynamic-
registration caveat that's structural to NAT (vs. framework-wide
interception in Cursor/Claude).
- Audit-cause taxonomy: pre_invoke + post_invoke now emit `cause`
fields covering transport_failure, adapter_exception,
response_signature_invalid, and the 7 JSON-RPC error code
mappings via the shared guardian_error_cause() helper. Matches
Cursor/Claude's _fail(cause=) shape.
Correctness fixes uncovered by the live Vertex test:
- _extract_arguments: NAT's middleware chain captures function
input as modified_args[0] (a Pydantic model returned by
Function._convert_input) on every LangChain agent path. The
adapter was reading only modified_kwargs — always empty on that
path — so every toolCallRequest envelope shipped with
arguments:{} and the Guardian had no command to inspect. A real
LLM-driven `rm -rf` blew past the destructive regex with `→
allow` because the Guardian never saw the command. Helper now
flattens args from Pydantic / dataclass / dict / scalar shapes
into the arguments dict before sending.
- _apply_overrides_to_context: same shape, MODIFY side. Adapter
wrote parameter_overrides into modified_kwargs but NAT's actual
call uses modified_args[0]. Override silently dropped — a
Guardian saying "rewrite rm -rf to echo safe" had ZERO effect.
Helper now mutates the Pydantic model (via model_copy) /
dataclass / dict in place AND modified_kwargs.
- _redact_output: post_invoke deny set
`context.acs_post_invoke_redacted = True` on InvocationContext,
which is a strict Pydantic model with validate_assignment=True
and no extra fields. Every redaction crashed with
ValidationError, leaving sensitive output flowing through.
Helper drops the bogus extra-field assignment; redaction signal
is output=None plus the ACS_AUDIT post_invoke_redacted event.
- example_guardian case-fold: destructive-Bash + Task + Write
policy checks were case-sensitive. NAT YAML key `shell`
(lowercase) bypassed the Bash/Shell branch entirely — same
class of bug as the rm -rfv regex evasion. Now case-folds.
Also adds decision logging + a startup banner stamp so live
operators can see allow/deny in the terminal, not just the
received-envelope line.
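The argument-flattening fix described under `_extract_arguments` can be sketched without importing NAT or Pydantic. This is not the adapter's helper: `flatten_arguments`, the duck-typed `model_dump` check, the `"value"` key for scalars, and the `ShellInput` example type are all assumptions for illustration.

```python
import dataclasses


@dataclasses.dataclass
class ShellInput:
    """Hypothetical tool-input type used only for the example."""
    command: str


def flatten_arguments(args: tuple, kwargs: dict) -> dict:
    """Merge the first positional input and kwargs into one arguments dict."""
    arguments = {}
    if args:
        first = args[0]
        if hasattr(first, "model_dump"):  # Pydantic v2 model, duck-typed
            arguments.update(first.model_dump())
        elif dataclasses.is_dataclass(first) and not isinstance(first, type):
            arguments.update(dataclasses.asdict(first))
        elif isinstance(first, dict):
            arguments.update(first)
        else:
            arguments["value"] = first  # hypothetical key for scalar inputs
    arguments.update(kwargs)  # explicit kwargs win on key collisions
    return arguments
```

Reading only `kwargs` would return `{}` for `flatten_arguments((ShellInput("rm -rf /tmp/x"),), {})`, which is the empty-`arguments` envelope the commit describes.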
Regression tests added:
- Item14_ToolNameCaseInsensitive (_common/test_security.py):
7 casings × destructive + benign Read + Task gate variants.
- ExtractArgumentsFromInvocationContext (nat/test_adapter.py):
Pydantic v2 model / dict / dataclass / scalar with schema /
empty context — 7 angles.
- DispositionsLive (nat/test_dispositions_live.py): MODIFY
override reaches function input; ASK/DEFER substitute to DENY;
post_invoke DENY redacts output to None.
Shared e2e_report.py:
Report + ANSI constants + real_policy_handler + assert_envelopes_
signed_and_valid extracted to adapters/_common/e2e_report.py so
every adapter's e2e_check imports one source of truth. NAT's
e2e_check uses it from inception.
.gitignore: adapters/*/.nat-venv/, __pycache__/, *.pyc — venvs
and bytecode must not ship.
207 tests pass: 32 claude + 50 cursor + 35 nat + 42 _common +
48 conformance.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
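The canary-file pattern mentioned for the DESTRUCTIVE scenario can be sketched in isolation. Everything here is a stand-in: `guarded_call`, `Denied`, `delete_path`, and the `decide` callback are hypothetical names, and the real e2e check drives a NAT workflow rather than a direct function call.

```python
import os
import tempfile


class Denied(Exception):
    """Raised when the gate refuses to run the tool."""


def guarded_call(decide, tool, arguments: dict):
    """Run `tool` only if `decide` returns "allow"; anything else blocks."""
    if decide(arguments) != "allow":
        raise Denied(f"blocked: {arguments}")
    return tool(**arguments)


def delete_path(path: str) -> None:
    """Stand-in for a destructive tool."""
    os.remove(path)


def canary_survives(decide) -> bool:
    """Create a canary file, attempt the destructive call, report whether it still exists."""
    with tempfile.TemporaryDirectory() as workdir:
        canary = os.path.join(workdir, "canary")
        with open(canary, "w") as handle:
            handle.write("still here")
        try:
            guarded_call(decide, delete_path, {"path": canary})
        except Denied:
            pass
        return os.path.exists(canary)
```

A counter only records what the test harness chose to count; the canary is evidence from the filesystem, so a bypass that skips the counter still shows up as a missing file.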
…dule

Both e2e_check.py files carried their own ~90-line copy of the same pretty-printer (Report class + ANSI constants + scenario PASS/FAIL summary) and the same envelope-validation block (HMAC signing check + canonical-schema validation against request-envelope.json). Three copies (cursor, claude-code, and now nat) would have drifted as soon as the next change landed.

This commit switches Cursor + Claude to the shared adapters/_common/e2e_report.py module introduced alongside the NAT e2e_check. No behavior change: same scenarios, same assertions, same output format. Verified by running NAT's e2e_check live (5/5 PASS) and booting Cursor + Claude e2e_check to confirm the refactored Report renders correctly past the header into the first scenario.

Net: ~110 fewer lines across the three e2e_check.py files; single source of truth for the printer, the policy-handler builder, and the schema-validation sub-checks. 207 tests still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original adapter_tests.yml had three bugs that made it non-functional and reproduced the exact failure mode PR Agent-Control-Standard#22 review flagged: tests "passing" via skips that operators read as green.

Fixes:

- Python 3.13 → 3.12. NAT 1.7.0 has no 3.13 wheel; the install step would fail silently in skip mode on any NAT-dependent test.
- Drop `example-guardian` from the matrix: no tests/ directory there; the matrix entry crashed on `unittest discover tests`.
- Add the cross-adapter conformance suite (`adapters/test_acs_core_conformance.py`) as its own job. That 48-test file was previously not run by CI at all.

Skip handling, per Rock's "skips read as passes" point:

- NAT job: NAT is installed (pinned `nvidia-nat-core==1.7.0` + matching `nvidia-nat-langchain`), so ANY skipped test means the test gating is buggy. Hard fail.
- Conformance job: zero skips allowed; every ACS-Core MUST runs.
- Other adapters: surface skips as warnings (Claude Code's live tests legitimately skip when the `claude` CLI isn't installed in CI; Cursor has a manual-procedure placeholder). Both intentional.

Now exercises ~190 tests on every push to adapters/ or specification/, with the load-bearing security tests pinned and required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thanks @rocklambros for the thorough review! Every item is addressed.

Wire format vs request-envelope.json: Every emitted envelope is schema-validated against canonical request-envelope.json from $ACS_SPEC_DIR (not against fixtures, not against the example Guardian's shape). params wrapper, ISO-8601 timestamps, payload + {value, provenance} arg wrapping are all to spec.

Deny fails open on unknown: _fail(cause=…) taxonomy across all three adapters covering transport, adapter exception, signature failure, and 7 JSON-RPC error codes via shared guardian_error_cause(). Unknown disposition + ACS_DEFAULT_DENY=1 → block; otherwise an ACS_AUDIT fail_open_bypass event with the cause label.

Signing: HMAC-SHA256 across all three adapters via adapters/_common/. Every envelope signed; every response verified. SIGNATURE_INVALID / REPLAY_DETECTED / TIMESTAMP_OUT_OF_WINDOW each map to a distinct audit cause.

Subagent delegation: example_guardian gates Task by default (subagent_gated); opt-in via ACS_ALLOW_SUBAGENT=1. Each adapter's mapping.md lists where delegation hooks aren't honorable by the framework.

Tests on tests: Deny tests now assert the real side effect didn't happen, the way you described: counter checks plus, for NAT, a canary-file pattern (if rm -rf runs despite the deny, the canary file vanishes regardless of what the counter says). A real Vertex/Gemini react_agent run surfaced silent-bypass bugs the synthetic tests would have shipped; all have regression tests.

CI workflow: Pinned nvidia-nat-core==1.7.0 + nvidia-nat-langchain==1.7.0, Python 3.12, runs the per-adapter + conformance suites on every push to adapters/ or specification/, hard-fails on any skipped NAT or conformance test.

Smaller items: rm regex hardened (-rfv, -fr, --recursive --force, --no-preserve-root). READMEs say plainly that pre-hooks are the gate; post-hooks redact via output=None + audit. NAT _build_request moved inside try; post_invoke result-side deny propagates. mapping.md and code now agree on deny shape.

Would appreciate re-review when you have a window.
…w Wrapped MCP claim

Two findings from Rock's review of PR Agent-Control-Standard#22:

P1.1: Conformance CI fails for the right reason now. Without rfc3339-validator installed, jsonschema's date-time format checker silently no-ops and test_timestamp_is_iso8601 false-passes (invalid "yesterday" passes validation; assertion sees an empty error list; CI shows green on a real wire-format bug). Pin rfc3339-validator in adapters/requirements-test.txt + add a fail-fast setUpClass guard that rejects any future degradation: if the date-time checker accepts "not-a-date", the whole conformance class refuses to run with a pointed error message.

P2.2: Wrapped MCP claim narrowed. conformance.md:26 lists protocols/MCP/* as part of the Core baseline. Our Core10_WrappedMcp suite verifies the WIRE-FORMAT shape (envelope validates, Guardian returns a structured response, no crash) but not full MCP request wrapping; the reference Guardian routes incoming MCP through the standard toolCallRequest path with the tool name reflecting the MCP method. The module docstring and the top-level adapters/README now say plainly that a green run = "ACS-Core baseline minus full Wrapped MCP", not "the whole baseline". v0.2 deferral marked explicitly. Deployments needing full wrapping must extend the Guardian.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… to malformed base64
Reviewer caught that base64.b64decode() in verify_signature() ran
without exception handling. A malformed signature value
("not-base64", padding garbage, truncated input) raised
binascii.Error up to the Guardian's request handler, which only
catches GuardianError. Result: a bad signature tore down the
request path on the wire (uncaught exception, 500-class response)
instead of returning the spec's SIGNATURE_INVALID (-32004). Same
risk on the adapter side for malformed signed responses. Security
control was a DoS vector.
verify_signature() now catches binascii.Error / ValueError /
TypeError around the b64decode and returns False — the existing
caller chain (Guardian's check_signature, adapter's response
verification) then emits -32004 with cause=signature_invalid_*
and the audit event fires correctly.
Regression test: Item15_VerifySignatureRobustToMalformedBase64
exercises 7 forms of unparseable input (garbage, padding-only,
mid-string padding, oversized, empty); every one must return
False, none may raise.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
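The fix described in this commit can be sketched as follows. The signature of `verify_signature` here (raw key, message bytes, base64 string) is an assumption, as is the use of `validate=True`, which is an extra strictness choice for the sketch; the commit only states which exceptions are caught and that the function returns False.

```python
import base64
import binascii
import hashlib
import hmac


def verify_signature(key: bytes, message: bytes, signature_b64) -> bool:
    """Return True only for a well-formed, matching HMAC-SHA256 signature."""
    try:
        # validate=True rejects non-alphabet characters instead of skipping them.
        provided = base64.b64decode(signature_b64, validate=True)
    except (binascii.Error, ValueError, TypeError):
        # Malformed input is an invalid signature, never an uncaught exception.
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(provided, expected)  # constant-time comparison
```

Returning False lets the existing caller chain answer with SIGNATURE_INVALID (-32004) rather than letting `binascii.Error` escape to the request handler.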
…ation-only on every adapter
Reviewer caught that ACS-Core §hooks.md describes agentResponse as
decision-eligible (ALLOW / DENY / MODIFY), but every adapter
silently drops denies on the hook that produces it. Claude Code
maps Notification → agentResponse and returns {} on deny; Cursor
afterAgentResponse does the same; NAT lifecycle hooks are
fire-and-forget through the IntermediateStepManager subscription.
The framework constraint is real and not fixable in this PR:
- Claude Code's Notification fires AFTER assistant message
delivery — no veto path.
- Cursor's afterAgentResponse fires AFTER the message — same.
- NAT's IntermediateStepManager is a notification stream;
subscriber callbacks cannot abort an event after it fires.
This commit makes the docs honest about that. Each adapter's
mapping.md now marks the relevant hook explicitly as
"observation-only" with an explanation of which framework
boundary blocks pre-delivery enforcement. The per-adapter
README conformance tables narrow the dispositions claim to
"ALLOW / DENY / MODIFY on pre-execution hooks" with a pointer to
mapping.md for lifecycle / post-execution observation-only
posture.
Also includes a hunk missed from the previous Wrapped MCP commit:
adapters/README.md's top-level claim now says "ACS-Core baseline
minus full Wrapped MCP" to match what the conformance suite
actually verifies.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…claim on post_invoke redaction

Reviewer caught that the Post-tool-deny-redaction row still said post_invoke sets acs_post_invoke_redacted=True, contradicting the code in adapters/nat/acs_adapter.py:294 / :717. InvocationContext is a strict Pydantic model and that extra attribute would crash. The real redaction signal is context.output = None plus the ACS_AUDIT post_invoke_redacted event. README now says so.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary

Introduces a top-level `adapters/` directory holding reference implementations that wire popular agent frameworks to an ACS Guardian through configuration only, with no agent code changes. Ships three working adapters with passing tests for each.

What lands

- `adapters/claude-code/`: `claude --print` round-trip, ALLOW + DENY paths (`test_live_claude_code.py`)
- `adapters/nat/`: `function_middleware_invoke` against `nvidia-nat-core` 1.7.0 (`test_live_nat_workflow.py`)
- `adapters/cursor/`: `tests/live_verification.md` (Cursor is a desktop app with no documented headless mode)
- `adapters/example-guardian/`

Total: 40 automated tests + 1 documented manual verification procedure, all passing.
The adapter pattern

ACS-Core specifies what a hook event looks like on the wire and what the Guardian's decision looks like coming back. It does not dictate how a framework physically wires the interception in. Each adapter demonstrates the boundary choice for its framework:

- Claude Code: `hookSpecificOutput.permissionDecision`
- Cursor: `argv[1]`; `permission` (top-level, per-event) + exit code 2
- NAT: `FunctionMiddleware` class; `ACSGuardianDenied` (NAT 1.7.0) or `InvocationAction.SKIP` (NAT dev)

All three send the same ACS JSON-RPC shape to the Guardian. The example Guardian (`adapters/example-guardian/example_guardian.py`) is shared across all adapters: same wire format regardless of which framework emits the event.

The top-level `adapters/README.md` contains a step-by-step walkthrough with concrete JSON payloads at each step, a cross-adapter comparison table, and a flow diagram. Read that first.

Claude Code adapter
Wire it up by editing `~/.claude/settings.json` (see `settings.json.example`); no code changes to your agent.

Live verification: `tests/test_live_claude_code.py` spawns `claude --print` in a sub-process with a project-level `settings.json` wiring the adapter into `PreToolUse`. Tests both:

- `echo` command runs and the marker string appears in Claude Code's output.

Both passing in ~18s.

Schema corrections discovered via the live test (real Claude Code differs from public docs):

- `hookSpecificOutput.permissionDecision = "deny"`, NOT top-level `decision: "block"`.
- `tool_response` (object), NOT `tool_output` (string).
- `tool_use_id`, `effort`, `duration_ms` not mentioned in public docs.

NAT adapter (NVIDIA Agent Toolkit)
Real NAT middleware class. Installs via `pip install nvidia-nat-core`, configured in NAT workflow YAML.

Live verification: `tests/test_live_nat_workflow.py` exercises NAT's actual orchestration method (`FunctionMiddleware.function_middleware_invoke`), the same code path NAT's runtime calls when a function with middleware is invoked. Tests prove the load-bearing property: when the Guardian denies, the target function does not execute (the test's side-effect counter stays at 0).

Covers: allow / deny / fail-closed / fail-open. 5/5 passing against `nvidia-nat-core` 1.7.0.

Schema corrections discovered while building:

- `InvocationAction.SKIP` is on the NAT dev branch, NOT in 1.7.0. Block by raising. Adapter feature-detects and prefers action-based path when available.
- `FunctionMiddlewareBaseConfig` with `name=` class kwarg (NAT's TypedBaseModel registration). Plain Pydantic `BaseModel` fails on `@register_middleware`.

Cursor adapter
Real Cursor schema sourced from Cursor's own bundled `~/.cursor/skills-cursor/create-hook/SKILL.md`. Maps all 20 documented Cursor hook events to ACS `steps/*` methods.

Cursor is a desktop application without a documented headless mode, so live verification is a documented manual procedure in `tests/live_verification.md`. The procedure has been run end-to-end (5+ hooks flowed through the adapter, zero adapter errors); captured payloads from that reproduction are not committed because Cursor's events include session-identifying fields. Anyone with Cursor installed can reproduce.

Why in-spec `adapters/` and not separate repos

Single repo for spec + reference implementations on the first batch makes the spec evolve alongside the adapters that exercise it (the live tests on this PR found several real schema gaps between docs and actual behavior; that feedback loop is what makes the spec text trustworthy). When the pattern stabilizes and individual adapters need their own release cycle, splitting to separate repos is straightforward.
What this PR does NOT do
Running the tests
🤖 Generated with Claude Code