Commit c84f863

bar-capsule and claude committed
Codify live tests for Claude Code + NAT; verify Cursor live on Bar's machine
Closes the gap between "code works in unit tests" and "code works in the real host environment." All three adapters now have either an automated live test or a documented manual procedure with captured real-payload evidence.

Claude Code:
- New test_live_claude_code.py spawns `claude --print` against a project-level settings.json that wires the adapter into PreToolUse, exercises ALLOW (echo command runs) and DENY (Guardian's destructive pattern surfaces in Claude's response). Both passing.

NAT:
- New test_live_nat_workflow.py exercises function_middleware_invoke -- the actual orchestration method NAT's runtime calls -- with a real function as call_next. Proves the load-bearing property: a blocked function does NOT execute (side-effect counter stays at 0). Covers allow / deny / fail-closed / fail-open paths. 5/5 passing against nvidia-nat-core 1.7.0.

Cursor:
- Live verification on Bar's machine 2026-06-15: a real Cursor session at /tmp/acs-real-test/ fired 5+ hook events through our adapter and reached the Guardian. Zero errors. Captured payloads saved to tests/real_cursor_payloads.example as evidence. Procedure documented in tests/live_verification.md.
- Real Cursor schema details surfaced beyond create-hook docs: conversation_id, generation_id, model, composer_mode, cursor_version, workspace_roots, user_email, transcript_path, tool_use_id, duration. Adapter already handled these via fallbacks; no code change needed.

Total tests across all adapters:
- claude-code: 13 unit + 2 live = 15 (live runs in ~18s)
- cursor: 13 unit + manual verification done
- nat: 7 integration + 5 live workflow = 12

Total: 40 automated tests + 1 documented manual verification, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
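The "blocked function does NOT execute" property described for the NAT test can be illustrated without NAT at all. The sketch below is a minimal stand-in, not the adapter's code: `guarded_invoke`, `Blocked`, `run_case`, and the `check` callback are hypothetical names, and the real `function_middleware_invoke` signature in `nvidia-nat-core` may differ. It only shows the shape of the assertion: count side effects, then check the counter after allow, deny, fail-closed, and fail-open cases.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical verdict callback: returns True to allow, False to deny,
# and may raise to simulate an unreachable Guardian.
Verdict = Callable[[str, Any], bool]


class Blocked(Exception):
    """Raised instead of calling the wrapped function."""


async def guarded_invoke(
    name: str,
    value: Any,
    call_next: Callable[[Any], Awaitable[Any]],
    check: Verdict,
    fail_closed: bool = True,
) -> Any:
    """Ask `check` first; only call `call_next` when the call is allowed."""
    try:
        allowed = check(name, value)
    except Exception:
        # Guardian unreachable: deny when fail-closed, allow when fail-open.
        allowed = not fail_closed
    if not allowed:
        raise Blocked(f"{name} denied")
    return await call_next(value)


def run_case(check: Verdict, fail_closed: bool = True) -> tuple[int, str]:
    """Run one guarded call; return (side-effect count, outcome)."""
    calls = 0

    async def side_effect(value: Any) -> Any:
        nonlocal calls
        calls += 1
        return value

    try:
        asyncio.run(guarded_invoke("tool", 1, side_effect, check, fail_closed))
        return calls, "ran"
    except Blocked:
        return calls, "blocked"


def unreachable(name: str, value: Any) -> bool:
    """Simulate a Guardian that cannot be reached."""
    raise ConnectionError("guardian down")
```

Asserting on the counter rather than on the return value is the point of the design: a middleware that runs the function and then discards its result would still pass a return-value check, but it would leave the counter at 1.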
1 parent 451e9e7 commit c84f863

5 files changed

Lines changed: 460 additions & 3 deletions

adapters/README.md

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ Reference implementations that wire popular agent frameworks to an ACS Guardian.

 | Adapter | Status | Mapping | Working adapter | Tests | Live integration verified |
 |---|---|---|---|---|---|
-| [claude-code](./claude-code/) | Reference implementation ||| ✓ 13 round-trip tests | ALLOW + DENY paths verified against a real `claude --print` session |
-| [cursor](./cursor/) | Reference implementation ||| ✓ 13 round-trip tests | Manual verification by reviewer with Cursor installed (Cursor has no headless mode) |
-| [nat](./nat/) | Reference implementation ||| ✓ 7 integration tests against real `nvidia-nat-core` 1.7.0 | ⚠ Manual verification with a real NAT workflow (the integration tests use real NAT types but not a full agent run) |
+| [claude-code](./claude-code/) | Reference implementation ||| ✓ 13 unit + 2 live tests (`test_live_claude_code.py`) automate ALLOW + DENY against a real `claude --print` session | ✓ Automated in test suite |
+| [cursor](./cursor/) | Reference implementation ||| ✓ 13 unit tests | Manual verification done 2026-06-15; captured payloads in `tests/real_cursor_payloads.example`, procedure in `tests/live_verification.md` |
+| [nat](./nat/) | Reference implementation ||| ✓ 7 unit + 5 live workflow tests (`test_live_nat_workflow.py`) exercise the real `function_middleware_invoke` orchestration path against `nvidia-nat-core` 1.7.0 | ✓ Automated in test suite |

 ## The adapter pattern

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
"""
Live end-to-end test: real Claude Code -> ACS adapter -> Guardian.

Spawns `claude --print` in a subprocess against a project-level settings.json
that wires the adapter, exercises both ALLOW and DENY paths, asserts Claude
Code's observable output reflects the Guardian's verdict.

Requires:
  - `claude` CLI available on PATH (Claude Code installed)
  - Python 3.10+

Skipped automatically when `claude` is not on PATH.
"""
from __future__ import annotations

import json
import os
import shutil
import socket
import subprocess
import sys
import tempfile
import time
import unittest
from pathlib import Path


HERE = Path(__file__).resolve().parent
ADAPTER_DIR = HERE.parent
ADAPTER = ADAPTER_DIR / "acs_adapter.py"
GUARDIAN = ADAPTER_DIR / "example_guardian.py"


CLAUDE_AVAILABLE = shutil.which("claude") is not None


def _free_port() -> int:
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def _wait(host: str, port: int, timeout: float = 5.0) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.2):
                return
        except OSError:
            time.sleep(0.05)
    raise RuntimeError(f"guardian not up at {host}:{port}")


@unittest.skipUnless(CLAUDE_AVAILABLE, "`claude` CLI not on PATH")
class LiveClaudeCodeRoundTrip(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        cls.workdir = tempfile.mkdtemp(prefix="acs-live-cc-")
        cls.port = _free_port()

        # Project-level settings.json wires the adapter into Claude Code's
        # PreToolUse hook. Using the project root .claude/ so we don't
        # touch the user's ~/.claude/settings.json.
        claude_dir = Path(cls.workdir) / ".claude"
        claude_dir.mkdir()
        settings = {
            "hooks": {
                "PreToolUse": [{
                    "matcher": "*",
                    "hooks": [{
                        "type": "command",
                        "command": (
                            f"ACS_GUARDIAN_URL=http://127.0.0.1:{cls.port}/acs "
                            f"python3 {ADAPTER}"
                        ),
                    }],
                }],
            }
        }
        (claude_dir / "settings.json").write_text(json.dumps(settings, indent=2))

        cls.guardian_proc = subprocess.Popen(
            [sys.executable, str(GUARDIAN), "--port", str(cls.port)],
            stderr=subprocess.PIPE,
            stdout=subprocess.DEVNULL,
        )
        _wait("127.0.0.1", cls.port)

    @classmethod
    def tearDownClass(cls) -> None:
        cls.guardian_proc.terminate()
        try:
            cls.guardian_proc.wait(timeout=2.0)
        except subprocess.TimeoutExpired:
            cls.guardian_proc.kill()
        shutil.rmtree(cls.workdir, ignore_errors=True)

    def _claude(self, prompt: str, timeout: float = 120.0) -> tuple[int, str]:
        """Invoke `claude --print` from the test workdir, capture stdout."""
        proc = subprocess.run(
            ["claude", "--print", "--permission-mode", "acceptEdits", prompt],
            cwd=self.workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode, proc.stdout

    # ----- ALLOW path -----

    def test_benign_bash_runs(self) -> None:
        """Guardian's policy allows benign Bash; Claude Code runs it and
        the marker string appears in stdout."""
        marker = "ACS_LIVE_TEST_OK_MARKER"
        rc, stdout = self._claude(f"Run the shell command: echo {marker}")
        self.assertEqual(rc, 0, f"claude exited {rc}; stdout={stdout[:200]}")
        self.assertIn(marker, stdout,
                      f"benign command should have run; stdout={stdout[:300]}")

    # ----- DENY path -----

    def test_destructive_bash_blocked(self) -> None:
        """Guardian's destructive-Bash policy denies; Claude Code surfaces
        the block in its output. We test against a string the example
        Guardian's regex blocks (no actual destructive op is attempted
        because PreToolUse fires before execution)."""
        # The example_guardian DESTRUCTIVE_BASH pattern matches 'rm -rf /...'
        # PreToolUse fires BEFORE the command runs, so the Guardian sees
        # the proposed command and denies it; the command never executes.
        prompt = (
            "Use the Bash tool with this exact command: "
            "rm -rf /tmp/acs-nonexistent-live-test-target"
        )
        rc, stdout = self._claude(prompt)
        self.assertEqual(rc, 0)
        # Claude Code's response should reference the block / the Guardian
        lo = stdout.lower()
        self.assertTrue(
            "block" in lo or "denied" in lo or "policy" in lo
            or "destructive" in lo,
            f"deny should surface in Claude Code's response; stdout={stdout[:400]}",
        )


if __name__ == "__main__":
    unittest.main(verbosity=2)
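The test above starts the Guardian on an OS-assigned port and polls until it accepts connections. That pattern can be tried in isolation with the sketch below. It is a hedged illustration, not the test's code: `free_port` and `wait_for_port` mirror `_free_port` and `_wait` but use `time.monotonic()` instead of `time.time()`, and `start_listener_later` is a hypothetical stand-in for a slow-starting Guardian process.

```python
import socket
import threading
import time


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def wait_for_port(host: str, port: int, timeout: float = 5.0) -> None:
    """Poll until a TCP connect succeeds or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.2):
                return
        except OSError:
            time.sleep(0.05)
    raise RuntimeError(f"nothing listening at {host}:{port}")


def start_listener_later(port: int, delay: float) -> socket.socket:
    """Stand-in for a slow-starting server: listen on `port` after `delay`."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    def _listen() -> None:
        time.sleep(delay)
        srv.bind(("127.0.0.1", port))
        srv.listen()

    threading.Thread(target=_listen, daemon=True).start()
    return srv
```

The port is released between `free_port()` returning and the server binding it, so another process could in principle claim it in that window. For a local test suite that risk is usually acceptable, and the readiness poll turns a lost race into a clear timeout rather than a hang.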
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# Cursor live verification

Cursor is a desktop application with no documented headless mode, so the live test cannot run in CI. A reviewer with Cursor installed can reproduce it manually.

## Status: ✅ Verified on 2026-06-15

A real end-to-end round trip was exercised through Cursor's agent against the example Guardian. The captured payloads are in `real_cursor_payloads.example`.

## Procedure

```bash
# 1. Start the example Guardian
python3 ../../claude-code/example_guardian.py --port 8787

# 2. In a new shell, set up a project with the adapter wired in
mkdir -p /tmp/acs-cursor-live/.cursor
cat > /tmp/acs-cursor-live/.cursor/hooks.json <<'EOF'
{
  "version": 1,
  "hooks": {
    "sessionStart": [
      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py sessionStart" }
    ],
    "beforeSubmitPrompt": [
      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py beforeSubmitPrompt" }
    ],
    "preToolUse": [
      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py preToolUse" }
    ],
    "postToolUse": [
      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py postToolUse" }
    ],
    "beforeShellExecution": [
      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py beforeShellExecution" }
    ]
  }
}
EOF

# 3. Open the project in Cursor and prompt the agent to do something
# that triggers tool calls (e.g. "Search for 'foo' in this directory")

# 4. Observe the Guardian's stderr for hook events:
# [guardian] steps/sessionStart session=<uuid> step=<uuid>
# [guardian] steps/userMessage session=<uuid> step=<uuid>
# [guardian] steps/toolCallRequest session=<uuid> step=<uuid>
# [guardian] steps/toolCallResult session=<uuid> step=<uuid>
# ...
```

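The `hooks.json` in step 2 repeats one command per event, so it can also be generated. The following is a minimal sketch under stated assumptions: `build_hooks_config` and `write_hooks_config` are hypothetical helpers (not part of the adapter), and the event list and file layout are copied from the procedure above rather than from Cursor's documentation.

```python
import json
import shlex
from pathlib import Path

# The five hook events wired up in the procedure above.
EVENTS = [
    "sessionStart",
    "beforeSubmitPrompt",
    "preToolUse",
    "postToolUse",
    "beforeShellExecution",
]


def build_hooks_config(adapter_path: str, guardian_url: str) -> dict:
    """Build the hooks.json structure with one adapter command per event."""
    prefix = (
        f"ACS_GUARDIAN_URL={shlex.quote(guardian_url)} "
        f"python3 {shlex.quote(adapter_path)}"
    )
    return {
        "version": 1,
        "hooks": {event: [{"command": f"{prefix} {event}"}] for event in EVENTS},
    }


def write_hooks_config(project_dir: Path, adapter_path: str, guardian_url: str) -> Path:
    """Write <project_dir>/.cursor/hooks.json and return its path."""
    cursor_dir = project_dir / ".cursor"
    cursor_dir.mkdir(parents=True, exist_ok=True)
    target = cursor_dir / "hooks.json"
    config = build_hooks_config(adapter_path, guardian_url)
    target.write_text(json.dumps(config, indent=2) + "\n")
    return target
```

Quoting the URL and the adapter path with `shlex.quote` keeps the command intact if either contains spaces; for the values used in the procedure the quoting is a no-op.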
## What was verified on 2026-06-15

A live Cursor session at `/tmp/acs-real-test/` was triggered with the prompt *"can you check if something changed there?"*. The Cursor agent fired the following hooks, all of which were received by our adapter and routed to the Guardian:

- `beforeSubmitPrompt` -> `steps/userMessage` (Guardian logged)
- `preToolUse` for Grep tool -> `steps/toolCallRequest` (allowed by policy)
- `postToolUse` for Grep tool -> `steps/toolCallResult`
- `preToolUse` for Shell tool with curl command -> `steps/toolCallRequest` (allowed; curl isn't in the destructive pattern set)
- `beforeShellExecution` -> `steps/toolCallRequest` (allowed)
- `afterShellExecution` -> `steps/toolCallResult`

Zero errors in `cursor_adapter.err`. All 5+ events were translated to ACS JSON-RPC and sent to the Guardian, and Cursor accepted each verdict. The agent's tool calls proceeded as expected because the example Guardian's policy doesn't deny benign Grep or curl commands.

The captured payloads (in `real_cursor_payloads.example`) confirmed several Cursor schema details that go beyond the public `create-hook` skill documentation:

- Cursor sends both `session_id` and `conversation_id` (identical values in agent mode)
- Payloads also include `generation_id`, `model`, `composer_mode`, `cursor_version`, `workspace_roots`, `user_email`, `transcript_path`
- For tools: `tool_use_id`, plus `duration` on `postToolUse`
- For Shell: `command`, `cwd`, `sandbox`

The adapter handles all of these correctly via the fallback patterns in `build_payload`.

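A fallback lookup of the kind mentioned above can be sketched as follows. This is an illustration only: the body of `build_payload` is not shown in this commit, `first_present` and `extract_ids` are hypothetical helpers, and the choice of which field backs up which (for example `generation_id` standing in when `tool_use_id` is absent) is an assumption made for the example.

```python
from typing import Any, Mapping, Optional


def first_present(payload: Mapping[str, Any], *keys: str) -> Optional[Any]:
    """Return the first non-empty value among `keys`, else None."""
    for key in keys:
        value = payload.get(key)
        if value not in (None, ""):
            return value
    return None


def extract_ids(payload: Mapping[str, Any]) -> dict:
    """Pull session and step identifiers, tolerating missing fields."""
    return {
        # Assumed order: prefer session_id, fall back to conversation_id.
        "session": first_present(payload, "session_id", "conversation_id"),
        # Assumed order: prefer tool_use_id, fall back to generation_id.
        "step": first_present(payload, "tool_use_id", "generation_id"),
        "event": payload.get("hook_event_name"),
    }
```

Treating the empty string as missing matters for fields like `cwd`, which the captured Shell payloads send as `""`.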
## Deny-path verification

To verify the deny path manually, prompt the Cursor agent to run a command matching the example Guardian's destructive regex (e.g., `rm -rf` against a clearly nonexistent path the agent has no real reason to touch). The Guardian should respond with `deny`; the adapter should emit `{"permission": "deny", "user_message": "destructive Bash pattern..."}`; Cursor should surface the block in its UI and not execute the command.

This was not exercised in the 2026-06-15 verification (the user's test prompt was benign), but the unit tests cover the same code path with 13/13 passing.
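The deny path described above can be sketched end to end in a few lines. Several things here are assumptions rather than the project's code: the `DESTRUCTIVE` regex is illustrative (the example Guardian's actual pattern is not shown in this commit), the `{"permission": "allow"}` shape is assumed by analogy with the deny response quoted above, and the sketch folds the Guardian's decision and the adapter's translation into one process for brevity.

```python
import json
import re
from typing import Optional

# Illustrative pattern only; the example Guardian's real regex may differ.
DESTRUCTIVE = re.compile(r"\brm\s+-rf\s+/")


def decide(command: str) -> Optional[str]:
    """Return a denial reason for a destructive command, else None."""
    if DESTRUCTIVE.search(command):
        return "destructive Bash pattern"
    return None


def hook_response(command: str) -> str:
    """Serialize the verdict as the JSON a hook command would print."""
    reason = decide(command)
    if reason is None:
        # Assumed allow shape, mirroring the deny response.
        return json.dumps({"permission": "allow"})
    return json.dumps({"permission": "deny", "user_message": reason})
```

Because the hook runs before the command executes, a test can safely feed it a destructive-looking string: the check sees only text, and nothing is deleted.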
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
=== CURSOR HOOK FIRED event=beforeSubmitPrompt 2026-06-15T10:24:56Z ===
{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","composer_mode":"agent","prompt":"can you check if something changed there?","attachments":[],"session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"beforeSubmitPrompt","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
=== END ===
=== CURSOR HOOK FIRED event=preToolUse 2026-06-15T10:25:02Z ===
{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","tool_name":"Grep","tool_input":{"pattern":"hooks\\.security","file_path":"/tmp/acs-real-test"},"tool_use_id":"tool_2c8bc7f7-a6ce-4598-8544-d267c8693f3","session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"preToolUse","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
=== END ===
=== CURSOR HOOK FIRED event=postToolUse 2026-06-15T10:25:02Z ===
{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","tool_name":"Grep","tool_input":{"pattern":"hooks\\.security","file_path":"/tmp/acs-real-test"},"tool_output":"{\"pattern\":\"hooks\\\\.security\",\"success\":true}","duration":19.175,"tool_use_id":"tool_2c8bc7f7-a6ce-4598-8544-d267c8693f3","session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"postToolUse","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
=== END ===
=== CURSOR HOOK FIRED event=preToolUse 2026-06-15T10:25:13Z ===
{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","tool_name":"Shell","tool_input":{"command":"curl -sI -L --max-time 15 -A \"Mozilla/5.0\" \"https://hooks.security\" 2>&1","cwd":"","timeout":30000},"tool_use_id":"6d90adc0-e172-4270-a472-b57fb7c9db4e","cwd":"","session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"preToolUse","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
=== END ===
=== CURSOR HOOK FIRED event=beforeShellExecution 2026-06-15T10:25:13Z ===
{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","command":"curl -sI -L --max-time 15 -A \"Mozilla/5.0\" \"https://hooks.security\" 2>&1","cwd":"","sandbox":true,"session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"beforeShellExecution","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
=== END ===
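The capture file uses a simple framing: a `=== CURSOR HOOK FIRED event=... <timestamp> ===` header, one JSON line, and `=== END ===`. A reader for that framing might look like the sketch below. `parse_capture` is a hypothetical helper, and the format is inferred from the five records above, so it may not cover every case the real logger produces (multi-line JSON, for instance).

```python
import json
import re
from typing import Iterator

# Header format inferred from the captured records above.
HEADER = re.compile(r"^=== CURSOR HOOK FIRED event=(\S+) (\S+) ===$")


def parse_capture(text: str) -> Iterator[tuple[str, str, dict]]:
    """Yield (event, timestamp, payload) for each captured hook record."""
    current = None  # (event, timestamp) of the record being read
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = (match.group(1), match.group(2))
        elif line == "=== END ===":
            current = None
        elif current and line.startswith("{"):
            yield current[0], current[1], json.loads(line)
```

One useful consistency check on a parsed capture is that the `event` in each header equals the payload's own `hook_event_name`, which holds for all five records shown here.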
