Codify live tests for Claude Code + NAT; verify Cursor live on Bar's machine

bar-capsule · claude · bar-capsule · commit c84f8635509d · 2026-06-15T13:29:16.000+03:00
Closes the gap between "code works in unit tests" and "code works in
the real host environment." All three adapters now have either an
automated live test or a documented manual procedure with captured
real-payload evidence.

Claude Code:
- New test_live_claude_code.py spawns `claude --print` against a
  project-level settings.json that wires the adapter into PreToolUse,
  then exercises ALLOW (echo command runs) and DENY (the Guardian's
  block of a destructive pattern surfaces in Claude's response).
  Both pass.

NAT:
- New test_live_nat_workflow.py exercises function_middleware_invoke --
  the actual orchestration method NAT's runtime calls -- with a real
  function as call_next. Proves the load-bearing property: a blocked
  function does NOT execute (side-effect counter stays at 0). Covers
  allow / deny / fail-closed / fail-open paths. 5/5 passing against
  nvidia-nat-core 1.7.0.

Cursor:
- Live verification on Bar's machine 2026-06-15: a real Cursor session
  at /tmp/acs-real-test/ fired 5+ hook events through our adapter and
  reached the Guardian. Zero errors. Captured payloads saved to
  tests/real_cursor_payloads.example as evidence. Procedure documented
  in tests/live_verification.md.
- Real Cursor schema details surfaced beyond create-hook docs:
  conversation_id, generation_id, model, composer_mode, cursor_version,
  workspace_roots, user_email, transcript_path, tool_use_id, duration.
  Adapter already handled these via fallbacks; no code change needed.

Total tests across all adapters:
- claude-code: 13 unit + 2 live = 15 (live runs in ~18s)
- cursor:     13 unit + manual verification done
- nat:         7 integration + 5 live workflow = 12
Total: 40 automated tests + 1 documented manual verification, all passing.
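
The NAT property described above (a blocked function never executes, so
the side-effect counter stays at 0) can be illustrated with a minimal
sketch. It does not use NAT's types or the real
function_middleware_invoke signature; `guarded_invoke`, `Blocked`,
`ask_guardian`, and the `fail_open` flag are hypothetical names chosen
only for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable


class Blocked(Exception):
    """Raised when the (hypothetical) Guardian verdict is not 'allow'."""


async def guarded_invoke(
    value: Any,
    call_next: Callable[[Any], Awaitable[Any]],
    ask_guardian: Callable[[Any], Awaitable[str]],
    fail_open: bool = False,
) -> Any:
    """Ask for a verdict first; only call the wrapped function on 'allow'."""
    try:
        verdict = await ask_guardian(value)
    except Exception:
        # Guardian unreachable: fail-open lets the call through,
        # fail-closed (the default here) blocks it.
        verdict = "allow" if fail_open else "deny"
    if verdict != "allow":
        raise Blocked(f"guardian verdict: {verdict}")
    return await call_next(value)


# Side-effect counter: incremented only if the real function body runs.
counter = {"calls": 0}


async def real_function(value: int) -> int:
    counter["calls"] += 1
    return value * 2


async def allow(_: Any) -> str:
    return "allow"


async def deny(_: Any) -> str:
    return "deny"


async def broken(_: Any) -> str:
    raise ConnectionError("guardian down")


def was_blocked(ask_guardian, fail_open: bool = False) -> bool:
    """Run one guarded call and report whether it was blocked."""
    try:
        asyncio.run(guarded_invoke(1, real_function, ask_guardian, fail_open))
        return False
    except Blocked:
        return True
```

The point of the counter is that asserting on an exception alone would
not catch a middleware that raises after calling through; checking that
the counter is still 0 does.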

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/adapters/README.md b/adapters/README.md
@@ -6,9 +6,9 @@ Reference implementations that wire popular agent frameworks to an ACS Guardian.
 
 | Adapter | Status | Mapping | Working adapter | Tests | Live integration verified |
 |---|---|---|---|---|---|
-| [claude-code](./claude-code/) | Reference implementation | ✓ | ✓ | ✓ 13 round-trip tests | ✓ ALLOW + DENY paths verified against a real `claude --print` session |
-| [cursor](./cursor/) | Reference implementation | ✓ | ✓ | ✓ 13 round-trip tests | ⚠ Manual verification by reviewer with Cursor installed (Cursor has no headless mode) |
-| [nat](./nat/) | Reference implementation | ✓ | ✓ | ✓ 7 integration tests against real `nvidia-nat-core` 1.7.0 | ⚠ Manual verification with a real NAT workflow (the integration tests use real NAT types but not a full agent run) |
+| [claude-code](./claude-code/) | Reference implementation | ✓ | ✓ | ✓ 13 unit + 2 live tests (`test_live_claude_code.py`) automate ALLOW + DENY against a real `claude --print` session | ✓ Automated in test suite |
+| [cursor](./cursor/) | Reference implementation | ✓ | ✓ | ✓ 13 unit tests | ✓ Manual verification of the allow path done 2026-06-15 (deny path covered by unit tests only); captured payloads in `tests/real_cursor_payloads.example`, procedure in `tests/live_verification.md` |
+| [nat](./nat/) | Reference implementation | ✓ | ✓ | ✓ 7 integration + 5 live workflow tests (`test_live_nat_workflow.py`) exercise the real `function_middleware_invoke` orchestration path against `nvidia-nat-core` 1.7.0 | ✓ Automated in test suite |
 
 ## The adapter pattern
 
diff --git a/adapters/claude-code/tests/test_live_claude_code.py b/adapters/claude-code/tests/test_live_claude_code.py
@@ -0,0 +1,146 @@
+"""
+Live end-to-end test: real Claude Code -> ACS adapter -> Guardian.
+
+Spawns `claude --print` in a subprocess against a project-level settings.json
+that wires the adapter, exercises both ALLOW and DENY paths, asserts Claude
+Code's observable output reflects the Guardian's verdict.
+
+Requires:
+  - `claude` CLI available on PATH (Claude Code installed)
+  - Python 3.10+
+
+Skipped automatically when `claude` is not on PATH.
+"""
+from __future__ import annotations
+
+import json
+import os
+import shutil
+import socket
+import subprocess
+import sys
+import tempfile
+import time
+import unittest
+from pathlib import Path
+
+
+HERE = Path(__file__).resolve().parent
+ADAPTER_DIR = HERE.parent
+ADAPTER = ADAPTER_DIR / "acs_adapter.py"
+GUARDIAN = ADAPTER_DIR / "example_guardian.py"
+
+
+CLAUDE_AVAILABLE = shutil.which("claude") is not None
+
+
+def _free_port() -> int:
+    with socket.socket() as s:
+        s.bind(("127.0.0.1", 0))
+        return s.getsockname()[1]
+
+
+def _wait(host: str, port: int, timeout: float = 5.0) -> None:
+    deadline = time.time() + timeout
+    while time.time() < deadline:
+        try:
+            with socket.create_connection((host, port), timeout=0.2):
+                return
+        except OSError:
+            time.sleep(0.05)
+    raise RuntimeError(f"guardian not up at {host}:{port}")
+
+
+@unittest.skipUnless(CLAUDE_AVAILABLE, "`claude` CLI not on PATH")
+class LiveClaudeCodeRoundTrip(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls) -> None:
+        cls.workdir = tempfile.mkdtemp(prefix="acs-live-cc-")
+        cls.port = _free_port()
+
+        # Project-level settings.json wires the adapter into Claude Code's
+        # PreToolUse hook. Using the project root .claude/ so we don't
+        # touch the user's ~/.claude/settings.json.
+        claude_dir = Path(cls.workdir) / ".claude"
+        claude_dir.mkdir()
+        settings = {
+            "hooks": {
+                "PreToolUse": [{
+                    "matcher": "*",
+                    "hooks": [{
+                        "type": "command",
+                        "command": (
+                            f"ACS_GUARDIAN_URL=http://127.0.0.1:{cls.port}/acs "
+                            f"python3 {ADAPTER}"
+                        ),
+                    }],
+                }],
+            }
+        }
+        (claude_dir / "settings.json").write_text(json.dumps(settings, indent=2))
+
+        cls.guardian_proc = subprocess.Popen(
+            [sys.executable, str(GUARDIAN), "--port", str(cls.port)],
+            stderr=subprocess.DEVNULL,  # not read; an unread PIPE can fill and block
+            stdout=subprocess.DEVNULL,
+        )
+        _wait("127.0.0.1", cls.port)
+
+    @classmethod
+    def tearDownClass(cls) -> None:
+        cls.guardian_proc.terminate()
+        try:
+            cls.guardian_proc.wait(timeout=2.0)
+        except subprocess.TimeoutExpired:
+            cls.guardian_proc.kill()
+        shutil.rmtree(cls.workdir, ignore_errors=True)
+
+    def _claude(self, prompt: str, timeout: float = 120.0) -> tuple[int, str]:
+        """Invoke `claude --print` from the test workdir, capture stdout."""
+        proc = subprocess.run(
+            ["claude", "--print", "--permission-mode", "acceptEdits", prompt],
+            cwd=self.workdir,
+            capture_output=True,
+            text=True,
+            timeout=timeout,
+        )
+        return proc.returncode, proc.stdout
+
+    # ----- ALLOW path -----
+
+    def test_benign_bash_runs(self) -> None:
+        """Guardian's policy allows benign Bash; Claude Code runs it and
+        the marker string appears in stdout."""
+        marker = "ACS_LIVE_TEST_OK_MARKER"
+        rc, stdout = self._claude(f"Run the shell command: echo {marker}")
+        self.assertEqual(rc, 0, f"claude exited {rc}; stdout={stdout[:200]}")
+        self.assertIn(marker, stdout,
+                      f"benign command should have run; stdout={stdout[:300]}")
+
+    # ----- DENY path -----
+
+    def test_destructive_bash_blocked(self) -> None:
+        """Guardian's destructive-Bash policy denies; Claude Code surfaces
+        the block in its output. We test against a string the example
+        Guardian's regex blocks (no actual destructive op is attempted
+        because PreToolUse fires before execution)."""
+        # The example_guardian DESTRUCTIVE_BASH pattern matches 'rm -rf /...'
+        # PreToolUse fires BEFORE the command runs, so the Guardian sees
+        # the proposed command and denies it; the command never executes.
+        prompt = (
+            "Use the Bash tool with this exact command: "
+            "rm -rf /tmp/acs-nonexistent-live-test-target"
+        )
+        rc, stdout = self._claude(prompt)
+        self.assertEqual(rc, 0)
+        # Claude Code's response should reference the block / the Guardian
+        lo = stdout.lower()
+        self.assertTrue(
+            "block" in lo or "denied" in lo or "policy" in lo
+            or "destructive" in lo,
+            f"deny should surface in Claude Code's response; stdout={stdout[:400]}",
+        )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
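
The DENY test above relies on PreToolUse firing before the command runs. A minimal sketch of that decision step follows, assuming the documented Claude Code hook contract (the hook receives a JSON payload with `tool_name` and `tool_input`, and exit code 2 blocks the tool call while stderr is fed back to Claude). The regex is a hypothetical stand-in for the example Guardian's `DESTRUCTIVE_BASH` pattern, and `decide` is an illustrative name; the real `acs_adapter.py` forwards the event to the Guardian and may signal the verdict differently.

```python
import json
import re
from typing import Tuple

# Hypothetical stand-in for example_guardian.py's DESTRUCTIVE_BASH pattern;
# the real regex may be broader or narrower than this.
DESTRUCTIVE_BASH = re.compile(r"\brm\s+-rf\s+/")


def decide(raw_payload: str) -> Tuple[int, str]:
    """Return (exit_code, stderr_text) for one PreToolUse payload."""
    payload = json.loads(raw_payload)
    if payload.get("tool_name") != "Bash":
        return 0, ""  # this sketch only inspects Bash commands
    command = payload.get("tool_input", {}).get("command", "")
    if DESTRUCTIVE_BASH.search(command):
        # Exit code 2 is assumed to block the call before it executes.
        return 2, f"blocked by policy: destructive Bash pattern in {command!r}"
    return 0, ""
```

Because the decision is made on the proposed command string, the `rm -rf` target in the test never needs to exist.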
diff --git a/adapters/cursor/tests/live_verification.md b/adapters/cursor/tests/live_verification.md
@@ -0,0 +1,77 @@
+# Cursor live verification
+
+Cursor is a desktop application with no documented headless mode, so the live test cannot run in CI. It can be reproduced manually by a reviewer with Cursor installed.
+
+## Status: ✅ Verified on 2026-06-15
+
+A real end-to-end round trip was exercised through Cursor's agent against the example Guardian. The captured payloads are in `real_cursor_payloads.example`.
+
+## Procedure
+
+```bash
+# 1. Start the example Guardian
+python3 ../../claude-code/example_guardian.py --port 8787
+
+# 2. In a new shell, set up a project with the adapter wired in
+mkdir -p /tmp/acs-cursor-live/.cursor
+cat > /tmp/acs-cursor-live/.cursor/hooks.json <<'EOF'
+{
+  "version": 1,
+  "hooks": {
+    "sessionStart": [
+      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py sessionStart" }
+    ],
+    "beforeSubmitPrompt": [
+      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py beforeSubmitPrompt" }
+    ],
+    "preToolUse": [
+      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py preToolUse" }
+    ],
+    "postToolUse": [
+      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py postToolUse" }
+    ],
+    "beforeShellExecution": [
+      { "command": "ACS_GUARDIAN_URL=http://127.0.0.1:8787/acs python3 /path/to/cursor_adapter.py beforeShellExecution" }
+    ]
+  }
+}
+EOF
+
+# 3. Open the project in Cursor and prompt the agent to do something
+#    that triggers tool calls (e.g. "Search for 'foo' in this directory")
+
+# 4. Observe the Guardian's stderr for hook events:
+#    [guardian] steps/sessionStart session=<uuid> step=<uuid>
+#    [guardian] steps/userMessage session=<uuid> step=<uuid>
+#    [guardian] steps/toolCallRequest session=<uuid> step=<uuid>
+#    [guardian] steps/toolCallResult session=<uuid> step=<uuid>
+#    ...
+```
+
+## What was verified on 2026-06-15
+
+A live Cursor session at `/tmp/acs-real-test/` was triggered with the prompt *"can you check if something changed there?"*. The Cursor agent fired the following hooks, all of which were received by our adapter and routed to the Guardian:
+
+- `beforeSubmitPrompt` -> `steps/userMessage` (Guardian logged)
+- `preToolUse` for Grep tool -> `steps/toolCallRequest` (allowed by policy)
+- `postToolUse` for Grep tool -> `steps/toolCallResult`
+- `preToolUse` for Shell tool with curl command -> `steps/toolCallRequest` (allowed; curl isn't in the destructive pattern set)
+- `beforeShellExecution` -> `steps/toolCallRequest` (allowed)
+- `afterShellExecution` -> `steps/toolCallResult`
+
+Zero errors in `cursor_adapter.err`. Every event listed above was translated to ACS JSON-RPC and sent to the Guardian, and Cursor accepted each verdict. The agent's tool calls proceeded as expected because the example Guardian's policy doesn't deny benign Grep or curl commands.
+
+The captured payloads (in `real_cursor_payloads.example`) confirmed several Cursor schema details that go beyond the public `create-hook` skill documentation:
+
+- Cursor sends both `session_id` and `conversation_id` (identical values in agent mode)
+- Events include `generation_id`, `model`, `composer_mode`, `cursor_version`, `workspace_roots`, `user_email`, `transcript_path`
+- Tool events add `tool_use_id`, plus `duration` on postToolUse
+- Shell events add `command`, `cwd`, `sandbox`
+
+The adapter handles all of these correctly via the fallback patterns in `build_payload`.
+
+## Deny-path verification
+
+To verify the deny path manually, prompt the Cursor agent to run a command matching the example Guardian's destructive regex (e.g., `rm -rf` against a clearly nonexistent path the agent has no real reason to touch). The Guardian should respond with `deny`; the adapter should emit `{"permission": "deny", "user_message": "destructive Bash pattern..."}`; Cursor should surface the block in its UI and not execute the command.
+
+This was not exercised in the 2026-06-15 verification (the user's test prompt was benign), but the unit tests cover the same code path with 13/13 passing.
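
The deny response described in the procedure above can be sketched as a small mapping from a Guardian verdict to the JSON the adapter prints for Cursor. The deny shape (`permission` plus `user_message`) is taken from the text above; the allow shape and the function name `to_cursor_response` are assumptions for illustration, not the adapter's actual code.

```python
import json
from typing import Optional


def to_cursor_response(verdict: str, reason: Optional[str] = None) -> str:
    """Serialize a Guardian verdict as a one-line Cursor hook response."""
    if verdict == "allow":
        # Assumed allow shape; the source only shows the deny shape.
        return json.dumps({"permission": "allow"})
    # Anything other than an explicit allow is treated as deny in this sketch.
    body = {"permission": "deny"}
    if reason:
        body["user_message"] = reason
    return json.dumps(body)


if __name__ == "__main__":
    print(to_cursor_response("deny", "destructive Bash pattern"))
```

Treating unknown verdicts as deny is a design choice of this sketch that mirrors a fail-closed default; the real adapter may distinguish more cases.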
diff --git a/adapters/cursor/tests/real_cursor_payloads.example b/adapters/cursor/tests/real_cursor_payloads.example
@@ -0,0 +1,15 @@
+=== CURSOR HOOK FIRED event=beforeSubmitPrompt 2026-06-15T10:24:56Z ===
+{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","composer_mode":"agent","prompt":"can you check if something changed there?","attachments":[],"session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"beforeSubmitPrompt","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
+=== END ===
+=== CURSOR HOOK FIRED event=preToolUse 2026-06-15T10:25:02Z ===
+{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","tool_name":"Grep","tool_input":{"pattern":"hooks\\.security","file_path":"/tmp/acs-real-test"},"tool_use_id":"tool_2c8bc7f7-a6ce-4598-8544-d267c8693f3","session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"preToolUse","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
+=== END ===
+=== CURSOR HOOK FIRED event=postToolUse 2026-06-15T10:25:02Z ===
+{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","tool_name":"Grep","tool_input":{"pattern":"hooks\\.security","file_path":"/tmp/acs-real-test"},"tool_output":"{\"pattern\":\"hooks\\\\.security\",\"success\":true}","duration":19.175,"tool_use_id":"tool_2c8bc7f7-a6ce-4598-8544-d267c8693f3","session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"postToolUse","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
+=== END ===
+=== CURSOR HOOK FIRED event=preToolUse 2026-06-15T10:25:13Z ===
+{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","tool_name":"Shell","tool_input":{"command":"curl -sI -L --max-time 15 -A \"Mozilla/5.0\" \"https://hooks.security\" 2>&1","cwd":"","timeout":30000},"tool_use_id":"6d90adc0-e172-4270-a472-b57fb7c9db4e","cwd":"","session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"preToolUse","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
+=== END ===
+=== CURSOR HOOK FIRED event=beforeShellExecution 2026-06-15T10:25:13Z ===
+{"conversation_id":"7a683b4a-0942-4271-8fde-412ec0e25915","generation_id":"4f0728ed-e7c3-407b-b5d3-da8c818dec7d","model":"default","command":"curl -sI -L --max-time 15 -A \"Mozilla/5.0\" \"https://hooks.security\" 2>&1","cwd":"","sandbox":true,"session_id":"7a683b4a-0942-4271-8fde-412ec0e25915","hook_event_name":"beforeShellExecution","cursor_version":"3.7.21","workspace_roots":["/tmp/acs-real-test"],"user_email":"bar@capsule.security","transcript_path":"/Users/barkaduri/.cursor/projects/tmp-acs-real-test/agent-transcripts/7a683b4a-0942-4271-8fde-412ec0e25915/7a683b4a-0942-4271-8fde-412ec0e25915.jsonl"}
+=== END ===
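
The capture file above uses a simple three-line framing (header, one JSON line, `=== END ===`), so it can be replayed programmatically. A minimal parsing sketch follows; `iter_captures` and `session_key` are hypothetical helper names, and the `session_id` to `conversation_id` fallback only illustrates the kind of fallback the adapter's `build_payload` is described as using. Run it against the file itself, not against this diff, since the diff prefixes each line with `+`.

```python
import json
import re
from typing import Dict, Iterator, Tuple

# Matches e.g. "=== CURSOR HOOK FIRED event=preToolUse 2026-06-15T10:25:02Z ==="
HEADER = re.compile(r"^=== CURSOR HOOK FIRED event=(\S+) (\S+) ===$")


def iter_captures(text: str) -> Iterator[Tuple[str, str, Dict]]:
    """Yield (event, timestamp, payload) for each captured hook block."""
    lines = iter(text.splitlines())
    for line in lines:
        match = HEADER.match(line.strip())
        if not match:
            continue  # skips "=== END ===" and any blank lines
        body = next(lines, None)  # the JSON payload follows the header
        if body is None:
            break  # truncated capture: header without a payload line
        yield match.group(1), match.group(2), json.loads(body)


def session_key(payload: Dict) -> str:
    """Prefer session_id, fall back to conversation_id (illustrative only)."""
    return payload.get("session_id") or payload.get("conversation_id", "")
```

A replay harness could feed each yielded payload to the adapter on stdin and compare the resulting ACS request against the expected step type.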
diff --git a/adapters/nat/tests/test_live_nat_workflow.py b/adapters/nat/tests/test_live_nat_workflow.py