feat(runtime,bridge,channels): MCP bridge for CC subprocesses + shell pre-gate uplift + approval push surface#1182

Closed
benhoverter wants to merge 13 commits into
RightNow-AI:mainfrom
benhoverter:feat/anai-32-capability-enforcement

Conversation

@benhoverter
Contributor

Closes #1180.

Summary

  • Adds openfang-mcp-bridge crate — a stdio MCP server that forwards tools/call requests over a Unix-socket IPC channel back into the daemon's tool dispatcher, so Claude Code subprocesses can reach OpenFang's full tool surface (file_read, file_list, agent_list, channel_send, memory_recall, web_fetch, etc.) instead of being stuck with their built-ins.
  • Lifts the shell_exec pre-gate (metacharacter denylist + exec_policy allowlist) ahead of the approval gate, so denied commands fail synchronously without burning operator attention.
  • Adds an ApprovalManager lifecycle broadcast (Submitted / Resolved / TimedOut) and a channel_bridge approval surfacer that pushes formatted prompts to the most-specific bound channel for any command that falls through to "needs human".
  • Relocates builtin_tool_definitions() from runtime::tool_runner to openfang-types::tool::registry so the bridge and the runtime share one source of truth (re-exported from tool_runner for callsite stability).

Topology

daemon → claude (per-prompt) → openfang-mcp-bridge → unix socket → daemon::execute_tool
  • Daemon binds <home>/run/bridge.sock (0600), publishes OPENFANG_BRIDGE_SOCKET / OPENFANG_BRIDGE_BIN for subprocess drivers.
  • CC driver, per-spawn, generates a UUID token and writes a 0600 <home>/run/cc-mcp-<uuid>.json MCP-config, passed via --mcp-config <path> --strict-mcp-config. RAII guard removes the file on drop.
  • OPENFANG_BRIDGE_* env is stripped from CC's child env via apply_env_filter — bridge gets the discovery vars only via the explicit env map in the mcp-config, so a stray bridge can't pick up the daemon socket without a fresh per-spawn token.
  • Length-prefixed JSON IPC (1 MiB cap, 4-byte BE length prefix). Per-request correlation via PendingMap<u64, oneshot> so concurrent calls don't serialize at the dispatcher layer. Reader task drains pending oneshots with an error on connection close.
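The framing described in the last bullet can be sketched with the standard library alone. This is a minimal illustration, assuming blocking `Read`/`Write` rather than the async I/O the bridge uses; `write_frame`, `read_frame`, and `MAX_FRAME_LEN` are hypothetical names, not the crate's API.

```rust
use std::io::{self, Read, Write};

/// Maximum payload size per frame (the 1 MiB cap stated in the PR).
const MAX_FRAME_LEN: usize = 1024 * 1024;

/// Write one frame: a 4-byte big-endian length, then the payload bytes.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "frame exceeds 1 MiB cap",
        ));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame, rejecting an oversized length before allocating.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame exceeds 1 MiB cap",
        ));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the cap on the read side before allocating means a peer that sends a bogus length prefix cannot make the reader reserve an arbitrary amount of memory.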

Default-off kill switch

OPENFANG_BRIDGE_ENABLED env gate. When the variable is unset or not in {1, true}, try_build_bridge_mcp_config returns None and CC is spawned exactly as it was before this PR: no --mcp-config, no temp file, no bridge child. The daemon still starts the IPC listener and publishes discovery env unconditionally (both harmless without a bridge child connecting). Purely additive switch; zero behavior change when off.
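A gate of this shape might look like the sketch below. `gate_is_on` is a hypothetical helper that keeps the decision pure so the truth table can be tested without touching process env. Exact, case-sensitive matching on `1` and `true` is an assumption; the real `bridge_enabled()` may accept other spellings.

```rust
/// Hypothetical pure helper: decide the gate from the raw env value.
/// Assumption: only the exact strings "1" and "true" turn the gate on.
fn gate_is_on(raw: Option<&str>) -> bool {
    matches!(raw, Some("1") | Some("true"))
}

/// Read OPENFANG_BRIDGE_ENABLED from the process environment.
/// An unset or non-UTF-8 variable is treated as off.
fn bridge_enabled() -> bool {
    gate_is_on(std::env::var("OPENFANG_BRIDGE_ENABLED").ok().as_deref())
}
```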

Notable invariants

  • Identity at the bridge today is taken in-band from CallRequest::agent_id (stub). A daemon-issued per-spawn token table is the natural follow-up — flagged in the issue.
  • Pre-gate uplift is scoped to shell_exec only; process_start has a different input shape and its own validators. Widening is a separate change.
  • Bridge advertises the full kernel surface filtered by a substrate-level BRIDGE_DENY denylist (currently empty). Per-agent gating remains in agent.toml.

Test plan

  • cargo check --workspace clean.
  • cargo clippy -- -D warnings clean (only the pre-existing transitive imap-proto v0.10.2 future-incompat note, not ours).
  • openfang-mcp-bridge lib + binary tests:
    • protocol roundtrip (Frame / Hello / HelloAck / CallRequest / CallResponse)
    • built_in_tools_has_*_slice (drift sentinel against the registry)
    • permitted_tools_intersects_with_dispatcher_allowed
    • ipc_dispatcher_round_trip_and_correlation (fake daemon, full handshake, two concurrent calls + NotPermitted gate)
  • Daemon bridge_ipc (4 tests): handshake + dispatch end-to-end via tempfile socket, version-mismatch rejection, empty-token rejection, allowlist gate.
  • CC driver (11 tests, +3): test_build_bridge_mcp_config_shape, test_apply_env_filter_strips_bridge_discovery_vars, test_bridge_mcp_config_drop_removes_file, full bridge_enabled() truth-table.
  • ApprovalManager: Submitted + TimedOut event delivery; Resolved event delivery; UTF-8-safe log truncation.
  • Live validation against the daemon (recorded as tests A through F): file_list, file_read, agent_list, memory_recall, web_fetch, file_write all round-trip via the bridge; metacharacter denial is synchronous (no approval burned); allowlist match on argv0 basename clears without prompting; approval surfacer delivers prompts to the bound channel for fall-through commands.

🤖 Generated with Claude Code

benhoverter and others added 10 commits May 4, 2026 17:28
Standalone crate exposing OpenFang's tool surface to MCP clients (primarily
Claude Code subprocesses) over stdio. Per architectural decision in ANAI-22:
not folded into openfang-runtime — keeps the protocol adapter out of the
kernel/compactor blast radius and the dep graph clean.

This commit is scaffolding only:
  * Cargo manifest with rmcp 1.x (server, transport-io, macros)
  * lib.rs: ToolDispatcher seam trait (runtime-implements, bridge-consumes,
    one-way dep), ToolDispatchError enum, Bridge struct wrapping an
    Option<Arc<dyn ToolDispatcher>>, single stub `ping` tool
  * main.rs: stdio MCP server entrypoint, tracing -> stderr (stdout is the
    transport), no dispatcher attached
  * Workspace members updated

Identity is bound at Bridge construction time, not per-call — the security
invariant tracked by ANAI-31. Real tool surface mapping lands in ANAI-30.

cargo check -p openfang-mcp-bridge: clean.
cargo check --workspace: clean (pre-existing imap-proto future-incompat
warning unrelated).

Refs: ANAI-22, ANAI-29

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the daemon-side foundation for the MCP bridge per the ANAI-30 plan
(topology 1b: daemon → CC → bridge → unix socket → daemon dispatcher).

- New `protocol` module in openfang-mcp-bridge: Frame/Hello/HelloAck/
  CallRequest/CallResponse types with length-prefixed JSON framing
  (1 MiB cap, 4-byte BE length prefix). Gated by `ipc-codec` feature
  so type-only consumers can drop the tokio io traits.
- New `bridge_ipc` module in openfang-api: BridgeIpcServer binds
  <home_dir>/run/bridge.sock (0600), accept loop with graceful
  shutdown via Notify, per-connection Hello validation and CallRequest
  → CallResponse loop.
- run_daemon spawns the listener; failure is non-fatal (HTTP keeps
  serving; bridge just unavailable). Socket file removed on shutdown.

Step 1 stub: the dispatcher returns CallResult::Error
("not yet wired"). Step 2 replaces this with a call into
openfang_runtime::tool_runner::execute_tool, scoped to the four-tool
allowlist (file_read, file_list, agent_list, channel_send). Identity
binding + token-table auth land in ANAI-31.

Tests: 3 protocol roundtrip tests + 4 IPC handler tests
(handshake/dispatch end-to-end via tempfile socket, version mismatch
rejection, empty-token rejection).

Refs ANAI-30, ANAI-22.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the step-1 stub in `BridgeIpcServer` with a real call into
`openfang_runtime::tool_runner::execute_tool`, mirroring the argument
bundle used by the HTTP /mcp endpoint in routes.rs.

- Added ALLOWED_TOOLS allowlist: file_read, file_list, agent_list,
  channel_send. Rejection happens at the protocol layer (CallResult::Error)
  before any kernel touch.
- Added dispatch_call(): snapshots the skill registry, builds a
  KernelHandle from Arc<OpenFangKernel>, and invokes execute_tool.
- ToolResult mapped to CallResult::Ok { content, is_error }, preserving
  the Ok/Error distinction (Error = bridge couldn't dispatch; Ok with
  is_error = tool ran but returned an error).
- Identity stub: caller_agent_id taken at face value from
  CallRequest::agent_id. Real per-spawn token-bound identity lands in
  ANAI-31.
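The allowlist gate and the Ok/Error distinction above can be sketched as follows. `ToolResult` here is a hypothetical stand-in for the runtime type, and the closure stands in for the real `execute_tool` call; only the tool names and the `CallResult` shape come from this commit message.

```rust
/// Tools the bridge may dispatch at this step (from the commit message).
const ALLOWED_TOOLS: &[&str] = &["file_read", "file_list", "agent_list", "channel_send"];

/// Hypothetical stand-in for the runtime's tool result.
struct ToolResult {
    content: String,
    is_error: bool,
}

/// Wire-level outcome: Error means the bridge could not dispatch;
/// Ok with is_error = true means the tool ran and reported a failure.
#[derive(Debug, PartialEq)]
enum CallResult {
    Ok { content: String, is_error: bool },
    Error(String),
}

/// Reject disallowed tools before running anything, then map the result.
/// `run` is a placeholder for the real tool execution.
fn dispatch_call(tool: &str, run: impl FnOnce() -> ToolResult) -> CallResult {
    if !ALLOWED_TOOLS.contains(&tool) {
        return CallResult::Error(format!("tool not allowed: {tool}"));
    }
    let ToolResult { content, is_error } = run();
    CallResult::Ok { content, is_error }
}
```

Keeping the two failure modes apart lets the MCP client distinguish "the tool said no" from "the call never reached the tool".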

Test: ipc_handshake_and_allowlist_gate verifies wire shape end-to-end:
disallowed tool gets allowlist Error, allowed tool gets Ok response. Real
execute_tool integration tests come once the daemon spawns the bridge
for real (ANAI-31).
…l surface

Replaces the stub `ping` tool with the four ANAI-30 tools (file_read,
file_list, agent_list, channel_send) and wires the bridge binary to forward
each `tools/call` over the daemon IPC socket established in step 1.

Library (lib.rs):
- ToolDispatcher::call now returns DispatchOk { content, is_error }
  preserving the tool-error-vs-dispatch-error distinction across the seam
- built_in_tools() declares the four-tool slice; schemas mirror
  runtime::tool_runner::builtin_tool_definitions() (kept in lockstep)
- Bridge: manual ServerHandler impl (drops the #[tool_router] macro). Filters
  advertised tools by intersecting built_in_tools() with
  ToolDispatcher::allowed_tools(); double-checks before dispatch
- Bridge::new now requires a dispatcher (was Option<_>)

Binary (main.rs):
- Reads OPENFANG_BRIDGE_SOCKET / TOKEN / AGENT_ID env vars (last is stub for
  ANAI-30; ANAI-31 derives identity from token)
- Connects to daemon, performs Hello/HelloAck handshake, exits on rejection
- IpcDispatcher: bridge-side ToolDispatcher impl. Forwards each call via mpsc
  to an actor task that owns the stream; correlation-by-request_id with a
  PendingMap<u64, oneshot> so concurrent tools/call invocations don't
  serialize at the dispatcher layer
- Reader task drains pending oneshots with an error on connection close so
  in-flight calls don't hang; production path exits the process so CC
  notices and tears down (gated behind cfg(not(test)))
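The correlation scheme described for IpcDispatcher can be illustrated without an async runtime. This sketch uses `std::sync::mpsc` channels as a stand-in for tokio oneshots and a `Mutex<HashMap>` for the table; the method names are hypothetical and the real `PendingMap` may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Reply delivered to a waiting caller: tool output or a dispatch error.
type Reply = Result<String, String>;

/// Hypothetical pending-call table keyed by request id.
#[derive(Default)]
struct PendingMap {
    inner: Mutex<HashMap<u64, Sender<Reply>>>,
}

impl PendingMap {
    /// Register a call and return the receiver its caller waits on.
    fn register(&self, request_id: u64) -> Receiver<Reply> {
        let (tx, rx) = channel();
        self.inner.lock().unwrap().insert(request_id, tx);
        rx
    }

    /// Route a response to whichever caller registered this id.
    /// Returns false if the id is unknown or the caller went away.
    fn resolve(&self, request_id: u64, reply: Reply) -> bool {
        match self.inner.lock().unwrap().remove(&request_id) {
            Some(tx) => tx.send(reply).is_ok(),
            None => false,
        }
    }

    /// On connection close, fail every in-flight call so none hangs.
    fn drain_with_error(&self, reason: &str) {
        for (_, tx) in self.inner.lock().unwrap().drain() {
            let _ = tx.send(Err(reason.to_string()));
        }
    }
}
```

Because each caller waits on its own channel, responses may arrive in any order without one call blocking another.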

Tests:
- lib: built_in_tools_has_anai30_slice, permitted_tools_intersects_with_dispatcher_allowed
- main: ipc_dispatcher_round_trip_and_correlation — fake daemon listener,
  full handshake, two concurrent calls, verifies per-id correlation and the
  NotPermitted gate

Workspace check clean. Daemon-side bridge_ipc tests still pass (4/4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end topology now exists at the type level:
daemon → claude (per-prompt) → openfang-mcp-bridge → IPC → daemon

- Add `caller_agent_id: Option<String>` to CompletionRequest. Plumbed
  through all construction sites; agent_loop populates it with
  session.agent_id, everywhere else passes None.
- Daemon (`server.rs::run_daemon`): after BridgeIpcServer starts,
  publish OPENFANG_BRIDGE_SOCKET and OPENFANG_BRIDGE_BIN as process env
  for subprocess drivers to discover. Bridge bin defaults to a sibling
  of current_exe; operators can override with OPENFANG_BRIDGE_BIN. Both
  set with `unsafe` (edition 2024) but only during single-threaded
  daemon startup, before any subprocess spawns.
- BridgeIpcServer gains `socket_path()` accessor.
- ClaudeCodeDriver: per-spawn `try_build_bridge_mcp_config`. When
  caller_agent_id is set AND both discovery env vars are present,
  generate a UUID token, write `<home>/run/cc-mcp-<uuid>.json` (0600),
  and add `--mcp-config <path> --strict-mcp-config` to the claude args.
  RAII guard removes the file on drop so per-spawn token lifetime is
  bounded by the CC subprocess.
- apply_env_filter extended to strip OPENFANG_BRIDGE_* from CC's child
  env. Bridge gets these only via the explicit `env` map in the
  mcp-config — CC inheriting them would risk a stray bridge picking up
  the daemon socket without a fresh per-spawn token.
- Tests:
  - test_build_bridge_mcp_config_shape — verifies wire shape claude
    expects: mcpServers.openfang.{command,args,env} with exactly the
    three discovery vars in env (no extras to leak state).
  - test_apply_env_filter_strips_bridge_discovery_vars — confirms
    filter removes all four bridge vars from CC's child env.
  - test_bridge_mcp_config_drop_removes_file — RAII cleanup invariant.
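The RAII cleanup invariant exercised by the last test can be sketched with the standard library. `McpConfigGuard` is a hypothetical name, and the driver itself appears to use a temp-file type rather than a hand-rolled guard; the sketch only shows the drop-removes-file pattern with owner-only permissions on unix.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::{Path, PathBuf};

/// Hypothetical guard: owns a config file path and removes it on drop.
struct McpConfigGuard {
    path: PathBuf,
}

impl McpConfigGuard {
    /// Create the file (mode 0600 on unix) and write the config body.
    fn create(path: PathBuf, body: &str) -> std::io::Result<Self> {
        let mut opts = OpenOptions::new();
        opts.write(true).create_new(true);
        #[cfg(unix)]
        {
            use std::os::unix::fs::OpenOptionsExt;
            opts.mode(0o600);
        }
        let mut file = opts.open(&path)?;
        // Build the guard before writing so a failed write still cleans up.
        let guard = Self { path };
        file.write_all(body.as_bytes())?;
        Ok(guard)
    }

    /// Path to hand to the subprocess.
    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for McpConfigGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; a missing file is not an error here.
        let _ = fs::remove_file(&self.path);
    }
}
```

Holding the guard for as long as the subprocess runs bounds the lifetime of the on-disk token by the lifetime of the spawn.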

Stub points still flagged: token validated as non-empty (ANAI-31
replaces with daemon-issued per-spawn token table); agent_id taken
in-band from CallRequest (ANAI-31 derives from token).

11 CC driver tests pass. bridge_ipc (4) and bridge crate (6) tests
unchanged. Workspace check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…NABLED

Default-off kill switch so we can deploy the bridge code path without
inlining it into every CC invocation. When the gate is unset or not in
{1, true}, try_build_bridge_mcp_config returns None and CC is spawned
exactly as it was pre-step-4 — no --mcp-config, no temp file, no bridge
child. Validation flow: deploy with gate off (sanity), launchctl setenv
OPENFANG_BRIDGE_ENABLED 1, bounce daemon, observe; if anything regresses,
flip back to 0 and bounce for instant recovery.

Daemon still starts the IPC listener and publishes BRIDGE_SOCKET/BIN env
unconditionally — both are harmless without a bridge child connecting.
Pure additive switch; zero behavior change when off.

Test exercises the full truth table for bridge_enabled() (unset, truthy
variants, falsy/garbage variants) and confirms the gate suppresses
config generation regardless of other env. Single test owns the global
env var so no serial_test infra needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridge IPC handshake works standalone (bridge binary connects + Hello/HelloAck
ok against the live socket), and the daemon-side `wired CC --mcp-config for
OpenFang bridge` debug line confirms the flag is being passed to claude. But
no `bridge IPC accepted connection` events ever fire — meaning claude is
launched with `--mcp-config` but isn't spawning the MCP server subprocess.

Without `--debug`, claude swallows MCP launch errors silently. And we drop
CC's stderr on successful spawns, so any silent rejection is invisible.

Add (both spawn paths):
- `--debug` flag when bridge config is wired, so MCP errors print to stderr.
- Always log a 4 KB tail of CC stderr at info when bridge_wired, regardless
  of success/failure. Streaming path now drains stderr concurrently to avoid
  pipe deadlock under chatty --debug output.

Existing 12/12 claude_code unit tests still pass; workspace check clean.

Diagnostic only — once the cause is identified we'll pare back to bounded
on-demand logging.
- bridge_ipc: promote handshake/dispatch events to INFO and add an
  `accepted connection` log on accept. Operators can now observe the
  full bridge lifecycle from daemon stderr without crawling through
  ~/.claude/debug/<uuid>.txt.
- claude_code driver: gate --debug + the 4 KB CC-stderr-tail diagnostic
  behind a new OPENFANG_BRIDGE_DEBUG env var (off by default). With
  proper INFO logs daemon-side, the noisy --debug output and the
  per-spawn ~/.claude/debug/ files are no longer load-bearing.
- server: validate operator-supplied OPENFANG_BRIDGE_BIN path at boot
  and log the resolution outcome (override vs. probe). Catches deploy
  ordering bugs where the env points at a binary that doesn't exist.

Stderr is still drained concurrently in the streaming path — required
whenever --debug might be on, cheap when it isn't.
Relocate builtin_tool_definitions() from runtime::tool_runner to
openfang_types::tool::registry as the single source of truth. Bridge
now derives its advertised surface from the registry, filtered by a
substrate-level BRIDGE_DENY denylist (currently empty).

CC sees the full kernel surface; per-agent gating remains agent.toml.
web_fetch and web_search are no longer carved out — treat CC as an
API model: tools route through OpenFang, not hidden channels.

- openfang-types::tool: tool.rs → tool/{mod,registry}.rs
- tool_runner re-exports builtin_tool_definitions for callsite stability
- openfang-mcp-bridge: adds openfang-types dep (types-only, runtime-free
  invariant preserved); built_in_tools() is now ~7 lines
- Tests: drift sentinel for BRIDGE_DENY, full-surface assertion,
  ANAI-32 canonical-nine sanity (8/8 passing)

Validated end-to-end against live daemon: file_list, file_read,
agent_list, memory_recall, web_fetch, file_write all round-trip.
Lift the pure-syntactic shell validators -- metacharacter denylist and exec_policy allowlist -- out of the per-tool match arm in execute_tool and run them BEFORE the approval gate. Without this, denied commands were sent for human approval, approved, and only then rejected by the metachar denylist inside the per-tool arm. That wasted operator attention on commands guaranteed to fail. Validators remain inside the shell_exec arm as defense-in-depth.

Scoped to shell_exec only -- not all is_shell_tool entries. process_start has a different input shape and its own validators. Widening the pre-gate is a separate change.
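The ordering described above, with syntactic checks running before any approval request is raised, might look like this sketch. The `PreGate` enum, the metacharacter set, and the whitespace-based argv0 split are all assumptions for illustration; the real denylist and exec_policy matching are not shown in this PR text.

```rust
use std::path::Path;

/// Hypothetical outcome of the pre-gate.
#[derive(Debug, PartialEq)]
enum PreGate {
    /// Fails synchronously; no approval request is ever created.
    Deny(&'static str),
    /// argv0 basename is on the allowlist; runs without prompting.
    Allow,
    /// Cleared the syntactic checks but still needs a human.
    NeedsApproval,
}

/// Illustrative metacharacter set; the actual denylist may differ.
const SHELL_METACHARS: &[char] = &[';', '|', '&', '`', '$', '>', '<'];

/// Run the pure-syntactic validators ahead of the approval gate.
fn pre_gate(command: &str, allowlist: &[&str]) -> PreGate {
    if command.contains(SHELL_METACHARS) {
        return PreGate::Deny("shell metacharacter");
    }
    let argv0 = match command.split_whitespace().next() {
        Some(a) => a,
        None => return PreGate::Deny("empty command"),
    };
    // Match on the basename so /usr/bin/git and git are treated alike.
    let base = Path::new(argv0)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or(argv0);
    if allowlist.contains(&base) {
        PreGate::Allow
    } else {
        PreGate::NeedsApproval
    }
}
```

Only the `NeedsApproval` branch would go on to the approval gate, which is what keeps guaranteed-to-fail commands away from the operator.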

Commands that clear the pre-gate but still need human approval had no way to reach a human, so this commit also adds the missing push surface.

ApprovalManager. Tokio broadcast of Submitted / Resolved / TimedOut lifecycle events, plus a subscribe API. Lag-tolerant. Slow subscribers get RecvError::Lagged and resync via list_pending. Tracing now includes agent_id, tool_name, risk, and decided_by on every lifecycle line.

channel_bridge. Spawn an approval surfacer that consumes those events, resolves the agent bindings via the registry, and pushes a formatted prompt to the most-specific bound channel and channel_id. Submission prompts include short id, agent, tool, risk, action summary -- truncated -- and timeout, with /approve and /reject hints. Resolved and TimedOut events post a follow-up so the prompt is not left dangling.

Tests added on the approval side cover Submitted+TimedOut and Resolved event delivery, plus UTF-8-safe log truncation.
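UTF-8-safe truncation and a submission prompt of the kind described above can be sketched as follows. The function names, the 120-byte limit, and the exact prompt layout (including the `/approve <id>` syntax) are assumptions; only the list of fields comes from this commit message.

```rust
/// Truncate to at most `max_bytes` without splitting a UTF-8 character.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Hypothetical prompt layout for a Submitted event.
fn submission_prompt(
    short_id: &str,
    agent: &str,
    tool: &str,
    risk: &str,
    summary: &str,
    timeout_secs: u64,
) -> String {
    format!(
        "[{short_id}] agent={agent} tool={tool} risk={risk} timeout={timeout_secs}s\n{}\nReply /approve {short_id} or /reject {short_id}",
        truncate_utf8(summary, 120),
    )
}
```

Slicing a `&str` at a byte offset that falls inside a multi-byte character panics, which is why the boundary walk-back matters for log and prompt truncation.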

Validated live on the daemon -- tests A through F. Metachar denial is synchronous, no approval burned. Allowlist match on argv0 basename clears without prompting. Approval surfacer delivers prompts to the bound channel for commands that fall through to approval.
benhoverter and others added 3 commits May 9, 2026 00:17
Apply rustfmt to files introduced or modified by this branch. Upstream-drift files (kernel, agent_loop, channels, anthropic/openai drivers, host_functions, model_catalog, types message) intentionally left untouched as a separate concern.
Both bridge-MCP-config wiring sites in the Claude Code driver were
using .map(|cfg| { side-effects; cfg }) on Option<NamedTempFile>,
which clippy flags as manual_inspect. .inspect() expresses intent
directly. No behavior change.
The MCP bridge IPC is unix-domain-socket-only by construction (daemon
listens on a unix socket; bridge subprocess connects to it). The bridge
crate and the daemon-side `bridge_ipc` module unconditionally imported
`tokio::net::{UnixStream, UnixListener}`, which broke Windows CI with
E0432 unresolved-import errors in `openfang-mcp-bridge::main` and
`openfang-api::bridge_ipc`.

Gates:
- `openfang-mcp-bridge::main` — entire body cfg-gated to `unix`; on
  non-unix the binary is a no-op stub that prints a clear message and
  exits non-zero. Tests gated `cfg(all(test, unix))`.
- `openfang-api::lib` — `pub mod bridge_ipc` gated to `unix`.
- `openfang-api::server::run_daemon` — `BridgeIpcServer::start` call
  gated to `unix`; non-unix logs a single info line and proceeds without
  bridge IPC. The CC driver's existing missing-socket fallthrough means
  CC subprocesses spawn without `--mcp-config` on Windows, matching the
  bridge-disabled path.
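The gating pattern above can be illustrated with a pair of cfg-selected functions. `start_bridge_ipc` is a hypothetical stand-in for the `BridgeIpcServer::start` call site; it only computes the socket path rather than binding anything.

```rust
use std::path::{Path, PathBuf};

/// On unix, report the socket path the listener would bind.
#[cfg(unix)]
fn start_bridge_ipc(home: &Path) -> Option<PathBuf> {
    Some(home.join("run").join("bridge.sock"))
}

/// On other platforms no transport exists yet, so the bridge is skipped
/// and callers fall through to the bridge-disabled path.
#[cfg(not(unix))]
fn start_bridge_ipc(_home: &Path) -> Option<PathBuf> {
    None
}
```

Giving both variants the same signature keeps the call site free of cfg attributes: it only has to handle `None`.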

No behavioral change on Linux/macOS. Windows users get a daemon that
boots without bridge support; MCP-routed tools are unavailable until a
Windows-native transport (named pipes / TCP loopback) lands as a
follow-up.

Verified: cargo check --workspace, cargo check --workspace --tests,
cargo test -p openfang-mcp-bridge -p openfang-api --lib, cargo fmt
--check, and cargo clippy all clean on macOS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@benhoverter
Contributor Author

Superseded by ANAI-45 — closing unmerged.

This PR is being closed in favor of ANAI-45 (trust pipeline unification), which absorbs the capability gate into a unified decide_tool evaluator alongside exec_policy and file_policy. The capability gate's behavior — tool not in allowed_tools → deny — is preserved verbatim as Stage A of the new pipeline; the implementation just moves to a single call site so all three trust mechanisms share one decision contract (ToolDecision::{AutoApprove, Prompt, Deny}).

Rationale: this PR plus #1183 plus the (unpushed) ANAI-44 work each introduced their own "skip the approval prompt" path with different contracts. Consolidating them as three incremental PRs would mean reviewers seeing the design through a keyhole; ANAI-45 ships the whole shape coherently. Zero reviews here and zero installed users on either side make a clean replacement the simpler move.

Branch feat/anai-32-capability-enforcement will be deleted after close. The ANAI-45 PR will reference this issue.



Development

Successfully merging this pull request may close these issues.

Capability gate, MCP bridge for Claude Code subprocesses, and approval push surface

1 participant