Commit 59408d3

feat(mcp): auto-wire MCP gateway into openclaw via device-signed WS RPC (#18)
Replace the broken file-based MCP config injection with a proper gateway WebSocket RPC flow. The old approach wrote mcp-servers.json to a shared volume and called "openclaw mcp reload" via docker exec, but modern openclaw stores MCP server config under mcp.servers in openclaw.json and has removed the reload CLI command.

New flow:
- At sluice startup, after env injection and secrets reload both succeed, the startup goroutine invokes an embedded Node.js script via `docker exec openclaw node -e <script> wire-mcp sluice <url>`.
- The script performs the full openclaw gateway handshake: reads device identity (ed25519 key + deviceId) from ~/.openclaw/identity/device.json, connects to ws://127.0.0.1:port/, receives connect.challenge, builds the v3 device auth payload (pipe-separated deviceId|clientId|clientMode|role|scopes|signedAt|token|nonce|platform|deviceFamily with ASCII-lowercased metadata), signs it, and sends a connect request claiming operator.admin scope.
- On hello-ok, it chains config.get (to fetch the current baseHash) -> config.patch with restartDelayMs:3000 and raw JSON merge-patching mcp.servers.<name> = {url}. Idempotent: a second run with the same URL is a no-op on the openclaw side.
- The script also handles standalone secrets.reload and direct config.patch invocations, replacing the previous inline Node one-liner.

Key pieces:
- internal/container/gateway_rpc.js: the embedded Node script, loaded via //go:embed and passed to `node -e`. Argv auto-detects between file-mode and -e-mode invocation.
- internal/container/gateway_rpc.go: GatewayRPCNodeCommand helper that builds the argv slice.
- ContainerManager.WireMCPGateway(ctx, name, sluiceURL): new method implemented by Docker, Apple Container, and tart backends.
- cmd/sluice/main.go: Phase 3 of the startup goroutine calls WireMCPGateway after secrets reload. Exit code 137 is tolerated (the config.patch response was received; the docker exec was then killed by the subsequent gateway restart).

Cleanup:
- Remove InjectMCPConfig and WriteMCPConfig helpers entirely.
- Drop --mcp-dir flag, SLUICE_MCP_DIR env var, writeMCPServersJSON helper, and all associated tests.
- Remove sluice-mcp volume from compose files.
- Remove /home/sluice/mcp directory from Dockerfile.
- compose.yml: pin sluice IP on the internal network to 172.30.0.2 and add extra_hosts ["sluice:172.30.0.2"] on tun2proxy so OpenClaw (sharing tun2proxy's namespace) can resolve "sluice" via /etc/hosts without Docker's embedded DNS (unreachable through the TUN).

Also:
- internal/mcp/server_http.go: handle GET on /mcp as an SSE stream with keepalive comments. openclaw's bundle-mcp probes with GET and logged 405 warnings; now returns 200 with text/event-stream.

CLAUDE.md rewritten with the new one-command MCP setup flow.
1 parent b815323 commit 59408d3

23 files changed

Lines changed: 1124 additions & 608 deletions

CLAUDE.md

Lines changed: 15 additions & 3 deletions
@@ -86,7 +86,19 @@ When `--destination` is provided, `sluice cred add` also creates an allow rule a

 Two credential types: `static` (default) for API keys and `oauth` for OAuth access/refresh token pairs. OAuth credentials prompt for tokens via stdin (not CLI flags) to avoid shell history exposure.

-Runtime flags: `--mcp-dir` (or `SLUICE_MCP_DIR` env var) sets the shared volume path for `mcp-servers.json` auto-injection. Defaults to the MCP volume path in compose files (`/home/sluice/mcp`).
+Runtime flags: `--mcp-base-url` sets the external URL the agent uses to reach sluice's MCP gateway (e.g. `http://sluice:3000`). This is added to `SelfBypass` so sluice does not policy-check its own MCP traffic. Defaults to deriving from `--health-addr`.
+
+## MCP Gateway Setup
+
+OpenClaw connects to sluice's MCP gateway via Streamable HTTP. This is a one-time setup per deployment:
+
+```bash
+docker exec openclaw openclaw mcp set sluice '{"url":"http://sluice:3000/mcp"}'
+```
+
+For the hostname `sluice` to resolve inside OpenClaw, the compose file pins sluice's IP on the internal network (172.30.0.2) and adds an `extra_hosts` entry on tun2proxy (which OpenClaw shares). Docker's embedded DNS (127.0.0.11) is not reachable from OpenClaw because its DNS is routed through the TUN device. The `/etc/hosts` entry bypasses DNS entirely.
+
+When new MCP upstreams are added to sluice via `sluice mcp add`, restart sluice so the gateway picks them up. OpenClaw does not need to be restarted - its connection to sluice:3000/mcp remains valid and it re-queries the tool list on subsequent agent runs.

 ## Policy Store

@@ -174,7 +186,7 @@ Three upstream transports: stdio (child processes), Streamable HTTP, WebSocket.

 `MCPHTTPHandler` serves `POST /mcp` and `DELETE /mcp` on port 3000 (alongside `/healthz`). Session tracking via `Mcp-Session-Id` header. SSE response support.

-Auto-injection: when container runtime active, writes `mcp-servers.json` to shared MCP volume (`sluice-mcp`), signals agent to reload. SOCKS5 auto-bypasses connections to sluice's own listeners (`SelfBypass`).
+Agent connection: OpenClaw is configured once (via `openclaw mcp set`) to connect to `http://sluice:3000/mcp`. Sluice's `SelfBypass` auto-allows connections to its own MCP listener so the traffic is not policy-checked.

 ### Vault providers

@@ -196,7 +208,7 @@ Chain provider: `providers = ["1password", "age"]` tries in order, first hit win

 All backends implement `ContainerManager` interface (`internal/container/types.go`).

-**Docker**: Three-container compose (sluice + tun2proxy + openclaw). Hot-reload via `docker exec` env var injection into `~/.openclaw/.env` + `docker exec openclaw openclaw secrets reload`. MCP config shared via `sluice-mcp` volume. Fallback: container restart.
+**Docker**: Three-container compose (sluice + tun2proxy + openclaw). Hot-reload via `docker exec` env var injection into `~/.openclaw/.env` + `docker exec openclaw openclaw secrets reload`. MCP wiring is a one-time `openclaw mcp set` (see "MCP Gateway Setup" above). Fallback: container restart.

 **Apple Container**: Linux micro-VMs via Virtualization.framework. tun2proxy runs on host. `NetworkRouter` manages pf anchor rules to redirect VM bridge traffic. VirtioFS for shared volumes.
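
The CLAUDE.md text above says the `--mcp-base-url` value is added to `SelfBypass`. The commit's `selfBypassFromURL` is called in `cmd/sluice/main.go` but its body is not shown, so the sketch below is only one plausible way to turn such a URL into a `host:port` bypass entry. The helper name `bypassAddrFromURL` and the fallback to the listen address's port are assumptions, not the commit's behavior.

```go
package main

import (
	"net"
	"net/url"
)

// bypassAddrFromURL is a hypothetical helper. It extracts "host:port" from a
// base URL such as "http://sluice:3000". If the URL carries no explicit port,
// it falls back to the port of listenAddr (an assumption made for this
// sketch). It returns "" when no usable address can be derived.
func bypassAddrFromURL(rawURL, listenAddr string) string {
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return ""
	}
	port := u.Port()
	if port == "" {
		if _, p, splitErr := net.SplitHostPort(listenAddr); splitErr == nil {
			port = p
		}
	}
	if port == "" {
		return ""
	}
	return net.JoinHostPort(u.Hostname(), port)
}
```

With the compose defaults from this commit, `bypassAddrFromURL("http://sluice:3000", "0.0.0.0:3000")` yields `sluice:3000`, the external name the agent dials.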

Dockerfile

Lines changed: 3 additions & 3 deletions
@@ -26,8 +26,8 @@ RUN apk add --no-cache ca-certificates wget nodejs npm python3 && \
     mv "/tmp/uv-${UV_ARCH}/uvx" /usr/local/bin/uvx && \
     rm -rf "/tmp/uv-${UV_ARCH}" && \
     adduser -D -h /home/sluice sluice && \
-    mkdir -p /home/sluice/ca /home/sluice/.sluice /home/sluice/data /home/sluice/mcp /var/log/sluice /etc/sluice && \
-    chown sluice:sluice /home/sluice/ca /home/sluice/.sluice /home/sluice/data /home/sluice/mcp /var/log/sluice /etc/sluice
+    mkdir -p /home/sluice/ca /home/sluice/.sluice /home/sluice/data /var/log/sluice /etc/sluice && \
+    chown sluice:sluice /home/sluice/ca /home/sluice/.sluice /home/sluice/data /var/log/sluice /etc/sluice
 COPY --from=builder /sluice /usr/local/bin/sluice
 COPY scripts/docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
 RUN chmod +x /usr/local/bin/docker-entrypoint.sh
@@ -36,4 +36,4 @@ WORKDIR /home/sluice
 EXPOSE 1080 3000
 HEALTHCHECK --interval=10s --timeout=3s CMD wget -qO- http://localhost:3000/healthz || exit 1
 ENTRYPOINT ["docker-entrypoint.sh"]
-CMD ["-listen", "0.0.0.0:1080", "-health-addr", "0.0.0.0:3000", "-mcp-base-url", "http://sluice:3000", "-db", "data/sluice.db", "-config", "/etc/sluice/config.toml", "-audit", "/var/log/sluice/audit.jsonl", "-mcp-dir", "/home/sluice/mcp"]
+CMD ["-listen", "0.0.0.0:1080", "-health-addr", "0.0.0.0:3000", "-mcp-base-url", "http://sluice:3000", "-db", "data/sluice.db", "-config", "/etc/sluice/config.toml", "-audit", "/var/log/sluice/audit.jsonl"]

cmd/sluice/main.go

Lines changed: 85 additions & 95 deletions
@@ -82,19 +82,9 @@ func main() {
 	containerName := flag.String("container-name", envDefault("SLUICE_AGENT_CONTAINER", "openclaw"), "agent container/VM name")
 	certDir := flag.String("cert-dir", "", "shared volume path for CA certificate (enables MITM trust injection into guest)")
 	dnsResolver := flag.String("dns-resolver", "", "upstream DNS resolver address for DNS interception (default: 8.8.8.8:53)")
-	autoInjectMCP := flag.Bool("auto-inject-mcp", false, "auto-inject MCP config into agent container (default true for docker/apple runtimes)")
-	mcpBaseURL := flag.String("mcp-base-url", "", "base URL for auto-injected MCP config (e.g. http://sluice:3000); derived from --health-addr when empty")
-	mcpDir := flag.String("mcp-dir", "", "shared volume path for MCP config (mcp-servers.json); enables MCP auto-injection into agent container")
-	autoInjectMCPSet := false
+	mcpBaseURL := flag.String("mcp-base-url", "", "external base URL the agent uses to reach sluice's MCP gateway (e.g. http://sluice:3000); added to SelfBypass so sluice does not policy-check its own MCP traffic")
 	flag.Parse()

-	// Track whether the flag was explicitly set by the user.
-	flag.Visit(func(f *flag.Flag) {
-		if f.Name == "auto-inject-mcp" {
-			autoInjectMCPSet = true
-		}
-	})
-
 	// Validate --runtime flag early.
 	switch *runtimeFlag {
 	case "auto", "docker", "apple", "macos", "none":
@@ -278,26 +268,24 @@ func main() {
 			log.Printf("apple container manager enabled: container=%s", *containerName)
 		}
 	case "macos":
-		tartMgr, tartRouter, containerMgr, tartVMOwned = startMacOSVM(*vmImage, *containerName, *mcpDir, *certDir)
+		tartMgr, tartRouter, containerMgr, tartVMOwned = startMacOSVM(*vmImage, *containerName, *certDir)
 	case "none":
 		log.Printf("standalone mode: no container runtime (configure ALL_PROXY=socks5://%s manually)", *listenAddr)
 	case "":
 		log.Printf("no container runtime detected; container management disabled")
 	}

-	// Default auto-inject-mcp to true when a container runtime is active
-	// and the user did not explicitly set the flag.
-	if !autoInjectMCPSet && containerMgr != nil {
-		*autoInjectMCP = true
-	}
-
 	// Build self-bypass addresses so the agent's MCP HTTP connection to
-	// sluice is auto-allowed without policy evaluation.
+	// sluice is auto-allowed without policy evaluation. This applies
+	// whenever a health address is configured; if no agent exists, the
+	// bypass list has no effect.
 	var selfBypass []string
-	if *autoInjectMCP && *healthAddr != "" {
+	if *healthAddr != "" {
 		selfBypass = buildSelfBypass(*healthAddr)
-		// When mcp-base-url is set, also bypass the external hostname (e.g.
-		// "sluice:3000" in Docker Compose) that the agent uses to reach us.
+		// When --mcp-base-url is set, also bypass the external hostname
+		// (e.g. "sluice:3000" in Docker Compose) that the agent uses to
+		// reach us, in case the connection is routed through the SOCKS5
+		// proxy instead of directly on the Docker network.
 		if *mcpBaseURL != "" {
 			if extra := selfBypassFromURL(*mcpBaseURL, *healthAddr); extra != "" {
 				selfBypass = append(selfBypass, extra)
@@ -338,59 +326,6 @@ func main() {
 		}
 	}

-	// Inject phantom env vars into the agent container at startup.
-	// Bindings with env_var set produce env var entries (e.g. OPENAI_API_KEY=phantom-xxx)
-	// that are written into the agent's .env file via docker exec.
-	// Retry with backoff because the agent container may still be starting
-	// (compose healthcheck ordering ensures sluice starts first).
-	if containerMgr != nil && db != nil {
-		go func() {
-			// Phase 1: write .env file into the agent container.
-			// Retry with backoff because the container may still be starting.
-			backoff := []time.Duration{0, 2 * time.Second, 5 * time.Second, 10 * time.Second, 30 * time.Second}
-			injected := false
-			for i, delay := range backoff {
-				if delay > 0 {
-					time.Sleep(delay)
-				}
-				if err := injectEnvVarsFromStore(db, containerMgr); err != nil {
-					if i < len(backoff)-1 {
-						log.Printf("startup env injection attempt %d/%d failed: %v (retrying)", i+1, len(backoff), err)
-						continue
-					}
-					log.Printf("WARNING: startup env injection failed after %d attempts: %v", len(backoff), err)
-				} else {
-					log.Printf("startup env injection succeeded (attempt %d/%d)", i+1, len(backoff))
-					injected = true
-					break
-				}
-			}
-			if !injected {
-				return
-			}
-			// Phase 2: signal the agent to reload secrets.
-			// The gateway takes longer to start than the container itself,
-			// so retry the reload with a longer backoff.
-			reloadBackoff := []time.Duration{5 * time.Second, 10 * time.Second, 20 * time.Second, 30 * time.Second, 60 * time.Second}
-			for i, delay := range reloadBackoff {
-				time.Sleep(delay)
-				ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
-				if err := containerMgr.ReloadSecrets(ctx); err != nil {
-					cancel()
-					if i < len(reloadBackoff)-1 {
-						log.Printf("startup secrets reload attempt %d/%d failed: %v (retrying)", i+1, len(reloadBackoff), err)
-						continue
-					}
-					log.Printf("WARNING: startup secrets reload failed after %d attempts: %v", len(reloadBackoff), err)
-				} else {
-					cancel()
-					log.Printf("startup secrets reload succeeded (attempt %d/%d)", i+1, len(reloadBackoff))
-					return
-				}
-			}
-		}()
-	}
-
 	// Configure the OAuth refresh callback so that after a token refresh
 	// is persisted, the updated phantom env vars are re-injected into the
 	// agent container.
@@ -512,7 +447,9 @@ func main() {
 	}

 	// MCP gateway: if upstreams are configured, start the gateway and
-	// serve it via HTTP on /mcp alongside the API.
+	// serve it via HTTP on /mcp alongside the API. The mcpHandler local
+	// is consumed by the startup goroutine below and by the HTTP API
+	// server that exposes /mcp.
 	var mcpHandler http.Handler
 	upstreamRows, mcpListErr := db.ListMCPUpstreams()
 	if mcpListErr != nil {
@@ -575,19 +512,77 @@ func main() {

 		mcpHandler = mcp.NewMCPHTTPHandler(mcpGW)
 		log.Printf("MCP gateway on /mcp: %d tools from %d upstreams", len(mcpGW.Tools()), len(mcpUpstreams))
+	}

-		// Auto-inject MCP config into the agent container so it connects
-		// to sluice's gateway via Streamable HTTP.
-		if *autoInjectMCP && containerMgr != nil && *mcpDir != "" {
+	// Startup agent container setup: env var injection, secrets reload,
+	// and MCP gateway wiring. All phases retry with backoff because the
+	// agent container may still be starting (compose healthcheck ordering
+	// ensures sluice starts first). Runs in a goroutine so sluice's HTTP
+	// API and SOCKS5 listeners come up immediately. All phases are no-ops
+	// outside a container runtime setup.
+	if containerMgr != nil && db != nil {
+		hasMCPGateway := mcpHandler != nil
+		go func() {
+			// Phase 1: write .env file into the agent container with
+			// phantom tokens from bindings that declare env_var.
+			backoff := []time.Duration{0, 2 * time.Second, 5 * time.Second, 10 * time.Second, 30 * time.Second}
+			injected := false
+			for i, delay := range backoff {
+				if delay > 0 {
+					time.Sleep(delay)
+				}
+				if err := injectEnvVarsFromStore(db, containerMgr); err != nil {
+					if i < len(backoff)-1 {
+						log.Printf("startup env injection attempt %d/%d failed: %v (retrying)", i+1, len(backoff), err)
+						continue
+					}
+					log.Printf("WARNING: startup env injection failed after %d attempts: %v", len(backoff), err)
+				} else {
+					log.Printf("startup env injection succeeded (attempt %d/%d)", i+1, len(backoff))
+					injected = true
+					break
+				}
+			}
+			if !injected {
+				return
+			}
+			// Phase 2: signal the agent to reload secrets. The gateway
+			// takes longer to start than the container itself, so retry
+			// with a longer backoff.
+			reloadBackoff := []time.Duration{5 * time.Second, 10 * time.Second, 20 * time.Second, 30 * time.Second, 60 * time.Second}
+			reloadedSecrets := false
+			for i, delay := range reloadBackoff {
+				time.Sleep(delay)
+				ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
+				if err := containerMgr.ReloadSecrets(ctx); err != nil {
+					cancel()
+					if i < len(reloadBackoff)-1 {
+						log.Printf("startup secrets reload attempt %d/%d failed: %v (retrying)", i+1, len(reloadBackoff), err)
+						continue
+					}
+					log.Printf("WARNING: startup secrets reload failed after %d attempts: %v", len(reloadBackoff), err)
+				} else {
+					cancel()
+					log.Printf("startup secrets reload succeeded (attempt %d/%d)", i+1, len(reloadBackoff))
+					reloadedSecrets = true
+					break
+				}
+			}
+			if !reloadedSecrets || !hasMCPGateway {
+				return
+			}
+			// Phase 3: wire sluice's MCP gateway URL into the agent's
+			// openclaw.json config via WebSocket RPC. Idempotent.
+			// deriveMCPBaseURL already returns a URL ending in /mcp.
 			mcpURL := deriveMCPBaseURL(*mcpBaseURL, *healthAddr)
-			if injectErr := containerMgr.InjectMCPConfig(*mcpDir, mcpURL); injectErr != nil {
-				log.Printf("WARNING: MCP auto-inject failed: %v", injectErr)
+			wireCtx, wireCancel := context.WithTimeout(context.Background(), 15*time.Second)
+			if wireErr := containerMgr.WireMCPGateway(wireCtx, "sluice", mcpURL); wireErr != nil {
+				log.Printf("WARNING: failed to wire MCP gateway into agent config: %v", wireErr)
 			} else {
-				log.Printf("MCP config auto-injected to %s (url=%s)", *mcpDir, mcpURL)
+				log.Printf("MCP gateway wired into agent config: mcp.servers.sluice.url=%s", mcpURL)
 			}
-		} else if *autoInjectMCP && containerMgr != nil && *mcpDir == "" {
-			log.Printf("MCP auto-inject skipped: no shared volume path configured (use -mcp-dir)")
-		}
+			wireCancel()
+		}()
 	}

 	// Start HTTP server with health check and REST API on :3000 (or --health-addr).
@@ -1040,27 +1035,22 @@ func seedStoreFromConfig(db *store.Store, configPath string) error {
 // routing. Returns the TartManager, NetworkRouter (for shutdown cleanup),
 // the ContainerManager interface, and a boolean indicating whether sluice
 // started the VM. Calls log.Fatalf on unrecoverable errors.
-func startMacOSVM(vmImage, vmName, mcpDir, certDir string) (*container.TartManager, *container.NetworkRouter, container.ContainerManager, bool) {
+func startMacOSVM(vmImage, vmName, certDir string) (*container.TartManager, *container.NetworkRouter, container.ContainerManager, bool) {
 	cli, cliErr := container.NewTartCLI(nil)
 	if cliErr != nil {
 		log.Fatalf("--runtime macos: tart CLI not available: %v", cliErr)
 	}

-	mgr, router, owned, err := setupMacOSVM(cli, vmImage, vmName, mcpDir, certDir)
+	mgr, router, owned, err := setupMacOSVM(cli, vmImage, vmName, certDir)
 	if err != nil {
 		log.Fatalf("--runtime macos: %v", err)
 	}
 	return mgr, router, mgr, owned
 }

 // buildTartRunConfig creates the TartRunConfig with VirtioFS mounts.
-func buildTartRunConfig(vmName, mcpDir, certDir string) container.TartRunConfig {
+func buildTartRunConfig(vmName, certDir string) container.TartRunConfig {
 	var dirMounts []container.TartDirMount
-	if mcpDir != "" {
-		dirMounts = append(dirMounts, container.TartDirMount{
-			Name: "mcp", HostPath: mcpDir,
-		})
-	}
 	if certDir != "" {
 		dirMounts = append(dirMounts, container.TartDirMount{
 			Name: "ca", HostPath: certDir, ReadOnly: true,
@@ -1100,7 +1090,7 @@ func waitForVMIP(ctx context.Context, cli *container.TartCLI, vmName string) (st
 // indicates whether sluice started the VM (true) or attached to an
 // already-running VM (false). Only VMs started by sluice should be
 // stopped on shutdown.
-func setupMacOSVM(cli *container.TartCLI, vmImage, vmName, mcpDir, certDir string) (*container.TartManager, *container.NetworkRouter, bool, error) {
+func setupMacOSVM(cli *container.TartCLI, vmImage, vmName, certDir string) (*container.TartManager, *container.NetworkRouter, bool, error) {
 	ctx := context.Background()

 	// Check if VM already exists.
@@ -1123,7 +1113,7 @@ func setupMacOSVM(cli *container.TartCLI, vmImage, vmName, mcpDir, certDir strin
 		return nil, nil, false, fmt.Errorf("check VM state: %w", stateErr)
 	}

-	runCfg := buildTartRunConfig(vmName, mcpDir, certDir)
+	runCfg := buildTartRunConfig(vmName, certDir)

 	// Track whether sluice started the VM so we only stop it on shutdown
 	// if we own it. Attaching to a pre-existing VM and then killing it on
