
Commit f13261a

user and claude committed
spec(runner): sync spec with code drift, add OpenShell desired state
- Fix source layout: add model.py, observability files, fixtures/, remove duplicate workspace.py
- Document AGUI_TOKEN session auth middleware and SDK_OPTIONS env var
- Document runtime model switching via POST /model
- Add 'Desired State: OpenShell Credential Isolation' section with migration path

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 233a2cc commit f13261a

3 files changed

Lines changed: 561 additions & 2 deletions


Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
1+
# Adapting ambient-runner to Use OpenShell
2+
3+
> Analysis date: 2026-06-03
4+
> Companion doc: [OpenShell Security Model Analysis](openshell-security-analysis.md)
5+
> Target component: `components/runners/ambient-runner/ambient_runner/`
6+
7+
---
8+
9+
## Current Runner Credential Model (The Problem)
10+
11+
The runner puts **real secrets directly into `os.environ`** and the agent's process memory. If the agent inspects its own environment, it sees real credentials.
12+
13+
### How Secrets Flow Today
14+
15+
| Mechanism | File | What Happens |
|-----------|------|--------------|
| `populate_runtime_credentials()` | `platform/auth.py` | Fetches real tokens from backend API, writes them into `os.environ`: `GITHUB_TOKEN`, `GITLAB_TOKEN`, `JIRA_API_TOKEN`, `ANTHROPIC_API_KEY`, `CODERABBIT_API_KEY`, etc. |
| Token files on disk | `platform/auth.py` | Writes real tokens to `/tmp/.ambient_github_token`, `/tmp/.ambient_gitlab_token`, `/tmp/.ambient_kubeconfig` for the git credential helper and `gh` wrapper |
| Git credential helper | `platform/auth.py` | Shell script at `/tmp/git-credential-ambient` reads the real token from temp file and pipes it to git |
| `gh` CLI wrapper | `platform/auth.py` | Shell script reads real GitHub token from file, exports `GH_TOKEN`, then exec's the real `gh` |
| Secret redaction middleware | `middleware/secret_redaction.py` | Post-hoc defense: scrubs secrets from *outbound AG-UI events* only; the agent process still has full access to real secrets in memory and on disk |
22+
23+
### The Gap
24+
25+
```
Agent reads /proc/self/environ → sees GITHUB_TOKEN=ghp_real_secret
Agent runs: cat /tmp/.ambient_* → sees real tokens
Agent runs: echo $ANTHROPIC_API_KEY → sees real API key
```
30+
31+
The redaction middleware protects the *output stream* (events sent to the frontend), not the agent itself. A compromised or misbehaving agent has unrestricted access to all credentials.
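To make the gap concrete, here is a minimal sketch (not taken from the runner) showing that any value the parent places in the environment is readable by a child process. The helper name `child_sees` and the sample token are illustrative assumptions.

```python
import os
import subprocess
import sys


def child_sees(name: str, value: str) -> str:
    """Return what a child process reads for `name` when the parent exports it."""
    env = dict(os.environ)
    env[name] = value  # stands in for what populate_runtime_credentials() does today

    # The child simply echoes the variable back; a real agent could do the same.
    probe = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Illustrative value only, not a real token.
    print(child_sees("GITHUB_TOKEN", "ghp_example_not_a_real_token"))
```

Redacting the event stream afterwards does not change what the child was able to read.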
32+
33+
---
34+
35+
## OpenShell Integration Strategies
36+
37+
### Strategy 1: OpenShell as Sidecar Supervisor (Recommended)
38+
39+
Replace the runner container's direct credential injection with OpenShell's Supervisor running as a sidecar (or init container + persistent process) in the same pod.
40+
41+
#### What Changes
42+
43+
| Component | Current | With OpenShell |
|-----------|---------|----------------|
| `auth.py:populate_runtime_credentials()` | Sets `os.environ["GITHUB_TOKEN"] = real_token` | Sets `os.environ["GITHUB_TOKEN"] = "openshell:resolve:env:GITHUB_TOKEN"` |
| Token files (`/tmp/.ambient_*`) | Contain real tokens | Contain placeholder strings |
| Git credential helper | Reads real token from file | Reads placeholder; OpenShell proxy rewrites on outbound |
| `gh` wrapper | Exports real `GH_TOKEN` | Exports placeholder; proxy rewrites |
| Network egress | Direct to `api.github.com`, etc. | Via OpenShell HTTP CONNECT proxy at `10.200.0.1:3128` |
| `secret_redaction.py` | Primary defense for output stream | Redundant but kept as defense-in-depth |
| `_grpc_client.py` | Direct gRPC to API server | Whitelisted in network policy (intra-cluster) |
| Claude CLI subprocess | Full env access with real secrets | Runs in sandbox netns with placeholders only |
53+
54+
#### Implementation Steps
55+
56+
**1. New OpenShell provider type**
57+
58+
Register Ambient's credential store as an OpenShell provider. The Operator creates a provider config that maps each credential type (github, gitlab, jira, etc.) to the corresponding backend API credential endpoint. There are two options for how the provider obtains the real tokens:
59+
60+
- OpenShell's Gateway calls the Ambient backend to fetch the real token on demand
61+
- The Operator pre-populates the provider at pod creation time (simpler, no Gateway dependency)
62+
63+
**2. Modify `platform/auth.py`**
64+
65+
Replace `populate_runtime_credentials()` with a version that writes placeholders instead of real values:
66+
67+
```python
# Before (current)
os.environ["GITHUB_TOKEN"] = github_creds["token"]  # real secret
_GITHUB_TOKEN_FILE.write_text(github_creds["token"])  # real secret on disk

# After (with OpenShell)
os.environ["GITHUB_TOKEN"] = "openshell:resolve:env:GITHUB_TOKEN"  # placeholder
_GITHUB_TOKEN_FILE.write_text("openshell:resolve:env:GITHUB_TOKEN")  # placeholder
# Real secret held only in Supervisor memory → proxy rewrites on outbound
```
77+
78+
The same pattern applies to all credential types: `GITLAB_TOKEN`, `JIRA_API_TOKEN`, `ANTHROPIC_API_KEY`, `CODERABBIT_API_KEY`, `KUBECONFIG`.
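One way to generalize the snippet above is sketched below. It assumes the placeholder format `openshell:resolve:env:<NAME>` shown earlier; the names `placeholder_for`, `apply_placeholders`, and `CREDENTIAL_ENV_VARS` are hypothetical and do not come from `auth.py`. `KUBECONFIG` is left out because it normally holds a file path rather than a token, so it would likely need separate handling.

```python
from typing import Iterable, Mapping

PLACEHOLDER_PREFIX = "openshell:resolve:env:"

# Token-style variables named in this document (KUBECONFIG intentionally excluded).
CREDENTIAL_ENV_VARS = (
    "GITHUB_TOKEN",
    "GITLAB_TOKEN",
    "JIRA_API_TOKEN",
    "ANTHROPIC_API_KEY",
    "CODERABBIT_API_KEY",
)


def placeholder_for(name: str) -> str:
    """Build the placeholder string the proxy is expected to resolve."""
    return f"{PLACEHOLDER_PREFIX}{name}"


def apply_placeholders(
    env: Mapping[str, str],
    names: Iterable[str] = CREDENTIAL_ENV_VARS,
) -> dict[str, str]:
    """Return a copy of `env` where each credential that is set becomes a placeholder."""
    updated = dict(env)
    for name in names:
        if name in updated:
            updated[name] = placeholder_for(name)
    return updated
```

Variables that are not credentials, and credentials that were never set, are left untouched.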
79+
80+
**3. Modify the Dockerfile**
81+
82+
Add the OpenShell Supervisor binary to the image. The runner (uvicorn) starts normally; the Supervisor is invoked by `bridge.py` when launching the Claude CLI subprocess:
83+
84+
```dockerfile
# Add OpenShell binary
COPY --from=openshell/supervisor:latest /usr/bin/openshell-sandbox /usr/bin/openshell-sandbox

# Entrypoint unchanged: uvicorn runs unsandboxed
CMD ["/bin/bash", "-c", "umask 0022 && cd /app/ambient-runner && uvicorn main:app --host 0.0.0.0 --port 8001"]
```
91+
92+
The Supervisor wraps only the Claude CLI subprocess (launched from `bridges/claude/bridge.py`), applying Landlock + seccomp + netns to the agent process. The runner itself (FastAPI/uvicorn, gRPC client, credential fetching) runs outside the sandbox boundary.
93+
94+
**4. Network policy via OpenShell**
95+
96+
Replace the K8s `NetworkPolicy` with OpenShell's per-sandbox network namespace + OPA policy:
97+
98+
```yaml
network_policies:
  ambient_backend:
    name: ambient-backend-access
    endpoints:
      - host: backend-service.ambient-code.svc.cluster.local
        port: 8080
        protocol: rest
        access: read-write
    binaries:
      - { path: /usr/bin/python3 }

  ambient_grpc:
    name: ambient-grpc-access
    endpoints:
      - host: ambient-api-server.ambient-code.svc.cluster.local
        port: 9000
        protocol: connect
        access: read-write
    binaries:
      - { path: /usr/bin/python3 }

  github_api:
    name: github-api-access
    endpoints:
      - host: api.github.com
        port: 443
        protocol: rest
        access: read-write

  anthropic_api:
    name: anthropic-api-access
    endpoints:
      - host: api.anthropic.com
        port: 443
        protocol: rest
        access: read-write

  gitlab_api:
    name: gitlab-api-access
    endpoints:
      - host: "*.gitlab.com"
        port: 443
        protocol: rest
        access: read-write
```
144+
145+
**5. `_grpc_client.py` — No changes needed**
146+
147+
The gRPC channel to the API server is established by the runner process, which runs outside the OpenShell sandbox boundary. Since only the Claude CLI subprocess is sandboxed, the gRPC client is unaffected.
148+
149+
**6. Modify `bridges/claude/bridge.py`**
150+
151+
The Claude CLI subprocess must have `HTTP_PROXY`/`HTTPS_PROXY` set so that it routes through the OpenShell proxy. OpenShell injects these automatically when the sandbox starts, so the bridge only needs to pass them through to the subprocess env rather than compute them itself.
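A minimal sketch of the pass-through, assuming the bridge builds an explicit env dict for the subprocess. The function name `build_cli_env` and its parameters are hypothetical and are not the actual `bridge.py` API. Lowercase variants are copied too because some tools only honor those.

```python
from typing import Mapping

# Proxy variables to forward; both spellings are included on purpose.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def build_cli_env(parent_env: Mapping[str, str], base: Mapping[str, str]) -> dict[str, str]:
    """Return the subprocess env: `base` plus any proxy variables set in the parent.

    Nothing else is copied from `parent_env`, so unrelated parent variables
    do not leak into the subprocess through this helper.
    """
    env = dict(base)
    for name in PROXY_VARS:
        value = parent_env.get(name)
        if value:
            env[name] = value
    return env
```

For example, `build_cli_env(os.environ, {"PATH": "/usr/bin"})` would yield an env containing `PATH` plus whichever proxy variables are currently set.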
152+
153+
**7. Operator changes**
154+
155+
The Operator (`components/operator/`) configures OpenShell provider + policy per session Job:
156+
157+
- Inject OpenShell provider config as a ConfigMap or Secret
158+
- Mount the Supervisor binary (or use a sidecar container)
159+
- Generate per-session OPA policies based on the session's credential bindings
160+
- Pass the policy YAML as a volume mount
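The per-session policy generation in the list above could look roughly like the following. This is a sketch only: the binding names, the `BINDING_HOSTS` mapping, and `build_network_policies` are hypothetical, and the output shape simply mirrors the policy YAML shown earlier in this document. It is written in Python for consistency with the other examples, regardless of the Operator's implementation language.

```python
import json
from typing import Iterable

# Hypothetical mapping from a session's credential binding to the host it needs.
BINDING_HOSTS = {
    "github": "api.github.com",
    "gitlab": "*.gitlab.com",
    "anthropic": "api.anthropic.com",
}


def build_network_policies(bindings: Iterable[str]) -> dict:
    """Build a policy document that allows only the hosts the session is bound to."""
    policies = {}
    for binding in bindings:
        host = BINDING_HOSTS.get(binding)
        if host is None:
            continue  # unknown binding: grant nothing rather than guess
        policies[f"{binding}_api"] = {
            "name": f"{binding}-api-access",
            "endpoints": [
                {"host": host, "port": 443, "protocol": "rest", "access": "read-write"}
            ],
        }
    return {"network_policies": policies}


if __name__ == "__main__":
    # JSON is valid YAML 1.2, so this output can be mounted as the policy file.
    print(json.dumps(build_network_policies(["github", "anthropic"]), indent=2))
```

Deriving the policy from the bindings means a session without a GitLab credential never gets a GitLab egress rule.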
161+
162+
#### Files to Modify
163+
164+
| File | Change |
|------|--------|
| `platform/auth.py` | `populate_runtime_credentials()` writes placeholders, not real tokens |
| `platform/auth.py` | Token files (`/tmp/.ambient_*`) get placeholder values |
| `platform/auth.py` | `install_git_credential_helper()`: helper returns placeholder; proxy rewrites |
| `platform/auth.py` | `install_gh_wrapper()`: wrapper exports placeholder `GH_TOKEN` |
| `_grpc_client.py` | No changes needed: gRPC runs in the runner process, outside the Claude subprocess sandbox boundary |
| `Dockerfile` | Add OpenShell Supervisor binary (`CMD` stays unchanged, as described in step 3) |
| `bridges/claude/bridge.py` | Proxy env vars for Claude CLI subprocess |
| `middleware/secret_redaction.py` | Keep as defense-in-depth (now truly redundant) |
| `components/operator/` | Configure OpenShell provider + policy per session Job |
175+
176+
---
177+
178+
### Strategy 2: OpenShell as Pod Runtime (Operator-Level)
179+
180+
The Operator spawns Jobs using an OpenShell-managed container runtime instead of raw K8s containers. The integration moves up a level — runner code doesn't change, but the Operator configures OpenShell as the execution environment.
181+
182+
**Pros:** Zero runner code changes.
183+
184+
**Cons:** Requires OpenShell's Kubernetes compute driver to be production-ready (currently alpha). Heavier Operator changes. Less control over per-session policy granularity from the runner's perspective.
185+
186+
---
187+
188+
### Strategy 3: OpenShell Provider Bridge (Minimal, Credential-Only)
189+
190+
Adopt only the credential placeholder/proxy pattern without the full sandbox. Write a thin Python adapter that:
191+
192+
1. Starts a local HTTP CONNECT proxy in the runner pod
193+
2. Holds real secrets in proxy memory (separate process, higher privilege)
194+
3. Injects placeholders into `os.environ`
195+
4. Rewrites placeholders to real values on outbound requests
196+
197+
**Pros:** No Rust dependency, no kernel features (Landlock/seccomp) needed. Works on any kernel version. Smallest change surface.
198+
199+
**Cons:** No Landlock/seccomp/netns isolation — only credential isolation. Agent can still bypass the proxy if it makes raw socket calls (no network namespace enforcement). No L7 inspection or OPA policy evaluation.
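The rewrite step of such an adapter (step 4 above) can be sketched as a pure function. This assumes the `openshell:resolve:env:<NAME>` placeholder format used elsewhere in this document; `rewrite_placeholders` and `PLACEHOLDER_RE` are hypothetical names, and a real proxy would also need the plumbing to see request headers at all.

```python
import re
from typing import Mapping

# Matches placeholders such as "openshell:resolve:env:GITHUB_TOKEN".
PLACEHOLDER_RE = re.compile(r"openshell:resolve:env:([A-Z][A-Z0-9_]*)")


def rewrite_placeholders(text: str, secrets: Mapping[str, str]) -> str:
    """Replace each known placeholder in `text` with its real value.

    Placeholders with no matching secret are left unchanged, so an unknown
    name can never be turned into an empty credential.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return secrets.get(name, match.group(0))

    return PLACEHOLDER_RE.sub(substitute, text)
```

For example, `rewrite_placeholders("Bearer openshell:resolve:env:GITHUB_TOKEN", {"GITHUB_TOKEN": "ghp_x"})` returns `"Bearer ghp_x"`. Note that HTTP Basic auth, which git uses over HTTPS, base64-encodes the credential, so a plain text match like this would not see the placeholder in that case; the header would have to be decoded first.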
200+
201+
---
202+
203+
## Strategy Comparison
204+
205+
| Criterion | Strategy 1 (Sidecar) | Strategy 2 (Pod Runtime) | Strategy 3 (Proxy Only) |
|-----------|----------------------|--------------------------|-------------------------|
| Credential isolation | Full (placeholder/proxy) | Full (placeholder/proxy) | Partial (no netns enforcement) |
| Network isolation | Full (netns + iptables) | Full (netns + iptables) | None |
| Filesystem isolation | Landlock LSM | Landlock LSM | None |
| Syscall filtering | seccomp-BPF | seccomp-BPF | None |
| L7 inspection (OPA) | Yes | Yes | No |
| Runner code changes | Moderate (`auth.py`, `Dockerfile`) | None | Small (new proxy module) |
| Operator changes | Moderate (provider + policy config) | Heavy (new compute driver) | None |
| Kernel requirements | Linux 5.13+ (Landlock) | Linux 5.13+ (Landlock) | None |
| OpenShell maturity dependency | Supervisor (stable) | K8s driver (alpha) | None (custom code) |
| Defense depth | 5 layers | 5 layers | 1 layer |
217+
218+
---
219+
220+
## Recommendation
221+
222+
**Strategy 1 (Sidecar Supervisor)** is the right path. It provides:
223+
224+
- Agent never sees real secrets (even inspecting `/proc/self/environ` reveals only placeholders)
225+
- L7 inspection via OPA policies (audit which APIs the agent calls)
226+
- Landlock + seccomp hardening within the container
227+
- Binary identity via SHA256 TOFU (only known binaries can make network calls)
228+
- The existing `secret_redaction.py` becomes a true defense-in-depth layer rather than the primary defense
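The "SHA256 TOFU" idea in the list above (trust on first use) can be illustrated generically as follows. This is not OpenShell's implementation, whose details are not covered here; `sha256_of`, `check_binary`, and the in-memory `pins` dict are assumptions made for the sketch.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_binary(path: Path, pins: dict[str, str]) -> bool:
    """Trust on first use: pin the hash the first time, require a match afterwards."""
    current = sha256_of(path)
    pinned = pins.setdefault(str(path), current)  # first call records the pin
    return pinned == current
```

A binary that is replaced after its first network call would then fail the check, because its hash no longer matches the pinned value.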
229+
230+
The critical architectural insight: OpenShell's credential proxy pattern eliminates the single point of failure in the current design. Today, `populate_runtime_credentials()` puts real secrets into a space the agent fully controls. OpenShell moves real secrets into Supervisor memory — a separate privilege domain the agent cannot access.
231+
232+
### Prerequisite: Kernel Version
233+
234+
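OpenShell's Landlock LSM requires Linux 5.13+. Containers share the kernel of the node they run on, so this requirement applies to the cluster nodes rather than to the UBI 10 base image of the runner containers. Nodes running RHEL 10, which ships kernel 6.x, satisfy it. OpenShell's `best_effort` Landlock mode also provides graceful degradation if the kernel lacks support.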
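A quick preflight check for this prerequisite might look like the sketch below. The function names are hypothetical. It reads `/sys/kernel/security/lsm`, the kernel's list of active LSMs, which may not be readable inside every container; an unreadable file is treated as "not enabled". Because the kernel belongs to the host, the result describes the node the code runs on.

```python
import platform
import re
from pathlib import Path


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Compare the leading 'X.Y' of a kernel release string against a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def landlock_enabled(lsm_file: Path = Path("/sys/kernel/security/lsm")) -> bool:
    """Return True if 'landlock' appears in the kernel's active LSM list."""
    try:
        return "landlock" in lsm_file.read_text().strip().split(",")
    except OSError:
        return False  # file missing or unreadable: assume not available


if __name__ == "__main__":
    print("kernel >= 5.13:", kernel_at_least(platform.release(), 5, 13))
    print("landlock active:", landlock_enabled())
```

A new enough kernel is necessary but not sufficient, since Landlock can still be disabled at build or boot time, which is why both checks are shown.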
235+
236+
### Migration Path
237+
238+
1. **Phase 1: Credential proxy only (Strategy 3).** Ship a Python-only credential proxy as a proof of concept. This validates that the placeholder/rewrite pattern works with the git credential helper, the `gh` wrapper, and the Claude CLI without requiring the OpenShell binary.
239+
240+
2. **Phase 2 — Sidecar Supervisor (Strategy 1):** Add OpenShell Supervisor binary, network namespace isolation, Landlock, and seccomp. This is the production target.
241+
242+
3. **Phase 3 — OPA policies:** Add L7 inspection with per-session OPA policies generated by the Operator from the session's credential bindings and project settings.
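For the Phase 1 proof of concept, one simple acceptance check is that no real secret value appears anywhere in the environment handed to the agent. The sketch below assumes the test harness knows the real values; `leaked_names` is a hypothetical helper, not existing runner code.

```python
from typing import Mapping


def leaked_names(env: Mapping[str, str], real_secrets: Mapping[str, str]) -> list[str]:
    """Return the names of env vars whose value contains any real secret."""
    leaks = []
    for name, value in env.items():
        # Empty secrets are skipped so they do not match every value.
        if any(secret and secret in value for secret in real_secrets.values()):
            leaks.append(name)
    return sorted(leaks)
```

An empty result for the agent's env, together with a successful authenticated request through the proxy, would indicate that the placeholder/rewrite pattern works end to end. The same check could be applied to the contents of the `/tmp/.ambient_*` token files.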
