| 1 | +# Adapting ambient-runner to Use OpenShell |
| 2 | + |
| 3 | +> Analysis date: 2026-06-03 |
| 4 | +> Companion doc: [OpenShell Security Model Analysis](openshell-security-analysis.md) |
| 5 | +> Target component: `components/runners/ambient-runner/ambient_runner/` |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Current Runner Credential Model (The Problem) |
| 10 | + |
| 11 | +The runner puts **real secrets directly into `os.environ`** and the agent's process memory. If the agent inspects its own environment, it sees real credentials. |
| 12 | + |
| 13 | +### How Secrets Flow Today |
| 14 | + |
| 15 | +| Mechanism | File | What Happens | |
| 16 | +|-----------|------|-------------| |
| 17 | +| `populate_runtime_credentials()` | `platform/auth.py` | Fetches real tokens from backend API, writes them into `os.environ`: `GITHUB_TOKEN`, `GITLAB_TOKEN`, `JIRA_API_TOKEN`, `ANTHROPIC_API_KEY`, `CODERABBIT_API_KEY`, etc. | |
| 18 | +| Token files on disk | `platform/auth.py` | Writes real tokens to `/tmp/.ambient_github_token`, `/tmp/.ambient_gitlab_token`, `/tmp/.ambient_kubeconfig` for the git credential helper and `gh` wrapper | |
| 19 | +| Git credential helper | `platform/auth.py` | Shell script at `/tmp/git-credential-ambient` reads the real token from temp file and pipes it to git | |
| 20 | +| `gh` CLI wrapper | `platform/auth.py` | Shell script reads real GitHub token from file, exports `GH_TOKEN`, then exec's the real `gh` | |
| 21 | +| Secret redaction middleware | `middleware/secret_redaction.py` | Post-hoc defense: scrubs secrets from *outbound AG-UI events* only — the agent process still has full access to real secrets in memory and on disk | |
| 22 | + |
| 23 | +### The Gap |
| 24 | + |
| 25 | +``` |
| 26 | +Agent reads /proc/self/environ → sees GITHUB_TOKEN=ghp_real_secret |
| 27 | +Agent runs: cat /tmp/.ambient_* → sees real tokens |
| 28 | +Agent runs: echo $ANTHROPIC_API_KEY → sees real API key |
| 29 | +``` |
| 30 | + |
| 31 | +The redaction middleware protects the *output stream* (events sent to the frontend), not the agent itself. A compromised or misbehaving agent has unrestricted access to all credentials. |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## OpenShell Integration Strategies |
| 36 | + |
| 37 | +### Strategy 1: OpenShell as Sidecar Supervisor (Recommended) |
| 38 | + |
| 39 | +Replace the runner container's direct credential injection with OpenShell's Supervisor running as a sidecar (or init container + persistent process) in the same pod. |
| 40 | + |
| 41 | +#### What Changes |
| 42 | + |
| 43 | +| Component | Current | With OpenShell | |
| 44 | +|-----------|---------|---------------| |
| 45 | +| `auth.py:populate_runtime_credentials()` | Sets `os.environ["GITHUB_TOKEN"] = real_token` | Sets `os.environ["GITHUB_TOKEN"] = "openshell:resolve:env:GITHUB_TOKEN"` | |
| 46 | +| Token files (`/tmp/.ambient_*`) | Contain real tokens | Contain placeholder strings | |
| 47 | +| Git credential helper | Reads real token from file | Reads placeholder; OpenShell proxy rewrites on outbound | |
| 48 | +| `gh` wrapper | Exports real `GH_TOKEN` | Exports placeholder; proxy rewrites | |
| 49 | +| Network egress | Direct to `api.github.com`, etc. | Via OpenShell HTTP CONNECT proxy at `10.200.0.1:3128` | |
| 50 | +| `secret_redaction.py` | Primary defense for output stream | Redundant but kept as defense-in-depth | |
| 51 | +| `_grpc_client.py` | Direct gRPC to API server | Whitelisted in network policy (intra-cluster) | |
| 52 | +| Claude CLI subprocess | Full env access with real secrets | Runs in sandbox netns with placeholders only | |
| 53 | + |
| 54 | +#### Implementation Steps |
| 55 | + |
| 56 | +**1. New OpenShell provider type** |
| 57 | + |
| 58 | +Register Ambient's credential store as an OpenShell provider. The Operator creates a provider config that maps each credential type (github, gitlab, jira, etc.) to the corresponding backend API credential endpoint. Two options: |
| 59 | + |
| 60 | +- OpenShell's Gateway calls the Ambient backend to fetch the real token on demand |
| 61 | +- The Operator pre-populates the provider at pod creation time (simpler, no Gateway dependency) |
| 62 | + |
| 63 | +**2. Modify `platform/auth.py`** |
| 64 | + |
| 65 | +Replace `populate_runtime_credentials()` with a version that writes placeholders instead of real values: |
| 66 | + |
| 67 | +```python |
| 68 | +# Before (current) |
| 69 | +os.environ["GITHUB_TOKEN"] = github_creds["token"] # real secret |
| 70 | +_GITHUB_TOKEN_FILE.write_text(github_creds["token"]) # real secret on disk |
| 71 | + |
| 72 | +# After (with OpenShell) |
| 73 | +os.environ["GITHUB_TOKEN"] = "openshell:resolve:env:GITHUB_TOKEN" # placeholder |
| 74 | +_GITHUB_TOKEN_FILE.write_text("openshell:resolve:env:GITHUB_TOKEN") # placeholder |
| 75 | +# Real secret held only in Supervisor memory → proxy rewrites on outbound |
| 76 | +``` |
| 77 | + |
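| 78 | +The same pattern applies to the other token-style credentials: `GITLAB_TOKEN`, `JIRA_API_TOKEN`, `ANTHROPIC_API_KEY`, `CODERABBIT_API_KEY`. `KUBECONFIG` is a file path rather than a secret, so there the placeholder would presumably go inside `/tmp/.ambient_kubeconfig` in place of the embedded credential. |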
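
To make the placeholder pattern concrete for every token-style variable, here is a minimal sketch. It reuses the `openshell:resolve:env:<NAME>` format from the example above; the helper names `placeholder_for` and `populate_placeholder_credentials`, and the `token_files` mapping, are hypothetical and not part of the current `auth.py`. `KUBECONFIG` is left out because it names a file rather than holding a token.

```python
import os
from pathlib import Path

# Placeholder format taken from the "After (with OpenShell)" example.
PLACEHOLDER_PREFIX = "openshell:resolve:env:"

# Token-style variables named in this document.
CREDENTIAL_ENV_VARS = (
    "GITHUB_TOKEN",
    "GITLAB_TOKEN",
    "JIRA_API_TOKEN",
    "ANTHROPIC_API_KEY",
    "CODERABBIT_API_KEY",
)


def placeholder_for(env_name: str) -> str:
    """Return the placeholder string that stands in for a real secret."""
    return f"{PLACEHOLDER_PREFIX}{env_name}"


def populate_placeholder_credentials(environ, token_files=None):
    """Write placeholders into `environ` and into any token files given.

    `token_files` maps a variable name to a file path (hypothetical parameter).
    Returns the mapping of variable name to placeholder that was written.
    """
    token_files = token_files or {}
    written = {}
    for name in CREDENTIAL_ENV_VARS:
        value = placeholder_for(name)
        environ[name] = value
        path = token_files.get(name)
        if path is not None:
            Path(path).write_text(value)
        written[name] = value
    return written
```

In the runner this would be called as `populate_placeholder_credentials(os.environ, {"GITHUB_TOKEN": "/tmp/.ambient_github_token"})`, so that neither the environment nor the token file ever holds a real value.
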
| 79 | + |
| 80 | +**3. Modify the Dockerfile** |
| 81 | + |
| 82 | +Add the OpenShell Supervisor binary and wrap the runner entrypoint with `openshell-sandbox`: |
| 83 | + |
| 84 | +```dockerfile |
| 85 | +# Add OpenShell binary |
| 86 | +COPY --from=openshell/supervisor:latest /usr/bin/openshell-sandbox /usr/bin/openshell-sandbox |
| 87 | + |
| 88 | +# Entrypoint becomes: |
| 89 | +CMD ["openshell-sandbox", "--provider", "ambient", "--", \ |
| 90 | + "/bin/bash", "-c", "umask 0022 && cd /app/ambient-runner && uvicorn main:app --host 0.0.0.0 --port 8001"] |
| 91 | +``` |
| 92 | + |
| 93 | +The Supervisor wraps the uvicorn process, applying Landlock + seccomp + netns before exec. |
| 94 | + |
| 95 | +**4. Network policy via OpenShell** |
| 96 | + |
| 97 | +Replace the K8s `NetworkPolicy` with OpenShell's per-sandbox network namespace + OPA policy: |
| 98 | + |
| 99 | +```yaml |
| 100 | +network_policies: |
| 101 | + ambient_backend: |
| 102 | + name: ambient-backend-access |
| 103 | + endpoints: |
| 104 | + - host: backend-service.ambient-code.svc.cluster.local |
| 105 | + port: 8080 |
| 106 | + protocol: rest |
| 107 | + access: read-write |
| 108 | + binaries: |
| 109 | + - { path: /usr/bin/python3 } |
| 110 | + |
| 111 | + ambient_grpc: |
| 112 | + name: ambient-grpc-access |
| 113 | + endpoints: |
| 114 | + - host: ambient-api-server.ambient-code.svc.cluster.local |
| 115 | + port: 9000 |
| 116 | + protocol: connect |
| 117 | + access: read-write |
| 118 | + binaries: |
| 119 | + - { path: /usr/bin/python3 } |
| 120 | + |
| 121 | + github_api: |
| 122 | + name: github-api-access |
| 123 | + endpoints: |
| 124 | + - host: api.github.com |
| 125 | + port: 443 |
| 126 | + protocol: rest |
| 127 | + access: read-write |
| 128 | + |
| 129 | + anthropic_api: |
| 130 | + name: anthropic-api-access |
| 131 | + endpoints: |
| 132 | + - host: api.anthropic.com |
| 133 | + port: 443 |
| 134 | + protocol: rest |
| 135 | + access: read-write |
| 136 | + |
| 137 | + gitlab_api: |
| 138 | + name: gitlab-api-access |
| 139 | + endpoints: |
| 140 | + - host: "*.gitlab.com" |
| 141 | + port: 443 |
| 142 | + protocol: rest |
| 143 | + access: read-write |
| 144 | +``` |
| 145 | + |
| 146 | +**5. Modify `_grpc_client.py`** |
| 147 | + |
| 148 | +The gRPC channel to the API server needs to be whitelisted in OpenShell's network policy. Since it's intra-cluster, it routes through the proxy with credential rewriting. The `_build_channel()` function may need proxy-awareness if OpenShell's netns routes all TCP through the CONNECT proxy. |
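
One possible shape for that proxy-awareness is sketched below. It assumes the proxy address is exposed through the usual proxy environment variables and that the gRPC core channel arguments `grpc.http_proxy` and `grpc.enable_http_proxy` behave as documented for the installed `grpcio` version; both points should be verified. The helper name `proxy_channel_options` is hypothetical.

```python
import os

# Environment variables checked for a proxy URL, in priority order.
_PROXY_ENV_NAMES = ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy")


def proxy_channel_options(environ=None):
    """Return gRPC channel options that route the channel through the proxy.

    Returns an empty list when no proxy variable is set, so the channel
    keeps its default behaviour outside the sandbox.
    """
    environ = os.environ if environ is None else environ
    proxy = next((environ[n] for n in _PROXY_ENV_NAMES if environ.get(n)), None)
    if not proxy:
        return []
    return [
        ("grpc.enable_http_proxy", 1),  # assumed gRPC core channel argument
        ("grpc.http_proxy", proxy),     # assumed gRPC core channel argument
    ]
```

`_build_channel()` could then pass the result as `options=` to `grpc.insecure_channel(...)` or `grpc.secure_channel(...)`.
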
| 149 | + |
| 150 | +**6. Modify `bridges/claude/bridge.py`** |
| 151 | + |
| 152 | +Ensure the Claude CLI subprocess inherits `HTTP_PROXY`/`HTTPS_PROXY` so that it routes through the OpenShell proxy. OpenShell injects these variables automatically when the sandbox starts, so the bridge only needs to pass them through to the subprocess env. |
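
A minimal sketch of that pass-through follows, assuming the bridge builds an explicit `env` dict for the subprocess. The helper name `build_subprocess_env` and the inclusion of `NO_PROXY` and the lowercase variants are assumptions, not taken from `bridge.py`.

```python
import os

# Proxy variables to carry over; the lowercase variants and NO_PROXY are an assumption.
PROXY_VARS = (
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
    "http_proxy", "https_proxy", "no_proxy",
)


def build_subprocess_env(base_env, parent_env=None):
    """Return a copy of `base_env` with the parent's proxy settings applied.

    Values from the parent environment win, so a bridge-level setting
    cannot accidentally point the subprocess away from the sandbox proxy.
    """
    parent_env = os.environ if parent_env is None else parent_env
    env = dict(base_env)
    for name in PROXY_VARS:
        if name in parent_env:
            env[name] = parent_env[name]
    return env
```

The result can be handed to `subprocess.Popen(cmd, env=build_subprocess_env(base_env))` wherever the bridge launches the CLI.
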
| 153 | + |
| 154 | +**7. Operator changes** |
| 155 | + |
| 156 | +The Operator (`components/operator/`) configures OpenShell provider + policy per session Job: |
| 157 | + |
| 158 | +- Inject OpenShell provider config as a ConfigMap or Secret |
| 159 | +- Mount the Supervisor binary (or use a sidecar container) |
| 160 | +- Generate per-session OPA policies based on the session's credential bindings |
| 161 | +- Pass the policy YAML as a volume mount |
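
As an illustration of per-session policy generation, the sketch below builds a `network_policies` document containing only the endpoints a session is bound to. The endpoint values are copied from the policy YAML in step 4; the binding names used as keys and both function names are hypothetical, and the Operator itself may well implement this in another language.

```python
import json

# Endpoint details copied from the policy YAML in step 4.
# The binding names used as keys ("github", "gitlab", "anthropic") are hypothetical.
ENDPOINTS_BY_BINDING = {
    "github": ("github_api", "github-api-access", "api.github.com"),
    "gitlab": ("gitlab_api", "gitlab-api-access", "*.gitlab.com"),
    "anthropic": ("anthropic_api", "anthropic-api-access", "api.anthropic.com"),
}


def build_network_policies(bindings):
    """Return a policy document allowing only the endpoints the session is bound to."""
    policies = {}
    for binding in bindings:
        if binding not in ENDPOINTS_BY_BINDING:
            raise ValueError(f"no endpoint known for credential binding {binding!r}")
        key, name, host = ENDPOINTS_BY_BINDING[binding]
        policies[key] = {
            "name": name,
            "endpoints": [
                {"host": host, "port": 443, "protocol": "rest", "access": "read-write"}
            ],
        }
    return {"network_policies": policies}


def render_policy(bindings):
    """Serialize as JSON, which YAML 1.2 parsers also accept, to avoid a YAML dependency."""
    return json.dumps(build_network_policies(bindings), indent=2, sort_keys=True)
```

A session bound only to GitHub would get a policy with a single `github_api` entry, so the sandbox could not reach GitLab or Anthropic endpoints at all.
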
| 162 | + |
| 163 | +#### Files to Modify |
| 164 | + |
| 165 | +| File | Change | |
| 166 | +|------|--------| |
| 167 | +| `platform/auth.py` | `populate_runtime_credentials()` writes placeholders, not real tokens | |
| 168 | +| `platform/auth.py` | Token files (`/tmp/.ambient_*`) get placeholder values | |
| 169 | +| `platform/auth.py` | `install_git_credential_helper()` — helper returns placeholder; proxy rewrites | |
| 170 | +| `platform/auth.py` | `install_gh_wrapper()` — wrapper exports placeholder `GH_TOKEN` | |
| 171 | +| `_grpc_client.py` | Proxy-aware channel construction for intra-cluster gRPC | |
| 172 | +| `Dockerfile` | Add OpenShell Supervisor binary, modify CMD | |
| 173 | +| `bridges/claude/bridge.py` | Proxy env vars for Claude CLI subprocess | |
| 174 | +| `middleware/secret_redaction.py` | Keep as defense-in-depth (now truly redundant) | |
| 175 | +| `components/operator/` | Configure OpenShell provider + policy per session Job | |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +### Strategy 2: OpenShell as Pod Runtime (Operator-Level) |
| 180 | + |
| 181 | +The Operator spawns Jobs using an OpenShell-managed container runtime instead of raw K8s containers. The integration moves up a level — runner code doesn't change, but the Operator configures OpenShell as the execution environment. |
| 182 | + |
| 183 | +**Pros:** Zero runner code changes. |
| 184 | + |
| 185 | +**Cons:** Requires OpenShell's Kubernetes compute driver to be production-ready (currently alpha). Heavier Operator changes. Less control over per-session policy granularity from the runner's perspective. |
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +### Strategy 3: OpenShell Provider Bridge (Minimal, Credential-Only) |
| 190 | + |
| 191 | +Adopt only the credential placeholder/proxy pattern without the full sandbox. Write a thin Python adapter that: |
| 192 | + |
| 193 | +1. Starts a local HTTP CONNECT proxy in the runner pod |
| 194 | +2. Holds real secrets in proxy memory (separate process, higher privilege) |
| 195 | +3. Injects placeholders into `os.environ` |
| 196 | +4. Rewrites placeholders to real values on outbound requests |
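
The rewrite in step 4 can be sketched as a substitution over outbound header values. This is a minimal illustration, not OpenShell's implementation: it assumes the `openshell:resolve:env:<NAME>` placeholder format used earlier in this document, and the function names are hypothetical.

```python
import re

# Matches the placeholder format used in this document, capturing the variable name.
PLACEHOLDER_RE = re.compile(r"openshell:resolve:env:([A-Z][A-Z0-9_]*)")


def rewrite_placeholders(value, secrets):
    """Replace each known placeholder in `value` with its real secret.

    Placeholders with no matching entry in `secrets` are left untouched,
    so an unknown name never silently becomes an empty credential.
    """
    def _substitute(match):
        return secrets.get(match.group(1), match.group(0))

    return PLACEHOLDER_RE.sub(_substitute, value)


def rewrite_headers(headers, secrets):
    """Apply the rewrite to every header value of an outbound request."""
    return {name: rewrite_placeholders(v, secrets) for name, v in headers.items()}
```

Two caveats apply to any proxy built this way. For HTTPS traffic the proxy only sees headers if it terminates TLS. Credentials sent with HTTP Basic authentication are base64-encoded, so a plain substring match would miss them unless the proxy decodes the header first.
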
| 197 | + |
| 198 | +**Pros:** No Rust dependency, no kernel features (Landlock/seccomp) needed. Works on any kernel version. Smallest change surface. |
| 199 | + |
| 200 | +**Cons:** No Landlock/seccomp/netns isolation — only credential isolation. Agent can still bypass the proxy if it makes raw socket calls (no network namespace enforcement). No L7 inspection or OPA policy evaluation. |
| 201 | + |
| 202 | +--- |
| 203 | + |
| 204 | +## Strategy Comparison |
| 205 | + |
| 206 | +| Criterion | Strategy 1 (Sidecar) | Strategy 2 (Pod Runtime) | Strategy 3 (Proxy Only) | |
| 207 | +|-----------|---------------------|------------------------|------------------------| |
| 208 | +| Credential isolation | Full (placeholder/proxy) | Full (placeholder/proxy) | Partial (no netns enforcement) | |
| 209 | +| Network isolation | Full (netns + iptables) | Full (netns + iptables) | None | |
| 210 | +| Filesystem isolation | Landlock LSM | Landlock LSM | None | |
| 211 | +| Syscall filtering | seccomp-BPF | seccomp-BPF | None | |
| 212 | +| L7 inspection (OPA) | Yes | Yes | No | |
| 213 | +| Runner code changes | Moderate (`auth.py`, `_grpc_client.py`, `bridge.py`, `Dockerfile`) | None | Small (new proxy module) | |
| 214 | +| Operator changes | Moderate (provider + policy config) | Heavy (new compute driver) | None | |
| 215 | +| Kernel requirements | Linux 5.13+ (Landlock) | Linux 5.13+ (Landlock) | None | |
| 216 | +| OpenShell maturity dependency | Supervisor (stable) | K8s driver (alpha) | None (custom code) | |
| 217 | +| Defense depth | 5 layers | 5 layers | 1 layer | |
| 218 | + |
| 219 | +--- |
| 220 | + |
| 221 | +## Recommendation |
| 222 | + |
| 223 | +**Strategy 1 (Sidecar Supervisor)** is the right path. It provides: |
| 224 | + |
| 225 | +- Agent never sees real secrets (even `/proc/self/environ` inspection yields only placeholders) |
| 226 | +- L7 inspection via OPA policies (audit which APIs the agent calls) |
| 227 | +- Landlock + seccomp hardening within the container |
| 228 | +- Binary identity via SHA256 TOFU (only known binaries can make network calls) |
| 229 | +- The existing `secret_redaction.py` becomes a true defense-in-depth layer rather than the primary defense |
| 230 | + |
| 231 | +The critical architectural insight: OpenShell's credential proxy pattern eliminates the single point of failure in the current design. Today, `populate_runtime_credentials()` puts real secrets into a space the agent fully controls. OpenShell moves real secrets into Supervisor memory — a separate privilege domain the agent cannot access. |
| 232 | + |
| 233 | +### Prerequisite: Kernel Version |
| 234 | + |
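| 235 | +OpenShell's Landlock support requires Linux 5.13+. Containers share the host kernel, so this requirement applies to the cluster nodes rather than to the runner image (UBI 10). Nodes running RHEL 10 ship kernel 6.x, which satisfies it; nodes on other OS versions should be checked. OpenShell's `best_effort` Landlock mode also provides graceful degradation if the kernel lacks support. |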
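
A quick way to check the version prerequisite from inside a pod is sketched below. The function names are hypothetical. A passing version check is necessary but not sufficient, since Landlock must also be enabled in the kernel build and in the active LSM list.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.14.0-427.el9'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_landlock(release=None):
    """Return True if the kernel version is at least 5.13.

    Defaults to the running kernel, which inside a container is the host's.
    """
    release = platform.release() if release is None else release
    return parse_kernel_release(release) >= (5, 13)
```

Calling `kernel_supports_landlock()` with no argument inspects the node the pod is scheduled on, which is the kernel that matters here.
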
| 236 | + |
| 237 | +### Migration Path |
| 238 | + |
| 239 | +1. **Phase 1: Credential proxy only (Strategy 3).** Ship a Python-only credential proxy as a proof of concept. This validates that the placeholder/rewrite pattern works with the git credential helper, the `gh` wrapper, and the Claude CLI without requiring the OpenShell binary. |
| 240 | + |
| 241 | +2. **Phase 2: Sidecar Supervisor (Strategy 1).** Add the OpenShell Supervisor binary, network namespace isolation, Landlock, and seccomp. This is the production target. |
| 242 | + |
| 243 | +3. **Phase 3: OPA policies.** Add L7 inspection with per-session OPA policies generated by the Operator from the session's credential bindings and project settings. |