Deliver a reviewed managed-SCM containment walking skeleton for Cloud Agent sandboxes: use one catch-all outbound handler, preserve managed GitHub support with default-HTTPS LFS repository-control validation, add GitLab HTTPS support without host preregistration, and enable DIND only after proving nested routing and propagation of Cloudflare's private runtime HTTPS-interception CA/trusted bundle.

## Current State

- The walking skeleton is implemented, independently reviewed task by task, fixed where required, and locally validated.
- The walking skeleton is committed as `ab4fe320e`. The standalone `services/git-session-proxy` foundation is parked on the `quilled-meteoroid` worktree for possible later relay hardening; this active branch removes it and relies on Cloudflare outbound HTTP(S) interception rather than proxy-service wiring.
- Deployment order is mandatory: provision the SCM capability secret, deploy `git-token-service`, then deploy `cloud-agent-next`.

## Implemented Architecture

Eligible sandboxes use one HTTP(S) boundary:

```ts
Sandbox.outbound = handleManagedScmOutbound;
```

| Request class | Implemented behavior |
|---|---|
| Unmatched request | Pass through unchanged. |
| Recognized Kilo capability carrier | Redeem server-side; fail closed when invalid, including malformed whitespace/tab carrier cases. |
| Redeemed managed request | Replace sandbox-visible capability auth with redeemed provider auth outside the sandbox. |
| Redirect from redeemed request | Follow manually so managed auth is forwarded only after target validation. |
| Cross-provider or unsupported recognized carrier | Fail closed rather than falling back to raw forwarding. |

Provider-issued signed LFS action URLs and headers intentionally remain visible to the sandbox in this skeleton.
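
The redirect row in the table above can be sketched as a pure decision function. This is a minimal sketch, not the handler's actual code: `decideRedirect`, `RedirectDecision`, the hop limit, and the choice to refuse outright when validation fails (rather than follow without managed auth) are all assumptions made for illustration.

```ts
// Hypothetical decision type; the real handler's internals are not shown in this document.
type RedirectDecision =
  | { kind: "not-a-redirect" }
  | { kind: "follow"; url: string }
  | { kind: "fail-closed"; reason: string };

const REDIRECT_STATUSES = [301, 302, 303, 307, 308];

function decideRedirect(
  requestUrl: string,
  status: number,
  location: string | null,
  isValidatedTarget: (target: URL) => boolean, // caller-supplied origin/path validation
  hopsRemaining: number,
): RedirectDecision {
  if (REDIRECT_STATUSES.indexOf(status) === -1) return { kind: "not-a-redirect" };
  if (location === null) return { kind: "fail-closed", reason: "redirect without Location" };
  if (hopsRemaining <= 0) return { kind: "fail-closed", reason: "too many redirects" };

  let target: URL;
  try {
    // Resolve relative Location values against the URL that was requested.
    target = new URL(location, requestUrl);
  } catch {
    return { kind: "fail-closed", reason: "unparseable Location" };
  }
  if (target.protocol !== "https:") return { kind: "fail-closed", reason: "non-HTTPS target" };
  // Managed auth may only be attached after the resolved target passes validation.
  if (!isValidatedTarget(target)) return { kind: "fail-closed", reason: "target failed validation" };
  return { kind: "follow", url: target.href };
}
```

A caller would issue the upstream request with automatic redirects disabled, pass each 3xx response through a function like this, and attach the redeemed provider auth only to a `follow` result.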

## Implemented Provider Coverage

| Surface | GitHub | GitLab |
|---|---|---|
| Capability | `kgh1.<opaque>` marker with one-hour encrypted claims. | `kgl1.<opaque>` marker with one-hour encrypted claims, separate GitLab purpose, and the shared encryption secret. |
| Origin | Existing GitHub origins. | `gitlab.com` and active self-managed standard HTTPS integration origins on port `443`. |
| Repository path | Exact GitHub repository validation. | Exact nested namespace project validation, for example `group/subgroup/project`. |
| LFS control | Repository-bound `POST .../.git/info/lfs/objects/batch` and `POST .../.git/info/lfs/locks/verify`. | Repository-bound batch and lock verification. |
| CLI API | Existing broad `api.github.com` compatibility for `gh`. | Broad `/api/v4/**` and `/api/graphql` compatibility for `glab`. |
| Managed auth rewrite | Redeemed GitHub auth. | Basic, Bearer, and `PRIVATE-TOKEN` rewriting for managed OAuth/PAT auth. |
| Explicit profile token | Outside managed containment. | Pass through unchanged as intentional user-controlled auth. |

GitLab integration handling uses sanitized refresh logging and per-use database clients. Eligible GitLab session preparation emits a canonical `.git` remote URL, trusted `GITLAB_HOST`, and a capability-backed `GITLAB_TOKEN`; raw managed-auth fallback has been removed.
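
To illustrate what "repository-bound" LFS control could mean for a nested GitLab project, here is a minimal sketch. It assumes the control endpoints sit under `<project>.git/info/lfs/` (the usual Git LFS convention) and that the claimed project path comes from the redeemed capability. The name `isRepositoryBoundLfsControl` is hypothetical, and the real validation may also normalize encodings, which this sketch does not.

```ts
// Hypothetical helper: true only for POSTs to the two LFS control endpoints
// directly under the exact claimed project path.
function isRepositoryBoundLfsControl(
  method: string,
  pathname: string,
  claimedProject: string, // e.g. "group/subgroup/project", taken from the capability claims
): boolean {
  if (method !== "POST") return false;
  // Assumption: LFS control endpoints live under "<project>.git/info/lfs/".
  const base = "/" + claimedProject + ".git/info/lfs/";
  // Exact string equality, not a prefix match, so a claim on "group/subgroup"
  // cannot authorize requests against "group/subgroup/project".
  return pathname === base + "objects/batch" || pathname === base + "locks/verify";
}
```

Comparing whole paths rather than prefixes is the design point: sibling projects with similar names and parent namespaces both fall outside the claim.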

## Capability Marker Decision

| Provider | Capability marker |
|---|---|
| GitHub | `kgh1.<opaque>` |
| GitLab | `kgl1.<opaque>` |

The short prefix routes the provider codec, fails closed for unsupported formats, and versions the marker. It is not the security boundary: authenticated AES-GCM claims remain authoritative. The `kgh1.` / `kgl1.` rollout intentionally invalidates previously issued verbose-marker capabilities. Coordinate rollout in the required order - provision the SCM capability secret, deploy `git-token-service`, then deploy `cloud-agent-next` - or accept up to one hour of transient failures for in-flight old capabilities.
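
The three jobs of the prefix (route, fail closed, version) can be sketched as below. This is an illustrative sketch only: `routeMarker`, `MarkerRoute`, and the regular expression are assumptions, and how the real codec recognizes an unsupported format is not described here. A `route` result only selects a codec; the AES-GCM claims would still have to authenticate.

```ts
type Provider = "github" | "gitlab";

// Hypothetical result type for prefix routing.
type MarkerRoute =
  | { kind: "not-a-marker" } // no recognized prefix family: treat as an ordinary credential
  | { kind: "unsupported"; reason: string } // recognized family, but must fail closed
  | { kind: "route"; provider: Provider; opaque: string };

// Assumption: markers look like k<gh|gl><version>.<opaque>. [\s\S]* captures
// everything after the dot so stray whitespace is rejected, not silently ignored.
const MARKER = /^k(gh|gl)(\d+)\.([\s\S]*)$/;

function routeMarker(value: string): MarkerRoute {
  const m = MARKER.exec(value);
  if (m === null) return { kind: "not-a-marker" };
  const family = m[1] ?? "";
  const version = m[2] ?? "";
  const opaque = m[3] ?? "";
  if (version !== "1") return { kind: "unsupported", reason: "unknown marker version" };
  if (opaque.length === 0 || /\s/.test(opaque)) {
    return { kind: "unsupported", reason: "malformed opaque payload" };
  }
  return { kind: "route", provider: family === "gh" ? "github" : "gitlab", opaque };
}
```

Keeping `not-a-marker` distinct from `unsupported` mirrors the split between unmatched requests, which pass through, and recognized carriers, which must never fall back to raw forwarding.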

Fresh capabilities are issued on every dispatched message or command. Remotes and environment are refreshed before prompt delivery, so timer refresh is unnecessary for the skeleton. Only autonomous turns or terminal usage extending beyond one hour remain edge cases.
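
The edge case is simple arithmetic on the one-hour lifetime. The sketch below is hypothetical: `outlivesCapability` and its parameters are illustrative names, and it assumes the issuer tracks the issue time itself, since the claims are encrypted and opaque to the sandbox.

```ts
const CAPABILITY_LIFETIME_MS = 60 * 60 * 1000; // one hour, per the marker decision above

// Hypothetical helper: would work that starts now and runs for expectedRunMs
// still be in progress when a capability issued at issuedAtMs expires?
function outlivesCapability(issuedAtMs: number, nowMs: number, expectedRunMs: number): boolean {
  return nowMs + expectedRunMs >= issuedAtMs + CAPABILITY_LIFETIME_MS;
}
```

With a capability issued at dispatch time, a 20-minute turn stays inside the window, while a 90-minute autonomous turn or an idle terminal session does not.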

## DIND Result

The nested-DIND real-Git rewrite probe proved that `--network=host` supplies routing to the catch-all boundary and that nested devcontainers require propagation of Cloudflare's private runtime HTTPS-interception CA/trusted bundle.

`SandboxDIND` catch-all interception is enabled. Managed GitHub and GitLab DIND preparation/wrapper paths use capabilities. Devcontainer setup copies the outer trusted CA bundle to a stable session-home path and injects trust environment variables. This does not imply that provider certificates are the production issue or that the runtime interception certificate is necessarily self-signed. The local missing-bundle negative control empirically returned a TLS rejection matching `server certificate verification failed|SSL certificate problem|certificate verify failed|self-signed certificate in certificate chain`; preserve that as observed probe output rather than a production certificate diagnosis. Probes clean up invocation artifacts.
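
As a rough illustration of trust injection and the negative control, the sketch below builds a set of CA-bundle environment variables and classifies probe output with the rejection pattern quoted above. The variable set is an assumption: these are variables that common tools are known to read, and the ones the real devcontainer setup injects are not listed in this document. `buildTrustEnv`, `isTlsTrustRejection`, and the example path are hypothetical.

```ts
// Rejection pattern copied from the observed negative-control output above.
const TLS_REJECTION =
  /server certificate verification failed|SSL certificate problem|certificate verify failed|self-signed certificate in certificate chain/;

// Hypothetical helper: point common clients at the copied CA bundle.
function buildTrustEnv(bundlePath: string): Record<string, string> {
  return {
    SSL_CERT_FILE: bundlePath, // OpenSSL-based clients
    GIT_SSL_CAINFO: bundlePath, // git over HTTPS
    CURL_CA_BUNDLE: bundlePath, // curl
    NODE_EXTRA_CA_CERTS: bundlePath, // Node.js (adds to the built-in roots)
    REQUESTS_CA_BUNDLE: bundlePath, // Python requests
  };
}

// Hypothetical helper: does probe output look like the missing-bundle rejection?
function isTlsTrustRejection(probeOutput: string): boolean {
  return TLS_REJECTION.test(probeOutput);
}

// Example with a made-up session-home path.
const trustEnv = buildTrustEnv("/home/agent/.session/ca-bundle.pem");
```

A match from `isTlsTrustRejection` only says the probe saw a trust failure; as noted above, it is not a diagnosis of production certificates.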

## Completion Record

| Gate | Status | Evidence |
|---|---|---|
| Task 1: catch-all and GitHub LFS | Complete, reviewed, fixed | Catch-all `Sandbox.outbound`; GitHub LFS batch and lock verification; fail-closed recognized carriers including whitespace/tab; signed actions remain sandbox-visible. |
| Capability markers | Approved | GitHub `kgh1.<opaque>`; GitLab `kgl1.<opaque>`; short routing/fail-closed/version prefix only; AES-GCM claims remain authoritative; dispatch refresh makes one-hour expiry a long-running edge case. |
| Task 4/4b: DIND | Complete, reviewed | Probe proved host-network routing and nested propagation of Cloudflare's private runtime HTTPS-interception CA/trusted bundle; `SandboxDIND` catch-all enabled; GitHub/GitLab DIND paths use capabilities; devcontainer trust injection and probe cleanup implemented. |
| Final validation | Complete | Token service `102` tests; Cloud Agent `1545` passed, `3` skipped; changed-package typecheck `10/53`; both probes passed; whitespace clean; no review blockers remain. |

## Validation Caveat

Full Cloud Agent wrapper validation encountered an unchanged committed baseline timing-sensitive flake in `wrapper/src/lifecycle.test.ts`: `clears aborted state when activity cancels an aborted drain`. Its fixed `50 ms` wait races a real branch subprocess. Marker-focused and package checks pass. Track wrapper test stabilization separately rather than bundling it into the SCM diff.

## Containment Claims

| Path | Skeleton claim |
|---|---|
| Managed GitHub, eligible sandbox including DIND | Contained for recognized capability-bearing smart HTTP, broad `gh` API, and repository-bound LFS control requests. |
| Managed GitLab OAuth/PAT, eligible sandbox including DIND | Contained for recognized capability-bearing smart HTTP, broad `glab` API, and repository-bound LFS control requests on claimed standard HTTPS origins. |
| Provider-issued signed LFS actions | Not contained; action URLs and provider headers remain sandbox-visible. |
| Explicit profile tokens | Not contained; intentional pass-through. |

## Follow-up Discussions

These are explicit follow-ups, not blockers for this walking skeleton:

| Area | Follow-up |
|---|---|
| Provider-signed LFS actions | Use the standalone relay parked on `quilled-meteoroid` later if a stronger boundary is required. |
| Self-managed GitLab origins | Add SSRF hardening and admin allowlisting for active-integration approval. |
| Capability continuity | Add refresh within long-running autonomous turns or terminal sessions that outlive the one-hour capability lifetime; dispatched messages and commands already refresh remotes and environment before prompt delivery. |
| Wrapper test stability | Stabilize the unchanged baseline timing-sensitive lifecycle test separately from the SCM diff. |
| Capability carriers | Harden query/body carrier handling. |
| Nested trust | Cover propagation of Cloudflare's private runtime HTTPS-interception CA/trusted bundle into Dockerfile build stages, unusual custom images/trust stores, CA rotation, and non-host nested networks. |
| Cleanup behavior | Cover abrupt cleanup. The stale `dev/local` standalone proxy WIP is isolated on `quilled-meteoroid`. |