Commit d8a390a

docs(plans): multi-node roadmap for the volume-permission direction
Records the options ladder for 'nodes joining my cluster' (far-future, not scheduled) so near-term permission decisions don't foreclose it: the host-FS-canonical coupling and its four conditions, the agent-homes-only scope, what breaks on day one of a join (WaitForFirstConsumer + no nodeSelector in any render), options 0-4 (home-node pattern -> NFS -> distributed storage -> API-mediated access -> hybrid by data class), join mechanics (native k3s server + remote agents; macOS stays k3d), and the decisions that bind today: group-1000 sharing over render-time UID matching, home-node pinning as a join prerequisite, inputs to API objects per Remaining Debt.
1 parent c667432 commit d8a390a

1 file changed

Lines changed: 104 additions & 0 deletions

plans/volume-permission-hardening.md

@@ -96,3 +96,107 @@ to Kubernetes-native inputs without making editable skills/workspaces immutable:
Until that migration is done, do not reintroduce global provisioner chown,
root init containers, or UID 1000 alignment as the primary fix for local-path
PVC permissions.

## Roadmap: multi-node ("nodes joining my cluster")

Not scheduled; far-future. Recorded here so near-term permission decisions
don't foreclose it.

### The coupling we accepted

"Host FS is canonical" is an identity: one directory is simultaneously a
plain host path the CLI writes (`SeedHostFiles`, `syncRuntimeFiles`, wallet
staging, flow-16 host asserts) AND the backing store of a PVC a pod mounts.
The identity holds only while ALL of these are true:

1. The PV is path-addressable on a node (local-path/hostPath, not block or
   network storage).
2. The node's FS is the host's FS (k3d bind-mount of `OBOL_DATA_DIR`, or
   native k3s where host == node).
3. There is effectively one node, so "the node that has the directory" and
   "the node the pod runs on" cannot diverge.
4. Host and container share UID/GID semantics (same kernel; on macOS faked
   by Docker Desktop file sharing).

A second node breaks (2) and (3) independently. Every permission mechanism
in this plan (group sharing, fsGroup walks, the removed chowns) operates on
(4) and silently assumes (1)–(3).

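The four conditions can be read as a checklist. Below is a minimal Go sketch of that checklist, assuming a hypothetical `Topology` struct and `brokenConditions` helper; neither name exists in the codebase, and the fields are illustrative stand-ins for facts a real check would have to discover from the cluster.

```go
package main

import "fmt"

// Topology is a hypothetical summary of the facts the four coupling
// conditions depend on. It is not a type from the obol codebase.
type Topology struct {
	PathAddressablePV bool // (1) local-path/hostPath, not block or network storage
	NodeFSIsHostFS    bool // (2) k3d bind-mount, or native k3s where host == node
	NodeCount         int  // (3) effectively one node
	SharedIDSemantics bool // (4) host and container agree on UID/GID meaning
}

// brokenConditions returns the numbers of the coupling conditions that
// no longer hold for the given topology, in ascending order.
func brokenConditions(t Topology) []int {
	var broken []int
	if !t.PathAddressablePV {
		broken = append(broken, 1)
	}
	if !t.NodeFSIsHostFS {
		broken = append(broken, 2)
	}
	if t.NodeCount != 1 {
		broken = append(broken, 3)
	}
	if !t.SharedIDSemantics {
		broken = append(broken, 4)
	}
	return broken
}

func main() {
	single := Topology{PathAddressablePV: true, NodeFSIsHostFS: true, NodeCount: 1, SharedIDSemantics: true}
	// A joined node: its FS is not the host's FS, and there are now two nodes.
	joined := Topology{PathAddressablePV: true, NodeFSIsHostFS: false, NodeCount: 2, SharedIDSemantics: true}

	fmt.Println("single node:", brokenConditions(single))
	fmt.Println("after join: ", brokenConditions(joined))
}
```

Condition (4) stays intact in the joined case, which is the point of the paragraph above: the permission mechanisms keep "working" while the assumptions underneath them have already failed.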
### Scope of the coupling: agent homes only

| Data | Host access needed? |
|---|---|
| hermes-data / agent homes (config, SOUL.md, skills) | Yes (the product promise) |
| OpenClaw skills | Yes |
| x402-buyer-state (consumed.json) | No (pod-private) |
| Chain data (reth, aztec) | No (pod-private) |
| Wallet keystores | Only at creation; Remaining Debt moves them to Secrets |

Scaling out means shrinking the host-canonical surface to the first two
rows, not replicating it onto every node.

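One way to keep that surface explicit is to carry the table as data, so tooling can name exactly which volumes are host-canonical. The following is a sketch only: the `DataClass` type, the `HostAccess` values, and the `hostCanonical` helper are hypothetical names introduced for illustration, and the entries simply mirror the table rows.

```go
package main

import "fmt"

// HostAccess classifies how much host visibility a data class needs.
// These values are illustrative; they mirror the table, not real code.
type HostAccess int

const (
	PodPrivate    HostAccess = iota // host never needs to read or write it
	CreationOnly                    // host touches it once, at creation
	HostCanonical                   // host path and PVC backing store are the same directory
)

// DataClass is a hypothetical record for one row of the table.
type DataClass struct {
	Name   string
	Access HostAccess
}

var dataClasses = []DataClass{
	{"hermes-data / agent homes", HostCanonical},
	{"OpenClaw skills", HostCanonical},
	{"x402-buyer-state", PodPrivate},
	{"chain data (reth, aztec)", PodPrivate},
	{"wallet keystores", CreationOnly},
}

// hostCanonical returns the names of the classes that must stay
// reachable from the host filesystem.
func hostCanonical(classes []DataClass) []string {
	var names []string
	for _, c := range classes {
		if c.Access == HostCanonical {
			names = append(names, c.Name)
		}
	}
	return names
}

func main() {
	for _, name := range hostCanonical(dataClasses) {
		fmt.Println("host-canonical:", name)
	}
}
```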
### What breaks on day one of a join (as of this writing)

- `local-path` is `volumeBindingMode: WaitForFirstConsumer` and NO render
  (hermes.go, agent_render.go, llm.yaml) sets a nodeSelector/affinity: a new
  agent's pod can schedule on node B, the PV is provisioned on node B's
  disk, while `SeedHostFiles` writes to `$DATA_DIR` on the home host. The
  pod boots against an empty home with no error anywhere.
- `ensureVolumeWritable` is a `docker exec` into the k3d node container, so
  it has no transport to a remote node (it already early-returns on the k3s
  backend).
- Existing PVs pin pods to their node forever via nodeAffinity, so wrong
  first placements are sticky.

### Options ladder (increasing decoupling)

0. **Home-node pattern (recommended v1, prerequisite for any join path).**
   Label the home node (`obol.org/home=true`), render nodeSelector into
   every host-canonical workload (hermes master, CRD agents, litellm/buyer).
   Joined nodes take stateless or pod-private work only: vLLM/Ollama
   upstreams, network nodes (the biggest storage consumers, zero host
   visibility needed), demo servers. Agents cannot migrate; home node is
   the SPOF; cheap and non-breaking.
1. **Host exports `$DATA_DIR` over NFS** (csi-driver-nfs). Agent-home PVCs
   become RWX network mounts; files still physically live on the host so
   direct editing keeps working. `all_squash,anonuid=1000` solves ownership
   flapping at the protocol level. Hard caveat: Hermes `state.db` is SQLite,
   and SQLite over NFS is a corruption hazard. The workable shape is inputs
   over NFS + state.db on a node-local PV, which is already half of option 3.
2. **Distributed storage (Longhorn/Rook).** Solves migration, destroys host
   access entirely, far too heavy for local-first. Ruled out except as the
   storage class users bring on managed k8s.
3. **API-mediated host access (the #610 direction, revisited).** Inputs
   (config, SOUL.md, skills, markers) as ConfigMaps/Secrets/OCI artifacts,
   delivered to any node and checksum-rolled; state as pod-private PVs on any
   provisioner; host access becomes a verb (`obol agent fs ls|cat|edit|cp`
   over kubectl exec/cp or a sidecar) instead of a shared mount. Survives
   arbitrary topology and PSS-restricted namespaces. Cost: live-editing a
   skill needs a sync round-trip instead of `:w`. That is a UX problem,
   solvable with a `--watch` loop; it is why #610 was reverted and why it
   comes back when (3) in the coupling list stops being true.
4. **Hybrid by data class (target end-state).** Operator-authored inputs →
   API objects; machine state → pod-private PVs; human-inspectable outputs →
   `obol agent fs` or a write-once RWX exports share. Single-node k3d
   keeps the local-path fast path as an optimization, not as the contract.

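Option 0 comes down to one rendering rule: host-canonical workloads get a nodeSelector for the home-node label, everything else is left schedulable anywhere. A minimal sketch of that rule follows. The `Workload` and `PodSpec` types are local stand-ins invented for this example (a real render would populate the `NodeSelector` field of the Kubernetes pod spec, which is also a `map[string]string`), and `pinToHomeNode` is a hypothetical helper name. Only the label `obol.org/home=true` comes from the plan itself.

```go
package main

import "fmt"

// homeNodeLabel is the label key proposed in option 0.
const homeNodeLabel = "obol.org/home"

// PodSpec is a local stand-in for the part of a pod spec this sketch
// touches; it is not the Kubernetes API type.
type PodSpec struct {
	NodeSelector map[string]string
}

// Workload is a hypothetical render input.
type Workload struct {
	Name          string
	HostCanonical bool // true when the host must see the volume's files
	Spec          PodSpec
}

// pinToHomeNode returns a copy of w whose nodeSelector requires the home
// node, but only for host-canonical workloads. Existing selector entries
// are preserved and the input map is not mutated.
func pinToHomeNode(w Workload) Workload {
	if !w.HostCanonical {
		return w
	}
	selector := make(map[string]string, len(w.Spec.NodeSelector)+1)
	for k, v := range w.Spec.NodeSelector {
		selector[k] = v
	}
	selector[homeNodeLabel] = "true"
	w.Spec.NodeSelector = selector
	return w
}

func main() {
	workloads := []Workload{
		{Name: "hermes", HostCanonical: true},
		{Name: "vllm-upstream", HostCanonical: false},
	}
	for _, w := range workloads {
		pinned := pinToHomeNode(w)
		fmt.Printf("%s nodeSelector=%v\n", pinned.Name, pinned.Spec.NodeSelector)
	}
}
```

Keeping the rule in one function means a workload that later moves to API-delivered inputs (options 3 and 4) only has to flip `HostCanonical` to become schedulable on joined nodes.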
### Join mechanics

k3d `node create` only adds agent containers on the same host. The real
multi-node story is the native k3s backend: Linux home runs `k3s server`
(host FS == node FS, home-node pattern costs nothing), remote boxes join
with `k3s agent --server ... --token ...`. macOS stays single-node k3d;
remote GPU capacity is better reached as an external endpoint
(`obol model setup custom --endpoint http://gpu-box:8000/v1`), which the
stack already supports and which sidesteps this entire section.

### Decisions binding today

- Prefer pure group-1000 sharing (setgid dirs, g+rw, nobody chowns owners)
  over render-time `os.Getuid()` UID matching: group sharing is
  topology-neutral; UID matching bakes one machine's identity into
  manifests and deepens the coupling.
- Before any join path ships: home-node nodeSelector rendering + a
  join-time preflight that names the host-canonical volumes pinned to the
  home node.
- Input migration to API objects proceeds on the Remaining Debt schedule
  above; it is also the multi-node prerequisite.
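
The first decision (setgid dirs, g+rw, owners untouched) can be sketched with the Go standard library alone. This is an illustration under assumptions, not the project's implementation: `sharedMode`, `shareTree`, and `demo` are hypothetical names, and the demo uses the caller's own group via `os.Getgid()` instead of GID 1000 so that it runs without privileges.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// sharedMode returns the mode a path should have under group sharing:
// directories gain setgid plus group rwx, regular files gain group rw.
func sharedMode(mode fs.FileMode) fs.FileMode {
	if mode.IsDir() {
		return mode | fs.ModeSetgid | 0o070
	}
	return mode | 0o060
}

// shareTree applies group sharing to everything under root.
// Passing uid -1 to Lchown leaves every owner untouched.
func shareTree(root string, gid int) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if !info.Mode().IsDir() && !info.Mode().IsRegular() {
			return nil // skip symlinks, sockets, devices
		}
		if err := os.Lchown(path, -1, gid); err != nil {
			return err
		}
		// Chmod after the group change, since changing ownership can clear setgid.
		return os.Chmod(path, sharedMode(info.Mode()))
	})
}

// demo builds a throwaway "agent home", shares it with the caller's own
// group, and reports whether the expected bits are set.
func demo() (bool, bool, error) {
	root, err := os.MkdirTemp("", "agent-home-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	file := filepath.Join(root, "SOUL.md")
	if err := os.WriteFile(file, []byte("hello\n"), 0o600); err != nil {
		return false, false, err
	}
	if err := shareTree(root, os.Getgid()); err != nil {
		return false, false, err
	}
	dirInfo, err := os.Stat(root)
	if err != nil {
		return false, false, err
	}
	fileInfo, err := os.Stat(file)
	if err != nil {
		return false, false, err
	}
	dirSetgid := dirInfo.Mode()&fs.ModeSetgid != 0
	fileGroupRW := fileInfo.Mode().Perm()&0o060 == 0o060
	return dirSetgid, fileGroupRW, nil
}

func main() {
	dirSetgid, fileGroupRW, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("directory setgid:", dirSetgid)
	fmt.Println("file group rw:   ", fileGroupRW)
}
```

Nothing in the sketch reads the caller's UID, which is what makes the approach topology-neutral: the same modes are correct on any node where the shared group exists.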
