Commit b170343

author
Daniel
committed
adding deployment-layers docs
1 parent aaad305 commit b170343

2 files changed

Lines changed: 301 additions & 0 deletions

go/docs/deployment-layers.md

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
1+
# Deployment layers: from workflow requirements to running code
2+
3+
A deployed workflow is **binding-free**: it declares *what* it needs (a GPIO input,
4+
an MQTT topic, a custom model) but says nothing about *where* those live on this
5+
particular device or network. Turning that abstract requirement into a live driver
6+
handle, an open broker connection, or a registered LLM endpoint is the job of the
7+
deploy plumbing. It happens in three layers, joined by two mappings:
8+
9+
```
┌──────────────────────────────────────────────────────────────────────────────────┐
│ LAYER 1: Workflow requirements                                                   │
│ Channels[] + Models[]  (declared in the workflow, keyed by logical id)           │
│ "I need a GPIO input `door_sensor`, an MQTT channel `alarm`, model `mistral-7b`" │
└──────────────────────────────────────────────────────────────────────────────────┘

        │  DeploymentMapping : logical id ─► ResourceBinding{ ref, index? }
        │  (one entry per channel id and per declared model id)

┌──────────────────────────────────────────────────────────────────────────────────┐
│ LAYER 2: Resolved configs (the "where")                                          │
│                                                                                  │
│ DeviceManifest     ◄── boot-time, device-owned, NOT swappable                    │
│   gpios/adcs/dacs/serials/pwms : ref ─► {chip|device|port}                       │
│                                                                                  │
│ ExternalResources  ◄── deploy-time, delivered with each /deploy                  │
│   MQTTs     : ref ─► MQTTConnection {brokerUrl, prefixes, will, ...}             │
│   Providers : ref ─► LLMProviderConfig {url, apiKey, model}                      │
└──────────────────────────────────────────────────────────────────────────────────┘

        │  engine registries (built once per boot / per deploy)

┌──────────────────────────────────────────────────────────────────────────────────┐
│ LAYER 3: Code implementations                                                    │
│ driver.Registry   : ref ─► GPIODriver / ADCDriver / ... (opened at boot)         │
│ transport.Registry: ref ─► MQTTTransport (paho conn, opened at deploy)           │
│ llmproxy.Client   : modelID ─► selfhosted.Provider (registered at deploy)        │
└──────────────────────────────────────────────────────────────────────────────────┘
```
39+
40+
The two arrows are the whole story:
41+
42+
1. **DeploymentMapping** binds each *logical id* the workflow declares to a *platform
43+
resource id* (`ref`), plus an optional physical sub-address (`index`).
44+
2. **Engine registries** turn each platform resource id (and its config) into a live
45+
code object.
46+
47+
`ref` is a *sharing identity*: many workflow channels can point at the same `ref`,
48+
and the engine opens that driver / transport exactly once and shares the handle.
49+
50+
---
51+
52+
## Layer 1 — what the workflow declares
53+
54+
A workflow carries two binding-free requirement lists (`contract/workflow.yaml`):
55+
56+
**`Channels[]`** — a discriminated union (`type`) of hardware and transport needs.
57+
Each channel has a logical `id` and type-specific config that is *intrinsic to the
58+
workflow*, not to the device:
59+
60+
| Channel type | Logical config (workflow-owned) | Needs `index`? | Resource pool     |
|--------------|---------------------------------|:--------------:|-------------------|
| `GPIOIN`     | `bias`, `debounceMs`            | yes (line)     | DeviceManifest    |
| `GPIOOUT`    |                                 | yes (line)     | DeviceManifest    |
| `ADC`        |                                 | yes (channel)  | DeviceManifest    |
| `DAC`        |                                 | yes (channel)  | DeviceManifest    |
| `PWM`        | `frequency`                     | yes (channel)  | DeviceManifest    |
| `UART`       |                                 | no             | DeviceManifest    |
| `MQTT`       | `topic`                         | no             | ExternalResources |
69+
70+
**`Models[]`** — declared custom/self-hosted `LLMModel`s (`id`, `label`,
71+
`capabilities`). Static catalog models (built into the llmproxy) are referenced by
72+
id from agent nodes and need **no** declaration here; only custom models are listed,
73+
because only they need a deploy-time endpoint.
74+
75+
The split is deliberate: `frequency`, `bias`, `topic`, `capabilities` describe *the
76+
workflow's intent* and travel with it everywhere. The physical pin, the broker URL,
77+
the inference endpoint are *environment facts* and are supplied separately.
78+
79+
---
80+
81+
## The join — DeploymentMapping & ResourceBinding
82+
83+
`contract/engine.yaml` and `go/engine/manifest.go`:
84+
85+
```go
type DeploymentMapping map[string]ResourceBinding // keyed by workflow logical id

type ResourceBinding struct {
	Ref   string `json:"ref"`             // shared platform resource id
	Index *int   `json:"index,omitempty"` // physical sub-address; nil for UART/MQTT/model
}
```
93+
94+
One entry per declared channel id **and** per declared model id. The pool a `ref`
95+
resolves against is **not** stored in the binding — it is implied by the *type of the
96+
workflow resource* with that id:
97+
98+
- a hardware channel's `ref` → a key in the boot **DeviceManifest**;
99+
- an MQTT channel's `ref` → a key in the deploy **ExternalResources.MQTTs**;
100+
- a declared model's `ref` → a key in the deploy **ExternalResources.Providers**.
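
Because the pool is implied rather than stored, resolving a binding starts with a switch on the resource type. The following is a minimal sketch of that idea, assuming channel types are the plain strings from the Layer-1 table. `Pool`, `poolForChannel`, and the constant names are hypothetical and are not taken from the engine source. Declared models are not channels, so they always resolve against `ExternalResources.Providers`.

```go
package main

import "fmt"

// Pool names the config source a ref resolves against (hypothetical type).
type Pool string

const (
	PoolDeviceManifest Pool = "DeviceManifest"
	PoolMQTTs          Pool = "ExternalResources.MQTTs"
	PoolProviders      Pool = "ExternalResources.Providers" // used for declared models
)

// poolForChannel returns the pool implied by a channel type and whether the
// binding must carry an index, following the Layer-1 table.
func poolForChannel(channelType string) (Pool, bool, error) {
	switch channelType {
	case "GPIOIN", "GPIOOUT", "ADC", "DAC", "PWM":
		return PoolDeviceManifest, true, nil
	case "UART":
		return PoolDeviceManifest, false, nil
	case "MQTT":
		return PoolMQTTs, false, nil
	default:
		return "", false, fmt.Errorf("unknown channel type %q", channelType)
	}
}

func main() {
	for _, t := range []string{"GPIOIN", "UART", "MQTT"} {
		pool, needsIndex, err := poolForChannel(t)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-7s pool=%s needsIndex=%v\n", t, pool, needsIndex)
	}
}
```
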
101+
102+
`index` is the per-channel physical line/channel number *within* the bound driver
103+
instance. This is why a single `gpiochip0` driver (`ref`) can back many GPIO
104+
channels — each with a distinct `index`.
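
To make the sharing concrete, here is a small sketch of a mapping in which two GPIO channels share one `ref` with distinct `index` values, and an MQTT channel carries no `index` at all. The two types are copied from the definitions above. The `status_led` channel, the line number 27, and the helpers `intPtr`, `exampleMapping`, and `encode` are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ResourceBinding struct {
	Ref   string `json:"ref"`
	Index *int   `json:"index,omitempty"`
}

type DeploymentMapping map[string]ResourceBinding

// intPtr is a hypothetical helper for building the optional index.
func intPtr(i int) *int { return &i }

// exampleMapping binds two GPIO channels to one shared chip and one MQTT
// channel to a broker.
func exampleMapping() DeploymentMapping {
	return DeploymentMapping{
		"door_sensor": {Ref: "gpiochip0", Index: intPtr(17)},
		"status_led":  {Ref: "gpiochip0", Index: intPtr(27)}, // hypothetical second channel
		"alarm":       {Ref: "site-broker"},                  // no index for MQTT
	}
}

// encode renders the mapping as JSON; encoding/json sorts map keys.
func encode(m DeploymentMapping) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	s, err := encode(exampleMapping())
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Because `Index` is a pointer tagged `omitempty`, a nil index disappears from the JSON entirely, which keeps "no index" distinct from "index 0".
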
105+
106+
> **Completeness is enforced at deploy, not at runtime.** A channel with no mapping
107+
> entry, an addressable channel with a nil `index`, or a model bound to a `ref` that
108+
> has no config are all hard build failures — see "Validation" below. Silent
109+
> degradation would hide config bugs until a node fires hours later.
110+
111+
---
112+
113+
## Layer 2 — the two config sources
114+
115+
There are two config sources with **different lifecycles and ownership**, which is
116+
the reason they are separate artifacts:
117+
118+
### DeviceManifest — boot-time, device-owned
119+
120+
`go/engine/manifest.go`. The hardware physically present on this device. Loaded once
121+
at engine **boot** from a local file, reported to the backend in the
122+
`AgentBootCallback`, and used to open the driver registry. It does **not** change on
123+
deploy — swapping a workflow never re-opens GPIO chips.
124+
125+
```go
type DeviceManifest struct {
	GPIOs   map[string]GPIOConfig   // id ─► {Chip: "/dev/gpiochip0"}
	ADCs    map[string]ADCConfig    // id ─► {Device: "/sys/bus/iio/devices/iio:device0"}
	DACs    map[string]DACConfig    // id ─► {Device: ".../iio:device1"}
	Serials map[string]SerialConfig // id ─► {Port: "/dev/ttyUSB0", Baud: 115200}
	PWMs    map[string]PWMConfig    // id ─► {Chip: "/sys/class/pwm/pwmchip0"}
}
```
134+
135+
### ExternalResources — deploy-time, swappable
136+
137+
`go/engine/manifest.go`. Delivered in the body of every `POST /deploy` alongside the
138+
workflow and mapping. These are the configs that *can* differ per deploy and are not
139+
owned by the device:
140+
141+
```go
type ExternalResources struct {
	MQTTs     map[string]MQTTConnection    // ref ─► broker connection
	Providers map[string]LLMProviderConfig // ref ─► self-hosted LLM endpoint
}
```
147+
148+
`MQTTConnection` carries `brokerUrl`, optional credentials, the `publishPrefix` /
149+
`subscribePrefix` the engine prepends to workflow topics, and an optional last-will.
150+
`LLMProviderConfig` carries `url` + optional `apiKey`; the model's `id` and
151+
`capabilities` come from the workflow's `LLMModel` declaration and are **not**
152+
repeated here.
153+
154+
`ExternalResourceConfig` is a tagged union discriminated by `type`
155+
(`mqtt` | `selfhosted`); new external-resource kinds extend that `oneOf`.
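
One common way to decode a `type`-discriminated union in Go is to peek at the discriminator first and then unmarshal into the matching struct. The sketch below assumes a flat JSON object with `type` alongside the payload fields. The struct layouts are trimmed to the fields named above, and `decodeExternalResource` is a hypothetical helper, not the engine's decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed, assumed shapes; the real structs carry more fields.
type MQTTConnection struct {
	BrokerURL       string `json:"brokerUrl"`
	PublishPrefix   string `json:"publishPrefix,omitempty"`
	SubscribePrefix string `json:"subscribePrefix,omitempty"`
}

type LLMProviderConfig struct {
	URL    string `json:"url"`
	APIKey string `json:"apiKey,omitempty"`
}

// decodeExternalResource reads the "type" discriminator, then decodes the
// same bytes into the struct for that kind.
func decodeExternalResource(raw []byte) (any, error) {
	var head struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(raw, &head); err != nil {
		return nil, err
	}
	switch head.Type {
	case "mqtt":
		var c MQTTConnection
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, err
		}
		return c, nil
	case "selfhosted":
		var p LLMProviderConfig
		if err := json.Unmarshal(raw, &p); err != nil {
			return nil, err
		}
		return p, nil
	default:
		return nil, fmt.Errorf("unknown external resource type %q", head.Type)
	}
}

func main() {
	v, err := decodeExternalResource([]byte(`{"type":"mqtt","brokerUrl":"tcp://broker.example:1883"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %+v\n", v, v)
}
```

A new external-resource kind would add one struct and one `case`; unknown kinds fail loudly instead of decoding into an empty value.
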
156+
157+
---
158+
159+
## Layer 3 — the registries that produce code
160+
161+
Each pool has a registry that maps `ref` → a live implementation. Both `Registry`
162+
types follow the same discipline: open everything up front, and on any partial
163+
failure close what was opened so callers never see a half-built registry.
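
The "open everything, roll back on partial failure" discipline can be sketched as follows. This is not the code in either `registry.go`: `handle`, `opener`, and `recordingOpener` are hypothetical stand-ins so the example runs without hardware or a broker.

```go
package main

import (
	"errors"
	"fmt"
)

// handle is a hypothetical stand-in for an opened driver or transport.
type handle struct {
	ref    string
	closed bool
}

func (h *handle) Close() error {
	h.closed = true
	return nil
}

// opener opens the resource behind one ref.
type opener func(ref string) (*handle, error)

type Registry struct {
	handles map[string]*handle
}

// NewRegistry opens every ref up front. If any open fails, it closes the
// handles opened so far and returns nil, so callers never see a partial registry.
func NewRegistry(refs []string, open opener) (*Registry, error) {
	r := &Registry{handles: make(map[string]*handle, len(refs))}
	for _, ref := range refs {
		h, err := open(ref)
		if err != nil {
			r.Close() // roll back everything opened before the failure
			return nil, fmt.Errorf("open %q: %w", ref, err)
		}
		r.handles[ref] = h
	}
	return r, nil
}

// Close closes all handles and reports the first error, if any.
func (r *Registry) Close() error {
	var first error
	for _, h := range r.handles {
		if err := h.Close(); err != nil && first == nil {
			first = err
		}
	}
	return first
}

// recordingOpener fails on the ref named bad and records every handle it opens.
func recordingOpener(bad string, opened *[]*handle) opener {
	return func(ref string) (*handle, error) {
		if ref == bad {
			return nil, errors.New("device not present")
		}
		h := &handle{ref: ref}
		*opened = append(*opened, h)
		return h, nil
	}
}

func main() {
	var opened []*handle
	_, err := NewRegistry([]string{"gpiochip0", "adc0", "pwm0"}, recordingOpener("pwm0", &opened))
	fmt.Println("error:", err)
	for _, h := range opened {
		fmt.Printf("%s closed=%v\n", h.ref, h.closed)
	}
}
```
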
164+
165+
### driver.Registry — built at boot
166+
167+
`go/engine/driver/registry.go`. `NewRegistry(*DeviceManifest)` opens one driver per
168+
manifest entry, **typed per family** so a miswired manifest (a GPIO id looked up as
169+
an ADC) fails at registration, not first use:
170+
171+
```go
func (r *Registry) GPIO(id string) (GPIODriver, error)
func (r *Registry) ADC(id string) (ADCDriver, error)
func (r *Registry) DAC(id string) (DACDriver, error)
func (r *Registry) PWM(id string) (PWMDriver, error)
func (r *Registry) Serial(id string) (SerialDriver, error)
```
178+
179+
This registry lives on the long-lived `Builder` (`go/engine/build/build.go`) and is
180+
reused across every deploy.
181+
182+
### transport.Registry — built per deploy
183+
184+
`go/engine/transport/registry.go`. `NewRegistry(*ExternalResources)` opens one paho
185+
MQTT connection per `ext.MQTTs` entry. Constructed fresh for each deploy, closed and
186+
replaced on the next one; ownership transfers to the `Runner`.
187+
188+
### llmproxy.Client — composed per deploy
189+
190+
`go/engine/build/llm.go` + `go/llmproxy`. `buildDeployProviders` walks `wf.Models`,
191+
resolves each via the mapping to a `Providers[ref]` config, and packs them into a
192+
single `selfhosted.Provider`. `Build` then composes that with the boot provider set
193+
into a fresh per-deploy `llmproxy.Client`. The provider for a chat call is resolved
194+
implicitly from the **model id** — there is no client-level default.
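
Routing by model id with no default can be illustrated with a small sketch. The `provider` interface, `staticProvider`, and `client` below are hypothetical and do not mirror the `llmproxy` API; the first-match-wins order between boot and deploy providers is also an assumption.

```go
package main

import "fmt"

// provider is a hypothetical interface: anything that can say which model
// ids it serves.
type provider interface {
	Name() string
	Serves(modelID string) bool
}

type staticProvider struct {
	name   string
	models map[string]bool
}

func (p staticProvider) Name() string               { return p.name }
func (p staticProvider) Serves(modelID string) bool { return p.models[modelID] }

// client composes the boot provider set with the per-deploy provider.
type client struct {
	providers []provider
}

// providerFor picks a provider purely from the model id. There is no
// fallback: an id nobody serves is an error.
func (c client) providerFor(modelID string) (provider, error) {
	for _, p := range c.providers {
		if p.Serves(modelID) {
			return p, nil
		}
	}
	return nil, fmt.Errorf("no provider serves model %q", modelID)
}

// newExampleClient wires one boot-time catalog provider and one per-deploy
// self-hosted provider; the model ids are illustrative.
func newExampleClient() client {
	boot := staticProvider{name: "catalog", models: map[string]bool{"catalog-small": true}}
	deploy := staticProvider{name: "selfhosted", models: map[string]bool{"mistral-7b": true}}
	return client{providers: []provider{boot, deploy}}
}

func main() {
	c := newExampleClient()
	for _, id := range []string{"mistral-7b", "catalog-small", "unknown"} {
		p, err := c.providerFor(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", id, p.Name())
	}
}
```
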
195+
196+
---
197+
198+
## The full resolution walk
199+
200+
Tracing one deploy through `go/engine/build/`:
201+
202+
1. **`Engine.Deploy(wf, dm, ext)`** (`engine.go`) builds the new runner *before*
203+
tearing down the old one — a config bug keeps the previous workflow serving
204+
instead of dropping the engine to idle.
205+
206+
2. **`Builder.Build`** (`build.go`):
207+
- `buildDeployProviders(wf, dm, ext)` → resolve declared models → per-deploy
208+
LLM client; `validateModelsResolvable` fails fast if an agent node references a
209+
model no provider can serve.
210+
- `transport.NewRegistry(ext)` → open all MQTT connections.
211+
212+
3. **`buildChannels(wf.Channels, dm, drivers, transports, ext)`** (`channel.go`) —
213+
the heart of the join. For each declared channel, by type:
214+
215+
   ```
   GPIOIN "door_sensor"
     ├─ bindingFor(dm, "door_sensor")   → ResourceBinding{ref:"gpiochip0", index:17}
     ├─ indexFor(b, "door_sensor")      → 17             (nil index = error)
     ├─ drivers.GPIO("gpiochip0")       → GPIODriver     (not registered = error)
     └─ &channel.GPIOInput{Driver, Line:17, Bias, DebounceMs}

   MQTT "alarm"
     ├─ bindingFor(dm, "alarm")         → ResourceBinding{ref:"site-broker"}
     ├─ ext.MQTTs["site-broker"]        → MQTTConnection (missing = error)
     ├─ transports.MQTT("site-broker")  → MQTTTransport
     └─ &channel.MQTT{Transport, Topic, PublishPrefix, SubscribePrefix}
   ```
228+
229+
4. **`chs.SetupAll()`** runs after all nodes are built, applying each channel's
230+
accumulated requirements to its driver once (bias, PWM frequency, opening
231+
subscriptions).
232+
233+
Nodes look up their linked channel in the per-build typed `channels` registry by
234+
logical id and hold the pointer; every node referencing the same id shares one
235+
instance, so subscriber lists and driver reservations stay consistent.
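
Step 1 of the walk, building the new runner before tearing down the old one, is a build-then-swap pattern. The sketch below shows the pattern in isolation. It takes a build closure instead of `(wf, dm, ext)`, and `Runner`, `okBuild`, and `badBuild` are simplified, hypothetical stand-ins for the engine's types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Runner is a hypothetical stand-in for the engine's running workflow.
type Runner struct {
	Name    string
	stopped bool
}

func (r *Runner) Stop() { r.stopped = true }

type Engine struct {
	mu      sync.Mutex
	current *Runner
}

// Deploy builds the next runner first and swaps only on success, so a failed
// build leaves the previous runner serving.
func (e *Engine) Deploy(build func() (*Runner, error)) error {
	next, err := build()
	if err != nil {
		return fmt.Errorf("deploy rejected, previous workflow keeps serving: %w", err)
	}
	e.mu.Lock()
	prev := e.current
	e.current = next
	e.mu.Unlock()
	if prev != nil {
		prev.Stop() // tear down the old runner only after the swap
	}
	return nil
}

func (e *Engine) Current() *Runner {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.current
}

// okBuild and badBuild simulate a successful and a failing build.
func okBuild(name string) func() (*Runner, error) {
	return func() (*Runner, error) { return &Runner{Name: name}, nil }
}

func badBuild() func() (*Runner, error) {
	return func() (*Runner, error) { return nil, errors.New(`channel "alarm": no mapping entry`) }
}

func main() {
	e := &Engine{}
	fmt.Println(e.Deploy(okBuild("v1")), e.Current().Name)
	fmt.Println(e.Deploy(badBuild()), e.Current().Name)
	fmt.Println(e.Deploy(okBuild("v2")), e.Current().Name)
}
```
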
236+
237+
### Validation (deploy-time, fail-fast)
238+
239+
| Failure                                         | Where                       |
|-------------------------------------------------|-----------------------------|
| channel id has no mapping entry / empty `ref`   | `bindingFor` (`channel.go`) |
| addressable channel has nil `index`             | `indexFor` (`channel.go`)   |
| hardware `ref` not in driver registry           | `drivers.GPIO/ADC/...`      |
| MQTT `ref` not in `ext.MQTTs`                   | `buildChannels` MQTT arm    |
| model declared but not bound by the mapping     | `buildDeployProviders`      |
| model bound to a `ref` with no provider config  | `buildDeployProviders`      |
| agent node references an unservable model       | `validateModelsResolvable`  |
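
The first two rows of the table can be sketched as two small helpers. The names and argument order follow the resolution walk above, but the return types and error messages are assumptions; the real implementations live in `channel.go` and may differ.

```go
package main

import "fmt"

type ResourceBinding struct {
	Ref   string `json:"ref"`
	Index *int   `json:"index,omitempty"`
}

type DeploymentMapping map[string]ResourceBinding

// bindingFor fails when the logical id has no mapping entry or an empty ref.
func bindingFor(dm DeploymentMapping, id string) (ResourceBinding, error) {
	b, ok := dm[id]
	if !ok {
		return ResourceBinding{}, fmt.Errorf("channel %q: no mapping entry", id)
	}
	if b.Ref == "" {
		return ResourceBinding{}, fmt.Errorf("channel %q: empty ref", id)
	}
	return b, nil
}

// indexFor fails when an addressable channel is bound without an index.
func indexFor(b ResourceBinding, id string) (int, error) {
	if b.Index == nil {
		return 0, fmt.Errorf("channel %q: addressable channel needs an index", id)
	}
	return *b.Index, nil
}

func main() {
	line := 17
	dm := DeploymentMapping{
		"door_sensor": {Ref: "gpiochip0", Index: &line},
		"status_led":  {Ref: "gpiochip0"}, // hypothetical channel with a missing index
	}
	for _, id := range []string{"door_sensor", "status_led", "siren"} {
		b, err := bindingFor(dm, id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		idx, err := indexFor(b, id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s line %d\n", id, b.Ref, idx)
	}
}
```
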
248+
249+
---
250+
251+
## A note on RAG / memory
252+
253+
`contract/engine.yaml` describes a fourth resource class — RAG memory resolving
254+
"against the boot-configured backend (the ref is the collection id)". In the current
255+
engine that binding does **not** flow through `DeploymentMapping`: the `Retriever`
256+
backend is injected into the `Builder` at boot, and a `RetrieverNode` references its
257+
collection directly via `arguments.memoryReference` (`go/engine/build/graph.go`).
258+
Treat memory as boot-bound for now; if a deploy wizard surfaces it, model it as a
259+
boot/backend concern rather than a per-deploy `ResourceBinding`.
260+
261+
---
262+
263+
## Designing a deploy wizard
264+
265+
The three layers map almost directly onto wizard stages. The wizard's job is to
266+
**produce a complete `DeploymentMapping` + `ExternalResources`** for a given
267+
workflow against a given device/environment.
268+
269+
1. **Read the requirements (Layer 1).** Parse the workflow's `Channels[]` and
270+
`Models[]`. This is the exact, finite checklist the wizard must satisfy — one
271+
row per logical id. The channel `type` tells you which pool and whether an
272+
`index` is required (see the Layer-1 table).
273+
274+
2. **Offer bindings from the right pool (the join).** For each requirement, the
275+
candidate `ref`s come from a *type-specific* pool:
276+
- hardware channel → keys of the matching `DeviceManifest` family (already known
277+
from the boot callback the backend stored);
278+
- MQTT channel → existing/new MQTT connection definitions;
279+
- declared model → existing/new self-hosted endpoint definitions.
280+
281+
For addressable hardware, also collect the `index` (the physical line/channel).
282+
Surface sharing explicitly: many channels may legitimately pick the same `ref`.
283+
284+
3. **Collect configs for newly-referenced resources (Layer 2).** Any `ref` the user
285+
picks that isn't device-owned needs an `ExternalResources` entry: broker URL +
286+
prefixes + credentials for MQTT, endpoint URL + key for a model. Device-owned
287+
refs need nothing — their config is already in the boot manifest.
288+
289+
4. **Validate before submit.** Re-run the deploy-time checks client-side so the user
290+
fixes gaps in the wizard, not via a 422 from `/deploy`: every channel mapped,
291+
every addressable channel indexed, every model bound to a configured provider.
292+
The table under "Validation" is the authoritative checklist.
293+
294+
5. **Emit the deploy.** The wizard's output is exactly a `DeployRequest`:
295+
`{ workflow, mapping, externalResources }`. Layer 3 (the registries and live
296+
handles) is entirely the engine's concern — the wizard never touches it.
297+
298+
Design implication: the wizard only ever authors **Layer 2 facts and the join**. It
299+
should treat the boot `DeviceManifest` as read-only ground truth (what hardware
300+
exists) and the workflow as a read-only requirement list (what's needed); everything
301+
it writes is the binding between them plus the swappable external configs.
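
For the "validate before submit" stage, a wizard would likely want every gap at once rather than the engine's fail-fast behaviour. The following is a rough sketch under simplifying assumptions: `requirement`, `draft`, and `gaps` are hypothetical, pools are reduced to three string keys, and the per-family split of the `DeviceManifest` is collapsed into a single set of refs.

```go
package main

import "fmt"

type ResourceBinding struct {
	Ref   string
	Index *int
}

// requirement is one row of the wizard's checklist (hypothetical shape).
type requirement struct {
	ID         string
	Pool       string // "device", "mqtt", or "provider"
	NeedsIndex bool
}

// draft is the wizard's in-progress mapping plus the refs configured per pool.
type draft struct {
	Mapping map[string]ResourceBinding
	Known   map[string]map[string]bool // pool -> set of configured refs
}

// gaps returns every unmet requirement, in checklist order.
func gaps(reqs []requirement, d draft) []string {
	var out []string
	for _, r := range reqs {
		b, ok := d.Mapping[r.ID]
		if !ok || b.Ref == "" {
			out = append(out, fmt.Sprintf("%s: not mapped", r.ID))
			continue
		}
		if r.NeedsIndex && b.Index == nil {
			out = append(out, fmt.Sprintf("%s: missing index", r.ID))
		}
		if !d.Known[r.Pool][b.Ref] {
			out = append(out, fmt.Sprintf("%s: ref %q has no %s config", r.ID, b.Ref, r.Pool))
		}
	}
	return out
}

// exampleGaps runs the check on a deliberately incomplete draft.
func exampleGaps() []string {
	reqs := []requirement{
		{ID: "door_sensor", Pool: "device", NeedsIndex: true},
		{ID: "alarm", Pool: "mqtt"},
		{ID: "mistral-7b", Pool: "provider"},
	}
	d := draft{
		Mapping: map[string]ResourceBinding{
			"door_sensor": {Ref: "gpiochip0"},   // index not chosen yet
			"alarm":       {Ref: "site-broker"}, // broker config not entered yet
		},
		Known: map[string]map[string]bool{
			"device": {"gpiochip0": true},
		},
	}
	return gaps(reqs, d)
}

func main() {
	for _, g := range exampleGaps() {
		fmt.Println(g)
	}
}
```

An empty result means the draft is ready to be emitted as a `DeployRequest`; the engine still re-checks everything at `/deploy`.
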
File renamed without changes.
