# Deployment layers: from workflow requirements to running code

A deployed workflow is **binding-free**: it declares *what* it needs (a GPIO input,
an MQTT topic, a custom model) but says nothing about *where* those live on this
particular device or network. Turning that abstract requirement into a live driver
handle, an open broker connection, or a registered LLM endpoint is the job of the
deploy plumbing. It happens in three layers, joined by two mappings:

```
┌──────────────────────────────────────────────────────────────────────────────┐
│ LAYER 1 — Workflow requirements                                              │
│   Channels[] + Models[]  (declared in the workflow, keyed by logical id)     │
│   "I need a GPIO input `door_sensor`, an MQTT channel `alarm`,               │
│    model `mistral-7b`"                                                       │
└──────────────────────────────────────────────────────────────────────────────┘
     │
     │  DeploymentMapping : logical id ─► ResourceBinding{ ref, index? }
     │  (one entry per channel id and per declared model id)
     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ LAYER 2 — Resolved configs (the "where")                                     │
│                                                                              │
│   DeviceManifest     ◄── boot-time, device-owned, NOT swappable              │
│     gpios/adcs/dacs/serials/pwms : ref ─► {chip|device|port}                 │
│                                                                              │
│   ExternalResources  ◄── deploy-time, delivered with each /deploy            │
│     MQTTs     : ref ─► MQTTConnection {brokerUrl, prefixes, will, ...}       │
│     Providers : ref ─► LLMProviderConfig {url, apiKey}                       │
└──────────────────────────────────────────────────────────────────────────────┘
     │
     │  engine registries (built once per boot / per deploy)
     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ LAYER 3 — Code implementations                                               │
│   driver.Registry   : ref ─► GPIODriver / ADCDriver / ...  (opened at boot)  │
│   transport.Registry: ref ─► MQTTTransport (paho conn, opened at deploy)     │
│   llmproxy.Client   : modelID ─► selfhosted.Provider (registered at deploy)  │
└──────────────────────────────────────────────────────────────────────────────┘
```

The two arrows are the whole story:

1. **DeploymentMapping** binds each *logical id* the workflow declares to a *platform
   resource id* (`ref`), plus an optional physical sub-address (`index`).
2. **Engine registries** turn each platform resource id (and its config) into a live
   code object.

`ref` is a *sharing identity*: many workflow channels can point at the same `ref`,
and the engine opens that driver / transport exactly once and shares the handle.
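
Here is a minimal sketch of the sharing rule. It assumes the engine deduplicates refs before opening anything. `openShared`, the string "handles", and the `window_sensor` id are illustrative stand-ins, not the engine's driver or transport types.

```go
package main

import (
	"fmt"
	"sort"
)

// ResourceBinding mirrors the excerpt in go/engine/manifest.go (JSON tags omitted).
type ResourceBinding struct {
	Ref   string
	Index *int
}

// DeploymentMapping is keyed by workflow logical id.
type DeploymentMapping map[string]ResourceBinding

// openShared is hypothetical: it calls open once per distinct ref and hands
// every logical id bound to that ref the same handle.
func openShared(dm DeploymentMapping, open func(ref string) string) map[string]string {
	handles := map[string]string{} // ref -> live handle (a string stands in for a driver)
	byID := map[string]string{}    // logical id -> shared handle
	for id, b := range dm {
		h, ok := handles[b.Ref]
		if !ok {
			h = open(b.Ref)
			handles[b.Ref] = h
		}
		byID[id] = h
	}
	return byID
}

func main() {
	line := func(n int) *int { return &n }
	dm := DeploymentMapping{
		"door_sensor":   {Ref: "gpiochip0", Index: line(17)},
		"window_sensor": {Ref: "gpiochip0", Index: line(22)},
		"alarm":         {Ref: "site-broker"},
	}
	var opened []string
	byID := openShared(dm, func(ref string) string {
		opened = append(opened, ref)
		return "handle:" + ref
	})
	sort.Strings(opened)
	fmt.Println(opened)                                       // [gpiochip0 site-broker]
	fmt.Println(byID["door_sensor"] == byID["window_sensor"]) // true
}
```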

---

## Layer 1 — what the workflow declares

A workflow carries two binding-free requirement lists (`contract/workflow.yaml`):

**`Channels[]`** — a discriminated union (`type`) of hardware and transport needs.
Each channel has a logical `id` and type-specific config that is *intrinsic to the
workflow*, not to the device:

| Channel type | Logical config (workflow-owned) | Needs `index`? | Resource pool     |
|--------------|---------------------------------|:--------------:|-------------------|
| `GPIOIN`     | `bias`, `debounceMs`            | yes (line)     | DeviceManifest    |
| `GPIOOUT`    | —                               | yes (line)     | DeviceManifest    |
| `ADC`        | —                               | yes (channel)  | DeviceManifest    |
| `DAC`        | —                               | yes (channel)  | DeviceManifest    |
| `PWM`        | `frequency`                     | yes (channel)  | DeviceManifest    |
| `UART`       | —                               | no             | DeviceManifest    |
| `MQTT`       | `topic`                         | no             | ExternalResources |

**`Models[]`** — declared custom/self-hosted `LLMModel`s (`id`, `label`,
`capabilities`). Static catalog models (built into the llmproxy) are referenced by
id from agent nodes and need **no** declaration here; only custom models are listed,
because only they need a deploy-time endpoint.

The split is deliberate: `frequency`, `bias`, `topic`, `capabilities` describe *the
workflow's intent* and travel with it everywhere. The physical pin, the broker URL,
the inference endpoint are *environment facts* and are supplied separately.
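
The table above is small enough to encode as data, which helps both the engine and any tooling that reads workflows. The sketch below is hedged. The constants reuse the table's `type` values. `Pool`, `requirement`, and `requirementFor` are hypothetical names.

```go
package main

import "fmt"

// ChannelType is the discriminator of the workflow's Channels[] union.
type ChannelType string

const (
	GPIOIN  ChannelType = "GPIOIN"
	GPIOOUT ChannelType = "GPIOOUT"
	ADC     ChannelType = "ADC"
	DAC     ChannelType = "DAC"
	PWM     ChannelType = "PWM"
	UART    ChannelType = "UART"
	MQTT    ChannelType = "MQTT"
)

// Pool names the Layer-2 source a channel's ref resolves against.
type Pool string

const (
	DeviceManifestPool    Pool = "DeviceManifest"
	ExternalResourcesPool Pool = "ExternalResources"
)

// requirement encodes one row of the Layer-1 table.
type requirement struct {
	needsIndex bool
	pool       Pool
}

var requirements = map[ChannelType]requirement{
	GPIOIN:  {true, DeviceManifestPool},
	GPIOOUT: {true, DeviceManifestPool},
	ADC:     {true, DeviceManifestPool},
	DAC:     {true, DeviceManifestPool},
	PWM:     {true, DeviceManifestPool},
	UART:    {false, DeviceManifestPool},
	MQTT:    {false, ExternalResourcesPool},
}

// requirementFor reports whether a channel type needs an index and which pool
// its ref must come from; an unknown type is an error rather than a guess.
func requirementFor(t ChannelType) (requirement, error) {
	r, ok := requirements[t]
	if !ok {
		return requirement{}, fmt.Errorf("unknown channel type %q", t)
	}
	return r, nil
}

func main() {
	for _, t := range []ChannelType{GPIOIN, UART, MQTT} {
		r, _ := requirementFor(t)
		fmt.Printf("%s needsIndex=%v pool=%s\n", t, r.needsIndex, r.pool)
	}
}
```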

---

## The join — DeploymentMapping & ResourceBinding

`contract/engine.yaml` → `go/engine/manifest.go`:

```go
type DeploymentMapping map[string]ResourceBinding // keyed by workflow logical id

type ResourceBinding struct {
	Ref   string `json:"ref"`             // shared platform resource id
	Index *int   `json:"index,omitempty"` // physical sub-address; nil for UART/MQTT/model
}
```

One entry per declared channel id **and** per declared model id. The pool a `ref`
resolves against is **not** stored in the binding — it is implied by the *type of the
workflow resource* with that id:

- a hardware channel's `ref` → a key in the boot **DeviceManifest**;
- an MQTT channel's `ref` → a key in the deploy **ExternalResources.MQTTs**;
- a declared model's `ref` → a key in the deploy **ExternalResources.Providers**.

`index` is the per-channel physical line/channel number *within* the bound driver
instance. This is why a single `gpiochip0` driver (`ref`) can back many GPIO
channels — each with a distinct `index`.

> **Completeness is enforced at deploy, not at runtime.** A channel with no mapping
> entry, an addressable channel with a nil `index`, or a model bound to a `ref` that
> has no config are all hard build failures — see "Validation" below. Silent
> degradation would hide config bugs until a node fires hours later.

---

## Layer 2 — the two config sources

There are two config sources with **different lifecycles and ownership**, which is
the reason they are separate artifacts:

### DeviceManifest — boot-time, device-owned

`go/engine/manifest.go`. The manifest describes the hardware physically present on
this device. It is loaded once at engine **boot** from a local file, reported to the
backend in the `AgentBootCallback`, and used to open the driver registry. It does
**not** change on deploy — swapping a workflow never re-opens GPIO chips.

```go
type DeviceManifest struct {
	GPIOs   map[string]GPIOConfig   // id ─► {Chip: "/dev/gpiochip0"}
	ADCs    map[string]ADCConfig    // id ─► {Device: "/sys/bus/iio/devices/iio:device0"}
	DACs    map[string]DACConfig    // id ─► {Device: ".../iio:device1"}
	Serials map[string]SerialConfig // id ─► {Port: "/dev/ttyUSB0", Baud: 115200}
	PWMs    map[string]PWMConfig    // id ─► {Chip: "/sys/class/pwm/pwmchip0"}
}
```
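
The text only says that the manifest comes from a local file. This sketch assumes the file is JSON with lowercase keys, as the overview diagram's `gpios/adcs/...` and `{chip|device|port}` suggest. The JSON tags, the trimmed three-family struct, and `loadManifest`/`parseManifest` are all assumptions, not the contents of `manifest.go`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

// Assumed shapes: only Chip/Device/Port/Baud appear in the excerpt,
// and the lowercase JSON tags are a guess.
type GPIOConfig struct {
	Chip string `json:"chip"`
}

type ADCConfig struct {
	Device string `json:"device"`
}

type SerialConfig struct {
	Port string `json:"port"`
	Baud int    `json:"baud"`
}

// DeviceManifest trimmed to three families for brevity.
type DeviceManifest struct {
	GPIOs   map[string]GPIOConfig   `json:"gpios"`
	ADCs    map[string]ADCConfig    `json:"adcs"`
	Serials map[string]SerialConfig `json:"serials"`
}

// parseManifest rejects unknown keys so a typo in the boot file fails loudly
// instead of silently dropping a hardware family.
func parseManifest(data []byte) (*DeviceManifest, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var m DeviceManifest
	if err := dec.Decode(&m); err != nil {
		return nil, fmt.Errorf("parse device manifest: %w", err)
	}
	return &m, nil
}

// loadManifest reads the local file once at boot.
func loadManifest(path string) (*DeviceManifest, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read device manifest: %w", err)
	}
	return parseManifest(data)
}

func main() {
	m, err := parseManifest([]byte(`{
		"gpios":   {"gpiochip0": {"chip": "/dev/gpiochip0"}},
		"serials": {"modem": {"port": "/dev/ttyUSB0", "baud": 115200}}
	}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(m.GPIOs["gpiochip0"].Chip, m.Serials["modem"].Baud) // /dev/gpiochip0 115200
}
```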

### ExternalResources — deploy-time, swappable

`go/engine/manifest.go`. This config is delivered in the body of every `POST /deploy`
alongside the workflow and mapping. These are the configs that *can* differ per
deploy and are not owned by the device:

```go
type ExternalResources struct {
	MQTTs     map[string]MQTTConnection    // ref ─► broker connection
	Providers map[string]LLMProviderConfig // ref ─► self-hosted LLM endpoint
}
```

`MQTTConnection` carries `brokerUrl`, optional credentials, the `publishPrefix` /
`subscribePrefix` the engine prepends to workflow topics, and an optional last-will.
`LLMProviderConfig` carries `url` + optional `apiKey`; the model's `id` and
`capabilities` come from the workflow's `LLMModel` declaration and are **not**
repeated here.

`ExternalResourceConfig` is a tagged union discriminated by `type`
(`mqtt` | `selfhosted`); new external-resource kinds extend that `oneOf`.
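
In Go, decoding a `type`-discriminated union usually means reading the discriminator first and then picking the target struct. This sketch assumes that `type` sits inline next to each arm's fields and that new kinds get explicit cases. The struct shapes are trimmed, hypothetical renderings of the contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed, hypothetical renderings of the two oneOf arms.
type MQTTConnection struct {
	BrokerURL       string `json:"brokerUrl"`
	PublishPrefix   string `json:"publishPrefix,omitempty"`
	SubscribePrefix string `json:"subscribePrefix,omitempty"`
}

type LLMProviderConfig struct {
	URL    string `json:"url"`
	APIKey string `json:"apiKey,omitempty"`
}

// decodeExternal peeks at the `type` discriminator, then decodes the same
// bytes into the matching arm. Unknown kinds are rejected, not guessed.
func decodeExternal(raw []byte) (any, error) {
	var head struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(raw, &head); err != nil {
		return nil, err
	}
	switch head.Type {
	case "mqtt":
		var c MQTTConnection
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, err
		}
		return c, nil
	case "selfhosted":
		var c LLMProviderConfig
		if err := json.Unmarshal(raw, &c); err != nil {
			return nil, err
		}
		return c, nil
	default:
		return nil, fmt.Errorf("unknown external resource type %q", head.Type)
	}
}

func main() {
	v, err := decodeExternal([]byte(`{"type":"mqtt","brokerUrl":"tcp://broker.local:1883","publishPrefix":"site/"}`))
	if err != nil {
		panic(err)
	}
	if c, ok := v.(MQTTConnection); ok {
		fmt.Println(c.BrokerURL, c.PublishPrefix) // tcp://broker.local:1883 site/
	}
}
```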

---

## Layer 3 — the registries that produce code

Each pool has a registry that maps `ref` → a live implementation. Both `Registry`
types follow the same discipline: open everything up front, and on any partial
failure close what was opened so callers never see a half-built registry.

### driver.Registry — built at boot

`go/engine/driver/registry.go`. `NewRegistry(*DeviceManifest)` opens one driver per
manifest entry, **typed per family** so a miswired manifest (a GPIO id looked up as
an ADC) fails at registration, not first use:

```go
func (r *Registry) GPIO(id string) (GPIODriver, error)
func (r *Registry) ADC(id string) (ADCDriver, error)
func (r *Registry) DAC(id string) (DACDriver, error)
func (r *Registry) PWM(id string) (PWMDriver, error)
func (r *Registry) Serial(id string) (SerialDriver, error)
```

This registry lives on the long-lived `Builder` (`go/engine/build/build.go`) and is
reused across every deploy.
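
Per-family typing catches miswiring because each id lives in exactly one family map. Asking the wrong accessor is then just a lookup miss that returns an error. This is a trimmed two-family sketch. Only the `GPIO`/`ADC` accessor signatures come from the excerpt. The map layout, the `Lines`/`Channels` methods, and `fakeChip` are invented for illustration.

```go
package main

import "fmt"

// Hypothetical driver interfaces; the real method sets are not shown in the text.
type GPIODriver interface{ Lines() int }
type ADCDriver interface{ Channels() int }

// Registry keeps one map per family, so a ref is only visible through the
// accessor for the family it was opened as.
type Registry struct {
	gpios map[string]GPIODriver
	adcs  map[string]ADCDriver
}

func (r *Registry) GPIO(id string) (GPIODriver, error) {
	d, ok := r.gpios[id]
	if !ok {
		return nil, fmt.Errorf("no GPIO driver registered as %q", id)
	}
	return d, nil
}

func (r *Registry) ADC(id string) (ADCDriver, error) {
	d, ok := r.adcs[id]
	if !ok {
		return nil, fmt.Errorf("no ADC driver registered as %q", id)
	}
	return d, nil
}

// fakeChip stands in for an opened /dev/gpiochipN.
type fakeChip struct{}

func (fakeChip) Lines() int { return 32 }

func main() {
	r := &Registry{
		gpios: map[string]GPIODriver{"gpiochip0": fakeChip{}},
		adcs:  map[string]ADCDriver{},
	}
	if _, err := r.ADC("gpiochip0"); err != nil {
		fmt.Println(err) // no ADC driver registered as "gpiochip0"
	}
}
```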

### transport.Registry — built per deploy

`go/engine/transport/registry.go`. `NewRegistry(*ExternalResources)` opens one paho
MQTT connection per `ext.MQTTs` entry. The registry is built fresh for each deploy and
its ownership transfers to the `Runner`; it is closed and replaced on the next deploy.

### llmproxy.Client — composed per deploy

`go/engine/build/llm.go` + `go/llmproxy`. `buildDeployProviders` walks `wf.Models`,
resolves each via the mapping to a `Providers[ref]` config, and packs them into a
single `selfhosted.Provider`. `Build` then composes that with the boot provider set
into a fresh per-deploy `llmproxy.Client`. The provider for a chat call is resolved
implicitly from the **model id** — there is no client-level default.

---

## The full resolution walk

Tracing one deploy through `go/engine/build/`:

1. **`Engine.Deploy(wf, dm, ext)`** (`engine.go`) builds the new runner *before*
   tearing down the old one — a config bug keeps the previous workflow serving
   instead of dropping the engine to idle.

2. **`Builder.Build`** (`build.go`):
   - `buildDeployProviders(wf, dm, ext)` → resolve declared models → per-deploy
     LLM client; `validateModelsResolvable` fails fast if an agent node references a
     model no provider can serve.
   - `transport.NewRegistry(ext)` → open all MQTT connections.

3. **`buildChannels(wf.Channels, dm, drivers, transports, ext)`** (`channel.go`) —
   the heart of the join. For each declared channel, by type:

   ```
   GPIOIN "door_sensor"
     ├─ bindingFor(dm, "door_sensor")  → ResourceBinding{ref:"gpiochip0", index:17}
     ├─ indexFor(b, "door_sensor")     → 17          (nil index = error)
     ├─ drivers.GPIO("gpiochip0")      → GPIODriver  (not registered = error)
     └─ &channel.GPIOInput{Driver, Line:17, Bias, DebounceMs}

   MQTT "alarm"
     ├─ bindingFor(dm, "alarm")        → ResourceBinding{ref:"site-broker"}
     ├─ ext.MQTTs["site-broker"]       → MQTTConnection (missing = error)
     ├─ transports.MQTT("site-broker") → MQTTTransport
     └─ &channel.MQTT{Transport, Topic, PublishPrefix, SubscribePrefix}
   ```

4. **`chs.SetupAll()`** runs after all nodes are built, applying each channel's
   accumulated requirements to its driver once (bias, PWM frequency, opening
   subscriptions).

Nodes look up their linked channel in the per-build typed `channels` registry by
logical id and hold the pointer; every node referencing the same id shares one
instance, so subscriber lists and driver reservations stay consistent.
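
The sharing and the deferred `SetupAll` pass can be modelled with a per-build map of pointers. This is a hypothetical sketch. `Channels`, the trimmed `GPIOInput`, and the `subs`/`setups` fields are illustrative. The real types live in `go/engine/build/` and the `channel` package.

```go
package main

import "fmt"

// GPIOInput is a trimmed stand-in for channel.GPIOInput.
type GPIOInput struct {
	Line       int
	DebounceMs int
	subs       []string // nodes that asked for edge events
	setups     int      // how many times SetupAll touched this channel
}

// Channels is a per-build registry keyed by workflow logical id.
type Channels struct {
	gpioIn map[string]*GPIOInput
}

// GPIOIn returns the one shared instance for a logical id.
func (c *Channels) GPIOIn(id string) (*GPIOInput, error) {
	ch, ok := c.gpioIn[id]
	if !ok {
		return nil, fmt.Errorf("no GPIOIN channel %q", id)
	}
	return ch, nil
}

// SetupAll applies accumulated requirements once per channel, after every
// node has registered its needs.
func (c *Channels) SetupAll() {
	for _, ch := range c.gpioIn {
		ch.setups++ // a real engine would configure bias/debounce and start watching the line
	}
}

func main() {
	chs := &Channels{gpioIn: map[string]*GPIOInput{"door_sensor": {Line: 17, DebounceMs: 50}}}
	a, _ := chs.GPIOIn("door_sensor")
	b, _ := chs.GPIOIn("door_sensor")
	a.subs = append(a.subs, "trigger_node")
	b.subs = append(b.subs, "logger_node")
	chs.SetupAll()
	fmt.Println(a == b, len(a.subs), a.setups) // true 2 1
}
```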

### Validation (deploy-time, fail-fast)

| Failure                                        | Where                         |
|------------------------------------------------|-------------------------------|
| channel id has no mapping entry / empty `ref`  | `bindingFor` (`channel.go`)   |
| addressable channel has nil `index`            | `indexFor` (`channel.go`)     |
| hardware `ref` not in driver registry          | `drivers.GPIO/ADC/...`        |
| MQTT `ref` not in `ext.MQTTs`                  | `buildChannels` MQTT arm      |
| model declared but not bound by the mapping    | `buildDeployProviders`        |
| model bound to a `ref` with no provider config | `buildDeployProviders`        |
| agent node references an unservable model      | `validateModelsResolvable`    |
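
A client that re-runs these checks, as the wizard section recommends, needs little beyond the table itself. The sketch below is hypothetical. `Env`, `family`, and `validate` are invented. `bindingFor` and `indexFor` borrow the table's names, but their bodies are guesses at the described behaviour. The agent-node check (`validateModelsResolvable`) is left out because it needs the provider catalog. This version collects every gap with `errors.Join` (Go 1.20+) so a form can highlight all of them at once.

```go
package main

import (
	"errors"
	"fmt"
)

type ResourceBinding struct {
	Ref   string
	Index *int
}

type DeploymentMapping map[string]ResourceBinding

// Channel keeps only what validation needs from a workflow channel.
type Channel struct {
	ID   string
	Type string // "GPIOIN", "GPIOOUT", "ADC", "DAC", "PWM", "UART" or "MQTT"
}

// family maps each hardware channel type to the DeviceManifest family its ref must name.
var family = map[string]string{
	"GPIOIN": "GPIO", "GPIOOUT": "GPIO", "ADC": "ADC", "DAC": "DAC", "PWM": "PWM", "UART": "Serial",
}

// Env is what a wizard knows: hardware refs by family (from the stored boot
// callback) plus the MQTT and provider refs it is about to send.
type Env struct {
	Hardware  map[string]string // ref -> family, e.g. "gpiochip0" -> "GPIO"
	MQTTs     map[string]bool
	Providers map[string]bool
}

func bindingFor(dm DeploymentMapping, id string) (ResourceBinding, error) {
	b, ok := dm[id]
	if !ok || b.Ref == "" {
		return ResourceBinding{}, fmt.Errorf("channel %q: no mapping entry or empty ref", id)
	}
	return b, nil
}

func indexFor(b ResourceBinding, id string) (int, error) {
	if b.Index == nil {
		return 0, fmt.Errorf("channel %q: addressable channel has nil index", id)
	}
	return *b.Index, nil
}

// validate collects every gap instead of stopping at the first one.
func validate(chs []Channel, models []string, dm DeploymentMapping, env Env) error {
	var errs []error
	for _, ch := range chs {
		b, err := bindingFor(dm, ch.ID)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		if ch.Type == "MQTT" {
			if !env.MQTTs[b.Ref] {
				errs = append(errs, fmt.Errorf("channel %q: MQTT ref %q not in ext.MQTTs", ch.ID, b.Ref))
			}
			continue
		}
		if env.Hardware[b.Ref] != family[ch.Type] {
			errs = append(errs, fmt.Errorf("channel %q: no %s driver %q", ch.ID, family[ch.Type], b.Ref))
		}
		if ch.Type != "UART" { // every other hardware type is addressable
			if _, err := indexFor(b, ch.ID); err != nil {
				errs = append(errs, err)
			}
		}
	}
	for _, id := range models {
		b, ok := dm[id]
		switch {
		case !ok || b.Ref == "":
			errs = append(errs, fmt.Errorf("model %q declared but not bound", id))
		case !env.Providers[b.Ref]:
			errs = append(errs, fmt.Errorf("model %q bound to %q, which has no provider config", id, b.Ref))
		}
	}
	return errors.Join(errs...) // nil when nothing was collected
}

func main() {
	dm := DeploymentMapping{
		"door_sensor": {Ref: "gpiochip0"}, // index forgotten
		"alarm":       {Ref: "site-broker"},
	}
	env := Env{
		Hardware: map[string]string{"gpiochip0": "GPIO"},
		MQTTs:    map[string]bool{"site-broker": true},
	}
	chs := []Channel{{"door_sensor", "GPIOIN"}, {"alarm", "MQTT"}}
	fmt.Println(validate(chs, []string{"mistral-7b"}, dm, env))
}
```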

---

## A note on RAG / memory

`contract/engine.yaml` describes a fourth resource class — RAG memory resolving
"against the boot-configured backend (the ref is the collection id)". In the current
engine that binding does **not** flow through `DeploymentMapping`: the `Retriever`
backend is injected into the `Builder` at boot, and a `RetrieverNode` references its
collection directly via `arguments.memoryReference` (`go/engine/build/graph.go`).
Treat memory as boot-bound for now; if a deploy wizard surfaces it, model it as a
boot/backend concern rather than a per-deploy `ResourceBinding`.

---

## Designing a deploy wizard

The three layers map almost directly onto wizard stages. The wizard's job is to
**produce a complete `DeploymentMapping` + `ExternalResources`** for a given
workflow against a given device/environment.

1. **Read the requirements (Layer 1).** Parse the workflow's `Channels[]` and
   `Models[]`. This is the exact, finite checklist the wizard must satisfy — one
   row per logical id. The channel `type` tells you which pool and whether an
   `index` is required (see the Layer-1 table).

2. **Offer bindings from the right pool (the join).** For each requirement, the
   candidate `ref`s come from a *type-specific* pool:
   - hardware channel → keys of the matching `DeviceManifest` family (already known
     from the boot callback the backend stored);
   - MQTT channel → existing/new MQTT connection definitions;
   - declared model → existing/new self-hosted endpoint definitions.

   For addressable hardware, also collect the `index` (the physical line/channel).
   Surface sharing explicitly: many channels may legitimately pick the same `ref`.

3. **Collect configs for newly-referenced resources (Layer 2).** Any `ref` the user
   picks that isn't device-owned needs an `ExternalResources` entry: broker URL +
   prefixes + credentials for MQTT, endpoint URL + key for a model. Device-owned
   refs need nothing — their config is already in the boot manifest.

4. **Validate before submit.** Re-run the deploy-time checks client-side so the user
   fixes gaps in the wizard, not via a 422 from `/deploy`: every channel mapped,
   every addressable channel indexed, every model bound to a configured provider.
   The table under "Validation" is the authoritative checklist.

5. **Emit the deploy.** The wizard's output is exactly a `DeployRequest`:
   `{ workflow, mapping, externalResources }`. Layer 3 (the registries and live
   handles) is entirely the engine's concern — the wizard never touches it.

Design implication: the wizard only ever authors **Layer 2 facts and the join**. It
should treat the boot `DeviceManifest` as read-only ground truth (what hardware
exists) and the workflow as a read-only requirement list (what's needed); everything
it writes is the binding between them plus the swappable external configs.