| 1 | +# Phase 7.0 Addendum: Sidecar Transport Mapping |
| 2 | + |
| 3 | +Date: 2026-06-12 |
| 4 | +Status: Proposed (same review process as the Phase 7 spec freeze) |
| 5 | +Owner: Threshold Labs |
| 6 | +Scope: maps the frozen interactive-session API |
| 7 | +(`phase-7-interactive-session-spec-freeze.md`, section 5) onto the |
| 8 | +sidecar process boundary chosen in Decision Log entry 2, and scopes |
| 9 | +what that boundary means for #4007 (the decision-gated TEE checker |
| 10 | +stack). This document changes no contract: the sidecar is a |
| 11 | +transport swap by construction, and anything here that would alter |
| 12 | +the frozen spec is a defect in this document. |
| 13 | + |
| 14 | +## 1. What the sidecar is |
| 15 | + |
| 16 | +A separate OS process that owns the signer engine and every secret |
| 17 | +it holds: key-share state, the state-encryption key path, and (after |
| 18 | +Phase 7.1) the in-memory interactive nonces. The keep-client host |
| 19 | +process — Go runtime, libp2p, Ethereum client, every transitive |
| 20 | +dependency — talks to it over local IPC. |
| 21 | + |
| 22 | +**Boundary scope (important, and a hard prerequisite for #4007).** |
| 23 | +The "host holds no signing secrets" property is *scoped to the |
| 24 | +signing path* and holds once Phase 7.1's engine-held nonce custody |
| 25 | +ships: key shares are env/command-only and nonces never leave the |
| 26 | +engine. It does **not** yet hold for **DKG**: the transitional DKG |
| 27 | +APIs that section 3 maps unchanged still return and accept |
| 28 | +`secret_package_hex` through the host (frozen Phase 7 spec section 4 |
| 29 | +names DKG secret-package custody as an out-of-scope follow-up). So in |
| 30 | +any deployment that runs DKG through this transport, the host process |
| 31 | +still sees DKG secret material. #4007 must therefore treat the |
| 32 | +host↔sidecar **signing** interface as a secret boundary but must NOT |
| 33 | +treat the DKG interface as one until the DKG-custody follow-up moves |
| 34 | +that material inside the sidecar (or DKG is run out-of-band). Closing |
| 35 | +that gap is a precondition for the sidecar being a complete secret |
| 36 | +boundary. |
| 37 | + |
| 38 | +The isolation claim, stated precisely: today a memory-disclosure |
| 39 | +bug anywhere in the host address space can read whatever the |
| 40 | +in-process engine holds, because the dlopen FFI is an API boundary, |
| 41 | +not a security boundary. The sidecar makes the boundary an OS |
| 42 | +process boundary. It is also the deliberate stepping stone to the |
| 43 | +TEE deployment: a sidecar process becomes an enclave process with |
| 44 | +the same wire protocol, which is precisely why decision 2 told |
| 45 | +isolation-sensitive work to assume this shape. |
| 46 | + |
| 47 | +## 2. Why the frozen API maps cleanly |
| 48 | + |
| 49 | +Two prior decisions did the work in advance: |
| 50 | + |
| 51 | +* The engine API is already coarse JSON request/response over a C |
| 52 | + ABI — chosen over round-level FFI compatibility partly FOR |
| 53 | + "cleaner future sidecar extraction" |
| 54 | + (`signer-api-contract-decision-brief.md`). |
| 55 | +* The frozen section-5 calls are idempotent-or-fail-closed, |
| 56 | + self-contained request/response with no callbacks and no shared |
| 57 | + memory. |
| 58 | + |
| 59 | +One tension to resolve explicitly: the old decision brief argued |
| 60 | +against round-level APIs because they kept "nonce/round details |
| 61 | +crossing the FFI boundary" and made the transport swap harder. The |
| 62 | +Phase 7 API *is* round-level (`Round1`/`Round2`) — interactivity is |
| 63 | +forced by true two-round FROST with a network exchange between |
| 64 | +rounds — but the brief's actual objection is dissolved by the |
| 65 | +frozen spec's section 4: rounds cross the boundary, **nonces do |
| 66 | +not**. What transits is public commitments, signing packages, and |
| 67 | +shares. The chattiness objection is inherent to interactive FROST |
| 68 | +and is bounded (two round trips per attempt against a ~41-block |
| 69 | +attempt budget; the Annex B arithmetic gives ~175x headroom). |
| 70 | + |
| 71 | +## 3. Transport mapping |
| 72 | + |
| 73 | +Same JSON envelopes, different carrier: |
| 74 | + |
| 75 | +| Engine call (frozen spec §5 / existing API) | dlopen transitional | Sidecar | |
| 76 | +|---|---|---| |
| 77 | +| `InstallNativeTBTCSignerConfig` (init) | `frost_tbtc_init_signer_config` symbol | First request after connect (handshake step 2) | |
| 78 | +| `InteractiveSessionOpen/Round1/Round2/Aggregate/Abort` | per-call symbols (Phase 7.1/7.2) | One method each, identical JSON bodies | |
| 79 | +| Coarse transitional calls (until deleted per spec §7) | existing symbols | Same mapping rule | |
| 80 | + |
| 81 | +Carrier (proposed defaults, section 8): a UNIX domain socket with |
| 82 | +length-prefixed JSON frames, a small connection pool, and exactly |
| 83 | +one in-flight request per connection. No request multiplexing in |
| 84 | +v1: the engine's concurrency model and registries are unchanged, |
| 85 | +and the pool bounds parallelism exactly as the host's call sites do |
| 86 | +today. Errors keep the structured `ErrorResponse` contract |
| 87 | +(`consumed_attempt_replay` etc.) — the codes are the cross-version |
| 88 | +interface and MUST NOT fork between transports. |
| 89 | + |
| 90 | +Transport conformance: the contract tests that pin the FFI behavior |
| 91 | +become transport-parameterized — the same request/response suites |
| 92 | +run against the dlopen bridge and the sidecar, and divergence is a |
| 93 | +release blocker. This is the mechanism that keeps "transport swap, |
| 94 | +not API rework" true over time. |
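
One way to picture a transport-parameterized suite is the Go sketch below. The `Transport` interface, the `contractCase` type, and the fake transports are hypothetical stand-ins; the real suites would drive the dlopen bridge and the sidecar, and would likely compare decoded JSON rather than raw bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// Transport is a hypothetical abstraction over the dlopen bridge and the sidecar.
type Transport interface {
	Name() string
	Call(method string, body []byte) ([]byte, error)
}

// contractCase pins one request and the response every transport must return.
type contractCase struct {
	method string
	body   []byte
	want   []byte
}

// RunConformance runs every case against every transport and lists divergences.
func RunConformance(transports []Transport, cases []contractCase) []string {
	var failures []string
	for _, tr := range transports {
		for _, c := range cases {
			got, err := tr.Call(c.method, c.body)
			if err != nil {
				failures = append(failures, fmt.Sprintf("%s/%s: %v", tr.Name(), c.method, err))
				continue
			}
			if !bytes.Equal(got, c.want) {
				failures = append(failures, fmt.Sprintf("%s/%s: got %s, want %s", tr.Name(), c.method, got, c.want))
			}
		}
	}
	return failures
}

// fakeTransport returns canned responses keyed by method (illustration only).
type fakeTransport struct {
	name      string
	responses map[string][]byte
}

func (f fakeTransport) Name() string { return f.name }

func (f fakeTransport) Call(method string, body []byte) ([]byte, error) {
	resp, ok := f.responses[method]
	if !ok {
		return nil, fmt.Errorf("unknown method %q", method)
	}
	return resp, nil
}

func main() {
	cases := []contractCase{
		{method: "Round1", body: []byte(`{}`), want: []byte(`{"error":"consumed_attempt_replay"}`)},
	}
	same := map[string][]byte{"Round1": []byte(`{"error":"consumed_attempt_replay"}`)}
	transports := []Transport{
		fakeTransport{name: "dlopen", responses: same},
		fakeTransport{name: "sidecar", responses: same},
	}
	// An empty result means both transports agree on every pinned case.
	fmt.Println(len(RunConformance(transports, cases)))
}
```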
| 95 | + |
| 96 | +## 4. Process model and lifecycle |
| 97 | + |
| 98 | +* **Spawn/supervision (proposed default)**: keep-client spawns the |
| 99 | + sidecar as a child process and supervises it (restart with |
| 100 | + backoff). The alternative — independent systemd unit — is open |
| 101 | + question (a); the child model keeps the operator surface to one |
| 102 | + service and lets the existing init-config demand semantics apply |
| 103 | + without a coordination protocol. |
| 104 | +* **Handshake**: (1) version exchange, in which the host refuses to |
| 105 | + operate a sidecar outside its supported range (fail closed); (2) |
| 106 | + init-config install, in which the host reads `TBTC_SIGNER_INIT_CONFIG_PATH` |
| 107 | + and posts the install request as the first message, exactly the |
| 108 | + #4037/#4041 flow. **Decision 7 carries over unchanged**: with the |
| 109 | + path set, a sidecar that cannot be spawned, cannot complete the |
| 110 | + handshake, or rejects the config is process-fatal for the host, |
| 111 | + in every profile. The enforcement point |
| 112 | + (`enforceNativeInitConfigDemand`) gains "sidecar unreachable" as |
| 113 | + one more member of the same failure family. |
| 114 | +* **Crash semantics**: a sidecar crash loses in-flight nonces — by |
| 115 | + the frozen spec's section 4 and ratified question 4 |
| 116 | + (markers-only), this is exactly the restart story: live attempts |
| 117 | + fail safe, durable consumption markers prevent any replay, the |
| 118 | + supervisor restarts the sidecar, re-init runs (idempotent by |
| 119 | + config fingerprint), and the attestation TTL applies at re-init |
| 120 | + (runbook prerequisite 6). No new failure mode is introduced; the |
| 121 | + sidecar converts "host process restart" into the strictly smaller |
| 122 | + "signer process restart." |
| 123 | +* **Shutdown**: host-initiated graceful stop sends `SessionAbort` |
| 124 | + for live sessions (zeroize), then terminates. SIGKILL is |
| 125 | + equivalent to a crash and is safe by the same argument. |
| 126 | + |
| 127 | +## 5. Security boundary |
| 128 | + |
| 129 | +* Socket: filesystem-permission-guarded UDS (owner-only directory), |
| 130 | + peer-credential check (`SO_PEERCRED`/`LOCAL_PEERCRED`) pinning |
| 131 | + the host UID. Never a network listener — a TCP mode is explicitly |
| 132 | + out of scope and should be rejected in review if proposed. |
| 133 | +* Authentication beyond UID pinning is deliberately deferred: the |
| 134 | + v1 trust model is same-host, same-operator. The TEE phase |
| 135 | + replaces this with an attestation-bound channel; designing that |
| 136 | + channel is part of #4007's scope, not this addendum's. |
| 137 | +* Secrets: the state-encryption key provider (env/command) runs in |
| 138 | + the **sidecar's** process environment, not the host's. The config |
| 139 | + file may carry `state_key_command` (its 0600 guidance stands); |
| 140 | + the command executes sidecar-side. Host environment variables |
| 141 | + stop being a secret channel entirely. |
| 142 | + |
| 143 | +## 6. What does not change |
| 144 | + |
| 145 | +JSON schema ownership (Rust), the error-code contract, idempotency |
| 146 | +and fail-closed semantics, registries and persistence |
| 147 | +(sidecar-local files, same formats), provenance gating, the frozen |
| 148 | +section-5 verification rules, and the section-7 deletion trigger. |
| 149 | +The dlopen bridge remains the shipping transport until the sidecar |
| 150 | +lands; Phases 7.1-7.5 build and validate on dlopen without waiting. |
| 151 | + |
| 152 | +## 7. #4007 (TEE checker stack) scoping |
| 153 | + |
| 154 | +#4007 gates *whether a signer may register* on TEE attestation |
| 155 | +evidence and stays decision-gated on the DAO's TEE policy — this |
| 156 | +addendum does not undraft it. What the sidecar decision gives it is |
| 157 | +a concrete subject: the artifact whose identity gets attested is |
| 158 | +the sidecar binary (later, the enclave image), not the composite |
| 159 | +keep-client process. #4007's open scoping questions become: which |
| 160 | +measurement (binary hash / enclave MRENCLAVE-equivalent), who |
| 161 | +verifies (the DAO-whitelist checker), and how the attestation binds |
| 162 | +to the UDS channel. Those land in #4007's own design doc; the |
| 163 | +interface contract it must respect is sections 3-5 here. |
| 164 | + |
| 165 | +## 8. Open questions |
| 166 | +(Proposed defaults; decide at this addendum's sign-off.) |
| 167 | + |
| 168 | +* (a) **Spawn model**: keep-client child process (default) vs. |
| 169 | + independent systemd unit. |
| 170 | +* (b) **Wire framing**: length-prefixed JSON frames (default) vs. |
| 171 | + newline-delimited JSON. |
| 172 | +* (c) **Connection model**: small pool, one in-flight request per |
| 173 | + connection (default) vs. request-id multiplexing. |
| 174 | +* (d) **Packaging**: sidecar binary ships in the same release |
| 175 | + artifact as keep-client (default) vs. separate artifact with its |
| 176 | + own version line. |
| 177 | + |
| 178 | +## 9. Sequencing |
| 179 | + |
| 180 | +The sidecar is not on the 7.1-7.5 critical path: those phases build |
| 181 | +on the dlopen transport, and the frozen API guarantees the swap is |
| 182 | +transport-only. The sidecar track runs in parallel and must |
| 183 | +converge **before the ECDSA-retirement phases** (decision 1's |
| 184 | +timing: take the isolation step before mainnet TVL migrates). |
| 185 | +Suggested shape: 7.S1 sidecar process + handshake + conformance |
| 186 | +suite; 7.S2 operational hardening (supervision, packaging, |
| 187 | +runbook); 7.S3 cutover of the production default with dlopen kept |
| 188 | +as the rollback transport for one release. |