
Commit 472dfa6

docs(tbtc/signer): Phase 7.0 sidecar transport addendum (#4050)
## What

The outstanding 7.0 item from the frozen Phase 7 spec (§9): maps the frozen §5 session API onto the sidecar process boundary (Decision Log entry 2) and scopes #4007's TEE checker stack against it. Doc-only; by construction it changes no contract, so anything in it that would alter the frozen spec is a defect in the addendum.

## Highlights

- **The swap was pre-paid**: the coarse JSON request/response contract was chosen over round-level FFI partly *for* sidecar extraction, and the frozen spec's engine-held nonce custody dissolves the old brief's objection to round-level APIs: rounds cross the boundary, nonces never do.
- **Decision 7 carries over unchanged**: with `TBTC_SIGNER_INIT_CONFIG_PATH` set, a sidecar that can't spawn, can't handshake, or rejects the config is process-fatal for the host; "sidecar unreachable" joins the existing fatal failure family at the same enforcement point.
- **Crash = the restart story we already ratified**: markers-only durability means a sidecar crash fails live attempts safe, consumption markers prevent replay, re-init is fingerprint-idempotent, attestation TTL applies at re-init. No new failure mode.
- **Security boundary**: owner-only UDS + peer-credential UID pinning, never a network listener (TCP explicitly rejected); state-key provider executes in the sidecar's environment, removing host env as a secret channel. Attestation-bound channels are #4007's scope.
- **Conformance mechanism**: the FFI contract tests become transport-parameterized; dlopen/sidecar divergence is a release blocker, which is what keeps "transport swap, not API rework" true over time.
- **Sequencing**: not on the 7.1–7.5 critical path (those build on dlopen); the sidecar track (7.S1–S3) runs parallel and must converge before ECDSA retirement per decision 1.

## Open questions for sign-off (§8, proposed defaults)

(a) spawn model: keep-client child process; (b) framing: length-prefixed JSON; (c) connection model: pool with one in-flight request per connection; (d) packaging: same release artifact.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2 parents e4a8734 + e45b975 commit 472dfa6

1 file changed

Lines changed: 188 additions & 0 deletions

@@ -0,0 +1,188 @@
# Phase 7.0 Addendum: Sidecar Transport Mapping

Date: 2026-06-12
Status: Proposed (same review process as the Phase 7 spec freeze)
Owner: Threshold Labs
Scope: maps the frozen interactive-session API
(`phase-7-interactive-session-spec-freeze.md`, section 5) onto the
sidecar process boundary chosen in Decision Log entry 2, and scopes
what that boundary means for #4007 (the decision-gated TEE checker
stack). This document changes no contract: the sidecar is a
transport swap by construction, and anything here that would alter
the frozen spec is a defect in this document.
14+
## 1. What the sidecar is
15+
16+
A separate OS process that owns the signer engine and every secret
17+
it holds: key-share state, the state-encryption key path, and (after
18+
Phase 7.1) the in-memory interactive nonces. The keep-client host
19+
process — Go runtime, libp2p, Ethereum client, every transitive
20+
dependency — talks to it over local IPC.
21+
22+
**Boundary scope (important, and a hard prerequisite for #4007).**
23+
The "host holds no signing secrets" property is *scoped to the
24+
signing path* and holds once Phase 7.1's engine-held nonce custody
25+
ships: key shares are env/command-only and nonces never leave the
26+
engine. It does **not** yet hold for **DKG**: the transitional DKG
27+
APIs that section 3 maps unchanged still return and accept
28+
`secret_package_hex` through the host (frozen Phase 7 spec section 4
29+
names DKG secret-package custody as an out-of-scope follow-up). So in
30+
any deployment that runs DKG through this transport, the host process
31+
still sees DKG secret material. #4007 must therefore treat the
32+
host↔sidecar **signing** interface as a secret boundary but must NOT
33+
treat the DKG interface as one until the DKG-custody follow-up moves
34+
that material inside the sidecar (or DKG is run out-of-band). Closing
35+
that gap is a precondition for the sidecar being a complete secret
36+
boundary.
37+
38+
The isolation claim, stated precisely: today a memory-disclosure
39+
bug anywhere in the host address space can read whatever the
40+
in-process engine holds, because the dlopen FFI is an API boundary,
41+
not a security boundary. The sidecar makes the boundary an OS
42+
process boundary. It is also the deliberate stepping stone to the
43+
TEE deployment: a sidecar process becomes an enclave process with
44+
the same wire protocol, which is precisely why decision 2 told
45+
isolation-sensitive work to assume this shape.

## 2. Why the frozen API maps cleanly

Two prior decisions did the work in advance:

* The engine API is already coarse JSON request/response over a C
  ABI, chosen over round-level FFI compatibility partly *for*
  "cleaner future sidecar extraction"
  (`signer-api-contract-decision-brief.md`).
* The frozen section-5 calls are idempotent-or-fail-closed,
  self-contained request/response with no callbacks and no shared
  memory.

One tension to resolve explicitly: the old decision brief argued
against round-level APIs because they kept "nonce/round details
crossing the FFI boundary" and made the transport swap harder. The
Phase 7 API *is* round-level (`Round1`/`Round2`), because true
two-round FROST with a network exchange between rounds forces
interactivity, but the frozen spec's section 4 dissolves the brief's
actual objection: rounds cross the boundary, **nonces do not**. What
transits is public commitments, signing packages, and shares. The
remaining chattiness cost is inherent to interactive FROST and is
bounded (two round trips per attempt against a ~41-block attempt
budget; the Annex B arithmetic gives ~175x headroom).

## 3. Transport mapping

Same JSON envelopes, different carrier:

| Engine call (frozen spec §5 / existing API) | dlopen transitional | Sidecar |
|---|---|---|
| `InstallNativeTBTCSignerConfig` (init) | `frost_tbtc_init_signer_config` symbol | First request after connect (handshake step 2) |
| `InteractiveSessionOpen/Round1/Round2/Aggregate/Abort` | per-call symbols (Phase 7.1/7.2) | One method each, identical JSON bodies |
| Coarse transitional calls (until deleted per spec §7) | existing symbols | Same mapping rule |

Carrier (proposed defaults, section 8): a UNIX domain socket with
length-prefixed JSON frames, a small connection pool, and exactly
one in-flight request per connection. No request multiplexing in
v1: the engine's concurrency model and registries are unchanged,
and the pool bounds parallelism exactly as the host's call sites do
today. Errors keep the structured `ErrorResponse` contract
(`consumed_attempt_replay` etc.); the codes are the cross-version
interface and MUST NOT fork between transports.
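To make the proposed carrier concrete, here is a minimal framing sketch. It is written in Python purely for illustration (the host is Go and the schema owner is Rust) and rests on assumptions the addendum does not fix: a 4-byte big-endian length prefix, a 1 MiB frame cap, and a `{"method": ..., "body": ...}` envelope shape. The helper names are hypothetical.

```python
import json
import socket
import struct

# Assumption: 4-byte big-endian length prefix; the addendum does not fix the width.
HEADER = struct.Struct(">I")
# Assumption: a frame cap so a corrupt prefix cannot force an unbounded read.
MAX_FRAME = 1 << 20


def send_frame(sock: socket.socket, message: dict) -> None:
    """Serialize one JSON message and write it as a single length-prefixed frame."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    if len(body) > MAX_FRAME:
        raise ValueError("frame too large")
    sock.sendall(HEADER.pack(len(body)) + body)


def _read_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or fail if the peer closes first."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> dict:
    """Read one length-prefixed frame and decode its JSON body."""
    (length,) = HEADER.unpack(_read_exact(sock, HEADER.size))
    if length > MAX_FRAME:
        raise ValueError("frame too large")
    return json.loads(_read_exact(sock, length))


def call(sock: socket.socket, request: dict) -> dict:
    """One request in flight per connection: send, then block for the reply."""
    send_frame(sock, request)
    return recv_frame(sock)
```

Because `call` never sends a second request before reading the reply, no request id is needed on the wire; parallelism would come from holding several such connections in a pool.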

Transport conformance: the contract tests that pin the FFI behavior
become transport-parameterized: the same request/response suites
run against the dlopen bridge and the sidecar, and divergence is a
release blocker. This is the mechanism that keeps "transport swap,
not API rework" true over time.
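A transport-parameterized suite could look roughly like the sketch below: one list of request/response cases, run unchanged against two carriers. Everything here is a stand-in. `FakeEngine`, the `attempt_id` field, and both transport classes are hypothetical, and the in-process class only imitates the dlopen bridge by round-tripping JSON.

```python
import json
import socket
import struct
import threading


class FakeEngine:
    """Hypothetical stand-in for the signer engine: rejects a replayed attempt."""

    def __init__(self):
        self.consumed = set()

    def handle(self, request: dict) -> dict:
        attempt = request["attempt_id"]
        if attempt in self.consumed:
            return {"error": {"code": "consumed_attempt_replay"}}
        self.consumed.add(attempt)
        return {"ok": True}


class InProcessTransport:
    """Imitates the dlopen bridge: serialized JSON in, serialized JSON out."""

    def __init__(self, engine):
        self.engine = engine

    def call(self, request: dict) -> dict:
        wire = json.dumps(request)
        return json.loads(json.dumps(self.engine.handle(json.loads(wire))))


def _send(sock, obj):
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv(sock):
    header = sock.recv(4, socket.MSG_WAITALL)
    if len(header) < 4:
        return None  # peer closed the connection
    (length,) = struct.unpack(">I", header)
    return json.loads(sock.recv(length, socket.MSG_WAITALL))


class SocketTransport:
    """Imitates the sidecar: the engine is served from the far end of a socket."""

    def __init__(self, engine):
        self.sock, server = socket.socketpair()
        threading.Thread(target=self._serve, args=(engine, server), daemon=True).start()

    @staticmethod
    def _serve(engine, conn):
        while (request := _recv(conn)) is not None:
            _send(conn, engine.handle(request))

    def call(self, request: dict) -> dict:
        _send(self.sock, request)
        return _recv(self.sock)


# The same cases run against every transport; order matters (second is a replay).
CASES = [
    ({"method": "InteractiveSessionOpen", "attempt_id": "a1"}, {"ok": True}),
    ({"method": "InteractiveSessionOpen", "attempt_id": "a1"},
     {"error": {"code": "consumed_attempt_replay"}}),
]


def run_suite(transport) -> list:
    return [transport.call(request) for request, _ in CASES]


def check_conformance(factories: dict) -> None:
    """Fail if any transport's responses differ from the pinned expectations."""
    expected = [want for _, want in CASES]
    for name, make in factories.items():
        got = run_suite(make(FakeEngine()))
        assert got == expected, f"{name} diverged: {got}"
```

Calling `check_conformance({"dlopen": InProcessTransport, "sidecar": SocketTransport})` exercises both carriers with a fresh engine each; a real suite would reuse the existing FFI contract cases rather than this two-case list.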

## 4. Process model and lifecycle

* **Spawn/supervision (proposed default)**: keep-client spawns the
  sidecar as a child process and supervises it (restart with
  backoff). The alternative, an independent systemd unit, is open
  question (a); the child model keeps the operator surface to one
  service and lets the existing init-config demand semantics apply
  without a coordination protocol.
* **Handshake**: (1) version exchange: the host refuses to operate
  a sidecar outside its supported range and fails closed;
  (2) init-config install: the host reads
  `TBTC_SIGNER_INIT_CONFIG_PATH` and posts the install request as the
  first message, exactly the #4037/#4041 flow. **Decision 7 carries
  over unchanged**: with the path set, a sidecar that cannot be
  spawned, cannot complete the handshake, or rejects the config is
  process-fatal for the host, in every profile. The enforcement point
  (`enforceNativeInitConfigDemand`) gains "sidecar unreachable" as
  one more member of the same failure family.
* **Crash semantics**: a sidecar crash loses in-flight nonces. By
  the frozen spec's section 4 and ratified question 4
  (markers-only), this is exactly the restart story: live attempts
  fail safe, durable consumption markers prevent any replay, the
  supervisor restarts the sidecar, re-init runs (idempotent by
  config fingerprint), and the attestation TTL applies at re-init
  (runbook prerequisite 6). No new failure mode is introduced; the
  sidecar converts "host process restart" into the strictly smaller
  "signer process restart."
* **Shutdown**: host-initiated graceful stop sends `SessionAbort`
  for live sessions (zeroize), then terminates. SIGKILL is
  equivalent to a crash and is safe by the same argument.
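The two-step handshake and its fail-closed rule can be sketched as follows. This models only the case where `TBTC_SIGNER_INIT_CONFIG_PATH` is set, and several details are assumptions for illustration: the `Hello` method name, the `version` reply field, the supported range, a JSON config file, and the `FatalSidecarError` type standing in for whatever the real enforcement point raises. `call` is any function that sends one request and returns one response.

```python
import json

# Assumption: the host supports protocol versions 1 and 2.
SUPPORTED_VERSIONS = range(1, 3)


class FatalSidecarError(RuntimeError):
    """Raised when the host must exit rather than run without a usable signer."""


def handshake(call, config_path: str) -> dict:
    """Run version exchange, then init-config install; any failure is fatal."""
    try:
        # Step 1: version exchange. "Hello" is a hypothetical method name.
        version = call({"method": "Hello"})["version"]
        if version not in SUPPORTED_VERSIONS:
            raise FatalSidecarError(
                f"sidecar version {version} outside supported range"
            )
        # Step 2: init-config install, sent as the first engine request.
        with open(config_path, encoding="utf-8") as handle:
            config = json.load(handle)  # assumption: the config file is JSON
        reply = call({"method": "InstallNativeTBTCSignerConfig", "body": config})
    except (OSError, KeyError, ValueError) as exc:
        # Unreachable sidecar, malformed reply, or unreadable config all land
        # in the same fatal family.
        raise FatalSidecarError(f"sidecar handshake failed: {exc!r}") from exc
    if "error" in reply:
        raise FatalSidecarError(f"sidecar rejected init config: {reply['error']}")
    return reply
```

The point of funnelling every failure into one exception type is that the caller has a single decision to make: exit the host process, with no degraded mode to reason about.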

## 5. Security boundary

* Socket: filesystem-permission-guarded UDS (owner-only directory),
  peer-credential check (`SO_PEERCRED`/`LOCAL_PEERCRED`) pinning
  the host UID. Never a network listener: a TCP mode is explicitly
  out of scope and should be rejected in review if proposed.
* Authentication beyond UID pinning is deliberately deferred: the
  v1 trust model is same-host, same-operator. The TEE phase
  replaces this with an attestation-bound channel; designing that
  channel is part of #4007's scope, not this addendum's.
* Secrets: the state-encryption key provider (env/command) runs in
  the **sidecar's** process environment, not the host's. The config
  file may carry `state_key_command` (its 0600 guidance stands);
  the command executes sidecar-side. Host environment variables
  stop being a secret channel entirely.
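The socket rules above can be sketched for Linux as an owner-only directory plus a `SO_PEERCRED` check on each accepted connection. This is illustrative only: the function names and the `signer.sock` file name are hypothetical, and the macOS/BSD `LOCAL_PEERCRED` path uses a different structure and is not covered.

```python
import os
import socket
import struct

# Linux `struct ucred` layout: pid_t pid, uid_t uid, gid_t gid.
UCRED = struct.Struct("iII")


def bind_owner_only(directory: str, name: str = "signer.sock") -> socket.socket:
    """Create a 0700 directory and listen on a UNIX socket inside it."""
    os.makedirs(directory, mode=0o700, exist_ok=True)
    # makedirs honours the umask and skips existing dirs, so enforce the mode.
    os.chmod(directory, 0o700)
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(os.path.join(directory, name))
    listener.listen()
    return listener


def peer_uid(conn: socket.socket) -> int:
    """Return the UID of the process at the other end of the socket (Linux)."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    _pid, uid, _gid = UCRED.unpack(raw)
    return uid


def accept_pinned(listener: socket.socket, allowed_uid: int) -> socket.socket:
    """Accept one connection, closing it unless the peer runs as `allowed_uid`."""
    conn, _ = listener.accept()
    if peer_uid(conn) != allowed_uid:
        conn.close()
        raise PermissionError("peer UID does not match the pinned host UID")
    return conn
```

The sidecar would call `accept_pinned(listener, host_uid)` for every connection. The directory mode is the first guard; the credential check is what still holds if the permissions are ever loosened by mistake.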

## 6. What does not change

JSON schema ownership (Rust), the error-code contract, idempotency
and fail-closed semantics, registries and persistence
(sidecar-local files, same formats), provenance gating, the frozen
section-5 verification rules, and the section-7 deletion trigger.
The dlopen bridge remains the shipping transport until the sidecar
lands; Phases 7.1-7.5 build and validate on dlopen without waiting.

## 7. #4007 (TEE checker stack) scoping

#4007 gates *whether a signer may register* on TEE attestation
evidence and stays decision-gated on the DAO's TEE policy; this
addendum does not undraft it. What the sidecar decision gives it is
a concrete subject: the artifact whose identity gets attested is
the sidecar binary (later, the enclave image), not the composite
keep-client process. #4007's open scoping questions become: which
measurement (binary hash / enclave MRENCLAVE-equivalent), who
verifies (the DAO-whitelist checker), and how the attestation binds
to the UDS channel. Those land in #4007's own design doc; the
interface contract it must respect is sections 3-5 here.

## 8. Open questions (proposed defaults; decide at this addendum's sign-off)

* (a) **Spawn model**: keep-client child process (default) vs.
  independent systemd unit.
* (b) **Wire framing**: length-prefixed JSON frames (default) vs.
  newline-delimited JSON.
* (c) **Connection model**: small pool, one in-flight request per
  connection (default) vs. request-id multiplexing.
* (d) **Packaging**: sidecar binary ships in the same release
  artifact as keep-client (default) vs. separate artifact with its
  own version line.

## 9. Sequencing

The sidecar is not on the 7.1-7.5 critical path: those phases build
on the dlopen transport, and the frozen API guarantees the swap is
transport-only. The sidecar track runs in parallel and must
converge **before the ECDSA-retirement phases** (decision 1's
timing: take the isolation step before mainnet TVL migrates).
Suggested shape: 7.S1 sidecar process + handshake + conformance
suite; 7.S2 operational hardening (supervision, packaging,
runbook); 7.S3 cutover of the production default with dlopen kept
as the rollback transport for one release.
