Skip to content

Commit 289df95

Browse files
mswilkisonclaude
andcommitted
fix(tbtc/signer): resolve interactive key material from DKG state, not the request
Two findings, the first central to the whole effort: P2 (secret boundary) - InteractiveSessionOpenRequest carried key_package_hex, forcing the private FROST key package through the host/FFI request buffer on every open. That directly contradicts the frozen spec section 4 ("key shares already env/command-only ... no secret signing material transits the Go/Rust interface") and would leave the sidecar unable to provide the signing-secret boundary that is the entire point of Phase 7. Open now resolves the member's key package from the session's own DKG state (run_dkg-populated), exactly like the coarse start_sign_round: the session must already exist with completed DKG, the request carries no key material, and the threshold is validated against the DKG threshold. The key_package fields are removed from the request; the spec section 5 is corrected accordingly. Because interactive sessions now always ride a DKG-populated session and never create registry entries, the empty-session-churn fixed last round is impossible by construction, so the is_disposable disposal logic (and its churn test) is reverted as dead code; sweep/abort just clear the live attempt and retain the DKG session. P2 (rollback) - a delayed InteractiveSessionOpen for an older attempt could replace a newer live attempt and wipe its nonces. Open now replaces a different live attempt ONLY when the incoming attempt_number strictly advances the live one; an older-or-equal attempt is rejected. Tests: aggregation, framing, replay, restart, persist-fault, TTL, capacity, lifecycle, quarantine, firewall, and the new TOCTOU recheck all rebuilt on DKG-seeded sessions (key material from engine state); added open-requires-DKG-session and non-participant rejection; threshold mismatch now reports the DKG threshold. Full suite 264 passed / 1 ignored, clippy -D warnings clean, chaos green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent edf7952 commit 289df95

6 files changed

Lines changed: 298 additions & 473 deletions

File tree

pkg/tbtc/signer/docs/phase-7-interactive-session-spec-freeze.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -145,6 +145,17 @@ self-contained):
145145
mode is the only mode here: no legacy-shape fallback), checks
146146
policy gates and provenance, registers the session. Idempotent
147147
by full-request fingerprint; conflicting reopen fails closed.
148+
**The member's key package is resolved from the session's own DKG
149+
state (run_dkg), NOT carried in the request** — so the session
150+
must already exist with completed DKG, and no signing secret
151+
crosses the FFI/host boundary (section 4). This is a correction to
152+
an earlier draft of this spec that had Open accept the key package
153+
in the request; accepting it would have left key shares outside
154+
the engine and defeated the sidecar's signing-secret boundary. A
155+
request `threshold` is still carried but must equal the DKG
156+
threshold. As a consequence, an interactive session always rides a
157+
DKG-populated session and never creates registry entries of its
158+
own.
148159
2. `InteractiveRound1` — fresh nonces + commitments as in section
149160
4. Per (session, attempt, member) at most one live handle;
150161
repeat calls return the same commitments (idempotent) until

pkg/tbtc/signer/src/api.rs

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -158,17 +158,16 @@ pub struct InteractiveSessionOpenRequest {
158158
pub member_identifier: u16,
159159
pub message_hex: String,
160160
pub key_group: String,
161+
/// Signing threshold; must equal the session's DKG threshold. The
162+
/// key material itself is resolved from the engine's DKG state and
163+
/// is never carried in this request - no signing secret crosses the
164+
/// FFI (frozen spec section 4).
161165
pub threshold: u16,
162166
#[serde(default, skip_serializing_if = "Option::is_none")]
163167
pub taproot_merkle_root_hex: Option<String>,
164168
/// Required: interactive sessions are strict-mode only; there is
165169
/// no legacy-shape fallback on this path.
166170
pub attempt_context: AttemptContext,
167-
/// The member's key package, supplied once per session and held by
168-
/// the engine for the session's lifetime (in memory only:
169-
/// interactive session state follows markers-only durability).
170-
pub key_package_identifier: String,
171-
pub key_package_hex: String,
172171
}
173172

174173
#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]

pkg/tbtc/signer/src/engine/interactive.rs

Lines changed: 135 additions & 119 deletions
Original file line numberDiff line numberDiff line change
@@ -60,56 +60,6 @@ pub fn interactive_session_open(
6060
// marker and sign again.
6161
request.attempt_context = canonical_attempt_context(&request.attempt_context);
6262

63-
// Strict-mode-only attempt context: required, fully validated,
64-
// coordinator recomputed per RFC-21 Annex A.
65-
let canonical_included_participants = validate_attempt_context(
66-
&request.session_id,
67-
&request.key_group,
68-
&message_bytes,
69-
&message_digest_hex,
70-
request.threshold,
71-
Some(&request.attempt_context),
72-
true,
73-
)?
74-
.ok_or_else(|| {
75-
EngineError::Internal(
76-
"strict attempt context validation returned no participants".to_string(),
77-
)
78-
})?;
79-
80-
if !canonical_included_participants.contains(&request.member_identifier) {
81-
return Err(EngineError::Validation(
82-
"member_identifier must be included in attempt_context.included_participants"
83-
.to_string(),
84-
));
85-
}
86-
87-
let key_package = decode_key_package(
88-
"InteractiveSessionOpen",
89-
&request.key_package_identifier,
90-
&request.key_package_hex,
91-
)?;
92-
let expected_identifier =
93-
participant_identifier_to_frost_identifier(request.member_identifier)?;
94-
if *key_package.identifier() != expected_identifier {
95-
return Err(EngineError::Validation(
96-
"key_package_identifier must match member_identifier".to_string(),
97-
));
98-
}
99-
// The signing threshold is fixed by the key material. Reject a
100-
// mismatch at Open: otherwise Round2 would accept a signing package
101-
// of the requested (wrong) size, persist the consumed marker, and
102-
// only then have frost::round2::sign fail on the commitment count -
103-
// burning the nonce handle for a validation error, against the
104-
// verify-before-consume contract.
105-
if *key_package.min_signers() != request.threshold {
106-
return Err(EngineError::Validation(format!(
107-
"threshold [{}] does not match the key package min_signers [{}]",
108-
request.threshold,
109-
*key_package.min_signers()
110-
)));
111-
}
112-
11363
let request_fingerprint = interactive_open_request_fingerprint(&request)?;
11464
let attempt_id = request.attempt_context.attempt_id.clone();
11565

@@ -118,46 +68,116 @@ pub fn interactive_session_open(
11868
.map_err(|_| EngineError::Internal("engine lock poisoned".to_string()))?;
11969
sweep_expired_interactive_state(&mut guard);
12070

121-
ensure_session_insert_capacity(&guard.sessions, &request.session_id)?;
122-
123-
// Lifecycle + quarantine + signing-policy-firewall gates (frozen
124-
// spec section 5: Open "checks policy gates"). The SAME helper runs
125-
// again at Round2 (the share-release moment) so a policy change
126-
// recorded after Open - emergency rekey, finalization, quarantine,
127-
// or a re-bound policy-checked tx - cannot let a share escape.
12871
let auto_quarantine_config = load_auto_quarantine_config()?;
129-
let existing_session = guard.sessions.get(&request.session_id);
130-
enforce_interactive_signing_gates(
131-
&request.session_id,
132-
request.member_identifier,
133-
&request.message_hex,
134-
existing_session.and_then(|session| session.emergency_rekey_event.as_ref()),
135-
existing_session.is_some_and(|session| session.finalize_request_fingerprint.is_some()),
136-
existing_session.and_then(|session| session.tx_result.as_ref()),
137-
&guard.quarantined_operator_identifiers,
138-
auto_quarantine_config.as_ref(),
139-
)?;
14072

141-
// Decide everything from a read-only view BEFORE inserting anything,
142-
// so the reject paths (consumed marker, conflict, capacity) never
143-
// leave an empty SessionState behind. Returns: whether the attempt
144-
// is already consumed, the disposition of any live attempt under
145-
// this exact attempt_id (Some(true)=idempotent, Some(false)=
146-
// conflicting fingerprint, None=no matching live attempt), and
147-
// whether a live interactive attempt is being replaced.
148-
let (already_consumed, matching_attempt_idempotent, replacing) = {
149-
let existing = guard.sessions.get(&request.session_id);
150-
let already_consumed = existing.is_some_and(|session| {
151-
session
152-
.consumed_interactive_attempt_markers
153-
.contains(&attempt_id)
154-
});
155-
let matching_attempt_idempotent = existing
156-
.and_then(|session| session.interactive_signing.as_ref())
73+
// The session must already exist with completed DKG. Key material
74+
// lives in the engine's own DKG-populated state and is NEVER
75+
// supplied through the request, so no signing secret crosses the
76+
// FFI/host boundary (frozen spec section 4). Resolve the member's
77+
// key package, run the policy gates, and validate the strict
78+
// attempt context against the DKG threshold/key group - mirroring
79+
// the coarse start_sign_round - all under one immutable borrow,
80+
// then do the mutable install.
81+
let (key_package, canonical_included_participants) = {
82+
let session = guard.sessions.get(&request.session_id).ok_or_else(|| {
83+
EngineError::SessionNotFound {
84+
session_id: request.session_id.clone(),
85+
}
86+
})?;
87+
let dkg = session
88+
.dkg_result
89+
.as_ref()
90+
.ok_or_else(|| EngineError::DkgNotReady {
91+
session_id: request.session_id.clone(),
92+
})?;
93+
if request.key_group != dkg.key_group {
94+
return Err(EngineError::Validation(
95+
"key_group does not match DKG output for this session".to_string(),
96+
));
97+
}
98+
if request.threshold != dkg.threshold {
99+
return Err(EngineError::Validation(format!(
100+
"threshold [{}] does not match the DKG threshold [{}] for this session",
101+
request.threshold, dkg.threshold
102+
)));
103+
}
104+
let key_package = session
105+
.dkg_key_packages
106+
.as_ref()
107+
.ok_or_else(|| EngineError::Internal("missing DKG key package cache".to_string()))?
108+
.get(&request.member_identifier)
109+
.ok_or_else(|| {
110+
EngineError::Validation(
111+
"member_identifier is not a DKG participant for this session".to_string(),
112+
)
113+
})?
114+
.clone();
115+
116+
// Lifecycle + quarantine + signing-policy-firewall gates (frozen
117+
// spec section 5: Open "checks policy gates"). The SAME helper
118+
// runs again at Round2 (the share-release moment) so a policy
119+
// change recorded after Open - emergency rekey, finalization,
120+
// quarantine, or a re-bound policy-checked tx - cannot let a
121+
// share escape.
122+
enforce_interactive_signing_gates(
123+
&request.session_id,
124+
request.member_identifier,
125+
&request.message_hex,
126+
session.emergency_rekey_event.as_ref(),
127+
session.finalize_request_fingerprint.is_some(),
128+
session.tx_result.as_ref(),
129+
&guard.quarantined_operator_identifiers,
130+
auto_quarantine_config.as_ref(),
131+
)?;
132+
133+
// Strict-mode-only attempt context: required, fully validated
134+
// against the DKG threshold/key group, coordinator recomputed
135+
// per RFC-21 Annex A.
136+
let canonical_included_participants = validate_attempt_context(
137+
&request.session_id,
138+
&dkg.key_group,
139+
&message_bytes,
140+
&message_digest_hex,
141+
dkg.threshold,
142+
Some(&request.attempt_context),
143+
true,
144+
)?
145+
.ok_or_else(|| {
146+
EngineError::Internal(
147+
"strict attempt context validation returned no participants".to_string(),
148+
)
149+
})?;
150+
if !canonical_included_participants.contains(&request.member_identifier) {
151+
return Err(EngineError::Validation(
152+
"member_identifier must be included in attempt_context.included_participants"
153+
.to_string(),
154+
));
155+
}
156+
(key_package, canonical_included_participants)
157+
};
158+
159+
// Disposition over the (now-confirmed) existing session: consumed
160+
// marker, idempotent/conflicting reopen of this exact attempt, and
161+
// the live attempt (id + number) for the replacement decision.
162+
let (already_consumed, matching_attempt_idempotent, live_attempt) = {
163+
let session = guard
164+
.sessions
165+
.get(&request.session_id)
166+
.expect("session existed under the held engine lock");
167+
let already_consumed = session
168+
.consumed_interactive_attempt_markers
169+
.contains(&attempt_id);
170+
let live = session.interactive_signing.as_ref();
171+
let matching_attempt_idempotent = live
157172
.filter(|interactive| interactive.attempt_context.attempt_id == attempt_id)
158173
.map(|interactive| interactive.open_request_fingerprint == request_fingerprint);
159-
let replacing = existing.is_some_and(|session| session.interactive_signing.is_some());
160-
(already_consumed, matching_attempt_idempotent, replacing)
174+
let live_attempt = live.map(|interactive| {
175+
(
176+
interactive.attempt_context.attempt_id.clone(),
177+
interactive.attempt_context.attempt_number,
178+
)
179+
});
180+
(already_consumed, matching_attempt_idempotent, live_attempt)
161181
};
162182

163183
if already_consumed {
@@ -180,13 +200,26 @@ pub fn interactive_session_open(
180200
session_id: request.session_id.clone(),
181201
});
182202
}
183-
// None: no live attempt under this attempt_id. If a DIFFERENT
184-
// attempt is live it is implicitly aborted below - the retry
185-
// loop has moved on and a stuck prior attempt must not strand
186-
// its nonces.
187203
None => {}
188204
}
189205

206+
// A DIFFERENT live attempt is replaced ONLY by a strictly newer
207+
// attempt: the retry loop advanced. A stale/delayed open for an
208+
// older or equal attempt must not roll the session back and wipe
209+
// the newer attempt's nonces.
210+
let replacing = live_attempt.is_some();
211+
if let Some((live_attempt_id, live_attempt_number)) = live_attempt {
212+
if live_attempt_id != attempt_id
213+
&& request.attempt_context.attempt_number <= live_attempt_number
214+
{
215+
return Err(EngineError::Validation(format!(
216+
"attempt_number [{}] does not advance the live interactive attempt [{}]; \
217+
refusing to roll back to an older or equal attempt",
218+
request.attempt_context.attempt_number, live_attempt_number
219+
)));
220+
}
221+
}
222+
190223
// Capacity counts every live interactive session. When replacing,
191224
// this session already holds one of those slots, so the cap does
192225
// not apply; when not replacing, a new slot is being taken.
@@ -208,8 +241,8 @@ pub fn interactive_session_open(
208241

209242
let session = guard
210243
.sessions
211-
.entry(request.session_id.clone())
212-
.or_default();
244+
.get_mut(&request.session_id)
245+
.expect("session existed under the held engine lock");
213246

214247
if let Some(mut replaced) = session.interactive_signing.take() {
215248
zeroize_interactive_round1(&mut replaced);
@@ -537,18 +570,6 @@ pub fn interactive_session_abort(
537570
None => false,
538571
};
539572

540-
// Drop the session if aborting left it with nothing durable, so an
541-
// open-then-abort churn cannot accumulate empty entries against
542-
// TBTC_SIGNER_MAX_SESSIONS.
543-
if aborted
544-
&& guard
545-
.sessions
546-
.get(&request.session_id)
547-
.is_some_and(SessionState::is_disposable)
548-
{
549-
guard.sessions.remove(&request.session_id);
550-
}
551-
552573
record_hardening_telemetry(|telemetry| {
553574
telemetry.interactive_session_abort_success_total = telemetry
554575
.interactive_session_abort_success_total
@@ -729,28 +750,23 @@ pub(crate) fn zeroize_interactive_round1(interactive: &mut InteractiveSigningSta
729750
pub(crate) fn sweep_expired_interactive_state(engine_state: &mut EngineState) {
730751
let ttl_seconds = interactive_session_ttl_seconds();
731752
let now = now_unix();
732-
engine_state.sessions.retain(|_session_id, session| {
753+
// Interactive sessions always ride a DKG-populated session (Open
754+
// requires existing DKG state), so expiry only clears the live
755+
// attempt's nonces; the session itself - DKG material, consumed
756+
// markers - is retained for future signing.
757+
for session in engine_state.sessions.values_mut() {
733758
let expired = session
734759
.interactive_signing
735760
.as_ref()
736761
.is_some_and(|interactive| {
737762
now.saturating_sub(interactive.opened_at_unix) > ttl_seconds
738763
});
739-
if !expired {
740-
// Untouched sessions are kept as-is; only sessions whose
741-
// live attempt we just expired are candidates for removal.
742-
return true;
743-
}
744-
if let Some(mut removed) = session.interactive_signing.take() {
745-
zeroize_interactive_round1(&mut removed);
764+
if expired {
765+
if let Some(mut removed) = session.interactive_signing.take() {
766+
zeroize_interactive_round1(&mut removed);
767+
}
746768
}
747-
// Having cleared the expired attempt, drop the session if it now
748-
// holds nothing durable, so churned interactive opens cannot
749-
// accumulate empty entries against TBTC_SIGNER_MAX_SESSIONS. A
750-
// session that still carries consumed markers or DKG material is
751-
// kept.
752-
!session.is_disposable()
753-
});
769+
}
754770
}
755771

756772
pub(crate) fn max_live_interactive_sessions_limit() -> usize {

pkg/tbtc/signer/src/engine/state.rs

Lines changed: 0 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -113,44 +113,6 @@ pub(crate) struct SessionState {
113113
pub(crate) consumed_interactive_attempt_markers: HashSet<String>,
114114
}
115115

116-
impl SessionState {
117-
// True when the session holds no durable or live state worth a
118-
// registry slot: removing it loses nothing. Used to drop a session
119-
// that only ever held a now-cleared interactive attempt, so churned
120-
// interactive opens (open -> expire/abort before Round2) cannot
121-
// accumulate empty entries and exhaust TBTC_SIGNER_MAX_SESSIONS.
122-
//
123-
// EVERY field must be checked here: a field omitted from this
124-
// conjunction risks dropping a session that still carries replay
125-
// protection (consumed markers) or DKG material. When adding a
126-
// field to SessionState, add it here too.
127-
pub(crate) fn is_disposable(&self) -> bool {
128-
self.dkg_request_fingerprint.is_none()
129-
&& self.dkg_key_packages.is_none()
130-
&& self.dkg_public_key_package.is_none()
131-
&& self.dkg_result.is_none()
132-
&& self.sign_request_fingerprint.is_none()
133-
&& self.sign_message_bytes.is_none()
134-
&& self.round_state.is_none()
135-
&& self.active_attempt_context.is_none()
136-
&& self.attempt_transition_records.is_empty()
137-
&& self.consumed_attempt_ids.is_empty()
138-
&& self.consumed_sign_round_ids.is_empty()
139-
&& self.finalize_request_fingerprint.is_none()
140-
&& self.signature_result.is_none()
141-
&& self.consumed_finalize_round_ids.is_empty()
142-
&& self.consumed_finalize_request_fingerprints.is_empty()
143-
&& self.build_tx_request_fingerprint.is_none()
144-
&& self.tx_result.is_none()
145-
&& self.refresh_request_fingerprint.is_none()
146-
&& self.refresh_result.is_none()
147-
&& self.refresh_history.is_empty()
148-
&& self.emergency_rekey_event.is_none()
149-
&& self.interactive_signing.is_none()
150-
&& self.consumed_interactive_attempt_markers.is_empty()
151-
}
152-
}
153-
154116
#[derive(Default)]
155117
pub(crate) struct EngineState {
156118
pub(crate) sessions: HashMap<String, SessionState>,

0 commit comments

Comments
 (0)