Skip to content

Commit 0d739a5

Browse files
mswilkisonclaude
andcommitted
fix(tbtc/signer): re-evaluate signing gates at the Round2 share release
The coarse start_sign_round checks the lifecycle/quarantine/firewall gates and produces the signature share in one atomic call. The interactive path splits Open from Round2, so a kill switch recorded in that window - emergency rekey, finalization, quarantine, or a re-bound policy-checked tx - was not seen when the share actually left the engine at Round2 (review finding: gates checked only at Open are stale at release time). The gate logic is extracted into enforce_interactive_signing_gates and called at BOTH Open and Round2, so the two sites cannot drift and the share is gated at the moment it leaves the engine. The Round2 recheck runs before consumption (verify-before-consume): a gate rejection leaves the nonces live and the attempt recoverable, so a transient kill switch does not burn the attempt. The message rechecked is the one bound at Open (held in interactive state), and the signing package's message is independently verified equal to it. Test: open + round1 + build package, record an emergency rekey, then Round2 is rejected (emergency_rekey_required) WITHOUT consuming the attempt; clearing the rekey lets the same attempt complete. Full suite 262 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent 03fb6d6 commit 0d739a5

2 files changed

Lines changed: 179 additions & 48 deletions

File tree

pkg/tbtc/signer/src/engine/interactive.rs

Lines changed: 83 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -107,60 +107,24 @@ pub fn interactive_session_open(
107107

108108
ensure_session_insert_capacity(&guard.sessions, &request.session_id)?;
109109

110-
// Session lifecycle gates (frozen spec section 5: Open "checks
111-
// policy gates"). The interactive path must refuse in exactly the
112-
// states the coarse start_sign_round refuses: a session under an
113-
// emergency rekey, or one already terminally finalized. Without
114-
// these, InteractiveRound1/Round2 could emit a share where the
115-
// established path would not.
116-
if let Some(existing_session) = guard.sessions.get(&request.session_id) {
117-
if let Some(emergency_rekey_event) = existing_session.emergency_rekey_event.as_ref() {
118-
return Err(EngineError::LifecyclePolicyRejected {
119-
session_id: request.session_id.clone(),
120-
reason_code: "emergency_rekey_required".to_string(),
121-
detail: format!(
122-
"emergency rekey required for session [{}] since [{}]: {}",
123-
request.session_id,
124-
emergency_rekey_event.triggered_at_unix,
125-
emergency_rekey_event.reason
126-
),
127-
});
128-
}
129-
if existing_session.finalize_request_fingerprint.is_some() {
130-
return Err(EngineError::SessionFinalized {
131-
session_id: request.session_id.clone(),
132-
});
133-
}
134-
}
135-
136-
// Quarantine gate: this node is about to produce a share for
137-
// member_identifier, so an auto-quarantined member (absent a DAO
138-
// allowlist override) must not be able to sign through the
139-
// interactive path either.
110+
// Lifecycle + quarantine + signing-policy-firewall gates (frozen
111+
// spec section 5: Open "checks policy gates"). The SAME helper runs
112+
// again at Round2 (the share-release moment) so a policy change
113+
// recorded after Open - emergency rekey, finalization, quarantine,
114+
// or a re-bound policy-checked tx - cannot let a share escape.
140115
let auto_quarantine_config = load_auto_quarantine_config()?;
141-
enforce_not_quarantined_identifiers(
116+
let existing_session = guard.sessions.get(&request.session_id);
117+
enforce_interactive_signing_gates(
142118
&request.session_id,
143-
&[request.member_identifier],
119+
request.member_identifier,
120+
&request.message_hex,
121+
existing_session.and_then(|session| session.emergency_rekey_event.as_ref()),
122+
existing_session.is_some_and(|session| session.finalize_request_fingerprint.is_some()),
123+
existing_session.and_then(|session| session.tx_result.as_ref()),
144124
&guard.quarantined_operator_identifiers,
145125
auto_quarantine_config.as_ref(),
146126
)?;
147127

148-
// Signing-policy firewall (frozen spec section 5: Open "checks
149-
// policy gates"). When the firewall is enabled, the message must be
150-
// bound to a prior policy-checked build_taproot_tx for this
151-
// session, exactly as the coarse start_sign_round path enforces it
152-
// - otherwise a caller holding a key package could open an
153-
// interactive session on a fresh session_id and sign an arbitrary
154-
// message. A session with no policy-checked tx fails closed here.
155-
enforce_signing_message_binding_to_policy_checked_build_tx(
156-
&request.session_id,
157-
&request.message_hex,
158-
guard
159-
.sessions
160-
.get(&request.session_id)
161-
.and_then(|session| session.tx_result.as_ref()),
162-
)?;
163-
164128
// Decide everything from a read-only view BEFORE inserting anything,
165129
// so the reject paths (consumed marker, conflict, capacity) never
166130
// leave an empty SessionState behind. Returns: whether the attempt
@@ -367,6 +331,11 @@ pub fn interactive_round2(
367331
.map_err(|_| EngineError::Internal("engine lock poisoned".to_string()))?;
368332
sweep_expired_interactive_state(&mut guard);
369333

334+
// Quarantine inputs must be read before the session is borrowed
335+
// mutably from the same guard below.
336+
let auto_quarantine_config = load_auto_quarantine_config()?;
337+
let quarantined_operator_identifiers = guard.quarantined_operator_identifiers.clone();
338+
370339
let session = guard.sessions.get_mut(&request.session_id).ok_or_else(|| {
371340
EngineError::SessionNotFound {
372341
session_id: request.session_id.clone(),
@@ -390,6 +359,31 @@ pub fn interactive_round2(
390359
&request.session_id,
391360
)?;
392361

362+
// Re-evaluate the signing gates at the share-release moment. The
363+
// gates checked at Open are stale here: a kill switch recorded
364+
// after Open (emergency rekey, finalization, quarantine, or a
365+
// re-bound policy-checked tx) must stop the share leaving the
366+
// engine. Read via immutable borrows of the live attempt before the
367+
// mutable consume/sign borrow below. Skipped when no matching live
368+
// attempt exists - there is no share to release in that case, and
369+
// interactive_state_for_attempt_mut produces the canonical error.
370+
if let Some(interactive) = session.interactive_signing.as_ref().filter(|interactive| {
371+
interactive.attempt_context.attempt_id == attempt_id
372+
&& interactive.member_identifier == request.member_identifier
373+
}) {
374+
let bound_message_hex = hex::encode(interactive.message_bytes.as_slice());
375+
enforce_interactive_signing_gates(
376+
&request.session_id,
377+
request.member_identifier,
378+
&bound_message_hex,
379+
session.emergency_rekey_event.as_ref(),
380+
session.finalize_request_fingerprint.is_some(),
381+
session.tx_result.as_ref(),
382+
&quarantined_operator_identifiers,
383+
auto_quarantine_config.as_ref(),
384+
)?;
385+
}
386+
393387
let interactive = interactive_state_for_attempt_mut(
394388
session,
395389
&request.session_id,
@@ -647,6 +641,47 @@ fn verify_round2_signing_package(
647641
Ok(())
648642
}
649643

644+
// The signing gates the interactive path enforces at BOTH Open and
645+
// the Round2 share-release moment, mirroring the coarse
646+
// start_sign_round: emergency-rekey and finalized lifecycle, quarantine
647+
// of this node's own member, and the signing-policy firewall binding of
648+
// the message to a policy-checked build_taproot_tx. Centralized in one
649+
// function so the two call sites cannot drift apart.
650+
#[allow(clippy::too_many_arguments)]
651+
fn enforce_interactive_signing_gates(
652+
session_id: &str,
653+
member_identifier: u16,
654+
message_hex: &str,
655+
emergency_rekey_event: Option<&EmergencyRekeyEvent>,
656+
session_finalized: bool,
657+
tx_result: Option<&TransactionResult>,
658+
quarantined_operator_identifiers: &HashSet<u16>,
659+
auto_quarantine_config: Option<&AutoQuarantineConfig>,
660+
) -> Result<(), EngineError> {
661+
if let Some(emergency_rekey_event) = emergency_rekey_event {
662+
return Err(EngineError::LifecyclePolicyRejected {
663+
session_id: session_id.to_string(),
664+
reason_code: "emergency_rekey_required".to_string(),
665+
detail: format!(
666+
"emergency rekey required for session [{}] since [{}]: {}",
667+
session_id, emergency_rekey_event.triggered_at_unix, emergency_rekey_event.reason
668+
),
669+
});
670+
}
671+
if session_finalized {
672+
return Err(EngineError::SessionFinalized {
673+
session_id: session_id.to_string(),
674+
});
675+
}
676+
enforce_not_quarantined_identifiers(
677+
session_id,
678+
&[member_identifier],
679+
quarantined_operator_identifiers,
680+
auto_quarantine_config,
681+
)?;
682+
enforce_signing_message_binding_to_policy_checked_build_tx(session_id, message_hex, tx_result)
683+
}
684+
650685
// Canonical key form for an attempt_id at the round entry points,
651686
// matching canonicalize_attempt_context_for_fingerprint (which
652687
// lowercases attempt_id). The wire accepts attempt_id case-

pkg/tbtc/signer/src/engine/tests.rs

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12591,3 +12591,99 @@ fn interactive_open_rejected_for_quarantined_member_honors_dao_allowlist() {
1259112591

1259212592
outcome.expect("quarantine gate lifecycle");
1259312593
}
12594+
12595+
#[test]
12596+
fn interactive_round2_rechecks_gates_at_share_release() {
12597+
let _guard = lock_test_state();
12598+
reset_for_tests();
12599+
12600+
let key_packages = interactive_test_key_packages();
12601+
let key_group = "interactive-test-key-group";
12602+
let message = [0x19u8; 32];
12603+
let included = [1u16, 2];
12604+
12605+
// Open + Round1 normally (gates pass at Open), build the package,
12606+
// THEN record an emergency rekey before Round2. The share must not
12607+
// leave the engine: Round2 re-evaluates the gates at release time.
12608+
let session_id = "interactive-toctou-rekey";
12609+
let opened = open_interactive_for_test(
12610+
&key_packages,
12611+
session_id,
12612+
key_group,
12613+
&message,
12614+
&included,
12615+
1,
12616+
1,
12617+
2,
12618+
)
12619+
.expect("opens");
12620+
let round1 = interactive_round1(InteractiveRound1Request {
12621+
session_id: session_id.to_string(),
12622+
attempt_id: opened.attempt_id.clone(),
12623+
member_identifier: 1,
12624+
})
12625+
.expect("round 1");
12626+
let member2 = generate_nonces_and_commitments(GenerateNoncesAndCommitmentsRequest {
12627+
key_package_identifier: key_packages[&2].identifier.clone(),
12628+
key_package_hex: key_packages[&2].data_hex.clone(),
12629+
})
12630+
.expect("member 2 nonces");
12631+
let signing_package_hex = interactive_package_for_test(
12632+
&message,
12633+
vec![
12634+
NativeFrostCommitment {
12635+
identifier: key_packages[&1].identifier.clone(),
12636+
data_hex: round1.commitments_hex,
12637+
},
12638+
member2.commitment,
12639+
],
12640+
);
12641+
12642+
// Kill switch recorded AFTER Open/Round1.
12643+
{
12644+
let mut guard = state().expect("state").lock().expect("lock");
12645+
let session = guard.sessions.get_mut(session_id).expect("session exists");
12646+
session.emergency_rekey_event = Some(EmergencyRekeyEvent {
12647+
reason: "post-open rekey".to_string(),
12648+
triggered_at_unix: now_unix(),
12649+
});
12650+
}
12651+
12652+
let blocked = interactive_round2(InteractiveRound2Request {
12653+
session_id: session_id.to_string(),
12654+
attempt_id: opened.attempt_id.clone(),
12655+
member_identifier: 1,
12656+
signing_package_hex: signing_package_hex.clone(),
12657+
})
12658+
.expect_err("a post-open emergency rekey must block the Round2 share");
12659+
assert!(
12660+
matches!(blocked, EngineError::LifecyclePolicyRejected { ref reason_code, .. }
12661+
if reason_code == "emergency_rekey_required"),
12662+
"unexpected error: {blocked:?}"
12663+
);
12664+
12665+
// The block at release time must be fail-closed WITHOUT consuming
12666+
// the nonces: no marker was written (verify-before-consume applies
12667+
// to the gate recheck too), so clearing the kill switch lets the
12668+
// same attempt complete. This proves the recheck rejects before
12669+
// consumption rather than after.
12670+
{
12671+
let mut guard = state().expect("state").lock().expect("lock");
12672+
let session = guard.sessions.get_mut(session_id).expect("session exists");
12673+
assert!(
12674+
!session
12675+
.consumed_interactive_attempt_markers
12676+
.contains(&opened.attempt_id),
12677+
"a gate rejection must not consume the attempt"
12678+
);
12679+
session.emergency_rekey_event = None;
12680+
}
12681+
12682+
interactive_round2(InteractiveRound2Request {
12683+
session_id: session_id.to_string(),
12684+
attempt_id: opened.attempt_id,
12685+
member_identifier: 1,
12686+
signing_package_hex,
12687+
})
12688+
.expect("the same attempt completes once the kill switch clears");
12689+
}

0 commit comments

Comments
 (0)