feat(tbtc/signer): Phase 7.1 hardened interactive signing session#4051
Open
mswilkison wants to merge 7 commits into
Open
Conversation
Implements sections 4-5 of the frozen Phase 7 spec: the interactive two-round signing path with engine-held nonce custody. Round-1 nonces are generated from OS randomness, live only in in-memory session state bound to (session_id, attempt_id), zeroize on consumption, abort, expiry, and replacement, and never appear in a request, response, or persisted state. The only durable artifact is the per-attempt consumption marker, persisted BEFORE the signature share leaves the engine; a persist failure rolls the marker back with the nonces left live, and a marker without a released share just kills the attempt - fail closed. A restart can therefore never produce a second share under one nonce pair, and the cloned-state nonce-reuse class is structurally gone. Entry points (strict-mode attempt contexts only, no legacy fallback): InteractiveSessionOpen (key package supplied once per session, validated against the member; idempotent by request fingerprint; conflicting reopen fails closed; a newer attempt implicitly aborts the prior live one), InteractiveRound1 (idempotent until consumed), InteractiveRound2 (verification precedes consumption: message binding, subset-of-included, exactly-threshold size, own membership, and the own-commitment byte-identity check (f) that defeats coordinator framing of honest members), InteractiveSessionAbort (idempotent; destroys nonces without a marker, so a never-consumed attempt may reopen with fresh nonces). Live sessions are capacity-capped (fail-closed at TBTC_SIGNER_MAX_LIVE_INTERACTIVE_SESSIONS, default 64) and TTL-swept lazily with abort semantics (TBTC_SIGNER_INTERACTIVE_SESSION_TTL_SECONDS, default 3600); both knobs ride the init-config surface. New structured error code consumed_nonce_replay; call/success counters for all four operations and latency tracking for the two cryptographic rounds. The four FFI exports (frost_tbtc_interactive_*) ship additively per the established pattern - the Go host adopts them in Phase 7.3. Tests: 10 engine tests pin the e2e round trip (one member through the session API, one through the stateless primitive, aggregating to a verified BIP-340 signature), the framing-attack rejection and verify-before-consume recovery, package-shape rejections, replay and restart-marker durability, persist-fault marker rollback, open lifecycle, abort, TTL expiry, and capacity; one lib test pins FFI dispatch for all four exports. Full suite 255 passed / 1 ignored, clippy -D warnings clean across all targets including the bench-restart-hook shape, Phase 5 chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ject Two resource issues found reviewing the 7.1 session layer, both of my own making: 1. Round2 took the nonces but left interactive_signing = Some with the key package and message still resident, so a completed session held its live-session capacity slot AND kept key material in memory until the TTL sweep. A node doing many sequential signings could exhaust the cap of 64 with completed-but-unswept sessions and fail-close new opens. Round2 now clears interactive_signing once the attempt is terminal (success or post-marker share failure); the durable marker carries all further replay protection. 2. interactive_session_open inserted an empty SessionState via entry().or_default() BEFORE the consumed-marker / conflict / capacity reject checks, so a flood of rejected opens for distinct session_ids accumulated empty sessions against the global 1024 session cap, which could then starve DKG. Open now decides from a read-only view and only inserts once committed to creating the attempt. Tests extended: the e2e test asserts the live state is freed (marker retained) after Round2; the capacity test asserts a rejected open leaves no empty session. Full suite 255 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The four frost_tbtc_interactive_* symbols were exported from Rust but absent from the hand-maintained public header, so a cgo consumer could not compile against the 7.1 ABI without hand-declaring them - blocking the Phase 7.3 Go adoption path (review finding). Adds the declarations with a header comment drawing the custody contrast against the stateless nonce block above: here secret nonces never cross the boundary in either direction. Verified: header symbols now exactly match the Rust exports, header parses as valid C. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three findings on the 7.1 interactive path, all mine:
P1 (security) - InteractiveSessionOpen bypassed the signing-policy
firewall. The coarse start_sign_round binds the signed message to a
prior policy-checked build_taproot_tx
(enforce_signing_message_binding_to_policy_checked_build_tx); the
interactive open never called it, so with the firewall enabled a
caller holding a key package could open a fresh session and sign an
arbitrary message - a direct violation of frozen spec section 5 ("Open
checks policy gates"). Open now enforces the same binding before
creating any state; a session with no policy-checked tx fails closed.
P2 (replay) - validate_attempt_context accepts the attempt_id hash
field case-insensitively (eq_ignore_ascii_case), but the marker
registry and live-state lookups keyed on the raw string. A consumed
attempt retried with different hex casing missed the marker and could
be signed again. Open now canonicalizes the whole attempt context via
canonical_attempt_context (matching the coarse path), and the round
entry points canonicalize the incoming attempt_id, so all keying is on
the lowercase canonical form.
P3 (TTL) - InteractiveSessionAbort was the only entry point that did
not sweep expired interactive state, so an abort for an unrelated
session left other sessions' expired nonces in memory past the TTL.
Abort now sweeps like every other locked entry point.
Tests: firewall reject-without-build-tx (the security assertion) +
bound-message accept/mismatch-reject; consumed-marker case
insensitivity across Round2 and reopen; abort sweeps an unrelated
session's expired state. Full suite 259 passed / 1 ignored, clippy -D
warnings clean, chaos suite green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eractive open The interactive path accepted an open after only the signing-policy firewall check, so on a session start_sign_round would refuse it could still emit a share - bypassing the established lifecycle/quarantine gates (review finding). InteractiveSessionOpen now enforces, before installing any interactive state, the same gates the coarse path does: - emergency_rekey_event on an existing session -> LifecyclePolicyRejected (emergency_rekey_required), - a terminally finalized session -> SessionFinalized, - an auto-quarantined member_identifier (absent a DAO allowlist override) -> QuarantinePolicyRejected, reusing enforce_not_quarantined_identifiers so the allowlist override is honored identically. The quarantine check targets this node's own member_identifier - the member it is about to produce a share for - rather than the whole included set, since under t-of-included a quarantined included member simply will not be among the responsive subset. Tests: an emergency-rekey session and a finalized session both refuse interactive open; a quarantined member is rejected and a DAO allowlist override restores signing. Full suite 261 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The coarse start_sign_round checks the lifecycle/quarantine/firewall gates and produces the signature share in one atomic call. The interactive path splits Open from Round2, so a kill switch recorded in that window - emergency rekey, finalization, quarantine, or a re-bound policy-checked tx - was not seen when the share actually left the engine at Round2 (review finding: gates checked only at Open are stale at release time). The gate logic is extracted into enforce_interactive_signing_gates and called at BOTH Open and Round2, so the two sites cannot drift and the share is gated at the moment it leaves the engine. The Round2 recheck runs before consumption (verify-before-consume): a gate rejection leaves the nonces live and the attempt recoverable, so a transient kill switch does not burn the attempt. The message rechecked is the one bound at Open (held in interactive state), and the signing package's message is independently verified equal to it. Test: open + round1 + build package, record an emergency rekey, then Round2 is rejected (emergency_rekey_required) WITHOUT consuming the attempt; clearing the rekey lets the same attempt complete. Full suite 262 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eshold Two findings on the 7.1 path: P2 (liveness/DoS) - an interactive open that later expired or was aborted before Round2 left an otherwise-empty SessionState in the registry. Since open inserts new session IDs and ensure_session_insert_capacity counts every map entry, a caller could churn unique sessions until TBTC_SIGNER_MAX_SESSIONS filled, after which DKG / build_tx / new interactive sessions were rejected until restart. The TTL sweep and abort now drop a session that holds nothing durable once its live attempt is cleared, via a new SessionState::is_disposable that checks EVERY field so a session still carrying consumed markers or DKG material is never removed. P2 (verify-before-consume) - Open accepted a threshold below the key package's min_signers; Round2 would then accept a too-small signing package, persist the consumed marker, and only then have frost::round2::sign fail on the commitment count - burning the nonce for a validation error. Open now rejects threshold != key package min_signers before storing the session. Tests: open-then-abort churn under a 2-session cap stays bounded (no accumulation); the abort-sweep test now asserts the empty session is dropped, not just cleared; threshold-below-min_signers is rejected and the matching threshold opens. Full suite 264 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Phase 7.1 of the frozen spec (#4049,
docs/phase-7-interactive-session-spec-freeze.md§§4–5): the engine session layer for the interactive two-round signing path, with engine-held nonce custody as the defining property.The custody contract (spec §4)
(session_id, attempt_id), zeroized on consumption/abort/expiry/replacement, never in any request, response, or persisted state.The API (spec §5, strict-mode only)
InteractiveSessionOpenInteractiveRound1InteractiveRound2tsize, own membership, and check (f) — own-commitment byte-identity, defeating coordinator framing of honest membersInteractiveSessionAbortLive-state bounds per the spec: fail-closed capacity cap (
TBTC_SIGNER_MAX_LIVE_INTERACTIVE_SESSIONS, default 64) + lazy TTL sweep with abort semantics (TBTC_SIGNER_INTERACTIVE_SESSION_TTL_SECONDS, default 3600); both knobs ride the init-config surface. New structured errorconsumed_nonce_replay. Telemetry: call/success counters ×4, latency for the two cryptographic rounds.The four
frost_tbtc_interactive_*FFI exports ship additively per the established pattern (Go adoption is Phase 7.3; nothing breaks until the host calls them).Verification
clippy -D warningsclean on all targets incl. the bench shape, Phase 5 chaos suite green, persisted-state schema additive (existing state files load unchanged).Scope boundary
InteractiveAggregate+ package-envelope evidence + cross-language vectors are Phase 7.2 by the frozen phasing; Go orchestration is 7.3. Out of scope per the freeze: DKG secret-package custody (named follow-up).🤖 Generated with Claude Code