Skip to content

feat(tbtc/signer): Phase 7.1 hardened interactive signing session#4051

Open
mswilkison wants to merge 7 commits into
extraction/frost-signer-mirror-2026-05-26from
feat/phase-7-1-interactive-session-2026-06-12
Open

feat(tbtc/signer): Phase 7.1 hardened interactive signing session#4051
mswilkison wants to merge 7 commits into
extraction/frost-signer-mirror-2026-05-26from
feat/phase-7-1-interactive-session-2026-06-12

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

What

Phase 7.1 of the frozen spec (#4049, docs/phase-7-interactive-session-spec-freeze.md §§4–5): the engine session layer for the interactive two-round signing path, with engine-held nonce custody as the defining property.

The custody contract (spec §4)

  • Round-1 nonces: OS randomness, in-memory only, bound to (session_id, attempt_id), zeroized on consumption/abort/expiry/replacement, never in any request, response, or persisted state.
  • Consumption-before-release: the durable per-attempt marker persists BEFORE the share leaves the engine. Persist failure → marker rolled back, nonces stay live, no share escaped (pinned by a persist-fault-injection test). Marker-without-share → attempt dead, fail closed.
  • Restart can never yield a second share under one nonce pair; the cloned-state nonce-reuse class is structurally eliminated. Pinned by a restart test: the marker survives reload and rejects the consumed attempt at every entry point while a fresh attempt proceeds.

The API (spec §5, strict-mode only)

Call Contract highlights
InteractiveSessionOpen Key package once per session (validated against member); idempotent by fingerprint; conflicting reopen fails closed; newer attempt implicitly aborts the prior live one; consumed attempts rejected
InteractiveRound1 Fresh nonces + commitments; idempotent until consumed
InteractiveRound2 All verification precedes consumption: message binding, subset ⊆ included set, exactly-t size, own membership, and check (f) — own-commitment byte-identity, defeating coordinator framing of honest members
InteractiveSessionAbort Idempotent; destroys nonces without a marker (never-consumed attempts may reopen with fresh nonces)

Live-state bounds per the spec: fail-closed capacity cap (TBTC_SIGNER_MAX_LIVE_INTERACTIVE_SESSIONS, default 64) + lazy TTL sweep with abort semantics (TBTC_SIGNER_INTERACTIVE_SESSION_TTL_SECONDS, default 3600); both knobs ride the init-config surface. New structured error consumed_nonce_replay. Telemetry: call/success counters ×4, latency for the two cryptographic rounds.

The four frost_tbtc_interactive_* FFI exports ship additively per the established pattern (Go adoption is Phase 7.3; nothing breaks until the host calls them).

Verification

  • 10 engine tests: e2e round trip with one member through the session API and one through the stateless primitive, aggregating to a verified BIP-340 signature (custody changed, cryptography didn't); framing-attack rejection then honest-package acceptance (verify-before-consume); package-shape rejections (outside-set, message mismatch, oversized, self-missing); round-1 idempotency → consumed replay; restart-marker durability; persist-fault rollback; open lifecycle (idempotent/conflict/replacement); abort; TTL expiry; capacity fail-closed.
  • 1 FFI dispatch smoke test across all four exports.
  • Full suite 255 passed / 1 ignored (baseline 244+1 plus the 11 new), clippy -D warnings clean on all targets incl. the bench shape, Phase 5 chaos suite green, persisted-state schema additive (existing state files load unchanged).

Scope boundary

InteractiveAggregate + package-envelope evidence + cross-language vectors are Phase 7.2 by the frozen phasing; Go orchestration is 7.3. Out of scope per the freeze: DKG secret-package custody (named follow-up).

🤖 Generated with Claude Code

Implements sections 4-5 of the frozen Phase 7 spec: the interactive
two-round signing path with engine-held nonce custody. Round-1 nonces
are generated from OS randomness, live only in in-memory session state
bound to (session_id, attempt_id), zeroize on consumption, abort,
expiry, and replacement, and never appear in a request, response, or
persisted state. The only durable artifact is the per-attempt
consumption marker, persisted BEFORE the signature share leaves the
engine; a persist failure rolls the marker back with the nonces left
live, and a marker without a released share just kills the attempt -
fail closed. A restart can therefore never produce a second share
under one nonce pair, and the cloned-state nonce-reuse class is
structurally gone.

Entry points (strict-mode attempt contexts only, no legacy fallback):
InteractiveSessionOpen (key package supplied once per session,
validated against the member; idempotent by request fingerprint;
conflicting reopen fails closed; a newer attempt implicitly aborts the
prior live one), InteractiveRound1 (idempotent until consumed),
InteractiveRound2 (verification precedes consumption: message binding,
subset-of-included, exactly-threshold size, own membership, and the
own-commitment byte-identity check (f) that defeats coordinator
framing of honest members), InteractiveSessionAbort (idempotent;
destroys nonces without a marker, so a never-consumed attempt may
reopen with fresh nonces). Live sessions are capacity-capped
(fail-closed at TBTC_SIGNER_MAX_LIVE_INTERACTIVE_SESSIONS, default 64)
and TTL-swept lazily with abort semantics
(TBTC_SIGNER_INTERACTIVE_SESSION_TTL_SECONDS, default 3600); both
knobs ride the init-config surface. New structured error code
consumed_nonce_replay; call/success counters for all four operations
and latency tracking for the two cryptographic rounds.

The four FFI exports (frost_tbtc_interactive_*) ship additively per
the established pattern - the Go host adopts them in Phase 7.3.

Tests: 10 engine tests pin the e2e round trip (one member through the
session API, one through the stateless primitive, aggregating to a
verified BIP-340 signature), the framing-attack rejection and
verify-before-consume recovery, package-shape rejections, replay and
restart-marker durability, persist-fault marker rollback, open
lifecycle, abort, TTL expiry, and capacity; one lib test pins FFI
dispatch for all four exports. Full suite 255 passed / 1 ignored,
clippy -D warnings clean across all targets including the
bench-restart-hook shape, Phase 5 chaos suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 29501211-0f46-4dc6-bc6a-d6286a3cb89a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/phase-7-1-interactive-session-2026-06-12

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

mswilkison and others added 6 commits June 12, 2026 19:40
…ject

Two resource issues found reviewing the 7.1 session layer, both of my
own making:

1. Round2 took the nonces but left interactive_signing = Some with the
   key package and message still resident, so a completed session held
   its live-session capacity slot AND kept key material in memory until
   the TTL sweep. A node doing many sequential signings could exhaust
   the cap of 64 with completed-but-unswept sessions and fail-close new
   opens. Round2 now clears interactive_signing once the attempt is
   terminal (success or post-marker share failure); the durable marker
   carries all further replay protection.

2. interactive_session_open inserted an empty SessionState via
   entry().or_default() BEFORE the consumed-marker / conflict /
   capacity reject checks, so a flood of rejected opens for distinct
   session_ids accumulated empty sessions against the global 1024
   session cap, which could then starve DKG. Open now decides from a
   read-only view and only inserts once committed to creating the
   attempt.

Tests extended: the e2e test asserts the live state is freed (marker
retained) after Round2; the capacity test asserts a rejected open
leaves no empty session. Full suite 255 passed / 1 ignored, clippy
-D warnings clean, chaos suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The four frost_tbtc_interactive_* symbols were exported from Rust but
absent from the hand-maintained public header, so a cgo consumer could
not compile against the 7.1 ABI without hand-declaring them - blocking
the Phase 7.3 Go adoption path (review finding). Adds the declarations
with a header comment drawing the custody contrast against the
stateless nonce block above: here secret nonces never cross the
boundary in either direction. Verified: header symbols now exactly
match the Rust exports, header parses as valid C.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three findings on the 7.1 interactive path, all mine:

P1 (security) - InteractiveSessionOpen bypassed the signing-policy
firewall. The coarse start_sign_round binds the signed message to a
prior policy-checked build_taproot_tx
(enforce_signing_message_binding_to_policy_checked_build_tx); the
interactive open never called it, so with the firewall enabled a
caller holding a key package could open a fresh session and sign an
arbitrary message - a direct violation of frozen spec section 5 ("Open
checks policy gates"). Open now enforces the same binding before
creating any state; a session with no policy-checked tx fails closed.

P2 (replay) - validate_attempt_context accepts the attempt_id hash
field case-insensitively (eq_ignore_ascii_case), but the marker
registry and live-state lookups keyed on the raw string. A consumed
attempt retried with different hex casing missed the marker and could
be signed again. Open now canonicalizes the whole attempt context via
canonical_attempt_context (matching the coarse path), and the round
entry points canonicalize the incoming attempt_id, so all keying is on
the lowercase canonical form.

P3 (TTL) - InteractiveSessionAbort was the only entry point that did
not sweep expired interactive state, so an abort for an unrelated
session left other sessions' expired nonces in memory past the TTL.
Abort now sweeps like every other locked entry point.

Tests: firewall reject-without-build-tx (the security assertion) +
bound-message accept/mismatch-reject; consumed-marker case
insensitivity across Round2 and reopen; abort sweeps an unrelated
session's expired state. Full suite 259 passed / 1 ignored, clippy -D
warnings clean, chaos suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eractive open

The interactive path accepted an open after only the signing-policy
firewall check, so on a session start_sign_round would refuse it could
still emit a share - bypassing the established lifecycle/quarantine
gates (review finding). InteractiveSessionOpen now enforces, before
installing any interactive state, the same gates the coarse path does:

- emergency_rekey_event on an existing session -> LifecyclePolicyRejected
  (emergency_rekey_required),
- a terminally finalized session -> SessionFinalized,
- an auto-quarantined member_identifier (absent a DAO allowlist
  override) -> QuarantinePolicyRejected, reusing
  enforce_not_quarantined_identifiers so the allowlist override is
  honored identically.

The quarantine check targets this node's own member_identifier - the
member it is about to produce a share for - rather than the whole
included set, since under t-of-included a quarantined included member
simply will not be among the responsive subset.

Tests: an emergency-rekey session and a finalized session both refuse
interactive open; a quarantined member is rejected and a DAO allowlist
override restores signing. Full suite 261 passed / 1 ignored, clippy
-D warnings clean, chaos suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The coarse start_sign_round checks the lifecycle/quarantine/firewall
gates and produces the signature share in one atomic call. The
interactive path splits Open from Round2, so a kill switch recorded in
that window - emergency rekey, finalization, quarantine, or a re-bound
policy-checked tx - was not seen when the share actually left the
engine at Round2 (review finding: gates checked only at Open are stale
at release time).

The gate logic is extracted into enforce_interactive_signing_gates and
called at BOTH Open and Round2, so the two sites cannot drift and the
share is gated at the moment it leaves the engine. The Round2 recheck
runs before consumption (verify-before-consume): a gate rejection
leaves the nonces live and the attempt recoverable, so a transient
kill switch does not burn the attempt. The message rechecked is the
one bound at Open (held in interactive state), and the signing
package's message is independently verified equal to it.

Test: open + round1 + build package, record an emergency rekey, then
Round2 is rejected (emergency_rekey_required) WITHOUT consuming the
attempt; clearing the rekey lets the same attempt complete. Full suite
262 passed / 1 ignored, clippy -D warnings clean, chaos suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eshold

Two findings on the 7.1 path:

P2 (liveness/DoS) - an interactive open that later expired or was
aborted before Round2 left an otherwise-empty SessionState in the
registry. Since open inserts new session IDs and
ensure_session_insert_capacity counts every map entry, a caller could
churn unique sessions until TBTC_SIGNER_MAX_SESSIONS filled, after
which DKG / build_tx / new interactive sessions were rejected until
restart. The TTL sweep and abort now drop a session that holds nothing
durable once its live attempt is cleared, via a new
SessionState::is_disposable that checks EVERY field so a session still
carrying consumed markers or DKG material is never removed.

P2 (verify-before-consume) - Open accepted a threshold below the key
package's min_signers; Round2 would then accept a too-small signing
package, persist the consumed marker, and only then have
frost::round2::sign fail on the commitment count - burning the nonce
for a validation error. Open now rejects threshold != key package
min_signers before storing the session.

Tests: open-then-abort churn under a 2-session cap stays bounded (no
accumulation); the abort-sweep test now asserts the empty session is
dropped, not just cleared; threshold-below-min_signers is rejected and
the matching threshold opens. Full suite 264 passed / 1 ignored,
clippy -D warnings clean, chaos suite green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant