Skip to content

fix(archiver): move L2 tips cache refresh out of write transactions#23110

Open
spalladino wants to merge 1 commit intomerge-train/spartanfrom
palla/move-l2-tips-cache-refresh-out-of-tx
Open

fix(archiver): move L2 tips cache refresh out of write transactions#23110
spalladino wants to merge 1 commit intomerge-train/spartanfrom
palla/move-l2-tips-cache-refresh-out-of-tx

Conversation

@spalladino
Copy link
Copy Markdown
Contributor

@spalladino spalladino commented May 8, 2026

Move every l2TipsCache.refresh() call in ArchiverDataStoreUpdater out of the surrounding db.transactionAsync callback and into the post-commit code path. This addresses two issues with the previous in-transaction refresh:

  1. Mid-tx visibility. L2TipsCache.refresh() reassigns its internal #tipsPromise synchronously, which was observable to other callers before LMDB committed. A concurrent reader calling getL2Tips() after the reassignment but before commit would pick up a promise loaded against in-flight tx state, while a sibling read on the store directly outside the tx still saw pre-commit state.
  2. No rollback on tx abort. If the LMDB transaction threw or aborted, the cache was already replaced with a promise loaded against in-flight writes that would never commit. Future readers saw a cache reflecting rolled-back state.

Refresh now runs after the writer transaction has fully committed, so it loads from the committed store and is never replaced when the writer aborts.

Notes

  • This intentionally leaves a narrow JS-side race window between LMDB commit returning and refresh() finishing its loadFromStore.

The ArchiverDataStoreUpdater used to call `l2TipsCache.refresh()` inside
the `db.transactionAsync()` callback for every writer path. Two issues:

1. Mid-tx visibility. `refresh()` reassigns its internal #tipsPromise
   synchronously, which was observable to other callers before LMDB had
   actually committed. A concurrent reader calling `getL2Tips()` after
   the reassignment but before commit picks up a promise loaded against
   the in-flight tx state, while a sibling read on
   `#proposedCheckpoints` directly outside the tx still sees pre-commit
   state — split-snapshot reads in the sequencer's `checkSync()`.

2. No rollback on tx abort. If the LMDB transaction threw or aborted,
   the cache had already been replaced with a promise loaded against
   in-flight writes that would never commit. Future readers would see a
   cache reflecting rolled-back state.

Refresh now runs after the writer transaction has fully committed, so
it loads from the committed store and is never replaced when the writer
aborts.

This does not close the JS-side race window completely — there is still
a small "tips lag store" window between LMDB commit returning and
`refresh()` finishing its `loadFromStore`. The sequencer's `checkSync()`
consistency checks (sequencer-client/src/sequencer/sequencer.ts ~L700)
already handle that residual window by detecting the mismatch and
returning undefined; those checks are intentionally left in place.
spalladino added a commit that referenced this pull request May 8, 2026
…in validators

The proposer-side fix from #23110 (parent checkpointOutHash splice under
pipelining) was inlined as a private method on `CheckpointProposalJob`. The
validator's block re-execution and checkpoint-proposal validation paths in
`proposal_handler.ts` compute the same `previousCheckpointOutHashes` list
through the same archiver-driven query, so they have the same off-by-one
window: if the parent cp lands on L1 between when the validator pulls and
when it re-derives, only the proposer would carry the spliced parent and
attestations would mismatch.

Extract the proposer's logic into a shared `getPreviousCheckpointOutHashes`
helper in `stdlib/src/checkpoint/`. The helper accepts the proposer's
already-loaded `proposedCheckpointData` directly, and falls back on
`L2BlockSource.getProposedCheckpointData(...)` for callers that don't have
it on hand (validator). Wire the helper into the proposer (replacing the
private method) and into both validator sites.

Add a few diagnostics that helped pinpoint this class of bug:
- `prover-node-publisher.ts`: when the L1-recomputed `RootRollupPublicInputs`
  vector mismatches the prover's, decode the differing indices into labels
  (`previousArchiveRoot`, `endArchiveRoot`, `outHash`,
  `checkpointHeaderHashes[i]`, `fees[i].recipient/value`, `constants.*`,
  `blobPublicInputs[*]`), fetch the L1 `CheckpointLog` for any mismatching
  `checkpointHeaderHashes[i]`, and emit a structured error log alongside
  the throw — much easier to triage than the previous opaque dump.
- `BlockRollupPublicInputs.toInspect()` and
  `CheckpointRollupPublicInputs.toInspect()` to keep per-stage orchestrator
  debug logs short.
- Per-stage debug logs in the orchestrator (block-root, block-merge,
  checkpoint-root) consume the new `toInspect()` outputs.
- Lightweight checkpoint builder logs `headerHash` and the size of
  `previousCheckpointOutHashes` at debug.
- Epoch proving job's per-checkpoint start log trimmed to the fields that
  are actually useful for cross-comparison.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant