Predictable ledger state snapshots#6526

Draft
geo2a wants to merge 4 commits into master from geo2a/predictable-snapshots

@geo2a geo2a commented Apr 13, 2026

Description

This PR brings in the Consensus feature of predictable ledger state snapshots:

  • snapshots will be taken by all nodes at the same deterministic slot numbers, rather than depending on each node's start time.
  • to avoid the thundering herd effect, where all nodes take a snapshot at the same moment and stall the network, every node will introduce a randomised time delay before taking a snapshot.
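
The two points above can be modelled in a few lines. This is only an illustrative Python sketch, not the Consensus implementation (which this description does not show): the function names `crossed_boundaries` and `pick_delay` are hypothetical, and drawing the delay as a whole number of seconds from a uniform range is an assumption.

```python
import random


def crossed_boundaries(prev_slot, new_slot, offset, interval):
    """Nominal snapshot slots b = offset + k * interval with prev_slot < b <= new_slot.

    Every node evaluating this with the same offset and interval gets the same
    answer, regardless of when it was started.
    """
    if new_slot < offset or new_slot <= prev_slot:
        return []
    # Index of the first boundary strictly after prev_slot.
    first_k = 0 if prev_slot < offset else (prev_slot - offset) // interval + 1
    # Index of the last boundary at or before new_slot.
    last_k = (new_slot - offset) // interval
    return [offset + k * interval for k in range(first_k, last_k + 1)]


def pick_delay(min_delay, max_delay, rng):
    """Per-node random delay in whole seconds (assumed uniform, inclusive)."""
    return rng.randint(min_delay, max_delay)


# A node whose tip moves from slot 5000000 to 6000000 crosses two boundaries
# and would schedule both snapshots after a single random delay.
rng = random.Random()
due = crossed_boundaries(5000000, 6000000, offset=172800, interval=432000)
delay = pick_delay(60, 120, rng)
print(f"snapshots due at nominal slots {due}, delayed by {delay}s")
```

Because the boundaries depend only on the slot number, the set of snapshots is predictable, while the per-node delay spreads the actual disk writes over time.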

Changes to cardano-node configuration

The LedgerDB section of the config.yaml file is re-worked to have the following parameters:

LedgerDB:
  # remains as-is
  Backend: V2InMemory
  Snapshots:

    # start taking the snapshots at slot 172800, after Byron
    SlotOffset: 172800

    # take snapshots every 432000 slots, at the end of every Shelley epoch
    SnapshotInterval: 432000

    # A minimum duration between snapshots, in seconds (used to avoid excessive snapshots while syncing).
    RateLimit: 0 # default is 10 minutes

    # randomised snapshot delay range, in seconds.
    # Both Min and Max need to be specified, otherwise the default delay of (5min, 10min) will be used
    MinDelay: 60
    MaxDelay: 120

Note that the snapshot-related options are now grouped under "LedgerDB->Snapshots". The legacy format is still supported, but its meaning has changed slightly: the SnapshotInterval parameter is now interpreted as a number of slots, rather than seconds.
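
The delay-range rule from the config comments (both MinDelay and MaxDelay must be given, otherwise the default applies) can be sketched as follows. This is a hedged Python illustration operating on an already-parsed mapping, not the node's actual config parser; `resolve_delay_range` is a hypothetical name, and rejecting a range with MinDelay greater than MaxDelay is an assumption that the description does not confirm.

```python
# Default taken from the config comment: "(5min, 10min)", in seconds.
DEFAULT_DELAY_RANGE_S = (300, 600)


def resolve_delay_range(snapshots):
    """Return (min, max) delay in seconds for a parsed 'Snapshots' section."""
    lo = snapshots.get("MinDelay")
    hi = snapshots.get("MaxDelay")
    # Both bounds are required; a lone MinDelay or MaxDelay is ignored.
    if lo is None or hi is None:
        return DEFAULT_DELAY_RANGE_S
    # Assumption: an inverted range is treated as a configuration error.
    if lo > hi:
        raise ValueError("MinDelay must not exceed MaxDelay")
    return (lo, hi)


print(resolve_delay_range({"MinDelay": 60, "MaxDelay": 120}))
print(resolve_delay_range({"MinDelay": 60}))
```

The second call shows the case that is easy to miss: specifying only one bound silently falls back to the default range.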

New tracing events

  • SnapshotRequestDelayed snapshotRequestTime delayBeforeSnapshotting slots: records that a snapshot was requested for slots, but that the request will only be executed after delayBeforeSnapshotting.
  • SnapshotRequestCompleted signifies the completion of a delayed snapshot request.
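
For log analysis, the human-readable rendering of SnapshotRequestDelayed can be parsed back into its three fields. The sketch below is a Python illustration that assumes the exact message wording seen in the log fragment in the Manual Testing section; the regular expression and the name `parse_delayed` are hypothetical and would need adjusting if the trace format changes.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2026-04-10 12:21:49.5021Z][host:...SnapshotRequestDelayed](Info,30) Scheduling ...
DELAYED_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)Z\]\[[^\]]*SnapshotRequestDelayed\]\([^)]*\)\s+"
    r"Scheduling to take ledger state snapshots at slots \[(?P<slots>[^\]]*)\]\s*,"
    r"\s*with a randomised delay of (?P<delay>\d+)s"
)


def parse_delayed(line):
    """Return (request_time, slots, delay_seconds), or None if the line does not match."""
    m = DELAYED_RE.match(line)
    if m is None:
        return None
    request_time = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    slots = [int(s) for s in re.findall(r"SlotNo (\d+)", m.group("slots"))]
    return request_time, slots, int(m.group("delay"))


line = (
    "[2026-04-10 12:24:08.6039Z]"
    "[geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) "
    "Scheduling to take ledger state snapshots at slots "
    "[SlotNo 5356780,SlotNo 5788780] , with a randomised delay of 108s"
)
print(parse_delayed(line))
```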

Manual Testing

This feature is a little tricky to test automatically, and I have not found any end-to-end tests for the ledger state snapshot functionality. I've done some manual testing by analysing the logs. This process could be automated using cardano-testnet, but I'm afraid that the test could be flaky and very verbose.

To test the feature, I started a sync with mainnet and used Claude Code to grep the logs and construct a table that verifies that the snapshots are indeed taken at the announced slots after the expected delay:

| # | SnapshotRequestDelayed | Scheduled Slots | TookSnapshot | Taken Slot | Reported Delay (s) | Actual Delta (s) |
|---|---|---|---|---|---|---|
| 1 | 2026-04-10 12:21:49.5021 | 4492799 | 2026-04-10 12:23:02.5057 | 4492799 | 73 | 73.0036 |
| 2 | 2026-04-10 12:23:03.3550 | 4924780 | 2026-04-10 12:24:07.3566 | 4924780 | 64 | 64.0016 |
| 3 | 2026-04-10 12:24:08.6039 | 5356780, 5788780 | 2026-04-10 12:25:56.6049 | 5356780 | 108 | 108.0010 |
|   |   |   | 2026-04-10 12:25:58.6811 | 5788780 |   | 110.0772 |
| 4 | 2026-04-10 12:26:00.0986 | 6220777, 6652775, 7084774 | 2026-04-10 12:27:13.1009 | 6220777 | 73 | 73.0023 |
|   |   |   | 2026-04-10 12:27:15.2028 | 6652775 |   | 75.1042 |
|   |   |   | 2026-04-10 12:27:16.2926 | 7084774 |   | 76.1940 |
| 5 | 2026-04-10 12:27:17.9178 | 7516773, 7948772, 8380772 | 2026-04-10 12:28:40.9199 | 7516773 | 83 | 83.0021 |
|   |   |   | 2026-04-10 12:28:42.1981 | 7948772 |   | 84.2803 |
|   |   |   | 2026-04-10 12:28:43.6356 | 8380772 |   | 85.7178 |
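
The arithmetic behind the table can be re-checked with a short script. This Python sketch recomputes the "Actual Delta" column from the two timestamps and also reports how far each taken slot lies from the next nominal boundary. It assumes the run used the SlotOffset (172800) and SnapshotInterval (432000) values from the example config, which is not stated explicitly; `delta_seconds` and `next_boundary` are hypothetical helper names.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def delta_seconds(requested, taken):
    """Seconds elapsed between the request and the start of the snapshot."""
    return (datetime.strptime(taken, FMT) - datetime.strptime(requested, FMT)).total_seconds()


def next_boundary(slot, offset=172800, interval=432000):
    """Smallest nominal snapshot slot offset + k * interval that is >= slot."""
    if slot <= offset:
        return offset
    k = -(-(slot - offset) // interval)  # ceiling division
    return offset + k * interval


# (SnapshotRequestDelayed time, TookSnapshot time, taken slot, reported delay)
# copied from the first three requests in the table.
ROWS = [
    ("2026-04-10 12:21:49.5021", "2026-04-10 12:23:02.5057", 4492799, 73),
    ("2026-04-10 12:23:03.3550", "2026-04-10 12:24:07.3566", 4924780, 64),
    ("2026-04-10 12:24:08.6039", "2026-04-10 12:25:56.6049", 5356780, 108),
]

for requested, taken, slot, reported in ROWS:
    actual = delta_seconds(requested, taken)
    gap = next_boundary(slot) - slot
    print(f"slot {slot}: reported {reported}s, actual {actual:.4f}s, "
          f"{gap} slots before nominal boundary {next_boundary(slot)}")
```

Under that assumption, each taken slot sits just before a multiple of the interval past the offset, which would be consistent with snapshotting the last ledger state before the boundary is crossed; that reading is an inference from the numbers, not something the description states.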

The relevant fragment of the log file is attached:

[2026-04-10 12:21:49.5021Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 4492799] , with a randomised delay of 73s
[2026-04-10 12:23:02.5057Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 4492799, dsSuffix = Nothing} at f8084c61b6a238acec985b59310b6ecec49c0ab8352249afd7268da5cff2a457 at slot 4492799
[2026-04-10 12:23:03.2456Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 4492799, dsSuffix = Nothing} at f8084c61b6a238acec985b59310b6ecec49c0ab8352249afd7268da5cff2a457 at slot 4492799 , duration: 0.739851377s
[2026-04-10 12:23:03.2530Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:23:03.3550Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 4924780] , with a randomised delay of 64s
[2026-04-10 12:24:07.3566Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 4924780, dsSuffix = Nothing} at a0805ae8e52318f0e499be7f85d3f1d5c7dddeacdca0dab9e9d9a8ae6c49a22c at slot 4924780
[2026-04-10 12:24:08.2799Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 4924780, dsSuffix = Nothing} at a0805ae8e52318f0e499be7f85d3f1d5c7dddeacdca0dab9e9d9a8ae6c49a22c at slot 4924780 , duration: 0.923355707s
[2026-04-10 12:24:08.2872Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:24:08.6039Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 5356780,SlotNo 5788780] , with a randomised delay of 108s
[2026-04-10 12:25:56.6049Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 5356780, dsSuffix = Nothing} at 4ddf277b3aff32931843da9f7900f5ef2fffed15b124891c485be4b3a06fca72 at slot 5356780
[2026-04-10 12:25:58.6810Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 5356780, dsSuffix = Nothing} at 4ddf277b3aff32931843da9f7900f5ef2fffed15b124891c485be4b3a06fca72 at slot 5356780 , duration: 2.076082693s
[2026-04-10 12:25:58.6811Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 5788780, dsSuffix = Nothing} at 9e6fc811d9b09f7c8c6d7a23dc8b3360a9c4a3930ba640ce107e944d5e2750e2 at slot 5788780
[2026-04-10 12:25:59.7608Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 5788780, dsSuffix = Nothing} at 9e6fc811d9b09f7c8c6d7a23dc8b3360a9c4a3930ba640ce107e944d5e2750e2 at slot 5788780 , duration: 1.079557458s
[2026-04-10 12:25:59.7861Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:26:00.0986Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 6220777,SlotNo 6652775,SlotNo 7084774] , with a randomised delay of 73s
[2026-04-10 12:27:13.1009Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 6220777, dsSuffix = Nothing} at bc98eda36819d00f424e63aeb4eb43950bd5eacf37f2c35a2b8f807aa68cd895 at slot 6220777
[2026-04-10 12:27:15.2022Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 6220777, dsSuffix = Nothing} at bc98eda36819d00f424e63aeb4eb43950bd5eacf37f2c35a2b8f807aa68cd895 at slot 6220777 , duration: 2.101244366s
[2026-04-10 12:27:15.2028Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 6652775, dsSuffix = Nothing} at 6707ef3c2e885c25d5081a1aa0dd03e81492e21c5955208f23eee3d92ae28f9f at slot 6652775
[2026-04-10 12:27:16.2922Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 6652775, dsSuffix = Nothing} at 6707ef3c2e885c25d5081a1aa0dd03e81492e21c5955208f23eee3d92ae28f9f at slot 6652775 , duration: 1.089450966s
[2026-04-10 12:27:16.2926Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7084774, dsSuffix = Nothing} at 057c01d0a0f0b6c554589ac5baf6b72b63cd22b2d668ee86f7421199eab1c46c at slot 7084774
[2026-04-10 12:27:17.4298Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7084774, dsSuffix = Nothing} at 057c01d0a0f0b6c554589ac5baf6b72b63cd22b2d668ee86f7421199eab1c46c at slot 7084774 , duration: 1.137220622s
[2026-04-10 12:27:17.4573Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot
[2026-04-10 12:27:17.9178Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestDelayed](Info,30) Scheduling to take ledger state snapshots at slots [SlotNo 7516773,SlotNo 7948772,SlotNo 8380772] , with a randomised delay of 83s
[2026-04-10 12:28:40.9199Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7516773, dsSuffix = Nothing} at cd0dad9ea278cc82d9c3dbefa1769ddbfb9358dc800e4a70a4cc1e671489c493 at slot 7516773
[2026-04-10 12:28:42.1979Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7516773, dsSuffix = Nothing} at cd0dad9ea278cc82d9c3dbefa1769ddbfb9358dc800e4a70a4cc1e671489c493 at slot 7516773 , duration: 1.277999016s
[2026-04-10 12:28:42.1981Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 7948772, dsSuffix = Nothing} at cff7c23b9f62ad48a2436b2270a10bb9286999a721f1da3bde35f6f1579d1464 at slot 7948772
[2026-04-10 12:28:43.6354Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 7948772, dsSuffix = Nothing} at cff7c23b9f62ad48a2436b2270a10bb9286999a721f1da3bde35f6f1579d1464 at slot 7948772 , duration: 1.43729202s
[2026-04-10 12:28:43.6356Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Taking ledger snapshot DiskSnapshot {dsNumber = 8380772, dsSuffix = Nothing} at 47fef957a7152647dacbcff13242b3ef3c416930e23cd55722c36c1fd126c721 at slot 8380772
[2026-04-10 12:28:45.5275Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.TookSnapshot](Info,30) Took ledger snapshot DiskSnapshot {dsNumber = 8380772, dsSuffix = Nothing} at 47fef957a7152647dacbcff13242b3ef3c416930e23cd55722c36c1fd126c721 at slot 8380772 , duration: 1.891909794s
[2026-04-10 12:28:45.5622Z][geo2a-workstation:ChainDB.LedgerEvent.Snapshot.SnapshotRequestCompleted](Info,30) Completed taking a ledger state snapshot

Checklist

  • Commit sequence broadly makes sense and commits have useful messages
  • New tests are added if needed and existing tests are updated. These may include:
    • golden tests
    • property tests
    • roundtrip tests
    • integration tests
      See Running tests for more details
  • Any changes are noted in the CHANGELOG.md for affected package
  • The version bounds in .cabal files are updated
  • CI passes. See note on CI. The following CI checks are required:
    • Code is linted with hlint. See .github/workflows/check-hlint.yml to get the hlint version
    • Code is formatted with stylish-haskell. See .github/workflows/stylish-haskell.yml to get the stylish-haskell version
    • Code builds on Linux, MacOS and Windows for ghc-9.6 and ghc-9.12
  • Self-reviewed the diff

Note on CI

If your PR is from a fork, the necessary CI jobs won't trigger automatically for security reasons.
You will need someone with write privileges to trigger them. Please contact IOG node developers to do this
for you.

@geo2a geo2a self-assigned this Apr 13, 2026
@geo2a geo2a changed the title from "Predictable snapshots ledger state snapshots" to "Predictable ledger state snapshots" Apr 14, 2026
@geo2a geo2a force-pushed the geo2a/predictable-snapshots branch 6 times, most recently from c9b2597 to 0062d94, April 15, 2026 13:09
@geo2a geo2a force-pushed the geo2a/predictable-snapshots branch from 0062d94 to 0379807, April 22, 2026 07:40
@geo2a geo2a moved this to 👀 In review in Consensus Team Backlog Apr 28, 2026
@geo2a geo2a force-pushed the geo2a/predictable-snapshots branch from 0379807 to db260ed, April 28, 2026 12:48
@geo2a geo2a force-pushed the geo2a/predictable-snapshots branch from db260ed to 8e9fbae, April 28, 2026 15:09