From 294a3ccb1c275ea2cc57c3c5264ae0b64a66c7aa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toni=20Wahrst=C3=A4tter?= Date: Mon, 11 May 2026 10:04:48 +0200 Subject: [PATCH 1/5] caps/snap2.md: add snap v2 documentation --- README.md | 3 +- caps/snap.md | 7 +++ caps/snap2.md | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 caps/snap2.md diff --git a/README.md b/README.md index 77391d4f..ba1c4c5a 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ We have several specifications for low-level protocols: The repository also contains specifications of many RLPx-based application-level protocols: - [Ethereum Wire Protocol] (eth/68) -- [Ethereum Snapshot Protocol] (snap/1) +- [Ethereum Snapshot Protocol] (snap/1), [Snapshot Protocol v2] (snap/2) - [Light Ethereum Subprotocol] (les/4) - [Parity Light Protocol] (pip/1) - [Ethereum Witness Protocol] (wit/0) @@ -81,6 +81,7 @@ WireShark dissectors are available here: Date: Sun, 28 Jun 2026 11:12:15 +0200 Subject: [PATCH 2/5] caps/snap2.md: align with snap/1 style, fix BAL hash validation, drop unused link --- caps/snap2.md | 131 ++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 96 insertions(+), 35 deletions(-) diff --git a/caps/snap2.md b/caps/snap2.md index e7c18c6f..5a919022 100644 --- a/caps/snap2.md +++ b/caps/snap2.md @@ -1,8 +1,14 @@ # Ethereum Snapshot Protocol Version 2 (SNAP/2) -This document specifies version 2 of the `snap` protocol. It is a delta over version 1; everything not redefined here is inherited unchanged from [snap.md][snap1]. That includes the overview, the satellite relationship with `eth`, the data format, the `GetAccountRange`/`AccountRange`/`GetStorageRanges`/`StorageRanges`/`GetByteCodes`/`ByteCodes` messages (0x00–0x05), and the general framing of snap sync. +This document specifies version 2 of the `snap` protocol. 
It is a delta over version 1; +everything not redefined here is inherited unchanged from [snap.md][snap1]. That includes +the overview, the satellite relationship with `eth`, the data format, the account-range, +storage-range and bytecode messages (0x00–0x05), and the general framing of snap sync. -snap/2 was introduced by [EIP-8189]. It replaces snap/1's trie-node healing mechanism with state-diff application using block-level access lists ([EIP-7928]). snap/2 is meaningful only for blocks after [EIP-7928] activation, when the header field `block-access-list-hash` is present; for pre-activation blocks the snap/1 mechanism continues to apply. +snap/2 was introduced by [EIP-8189]. It replaces snap/1's trie-node healing mechanism with +state-diff application using block-level access lists ([EIP-7928]). snap/2 is meaningful +only for blocks after [EIP-7928] activation, when the header field `block_access_list_hash` +is present; for pre-activation blocks the snap/1 mechanism continues to apply. ## Differences from snap/1 @@ -12,44 +18,71 @@ snap/2 was introduced by [EIP-8189]. It replaces snap/1's trie-node healing mech | Catch-up | Iterative trie-node discovery | Sequential application of verified BALs | | Pivot advancement during sync | Free retarget; healing reconciles afterwards | In-line BAL catch-up required before retarget | | Reorg past current pivot | Handled by trie healing | Re-fetch of diverged leaves, gated on orphaned-BAL availability | -| Required header field | none | `block-access-list-hash` ([EIP-7928]) | - -Messages 0x00–0x05 are unchanged; their definitions remain in [snap.md][snap1]. Messages 0x06 and 0x07 are removed and their IDs **must not** be reused. 
+| Required header field | none | `block_access_list_hash` ([EIP-7928]) | ## Synchronization algorithm -The high-level structure of snap sync (pivot selection, byte-bounded contiguous range download with Merkle-proven boundaries, the 128-block snapshot serving window) is unchanged; see [snap.md][snap1]. The change is the replacement of trie-node healing with BAL-based catch-up. Healing in snap/1 reacts to whatever inconsistencies the syncing node observes during trie reconstruction; snap/2's catch-up is upfront-deterministic: the set of blocks to apply is known from the header chain alone. +The high-level structure of snap sync (pivot selection, byte-bounded contiguous range +download with Merkle-proven boundaries, the 128-block snapshot serving window) is unchanged; +see [snap.md][snap1]. The change is the replacement of trie-node healing with BAL-based +catch-up. Healing in snap/1 reacts to whatever inconsistencies the syncing node observes +during trie reconstruction; snap/2's catch-up is upfront-deterministic: the set of blocks to +apply is known from the header chain alone. Concretely, the sync loop becomes: 1. Select a pivot `P` (typically `HEAD-64`). -2. Bulk-download flat state at `P` via `GetAccountRange`, `GetStorageRanges`, `GetByteCodes`. -3. As the chain advances from `P` to `P+K`, fetch BALs for `P+1..P+K` via `GetBlockAccessLists`, verify each against the `block-access-list-hash` of its header (`keccak256(rlp.encode(bal))`), and apply the resulting state diff to the partial flat state. `P+K` is then the target for any remaining range requests. +2. Bulk-download flat state at `P` via `GetAccountRange`, `GetStorageRanges`, and + `GetByteCodes`. +3. As the chain advances from `P` to `P+K`, fetch BALs for `P+1..P+K` via + `GetBlockAccessLists`, verify each against the `block_access_list_hash` of its header, and + apply the resulting state diff to the partial flat state. `P+K` is then the target for any + remaining range requests. 4. 
Repeat step 3 if the pivot advances again during catch-up. -5. Once the flat state is consistent with the latest pivot, reconstruct tries locally and verify the resulting root against the corresponding header. +5. Once the flat state is consistent with the latest pivot, reconstruct tries locally and + verify the resulting root against the corresponding header. There is no separate healing phase. ### Pivot advancement -In snap/1, when the pivot advances from `P` to `P+K` during state download, the syncing node retargets the new pivot and lets the healing phase reconcile the gap. snap/2 has no later healing pass, so the advance itself is the catch-up: BALs for `P+1..P+K` **must** be fetched, verified, and applied to the partially-synced flat state **before** any further range request is issued against the new pivot. Range data downloaded prior to the advance is only consistent with the new pivot once those BALs have been applied. +In snap/1, when the pivot advances from `P` to `P+K` during state download, the syncing node +retargets the new pivot and lets the healing phase reconcile the gap. snap/2 has no later +healing pass, so the advance itself is the catch-up: BALs for `P+1..P+K` **must** be fetched, +verified, and applied to the partially-synced flat state **before** any further range request +is issued against the new pivot. Range data downloaded prior to the advance is only consistent +with the new pivot once those BALs have been applied. ### Reorg past the current pivot -If the canonical chain reorgs past the current pivot `P`, the bulk-downloaded state may contain leaves written by the now-orphaned fork. Let `W` be the common ancestor of the old and new canonical chains. Recovery: - -1. Fetch BALs for `W+1..P` on the orphaned fork via `GetBlockAccessLists`. Requests are keyed by block hash, so orphaned BALs are addressable identically to canonical ones, provided peers have retained them (see [Retention](#retention)). -2. 
From the orphaned-fork and new-fork BALs, compute the set of accounts and storage slots mutated on the orphaned fork but **not** on the new canonical fork. Entries mutated on both forks will be overwritten in step 4 and need no special handling. -3. Re-fetch the diverged entries via `GetAccountRange` and `GetStorageRanges` against a fresh pivot `P'` on the new canonical chain. +If the canonical chain reorgs past the current pivot `P`, the bulk-downloaded state may +contain leaves written by the now-orphaned fork. Let `W` be the common ancestor of the old +and new canonical chains. Recovery: + +1. Fetch BALs for `W+1..P` on the orphaned fork via `GetBlockAccessLists`. Orphaned BALs are + addressable by hash like canonical ones, provided peers have retained them (see + [Retention](#retention)). +2. From the orphaned-fork and new-fork BALs, compute the set of accounts and storage slots + mutated on the orphaned fork but **not** on the new canonical fork. Entries mutated on both + forks are overwritten in step 4 and need no special handling. +3. Re-fetch the diverged entries via `GetAccountRange` and `GetStorageRanges` against a fresh + pivot `P'` on the new canonical chain. 4. Apply BALs for `W+1..P'` on the new canonical fork. -If the orphaned BALs are not retained by any peer, the syncing node **must** discard partial state and restart synchronization. With the conventional pivot at `HEAD-64`, this scenario requires a reorg deeper than 64 blocks, which has not occurred on mainnet and is further bounded by PoS finality. +If the orphaned BALs are not retained by any peer, the syncing node **must** discard partial +state and restart synchronization. With the conventional pivot at `HEAD-64`, this requires a +reorg deeper than 64 blocks, which has not occurred on mainnet and is further bounded by PoS +finality. 
## Retention -Peers serving snap/2 retain BALs for both canonical and non-canonical blocks within the retention window defined in [EIP-7928] (at least the weak subjectivity period). Retention of non-canonical BALs is what enables the reorg-recovery procedure above; without it, a deep reorg forces a sync restart. +Peers serving snap/2 retain BALs within the retention window defined in [EIP-7928] (at least +the weak subjectivity period), and **should** also retain non-canonical (orphaned) BALs. +Retention of non-canonical BALs is what enables the reorg-recovery procedure above; without +it, a deep reorg forces a sync restart. -The 128-block snapshot retention for the data served by `GetAccountRange` / `GetStorageRanges` is unchanged from snap/1. +The 128-block snapshot retention for the data served by `GetAccountRange` / `GetStorageRanges` +is unchanged from snap/1. ## Protocol Messages @@ -72,7 +105,9 @@ These message IDs are reserved and **must not** be reused. `[reqID: P, hashes: [hash1: B_32, hash2: B_32, ...], bytes: P]` -Requests block access lists by block hash. The intended purpose of this message is to obtain the per-block state-diff data needed to catch up the flat state during pivot advancement and to recover from reorgs past the current pivot. +Requests block access lists by block hash. The intended purpose of this message is to obtain +the per-block state-diff data needed to catch up the flat state during pivot advancement and +to recover from reorgs past the current pivot. - `reqID`: Request ID to match up responses with - `hashes`: Block hashes of the BALs to retrieve @@ -81,39 +116,62 @@ Requests block access lists by block hash. The intended purpose of this message Notes: - Nodes **must** always respond to the query. -- Requests are keyed by block hash, so canonical and non-canonical (orphaned) BALs are served through the same message. 
Serving nodes **should** retain non-canonical BALs within the retention window defined in [EIP-7928] so that syncing nodes can recover from reorgs past their pivot. -- BALs are only available for blocks after [EIP-7928] activation and within the retention window. For any requested hash outside this range, see the corresponding response semantics in [BlockAccessLists](#blockaccesslists-0x09). -- The responding node is allowed to return **less** data than requested (own QoS limits, or to honour `bytes`), truncating from the tail. The returned entries **must** preserve request order. +- Requests are keyed by block hash, so canonical and non-canonical (orphaned) blocks are + served through the same message. Serving nodes **should** retain non-canonical BALs (see + [Retention](#retention)) so that syncing nodes can recover from reorgs past their pivot. +- BALs are only available for blocks after [EIP-7928] activation and within the retention + window. For any requested hash outside this range, see the response semantics in + [BlockAccessLists](#blockaccesslists-0x09). +- The responding node is allowed to return **less** data than requested (own QoS limits, or + to honour `bytes`), truncating from the tail. The returned entries **must** preserve request + order. Rationale: -- Responses are byte-capped to keep network traffic deterministic, consistent with the other `snap` messages. -- Block hash, not block number, is the request key, because it disambiguates canonical and orphaned blocks; both are addressable through a single message without a separate fork qualifier. +- Responses are byte-capped to keep network traffic deterministic, consistent with the other + `snap` messages. +- Block hash, not block number, is the request key, because it disambiguates canonical and + orphaned blocks; both are addressable through a single message without a separate fork + qualifier. 
### BlockAccessLists (0x09) `[reqID: P, bals: [bal1: B, bal2: B, ...]]` -Returns the requested block access lists in request order. Each `bal_i` corresponds positionally to `hashes[i]` from the request. +Returns the requested block access lists in request order. Each `bal_i` corresponds +positionally to `hashes[i]` from the request. - `reqID`: ID of the request this is a response for - `bals`: List of BALs in request order Notes: -- If a BAL is unavailable (pruned, never seen, or beyond the retention window), the response **must** contain the RLP empty string (`0x80`) at that position. Unlike `ByteCodes` (0x05), the protocol does **not** collapse unavailable entries; positional correspondence with the request is required. -- The responding node is allowed to truncate from the tail to respect the size limit. The recommended soft limit for a single response is 2 MiB. -- A received BAL is valid if and only if `keccak256(rlp.encode(bal_i))` equals the `block-access-list-hash` field of the header identified by `hashes[i]`; see [EIP-7928] for the BAL encoding. +- If a BAL is unavailable (pruned, never seen, or beyond the retention window), the response + **must** contain the RLP empty string (`0x80`) at that position. Unlike `ByteCodes` (0x05), + the protocol does **not** collapse unavailable entries; positional correspondence with the + request is required. +- The responding node is allowed to truncate from the tail to respect the size limit. The + recommended soft limit for a single response is 2 MiB. +- Each `bal_i` is the RLP-encoded BAL. It is valid if and only if `keccak256(bal_i)` + equals the `block_access_list_hash` field of the header identified by `hashes[i]`; see + [EIP-7928] for the BAL encoding. Rationale: -- Positional empty placeholders (rather than collapsing as `ByteCodes` does) preserve the request-to-response mapping without an extra index lookup. BALs are large enough that a one-byte `0x80` placeholder is negligible overhead. 
-- Application order matters for correctness: BALs **must** be applied in strict block order against the correct fork, with each BAL hash verified before application. A wrong-fork or out-of-order BAL produces an invalid state root, detected at the final root check. +- Positional empty placeholders (rather than collapsing as `ByteCodes` does) preserve the + request-to-response mapping without an extra index lookup. BALs are large enough that a + one-byte `0x80` placeholder is negligible overhead. +- Application order matters for correctness: BALs **must** be applied in strict block order + against the correct fork, with each BAL verified before application. A wrong-fork or + out-of-order BAL produces an invalid state root, detected at the final root check. Caveats: -- A peer that returns a BAL whose `keccak256(rlp.encode(bal))` does not match the header commitment is misbehaving; the syncing node **should** disconnect from or deprioritize such peers. -- Peers that return empty entries for blocks that should be available may be misbehaving or may have pruned data legitimately. Implementations should track peer reliability and deprioritize unreliable peers rather than treating a single empty entry as adversarial. +- A peer that returns a BAL not matching the header commitment is misbehaving; the syncing + node **should** disconnect from or deprioritize it. +- Peers that return empty entries for blocks that should be available may be misbehaving or + may have pruned data legitimately. Implementations should track peer reliability rather than + treating a single empty entry as adversarial. ## Change Log @@ -121,8 +179,12 @@ Caveats: - Added `GetBlockAccessLists` (0x08) and `BlockAccessLists` (0x09). - Removed `GetTrieNodes` (0x06) and `TrieNodes` (0x07); IDs reserved. -- Synchronization: replaced iterative trie healing with sequential BAL application. Pivot advancement requires in-line BAL catch-up before any further range fetching against the new pivot. 
Reorg past the current pivot is recovered by fetching orphaned-fork BALs, re-fetching diverged leaves, and applying new-fork BALs. -- Retention: serving peers retain BALs for canonical and non-canonical blocks within the [EIP-7928] retention window. +- Synchronization: replaced iterative trie healing with sequential BAL application. Pivot + advancement requires in-line BAL catch-up before any further range fetching against the new + pivot. Reorg past the current pivot is recovered by fetching orphaned-fork BALs, re-fetching + diverged leaves, and applying new-fork BALs. +- Retention: serving peers retain BALs for canonical and non-canonical blocks within the + [EIP-7928] retention window. ### snap/1 @@ -131,4 +193,3 @@ See [snap.md][snap1]. [snap1]: ./snap.md [EIP-7928]: https://eips.ethereum.org/EIPS/eip-7928 [EIP-8189]: https://eips.ethereum.org/EIPS/eip-8189 -[RLPx]: ../rlpx.md From e1e26d6bf45010f38ad86bace5a7f9203f4237ef Mon Sep 17 00:00:00 2001 From: Felix Lange Date: Tue, 30 Jun 2026 23:11:54 +0200 Subject: [PATCH 3/5] caps/snap.md: merge snap/2 spec into snap/1 spec --- caps/snap.md | 280 +++++++++++++++++++++++++------------------------- caps/snap2.md | 195 ----------------------------------- 2 files changed, 141 insertions(+), 334 deletions(-) delete mode 100644 caps/snap2.md diff --git a/caps/snap.md b/caps/snap.md index 21493315..c5a9d86f 100644 --- a/caps/snap.md +++ b/caps/snap.md @@ -4,7 +4,7 @@ The `snap` protocol runs on top of [RLPx], facilitating the exchange of Ethereum snapshots between peers. The protocol is an optional extension for peers supporting (or caring about) the dynamic snapshot format. -The current version is `snap/1`. +The current version is `snap/2`. ## Overview @@ -14,57 +14,15 @@ part in chain maintenance (block and transaction propagation); and it is **meant side-by-side with the `eth` protocol**, not standalone (e.g. chain progression is announced via `eth`). 
-The protocol itself is simplistic by design (take note, the supporting implementation is -everything but simple). In its crux, `snap` supports retrieving a contiguous segment of -accounts from the Ethereum state trie, or a contiguous segment of storage slots from one -particular storage trie. Both replies are Merkle proven for immediate verification. In -addition batches of bytecodes can also be retrieved similarly to the `eth` protocol. +The protocol itself is simplistic by design (however, the supporting implementation is +not...). At its core, `snap` supports retrieving a contiguous segment of accounts from the +Ethereum state trie, or a contiguous segment of storage slots from one particular storage +trie. Both replies are Merkle proven for immediate verification. In addition batches of +bytecodes can also be retrieved similarly to the `eth` protocol. The synchronization mechanism the protocol enables is for peers to retrieve and verify all the account and storage data without downloading intermediate Merkle trie nodes. The final -state trie is reassembled locally. An additional complexity nodes must be aware of, is -that state is ephemeral and moves with the chain, so syncers need to support reassembling -partially consistent state segments. This is supported by trie node retrieval similar to -`eth`, which can be used to heal trie inconsistencies (more on this later). - -The `snap` protocol permits downloading the entire Ethereum state without having to -download all the intermediate Merkle proofs, which can be regenerated locally. This -reduces the networking load enormously: - -- Ingress bandwidth is reduced from `O(accounts * log account + SUM(states * log states))` - (Merkle trie nodes) to `O(accounts + SUM(states))` (actual state data). 
-- Egress bandwidth is reduced from `O(accounts * log account + SUM(states * log states)) * - 32 bytes` (Merkle trie node hashes) to `O(accounts + SUM(states)) / 100000 bytes` - (number of 100KB chucks to cover the state). -- Round trip time is reduced from `O(accounts * log account + SUM(states * log states)) / - 384` (states retrieval packets) to `O(accounts + SUM(states)) / 100000 bytes` (number of - 100KB chucks to cover the state). - -### Expected results - -To put some numbers on the above abstract orders of magnitudes, synchronizing Ethereum -mainnet state (i.e. ignoring blocks and receipts, as those are the same) with `eth` vs. -the `snap` protocol: - -Block ~#11,177,000: - -- Accounts: 107,598,788 @ 19.70GiB -- Byte codes: 319,654 @ 1.48GiB -- Storage slots: 365,787,020 @ 49.88GiB -- Trie nodes: 617,045,138 - -| | Time | Upload | Download | Packets | Serving disk reads* | -|:------:|:------:|:-------:|:--------:|:--------:|:-------------------:| -| `eth` | 10h50m | 20.38GB | 43.8GB | 1607M | 15.68TB | -| `snap` | 2h6m | 0.15GB | 20.44GB | 0.099M | 0.096TB | -| | -80.6% | -99.26% | -53.33% | -99.993% | -99.39% | - -*\*Also accounts for other peer requests during the time span.* - -Post snap state heal: - -- Additional trie nodes: 541,260 @ 160.44MiB -- Additional byte codes: 34 @ 234.98KiB +state trie is reassembled locally. ## Relation to `eth` @@ -90,51 +48,59 @@ pursue it or not, without hindering their capacity to participate in the `eth` p ## Synchronization algorithm -The crux of the snapshot synchronization is making contiguous ranges of accounts and -storage slots available for remote retrieval. The sort order is the same as the state trie -iteration order, which makes it possible to not only request N subsequent accounts, but -also to Merkle prove them. Some important properties of this simple algorithm: - -- Opposed to *fast sync*, we only need to transfer the useful leaf data from the state - trie and can reconstruct internal nodes locally. 
-- Opposed to *warp sync*, we can download small chunks of accounts and storage slots and - immediately verify their Merkle proofs, making junk attacks impossible. -- Opposed to *warp sync*, random account ranges can be retrieved, thus synchronization - concurrency is totally dependent on client implementation and is not forced by the - protocol. - -The gotcha of the snapshot synchronization is that serving nodes need to be able to -provide **fast** iterable access to the state of the most recent `N` (128) blocks. -Iterating the Merkle trie itself might be functional, but it's not viable (iterating the -state trie at the time of writing takes 9h 30m on an idle machine). Geth introduced -support for [dynamic snapshots], which allows iterating all the accounts in 7m -(see [blog for more]). Some important properties of the dynamic snapshots: - -- Serving a contiguous range of accounts or storage slots take `O(n)` operations, and more - importantly, it's the same for disk access too, being stored contiguously on disk (not - counting the database read amplification). -- Maintaining a live dynamic snapshot means: - - Opposed to *warp sync*, syncing nodes can always get the latest data, thus they don't - need to process days' worth of blocks afterwards. - - Opposed to *warp sync*, there is no pre-computation to generate a snapshot (it's - updated live), so there's no periodic burden on the nodes to iterate the tries (there - it an initial burden to create the first snapshot after sync though). - - Providing access to 128 recent snapshots permits `O(1)` direct access to any account - and state, which can be used during EVM execution for `SLOAD`. - -The caveat of the snapshot synchronization is that as with *fast sync* (and opposed to -*warp sync*), the available data constantly moves (as new blocks arrive). The probability -of finishing sync before the 128 block window (15m) moves out is asymptotically zero. This -is not a problem, because we can self-heal. 
It is fine to import state snapshot chunks -from different tries, because the inconsistencies can be fixed by running a -*fast-sync-style-state-sync* on top of the assembled semi-correct state afterwards. Some -important properties of the self-healing: - -- Synchronization can be aborted at any time and resumed later. It might cause - self-healing to run longer, but it will fix the data either way. -- Synchronization on slow connections is guaranteed to finish too (as long as the node can - download data faster than it's being produced by the network), the data cannot disappear - from the network (opposed to warp sync). +The goal of the protocol is to assemble the complete state of a recent block. Since the +blockchain advances while the state is being downloaded, the sync algorithm has to +continuously re-target newer states. The current target block is called the 'pivot block'. + +Synchronization uses two separate processes in parallel to achieve the target state: + +- Snapshot download: ranges of state values are requested in key-order. The download + starts at state root `R₀` of the initial pivot block and all responses are verified + against `R₀`. As the pivot block advances, the current root is updated to `R₁`, ... `Rₙ` + from the pivot. The state iteration does not restart when the pivot moves, i.e. it + always advances the key until the end of state is reached. Contract storage and code is + fetched concurrently with accounts. + + In isolation, this process would not result in a consistent state because the resulting + state is a sequence of key-value pairs from states `R₀`, `R₁`, ... `Rₙ`. To make it + consistent with the final root `Rₙ`, the state has to be patched using BALs: + +- BAL application: synchronization continuously fetches BALs of all blocks starting at the + initial pivot block, and applies their state diff to the state. By doing this, the final + state is made consistent with all state modifications performed since the sync started. 
+ +Essentially, synchronization performs these steps in a loop: + +1. Select a pivot `P` (typically `HEAD-64`). +2. Bulk-download flat state at `P` via `GetAccountRange`, `GetStorageRanges`, `GetByteCodes`. +3. As the chain advances from `P` to `P+K`, fetch BALs for `P+1..P+K` via + `GetBlockAccessLists`, verify each against the `block_access_list_hash` of its header, + and apply the resulting state diff to the partial flat state. `P+K` is then the target + for any remaining range requests. +4. Repeat step 3 if the pivot advances again during catch-up. +5. Once the flat state is consistent with the latest pivot, reconstruct all tries locally + and verify the resulting root against the last header. + +### Reorg past the current pivot + +If the canonical chain reorgs past the current pivot `P`, the bulk-downloaded state may +contain leaves written by the now-orphaned fork. Let `W` be the common ancestor of the old +and new canonical chains. Recovery: + +1. Fetch BALs for `W+1..P` on the orphaned fork via `GetBlockAccessLists`. Requests are + keyed by block hash, so orphaned BALs are addressable identically to canonical ones, + provided peers have retained them (see [Data Retention Requirements]). +2. From the orphaned-fork and new-fork BALs, compute the set of accounts and storage slots + mutated on the orphaned fork but **not** on the new canonical fork. Entries mutated on + both forks will be overwritten in step 4 and need no special handling. +3. Re-fetch the diverged entries via `GetAccountRange` and `GetStorageRanges` against a + fresh pivot `P'` on the new canonical chain. +4. Apply BALs for `W+1..P'` on the new canonical fork. + +If the orphaned BALs are not retained by any peer, the syncing node **must** discard +partial state and restart synchronization. With the conventional pivot at `HEAD-64`, this +scenario requires a reorg deeper than 64 blocks, which has not occurred on mainnet and is +further bounded by PoS finality. 
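Step 2 of the reorg recovery above is a set difference over the keys touched by each fork. The following is a minimal sketch of that computation, not the specified algorithm. It assumes each verified BAL has already been reduced to a set of mutated keys; the `Key` shape and the name `diverged_keys` are hypothetical and are not defined by this spec or by [EIP-7928].

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical key shape: (address, storage slot). A slot of None stands for
# the account entry itself. Not an EIP-7928 type.
Key = Tuple[bytes, Optional[bytes]]


def diverged_keys(
    orphaned: Iterable[Set[Key]],
    canonical: Iterable[Set[Key]],
) -> Set[Key]:
    """Return keys mutated on the orphaned fork but not on the new fork.

    `orphaned` holds one key set per BAL in W+1..P on the old fork,
    `canonical` one key set per BAL in W+1..P' on the new fork.
    """
    touched_old: Set[Key] = set().union(*orphaned)
    touched_new: Set[Key] = set().union(*canonical)
    # Keys touched on both forks are overwritten when the new-fork BALs are
    # applied, so only the difference needs to be re-fetched.
    return touched_old - touched_new
```

The result is the set of entries to re-fetch via `GetAccountRange` and `GetStorageRanges` before the new-fork BALs are applied.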
## Data format @@ -147,6 +113,16 @@ The accounts in the `snap` protocol are analogous to the Ethereum RLP consensus This is done to avoid having to transfer the same 32+32 bytes for all plain accounts over the network. +## Data Retention Requirements + +Peers serving snap must retain BALs for both canonical and non-canonical blocks within the +retention window defined in [EIP-7928] (at least the weak subjectivity period). Retention +of non-canonical BALs is what enables the reorg-recovery procedure above; without it, a +deep reorg would force a sync restart. + +Snapshot data served by `GetAccountRange`, `GetStorageRanges` must be made available for +the last 128 blocks. + ## Protocol Messages ### GetAccountRange (0x00) @@ -172,7 +148,7 @@ Notes: 128 blocks. - The responding node is allowed to return **less** data than requested (own QoS limits), but the node **must** return at least one account. If no accounts exist between `startingHash` and `limitHash`, then - the first (if any) account **after** `limitHash` must be provided. + the first (if any) account **after** `limitHash` must be provided. - The responding node **must** Merkle prove the starting hash (even if it does not exist) and the last returned account (if any exists after the starting hash). @@ -351,72 +327,98 @@ Returns a number of requested contract codes. The order is the same as in the re there might be gaps if not all codes are available or there might be fewer is QoS limits are reached. -### GetTrieNodes (0x06) +### GetBlockAccessLists (0x08) -`[reqID: P, rootHash: B_32, paths: [[accPath: B, slotPath1: B, slotPath2: B, ...]...], bytes: P]` +`[reqID: P, hashes: [hash1: B_32, hash2: B_32, ...], bytes: P]` -Requests a number of state (either account or storage) Merkle trie nodes **by path**. This -is analogous in functionality to the `eth/63` `GetNodeData`, but restricted to only tries -and queried by path, to break the generality that causes issues with database -optimizations. 
+Requests block access lists by block hash. The intended purpose of this message is to +obtain the per-block state-diff data needed to catch up the flat state during pivot +advancement and to recover from reorgs past the current pivot. - `reqID`: Request ID to match up responses with -- `rootHash`: Root hash of the account trie to serve -- `paths`: Trie paths to retrieve the nodes for, grouped by account +- `hashes`: Block hashes of the BALs to retrieve - `bytes`: Soft limit at which to stop returning data -The `paths` is one array of trie node paths to retrieve per account (i.e. list of list of -paths). Each list in the array special cases the first element as the path in the account -trie and the remaining elements as paths in the storage trie. To address an account node, -the inner list should have a length of 1 consisting of only the account path. Partial -paths (<32 bytes) should be compact encoded per the Ethereum wire protocol, full paths -should be plain binary encoded. - -*This functionality was mutated into `snap` from `eth/65` to permit `eth` long term to -become a chain maintenance protocol only and move synchronization primitives out into -satellite protocols only.* - Notes: - Nodes **must** always respond to the query. -- The returned nodes **must** be in the request order. -- If the node does **not** have the state for the requested state root or for **any** - requested account paths, it **must** return an empty reply. It is the responsibility of - the caller to query an state not older than 128 blocks; and the caller is expected to - only ever query existing trie nodes. -- The responding node is allowed to return **less** data than requested (serving QoS - limits), but the node **must** return at least one trie node. +- Requests are keyed by block hash, so canonical and non-canonical (orphaned) BALs are + served through the same message. 
Serving nodes **should** retain non-canonical BALs + within the retention window defined in [EIP-7928] so that syncing nodes can recover from + reorgs past their pivot. +- BALs are only available for blocks after [EIP-7928] activation and within the retention + window. For any requested hash outside this range, see the corresponding response + semantics in [BlockAccessLists](#blockaccesslists-0x09). +- The responding node is allowed to return **less** data than requested (own QoS limits, + or to honour `bytes`), truncating from the tail. The returned entries **must** preserve + request order. Rationale: -- The response is capped by byte size and not by number of slots, because it makes the - network traffic more deterministic. Although opposed to the previous request types - (accounts, slots, codes), trie nodes are relatively deterministic (100-500B), the - protocol remains cleaner if all packets follow the same traffic shaping rules. -- A naive way to represent trie nodes would be a simple list of `account || storage` path - segments concatenated, but that would be very wasteful on the network as it would - duplicate the account hash for every storage trie node. +- Responses are byte-capped to keep network traffic deterministic, consistent with the + other `snap` messages. +- Block hash, not block number, is the request key, because it disambiguates canonical and + orphaned blocks; both are addressable through a single message without a separate fork + qualifier. + +### BlockAccessLists (0x09) -### TrieNodes (0x07) +`[reqID: P, bals: [bal1: B, bal2: B, ...]]` -`[reqID: P, nodes: [node1: B, node2: B, ...]]` +Returns the requested block access lists in request order. Each `bal_i` corresponds +positionally to `hashes[i]` from the request. 
+ +- `reqID`: ID of the request this is a response for +- `bals`: List of BALs in request order + +Notes: + +- If a BAL is unavailable (pruned, never seen, or beyond the retention window), the + response **must** contain the RLP empty string (`0x80`) at that position. Unlike + `ByteCodes` (0x05), the protocol does **not** collapse unavailable entries; positional + correspondence with the request is required. +- The responding node is allowed to truncate from the tail to respect the size limit. The + recommended soft limit for a single response is 2 MiB. +- Each `bal_i` is the RLP-encoded BAL. It is valid if and only if `keccak256(bal_i)` + equals the `block_access_list_hash` field of the header identified by `hashes[i]`; see + [EIP-7928] for the BAL encoding. + +Rationale: + +- Positional empty placeholders (rather than collapsing as `ByteCodes` does) preserve the + request-to-response mapping without an extra index lookup. BALs are large enough that a + one-byte `0x80` placeholder is negligible overhead. +- Application order matters for correctness: BALs **must** be applied in strict block + order against the correct fork, with each BAL hash verified before application. A + wrong-fork or out-of-order BAL produces an invalid state root, detected at the final + root check. + +Caveats: -Returns a number of requested state trie nodes. The order is the same as in the request, -but there might be fewer is QoS limits are reached. +- A peer that returns a BAL whose `keccak256(rlp.encode(bal))` does not match the header + commitment is misbehaving; the syncing node **should** disconnect from or deprioritize + such peers. +- Peers that return empty entries for blocks that should be available may be misbehaving + or may have pruned data legitimately. Implementations should track peer reliability and + deprioritize unreliable peers rather than treating a single empty entry as adversarial. 
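The validation rules in the notes above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not part of the specification: `check_bal_response`, its argument names, and the `hash_fn` parameter are hypothetical. The hash is passed in because Python's standard library does not provide Keccak-256 (`hashlib.sha3_256` is the NIST SHA-3 variant and yields different digests); a real client would supply its own Keccak-256 implementation. The sketch also assumes the `bals` list has already been RLP-decoded into byte strings, so an unavailable entry (`0x80` on the wire) appears as `b""`, and it assumes that a response longer than the request is an error, which the text implies but does not state.

```python
from typing import Callable, List, Optional, Tuple


def check_bal_response(
    expected_hashes: List[bytes],
    bals: List[bytes],
    hash_fn: Callable[[bytes], bytes],
) -> Tuple[List[Optional[bytes]], int]:
    """Classify each entry of a BlockAccessLists response (hypothetical helper).

    expected_hashes[i] is the block_access_list_hash taken from the header of
    the i-th requested block. Returns (results, answered): results[i] is the
    verified BAL bytes, or None for an empty placeholder; answered is the
    number of request positions the peer covered before truncating the tail.
    Raises ValueError when the response cannot be valid.
    """
    # Assumption: positional correspondence means a response can never be
    # longer than the request it answers.
    if len(bals) > len(expected_hashes):
        raise ValueError("response has more entries than the request")

    results: List[Optional[bytes]] = []
    # zip stops at the shorter list, so a tail-truncated response is accepted.
    for want, bal in zip(expected_hashes, bals):
        if bal == b"":
            # Empty placeholder: BAL unavailable at this position.
            results.append(None)
        elif hash_fn(bal) == want:
            # Hash of the encoded BAL matches the header commitment.
            results.append(bal)
        else:
            raise ValueError("BAL does not match header commitment")
    return results, len(bals)
```

Positions at or beyond `answered` were truncated and can simply be requested again. `None` entries are candidates for a retry against another peer, in line with the caveat about not treating a single empty entry as adversarial.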
## Change Log -### snap/2 +### snap/2 ([EIP-8189], June 2026) -Version 2 replaces trie-node healing (0x06/0x07) with block-access-list based -catch-up (0x08/0x09). It is specified separately in [snap2.md](./snap2.md) as a -delta over this document; the messages 0x00–0x05 and all framing defined here -are inherited unchanged. +- Added `GetBlockAccessLists` (0x08) and `BlockAccessLists` (0x09). +- Removed `GetTrieNodes` (0x06) and `TrieNodes` (0x07); IDs reserved. +- Synchronization: replaced iterative trie healing with BAL application. +- Retention: serving peers retain BALs for canonical and non-canonical blocks within the + [EIP-7928] retention window. ### snap/1 (November 2020) Version 1 was the introduction of the snapshot protocol. +Also see the [initial snap-sync announcement]. [RLPx]: ../rlpx.md -[dynamic snapshots]: https://github.com/ethereum/go-ethereum/pull/20152 -[blog for more]: https://blog.ethereum.org/2020/07/17/ask-about-geth-snapshot-acceleration/ +[Data Retention Requirements]: #data-retention-requirements +[initial snap-sync annoucement]: https://blog.ethereum.org/2020/07/17/ask-about-geth-snapshot-acceleration/ +[EIP-7928]: https://eips.ethereum.org/EIPS/eip-7928 +[EIP-8189]: https://eips.ethereum.org/EIPS/eip-8189 diff --git a/caps/snap2.md b/caps/snap2.md deleted file mode 100644 index 5a919022..00000000 --- a/caps/snap2.md +++ /dev/null @@ -1,195 +0,0 @@ -# Ethereum Snapshot Protocol Version 2 (SNAP/2) - -This document specifies version 2 of the `snap` protocol. It is a delta over version 1; -everything not redefined here is inherited unchanged from [snap.md][snap1]. That includes -the overview, the satellite relationship with `eth`, the data format, the account-range, -storage-range and bytecode messages (0x00–0x05), and the general framing of snap sync. - -snap/2 was introduced by [EIP-8189]. It replaces snap/1's trie-node healing mechanism with -state-diff application using block-level access lists ([EIP-7928]). 
snap/2 is meaningful -only for blocks after [EIP-7928] activation, when the header field `block_access_list_hash` -is present; for pre-activation blocks the snap/1 mechanism continues to apply. - -## Differences from snap/1 - -| | snap/1 | snap/2 | -|---|---|---| -| Healing primitive | `GetTrieNodes` / `TrieNodes` (0x06 / 0x07) | `GetBlockAccessLists` / `BlockAccessLists` (0x08 / 0x09) | -| Catch-up | Iterative trie-node discovery | Sequential application of verified BALs | -| Pivot advancement during sync | Free retarget; healing reconciles afterwards | In-line BAL catch-up required before retarget | -| Reorg past current pivot | Handled by trie healing | Re-fetch of diverged leaves, gated on orphaned-BAL availability | -| Required header field | none | `block_access_list_hash` ([EIP-7928]) | - -## Synchronization algorithm - -The high-level structure of snap sync (pivot selection, byte-bounded contiguous range -download with Merkle-proven boundaries, the 128-block snapshot serving window) is unchanged; -see [snap.md][snap1]. The change is the replacement of trie-node healing with BAL-based -catch-up. Healing in snap/1 reacts to whatever inconsistencies the syncing node observes -during trie reconstruction; snap/2's catch-up is upfront-deterministic: the set of blocks to -apply is known from the header chain alone. - -Concretely, the sync loop becomes: - -1. Select a pivot `P` (typically `HEAD-64`). -2. Bulk-download flat state at `P` via `GetAccountRange`, `GetStorageRanges`, and - `GetByteCodes`. -3. As the chain advances from `P` to `P+K`, fetch BALs for `P+1..P+K` via - `GetBlockAccessLists`, verify each against the `block_access_list_hash` of its header, and - apply the resulting state diff to the partial flat state. `P+K` is then the target for any - remaining range requests. -4. Repeat step 3 if the pivot advances again during catch-up. -5. 
Once the flat state is consistent with the latest pivot, reconstruct tries locally and - verify the resulting root against the corresponding header. - -There is no separate healing phase. - -### Pivot advancement - -In snap/1, when the pivot advances from `P` to `P+K` during state download, the syncing node -retargets the new pivot and lets the healing phase reconcile the gap. snap/2 has no later -healing pass, so the advance itself is the catch-up: BALs for `P+1..P+K` **must** be fetched, -verified, and applied to the partially-synced flat state **before** any further range request -is issued against the new pivot. Range data downloaded prior to the advance is only consistent -with the new pivot once those BALs have been applied. - -### Reorg past the current pivot - -If the canonical chain reorgs past the current pivot `P`, the bulk-downloaded state may -contain leaves written by the now-orphaned fork. Let `W` be the common ancestor of the old -and new canonical chains. Recovery: - -1. Fetch BALs for `W+1..P` on the orphaned fork via `GetBlockAccessLists`. Orphaned BALs are - addressable by hash like canonical ones, provided peers have retained them (see - [Retention](#retention)). -2. From the orphaned-fork and new-fork BALs, compute the set of accounts and storage slots - mutated on the orphaned fork but **not** on the new canonical fork. Entries mutated on both - forks are overwritten in step 4 and need no special handling. -3. Re-fetch the diverged entries via `GetAccountRange` and `GetStorageRanges` against a fresh - pivot `P'` on the new canonical chain. -4. Apply BALs for `W+1..P'` on the new canonical fork. - -If the orphaned BALs are not retained by any peer, the syncing node **must** discard partial -state and restart synchronization. With the conventional pivot at `HEAD-64`, this requires a -reorg deeper than 64 blocks, which has not occurred on mainnet and is further bounded by PoS -finality. 
- -## Retention - -Peers serving snap/2 retain BALs within the retention window defined in [EIP-7928] (at least -the weak subjectivity period), and **should** also retain non-canonical (orphaned) BALs. -Retention of non-canonical BALs is what enables the reorg-recovery procedure above; without -it, a deep reorg forces a sync restart. - -The 128-block snapshot retention for the data served by `GetAccountRange` / `GetStorageRanges` -is unchanged from snap/1. - -## Protocol Messages - -### Unchanged from snap/1 - -The following messages are defined in [snap.md][snap1] and unchanged in snap/2: - -- `GetAccountRange` (0x00) / `AccountRange` (0x01) -- `GetStorageRanges` (0x02) / `StorageRanges` (0x03) -- `GetByteCodes` (0x04) / `ByteCodes` (0x05) - -### Removed in snap/2 - -- `GetTrieNodes` (0x06) -- `TrieNodes` (0x07) - -These message IDs are reserved and **must not** be reused. - -### GetBlockAccessLists (0x08) - -`[reqID: P, hashes: [hash1: B_32, hash2: B_32, ...], bytes: P]` - -Requests block access lists by block hash. The intended purpose of this message is to obtain -the per-block state-diff data needed to catch up the flat state during pivot advancement and -to recover from reorgs past the current pivot. - -- `reqID`: Request ID to match up responses with -- `hashes`: Block hashes of the BALs to retrieve -- `bytes`: Soft limit at which to stop returning data - -Notes: - -- Nodes **must** always respond to the query. -- Requests are keyed by block hash, so canonical and non-canonical (orphaned) blocks are - served through the same message. Serving nodes **should** retain non-canonical BALs (see - [Retention](#retention)) so that syncing nodes can recover from reorgs past their pivot. -- BALs are only available for blocks after [EIP-7928] activation and within the retention - window. For any requested hash outside this range, see the response semantics in - [BlockAccessLists](#blockaccesslists-0x09). 
-- The responding node is allowed to return **less** data than requested (own QoS limits, or - to honour `bytes`), truncating from the tail. The returned entries **must** preserve request - order. - -Rationale: - -- Responses are byte-capped to keep network traffic deterministic, consistent with the other - `snap` messages. -- Block hash, not block number, is the request key, because it disambiguates canonical and - orphaned blocks; both are addressable through a single message without a separate fork - qualifier. - -### BlockAccessLists (0x09) - -`[reqID: P, bals: [bal1: B, bal2: B, ...]]` - -Returns the requested block access lists in request order. Each `bal_i` corresponds -positionally to `hashes[i]` from the request. - -- `reqID`: ID of the request this is a response for -- `bals`: List of BALs in request order - -Notes: - -- If a BAL is unavailable (pruned, never seen, or beyond the retention window), the response - **must** contain the RLP empty string (`0x80`) at that position. Unlike `ByteCodes` (0x05), - the protocol does **not** collapse unavailable entries; positional correspondence with the - request is required. -- The responding node is allowed to truncate from the tail to respect the size limit. The - recommended soft limit for a single response is 2 MiB. -- Each `bal_i` is the RLP-encoded BAL. It is valid if and only if `keccak256(bal_i)` - equals the `block_access_list_hash` field of the header identified by `hashes[i]`; see - [EIP-7928] for the BAL encoding. - -Rationale: - -- Positional empty placeholders (rather than collapsing as `ByteCodes` does) preserve the - request-to-response mapping without an extra index lookup. BALs are large enough that a - one-byte `0x80` placeholder is negligible overhead. -- Application order matters for correctness: BALs **must** be applied in strict block order - against the correct fork, with each BAL verified before application. 
A wrong-fork or - out-of-order BAL produces an invalid state root, detected at the final root check. - -Caveats: - -- A peer that returns a BAL not matching the header commitment is misbehaving; the syncing - node **should** disconnect from or deprioritize it. -- Peers that return empty entries for blocks that should be available may be misbehaving or - may have pruned data legitimately. Implementations should track peer reliability rather than - treating a single empty entry as adversarial. - -## Change Log - -### snap/2 ([EIP-8189]) - -- Added `GetBlockAccessLists` (0x08) and `BlockAccessLists` (0x09). -- Removed `GetTrieNodes` (0x06) and `TrieNodes` (0x07); IDs reserved. -- Synchronization: replaced iterative trie healing with sequential BAL application. Pivot - advancement requires in-line BAL catch-up before any further range fetching against the new - pivot. Reorg past the current pivot is recovered by fetching orphaned-fork BALs, re-fetching - diverged leaves, and applying new-fork BALs. -- Retention: serving peers retain BALs for canonical and non-canonical blocks within the - [EIP-7928] retention window. - -### snap/1 - -See [snap.md][snap1]. - -[snap1]: ./snap.md -[EIP-7928]: https://eips.ethereum.org/EIPS/eip-7928 -[EIP-8189]: https://eips.ethereum.org/EIPS/eip-8189 From 7ff08dac8638e963f7ddd9cdf8daebdd5a95b9fc Mon Sep 17 00:00:00 2001 From: Felix Lange Date: Tue, 30 Jun 2026 23:14:15 +0200 Subject: [PATCH 4/5] caps/snap.md: fix link --- caps/snap.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/caps/snap.md b/caps/snap.md index c5a9d86f..82a38662 100644 --- a/caps/snap.md +++ b/caps/snap.md @@ -419,6 +419,6 @@ Also see the [initial snap-sync announcement]. 
[RLPx]: ../rlpx.md [Data Retention Requirements]: #data-retention-requirements -[initial snap-sync annoucement]: https://blog.ethereum.org/2020/07/17/ask-about-geth-snapshot-acceleration/ +[initial snap-sync announcement]: https://blog.ethereum.org/2020/07/17/ask-about-geth-snapshot-acceleration/ [EIP-7928]: https://eips.ethereum.org/EIPS/eip-7928 [EIP-8189]: https://eips.ethereum.org/EIPS/eip-8189 From 98914a2b0c2fc74336c4bfe11e12882141daf61a Mon Sep 17 00:00:00 2001 From: Felix Lange Date: Tue, 30 Jun 2026 23:15:01 +0200 Subject: [PATCH 5/5] README.md: remove snap/2 link --- README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index ba1c4c5a..fcaab90a 100644 --- a/README.md +++ b/README.md @@ -17,8 +17,8 @@ We have several specifications for low-level protocols: The repository also contains specifications of many RLPx-based application-level protocols: -- [Ethereum Wire Protocol] (eth/68) -- [Ethereum Snapshot Protocol] (snap/1), [Snapshot Protocol v2] (snap/2) +- [Ethereum Wire Protocol] (eth/70) +- [Ethereum Snapshot Protocol] (snap/2) - [Light Ethereum Subprotocol] (les/4) - [Parity Light Protocol] (pip/1) - [Ethereum Witness Protocol] (wit/0) @@ -81,7 +81,6 @@ WireShark dissectors are available here: