
BAL DB should key by (blockNumber, blockHash), not blockHash alone, to avoid RocksDB write amplification #11504

@asdacap

Description


Problem

The Block Access List (BAL) DB introduced for EIP-7928 currently keys entries by blockHash alone:

  • src/Nethermind/Nethermind.Blockchain/BlockAccessLists/BlockAccessListStore.cs:
    public void Insert(Hash256 blockHash, BlockAccessList bal)
    {
        using NettyRlpStream rlpStream = BlockAccessListDecoder.Instance.EncodeToNewNettyStream(bal);
        balDb.Set(blockHash, rlpStream.AsSpan());
    }

Block hashes are uniformly random, so consecutive inserts land at random positions in RocksDB's keyspace. BAL values are also large: empirical mainnet measurements put them at ~91 KB raw / ~43 KB Snappy-compressed on average, with p95 around 74 KB compressed (see EIP-7928 BAL size analysis). The raw BAL size is close to half the size of the block body itself, which currently averages ~167 KB on mainnet.

Random-position writes of large values are the worst case for an LSM tree:

  • New keys fall inside the key range of nearly every existing SST at each level, so compaction has to repeatedly rewrite large existing SSTs to merge in small amounts of new data.
  • Write amplification grows roughly with the size of L_max divided by the size of new ingest per compaction cycle, and the large per-record size makes each compaction cycle expensive in absolute bytes.
  • There is no opportunity for trivial moves (relocating an SST file to the next level without rewriting it), because new keys overlap the entire existing key range.

This is exactly the problem that motivated the migration of the blocks DB, receipts DB, and headers DB away from pure-blockHash keys to a (blockNumber, blockHash) composite key. With block-number-prefixed keys:

  • New writes are monotonically increasing in key order, so they always land at the right edge of the keyspace.
  • Compaction degenerates to mostly trivial moves between levels, with no rewriting of historical data.
  • Range scans by block number become possible.
  • Pruning (deleting BALs older than N blocks) becomes a contiguous range delete instead of scattered point deletes.
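The ordering and pruning claims above can be illustrated with a small sketch. It assumes a key layout of an 8-byte big-endian block number followed by the 32-byte block hash; big-endian is an assumption here, and `BalKeySketch`, `Encode`, `Compare`, and `PruneRange` are hypothetical names for illustration, not Nethermind's actual key encoder. The unsigned bytewise comparison stands in for RocksDB's default comparator.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not Nethermind's actual key encoder.
// Assumed key layout: 8-byte big-endian block number, then the 32-byte block hash.
public static class BalKeySketch
{
    public const int HashLength = 32;
    public const int KeyLength = sizeof(long) + HashLength;

    public static byte[] Encode(long blockNumber, ReadOnlySpan<byte> blockHash)
    {
        if (blockNumber < 0)
            throw new ArgumentOutOfRangeException(nameof(blockNumber));
        if (blockHash.Length != HashLength)
            throw new ArgumentException("Block hash must be 32 bytes.", nameof(blockHash));

        byte[] key = new byte[KeyLength];
        // Big-endian, so bytewise key order matches numeric block order.
        BinaryPrimitives.WriteInt64BigEndian(key, blockNumber);
        blockHash.CopyTo(key.AsSpan(sizeof(long)));
        return key;
    }

    // Unsigned lexicographic comparison, the same order as RocksDB's default bytewise comparator.
    public static int Compare(byte[] a, byte[] b) =>
        new ReadOnlySpan<byte>(a).SequenceCompareTo(new ReadOnlySpan<byte>(b));

    // Bounds for pruning every BAL with blockNumber < cutoff as one range delete over
    // [Start, EndExclusive): Encode(cutoff, 00..00) is the smallest key of block `cutoff`.
    public static (byte[] Start, byte[] EndExclusive) PruneRange(long cutoff)
    {
        byte[] zeroHash = new byte[HashLength];
        return (Encode(0, zeroHash), Encode(cutoff, zeroHash));
    }
}
```

Big-endian encoding is what makes this work: with a little-endian prefix, block 256 would sort before block 1. Under this layout every new block's key sorts after all existing keys, and pruning everything below a cutoff maps to a single range delete (for example RocksDB's `DeleteRange(start, end)`, whose end bound is exclusive) instead of one point delete per block.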

Proposal

Change the BAL DB key from blockHash to a composite (blockNumber, blockHash) key, matching the convention already established by BlockStore, HeaderStore, and PersistentReceiptStorage.

On the API side, the lookup signature should accept (long blockNumber, Hash256 blockHash), with a fallback path for legacy callers that only have a hash, the same way the other stores handle it.
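A minimal sketch of the proposed store shape with a legacy fallback might look like the following. It is not the existing `IBlockAccessListStore` API: `BalStoreSketch`, `InsertLegacy`, and the hash-to-number resolver delegate are hypothetical, `byte[]` stands in for `Hash256` and for the RLP-encoded `BlockAccessList`, and an in-memory `SortedDictionary` with bytewise ordering stands in for the RocksDB column. The 8-byte big-endian number prefix is likewise an assumption.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

#nullable enable

// Hypothetical sketch, not Nethermind's API: byte[] stands in for Hash256 and for the
// RLP-encoded BlockAccessList; a SortedDictionary stands in for the RocksDB column.
public sealed class BalStoreSketch
{
    // Bytewise ordering, mirroring RocksDB's default comparator.
    private static readonly Comparer<byte[]> Bytewise = Comparer<byte[]>.Create(
        (a, b) => new ReadOnlySpan<byte>(a).SequenceCompareTo(new ReadOnlySpan<byte>(b)));

    private readonly SortedDictionary<byte[], byte[]> _db = new(Bytewise);

    // Assumed hash -> block number resolver (e.g. a header index) for hash-only callers.
    private readonly Func<byte[], long?> _resolveNumber;

    public BalStoreSketch(Func<byte[], long?> resolveNumber) => _resolveNumber = resolveNumber;

    // Composite key: 8-byte big-endian block number, then the 32-byte hash (40 bytes total).
    private static byte[] Key(long blockNumber, byte[] blockHash)
    {
        byte[] key = new byte[sizeof(long) + blockHash.Length];
        BinaryPrimitives.WriteInt64BigEndian(key, blockNumber);
        blockHash.CopyTo(key, sizeof(long));
        return key;
    }

    public void Insert(long blockNumber, byte[] blockHash, byte[] encodedBal) =>
        _db[Key(blockNumber, blockHash)] = encodedBal;

    // Simulates an entry written under the old hash-only layout (32-byte key).
    public void InsertLegacy(byte[] blockHash, byte[] encodedBal) => _db[blockHash] = encodedBal;

    public byte[]? Get(long blockNumber, byte[] blockHash)
    {
        if (_db.TryGetValue(Key(blockNumber, blockHash), out byte[]? value))
            return value;
        // Fallback: the entry may predate the migration and still be keyed by hash alone.
        return _db.TryGetValue(blockHash, out value) ? value : null;
    }

    // Legacy-caller path: resolve the block number first, then use the composite lookup.
    public byte[]? Get(byte[] blockHash)
    {
        long? number = _resolveNumber(blockHash);
        if (number.HasValue)
            return Get(number.Value, blockHash);
        // Number unknown: only a hash-only (legacy) entry can be found.
        return _db.TryGetValue(blockHash, out byte[]? value) ? value : null;
    }

    public void Delete(long blockNumber, byte[] blockHash)
    {
        _db.Remove(Key(blockNumber, blockHash));
        _db.Remove(blockHash); // also clear any legacy copy
    }
}
```

Because composite keys are 40 bytes and hash-only keys are 32 bytes, the two layouts can coexist in one column without colliding, so the fallback can keep serving old entries until they are pruned or migrated.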

Files involved

  • src/Nethermind/Nethermind.Blockchain/BlockAccessLists/BlockAccessListStore.cs
  • src/Nethermind/Nethermind.Blockchain/BlockAccessLists/IBlockAccessListStore.cs
  • All call sites of IBlockAccessListStore.Insert/Get/Delete

Notes

  • This should be done before EIP-7928 ships on mainnet: migrating the key layout after a real BAL DB has accumulated on user nodes is much more painful than getting it right up front.
  • The same encoding helper used by BlockStore / HeaderStore (Bytes.Concat(blockNumber, blockHash) / the existing BlockKeyEncoder) can be reused.

Reported by Claude (Opus 4.7) on behalf of @asdacap.
