Skip to content

perf(blockvalidation): cut peak RAM during block validation by ~150 GB#886

Merged
icellan merged 6 commits into
bsv-blockchain:mainfrom
icellan:feat/block-validation-ram-optimization
May 18, 2026
Merged

perf(blockvalidation): cut peak RAM during block validation by ~150 GB#886
icellan merged 6 commits into
bsv-blockchain:mainfrom
icellan:feat/block-validation-ram-optimization

Conversation

@icellan

@icellan icellan commented May 18, 2026

Copy link
Copy Markdown
Contributor

Summary

On dev-scale-2 (675M-tx blocks, ~117 GB serialized, every ~7 min) the `block-validator` pod baselined at ~15 GB RSS and spiked to >600 GB during each block. Live heap during validation was only ~80-100 GB; the gap was allocation churn, GC pacing, and avoidable map rehash transients.

This PR attacks four sources concurrently. Each track is independently revertable; tests cover each in isolation.

1. Pre-size + bucket bump for the swiss maps

`txMap` and `parentSpendsMap` are now constructed with 1.2× per-bucket headroom inside `go-tx-map.NewSplitSwissMapUint64` and `model.NewSplitSyncedParentMap`. For `N=675M, B=8192` the per-bucket count is Binomial(N, 1/B) with mean ~82K and σ ~285 — 20% headroom absorbs the max-of-B order statistic with a ~250× safety margin and the underlying `dolthub/swiss` map no longer triggers a mid-fill rehash.

Live mutex profile on block 1122 showed peak 236 goroutines blocked on `SplitSyncedParentMap.SetIfNotExists` at `model/map.go b.mu.Lock()` — that contention burst correlates 1:1 with the rehash window and disappears once rehash is eliminated.

Bucket count bumped 1024 → 8192 (txMap) and 4096 → 8192 (parentSpendsMap) to halve steady-state collision rate at 192 concurrent inserters from ~4.7% → ~2.3%.

2. Pool the maps across blocks

`txMap` and `parentSpendsMap` are drawn from size-class-keyed `sync.Pool`s (4K..1B for tx counts, 16K..4B for parent spends) so the multi-GB backing storage is reused rather than allocated fresh per block. Maps are reset via the new `Clear()` methods on `dolthub/swiss.Map` wrappers — Clear zeroes the ctrl/metadata in place without releasing the (often multi-GB) group arrays.

Network-load shifts (1K ↔ 1M subtree size) are handled by `sync.Pool`'s natural per-GC drainage: classes that stop being `Get`ed age out automatically.

3. Pool the per-subtree []Node arrays

For a 675M-tx block, ~620 subtrees × ~1M nodes × 48 B per `Node` = ~30 GB allocated and freed per block. `go-subtree` exposes a new `DeserializeFromReaderWithAllocator` that takes a caller-supplied `NodeAllocator`; teranode installs a size-class-bucketed pool (1K/4K/16K/64K/256K/1M/4M Node capacities) and releases the slice back to the pool on cache eviction or any validation/storage failure path.

The existing `DeserializeFromReader`, `deserializeNodes`, and `deserializeFromReaderMmap` paths are byte-for-byte unchanged.

4. Aerospike setMined per-record pools

After validation, every tx in the accepted block is marked mined via Aerospike batch UDFs. Live alloc profile attributed ~975 GB cumulative allocation per block to this path. Two safe in-repo pools land here:

  • `aerospike.Key` reuse via a per-Store `sync.Pool`. Per-Store because `aerospike.Key.SetValue` recomputes the digest using the Key's existing (unexported) setName — a package-level pool would cross-contaminate stores with different setNames and produce wrong digests.
  • `LuaMapResponse` pool with `Reset()` and a private `parseLuaMapResponseInto` sink so the parser fills a caller-owned struct. The existing public `ParseLuaMapResponse` remains an allocating thin wrapper so the 12+ unrelated callers in `locked.go` / `alert_system.go` / `un_spend.go` etc. are untouched.

The aerospike-client-go-side packer/unpacker buffer pool (the single largest churn driver) is intentionally not in this PR — main pins upstream `v8.2.0` and that fix requires the bsv fork. It lands separately on the `feat/teranode-native-ops` (PR #828) branch which already uses the fork. See bsv-blockchain/aerospike-client-go#3.

Dependent fork PRs

Both go-tx-map and go-subtree are pinned in this PR via pseudo-version. Bump to a tagged release before merging.

Estimated RAM impact at 675M-tx scale

Source Saved
Eliminated swiss rehash transient (2 maps) ~60 GB peak
Map pooling across blocks (alloc churn) ~60 GB/block churn
Subtree `[]Node` pooling ~30 GB/block churn
Aerospike Key + LuaMapResponse pools ~80 GB cumulative churn

The packer pool (when PR 828 merges) adds another ~30 GB `bytes.Buffer.grow` saving.

Test plan

  • `go test -race -tags testtxmetacache ./model/...` — passes
  • `go test -race -tags testtxmetacache -run 'TestGetAndValidateSubtrees|TestNode|TestBlockValid' ./services/blockvalidation/` — passes
  • `go test -race -tags testtxmetacache -run 'TestSetMined|TestParseLuaMapResponse' ./stores/utxo/aerospike/` — passes (testcontainer-dependent tests require Docker)
  • `go vet ./...` — clean on changed files (pre-existing vet errors in `test/utils/` are unrelated)
  • New `BenchmarkSplitSwissMapUint64_NoRehash` (1M inserts) — 1 B/op, 0 allocs/op after construction, confirms no mid-fill rehash
  • New `TestGetTxMap_`, `TestGetParentSpendsMap_`, `TestPools_NilSafe`, `TestTxMapPool_ConcurrentReuse` — pool round-trip + concurrent safety
  • New `TestNodePool*`, `TestNodeAllocFromPool*` (`services/blockvalidation/pools_test.go`)
  • Smoke deploy on dev-scale-2 — confirm Prometheus `container_memory_working_set_bytes` peak drops below 450 GB
  • pprof heap mid-validation post-deploy — `swiss.Map.rehash` no longer appears in `-inuse_space`

Known follow-ups (out of this PR)

Cherry-pick into PR #828

This PR is designed to merge cleanly into `feat/teranode-native-ops` (PR #828). The only meaningful conflict is in `stores/utxo/aerospike/` where 828 uses the renamed bsv fork imports — trivial path-level conflict.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets large, transient allocation spikes during block validation by adding pooling and pre-sizing strategies for several hot-path data structures (validation maps, subtree node slices, and Aerospike setMined batch structures), aiming to reduce peak RSS and GC churn on very large blocks.

Changes:

  • Add size-class sync.Pool reuse for block-validation maps (txMap and parent-spends map) and add Clear()/bucket sizing support.
  • Introduce a size-class pool for per-subtree []Node allocations, wired in via a NodeAllocator on model.Block, and release pooled slices on cache eviction / failure paths.
  • Reduce Aerospike setMined churn by reusing []*aerospike.Key slices and pooling LuaMapResponse parsing structures.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
stores/utxo/aerospike/teranode.go Adds LuaMapResponse reset + pooling and a pooled parse helper.
stores/utxo/aerospike/set_mined.go Reuses per-Store key slices and pooled Lua map responses in SetMinedMulti.
stores/utxo/aerospike/aerospike.go Extends Store with a per-instance key slice pool.
services/blockvalidation/pools.go New size-class pool for subtree []Node allocations + block node release helper.
services/blockvalidation/pools_test.go Tests for node-slice pooling behavior and concurrency safety.
services/blockvalidation/BlockValidation.go Installs pooled NodeAllocator and releases nodes on eviction/failure paths.
model/map.go Adds per-bucket headroom sizing and Clear()/bucket-count helper for parent-spends map pooling.
model/Block.go Uses pooled txMap/parent-spends maps; wires allocator into subtree deserialization; returns pooled txMap on success.
model/block_validation_pools.go Implements size-class pools for txMap and parent-spends map reuse across blocks.
model/block_validation_pools_test.go Tests pool round-trips and concurrent reuse behavior.
go.mod Bumps go-subtree, go-tx-map, go-safe-conversion, and some indirect test deps.
go.sum Updates module checksums for the bumped dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread stores/utxo/aerospike/set_mined.go
Comment thread model/Block.go
@icellan icellan force-pushed the feat/block-validation-ram-optimization branch from 7089096 to 82bbcfe Compare May 18, 2026 14:47
@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Summary:

This PR implements comprehensive memory pooling across four critical allocation paths in block validation, targeting a ~150 GB peak RAM reduction at 675M-tx scale. The implementation is well-structured with proper separation of concerns and defensive error handling.

Findings:

Previously reported issue is FIXED: The txMap resource leak on error paths (comment from Copilot) has been resolved via defer b.releaseTxMap() at model/Block.go:547.

The following existing inline comment remains open for author consideration:

  • stores/utxo/aerospike/set_mined.go:146 - Copilot raised a concern about unbounded pool growth in batchKeysPool. My analysis shows this is mitigated by per-Store scoping and sync.Pool per-GC drainage. The workload assumption (uniform batch sizes within a Store lifetime) should be documented if not already in inline comments.

Overall Assessment:

No new critical issues found. The PR demonstrates careful engineering:

  • Proper defer-based cleanup on all exit paths
  • Nil-safe, idempotent pool release methods
  • Size-class bucketing prevents pathological allocations
  • Pool scoping aligned with lifetime assumptions (package-level for shared validation state, per-Store for service-scoped resources)
  • Comprehensive test coverage for pool round-trips and concurrent safety

The dependency bumps to go-tx-map v1.3.7 and go-subtree v1.3.3 (both tagged releases) are appropriate. Recommend merging once manual validation on dev-scale-2 confirms the expected memory reduction.

@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-886 (38f6d37)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.565µ 1.558µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.18n 71.21n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.55n 71.36n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.39n 71.19n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 35.98n 32.56n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 58.02n 55.79n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 160.7n 131.0n ~ 0.100
MiningCandidate_Stringify_Short-4 217.7n 218.0n ~ 0.400
MiningCandidate_Stringify_Long-4 1.614µ 1.603µ ~ 0.100
MiningSolution_Stringify-4 845.5n 833.8n ~ 0.100
BlockInfo_MarshalJSON-4 1.726µ 1.739µ ~ 0.100
NewFromBytes-4 129.3n 124.3n ~ 0.400
Mine_EasyDifficulty-4 61.04µ 60.65µ ~ 0.100
Mine_WithAddress-4 7.555µ 7.233µ ~ 1.000
BlockAssembler_AddTx-4 0.02768n 0.02675n ~ 0.700
AddNode-4 11.31 10.67 ~ 0.200
AddNodeWithMap-4 10.98 12.21 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 60.03n 61.86n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 32.12n 31.58n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 31.34n 30.02n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.68n 28.76n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 29.17n 28.15n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 288.5n 305.7n ~ 0.100
SubtreeProcessorAdd/64_per_subtree-4 282.2n 289.9n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 284.2n 298.7n ~ 0.100
SubtreeProcessorAdd/1024_per_subtree-4 275.5n 289.5n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 274.7n 289.0n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 281.4n 309.9n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 277.2n 300.1n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 278.9n 285.4n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 276.3n 282.8n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.76n 55.69n ~ 0.300
SubtreeNodeAddOnly/64_per_subtree-4 34.43n 36.23n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.44n 34.91n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.81n 34.30n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 116.0n 125.8n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 407.2n 530.5n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.379µ 1.799µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.467µ 5.232µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.485µ 9.047µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 277.2n 286.3n ~ 0.200
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 278.1n 285.7n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 2.234m 2.278m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.461m 5.572m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.820m 7.791m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 10.85m 10.82m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.957m 1.991m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 5.666m 4.664m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 14.44m 13.01m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 23.91m 24.13m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.269m 2.322m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.320m 8.440m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.46m 14.05m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.984m 2.041m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.679m 8.093m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 42.65m 40.90m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.685µ 3.640µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.642µ 3.431µ ~ 1.000
DiskTxMap_ExistenceOnly-4 317.7n 320.0n ~ 1.000
Queue-4 190.6n 188.2n ~ 0.100
AtomicPointer-4 4.305n 4.517n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 860.5µ 840.8µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 817.0µ 782.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 104.6µ 103.8µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 61.67µ 62.88µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 68.54µ 55.99µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 12.36µ 11.34µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/10K-4 5.567µ 4.739µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 2.139µ 1.639µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 11.047m 9.536m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 11.72m 10.69m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.119m 1.124m ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 686.2µ 685.6µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/100K-4 540.1µ 564.8µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 297.6µ 298.3µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 49.84µ 49.39µ ~ 0.400
ReorgOptimizations/NodeFlags/New/100K-4 17.55µ 16.50µ ~ 0.100
TxMapSetIfNotExists-4 52.69n 53.13n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 39.07n 39.05n ~ 1.000
ChannelSendReceive-4 625.0n 590.2n ~ 0.100
CalcBlockWork-4 550.5n 547.9n ~ 0.700
CalculateWork-4 720.7n 739.0n ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.321µ 1.342µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 15.93µ 13.00µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_1000-4 125.7µ 145.4µ ~ 0.100
CatchupWithHeaderCache-4 104.1m 104.4m ~ 0.200
_BufferPoolAllocation/16KB-4 4.609µ 3.841µ ~ 0.400
_BufferPoolAllocation/32KB-4 7.640µ 8.770µ ~ 0.700
_BufferPoolAllocation/64KB-4 14.82µ 20.93µ ~ 0.100
_BufferPoolAllocation/128KB-4 23.56µ 37.88µ ~ 0.100
_BufferPoolAllocation/512KB-4 89.81µ 126.91µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.38µ 19.28µ ~ 0.100
_BufferPoolConcurrent/64KB-4 29.10µ 30.18µ ~ 0.700
_BufferPoolConcurrent/512KB-4 148.5µ 149.2µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 622.1µ 636.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 632.9µ 637.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 632.2µ 626.2µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/128KB-4 614.6µ 606.7µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/512KB-4 610.7µ 609.0µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.53m 36.96m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.60m 36.92m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.21m 37.09m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.44m 37.40m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.33m 36.73m ~ 0.100
_PooledVsNonPooled/Pooled-4 742.3n 743.5n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.564µ 8.764µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.596µ 6.530µ ~ 0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.522µ 9.316µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.421µ 8.967µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.303m 1.343m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 308.9µ 302.6µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 72.90µ 71.79µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 18.25µ 18.02µ ~ 0.700
SubtreeSizes/10k_tx_512_per_subtree-4 8.900µ 8.876µ ~ 0.700
SubtreeSizes/10k_tx_1024_per_subtree-4 4.458µ 4.371µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.193µ 2.180µ ~ 0.700
BlockSizeScaling/10k_tx_64_per_subtree-4 69.74µ 70.88µ ~ 0.200
BlockSizeScaling/10k_tx_256_per_subtree-4 17.58µ 17.71µ ~ 0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.415µ 4.397µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 378.1µ 369.7µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 88.53µ 89.09µ ~ 0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.83µ 21.68µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 152.6µ 149.7µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 160.7µ 160.8µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 317.1µ 311.7µ ~ 0.200
SubtreeAllocations/medium_subtrees_exists_check-4 8.997µ 8.892µ ~ 0.200
SubtreeAllocations/medium_subtrees_data_fetch-4 9.488µ 9.415µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 17.64µ 17.71µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.180µ 2.103µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.251µ 2.267µ ~ 0.400
SubtreeAllocations/large_subtrees_full_validation-4 4.377µ 4.372µ ~ 1.000
_prepareTxsPerLevel-4 375.7m 377.8m ~ 0.200
_prepareTxsPerLevelOrdered-4 4.089m 4.053m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 379.8m 379.7m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.918m 3.858m ~ 0.200
StoreBlock_Sequential/BelowCSVHeight-4 324.7µ 324.0µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 331.7µ 325.0µ ~ 1.000
GetUtxoHashes-4 260.1n 258.8n ~ 0.200
GetUtxoHashes_ManyOutputs-4 43.16µ 43.24µ ~ 0.700
_NewMetaDataFromBytes-4 239.1n 237.1n ~ 0.400
_Bytes-4 630.6n 616.7n ~ 0.100
_MetaBytes-4 568.0n 563.4n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-18 16:42 UTC

icellan added a commit to icellan/teranode that referenced this pull request May 18, 2026
…path

Address PR bsv-blockchain#886 review (Copilot + claude-code-review confirmed Major):
the pooled txMap allocated inside checkDuplicateTransactions was only
returned to the size-class pool on the success path. Any error between
the allocation and the final cleanup (a checkDuplicateTransactions
worker error, the txMap.Flush failure, or any validOrderAndBlessed
error) returned from Block.Valid with b.txMap still pointing at the
multi-GB pooled instance. At 654M-tx scale each leaked map pins ~30 GB
of backing storage on the Block until GC, which may not run for a
long time if the Block is held by the lastValidatedBlocks cache.

Extract the cleanup into a (b *Block).releaseTxMap helper (nil-safe,
idempotent, handles disk-backed / pooled-in-memory / generic Closer)
and call it via `defer` from Block.Valid immediately before invoking
checkDuplicateTransactions. The defer fires on every exit path, so
errors no longer leak the map.

Also address Copilot Minor on batchKeysPool: document the workload
assumption (uniform SetMinedMulti batch sizes within a single Store's
lifetime) and explain when size-class bucketing would become necessary.
icellan added 5 commits May 18, 2026 17:27
On dev-scale-2 (675M-tx blocks, ~117 GB serialized, every ~7 min) the
block-validator pod baselined at ~15 GB RSS and spiked to >600 GB
during each block. Live heap was only ~80-100 GB; the gap was
allocation churn, GC pacing, and avoidable map rehash transients.

This change attacks four sources concurrently. Each is independently
revertable; tests cover each in isolation.

1) Pre-size + bucket bump for the swiss maps (model/Block.go,
   model/map.go, go-tx-map@e0da75e)

   txMap and parentSpendsMap are now constructed with 1.2x per-bucket
   headroom inside go-tx-map.NewSplitSwissMapUint64 and
   model.NewSplitSyncedParentMap. For N=675M, B=8192 the per-bucket
   count is Binomial(N, 1/B) with mean ~82k and sigma ~285 — 20%
   headroom absorbs the max-of-B order statistic with a ~250x safety
   margin and the underlying dolthub/swiss map no longer triggers a
   mid-fill rehash. Live mutex profile on block 1122 showed peak 236
   goroutines blocked on SplitSyncedParentMap.SetIfNotExists at
   model/map.go b.mu.Lock() — that contention burst correlates 1:1
   with the rehash window and disappears once rehash is eliminated.

   Bucket count bumped 1024 -> 8192 (txMap) and 4096 -> 8192
   (parentSpendsMap) to halve steady-state collision rate at 192
   concurrent inserters from ~4.7% to ~2.3%.

2) Pool the maps across blocks (model/block_validation_pools.go,
   go-tx-map@e0da75e Clear())

   txMap and parentSpendsMap are drawn from size-class-keyed
   sync.Pools (4K..1B for tx counts, 16K..4B for parent spends)
   so the multi-GB backing storage is reused rather than allocated
   fresh per block. Maps are reset via the new Clear() methods on
   dolthub/swiss.Map wrappers (SwissMap, SwissMapUint64,
   SplitSwissMapUint64, SplitSyncedParentMap) — Clear zeroes the
   ctrl/metadata in place without releasing the (often multi-GB)
   group arrays.

   Network-load shifts (1K <-> 1M subtree size) are handled by
   sync.Pool's natural per-GC drainage: classes that stop being Got
   age out automatically.

3) Pool the per-subtree []Node arrays (services/blockvalidation/
   pools.go, go-subtree@54b2484
   DeserializeFromReaderWithAllocator/ReleaseNodes)

   For a 675M-tx block, ~620 subtrees x ~1M nodes x 48 B per Node =
   ~30 GB allocated and freed per block. go-subtree exposes a new
   DeserializeFromReaderWithAllocator that takes a caller-supplied
   NodeAllocator; teranode installs a size-class-bucketed pool
   (1K/4K/16K/64K/256K/1M/4M Node capacities) and releases the slice
   back to the pool on cache eviction or any validation/storage
   failure path. The existing DeserializeFromReader,
   deserializeNodes, and deserializeFromReaderMmap paths are
   unchanged.

4) Aerospike setMined per-record pools (stores/utxo/aerospike/
   set_mined.go, aerospike.go, teranode.go)

   After validation, every tx in the accepted block is marked mined
   via Aerospike batch UDFs. Live alloc profile attributed ~975 GB
   cumulative allocation per block to this path. Two safe in-repo
   pools land here:

   - aerospike.Key reuse via a per-Store sync.Pool. Per-Store
     because aerospike.Key.SetValue recomputes the digest using the
     Key's existing (unexported) setName — a package-level pool
     would cross-contaminate stores with different setNames and
     produce wrong digests.

   - LuaMapResponse pool with Reset() and a private
     parseLuaMapResponseInto sink so the parser fills a
     caller-owned struct. The existing public ParseLuaMapResponse
     remains an allocating thin wrapper so the 12+ unrelated
     callers in locked.go / alert_system.go / un_spend.go etc. are
     untouched.

   The aerospike-client-go-side packer/unpacker buffer pool (the
   single largest churn driver, ~395 GB cumulative parseRecord) is
   intentionally NOT in this PR — teranode main pins upstream
   v8.2.0 and that fix requires the BSV fork. It will land
   separately on the feat/teranode-native-ops (PR 828) branch
   which already uses the fork.

Benchmark: BenchmarkSplitSwissMapUint64_NoRehash (1M inserts) drops
from doubling-grow churn to 1 B/op, 0 allocs/op after construction,
confirming no mid-fill rehash.

Tests: go test -race -tags testtxmetacache ./model/... and the
targeted ./services/blockvalidation/... runs pass. The unrelated
DiskParentSpendsMap.Stats / writerLoop race surfaced under
block_diskMapDirs (set via local settings_local.conf) exists on
plain main too and is filed separately.

Fork branches:

- github.com/bsv-blockchain/go-tx-map feat/presize-headroom
  (1.2x headroom + Clear methods)
- github.com/bsv-blockchain/go-subtree feat/node-allocator
  (DeserializeFromReaderWithAllocator + ReleaseNodes)

Both pinned via pseudo-version in go.mod; bump to a tagged release
before merging.
…path

Address PR bsv-blockchain#886 review (Copilot + claude-code-review confirmed Major):
the pooled txMap allocated inside checkDuplicateTransactions was only
returned to the size-class pool on the success path. Any error between
the allocation and the final cleanup (a checkDuplicateTransactions
worker error, the txMap.Flush failure, or any validOrderAndBlessed
error) returned from Block.Valid with b.txMap still pointing at the
multi-GB pooled instance. At 654M-tx scale each leaked map pins ~30 GB
of backing storage on the Block until GC, which may not run for a
long time if the Block is held by the lastValidatedBlocks cache.

Extract the cleanup into a (b *Block).releaseTxMap helper (nil-safe,
idempotent, handles disk-backed / pooled-in-memory / generic Closer)
and call it via `defer` from Block.Valid immediately before invoking
checkDuplicateTransactions. The defer fires on every exit path, so
errors no longer leak the map.

Also address Copilot Minor on batchKeysPool: document the workload
assumption (uniform SetMinedMulti batch sizes within a single Store's
lifetime) and explain when size-class bucketing would become necessary.
@icellan icellan force-pushed the feat/block-validation-ram-optimization branch from 6ff34e8 to 8d71ebe Compare May 18, 2026 15:36
…nt-vs-empty semantic

processBatchResultsForSetMined checks `res.BlockIDs != nil` to decide
whether to insert the per-tx blockIDs into the returned map. The
contract observed by that check is "BlockIDs != nil iff the Lua
response actually contained a blockIDs field" — parseLuaMapResponseInto
only writes BlockIDs when respMap["blockIDs"] is present, so error
responses (e.g. TX_NOT_FOUND) leave BlockIDs untouched.

The pre-pool code allocated a fresh LuaMapResponse per record, so
BlockIDs started as nil and that contract held trivially. The pooled
Reset was truncating BlockIDs to [:0] instead of nil-ing it, to
preserve the slice capacity across calls. That violated the contract:
after a successful record set BlockIDs = [333], the next record's
Reset left it as a non-nil empty slice, and the caller then inserted
the failed record's hash into the returned map with an empty value.

Fixes TestAerospike/aerospike_setmined_multi_partial_failure caught
by CI on PR bsv-blockchain#886. The slice-cap reuse was a minor optimization not
worth violating the public-facing semantic; per-record BlockIDs are
small (typically O(1) for an unforked chain) so the allocation cost
is negligible.

Errors map is still cleared-not-nil-ed because the loop body checks
res.Signal and other scalar fields rather than relying on
r.Errors == nil.
@sonarqubecloud

Copy link
Copy Markdown

icellan added a commit that referenced this pull request May 18, 2026
…path

Address PR #886 review (Copilot + claude-code-review confirmed Major):
the pooled txMap allocated inside checkDuplicateTransactions was only
returned to the size-class pool on the success path. Any error between
the allocation and the final cleanup (a checkDuplicateTransactions
worker error, the txMap.Flush failure, or any validOrderAndBlessed
error) returned from Block.Valid with b.txMap still pointing at the
multi-GB pooled instance. At 654M-tx scale each leaked map pins ~30 GB
of backing storage on the Block until GC, which may not run for a
long time if the Block is held by the lastValidatedBlocks cache.

Extract the cleanup into a (b *Block).releaseTxMap helper (nil-safe,
idempotent, handles disk-backed / pooled-in-memory / generic Closer)
and call it via `defer` from Block.Valid immediately before invoking
checkDuplicateTransactions. The defer fires on every exit path, so
errors no longer leak the map.

Also address Copilot Minor on batchKeysPool: document the workload
assumption (uniform SetMinedMulti batch sizes within a single Store's
lifetime) and explain when size-class bucketing would become necessary.
icellan added a commit that referenced this pull request May 18, 2026
…nt-vs-empty semantic

processBatchResultsForSetMined checks `res.BlockIDs != nil` to decide
whether to insert the per-tx blockIDs into the returned map. The
contract observed by that check is "BlockIDs != nil iff the Lua
response actually contained a blockIDs field" — parseLuaMapResponseInto
only writes BlockIDs when respMap["blockIDs"] is present, so error
responses (e.g. TX_NOT_FOUND) leave BlockIDs untouched.

The pre-pool code allocated a fresh LuaMapResponse per record, so
BlockIDs started as nil and that contract held trivially. The pooled
Reset was truncating BlockIDs to [:0] instead of nil-ing it, to
preserve the slice capacity across calls. That violated the contract:
after a successful record set BlockIDs = [333], the next record's
Reset left it as a non-nil empty slice, and the caller then inserted
the failed record's hash into the returned map with an empty value.

Fixes TestAerospike/aerospike_setmined_multi_partial_failure caught
by CI on PR #886. The slice-cap reuse was a minor optimization not
worth violating the public-facing semantic; per-record BlockIDs are
small (typically O(1) for an unforked chain) so the allocation cost
is negligible.

Errors map is still cleared-not-nil-ed because the loop body checks
res.Signal and other scalar fields rather than relying on
r.Errors == nil.
@icellan icellan self-assigned this May 18, 2026
@icellan icellan merged commit 74e933a into bsv-blockchain:main May 18, 2026
25 checks passed
icellan added a commit that referenced this pull request May 18, 2026
Trivial: origin/main now contains PR #886 (squash-merged as 74e933a).
Our branch already cherry-picked the same content earlier, so the three
file-level conflicts in stores/utxo/aerospike/ (aerospike.go, set_mined.go,
teranode.go) are PR #886's changes-vs-themselves-plus-PR-#828-extras. Kept
HEAD in each case — it has both the cherry-picked PR #886 work and the
PR #828 additions (useNativeTeranodeOps field on the Store, richer
batchRecord diagnostics in processSingleBatchRecordPooled, luaResponse*
helpers in parseLuaMapResponseInto for native-op response shapes).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants