perf(blockassembly): adopt packed TxInpoints + zero-alloc columnar path by icellan · Pull Request #889 · bsv-blockchain/teranode

icellan · 2026-05-18T18:44:54Z

Summary

Bumps go-subtree to v1.4.0 (packed-layout TxInpoints) and reworks both block-assembly receive paths to consume the new shape with minimal heap churn.

Block-assembly is the single non-horizontally-scalable service in the cluster, so every per-tx allocation eliminated there is amortised across every TPS-second.

What `go-subtree` v1.4.0 brings

TxInpoints stores per-parent vouts in a single count-prefixed []uint32 (was nested [][]uint32) — one fewer alloc per parent.
NewTxInpoints sizes via len(tx.Inputs), preserving the no-grow guarantee the prior cap-8 / cap-16 was aiming for without paying for unused slack on typical 1-2-input txs.
NewTxInpointsFromPacked(parents, voutIdxs) aliases caller-owned slices into a TxInpoints with zero allocation and zero validation — the hot-path constructor used by columnar gRPC.
The public Idxs field is removed (rather than renamed) so external code on the old API fails to compile rather than silently misuse the new layout.

What this PR does in teranode

1. `AddTxBatchColumnar` wire-format bump (proto)

Replaced parent_vout_indices + vout_idx_offsets (per-parent) with vout_idxs_packed + vout_idxs_tx_offsets (per-tx).
The packed buffer is byte-identical to TxInpoints.voutIdxs's internal layout, so the Server aliases a per-tx slice straight into a TxInpoints via NewTxInpointsFromPacked.
Parent hashes reinterpreted with unsafe.Slice once per batch (chainhash.Hash is [32]byte, byte-aligned over []byte backing).
Per-tx allocations in the columnar handler: ~4 → 0.

2. `AddTxBatch` (row-oriented) handler

Batch-allocate []TxInpoints values once and take addresses into txInpointsList, instead of escaping a fresh TxInpoints per tx.
Per-tx allocations: ~2 → 0.

3. `Client.go` columnar emit

Build vout_idxs_packed as [count_0, vals..., count_1, vals..., ...] directly, ready for zero-decode consumption.

4. Test migrations (17 files)

Construction sites now use NewTxInpointsFromInputs with fake *bt.Input values; read sites use GetParentVoutsAtIndex.
server_columnar_test.go rewritten end-to-end against the new proto shape including a new validator for vout_idxs_tx_offsets length.

Expected impact at 1 M+ TPS

With UseColumnarBatch=true and the packed layout enabled:

Metric	Before	After
Server-side allocs/sec (receive path)	~4 M	~3 K
GC-tracked objects in receive path	~1.2 B at 600 M live txs	~1.2 M (≈1000× reduction)
Estimated CPU saved on block-assembly	—	~20-25 % (dominated by GC mark)
Wire bytes per 1-input tx	~110 B	~95 B

The CPU win comes mostly from collapsing per-tx slice objects in currentTxMap into shared per-batch buffers that GC counts as one object regardless of how many transactions alias into them.

Deploy notes

The AddTxBatchColumnar wire-format change is internal between Validator and block-assembly. Validators and block-assembly must deploy together.
UseColumnarBatch is still false by default — flip it per environment after the lock-step deploy.
The row-oriented AddTxBatch path stays fully compatible.

Test plan

go build ./...
go vet ./... clean (modulo pre-existing test/utils/ lock-by-value warnings)
go test -race ./services/blockassembly/... ./services/asset/httpimpl/... ./stores/utxo/meta/... ./stores/utxo/ ./stores/txmetacache/... ./services/validator/... ./model/... — all green
golangci-lint run clean on touched files (only pre-existing prealloc warnings remain in unrelated benchmarks)
Smoke test in scale-2 (toggle UseColumnarBatch=true after deploy)

Bumps go-subtree to v1.4.0 (packed-layout TxInpoints) and reworks the block-assembly receive paths to consume the new shape with minimal heap churn. The block-assembly Server is the single non-horizontally-scalable service in the cluster, so every per-tx allocation eliminated there pays off proportionally to ingestion rate. go-subtree v1.4.0 changes (consumed here): - TxInpoints stores per-parent vouts in a single count-prefixed []uint32 (was nested [][]uint32) — drops one alloc per parent. - NewTxInpoints sizes via len(tx.Inputs), preserves the no-grow guarantee the prior cap-8/cap-16 was aiming for without paying for unused slack on typical 1-2-input txs. - NewTxInpointsFromPacked aliases caller-owned (parents, voutIdxs) slices into a TxInpoints with zero allocation and zero validation. - Public field Idxs removed; callers use GetParentVoutsAtIndex. teranode changes: 1. AddTxBatchColumnar wire format (proto bump): - Replaced parent_vout_indices + vout_idx_offsets (per-parent) with vout_idxs_packed + vout_idxs_tx_offsets (per-tx). The packed buffer is byte-identical to the layout block-assembly stores internally, so the Server aliases a per-tx slice straight into a TxInpoints via NewTxInpointsFromPacked. Two slice ops per tx, zero allocation. - Parent hashes reinterpreted with unsafe.Slice once per batch (chainhash.Hash is [32]byte, byte-aligned over []byte backing). - Per-tx allocations in the columnar handler: ~4 -> 0. 2. AddTxBatch (row-oriented) handler: - Batch-allocate []TxInpoints values once and take addresses into txInpointsList, instead of escaping a fresh TxInpoints per tx. - Per-tx allocations: ~2 -> 0. 3. Client.go columnar emit: - Build vout_idxs_packed as [count_0, vals..., count_1, vals..., ...] directly, ready for zero-decode consumption. Test migrations: - 17 test files moved off the removed Idxs field. Construction sites now use NewTxInpointsFromInputs with fake *bt.Input values; read sites use GetParentVoutsAtIndex. - server_columnar_test.go rewritten end-to-end against the new proto shape including a new validator for vout_idxs_tx_offsets length. Expected at 1M+ TPS (UseColumnarBatch=true, packed layout): - Server-side allocs/sec: ~4M -> ~3K (-99.9 %) - GC-tracked objects in receive path: ~1000x reduction (per-tx TxInpoints inner slices replaced with batch-shared buffers aliased into per-tx slice headers). - Estimated ~20-25 % CPU saved on block-assembly, dominated by reduced GC mark cost. Wire-protocol bump is internal between Validator and block-assembly; deploy in lock-step.

github-actions · 2026-05-18T18:46:16Z

🤖 Claude Code Review

Status: Complete

Current Review:
Both previously flagged issues have been addressed:

✅ Offset validation (Server.go:1094-1139): Comprehensive bounds checking prevents slice-out-of-range panics. Validates start positions, endpoints, and monotonicity for both offset arrays. Covered by regression tests.
✅ Proto field reservation (blockassembly_api.proto:151-152): Fields 8 and 9 are properly reserved to prevent silent data corruption during version skew. New fields use tags 10 and 11.

The implementation demonstrates careful attention to performance and safety:

Zero-allocation columnar deserialization with proper validation
Extensive test coverage including malformed input cases
Well-documented unsafe pointer usage with clear lifetime assumptions
Appropriate use of field reservations to fail-fast on version mismatches

No additional issues found.

github-actions · 2026-05-18T18:48:35Z

Critical Issue Found in Server.go:1089

[Critical] Missing bounds validation for offset arrays

The validation checks that ParentTxOffsets and VoutIdxsTxOffsets have the correct length (txCount+1), but does not verify that the offset values themselves are within bounds of their respective data arrays.

Potential panic scenarios at lines 1131-1132 if offset values exceed array bounds.

See services/blockassembly/Server.go:1089-1132 for details.

Copilot

Pull request overview

This PR upgrades go-subtree to v1.4.0 and adapts block-assembly ingestion paths and tests to the new packed TxInpoints layout, with the main goal of reducing allocations in block-assembly receive paths.

Changes:

Updates columnar block-assembly gRPC format to send packed vout indices and reconstruct TxInpoints without per-transaction decoding.
Optimizes row-oriented batch ingestion by storing TxInpoints values in a batch slice.
Migrates tests and helpers away from the removed TxInpoints.Idxs field.

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`go.mod`	Bumps `go-subtree` and `go-safe-conversion` versions.
`go.sum`	Adds checksums for updated dependencies.
`services/blockassembly/Server.go`	Reworks row and columnar AddTxBatch receive paths for packed inpoints.
`services/blockassembly/Client.go`	Emits count-prefixed packed vout indices for columnar batches.
`services/blockassembly/blockassembly_api/blockassembly_api.proto`	Updates columnar request fields/documentation for packed vouts.
`services/blockassembly/blockassembly_api/blockassembly_api.pb.go`	Regenerates Go protobuf bindings for the updated request shape.
`services/blockassembly/server_columnar_test.go`	Migrates columnar tests to the new packed format.
`services/blockassembly/inpoints_helper_test.go`	Adds a test helper for single-parent `TxInpoints`.
`services/blockassembly/server_test.go`	Updates test inpoint construction to new APIs.
`services/blockassembly/filter_transactions_test.go`	Replaces direct `Idxs` literals with helper-built inpoints.
`services/blockassembly/data_test.go`	Updates serialization tests for packed inpoint construction.
`services/blockassembly/parent_validation_ordering_test.go`	Builds parent inpoints via fake inputs.
`services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go`	Updates subtree metadata test inpoint construction.
`services/blockassembly/subtreeprocessor/disk_tx_map_test.go`	Migrates disk tx map serialization test to packed inpoints.
`services/blockassembly/subtreeprocessor/conflicting_queue_race_test.go`	Updates race test inpoints helper usage.
`services/validator/Server_test.go`	Adds shared validator test helper for single-parent inpoints.
`services/validator/Server_coverage_test.go`	Reuses validator inpoint helper in coverage tests.
`services/asset/httpimpl/test.go`	Builds transaction metadata inpoints via `NewTxInpointsFromInputs`.
`services/asset/httpimpl/GetTransactionMeta_test.go`	Replaces static inpoint literal with constructor helper.
`stores/utxo/unmined_serialization_test.go`	Updates unmined transaction serialization fixtures.
`stores/utxo/meta/data_test.go`	Adds test helpers and updates metadata tests/benchmarks.
`stores/txmetacache/txmetacache_test.go`	Updates TxMetaCache assertions and fixtures for packed inpoints.
`model/TestHelper.go`	Builds generated block subtree metadata using new inpoint constructor.
`model/Block_test.go`	Migrates subtree metadata test construction to fake inputs.
`test/longtest/services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go`	Updates long-test subtree inpoints setup.
`test/longtest/services/blockassembly/Server_test.go`	Updates performance-test serialized inpoints setup.
`test/longtest/model/BlockBig_test.go`	Adjusts expected empty `TxInpoints` value.

Files not reviewed (1)

services/blockassembly/blockassembly_api/blockassembly_api.pb.go: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-05-18T19:00:21Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-889 (5e0e4a3)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.661µ	1.681µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	61.79n	61.84n	~	1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	62.15n	61.86n	~	0.300
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	61.76n	61.70n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	29.33n	30.44n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	49.67n	50.38n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	108.0n	112.4n	~	0.100
MiningCandidate_Stringify_Short-4	268.5n	262.5n	~	0.100
MiningCandidate_Stringify_Long-4	1.869µ	1.870µ	~	0.700
MiningSolution_Stringify-4	954.1n	963.0n	~	0.100
BlockInfo_MarshalJSON-4	1.754µ	1.754µ	~	0.800
NewFromBytes-4	94.19n	104.60n	~	0.700
Mine_EasyDifficulty-4	67.47µ	67.39µ	~	0.400
Mine_WithAddress-4	7.343µ	7.329µ	~	1.000
BlockAssembler_AddTx-4	0.02916n	0.02854n	~	1.000
AddNode-4	10.50	10.53	~	1.000
AddNodeWithMap-4	11.45	10.88	~	0.200
DirectSubtreeAdd/4_per_subtree-4	55.44n	60.58n	~	0.100
DirectSubtreeAdd/64_per_subtree-4	30.01n	28.96n	~	0.200
DirectSubtreeAdd/256_per_subtree-4	27.76n	27.84n	~	0.700
DirectSubtreeAdd/1024_per_subtree-4	26.41n	26.57n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	25.98n	26.10n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	293.4n	291.7n	~	0.400
SubtreeProcessorAdd/64_per_subtree-4	285.8n	292.6n	~	0.100
SubtreeProcessorAdd/256_per_subtree-4	282.2n	303.6n	~	0.100
SubtreeProcessorAdd/1024_per_subtree-4	277.0n	280.0n	~	0.200
SubtreeProcessorAdd/2048_per_subtree-4	278.5n	280.4n	~	0.700
SubtreeProcessorRotate/4_per_subtree-4	285.5n	283.6n	~	0.400
SubtreeProcessorRotate/64_per_subtree-4	284.6n	290.0n	~	0.700
SubtreeProcessorRotate/256_per_subtree-4	280.7n	289.1n	~	0.100
SubtreeProcessorRotate/1024_per_subtree-4	283.4n	279.5n	~	0.100
SubtreeNodeAddOnly/4_per_subtree-4	54.98n	54.95n	~	1.000
SubtreeNodeAddOnly/64_per_subtree-4	36.05n	35.95n	~	0.700
SubtreeNodeAddOnly/256_per_subtree-4	35.04n	35.04n	~	1.000
SubtreeNodeAddOnly/1024_per_subtree-4	34.49n	34.28n	~	0.100
SubtreeCreationOnly/4_per_subtree-4	110.9n	112.3n	~	0.200
SubtreeCreationOnly/64_per_subtree-4	351.6n	355.4n	~	1.000
SubtreeCreationOnly/256_per_subtree-4	1.234µ	1.232µ	~	0.700
SubtreeCreationOnly/1024_per_subtree-4	3.837µ	3.797µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	6.934µ	6.820µ	~	0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	303.2n	287.0n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	348.1n	291.0n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	2.027m	2.012m	~	1.000
ParallelGetAndSetIfNotExists/10k_nodes-4	5.270m	5.205m	~	0.200
ParallelGetAndSetIfNotExists/50k_nodes-4	7.463m	7.551m	~	1.000
ParallelGetAndSetIfNotExists/100k_nodes-4	11.15m	10.15m	~	0.100
SequentialGetAndSetIfNotExists/1k_nodes-4	1.805m	1.781m	~	0.700
SequentialGetAndSetIfNotExists/10k_nodes-4	4.789m	4.602m	~	0.100
SequentialGetAndSetIfNotExists/50k_nodes-4	14.20m	14.54m	~	1.000
SequentialGetAndSetIfNotExists/100k_nodes-4	25.75m	26.31m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	2.046m	2.065m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	8.433m	8.543m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	13.49m	13.68m	~	0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	1.849m	1.821m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	8.916m	8.146m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	44.82m	45.02m	~	0.700
DiskTxMap_SetIfNotExists-4	3.847µ	3.886µ	~	0.700
DiskTxMap_SetIfNotExists_Parallel-4	3.587µ	3.534µ	~	0.400
DiskTxMap_ExistenceOnly-4	321.0n	338.5n	~	0.100
Queue-4	188.1n	189.4n	~	0.200
AtomicPointer-4	3.684n	3.239n	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	803.0µ	807.9µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4	762.8µ	763.4µ	~	1.000
ReorgOptimizations/AllMarkFalse/Old/10K-4	102.8µ	104.9µ	~	0.700
ReorgOptimizations/AllMarkFalse/New/10K-4	64.19µ	64.06µ	~	1.000
ReorgOptimizations/HashSlicePool/Old/10K-4	50.05µ	49.94µ	~	0.700
ReorgOptimizations/HashSlicePool/New/10K-4	11.02µ	10.95µ	~	0.100
ReorgOptimizations/NodeFlags/Old/10K-4	4.714µ	4.332µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	1.724µ	1.540µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	9.563m	9.531m	~	1.000
ReorgOptimizations/DedupFilterPipeline/New/100K-4	9.729m	10.147m	~	0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.068m	1.103m	~	0.100
ReorgOptimizations/AllMarkFalse/New/100K-4	704.9µ	703.3µ	~	0.400
ReorgOptimizations/HashSlicePool/Old/100K-4	473.5µ	497.0µ	~	0.200
ReorgOptimizations/HashSlicePool/New/100K-4	197.0µ	197.0µ	~	1.000
ReorgOptimizations/NodeFlags/Old/100K-4	48.96µ	48.08µ	~	0.100
ReorgOptimizations/NodeFlags/New/100K-4	16.83µ	16.73µ	~	0.700
TxMapSetIfNotExists-4	46.26n	49.71n	~	0.100
TxMapSetIfNotExistsDuplicate-4	38.74n	42.57n	~	0.100
ChannelSendReceive-4	574.3n	576.9n	~	0.700
CalcBlockWork-4	499.8n	470.4n	~	0.700
CalculateWork-4	624.4n	672.4n	~	0.200
BuildBlockLocatorString_Helpers/Size_10-4	1.361µ	1.385µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	13.51µ	13.06µ	~	0.700
BuildBlockLocatorString_Helpers/Size_1000-4	130.8µ	162.1µ	~	0.700
CatchupWithHeaderCache-4	104.7m	104.5m	~	0.700
_BufferPoolAllocation/16KB-4	3.813µ	4.018µ	~	0.100
_BufferPoolAllocation/32KB-4	8.995µ	9.407µ	~	0.400
_BufferPoolAllocation/64KB-4	18.42µ	16.03µ	~	0.100
_BufferPoolAllocation/128KB-4	36.20µ	25.59µ	~	0.100
_BufferPoolAllocation/512KB-4	96.89µ	101.34µ	~	0.400
_BufferPoolConcurrent/32KB-4	18.62µ	18.50µ	~	1.000
_BufferPoolConcurrent/64KB-4	29.52µ	31.31µ	~	0.100
_BufferPoolConcurrent/512KB-4	144.0µ	143.0µ	~	0.400
_SubtreeDeserializationWithBufferSizes/16KB-4	623.4µ	627.1µ	~	0.700
_SubtreeDeserializationWithBufferSizes/32KB-4	624.8µ	622.5µ	~	1.000
_SubtreeDeserializationWithBufferSizes/64KB-4	613.5µ	616.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	601.8µ	598.3µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	617.2µ	606.8µ	~	0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4	36.63m	36.15m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4	36.70m	36.24m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4	36.52m	36.21m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/128KB-4	36.53m	36.01m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4	36.06m	35.96m	~	1.000
_PooledVsNonPooled/Pooled-4	740.1n	737.9n	~	0.100
_PooledVsNonPooled/NonPooled-4	7.805µ	7.546µ	~	0.200
_MemoryFootprint/Current_512KB_32concurrent-4	6.610µ	6.688µ	~	0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.303µ	9.522µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.119µ	9.712µ	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.380m	1.396m	~	0.400
SubtreeSizes/10k_tx_16_per_subtree-4	322.6µ	322.1µ	~	0.700
SubtreeSizes/10k_tx_64_per_subtree-4	77.18µ	76.68µ	~	1.000
SubtreeSizes/10k_tx_256_per_subtree-4	19.15µ	19.09µ	~	1.000
SubtreeSizes/10k_tx_512_per_subtree-4	9.408µ	9.528µ	~	0.200
SubtreeSizes/10k_tx_1024_per_subtree-4	4.652µ	4.734µ	~	0.200
SubtreeSizes/10k_tx_2k_per_subtree-4	2.315µ	2.355µ	~	0.100
BlockSizeScaling/10k_tx_64_per_subtree-4	74.57µ	74.58µ	~	0.700
BlockSizeScaling/10k_tx_256_per_subtree-4	18.80µ	18.96µ	~	0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.638µ	4.762µ	~	0.100
BlockSizeScaling/50k_tx_64_per_subtree-4	395.3µ	393.9µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	93.20µ	95.44µ	~	0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4	22.86µ	23.06µ	~	0.400
SubtreeAllocations/small_subtrees_exists_check-4	156.2µ	161.1µ	~	0.100
SubtreeAllocations/small_subtrees_data_fetch-4	166.7µ	170.6µ	~	0.400
SubtreeAllocations/small_subtrees_full_validation-4	321.6µ	328.6µ	~	0.100
SubtreeAllocations/medium_subtrees_exists_check-4	9.253µ	9.434µ	~	0.100
SubtreeAllocations/medium_subtrees_data_fetch-4	9.389µ	9.888µ	~	0.100
SubtreeAllocations/medium_subtrees_full_validation-4	18.46µ	19.02µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.191µ	2.274µ	~	0.100
SubtreeAllocations/large_subtrees_data_fetch-4	2.348µ	2.409µ	~	0.100
SubtreeAllocations/large_subtrees_full_validation-4	4.630µ	4.793µ	~	0.100
_prepareTxsPerLevel-4	391.3m	388.8m	~	1.000
_prepareTxsPerLevelOrdered-4	4.897m	4.133m	~	0.100
_prepareTxsPerLevel_Comparison/Original-4	400.3m	386.5m	~	0.100
_prepareTxsPerLevel_Comparison/Optimized-4	4.603m	4.075m	~	0.100
StoreBlock_Sequential/BelowCSVHeight-4	333.2µ	332.6µ	~	1.000
StoreBlock_Sequential/AboveCSVHeight-4	333.7µ	334.4µ	~	1.000
GetUtxoHashes-4	276.8n	277.6n	~	1.000
GetUtxoHashes_ManyOutputs-4	45.42µ	45.53µ	~	0.400
_NewMetaDataFromBytes-4	232.0n	213.3n	~	0.100
_Bytes-4	610.2n	395.4n	~	0.100
_MetaBytes-4	560.8n	336.3n	~	0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-19 10:44 UTC

PR #889 bumped go-subtree to v1.4.0, which made SubtreeMeta.Serialize reject empty TxInpoints on non-coinbase entries with "parent tx hashes are not set for node N". The test helper still seeded every node with subtreepkg.NewTxInpoints() (no parents), so TestBlock_ValidateSubtree_MissingParents/multiple_invalid_parents and TestBlock_ValidateSubtree_NodeIteration/subtree_with_multiple_nodes_iteration broke as soon as they used the multi-node form. Helper now assigns a single dummy parent to every node except a coinbase placeholder at index 0. Matches the construction pattern used by createSubtreeMetadataWithParents and by the existing PR #889 migration in TestHelper.go.

freemans13

There are failing tests but aside from that this is fab

go-subtree v1.4.0's NewTxInpoints() returns the zero value (nil ParentTxHashes). Meta.Serialize rejects nil ParentTxHashes for non-coinbase nodes, so createValidSubtreeMetadata and the two inline-init sites in Block_test.go fail with "cannot serialize, parent tx hashes are not set for node N" when constructing test fixtures. Pre-v1.4.0 NewTxInpoints returned a cap-8 empty slice, so the same code path worked. Replace those five sites with an explicit TxInpoints{ParentTxHashes: []chainhash.Hash{}} literal — semantically identical and unambiguous about the invariant the test relies on. The proper fix is upstream — go-subtree PR #126 restores the non-nil empty ParentTxHashes from NewTxInpoints. Once v1.4.1 lands these literals can be reverted to NewTxInpoints() calls.

…xtures" This reverts commit bcd3f00.

v1.4.1 lands the NewTxInpoints empty-but-non-nil ParentTxHashes fix (go-subtree #126). With that in place the interim Block_test.go workaround reverted in the previous commit is no longer needed.

Two findings from external review of PR #889 addressed: 1. Proto field-number reuse: fields 8/9 previously carried different semantics than the new packed-layout fields. Same wire type means a misversioned client silently decodes into the wrong fields. Renumber the new fields to 10/11 and add `reserved 8, 9` (+ reserved names). Even with the lock-step-deploy operational model, this prevents the silent semantic break if a stray old-version request ever arrives. 2. Offset-validation gaps in AddTxBatchColumnar: the handler indexed into parents[ParentTxOffsets[i]:ParentTxOffsets[i+1]] and similarly for VoutIdxsPacked without checking offset endpoints, bounds, or monotonicity. A malformed request triggered a slice-bounds-out-of-range panic. grpc-go does not recover handler panics — a single bad packet would crash the entire single-instance block-assembly process. Add cheap endpoint + monotonicity checks that guarantee every per-tx slice expression is bounds-safe. The validator is trusted at the semantic layer, so we do not walk the packed voutIdxs to verify the count-prefix invariant (that walk would be O(B*P) on the hot path); we only do what is required to convert a process-crashing panic into a clean InvalidArgument response. Cost: 419 ns per 1000-tx batch (Apple M3 Max, isolated validation loop benchmark). At 1M TPS with batch 1000 that is ~0.04 % of a core. Tests: - TestAddTxBatchColumnar_RejectsOOBParentTxOffsets - TestAddTxBatchColumnar_RejectsNonMonotonicOffsets - TestAddTxBatchColumnar_RejectsOOBVoutIdxsTxOffsets - BenchmarkAddTxBatchColumnar_Validation (full handler, 1465 ns/op) - BenchmarkOffsetValidationLoop (isolated, 419 ns/op) Defensive comment added in Server.go documenting the proto.Unmarshal aliasing assumption — the unsafe.Slice on ParentTxHashesPacked relies on protobuf decoding bytes fields into fresh Go-heap allocations rather than aliasing the gRPC receive buffer. A future zero-copy codec would invalidate this and require rework.

…d-layout # Conflicts: # go.mod # go.sum

sonarqubecloud · 2026-05-19T10:41:38Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
91.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

oskarszoon

Both prior critical blockers are closed:

C1 (offset validation): cb21b2240 adds 4 endpoint checks + monotonicity loop in AddTxBatchColumnar. Every slice expression in the per-tx loop is bounds-safe; new tests cover OOB and non-monotonic cases. Cost ~0.04% of a core at 1M TPS / batch 1000.
C2 (proto field reuse): .proto reserves field numbers 8 and 9 plus the old field names; new fields renumbered to 10/11. .pb.go regenerated cleanly.

go-subtree v1.4.0 → v1.4.1 is a correctness-only bump (empty-but-non-nil ParentTxHashes), zero perf delta. The Merkle-root lift in submitMiningSolution is not new in this PR — it came in via the merge from main.

A rolling-deploy mismatch where one block-assembly replica predates bsv-blockchain#889 silently drops to the row path; without a log line operators investigating "why is this batch slow" can't tell the fallback fired. Debug level keeps it out of normal log volume but available when needed.

icellan requested review from Copilot, freemans13 and oskarszoon May 18, 2026 18:46

Copilot started reviewing on behalf of icellan May 18, 2026 18:46 View session

icellan self-assigned this May 18, 2026

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread services/blockassembly/Server.go

Comment thread services/blockassembly/blockassembly_api/blockassembly_api.proto Outdated

freemans13 approved these changes May 18, 2026

View reviewed changes

icellan mentioned this pull request May 19, 2026

fix(inpoints): NewTxInpoints returns non-nil empty ParentTxHashes bsv-blockchain/go-subtree#126

Merged

2 tasks

icellan added 5 commits May 19, 2026 11:06

Revert "test(blockassembly): inline empty TxInpoints in Block_test fi…

e93bed7

…xtures" This reverts commit bcd3f00.

chore(deps): bump go-subtree v1.4.0 -> v1.4.1

0c9e282

v1.4.1 lands the NewTxInpoints empty-but-non-nil ParentTxHashes fix (go-subtree #126). With that in place the interim Block_test.go workaround reverted in the previous commit is no longer needed.

Merge remote-tracking branch 'origin/main' into perf/txinpoints-packe…

867613f

…d-layout # Conflicts: # go.mod # go.sum

oskarszoon approved these changes May 19, 2026

View reviewed changes

icellan merged commit f896e60 into main May 19, 2026
30 checks passed

icellan deleted the perf/txinpoints-packed-layout branch May 19, 2026 11:28

icellan mentioned this pull request May 19, 2026

fix(blockassembly): fall back to row-oriented AddTxBatch on Unimplemented #897

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(blockassembly): adopt packed TxInpoints + zero-alloc columnar path#889

perf(blockassembly): adopt packed TxInpoints + zero-alloc columnar path#889
icellan merged 6 commits into
mainfrom
perf/txinpoints-packed-layout

icellan commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026 •

edited

Loading

Uh oh!

freemans13 left a comment

Uh oh!

sonarqubecloud Bot commented May 19, 2026

Uh oh!

oskarszoon left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

icellan commented May 18, 2026

Summary

What go-subtree v1.4.0 brings

What this PR does in teranode

1. AddTxBatchColumnar wire-format bump (proto)

2. AddTxBatch (row-oriented) handler

3. Client.go columnar emit

4. Test migrations (17 files)

Expected impact at 1 M+ TPS

Deploy notes

Test plan

Related

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026

Critical Issue Found in Server.go:1089

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

freemans13 left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 19, 2026

Quality Gate passed

Uh oh!

oskarszoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

What `go-subtree` v1.4.0 brings

1. `AddTxBatchColumnar` wire-format bump (proto)

2. `AddTxBatch` (row-oriented) handler

3. `Client.go` columnar emit

github-actions Bot commented May 18, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading