perf(blockassembly): adopt packed TxInpoints + zero-alloc columnar path#889
Conversation
Bumps go-subtree to v1.4.0 (packed-layout TxInpoints) and reworks the
block-assembly receive paths to consume the new shape with minimal heap
churn. The block-assembly Server is the single non-horizontally-scalable
service in the cluster, so every per-tx allocation eliminated there
pays off proportionally to ingestion rate.
go-subtree v1.4.0 changes (consumed here):
- TxInpoints stores per-parent vouts in a single count-prefixed
[]uint32 (was nested [][]uint32) — drops one alloc per parent.
- NewTxInpoints sizes via len(tx.Inputs), preserves the no-grow
guarantee the prior cap-8/cap-16 was aiming for without paying for
unused slack on typical 1-2-input txs.
- NewTxInpointsFromPacked aliases caller-owned (parents, voutIdxs)
slices into a TxInpoints with zero allocation and zero validation.
- Public field Idxs removed; callers use GetParentVoutsAtIndex.
teranode changes:
1. AddTxBatchColumnar wire format (proto bump):
- Replaced parent_vout_indices + vout_idx_offsets (per-parent) with
vout_idxs_packed + vout_idxs_tx_offsets (per-tx). The packed
buffer is byte-identical to the layout block-assembly stores
internally, so the Server aliases a per-tx slice straight into a
TxInpoints via NewTxInpointsFromPacked. Two slice ops per tx,
zero allocation.
- Parent hashes reinterpreted with unsafe.Slice once per batch
(chainhash.Hash is [32]byte, byte-aligned over []byte backing).
- Per-tx allocations in the columnar handler: ~4 -> 0.
2. AddTxBatch (row-oriented) handler:
- Batch-allocate []TxInpoints values once and take addresses into
txInpointsList, instead of escaping a fresh TxInpoints per tx.
- Per-tx allocations: ~2 -> 0.
3. Client.go columnar emit:
- Build vout_idxs_packed as [count_0, vals..., count_1, vals..., ...]
directly, ready for zero-decode consumption.
Test migrations:
- 17 test files moved off the removed Idxs field. Construction sites
now use NewTxInpointsFromInputs with fake *bt.Input values; read
sites use GetParentVoutsAtIndex.
- server_columnar_test.go rewritten end-to-end against the new
proto shape including a new validator for vout_idxs_tx_offsets
length.
Expected at 1M+ TPS (UseColumnarBatch=true, packed layout):
- Server-side allocs/sec: ~4M -> ~3K (-99.9 %)
- GC-tracked objects in receive path: ~1000x reduction (per-tx
TxInpoints inner slices replaced with batch-shared buffers
aliased into per-tx slice headers).
- Estimated ~20-25 % CPU saved on block-assembly, dominated by
reduced GC mark cost.
Wire-protocol bump is internal between Validator and block-assembly;
deploy in lock-step.
|
🤖 Claude Code Review Status: Complete Current Review:
The implementation demonstrates careful attention to performance and safety:
No additional issues found. |
Critical Issue Found in Server.go:1089[Critical] Missing bounds validation for offset arrays The validation checks that ParentTxOffsets and VoutIdxsTxOffsets have the correct length (txCount+1), but does not verify that the offset values themselves are within bounds of their respective data arrays. Potential panic scenarios at lines 1131-1132 if offset values exceed array bounds. See services/blockassembly/Server.go:1089-1132 for details. |
There was a problem hiding this comment.
Pull request overview
This PR upgrades go-subtree to v1.4.0 and adapts block-assembly ingestion paths and tests to the new packed TxInpoints layout, with the main goal of reducing allocations in block-assembly receive paths.
Changes:
- Updates columnar block-assembly gRPC format to send packed vout indices and reconstruct
TxInpointswithout per-transaction decoding. - Optimizes row-oriented batch ingestion by storing
TxInpointsvalues in a batch slice. - Migrates tests and helpers away from the removed
TxInpoints.Idxsfield.
Reviewed changes
Copilot reviewed 25 out of 27 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
go.mod |
Bumps go-subtree and go-safe-conversion versions. |
go.sum |
Adds checksums for updated dependencies. |
services/blockassembly/Server.go |
Reworks row and columnar AddTxBatch receive paths for packed inpoints. |
services/blockassembly/Client.go |
Emits count-prefixed packed vout indices for columnar batches. |
services/blockassembly/blockassembly_api/blockassembly_api.proto |
Updates columnar request fields/documentation for packed vouts. |
services/blockassembly/blockassembly_api/blockassembly_api.pb.go |
Regenerates Go protobuf bindings for the updated request shape. |
services/blockassembly/server_columnar_test.go |
Migrates columnar tests to the new packed format. |
services/blockassembly/inpoints_helper_test.go |
Adds a test helper for single-parent TxInpoints. |
services/blockassembly/server_test.go |
Updates test inpoint construction to new APIs. |
services/blockassembly/filter_transactions_test.go |
Replaces direct Idxs literals with helper-built inpoints. |
services/blockassembly/data_test.go |
Updates serialization tests for packed inpoint construction. |
services/blockassembly/parent_validation_ordering_test.go |
Builds parent inpoints via fake inputs. |
services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go |
Updates subtree metadata test inpoint construction. |
services/blockassembly/subtreeprocessor/disk_tx_map_test.go |
Migrates disk tx map serialization test to packed inpoints. |
services/blockassembly/subtreeprocessor/conflicting_queue_race_test.go |
Updates race test inpoints helper usage. |
services/validator/Server_test.go |
Adds shared validator test helper for single-parent inpoints. |
services/validator/Server_coverage_test.go |
Reuses validator inpoint helper in coverage tests. |
services/asset/httpimpl/test.go |
Builds transaction metadata inpoints via NewTxInpointsFromInputs. |
services/asset/httpimpl/GetTransactionMeta_test.go |
Replaces static inpoint literal with constructor helper. |
stores/utxo/unmined_serialization_test.go |
Updates unmined transaction serialization fixtures. |
stores/utxo/meta/data_test.go |
Adds test helpers and updates metadata tests/benchmarks. |
stores/txmetacache/txmetacache_test.go |
Updates TxMetaCache assertions and fixtures for packed inpoints. |
model/TestHelper.go |
Builds generated block subtree metadata using new inpoint constructor. |
model/Block_test.go |
Migrates subtree metadata test construction to fake inputs. |
test/longtest/services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go |
Updates long-test subtree inpoints setup. |
test/longtest/services/blockassembly/Server_test.go |
Updates performance-test serialized inpoints setup. |
test/longtest/model/BlockBig_test.go |
Adjusts expected empty TxInpoints value. |
Files not reviewed (1)
- services/blockassembly/blockassembly_api/blockassembly_api.pb.go: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-19 10:44 UTC |
PR #889 bumped go-subtree to v1.4.0, which made SubtreeMeta.Serialize reject empty TxInpoints on non-coinbase entries with "parent tx hashes are not set for node N". The test helper still seeded every node with subtreepkg.NewTxInpoints() (no parents), so TestBlock_ValidateSubtree_MissingParents/multiple_invalid_parents and TestBlock_ValidateSubtree_NodeIteration/subtree_with_multiple_nodes_iteration broke as soon as they used the multi-node form. Helper now assigns a single dummy parent to every node except a coinbase placeholder at index 0. Matches the construction pattern used by createSubtreeMetadataWithParents and by the existing PR #889 migration in TestHelper.go.
freemans13
left a comment
There was a problem hiding this comment.
There are failing tests but aside from that this is fab
go-subtree v1.4.0's NewTxInpoints() returns the zero value (nil
ParentTxHashes). Meta.Serialize rejects nil ParentTxHashes for
non-coinbase nodes, so createValidSubtreeMetadata and the two
inline-init sites in Block_test.go fail with "cannot serialize, parent
tx hashes are not set for node N" when constructing test fixtures.
Pre-v1.4.0 NewTxInpoints returned a cap-8 empty slice, so the same
code path worked. Replace those five sites with an explicit
TxInpoints{ParentTxHashes: []chainhash.Hash{}} literal — semantically
identical and unambiguous about the invariant the test relies on.
The proper fix is upstream — go-subtree PR #126 restores the non-nil
empty ParentTxHashes from NewTxInpoints. Once v1.4.1 lands these
literals can be reverted to NewTxInpoints() calls.
…xtures" This reverts commit bcd3f00.
v1.4.1 lands the NewTxInpoints empty-but-non-nil ParentTxHashes fix (go-subtree #126). With that in place the interim Block_test.go workaround reverted in the previous commit is no longer needed.
Two findings from external review of PR #889 addressed: 1. Proto field-number reuse: fields 8/9 previously carried different semantics than the new packed-layout fields. Same wire type means a misversioned client silently decodes into the wrong fields. Renumber the new fields to 10/11 and add `reserved 8, 9` (+ reserved names). Even with the lock-step-deploy operational model, this prevents the silent semantic break if a stray old-version request ever arrives. 2. Offset-validation gaps in AddTxBatchColumnar: the handler indexed into parents[ParentTxOffsets[i]:ParentTxOffsets[i+1]] and similarly for VoutIdxsPacked without checking offset endpoints, bounds, or monotonicity. A malformed request triggered a slice-bounds-out-of-range panic. grpc-go does not recover handler panics — a single bad packet would crash the entire single-instance block-assembly process. Add cheap endpoint + monotonicity checks that guarantee every per-tx slice expression is bounds-safe. The validator is trusted at the semantic layer, so we do not walk the packed voutIdxs to verify the count-prefix invariant (that walk would be O(B*P) on the hot path); we only do what is required to convert a process-crashing panic into a clean InvalidArgument response. Cost: 419 ns per 1000-tx batch (Apple M3 Max, isolated validation loop benchmark). At 1M TPS with batch 1000 that is ~0.04 % of a core. Tests: - TestAddTxBatchColumnar_RejectsOOBParentTxOffsets - TestAddTxBatchColumnar_RejectsNonMonotonicOffsets - TestAddTxBatchColumnar_RejectsOOBVoutIdxsTxOffsets - BenchmarkAddTxBatchColumnar_Validation (full handler, 1465 ns/op) - BenchmarkOffsetValidationLoop (isolated, 419 ns/op) Defensive comment added in Server.go documenting the proto.Unmarshal aliasing assumption — the unsafe.Slice on ParentTxHashesPacked relies on protobuf decoding bytes fields into fresh Go-heap allocations rather than aliasing the gRPC receive buffer. A future zero-copy codec would invalidate this and require rework.
…d-layout # Conflicts: # go.mod # go.sum
|
oskarszoon
left a comment
There was a problem hiding this comment.
Both prior critical blockers are closed:
- C1 (offset validation):
cb21b2240adds 4 endpoint checks + monotonicity loop inAddTxBatchColumnar. Every slice expression in the per-tx loop is bounds-safe; new tests cover OOB and non-monotonic cases. Cost ~0.04% of a core at 1M TPS / batch 1000. - C2 (proto field reuse):
.protoreserves field numbers 8 and 9 plus the old field names; new fields renumbered to 10/11..pb.goregenerated cleanly.
go-subtree v1.4.0 → v1.4.1 is a correctness-only bump (empty-but-non-nil ParentTxHashes), zero perf delta. The Merkle-root lift in submitMiningSolution is not new in this PR — it came in via the merge from main.
A rolling-deploy mismatch where one block-assembly replica predates bsv-blockchain#889 silently drops to the row path; without a log line operators investigating "why is this batch slow" can't tell the fallback fired. Debug level keeps it out of normal log volume but available when needed.



Summary
Bumps
go-subtreeto v1.4.0 (packed-layoutTxInpoints) and reworks both block-assembly receive paths to consume the new shape with minimal heap churn.Block-assembly is the single non-horizontally-scalable service in the cluster, so every per-tx allocation eliminated there is amortised across every TPS-second.
What
go-subtreev1.4.0 bringsTxInpointsstores per-parent vouts in a single count-prefixed[]uint32(was nested[][]uint32) — one fewer alloc per parent.NewTxInpointssizes vialen(tx.Inputs), preserving the no-grow guarantee the prior cap-8 / cap-16 was aiming for without paying for unused slack on typical 1-2-input txs.NewTxInpointsFromPacked(parents, voutIdxs)aliases caller-owned slices into aTxInpointswith zero allocation and zero validation — the hot-path constructor used by columnar gRPC.Idxsfield is removed (rather than renamed) so external code on the old API fails to compile rather than silently misuse the new layout.What this PR does in teranode
1.
AddTxBatchColumnarwire-format bump (proto)parent_vout_indices+vout_idx_offsets(per-parent) withvout_idxs_packed+vout_idxs_tx_offsets(per-tx).TxInpoints.voutIdxs's internal layout, so the Server aliases a per-tx slice straight into aTxInpointsviaNewTxInpointsFromPacked.unsafe.Sliceonce per batch (chainhash.Hashis[32]byte, byte-aligned over[]bytebacking).2.
AddTxBatch(row-oriented) handler[]TxInpointsvalues once and take addresses intotxInpointsList, instead of escaping a freshTxInpointsper tx.3.
Client.gocolumnar emitvout_idxs_packedas[count_0, vals..., count_1, vals..., ...]directly, ready for zero-decode consumption.4. Test migrations (17 files)
NewTxInpointsFromInputswith fake*bt.Inputvalues; read sites useGetParentVoutsAtIndex.server_columnar_test.gorewritten end-to-end against the new proto shape including a new validator forvout_idxs_tx_offsetslength.Expected impact at 1 M+ TPS
With
UseColumnarBatch=trueand the packed layout enabled:The CPU win comes mostly from collapsing per-tx slice objects in
currentTxMapinto shared per-batch buffers that GC counts as one object regardless of how many transactions alias into them.Deploy notes
AddTxBatchColumnarwire-format change is internal between Validator and block-assembly. Validators and block-assembly must deploy together.UseColumnarBatchis stillfalseby default — flip it per environment after the lock-step deploy.AddTxBatchpath stays fully compatible.Test plan
go build ./...go vet ./...clean (modulo pre-existingtest/utils/lock-by-value warnings)go test -race ./services/blockassembly/... ./services/asset/httpimpl/... ./stores/utxo/meta/... ./stores/utxo/ ./stores/txmetacache/... ./services/validator/... ./model/...— all greengolangci-lint runclean on touched files (only pre-existingpreallocwarnings remain in unrelated benchmarks)UseColumnarBatch=trueafter deploy)Related