perf(inpoints): pack TxInpoints vouts into a single allocation by icellan · Pull Request #125 · bsv-blockchain/go-subtree

icellan · 2026-05-18T17:07:39Z

Summary

Replaces the nested Idxs [][]uint32 field on TxInpoints with an unexported count-prefixed packed []uint32. Drops one heap allocation per parent per TxInpoints and removes the cap-8/cap-16 over-allocation in NewTxInpoints — that pre-allocation was sized for many-input transactions, but in practice most txs have 1-2 inputs and the unused slack dominated per-tx memory.

The constructor now sizes both internal buffers to len(tx.Inputs), the upper bound on parent count and total vouts. The no-grow guarantee the original cap-8/cap-16 was aiming for is preserved without paying for unused capacity.
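As a rough illustration of the count-prefixed layout described above, the sketch below packs per-parent vout lists into one []uint32 and reads one parent's vouts back by walking the prefixes. The names packVouts and voutsAt are hypothetical and are not the library's API; the real accessor is GetParentVoutsAtIndex, whose implementation may differ. The exact-size pre-allocation here is a choice made for the sketch only.

```go
package main

import "fmt"

// packVouts flattens per-parent vout lists into one count-prefixed slice:
// [count_0, vouts_0..., count_1, vouts_1..., ...].
// Hypothetical helper for illustration; not part of go-subtree.
func packVouts(perParent [][]uint32) []uint32 {
	total := 0
	for _, v := range perParent {
		total += 1 + len(v) // one count word plus the vouts themselves
	}
	packed := make([]uint32, 0, total) // single allocation, never grows
	for _, v := range perParent {
		packed = append(packed, uint32(len(v)))
		packed = append(packed, v...)
	}
	return packed
}

// voutsAt walks the count prefixes to locate parent i and returns a
// sub-slice of packed (no copy). ok is false when i is out of range or the
// data is truncated.
func voutsAt(packed []uint32, i int) (vouts []uint32, ok bool) {
	if i < 0 {
		return nil, false
	}
	pos := 0
	for parent := 0; pos < len(packed); parent++ {
		n := int(packed[pos])
		start, end := pos+1, pos+1+n
		if end > len(packed) {
			return nil, false
		}
		if parent == i {
			return packed[start:end], true
		}
		pos = end
	}
	return nil, false
}

func main() {
	packed := packVouts([][]uint32{{0, 3}, {1}})
	fmt.Println(packed) // [2 0 3 1 1]

	v, ok := voutsAt(packed, 1)
	fmt.Println(v, ok) // [1] true
}
```

Looking up parent i costs a walk over the preceding prefixes rather than a direct index, which is the trade this kind of layout makes in exchange for a single backing allocation.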

Wire format unchanged

The on-wire encoding [count_i, vals...] per parent is byte-identical to the new internal layout, so the deserializer writes straight into a single allocation. Serialised blobs interoperate across versions without a format version bump.
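A minimal sketch of why matching layouts allow a single-allocation decode. The byte order and framing are not stated in this description, so the sketch assumes little-endian uint32 words and a parent count known to the caller; decodePacked is a hypothetical name, not the library's deserializer.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated input")

// decodePacked reads nParents groups of [count, vals...] from buf into one
// []uint32 that keeps the same count-prefixed shape.
// Assumption: values are little-endian uint32 words.
func decodePacked(buf []byte, nParents int) ([]uint32, error) {
	// First pass: validate the framing and measure the number of words.
	words, off := 0, 0
	for p := 0; p < nParents; p++ {
		if len(buf)-off < 4 {
			return nil, errTruncated
		}
		n := int(binary.LittleEndian.Uint32(buf[off:]))
		if n < 0 || n > (len(buf)-off-4)/4 {
			return nil, errTruncated
		}
		off += 4 + 4*n
		words += 1 + n
	}

	// Second pass: one allocation, words copied in wire order.
	out := make([]uint32, words)
	for i := range out {
		out[i] = binary.LittleEndian.Uint32(buf[4*i:])
	}
	return out, nil
}

func main() {
	var buf []byte
	for _, w := range []uint32{2, 0, 3, 1, 1} {
		buf = binary.LittleEndian.AppendUint32(buf, w)
	}
	packed, err := decodePacked(buf, 2)
	fmt.Println(packed, err) // [2 0 3 1 1] <nil>
}
```

Because the decoded slice has the same shape as the wire data, no per-parent slice has to be built after the copy.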

Breaking change — intentional

The public field Idxs is removed rather than renamed. External callers on the old API will get a compile error rather than silently misuse the new layout. Migration:

Reading per-parent vouts: txi.Idxs[i] → txi.GetParentVoutsAtIndex(i)
Reading all inpoints flat: unchanged — txi.GetTxInpoints()
Constructing literals with Idxs: [][]uint32{...} → build via NewTxInpointsFromTx / NewTxInpointsFromInputs, or use appendInput from within the package

ParentTxHashes stays exported — its semantics are unchanged.

Benchmarks

Apple M3 Max, single-input tx via NewTxInpointsFromTx; deserialize paths use identical hand-crafted wire bytes for apples-to-apples comparison.

| Path | Time | Bytes / op | Allocs / op |
| --- | --- | --- | --- |
| Build, 1 input (`NewTxInpointsFromTx`) | 146.4 ns → 23.0 ns | 644 B → 40 B | 3 → 2 |
| Deserialize, 1 parent / 1 vout | 81.1 ns → 62.1 ns | 112 B → 96 B | 5 → 4 |
| Deserialize, 10 parents | 328.7 ns → 220.0 ns | 652 B → 452 B | 14 → 4 |
| Deserialize, 100 parents | 2596 ns → 1639 ns | 6340 B → 4148 B | 104 → 4 |

Deserialize allocation count is now constant at 4 regardless of input count; before, it grew linearly with parent count (5, 14, and 104 allocations for 1, 10, and 100 parents). The build-path win (-94% bytes, -84% time) is the dominant saving at validator-side ingestion rates.
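One way to observe the linear-versus-constant allocation behaviour outside a full benchmark is testing.AllocsPerRun. The sketch below compares a nested [][]uint32 build with a packed build; buildNested and buildPacked are hypothetical stand-ins, not the PR's benchmark code, and they model only the vout storage, so the absolute counts will not match the table above.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep the results reachable so the compiler cannot
// optimise the allocations away.
var (
	sinkNested [][]uint32
	sinkPacked []uint32
)

// buildNested allocates the outer slice plus one inner slice per parent.
func buildNested(parents int) {
	out := make([][]uint32, parents)
	for i := range out {
		out[i] = []uint32{uint32(i)}
	}
	sinkNested = out
}

// buildPacked allocates one slice holding [1, vout] for every parent.
func buildPacked(parents int) {
	out := make([]uint32, 0, 2*parents)
	for i := 0; i < parents; i++ {
		out = append(out, 1, uint32(i))
	}
	sinkPacked = out
}

// allocs reports the average allocations per call for both strategies.
func allocs(parents int) (nested, packed float64) {
	nested = testing.AllocsPerRun(100, func() { buildNested(parents) })
	packed = testing.AllocsPerRun(100, func() { buildPacked(parents) })
	return
}

func main() {
	for _, n := range []int{1, 10, 100} {
		nested, packed := allocs(n)
		fmt.Printf("parents=%d nested=%.0f packed=%.0f\n", n, nested, packed)
	}
}
```

The nested count should rise with the number of parents while the packed count stays flat, which is the pattern the deserialize rows show.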

Test plan

go vet ./...
go test -race ./...
golangci-lint run ./... — no new issues (4 pre-existing G115 warnings in mmap_unix.go / subtree_fuzz_test.go remain, unrelated)
New TestTxInpoints_DedupAndRoundTrip covers multi-parent dedup + wire round-trip
Downstream PR in teranode migrating services/blockassembly/Client.go off the removed Idxs field

Replace the nested Idxs [][]uint32 field with a count-prefixed packed uint32 slice. Removes one heap allocation per parent per TxInpoints and removes the cap-8/cap-16 over-allocation in NewTxInpoints; that pre-allocation was sized for many-input txs, but in practice most txs have 1-2 inputs and the slack dominated per-tx memory.

The constructor now sizes both internal buffers to len(tx.Inputs), the upper bound on parent count and total vouts. The no-grow guarantee the original cap-8/cap-16 was aiming for is preserved without paying for unused slack.

The public field Idxs is removed (not renamed) so external code on the old API fails to compile on upgrade rather than silently misuse the new layout. ParentTxHashes stays exported; its semantics are unchanged.

Wire format unchanged. The on-wire encoding [count_i, vals...] for each parent is byte-identical to the new internal layout.

Benchmarks (Apple M3 Max, single-input tx, deserialize path measured with identical hand-crafted wire bytes; build path measured via NewTxInpointsFromTx):

Build (1 input): 146.4 ns / 644 B / 3 allocs → 23.0 ns / 40 B / 2 allocs
Deserialize 1 input: 81.1 ns / 112 B / 5 allocs → 62.1 ns / 96 B / 4 allocs
Deserialize 10 inputs: 328.7 ns / 652 B / 14 allocs → 220.0 ns / 452 B / 4 allocs
Deserialize 100 inputs: 2596 ns / 6340 B / 104 allocs → 1639 ns / 4148 B / 4 allocs

Allocation count for the deserialize path is now constant at 4 regardless of input count, vs scaling linearly before. The build-path win (-94% bytes, -84% time) is the dominant saving at validator-side ingestion rates of millions of TPS.

icellan · 2026-05-18T17:18:04Z

@mrz1836 Breaking changes. Deserves bump to v1.4

Adds a hot-path constructor that wraps caller-owned (parents, voutIdxs) slices directly into a TxInpoints without copying, allocating, or validating. The caller asserts the count-prefix invariant; this is the shape produced by Serialize and by upcoming columnar gRPC handlers that have already trusted the upstream service to emit well-formed data.

Benchmark on Apple M3 Max: 0.27 ns/op, 0 B/op, 0 allocs/op (the compiler inlines the struct literal), compared with 23 ns / 40 B / 2 allocs for the NewTxInpointsFromTx path.

Intended consumer: teranode's AddTxBatchColumnar handler, which already holds the packed layout in a per-batch buffer and previously had to rebuild a [][]uint32 per tx. With this constructor, block assembly becomes two slice operations per tx.
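The zero-copy idea can be sketched as follows. The type and function names (hash32, inpoints, wrapUnchecked) are hypothetical stand-ins; the real constructor's name and signature are not given in this description. The sketch assumes a per-batch buffer that already holds each tx's packed data back to back.

```go
package main

import "fmt"

// hash32 stands in for a 32-byte transaction hash type.
type hash32 [32]byte

// inpoints is a hypothetical stand-in for a packed TxInpoints-like struct.
type inpoints struct {
	parents []hash32
	packed  []uint32 // [count_0, vouts_0..., count_1, vouts_1..., ...]
}

// wrapUnchecked stores the caller's slices as they are: no copy, no
// allocation, no validation. The caller is responsible for the
// count-prefix invariant and for not mutating the buffers afterwards.
func wrapUnchecked(parents []hash32, packed []uint32) inpoints {
	return inpoints{parents: parents, packed: packed}
}

func main() {
	// A per-batch buffer holding two single-parent txs back to back.
	batchParents := []hash32{{1}, {2}}
	batchPacked := []uint32{1, 0, 1, 7}

	// Two slice operations per tx: one on parents, one on packed vouts.
	tx0 := wrapUnchecked(batchParents[0:1], batchPacked[0:2])
	tx1 := wrapUnchecked(batchParents[1:2], batchPacked[2:4])
	fmt.Println(tx0.packed, tx1.packed) // [1 0] [1 7]

	// The wrapper aliases the batch buffer, so writes to it are visible.
	batchPacked[1] = 9
	fmt.Println(tx0.packed) // [1 9]
}
```

The aliasing shown at the end is the cost of skipping the copy: the batch buffer must stay alive and unmodified for as long as the wrapped values are in use.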

mrz1836

LGTM

sonarqubecloud · 2026-05-18T17:50:08Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

icellan requested a review from mrz1836 as a code owner May 18, 2026 17:07

github-actions Bot added the size/L Large change (201–500 lines) label May 18, 2026

github-actions Bot assigned mrz1836 May 18, 2026

github-actions Bot added the performance Performance improvements or optimizations label May 18, 2026

icellan marked this pull request as draft May 18, 2026 17:32

mrz1836 approved these changes May 18, 2026

icellan closed this May 18, 2026

icellan reopened this May 18, 2026

icellan marked this pull request as ready for review May 18, 2026 17:50

mrz1836 merged commit 3c0deed into master May 18, 2026
115 of 123 checks passed

github-actions Bot deleted the perf/txinpoints-packed-layout branch May 18, 2026 18:04

icellan mentioned this pull request May 18, 2026

perf(blockassembly): adopt packed TxInpoints + zero-alloc columnar path bsv-blockchain/teranode#889

Merged

5 tasks

mrz1836 merged 2 commits into master from perf/txinpoints-packed-layout
