perf(inpoints): pack TxInpoints vouts into a single allocation #125

Merged: mrz1836 merged 2 commits into master from perf/txinpoints-packed-layout on May 18, 2026

@icellan icellan commented May 18, 2026

Summary

Replaces the nested Idxs [][]uint32 field on TxInpoints with an unexported count-prefixed packed []uint32. Drops one heap allocation per parent per TxInpoints and removes the cap-8/cap-16 over-allocation in NewTxInpoints — that pre-allocation was sized for many-input transactions, but in practice most txs have 1-2 inputs and the unused slack dominated per-tx memory.

The constructor now sizes both internal buffers to len(tx.Inputs), the upper bound on parent count and total vouts. The no-grow guarantee the original cap-8/cap-16 was aiming for is preserved without paying for unused capacity.
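A minimal sketch of the packed layout described above, not the code in this PR. The names `hash`, `input`, `packedInpoints`, `build`, and `add` are hypothetical stand-ins. The sketch reserves `2 * len(inputs)` words for the packed slice (one count word plus the vouts per parent in the worst case); that sizing is an assumption of the sketch, and the real constructor may size its buffers differently.

```go
package main

import "fmt"

// hash stands in for a 32-byte transaction ID (hypothetical type).
type hash [32]byte

// input is a hypothetical, minimal view of a transaction input.
type input struct {
	parent hash
	vout   uint32
}

// packedInpoints keeps parent hashes plus one count-prefixed slice laid out as
// [count_0, vouts_0..., count_1, vouts_1..., ...].
type packedInpoints struct {
	parents []hash
	packed  []uint32
}

// build sizes both buffers up front so that add never has to grow them.
func build(inputs []input) packedInpoints {
	n := len(inputs)
	p := packedInpoints{
		parents: make([]hash, 0, n),
		packed:  make([]uint32, 0, 2*n), // assumption: worst case is one count word per vout
	}
	for _, in := range inputs {
		p.add(in.parent, in.vout)
	}
	return p
}

// add appends vout to the group of an already-seen parent, or starts a new group.
func (p *packedInpoints) add(parent hash, vout uint32) {
	off := 0
	for i := range p.parents {
		cnt := int(p.packed[off])
		if p.parents[i] == parent {
			pos := off + 1 + cnt // first slot after this parent's vouts
			p.packed = append(p.packed, 0)
			copy(p.packed[pos+1:], p.packed[pos:]) // shift later groups right by one
			p.packed[pos] = vout
			p.packed[off]++
			return
		}
		off += 1 + cnt
	}
	p.parents = append(p.parents, parent)
	p.packed = append(p.packed, 1, vout)
}

func main() {
	a, b := hash{1}, hash{2}
	p := build([]input{{a, 0}, {b, 3}, {a, 2}})
	fmt.Println(len(p.parents), p.packed) // prints: 2 [2 0 2 1 3]
}
```

Two inputs spending from the same parent collapse into one group, so the three inputs above yield two parents and five packed words inside the six reserved.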

Wire format unchanged

The on-wire encoding [count_i, vals...] per parent is byte-identical to the new internal layout, so the deserializer decodes the vout indexes straight into a single allocation. Serialised blobs stay interoperable across versions without a wire-format version bump.
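To illustrate why a matching wire layout allows a single allocation, here is a hedged sketch of decoding a count-prefixed section. It assumes little-endian uint32 words and that the parent count and the exact byte range of the vout section are already known. Neither assumption is confirmed by this PR, and `encodeWords`, `decodePacked`, and `errMalformed` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errMalformed = errors.New("malformed packed vout section")

// encodeWords writes words as little-endian uint32 values (assumed byte order).
func encodeWords(words []uint32) []byte {
	buf := make([]byte, 4*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint32(buf[4*i:], w)
	}
	return buf
}

// decodePacked copies the section into one []uint32 and then checks that the
// count prefixes describe exactly nParents groups with no trailing words.
func decodePacked(section []byte, nParents int) ([]uint32, error) {
	if len(section)%4 != 0 {
		return nil, errMalformed
	}
	packed := make([]uint32, len(section)/4) // the only allocation for vout data
	for i := range packed {
		packed[i] = binary.LittleEndian.Uint32(section[4*i:])
	}
	off := 0
	for p := 0; p < nParents; p++ {
		if off >= len(packed) {
			return nil, errMalformed
		}
		cnt := uint64(packed[off])
		if cnt > uint64(len(packed)-off-1) {
			return nil, errMalformed // count prefix runs past the end
		}
		off += 1 + int(cnt)
	}
	if off != len(packed) {
		return nil, errMalformed // trailing words not owned by any parent
	}
	return packed, nil
}

func main() {
	wire := encodeWords([]uint32{2, 0, 2, 1, 3})
	packed, err := decodePacked(wire, 2)
	fmt.Println(packed, err) // prints: [2 0 2 1 3] <nil>
}
```

Because the decoded words are used as-is, no per-parent slice has to be built after the copy; the validation pass only walks the prefixes.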

Breaking change — intentional

The public field Idxs is removed rather than renamed. External callers on the old API will get a compile error rather than silently misuse the new layout. Migration:

  • Reading per-parent vouts: txi.Idxs[i] → txi.GetParentVoutsAtIndex(i)
  • Reading all inpoints flat: unchanged — txi.GetTxInpoints()
  • Constructing literals with Idxs: [][]uint32{...} → build via NewTxInpointsFromTx / NewTxInpointsFromInputs, or use appendInput from within the package

ParentTxHashes stays exported — its semantics are unchanged.
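For the first migration bullet, the following sketch shows how an accessor over the packed layout could replace direct indexing into Idxs. It uses a stand-in type; `packedInpoints` and `parentVoutsAt` are hypothetical, and the `([]uint32, error)` return shape is an assumption rather than the confirmed signature of GetParentVoutsAtIndex.

```go
package main

import (
	"errors"
	"fmt"
)

// packedInpoints is a stand-in for the new layout (hypothetical type).
type packedInpoints struct {
	parents [][32]byte
	packed  []uint32 // [count_0, vouts_0..., count_1, vouts_1..., ...]
}

// parentVoutsAt returns the vouts of the i-th parent by walking the count
// prefixes. The result aliases the packed slice; its capacity is clamped so an
// append by the caller cannot overwrite the next parent's group.
func (p *packedInpoints) parentVoutsAt(i int) ([]uint32, error) {
	if i < 0 || i >= len(p.parents) {
		return nil, errors.New("parent index out of range")
	}
	off := 0
	for k := 0; k < i; k++ {
		off += 1 + int(p.packed[off]) // skip count word plus that parent's vouts
	}
	end := off + 1 + int(p.packed[off])
	return p.packed[off+1 : end : end], nil
}

func main() {
	p := &packedInpoints{
		parents: make([][32]byte, 2),
		packed:  []uint32{2, 0, 2, 1, 3},
	}
	// Old style: for i := range txi.Idxs { vouts := txi.Idxs[i]; ... }
	for i := range p.parents {
		vouts, err := p.parentVoutsAt(i)
		if err != nil {
			panic(err)
		}
		fmt.Println(i, vouts)
	}
}
```

In this sketch a lookup costs a walk over the preceding groups rather than a single index operation, which is cheap for the 1-2 input transactions the summary describes; the real accessor may behave differently.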

Benchmarks

Apple M3 Max, single-input tx via NewTxInpointsFromTx; deserialize paths use identical hand-crafted wire bytes for apples-to-apples comparison.

| Path | Time | Bytes / op | Allocs / op |
| --- | --- | --- | --- |
| Build, 1 input (NewTxInpointsFromTx) | 146.4 ns → 23.0 ns | 644 B → 40 B | 3 → 2 |
| Deserialize, 1 parent / 1 vout | 81.1 ns → 62.1 ns | 112 B → 96 B | 5 → 4 |
| Deserialize, 10 parents | 328.7 ns → 220.0 ns | 652 B → 452 B | 14 → 4 |
| Deserialize, 100 parents | 2596 ns → 1639 ns | 6340 B → 4148 B | 104 → 4 |

Deserialize allocation count is now constant at 4 regardless of input count, vs scaling linearly with parent count before. The build-path win (-94 % bytes, -84 % time) is the dominant saving at validator-side ingestion rates.
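The constant-versus-linear allocation behaviour can be reproduced in isolation with `testing.AllocsPerRun`. The sketch below is not the benchmark used for the table above; `buildNested`, `buildPacked`, and `measure` are hypothetical helpers that only model the two index layouts with one vout per parent.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks force the results to escape to the heap.
var (
	sinkNested [][]uint32
	sinkPacked []uint32
)

// buildNested models the old layout: one inner slice per parent.
func buildNested(counts []int) [][]uint32 {
	out := make([][]uint32, len(counts))
	for i, c := range counts {
		out[i] = make([]uint32, c)
	}
	return out
}

// buildPacked models the new layout: one count-prefixed slice for all parents.
func buildPacked(counts []int) []uint32 {
	total := 0
	for _, c := range counts {
		total += 1 + c
	}
	out := make([]uint32, 0, total)
	for _, c := range counts {
		out = append(out, uint32(c))
		out = out[:len(out)+c] // reserve c zeroed vout slots within capacity
	}
	return out
}

// measure reports average heap allocations per build for each layout.
func measure(parents int) (nested, packed float64) {
	counts := make([]int, parents)
	for i := range counts {
		counts[i] = 1
	}
	nested = testing.AllocsPerRun(100, func() { sinkNested = buildNested(counts) })
	packed = testing.AllocsPerRun(100, func() { sinkPacked = buildPacked(counts) })
	return nested, packed
}

func main() {
	for _, n := range []int{1, 10, 100} {
		nested, packed := measure(n)
		fmt.Printf("parents=%d nested=%.0f packed=%.0f allocs\n", n, nested, packed)
	}
}
```

The nested count grows with the number of parents while the packed count stays flat, which is the same shape as the deserialize rows above even though the absolute numbers differ.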

Test plan

  • go vet ./...
  • go test -race ./...
  • golangci-lint run ./... — no new issues (4 pre-existing G115 warnings in mmap_unix.go / subtree_fuzz_test.go remain, unrelated)
  • New TestTxInpoints_DedupAndRoundTrip covers multi-parent dedup + wire round-trip
  • Downstream PR in teranode migrating services/blockassembly/Client.go off the removed Idxs field

Replace the nested Idxs [][]uint32 field with a count-prefixed packed
uint32 slice. Removes one heap allocation per parent per TxInpoints and
removes the cap-8/cap-16 over-allocation in NewTxInpoints — that
pre-allocation was sized for many-input txs but in practice most txs
have 1-2 inputs and the slack dominated per-tx memory.

The constructor now sizes both internal buffers to len(tx.Inputs), the
upper bound on parent count and total vouts. The no-grow guarantee the
original cap-8/cap-16 was aiming for is preserved without paying for
unused slack.

The public field Idxs is removed (not renamed) so external code on the
old API fails to compile on upgrade rather than silently misuse the
new layout. ParentTxHashes stays exported — its semantics are unchanged.

Wire format unchanged. The on-wire encoding [count_i, vals...] for each
parent is byte-identical to the new internal layout.

Benchmarks (Apple M3 Max, single-input tx, deserialize path measured
with identical hand-crafted wire bytes; build path measured via
NewTxInpointsFromTx):

  Build (1 input):       146.4 ns / 644 B / 3 allocs  →  23.0 ns /  40 B / 2 allocs
  Deserialize   1 input:  81.1 ns / 112 B / 5 allocs  →  62.1 ns /  96 B / 4 allocs
  Deserialize  10 inputs: 328.7 ns / 652 B / 14 allocs → 220.0 ns / 452 B / 4 allocs
  Deserialize 100 inputs: 2596 ns / 6340 B / 104 allocs → 1639 ns / 4148 B / 4 allocs

Allocation count for the deserialize path is now constant at 4 regardless
of input count, vs scaling linearly before. The build-path win (-94% bytes,
-84% time) is the dominant saving at validator-side ingestion rates of
millions of TPS.
@icellan icellan requested a review from mrz1836 as a code owner May 18, 2026 17:07
@github-actions github-actions Bot added the size/L Large change (201–500 lines) label May 18, 2026
@github-actions github-actions Bot added the performance Performance improvements or optimizations label May 18, 2026

icellan commented May 18, 2026

@mrz1836 Breaking changes. Deserves bump to v1.4

@icellan icellan marked this pull request as draft May 18, 2026 17:32
Adds a hot-path constructor that wraps caller-owned (parents, voutIdxs)
slices directly into a TxInpoints without copying, allocating, or
validating. The caller asserts the count-prefix invariant; this is the
shape produced by Serialize and by upcoming columnar gRPC handlers that
have already trusted the upstream service to emit well-formed data.

Benchmark on Apple M3 Max — 0.27 ns/op, 0 B/op, 0 allocs/op (compiler
inlines the struct literal). Compare with the 23 ns / 40 B / 2 allocs
NewTxInpointsFromTx path.

Intended consumer: teranode's AddTxBatchColumnar handler, which already
holds the packed layout in a per-batch buffer and previously had to
rebuild a [][]uint32 per tx. With this constructor, block assembly
becomes two slice operations per tx.
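A hedged sketch of the "two slice operations per tx" idea follows. The types `inpoints` and `batch`, the function `wrap`, and the offset arrays are all hypothetical; the PR text does not give the constructor's name or the shape of the columnar buffer.

```go
package main

import "fmt"

type hash [32]byte

// inpoints is a stand-in for TxInpoints in its packed form (hypothetical).
type inpoints struct {
	parents []hash
	packed  []uint32
}

// wrap adopts caller-owned slices without copying or validating them.
// The caller is responsible for the count-prefix invariant.
func wrap(parents []hash, packed []uint32) inpoints {
	return inpoints{parents: parents, packed: packed}
}

// batch holds several transactions' inpoints back to back (assumed shape).
type batch struct {
	parents   []hash
	packed    []uint32
	parentEnd []int // parentEnd[i] is the end offset of tx i in parents
	packedEnd []int // packedEnd[i] is the end offset of tx i in packed
}

// tx returns the inpoints of transaction i as two sub-slices of the batch.
func (b *batch) tx(i int) inpoints {
	ps, ws := 0, 0
	if i > 0 {
		ps, ws = b.parentEnd[i-1], b.packedEnd[i-1]
	}
	return wrap(b.parents[ps:b.parentEnd[i]], b.packed[ws:b.packedEnd[i]])
}

func main() {
	b := &batch{
		parents:   []hash{{1}, {2}, {3}},
		packed:    []uint32{1, 7, 2, 0, 1, 1, 5},
		parentEnd: []int{1, 3},
		packedEnd: []int{2, 7},
	}
	second := b.tx(1)
	fmt.Println(len(second.parents), second.packed) // prints: 2 [2 0 1 1 5]

	// The wrapped value aliases the batch buffer: writes are visible through it.
	b.packed[3] = 42
	fmt.Println(second.packed[1]) // prints: 42
}
```

The aliasing shown at the end is the trade-off of a zero-copy wrapper: the batch buffer must stay alive and unmodified for as long as the wrapped values are in use.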

@mrz1836 mrz1836 left a comment

LGTM

@icellan icellan closed this May 18, 2026
@icellan icellan reopened this May 18, 2026
@icellan icellan marked this pull request as ready for review May 18, 2026 17:50
@mrz1836 mrz1836 merged commit 3c0deed into master May 18, 2026
115 of 123 checks passed
@github-actions github-actions Bot deleted the perf/txinpoints-packed-layout branch May 18, 2026 18:04