
perf: forward-only cursor in monotonicArena.Alloc #3

Open

jensneuse wants to merge 1 commit into main from perf/monotonic-arena-alloc-cursor

Conversation

@jensneuse
Member

Summary

Closes #2.

monotonicArena.Alloc walked a.buffers from index 0 on every call,
giving O(numBuffers) cost per call and O(N²) total work over the
arena's lifetime.
On the Cosmo Router workload reported in the issue (~180MB JSON
response, ~600-1200 buffers, ~29M Alloc calls per request),
this manifested as ~40s of router-side merge time.

This PR adds a forward-only cursor.
Subsequent walks start at the cursor instead of index 0.
The cursor advances when an allocation lands in a later buffer and
when the arena grows. Reset and Release rewind it to 0 so a reused
arena can re-fill its early buffers from scratch.

For uniform-size allocations the per-call cost becomes O(1).
For mixed sizes the walk is bounded by the number of buffers ahead of
the cursor, with the trade-off that any free space remaining in
skipped buffers is abandoned for the rest of the request.
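The mechanism above can be sketched as follows. This is a minimal toy model, not the repository's actual code: the `buffer` and `monotonicArena` fields and the `bufSize` parameter are illustrative assumptions.

```go
package main

import "fmt"

// Toy buffer: a byte slice plus a fill mark. Illustrative only.
type buffer struct {
	data []byte
	used int
}

func (b *buffer) fits(n int) bool { return b.used+n <= len(b.data) }

// Toy arena with the forward-only cursor described above.
type monotonicArena struct {
	buffers []*buffer
	cursor  int // index of the buffer that last satisfied an Alloc
	bufSize int
}

func (a *monotonicArena) Alloc(n int) []byte {
	// Start the walk at the cursor instead of index 0.
	for i := a.cursor; i < len(a.buffers); i++ {
		if b := a.buffers[i]; b.fits(n) {
			a.cursor = i // advance on a later-buffer hit
			p := b.data[b.used : b.used+n]
			b.used += n
			return p
		}
	}
	// Grow: append a fresh buffer and move the cursor to it.
	b := &buffer{data: make([]byte, max(n, a.bufSize))}
	a.buffers = append(a.buffers, b)
	a.cursor = len(a.buffers) - 1
	b.used = n
	return b.data[:n]
}

// Reset rewinds the cursor so a reused arena re-fills early buffers.
func (a *monotonicArena) Reset() {
	for _, b := range a.buffers {
		b.used = 0
	}
	a.cursor = 0
}

func main() {
	a := &monotonicArena{bufSize: 64}
	for i := 0; i < 10; i++ {
		a.Alloc(48) // each call overflows the 64-byte buffer, forcing growth
	}
	fmt.Println(len(a.buffers), a.cursor) // prints "10 9"
}
```

Note the trade-off in the sketch: once the cursor passes a buffer, the 16 bytes left in each 64-byte buffer are never reused until Reset.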

Benchmarks

Controlled prefix (isolated walk cost):

  Prefix   Before        After        Speedup
  10       17.5 ns/op     2.7 ns/op     6.5x
  100       149 ns/op     2.6 ns/op      57x
  1000     1293 ns/op     2.6 ns/op     497x

Before the fix: clean O(N) scaling with prefix size.
After the fix: flat O(1) regardless of prefix size.
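The controlled-prefix setup can be reproduced with a toy model of the walk (the `toyArena` type, the 64 KiB buffer size, and the 16-byte allocation size are illustrative assumptions, not the library's code): pre-exhaust `prefix` buffers, then time repeated Allocs against them.

```go
package main

import (
	"fmt"
	"testing"
)

// toyArena models only the walk cost: remaining free bytes per buffer.
type toyArena struct {
	remaining []int
	cursor    int
}

func (a *toyArena) Alloc(n int) {
	// Forward-only walk starting at the cursor.
	for i := a.cursor; i < len(a.remaining); i++ {
		if a.remaining[i] >= n {
			a.remaining[i] -= n
			a.cursor = i
			return
		}
	}
	// Grow with a fresh 64 KiB buffer and advance the cursor to it.
	a.remaining = append(a.remaining, 1<<16-n)
	a.cursor = len(a.remaining) - 1
}

func main() {
	for _, prefix := range []int{10, 100, 1000} {
		res := testing.Benchmark(func(b *testing.B) {
			// `prefix` exhausted buffers sit ahead of the interesting one.
			a := &toyArena{remaining: make([]int, prefix)}
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				a.Alloc(16)
			}
		})
		fmt.Printf("prefix=%d: %s\n", prefix, res)
	}
}
```

With the cursor in place, only the first Alloc pays for skipping the exhausted prefix; every later call resumes at the tail, which is why the measured cost stays flat as `prefix` grows.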

Realistic growth workload (AllocCosmoLike):

  Prefix   Before        After        Speedup
  10       5125 ns/op     3.4 ns/op    1500x
  100      5265 ns/op     4.0 ns/op    1300x
  1000     2785 ns/op     4.0 ns/op     700x

The realistic workload speedup is larger because the unpatched arena
grows during the timed loop, so the prefix walk gets longer over time.
This is consistent with the reporter's measurement of ~3x end-to-end
on the full Cosmo Router request, where Alloc was the dominant cost.

What changed

Credit

Original analysis and patch by @thoec in #2.
This PR adapts the patch and adds the test/benchmark coverage.

Test plan

  • go test -race ./... passes
  • All 7 new cursor tests pass
  • Benchmarks confirm the O(N) → O(1) scaling change

🤖 Generated with Claude Code

Alloc walked a.buffers from index 0 on every call, giving O(numBuffers)
cost per Alloc and O(N²) total work over an arena's lifetime. On the
Cosmo Router workload reported in #2 (~180MB JSON response, ~600-1200
buffers, ~29M Allocs per request), this dominated request time at ~40s
of router-side merge.

Track the index of the most recent successful Alloc and start subsequent
walks there. Cursor advances on a later-buffer hit and on grow; Reset
and Release rewind it to 0 so a reused arena can re-fill its early
buffers from scratch. For roughly uniform-size allocations the per-call
cost becomes O(1); for mixed sizes the walk is bounded by the number of
buffers ahead of the cursor.

Benchmarks (controlled prefix, isolated walk cost):
  prefix=10:    17.5 ns/op  →  2.7 ns/op  (6.5x)
  prefix=100:    149 ns/op  →  2.6 ns/op  (57x)
  prefix=1000:  1293 ns/op  →  2.6 ns/op  (497x)

Realistic growth workload (AllocCosmoLike):
  prefix=10:    5125 ns/op  →  3.4 ns/op  (1500x)
  prefix=100:   5265 ns/op  →  4.0 ns/op  (1300x)
  prefix=1000:  2785 ns/op  →  4.0 ns/op  (700x)

Closes #2.

Credit: original analysis and patch by @thoec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

monotonicArena.Alloc scales poorly on large subgraph responses in Cosmo Router
