Commit 9b9bf0c
quic: use arena allocation for packets
Previously Packets were ReqWrap objects with a shared
free-list. This commit changes to a per-Endpoint arena
with no v8 involvement. This is the design I originally
had in mind but I initially went with the simpler
freelist approach to get something working. There's
too much overhead in the ReqWrap/freelist approach and
individual packets do not really need to be observable
via async hooks.
This design should eliminate the risk of memory fragmentation
and eliminate a significant bottleneck in the hot path.
Summary of improvements:
**Memory Comparison**
| Metric | Before | After | Delta |
| ------ | ------ | ----- | ----- |
| Per-packet memory | ~2,140 bytes | 1,712 bytes | -20% |
| Heap allocations per acquire | 3-4 (Packet, Data, shared_ptr control, V8 object) | 0 (pre-allocated in block) | eliminated |
| Heap allocations per reuse (freelist hit) | 2 (Data, shared_ptr control) | 0 | eliminated |
| V8 heap per packet | ~200-400 bytes (JS object) | 0 | eliminated |
| Block allocation (128 slots) | N/A | 214 KB (one new char[]) | amortized across 128 acquires |
| Per-packet allocator overhead | ~48-96 bytes (malloc headers × 3-4 allocs) | 0 (inline in block) | eliminated |
**Fragmentation**
**Before**: Each packet reuse from the freelist still called
std::make_shared<Data>(length, label) — a new heap allocation
for the Data object + its shared_ptr control block + the std::string
diagnostic label. These are small, variably-sized allocations
scattered across the heap.
**After**: All slots are identical 1,712-byte regions within
contiguous 214 KB blocks. Zero per-packet heap allocations
during steady-state operation. The only allocations happen
when a new block is grown.
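The block size quoted above follows directly from the slot size and slot count. A minimal compile-time check, assuming hypothetical constant names (the real identifiers in the patch may differ):

```cpp
#include <cstddef>

// Values taken from the commit message; the constant names are illustrative.
constexpr std::size_t kSlotSize = 1712;      // bytes per slot
constexpr std::size_t kSlotsPerBlock = 128;  // slots per block
constexpr std::size_t kBlockBytes = kSlotSize * kSlotsPerBlock;

// 128 slots of 1,712 bytes is 219,136 bytes, which is exactly 214 KiB.
static_assert(kBlockBytes == 219136, "128 slots of 1,712 bytes each");
static_assert(kBlockBytes == 214 * 1024, "one block is 214 KiB");
```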
**Performance Comparison**
**Acquire (hot path — called up to 32× per SendPendingData)**
**Before** (freelist hit):
1. BindingData::Get(env) — resolve binding data from environment
2. packet_freelist.front() / pop_front() — std::list dereference
(random memory access)
3. std::make_shared<Data>(length, label) — heap allocate Data +
control block
4. std::string constructor for diagnostic label — potential heap
allocation
5. Set listener, destination, data pointer on Packet
**Before** (freelist miss):
1. JS_NEW_INSTANCE_OR_RETURN — allocate V8 JS object (GC pressure,
potentially triggers GC)
2. MakeBaseObject<Packet>(...) — heap allocate Packet
3. std::make_shared<Data>(...) — heap allocate Data + control block
4. ClearWeak() — modify V8 weak handle state
**After** (always):
1. Pop from intrusive free list — slot = free_list_; free_list_ =
slot->next_free; (2 pointer ops)
2. Increment in_use_count_ on block and pool (2 increments)
3. Placement new Packet in pre-allocated memory (zero-initializes
uv_udp_send_t, copies SocketAddress)
The new acquire is essentially 2 pointer operations + a placement
new. No heap allocation, no V8 involvement, no atomic operations
(shared_ptr control block had atomics).
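The acquire steps above can be sketched roughly as follows. This is a simplified illustration, not the code in the patch: `TinyArena`, `Slot`, the fixed four-slot block, and the one-field `Packet` are all assumptions made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the real Packet; only the shape matters here.
struct Packet {
  explicit Packet(std::size_t len) : length(len) {}
  std::size_t length;
};

// Every slot begins with a header that links free slots together.
struct SlotHeader {
  SlotHeader* next_free;
};

struct Slot {
  SlotHeader header;
  alignas(Packet) unsigned char storage[sizeof(Packet)];
};

constexpr std::size_t kSlots = 4;  // the commit uses 128 slots per block

class TinyArena {
 public:
  TinyArena() {
    // Thread every slot onto the intrusive free list once, up front.
    for (std::size_t i = 0; i < kSlots; ++i) {
      slots_[i].header.next_free = free_list_;
      free_list_ = &slots_[i].header;
    }
  }

  Packet* Acquire(std::size_t length) {
    if (free_list_ == nullptr) return nullptr;  // a real pool would grow here
    SlotHeader* slot = free_list_;              // pointer op 1
    free_list_ = slot->next_free;               // pointer op 2
    ++in_use_count_;
    // The header is the first member of Slot, so the addresses coincide.
    void* mem = reinterpret_cast<Slot*>(slot)->storage;
    return new (mem) Packet(length);  // placement new, no heap allocation
  }

  std::size_t in_use() const { return in_use_count_; }

 private:
  Slot slots_[kSlots];
  SlotHeader* free_list_ = nullptr;
  std::size_t in_use_count_ = 0;
};
```

Returning `nullptr` on exhaustion is a simplification for this sketch; per the description above, the actual pool allocates a new block instead.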
**Release (send callback — every completed packet)**
**Before**:
1. BaseObjectPtr<Packet> construction from raw pointer — atomic
increment
2. MakeWeak() — modify V8 weak handle
3. Check IsDispatched(), call listener
4. data_.reset() — atomic decrement on shared_ptr, may free Data
5. Reset() — reset uv_udp_send_t state
6. packet_freelist.push_back(std::move(self)) — std::list node
allocation (!)
7. Or if freelist full: destroy Packet → V8 GC eventually collects
JS object
**After**:
1. Packet::FromReq(req) — ContainerOf pointer arithmetic
(compile-time offset)
2. Call listener
3. ArenaPool::Release(p) — ~Packet() (trivial), then ReleaseSlot:
- Pointer arithmetic to recover SlotHeader
- slot->next_free = free_list_; free_list_ = slot; (2 pointer ops)
- Decrement 2 counters
- MaybeGC() check (branch, rarely taken)
The new release is pointer arithmetic + 2 pointer operations + 2
decrements. No atomic operations, no heap free, no V8 interaction.
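The release path relies on two pieces of pointer arithmetic: recovering the Packet from the embedded request, and recovering the slot header from the Packet. A minimal sketch of both, where `FakeReq` stands in for `uv_udp_send_t` and the `Slot` and `FreeList` types are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

struct FakeReq {  // stand-in for uv_udp_send_t
  int status = 0;
};

struct Packet {
  int id = 0;
  FakeReq req;

  // ContainerOf-style recovery: subtract the compile-time member offset.
  static Packet* FromReq(FakeReq* r) {
    return reinterpret_cast<Packet*>(reinterpret_cast<char*>(r) -
                                     offsetof(Packet, req));
  }
};

struct SlotHeader {
  SlotHeader* next_free = nullptr;
};

struct Slot {
  SlotHeader header;
  Packet packet;
};

struct FreeList {
  SlotHeader* head = nullptr;
  std::size_t in_use = 0;

  void Release(Packet* p) {
    // Pointer arithmetic to step back from the Packet to its slot header.
    auto* slot = reinterpret_cast<SlotHeader*>(reinterpret_cast<char*>(p) -
                                               offsetof(Slot, packet));
    slot->next_free = head;  // pointer op 1
    head = slot;             // pointer op 2
    --in_use;                // the real pool also decrements a per-block count
  }
};
```

The sketch omits the destructor call and the `MaybeGC()` check, and tracks only one counter where the description above has two.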
**Send path (UDP::Send)**
**Before**: ClearWeak() + Dispatched() + uv_udp_send() + on
error: Done() + MakeWeak()
**After**: Ptr::release() (1 pointer swap) + uv_udp_send() + on
error: ArenaPool::Release()
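One way to picture the new send path is an owning pointer whose deleter returns the slot to the pool. The sketch below assumes `Ptr` behaves like a `std::unique_ptr` with a custom deleter, which the commit message does not confirm; `Pool`, `PoolDeleter`, and `FakeSend` are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Packet {
  int id;
};

struct Pool {
  int released = 0;
  void Release(Packet*) { ++released; }  // would push the slot back
};

struct PoolDeleter {
  Pool* pool;
  void operator()(Packet* p) const { pool->Release(p); }
};

using Ptr = std::unique_ptr<Packet, PoolDeleter>;

// Stand-in for uv_udp_send(): returns 0 on success, negative on error.
int FakeSend(Packet*, bool fail) { return fail ? -1 : 0; }

int Send(Ptr packet, Pool& pool, bool fail) {
  // Ownership leaves the smart pointer; on success the send callback
  // becomes responsible for releasing the packet later.
  Packet* raw = packet.release();
  int err = FakeSend(raw, fail);
  if (err != 0) pool.Release(raw);  // on error, return the slot immediately
  return err;
}
```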
**SendPendingData loop (up to 32 packets per call)**
**Before**: Each iteration potentially triggered
JS_NEW_INSTANCE_OR_RETURN (V8 object allocation) on freelist miss,
plus std::make_shared<Data> on every iteration.
**After**: Each iteration is just a free list pop + placement new.
For a full 32-packet burst from a warm pool, this is ~32 × (2 pointer
ops + a memset/memcpy for the Packet fields) — essentially zero
allocation cost.
**GC pressure**
**Before**: Each Packet had a persistent V8 JS object. When the
freelist was full (>100 packets), excess packets were destroyed,
leaving their V8 objects for the garbage collector. Under high
throughput, this created ongoing GC pressure proportional to
packet churn.
**After**: Zero V8 objects. Zero GC pressure from packets. The
ArenaPool::MaybeGC() only runs when >50% of total slots are free
and only frees entire blocks — a rare bulk operation, not per-packet
work.
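The block-level reclamation policy described above might look something like this. It models only the counters, not the memory: `PoolStats` and `Block` are hypothetical, and whether the real `MaybeGC()` keeps a minimum number of blocks is not stated in the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSlotsPerBlock = 128;

struct Block {
  std::size_t in_use;  // slots currently handed out from this block
};

struct PoolStats {
  std::vector<Block> blocks;

  std::size_t total_slots() const { return blocks.size() * kSlotsPerBlock; }

  std::size_t in_use() const {
    std::size_t n = 0;
    for (const Block& b : blocks) n += b.in_use;
    return n;
  }

  // Returns the number of blocks freed. Does nothing unless more than
  // half of all slots are free, and only drops completely idle blocks.
  std::size_t MaybeGC() {
    const std::size_t total = total_slots();
    const std::size_t free_slots = total - in_use();
    if (free_slots * 2 <= total) return 0;  // the common, cheap case
    const std::size_t before = blocks.size();
    blocks.erase(std::remove_if(blocks.begin(), blocks.end(),
                                [](const Block& b) { return b.in_use == 0; }),
                 blocks.end());
    return before - blocks.size();
  }
};
```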
**Summary**
| Aspect | Improvement |
| ------ | ----------- |
| Per-packet memory | ~20% smaller (1,712 vs ~2,140 bytes) |
| Heap fragmentation | Eliminated (contiguous block allocation) |
| Heap allocations per acquire | 0 (was 2-4) |
| V8 GC pressure | Eliminated entirely |
| Atomic operations per acquire/release | 0 (was 2+ from shared_ptr) |
| Cache locality | Improved (sequential slots in contiguous blocks) |
| Acquire cost | ~2 pointer ops + placement new (was: conditional heap alloc + V8 object + shared_ptr) |
| Release cost | Pointer arithmetic + 2 pointer ops + 2 decrements (was: atomic decrement + V8 weak handle + list node alloc) |
| SendPendingData 32-packet burst | ~32 × (free list pop + placement new) (was: 32 × potential heap alloc + V8 alloc) |
| Steady-state memory overhead | Fixed: 1 block = 214 KB for 128 slots (was: unbounded individual allocations) |
The biggest wins are eliminating the per-packet V8 object
allocation (which could trigger GC) and the shared_ptr atomic
operations on every acquire/release. For a high-throughput QUIC
session sending 32 packets per SendPendingData call, the new path
is essentially allocation-free after the first block is populated.
Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6
1 parent bf1aebc commit 9b9bf0c