
Commit 9b9bf0c

quic: use arena allocation for packets
Previously, Packets were ReqWrap objects drawn from a shared free list. This commit switches to a per-Endpoint arena with no V8 involvement. This is the design I originally had in mind, but I initially went with the simpler free-list approach to get something working. There is too much overhead in the ReqWrap/free-list approach, and individual packets do not really need to be observable via async hooks. This design should eliminate the risk of memory fragmentation and remove a significant bottleneck in the hot path. Summary of improvements:

**Memory Comparison**

| Metric | Before | After | Delta |
| ------ | ------ | ----- | ----- |
| Per-packet memory | ~2,140 bytes | 1,712 bytes | -20% |
| Heap allocations per acquire | 3-4 (Packet, Data, shared_ptr control, V8 object) | 0 (pre-allocated in block) | eliminated |
| Heap allocations per reuse (freelist hit) | 2 (Data, shared_ptr control) | 0 | eliminated |
| V8 heap per packet | ~200-400 bytes (JS object) | 0 | eliminated |
| Block allocation (128 slots) | N/A | 214 KB (one `new char[]`) | amortized across 128 acquires |
| Per-packet allocator overhead | ~48-96 bytes (malloc headers × 3-4 allocs) | 0 (inline in block) | eliminated |

**Fragmentation**

**Before**: Each packet reuse from the freelist still called `std::make_shared<Data>(length, label)` — a new heap allocation for the Data object, plus its shared_ptr control block, plus the std::string diagnostic label. These are small, variably-sized allocations scattered across the heap.

**After**: All slots are identical 1,712-byte regions within contiguous 214 KB blocks. Zero per-packet heap allocations during steady-state operation. The only allocations happen when a new block is grown.

**Performance Comparison**

**Acquire (hot path — called up to 32× per SendPendingData)**

**Before** (freelist hit):
1. `BindingData::Get(env)` — resolve binding data from the environment
2. `packet_freelist.front()` / `pop_front()` — std::list dereference (random memory access)
3. `std::make_shared<Data>(length, label)` — heap allocate Data + control block
4. std::string constructor for the diagnostic label — potential heap allocation
5. Set listener, destination, and data pointer on the Packet

**Before** (freelist miss):
1. `JS_NEW_INSTANCE_OR_RETURN` — allocate a V8 JS object (GC pressure, potentially triggers GC)
2. `MakeBaseObject<Packet>(...)` — heap allocate the Packet
3. `std::make_shared<Data>(...)` — heap allocate Data + control block
4. `ClearWeak()` — modify V8 weak handle state

**After** (always):
1. Pop from the intrusive free list — `slot = free_list_; free_list_ = slot->next_free;` (2 pointer ops)
2. Increment `in_use_count_` on the block and the pool (2 increments)
3. Placement new the Packet in pre-allocated memory (zero-initializes uv_udp_send_t, copies the SocketAddress)

The new acquire is essentially 2 pointer operations plus a placement new. No heap allocation, no V8 involvement, no atomic operations (the shared_ptr control block had atomics).

**Release (send callback — every completed packet)**

**Before**:
1. `BaseObjectPtr<Packet>` construction from a raw pointer — atomic increment
2. `MakeWeak()` — modify the V8 weak handle
3. Check `IsDispatched()`, call the listener
4. `data_.reset()` — atomic decrement on the shared_ptr, may free Data
5. `Reset()` — reset uv_udp_send_t state
6. `packet_freelist.push_back(std::move(self))` — std::list node allocation (!)
7. Or, if the freelist is full: destroy the Packet → V8 GC eventually collects the JS object

**After**:
1. `Packet::FromReq(req)` — ContainerOf pointer arithmetic (compile-time offset)
2. Call the listener
3. `ArenaPool::Release(p)` — `~Packet()` (trivial), then ReleaseSlot:
   - Pointer arithmetic to recover the SlotHeader
   - `slot->next_free = free_list_; free_list_ = slot;` (2 pointer ops)
   - Decrement 2 counters
   - `MaybeGC()` check (branch, rarely taken)

The new release is pointer arithmetic + 2 pointer operations + 2 decrements. No atomic operations, no heap free, no V8 interaction.
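The acquire/release mechanics above can be sketched as a small self-contained arena. This is an illustrative model, not the node source: `FakePacket` and its 64-byte payload are assumptions standing in for the real 1,712-byte Packet slot (128 × 1,712 bytes ≈ 214 KB per block, as in the table above), and the single `in_use_count_` collapses the separate per-block and per-pool counters. Only the intrusive free-list pop and the placement new mirror the listed steps.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Illustrative stand-in for the real Packet (whose slots are 1,712 bytes).
struct FakePacket {
  char payload[64];
};

class ArenaPool {
 public:
  static constexpr size_t kSlotsPerBlock = 128;

  // Intrusive free-list node stored in front of each slot's payload.
  struct SlotHeader {
    SlotHeader* next_free;
  };

  ~ArenaPool() {
    for (char* block : blocks_) delete[] block;
  }

  FakePacket* Acquire() {
    if (free_list_ == nullptr) Grow();  // only allocation: one new block
    // Hot path: two pointer operations...
    SlotHeader* slot = free_list_;
    free_list_ = slot->next_free;
    ++in_use_count_;  // (real code also bumps a per-block counter)
    // ...then a placement new into pre-allocated memory.
    return new (SlotData(slot)) FakePacket{};
  }

  void Release(FakePacket* p) {
    p->~FakePacket();  // trivial destructor
    // Pointer arithmetic recovers the SlotHeader preceding the payload.
    auto* slot = reinterpret_cast<SlotHeader*>(
        reinterpret_cast<char*>(p) - sizeof(SlotHeader));
    slot->next_free = free_list_;
    free_list_ = slot;
    --in_use_count_;
  }

  size_t in_use() const { return in_use_count_; }

 private:
  static constexpr size_t kSlotSize = sizeof(SlotHeader) + sizeof(FakePacket);

  static void* SlotData(SlotHeader* slot) {
    return reinterpret_cast<char*>(slot) + sizeof(SlotHeader);
  }

  void Grow() {
    char* base = new char[kSlotSize * kSlotsPerBlock];
    blocks_.push_back(base);
    // Thread every slot of the new block onto the free list.
    for (size_t i = 0; i < kSlotsPerBlock; ++i) {
      auto* slot = reinterpret_cast<SlotHeader*>(base + i * kSlotSize);
      slot->next_free = free_list_;
      free_list_ = slot;
    }
  }

  SlotHeader* free_list_ = nullptr;
  size_t in_use_count_ = 0;
  std::vector<char*> blocks_;
};
```

In steady state, Acquire and Release touch only the free-list head and a counter; the LIFO order also means the most recently released (cache-warm) slot is reused first.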
**Send path (UDP::Send)**

**Before**: `ClearWeak()` + `Dispatched()` + `uv_udp_send()` + on error: `Done()` + `MakeWeak()`

**After**: `Ptr::release()` (1 pointer swap) + `uv_udp_send()` + on error: `ArenaPool::Release()`

**SendPendingData loop (up to 32 packets per call)**

**Before**: Each iteration potentially triggered `JS_NEW_INSTANCE_OR_RETURN` (V8 object allocation) on a freelist miss, plus `std::make_shared<Data>` on every iteration.

**After**: Each iteration is just a free-list pop + placement new. For a full 32-packet burst from a warm pool, this is ~32 × (2 pointer ops + a memset/memcpy for the Packet fields) — essentially zero allocation cost.

**GC pressure**

**Before**: Each Packet had a persistent V8 JS object. When the freelist was full (>100 packets), excess packets were destroyed, leaving their V8 objects for the garbage collector. Under high throughput, this created ongoing GC pressure proportional to packet churn.

**After**: Zero V8 objects. Zero GC pressure from packets. `ArenaPool::MaybeGC()` only runs when >50% of total slots are free and only frees entire blocks — a rare bulk operation, not per-packet work.
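The `Packet::FromReq(req)` step in the release path relies on ContainerOf-style pointer arithmetic: given a pointer to the embedded uv_udp_send_t, the enclosing Packet is recovered with a compile-time member offset, with no lookup table and no allocation. A sketch with stand-in types (`FakeSendReq` and `FakePacket` are assumptions, not the real structs):

```cpp
#include <cstddef>

// Stand-in for uv_udp_send_t.
struct FakeSendReq {
  void* data;
};

// Stand-in for the real Packet, with the request embedded as a member.
struct FakePacket {
  int id;
  FakeSendReq req;

  // ContainerOf-style recovery: subtract the member's compile-time offset
  // from the member's address to get the enclosing object. Pure pointer
  // arithmetic, resolved at compile time via offsetof.
  static FakePacket* FromReq(FakeSendReq* req) {
    return reinterpret_cast<FakePacket*>(
        reinterpret_cast<char*>(req) - offsetof(FakePacket, req));
  }
};
```

This is why the send callback needs no map from request to packet: libuv hands back the `uv_udp_send_t*`, and one subtraction yields the owning Packet.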
**Summary**

| Aspect | Improvement |
| ------ | ----------- |
| Per-packet memory | ~20% smaller (1,712 vs ~2,140 bytes) |
| Heap fragmentation | Eliminated (contiguous block allocation) |
| Heap allocations per acquire | 0 (was 2-4) |
| V8 GC pressure | Eliminated entirely |
| Atomic operations per acquire/release | 0 (was 2+ from shared_ptr) |
| Cache locality | Improved (sequential slots in contiguous blocks) |
| Acquire cost | ~2 pointer ops (was: conditional heap alloc + V8 object + shared_ptr) |
| Release cost | ~4 pointer ops + 2 decrements (was: atomic decrement + V8 weak handle + list node alloc) |
| SendPendingData 32-packet burst | ~32 × pointer swap (was: 32 × potential heap alloc + V8 alloc) |
| Steady-state memory overhead | Fixed: 1 block = 214 KB for 128 slots (was: unbounded individual allocations) |

The biggest wins are eliminating the per-packet V8 object allocation (which could trigger GC) and the shared_ptr atomic operations on every acquire/release. For a high-throughput QUIC session sending 32 packets per SendPendingData call, the new path is essentially allocation-free after the first block is populated.

Signed-off-by: James M Snell <jasnell@gmail.com>
Assisted-by: Opencode:Opus 4.6
1 parent bf1aebc commit 9b9bf0c

File tree: 13 files changed (+869, -449 lines)


node.gyp

Lines changed: 2 additions & 0 deletions

```diff
@@ -354,6 +354,7 @@
       'src/quic/tlscontext.cc',
       'src/quic/transportparams.cc',
       'src/quic/quic.cc',
+      'src/quic/arena.h',
       'src/quic/bindingdata.h',
       'src/quic/cid.h',
       'src/quic/data.h',
@@ -440,6 +441,7 @@
       'test/cctest/test_node_crypto_env.cc',
     ],
     'node_cctest_quic_sources': [
+      'test/cctest/test_quic_arena.cc',
       'test/cctest/test_quic_cid.cc',
       'test/cctest/test_quic_error.cc',
       'test/cctest/test_quic_preferredaddress.cc',
```

src/quic/application.cc

Lines changed: 10 additions & 19 deletions

```diff
@@ -1,7 +1,6 @@
 #if HAVE_OPENSSL && HAVE_QUIC
 #include "guard.h"
 #ifndef OPENSSL_NO_QUIC
-#include "application.h"
 #include <async_wrap-inl.h>
 #include <debug_utils-inl.h>
 #include <nghttp3/nghttp3.h>
@@ -10,6 +9,7 @@
 #include <node_sockaddr-inl.h>
 #include <uv.h>
 #include <v8.h>
+#include "application.h"
 #include "defs.h"
 #include "endpoint.h"
 #include "http3.h"
@@ -207,12 +207,9 @@ StreamPriority Session::Application::GetStreamPriority(const Stream& stream) {
   return StreamPriority::DEFAULT;
 }
 
-BaseObjectPtr<Packet> Session::Application::CreateStreamDataPacket() {
-  return Packet::Create(env(),
-                        session_->endpoint(),
-                        session_->remote_address(),
-                        session_->max_packet_size(),
-                        "stream data");
+Packet::Ptr Session::Application::CreateStreamDataPacket() {
+  return session_->endpoint().CreatePacket(
+      session_->remote_address(), session_->max_packet_size(), "stream data");
 }
 
 void Session::Application::StreamClose(Stream* stream, QuicError&& error) {
@@ -264,7 +261,7 @@ void Session::Application::SendPendingData() {
   // The number of packets that have been sent in this call to SendPendingData.
   size_t packet_send_count = 0;
 
-  BaseObjectPtr<Packet> packet;
+  Packet::Ptr packet;
   uint8_t* pos = nullptr;
   uint8_t* begin = nullptr;
 
@@ -273,7 +270,7 @@ void Session::Application::SendPendingData() {
       packet = CreateStreamDataPacket();
       if (!packet) [[unlikely]]
         return false;
-      pos = begin = ngtcp2_vec(*packet).base;
+      pos = begin = packet->data();
     }
     DCHECK(packet);
     DCHECK_NOT_NULL(pos);
@@ -299,7 +296,6 @@ void Session::Application::SendPendingData() {
     // The stream_data is the next block of data from the application stream.
     if (GetStreamData(&stream_data) < 0) {
       Debug(session_, "Application failed to get stream data");
-      packet->CancelPacket();
       session_->SetLastError(QuicError::ForNgtcp2Error(NGTCP2_ERR_INTERNAL));
       closed = true;
       return session_->Close(CloseMethod::SILENT);
@@ -367,7 +363,6 @@ void Session::Application::SendPendingData() {
     if (ndatalen >= 0 && !StreamCommit(&stream_data, ndatalen)) {
       Debug(session_,
             "Failed to commit stream data while writing packets");
-      packet->CancelPacket();
       session_->SetLastError(
           QuicError::ForNgtcp2Error(NGTCP2_ERR_INTERNAL));
       closed = true;
@@ -380,7 +375,6 @@ void Session::Application::SendPendingData() {
         // ngtcp2 callback failed for some reason. This would be a
         // bug in our code.
         Debug(session_, "Internal failure with ngtcp2 callback");
-        packet->CancelPacket();
         session_->SetLastError(
             QuicError::ForNgtcp2Error(NGTCP2_ERR_INTERNAL));
         closed = true;
@@ -393,12 +387,10 @@ void Session::Application::SendPendingData() {
         Debug(session_,
               "Application encountered error while writing packet: %s",
               ngtcp2_strerror(nwrite));
-        packet->CancelPacket();
         session_->SetLastError(QuicError::ForNgtcp2Error(nwrite));
         closed = true;
         return session_->Close(CloseMethod::SILENT);
       } else if (ndatalen >= 0 && !StreamCommit(&stream_data, ndatalen)) {
-        packet->CancelPacket();
         session_->SetLastError(QuicError::ForNgtcp2Error(NGTCP2_ERR_INTERNAL));
         closed = true;
         return session_->Close(CloseMethod::SILENT);
@@ -416,10 +408,9 @@ void Session::Application::SendPendingData() {
       if (datalen) {
         Debug(session_, "Sending packet with %zu bytes", datalen);
         packet->Truncate(datalen);
-        session_->Send(packet, path);
-      } else {
-        packet->CancelPacket();
+        session_->Send(std::move(packet), path);
       }
+      // If no data, Ptr destructor releases the packet.
 
       return;
     }
@@ -429,15 +420,15 @@ void Session::Application::SendPendingData() {
     size_t datalen = pos - begin;
     Debug(session_, "Sending packet with %zu bytes", datalen);
     packet->Truncate(datalen);
-    session_->Send(packet, path);
+    session_->Send(std::move(packet), path);
 
     // If we have sent the maximum number of packets, we're done.
     if (++packet_send_count == max_packet_count) {
      return;
     }
 
     // Prepare to loop back around to prepare a new packet.
-    packet.reset();
+    // packet is already empty from the std::move above.
     pos = begin = nullptr;
   }
 }
```
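The ownership flow in this diff (Send taking the packet by value via `std::move`, and the "Ptr destructor releases the packet" comment replacing `CancelPacket()`) can be modeled with a move-only handle. This sketch uses `std::unique_ptr` with a pool-aware deleter purely for illustration; `TinyPool` and `FakePacket` are assumptions, and the real Packet::Ptr is presumably a custom type over the arena.

```cpp
#include <memory>
#include <utility>

// Illustrative stand-ins, not the node source.
struct FakePacket {
  bool sent = false;
};

struct TinyPool {
  int live = 0;
  FakePacket* Acquire() { ++live; return new FakePacket{}; }
  void Release(FakePacket* p) { --live; delete p; }
};

// Move-only handle: the destructor returns the packet to the pool, which is
// what lets the "if no data" branch drop a packet without an explicit
// CancelPacket() call.
struct PoolDeleter {
  TinyPool* pool;
  void operator()(FakePacket* p) const { pool->Release(p); }
};
using Ptr = std::unique_ptr<FakePacket, PoolDeleter>;

Ptr CreatePacket(TinyPool& pool) {
  return Ptr(pool.Acquire(), PoolDeleter{&pool});
}

// Send() takes ownership by value; release() is the single pointer swap
// described in the commit text, after which the completion callback owns
// the packet.
void Send(TinyPool& pool, Ptr packet) {
  FakePacket* raw = packet.release();  // 1 pointer swap
  raw->sent = true;                    // stands in for uv_udp_send()
  pool.Release(raw);                   // stands in for the send callback
}
```

Because ownership transfers by value, the caller's `packet` variable is empty after `std::move`, matching the removed `packet.reset()` line above.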

src/quic/application.h

Lines changed: 1 addition & 1 deletion

```diff
@@ -132,7 +132,7 @@ class Session::Application : public MemoryRetainer {
   }
 
  private:
-  BaseObjectPtr<Packet> CreateStreamDataPacket();
+  Packet::Ptr CreateStreamDataPacket();
 
   // Write the given stream_data into the buffer.
   ssize_t WriteVStream(PathStorage* path,
```
