Commit 6b6c6a3

src: update coro readme details
1 parent 9043df0 commit 6b6c6a3

1 file changed

+70
-27
lines changed

src/coro/README.md

Lines changed: 70 additions & 27 deletions
@@ -24,10 +24,14 @@ AsyncLocalStorage, microtask draining, and environment lifecycle management.
2424
`UvFsAwaitable` (fs operations), `UvFsStatAwaitable` (stat-family),
2525
`UvWorkAwaitable` (thread pool work), and `UvGetAddrInfoAwaitable`
2626
(DNS resolution). Each embeds the libuv request struct directly in the
27-
coroutine frame, avoiding separate heap allocations.
27+
coroutine frame, avoiding separate heap allocations. Each also exposes a
28+
`cancelable_req()` method returning the underlying `uv_req_t*` for
29+
cancellation support during environment teardown.
2830

2931
* `uv_promise.h` -- Helpers for bridging coroutines to JavaScript Promises:
30-
`MakePromise()`, `ResolvePromise()`, `RejectPromiseWithUVError()`.
32+
`MakePromise()`, `ResolvePromise()`, `RejectPromiseWithUVError()`. The
33+
resolve and reject helpers guard against calling V8 APIs when the
34+
environment is shutting down (`can_call_into_js()` check).
3135

3236
## Usage
3337

@@ -114,14 +118,19 @@ self-destructs when the coroutine completes.
114118
`UvTrackedTask<T, Name>` follows the same lazy/fire-and-forget pattern but
115119
adds three phases around `Start()`:
116120
117-
1. **Creation**: The coroutine frame is heap-allocated by the compiler.
118-
The coroutine is suspended at `initial_suspend` (lazy).
121+
1. **Creation**: The coroutine frame is allocated from the thread-local
122+
free-list (see "Frame allocator" below). The coroutine is suspended at
123+
`initial_suspend` (lazy).
119124
120125
2. **`InitTracking(env)`**: Assigns an `async_id`, captures the current
121-
`async_context_frame` (for AsyncLocalStorage propagation), creates a
122-
resource object for `executionAsyncResource()`, emits the async\_hooks
123-
`init` event and a trace event, registers in the Environment's coroutine
124-
task list, and reports external memory to V8.
126+
`async_context_frame` (for AsyncLocalStorage propagation), emits a trace
127+
event, and registers in the Environment's coroutine task list for
128+
cancellation during teardown. If async\_hooks listeners are active
129+
(`kInit > 0` or `kUsesExecutionAsyncResource > 0`), a resource object
130+
is created for `executionAsyncResource()` and the `init` hook is emitted.
131+
The type name V8 string is cached per template instantiation via
132+
`v8::Eternal<v8::String>`, so only the first coroutine of a given type
133+
pays the `String::NewFromUtf8` cost.
125134
126135
3. **`Start()`**: Marks the task as detached (fire-and-forget) and resumes
127136
the coroutine. Each resume-to-suspend segment is wrapped in an
@@ -133,8 +142,10 @@ adds three phases around `Start()`:
133142
134143
4. **Completion**: At `final_suspend`, the last `InternalCallbackScope` is
135144
closed (draining task queues), the async\_hooks `destroy` event is emitted,
136-
the task is unregistered from the Environment, external memory accounting
137-
is released, and the coroutine frame is freed.
145+
the task is unregistered from the Environment, and the coroutine frame is
146+
returned to the thread-local free-list. If a detached coroutine has a
147+
captured C++ exception that was never observed, `std::terminate()` is
148+
called rather than silently discarding it.
138149
139150
## How the awaitable dispatch works
140151
@@ -144,7 +155,7 @@ directly in the coroutine frame. When the coroutine hits `co_await`:
144155
1. `await_transform()` on the promise wraps it in a `TrackedAwaitable`.
145156
2. `TrackedAwaitable::await_suspend()`:
146157
* Closes the current `InternalCallbackScope` (drains microtasks/nextTick).
147-
* Records the `uv_req_t*` for cancellation support.
158+
* Records the `uv_req_t*` for cancellation support (via `cancelable_req()`).
148159
* Increments `request_waiting_` (event loop liveness).
149160
* Calls the inner `await_suspend()`, which dispatches the libuv call with
150161
`req_.data = this` pointing back to the awaitable.
@@ -157,6 +168,11 @@ directly in the coroutine frame. When the coroutine hits `co_await`:
157168
* Opens a new `InternalCallbackScope` for the next segment.
158169
* Returns the result (e.g., `req_.result` for fs operations).
159170
171+
The liveness counter and cancellation tracking are conditional on the inner
172+
awaitable having a `cancelable_req()` method (checked at compile time via a
173+
`requires` expression). When co\_awaiting another `UvTask` or `UvTrackedTask`
174+
(coroutine composition), these steps are skipped.
175+
160176
## Environment teardown
161177
162178
During `Environment::CleanupHandles()`, the coroutine task list is iterated and
@@ -166,36 +182,51 @@ in-flight libuv request (if any), which causes the libuv callback to fire with
166182
The `request_waiting_` counter ensures the teardown loop waits for all
167183
coroutine I/O to finish before destroying the Environment.
168184
185+
## Frame allocator
186+
187+
Coroutine frames are allocated from a thread-local free-list rather than going
188+
through `malloc`/`free` on every creation and destruction. This is implemented
189+
via `promise_type::operator new` and `operator delete` in `TrackedPromiseBase`,
190+
which route through `CoroFrameAlloc()` and `CoroFrameFree()`.
191+
192+
The free-list uses size-class buckets with 256-byte granularity, covering
193+
frames up to 4096 bytes, which is enough for typical coroutine frames. Frames larger
194+
than 4096 bytes fall through to the global `operator new`. Since all coroutines
195+
run on the event loop thread, the free-list requires no locking.
196+
197+
After the first coroutine of a given size class completes, subsequent
198+
coroutines of the same size class are allocated from the free-list with zero
199+
`malloc` overhead.
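
A size-class free-list of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `CoroFrameAlloc()`/`CoroFrameFree()` code: the names `SketchFrameAlloc`, `SketchFrameFree`, `BucketFor`, and `FreeNode` are hypothetical, and the real implementation may round sizes or store its lists differently. Only the 256-byte granularity, the 4096-byte cutoff, and the thread-local, lock-free design come from the README.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t kGranularity = 256;  // size-class step
constexpr std::size_t kMaxCached = 4096;   // larger frames bypass the list
constexpr std::size_t kNumBuckets = kMaxCached / kGranularity;  // 16

// A freed frame is reused as an intrusive singly linked list node.
struct FreeNode {
  FreeNode* next;
};

// One list head per size class; thread_local, so no locking is needed.
thread_local FreeNode* g_buckets[kNumBuckets] = {};

// Maps sizes 1..4096 to buckets 0..15 (1..256 -> 0, 257..512 -> 1, ...).
constexpr std::size_t BucketFor(std::size_t size) {
  return (size + kGranularity - 1) / kGranularity - 1;
}

void* SketchFrameAlloc(std::size_t size) {
  if (size == 0 || size > kMaxCached) return ::operator new(size);
  const std::size_t bucket = BucketFor(size);
  if (FreeNode* node = g_buckets[bucket]) {
    g_buckets[bucket] = node->next;  // free-list hit: no malloc
    return node;
  }
  // Miss: allocate the full size class so the block fits any frame
  // that later maps to the same bucket.
  return ::operator new((bucket + 1) * kGranularity);
}

void SketchFrameFree(void* ptr, std::size_t size) {
  if (size == 0 || size > kMaxCached) {
    ::operator delete(ptr);
    return;
  }
  const std::size_t bucket = BucketFor(size);
  // Push onto the bucket's list instead of returning memory to malloc.
  g_buckets[bucket] = new (ptr) FreeNode{g_buckets[bucket]};
}
```

In a coroutine, functions like these would be called from `promise_type::operator new(std::size_t)` and `operator delete(void*, std::size_t)`. Note that this sketch never shrinks its lists, which matches the free-list growth limitation listed under "Known limitations".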
200+
169201
## Allocation comparison with ReqWrap
170202
171203
For a single async operation (e.g., `fsPromises.access`):
172204
173-
| | ReqWrap pattern | Coroutine pattern |
174-
| -------------------- | --------------- | --------------------------------- |
175-
| C++ heap allocations | 3 | 1 (coroutine frame) |
176-
| V8 heap objects | 7 | 3 (resource + resolver + promise) |
177-
| Total allocations | 10 | 4 |
205+
| | ReqWrap pattern | Coroutine (no hooks) | Coroutine (hooks active) |
206+
| -------------------- | --------------- | -------------------- | ------------------------ |
207+
| C++ heap allocations | 3 | 0 (free-list hit) | 0 (free-list hit) |
208+
| V8 heap objects | 7 | 2 (resolver+promise) | 3 (+ resource object) |
209+
| Total allocations | 10 | 2 | 3 |
178210
179211
For a multi-step operation (open + stat + read + close):
180212
181-
| | 4x ReqWrap | Single coroutine |
182-
| ----------------------------- | ---------- | ----------------------------- |
183-
| C++ heap allocations | 12 | 1 |
184-
| V8 heap objects | 28 | 3 |
185-
| Total allocations | 40 | 4 |
186-
| InternalCallbackScope entries | 4 | 5 (one per segment + initial) |
213+
| | 4x ReqWrap | Single coroutine (no hooks) | Single coroutine (hooks active) |
214+
| ----------------------------- | ---------- | --------------------------- | ------------------------------- |
215+
| C++ heap allocations | 12 | 0 (free-list hit) | 0 (free-list hit) |
216+
| V8 heap objects | 28 | 2 | 3 |
217+
| Total allocations | 40 | 2 | 3 |
218+
| InternalCallbackScope entries | 4 | 5 (one per segment) | 5 |
187219
188220
The coroutine frame embeds the `uv_fs_t` (\~440 bytes) directly. The compiler
189221
may overlay non-simultaneously-live awaitables in the frame, so a multi-step
190222
coroutine does not necessarily pay N times the `uv_fs_t` cost.
191223
192224
## Known limitations
193225
194-
* **Heap snapshot visibility**: The coroutine frame is a plain `malloc`
195-
allocated by the C++ coroutine machinery. It is not visible to V8 heap
196-
snapshots or `MemoryRetainer`. `AdjustAmountOfExternalAllocatedMemory` is
197-
used to give V8 a rough signal of the external memory pressure, but the
198-
exact frame contents are not inspectable.
226+
* **Heap snapshot visibility**: The coroutine frame is not visible to V8 heap
227+
snapshots or `MemoryRetainer`. The thread-local free-list allocator reduces
228+
malloc pressure but does not provide V8 with per-frame memory accounting.
229+
The exact frame contents are not inspectable from heap snapshot tooling.
199230
200231
* **Snapshot serialization**: `UvTrackedTask` holds `v8::Global` handles that
201232
cannot be serialized into a startup snapshot. There is currently no safety
@@ -207,3 +238,15 @@ coroutine does not necessarily pay N times the `uv_fs_t` cost.
207238
The coroutine pattern uses free-form string names. The `init` trace event
208239
uses the provided name; the `destroy` trace event currently uses a generic
209240
`"coroutine"` category name rather than the per-instance name.
241+
242+
* **Free-list growth**: The thread-local free-list does not have a cap on the
243+
number of cached frames per size class. Under a workload that creates a
244+
large burst of concurrent coroutines and then goes idle, the free-list will
245+
retain all of those frames until the thread exits. A maximum per-bucket
246+
count could be added if this becomes a concern.
247+
248+
* **Static Eternal handles**: The cached type name `v8::Eternal<v8::String>`
249+
is a static variable per template instantiation. It is never freed and is
250+
shared across all Isolates on the same thread. This is safe for the
251+
single-Isolate case (the common case for Node.js), but would need
252+
per-Isolate caching if multiple Isolates use the same coroutine types.
