Commit d844ecd

docs: add keepAwake vs waitUntil table and internal sleep-sequence invariants
1 parent 8aa8524 commit d844ecd

4 files changed

Lines changed: 95 additions & 15 deletions

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -425,6 +425,7 @@ Load these only when the task touches the topic.
 - **[BARE protocol crates](docs-internal/engine/bare-protocol-crates.md)**: vbare schema ordering, identity converters, `build.rs` TS codec generation pattern. Read before adding/changing protocol crates.
 - **[SQLite VFS parity](docs-internal/engine/sqlite-vfs.md)**: native Rust VFS ↔ WASM TypeScript VFS 1:1 parity rule, v2 storage keys, chunk layout, delete/truncate strategy. Read before touching either VFS.
 - **[TLS trust roots](docs-internal/engine/tls-trust-roots.md)**: rustls native+webpki union rationale, which clients use which backend.
+- **[Sleep sequence](docs-internal/engine/sleep-sequence.md)**: engine lifecycle authority, `keepAwake` vs `waitUntil` semantics, grace deadline shutdown-token abort, `can_arm_sleep_timer` vs `can_finalize_sleep` predicates. Read before touching sleep/destroy lifecycle.

 ### Agent procedural (`.claude/reference/`)

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# Sleep sequence invariants

Design constraints and invariants for the RivetKit actor sleep / destroy lifecycle. Pair with `actor-task-dispatch.md` and `rivetkit-core-internals.md` for surrounding context.

## Authority

- The engine owns lifecycle authority. `ctx.sleep()` and `ctx.destroy()` send fire-and-forget `ActorIntent` events; they do not transition lifecycle state locally. The local `SleepGrace` / `DestroyGrace` transition runs when the engine replies with `StopActor`.
- `envoy-client` retries intent delivery across reconnects via checkpoint-based event replay (`engine/sdks/rust/envoy-client/src/events.rs`). Core does not need its own retry path.

## Public surface: keep-awake primitives

Two user-facing primitives in TypeScript. Both accept a `Promise`, never a closure.

| Method | Blocks idle sleep | Blocks grace finalize | Notes |
| --- | --- | --- | --- |
| `c.keepAwake(promise)` | Yes | Yes | Returns the same promise. Use for work the actor must stay up for. |
| `c.waitUntil(promise)` | No | Yes | Returns void. Use for best-effort flush/cleanup work that is allowed to complete inside the grace window. |

`c.setPreventSleep(b)` and `c.preventSleep` are deprecated no-ops retained for binary / call-site compatibility. They will be removed in 2.2.0.

### Why two primitives and not one

`keepAwake` is scoped, non-leaky, and symmetric with `waitUntil`. `setPreventSleep` was a flag that had to be paired by hand; forgetting to clear it wedged the actor awake. A promise-scoped counter cannot leak: when the promise settles (resolve or reject), the counter decrements.
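
The leak-free property can be illustrated with a small counter whose decrement is tied to promise settlement. This is a minimal sketch, not RivetKit's implementation: `KeepAwakeCounter` and `track` are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch of a promise-scoped keep-awake counter.
// `KeepAwakeCounter` and `track` are illustrative names, not RivetKit APIs.
class KeepAwakeCounter {
  private active = 0;

  // Number of promises currently holding the actor awake.
  get count(): number {
    return this.active;
  }

  // Increment now, decrement when the promise settles (resolve or reject).
  // The original promise is returned so callers can still await its value.
  track<T>(promise: Promise<T>): Promise<T> {
    this.active += 1;
    const release = (): void => {
      this.active -= 1;
    };
    // Attach the release to both branches so a rejection also decrements.
    // The caller still sees the rejection because the original promise is returned.
    promise.then(release, release);
    return promise;
  }
}
```

Because the decrement is attached to both the fulfillment and the rejection branch, no call site has to remember a matching "clear" call, which is the failure mode attributed to `setPreventSleep` above.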

### Why separate `keep_awake` and `internal_keep_awake` in core

The two counters are kept separate for debug visibility. Grace deadline warn logs report each counter independently so diagnostics distinguish user keep-awake sites from framework-owned keep-awake sites (schedule alarms, queue receives).

## Sleep readiness predicates

Two predicates govern the sleep state machine. Both live on `ActorContext` / `SleepState`.

- `can_arm_sleep_timer()`: the idle predicate. Returns `CanSleep::Yes` only when every sleep-affecting counter is zero and the run handler is inactive (or waiting on a queue). Used to start the sleep idle timer.
- `can_finalize_sleep()`: the grace predicate. Returns `true` only when every shutdown-affecting counter is zero: `core_dispatched_hooks`, `shutdown_task_count`, `sleep_keep_awake`, `sleep_internal_keep_awake`, `active_http_requests`, `websocket_callbacks`, `pending_disconnects`. Used to advance from `SleepGrace` to `SleepFinalize` (or finalize destroy).

Removing `preventSleep` deleted its branch from both predicates. Any future sleep-affecting counter must add an entry in each predicate and must call `ActorContext::reset_sleep_timer()` on transitions that change the result.
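
The grace predicate reduces to an all-zero check over the counters listed above. The sketch below models that check in TypeScript for illustration only: the real predicates are Rust methods on `ActorContext` / `SleepState`, and the `CounterSnapshot` shape and the `nonDrainedCounters` helper are assumptions made for this example.

```typescript
// Counter names are copied from the list above; the data shape is hypothetical.
const SHUTDOWN_COUNTERS = [
  "core_dispatched_hooks",
  "shutdown_task_count",
  "sleep_keep_awake",
  "sleep_internal_keep_awake",
  "active_http_requests",
  "websocket_callbacks",
  "pending_disconnects",
] as const;

type ShutdownCounter = (typeof SHUTDOWN_COUNTERS)[number];
type CounterSnapshot = Record<ShutdownCounter, number>;

// Grace predicate: true only when every shutdown-affecting counter is zero.
function canFinalizeSleep(snapshot: CounterSnapshot): boolean {
  return SHUTDOWN_COUNTERS.every((name) => snapshot[name] === 0);
}

// Hypothetical helper: names of the counters still holding shutdown open,
// in the spirit of a diagnostic log that enumerates non-drained counters.
function nonDrainedCounters(snapshot: CounterSnapshot): ShutdownCounter[] {
  return SHUTDOWN_COUNTERS.filter((name) => snapshot[name] !== 0);
}

// Example snapshot: one user keep-awake promise is still outstanding.
const exampleSnapshot: CounterSnapshot = {
  core_dispatched_hooks: 0,
  shutdown_task_count: 0,
  sleep_keep_awake: 1,
  sleep_internal_keep_awake: 0,
  active_http_requests: 0,
  websocket_callbacks: 0,
  pending_disconnects: 0,
};
```

Driving both functions from one list of names means a newly added counter cannot be covered by the predicate and missed by the diagnostics, or the reverse.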

## Grace period and abort signals

- `start_grace(reason)` fires at the start of `SleepGrace` / `DestroyGrace`. It cancels the sleep idle timer, fires the actor abort signal (`actor_abort_signal`) by cancelling its token, installs a `SleepGraceState` with the effective grace deadline, and resets the sleep timer to arm the grace tick.
- The actor abort signal is a soft signal: "shutdown has started, please wrap up." User code observes it via `c.abortSignal`. It does not force-stop work.
- For destroy, the abort signal may fire earlier than grace entry because `ctx.destroy()` cancels the abort token immediately via `mark_destroy_requested(...)`.

## Grace deadline enforcement

When the grace deadline elapses before `can_finalize_sleep()` returns true:

- `on_sleep_grace_deadline` aborts the user `run` handle (`run_handle.abort()`), cancels the shutdown deadline token (`cancel_shutdown_deadline()`), records the timeout, and emits a structured warn log enumerating every non-drained counter.
- The NAPI `RunGracefulCleanup` task observes `shutdown_deadline_token()` via `tokio::select!` and aborts its in-flight `onSleep` / `onDestroy` call so SQLite and KV cleanup in `teardown_sleep_state` do not race against mid-commit user work.
- Foreign-runtime adapters that run user cleanup callbacks must observe the shutdown deadline token the same way.

## Guarding lifecycle requests

- `ctx.sleep()` and `ctx.destroy()` return `Result<()>`. They fail with `actor/starting` if called before startup completes and `actor/stopping` if the request flag has already been swapped to true for this generation. An atomic `swap(true, ...)` on `sleep_requested` / `destroy_requested` enforces single-shot request semantics per generation.
- The idle sleep timer request path (`spawn_sleep_timer_task`) and the `ActorTask` sleep-tick path both suppress the already-requested error, because idle-driven requests may race user-driven requests and the resulting warning is informational only.
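
The single-shot guard can be sketched as a flag whose "set" operation reports the previous value. This is an illustrative TypeScript model, not the Rust code: `SingleShotFlag` and `requestSleep` are hypothetical names, the order of the two checks is an assumption, and a plain field stands in for the atomic because the sketch runs on a single thread.

```typescript
// Error codes come from the text above; everything else here is illustrative.
type LifecycleError = "actor/starting" | "actor/stopping";

class SingleShotFlag {
  private requested = false;

  // Set the flag and return its previous value, mirroring `swap(true, ...)`.
  swapTrue(): boolean {
    const previous = this.requested;
    this.requested = true;
    return previous;
  }
}

// Returns null when the request is accepted, otherwise the error code.
// Assumption: the startup check runs before the flag is touched.
function requestSleep(
  startupComplete: boolean,
  sleepRequested: SingleShotFlag,
): LifecycleError | null {
  if (!startupComplete) return "actor/starting";
  // Only the first caller in a generation observes `false` here.
  if (sleepRequested.swapTrue()) return "actor/stopping";
  return null;
}
```

In this model a new generation would start with a fresh `SingleShotFlag`, and a caller on the idle-timer path would treat an `"actor/stopping"` result as informational rather than as a failure.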

## Serialize-state shutdown cap

`SERIALIZE_STATE_SHUTDOWN_SANITY_CAP = 15s` is the upper bound on how long the shutdown `SerializeState` reply may stay pending before `save_final_state` falls back to empty deltas (preserving prior state). This is a sanity cap, not a deadline anyone should ever hit; the normal drain finishes in milliseconds.

## Test harness parity

- Rust integration tests live in `rivetkit-core/tests/modules/sleep.rs` and pin predicate behavior, grace period selection, and the `save_final_state` cap.
- TypeScript driver tests in `rivetkit-typescript/packages/rivetkit/tests/driver/actor-sleep*.test.ts` cover abort-signal-at-grace-entry, `keepAwake` holding shutdown, `c.db` writes surviving `onSleep`, and regression coverage for `setPreventSleep` being a no-op.

rivetkit-rust/packages/rivetkit-core/CLAUDE.md

Lines changed: 4 additions & 1 deletion
@@ -6,7 +6,10 @@

 ## Sleep state invariants

-- Any mutation that changes a `can_sleep` input must call `ActorContext::reset_sleep_timer()` so the `ActorTask` sleep deadline is re-evaluated. Inputs are: `ready`/`started`, `prevent_sleep`, `no_sleep`, `active_http_request_count`, `sleep_keep_awake_count`, `sleep_internal_keep_awake_count`, `pending_disconnect_count`, `conns()`, and `websocket_callback_count`. Missing this call leaves the sleep timer armed against stale state and triggers the `"sleep idle deadline elapsed but actor stayed awake"` warning on the next tick.
+- Any mutation that changes a `can_sleep` input must call `ActorContext::reset_sleep_timer()` so the `ActorTask` sleep deadline is re-evaluated. Inputs are: `ready`/`started`, `no_sleep`, `active_http_request_count`, `sleep_keep_awake_count`, `sleep_internal_keep_awake_count`, `pending_disconnect_count`, `conns()`, and `websocket_callback_count`. Missing this call leaves the sleep timer armed against stale state and triggers the `"sleep idle deadline elapsed but actor stayed awake"` warning on the next tick.
+- `ActorContext::set_prevent_sleep(...)` / `prevent_sleep()` are deprecated no-ops kept for NAPI bridge compatibility. Use `keep_awake(future)` (holds counter while awaited) or `wait_until(future)` (tracked shutdown task) instead. Do not reintroduce a `prevent_sleep` field, a `CanSleep::PreventSleep` variant, or branches that read it.
+- `ctx.sleep()` and `ctx.destroy()` return `Result<()>`. They error with `ActorLifecycleError::Starting` when called before startup completes and `ActorLifecycleError::Stopping` if the requested flag has already been set this generation (atomic `swap(true, ...)`). Internal idle-timer paths log and suppress the already-requested error.
+- The grace deadline path (`on_sleep_grace_deadline`) aborts the user `run` handle and cancels `shutdown_deadline_token()`. Foreign-runtime adapters running `onSleep` / `onDestroy` must observe that token via `tokio::select!` so SQLite teardown does not race user cleanup work.
 - Counter `register_zero_notify(&idle_notify)` hooks only drive shutdown drain waits. They are not a substitute for the activity-dirty notification, so any new sleep-affecting counter must also notify on transitions that change `can_sleep`.
 - A clean `run` exit while `Started` is not terminal. Keep the generation alive until the guaranteed `Stop` drives `SleepGrace` or `DestroyGrace`, and only treat `Terminated` as "grace hooks already completed."
 - Do not reply to actor startup until the runtime adapter has acknowledged its startup preamble. Otherwise `getOrCreate` can race the first action against `onWake` or `run` startup.

website/src/content/docs/actors/lifecycle.mdx

Lines changed: 26 additions & 14 deletions
@@ -285,7 +285,7 @@ The `run` hook is called after the actor starts and runs in the background witho
 The handler exposes `c.aborted` for loop checks and `c.abortSignal` for canceling operations when the actor is stopping. You should always check or listen for shutdown to exit gracefully.

 **Important behavior:**
-- The actor may go to sleep at any time during the `run` handler. Use `c.setPreventSleep(true)` while work is active, then clear it with `c.setPreventSleep(false)` once the actor can sleep again.
+- The actor may go to sleep at any time during the `run` handler. Wrap work that must keep the actor awake with `c.keepAwake(promise)` to block idle sleep until the promise settles.
 - If the `run` handler exits (returns), the actor follows its normal idle sleep timeout once it becomes idle
 - If the `run` handler throws an error, the actor logs the error and then follows its normal idle sleep timeout once it becomes idle
 - On shutdown, `c.abortSignal` fires so the `run` handler can exit within the graceful shutdown window.
@@ -725,7 +725,7 @@ An actor is considered idle and eligible to sleep when **all** of the following
 - No active HTTP requests
 - No active connections (unless they are hibernatable WebSockets)
 - No active `run` handler (unless it is waiting on a queue)
-- `setPreventSleep` is not enabled
+- No outstanding `c.keepAwake(promise)` promises
 - No pending disconnect callbacks
 - No async `onWebSocket` event handlers (eg `open`, `message`, `close`) still running

@@ -737,13 +737,18 @@ Outbound requests (e.g. `fetch` calls) do not count as activity and will not kee

 The platform may force an actor to migrate to a new machine during version upgrades or when a serverless request is about to timeout. The same [shutdown sequence](#shutdown-sequence) runs, then the actor is rescheduled on a new machine and wakes up with its persisted state.

-Use `onSleep`, `waitUntil`, or `setPreventSleep` to control the length of the grace period before the actor moves to another machine.
+Use `onSleep`, `waitUntil`, or `keepAwake` to control the length of the grace period before the actor moves to another machine.

-### Preventing Sleep
+### Keeping the Actor Awake

-If actor state says the actor should stay awake, call `c.setPreventSleep(true)` and clear it once the actor can sleep again. You can read `c.preventSleep` to inspect the current flag.
+RivetKit gives you two primitives for holding the actor awake across background work. Both take a `Promise` and differ in how they interact with idle sleep and the grace period.

-`setPreventSleep` blocks normal idle sleep until you clear it. It is not a platform-wide stop blocker though. If shutdown has already started, RivetKit waits for `preventSleep` to clear within the same `sleepGracePeriod` shutdown budget used by `onSleep` and `waitUntil`.
+| Method | Accepts | Blocks idle sleep | Blocks grace finalize | Use case |
+| --- | --- | --- | --- | --- |
+| `c.keepAwake(promise)` | `Promise<T>` (returns same promise) | Yes | Yes | Critical work that must keep the actor running end to end (for example a turn in a game, an ongoing tool call). |
+| `c.waitUntil(promise)` | `Promise<unknown>` (returns void) | No | Yes | Best-effort finalization work that may complete during the grace window (for example analytics flushes, cleanup writes). |
+
+`c.keepAwake(promise)` is the preferred primitive for long-running work the actor should not sleep through. It holds a keep-awake counter until the promise settles, which blocks both idle sleep and the grace finalize step. The promise is returned unchanged, so you can `await` it if you need the value.

 ```typescript
 import { actor } from "rivetkit";
@@ -754,18 +759,25 @@ const sessionActor = actor({
   },

   actions: {
-    beginTurn: (c) => {
+    runTurn: async (c, input: string) => {
       c.state.activeTurns += 1;
-      c.setPreventSleep(true);
-    },
-    endTurn: (c) => {
-      c.state.activeTurns -= 1;
-      c.setPreventSleep(c.state.activeTurns > 0);
+      try {
+        const result = await c.keepAwake(processTurn(input));
+        return result;
+      } finally {
+        c.state.activeTurns -= 1;
+      }
     },
   }
 });
+
+declare function processTurn(input: string): Promise<string>;
 ```

+<Note>
+`setPreventSleep(enabled)` is deprecated and now a no-op. Wrap the work you want to keep alive with `c.keepAwake(promise)` instead.
+</Note>
+
 ### On Sleep Hook

 The [`onSleep`](#onsleep) hook runs during shutdown for cleanup like clearing intervals or closing connections. It is best-effort and will not run if the actor crashes.
@@ -815,14 +827,14 @@ const analyticsActor = actor({
 });
 ```

-The actor waits up to `sleepGracePeriod` for graceful sleep work during the [shutdown sequence](#shutdown-sequence). That single budget covers `onSleep`, `waitUntil`, async raw WebSocket handlers such as `message` and `close`, and waiting for `preventSleep` to clear after shutdown has started. By default, this graceful sleep window is 15 seconds total. If the timeout is exceeded, the actor proceeds with sleep anyway.
+The actor waits up to `sleepGracePeriod` for graceful sleep work during the [shutdown sequence](#shutdown-sequence). That single budget covers `onSleep`, `waitUntil`, `keepAwake`, and async raw WebSocket handlers such as `message` and `close`. By default, this graceful sleep window is 15 seconds total. If the timeout is exceeded, the actor proceeds with sleep anyway.

 ### Sleep Timeouts

 | Option | Default | Description |
 |--------|---------|-------------|
 | `sleepTimeout` | 30 seconds | Time of inactivity before the actor begins sleeping. |
-| `sleepGracePeriod` | 15 seconds | Total graceful shutdown window for hooks, `waitUntil`, async raw WebSocket handlers, disconnects, and waiting for `preventSleep` to clear. |
+| `sleepGracePeriod` | 15 seconds | Total graceful shutdown window for hooks, `waitUntil`, `keepAwake`, async raw WebSocket handlers, and disconnects. |

 Rivet enforces a hard limit of **30 minutes** for the entire stop process. These can be configured in the [options](#options).
