Skip to content

fix(rivetkit): use keepAwake for websocket callback tracking to prevent c.vars crash after grace deadline#4978

Open
abcxff wants to merge 1 commit intomainfrom
05-05-fix_rivetkit_use_keepawake_for_websocket_callback_tracking_to_prevent_c.vars_crash_after_grace_deadline
Open

fix(rivetkit): use keepAwake for websocket callback tracking to prevent c.vars crash after grace deadline#4978
abcxff wants to merge 1 commit intomainfrom
05-05-fix_rivetkit_use_keepawake_for_websocket_callback_tracking_to_prevent_c.vars_crash_after_grace_deadline

Conversation

@abcxff
Copy link
Copy Markdown
Contributor

@abcxff abcxff commented May 5, 2026

The onSleep handler defers runtime state cleanup while keepAwakeCount > 0, but websocket callback promises were registered via waitUntil which doesn't increment keepAwakeCount. After the grace deadline expired, onSleep cleared the NAPI runtime state while async websocket handlers were still running, causing TypeError when they accessed c.vars.

Switching #callHandler from waitUntil to keepAwake makes websocket callback promises visible to the existing deferred-cleanup guard.

Adds a driver test that reproduces the production crash.

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 5, 2026

🚅 Deployed to the rivet-pr-4978 environment in rivet-frontend

Service Status Web Updated (UTC)
website 😴 Sleeping (View Logs) Web May 7, 2026 at 1:15 am
kitchen-sink ❌ Build Failed (View Logs) Web May 6, 2026 at 4:18 am
ladle ❌ Build Failed (View Logs) Web May 5, 2026 at 7:41 pm
mcp-hub 🕗 Deploying (View Logs) Web May 5, 2026 at 7:38 pm
frontend-cloud ❌ Build Failed (View Logs) Web May 5, 2026 at 7:35 pm
frontend-inspector ❌ Build Failed (View Logs) Web May 5, 2026 at 7:35 pm

Copy link
Copy Markdown
Contributor Author

abcxff commented May 5, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@abcxff abcxff marked this pull request as ready for review May 5, 2026 19:34
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Code Review

Overview

This PR fixes a production crash where accessing c.vars in an async WebSocket message handler throws TypeError after the actor grace deadline expires. The root cause: WebSocket callback promises were registered via waitUntil, which tracks the promise in core but does not increment the TS-side keepAwakeCount. The onSleep deferred-cleanup guard only checks keepAwakeCount, so it cleared the NAPI runtime state while handlers were still running.

The fix is a one-line change — waitUntil to keepAwake in TrackedWebSocketHandleAdapter#callHandler — plus a new fixture actor and driver test.


Core Fix (native.ts)

The change is correct and minimal. The semantic mismatch is real: waitUntil registers the task in core but leaves keepAwakeCount == 0, so the onSleep guard at line 4017 proceeds immediately to cleanupNativeSleepRuntimeState even while an async handler is awaiting. keepAwake increments keepAwakeCount before native registration and decrements it in .finally(), triggering cleanupDeferredNativeSleepRuntimeState only after the last handler drains.

One side-effect worth acknowledging: keepAwake also calls actorKeepAwake in core, which registers against core's own keep-awake counter. Previously waitUntil registered against actorWaitUntil. If core treats these two differently (e.g. different shutdown-drain sequencing), that difference should be verified. A comment explaining why c.sleep() still fires despite an open keepAwake would help future readers.


Test Fixture (sleep.ts)

The fixture is clean and the timing constants are well-named. A few notes:

Missing positive assertion on handlerFinished. The fixture tracks handlerFinished in actor state, and the whole point of the fix is that the handler now completes. The test asserts handlerStarted == 1 but never checks handlerFinished. Adding expect(status.handlerFinished).toBe(1) would turn the current negative/indirect check (no TypeError in output) into a direct positive proof that the handler ran to completion.

Fragile output-string assertions. Checking getRuntimeOutput() for "Cannot set properties of undefined" works today but will silently stop catching regressions if the V8 error message text changes. A handlerFinished assertion gives a deterministic, engine-agnostic signal.

Fixed-time waitFor sleep. The wait is a pure time delay (VARS_EXCEEDS_GRACE_DELAY + VARS_EXCEEDS_GRACE_SLEEP_TIMEOUT + 500ms). Polling actor.getStatus() until handlerFinished >= 1 with a timeout would be faster on fast machines and more robust on slow ones.


Wasm Runtime Parity

TrackedWebSocketHandleAdapter is NAPI-only (native path). The wasm runtime in wasm-runtime.ts has its own WebSocket dispatch that currently calls waitUntil (line 499). If the wasm runtime's async WebSocket handler path has the same deferred-cleanup gap, this fix does not cover it. Worth confirming whether the wasm keepAwakeCount guard is exercised on the wasm path, or whether wasm has a separate cleanup flow that is immune.


Summary

Area Status
Core fix correctness Correct
Root cause explained Clear
Test covers the crash path Yes
Positive handler-completion assertion Missing
Fixed-time sleep (potential flake source) Present
Wasm parity check Not addressed

The fix is correct. The main asks before merging: (1) add expect(status.handlerFinished).toBe(1) to replace the fragile output-string assertions, and (2) confirm or note wasm parity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant