fix(rivetkit): restore hibernatable sockets and hydrate serverless starts#4658

Open
NathanFlurry wants to merge 1 commit into break-up/persist-hibernating-requests from break-up/restore-hibernating-sockets

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 14, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Apr 14, 2026

PR Review: fix(rivetkit): restore hibernatable sockets and hydrate serverless starts

Overview

This PR fixes hibernatable WebSocket restore and serverless start hydration for the engine actor driver. The core change introduces two new binding registries (#hibernatableConnectBindings and #hibernatableRunnerWebSocketBindings) that persist socket state across actor hibernation/wake cycles so existing connections can be re-bound to the newly started actor without being dropped. It also fixes a pre-existing omission where #dynamicRuntimes was cleared without calling dispose().


Bugs / Correctness Issues

1. Log field inconsistency: err vs error

In the serverlessHandleStart catch block the field is named err: instead of error:. Every other structured log in this file uses error:. This will silently break any log-based alerting or dashboards that filter on the error field.

2. #bindDynamicHibernatableRunnerWebSocket detach does not close proxy

In #bindDynamicHibernatableConnectSocket, the detach closure sends 1011 "dynamic.rebind" to close the proxy WebSocket before removing listeners. The equivalent detach in #bindDynamicHibernatableRunnerWebSocket only removes listeners without closing proxyToActorWs. If the runner-side proxy is open when the binding is detached during rebind, it will linger without a close frame. This asymmetry should be confirmed intentional or fixed.
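
If the asymmetry is not intentional, one way to make the two detach paths match is to route both through a shared helper. The following is a minimal sketch, not the driver's actual code: `ClosableSocket` and `makeDetach` are hypothetical names, and the close code and reason are copied from the connect-socket path described above.

```typescript
// Hypothetical minimal socket shape; the real proxy type in the driver may differ.
interface ClosableSocket {
  readyState: number;
  close(code: number, reason: string): void;
  removeEventListener(type: string, listener: (event: unknown) => void): void;
}

const SOCKET_OPEN = 1; // same numeric value as WebSocket.OPEN

// Builds an idempotent detach closure that sends a close frame (if the proxy
// is still open) before removing the listeners registered for this binding.
function makeDetach(
  proxy: ClosableSocket,
  listeners: Record<string, (event: unknown) => void>,
): () => void {
  let detached = false;
  return () => {
    if (detached) return;
    detached = true;
    if (proxy.readyState === SOCKET_OPEN) {
      proxy.close(1011, "dynamic.rebind");
    }
    for (const type of Object.keys(listeners)) {
      proxy.removeEventListener(type, listeners[type]);
    }
  };
}
```

Calling the returned closure a second time is a no-op, so a rebind followed by a shutdown would not send two close frames.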

3. Single try block swallows partial restoration failures silently

A single failure in #rebindHibernatableConnectSockets aborts the entire block, abandoning all remaining connections for that actor with no close frame sent to the gateway or client. Consider separating these into independent try/catch blocks, or at minimum closing each unrestored binding explicitly so the client receives a meaningful error.
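
A rough sketch of the per-binding isolation suggested here, assuming each binding exposes a `close(code, reason)` method. The `Binding` shape, the `rebindAll` helper, and the `"rebind.failed"` reason string are illustrative assumptions, not names from the PR.

```typescript
// Hypothetical binding shape; the real registry entries carry more state.
interface Binding {
  actorId: string;
  close(code: number, reason: string): void;
}

interface RebindResult {
  restored: Binding[];
  failed: { binding: Binding; error: unknown }[];
}

// Rebinds each binding inside its own try/catch so that one failure does not
// abandon the remaining connections. Failed bindings get an explicit close.
function rebindAll(
  bindings: Binding[],
  rebind: (binding: Binding) => void,
): RebindResult {
  const result: RebindResult = { restored: [], failed: [] };
  for (const binding of bindings) {
    try {
      rebind(binding);
      result.restored.push(binding);
    } catch (error) {
      result.failed.push({ binding, error });
      try {
        // 1011 = internal error; the reason string is an assumption.
        binding.close(1011, "rebind.failed");
      } catch {
        // The socket may already be gone; nothing more to do for this binding.
      }
    }
  }
  return result;
}
```

Returning both lists lets the caller log every failure with its actor ID instead of only the first one.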

4. actorForDispatch fallback may dispatch to a stale actor reference

When currentActor exists but is not a static actor instance, dispatch falls back to the earlier-captured actor variable, which could be a stopped or replaced instance. A comment explaining the intended fallback semantics would help, and the case where neither reference is valid should be handled explicitly rather than silently dispatching to a stale object.
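
One possible shape for making the fallback explicit is sketched below. The `ActorRef` fields (`isStatic`, `isStopped`) and the `resolveActorForDispatch` helper are assumptions made for illustration; the driver may track liveness differently.

```typescript
// Hypothetical actor handle; field names are assumptions for this sketch.
interface ActorRef {
  id: string;
  isStatic: boolean;
  isStopped: boolean;
}

// Picks the actor to dispatch to, or throws when no live reference exists,
// instead of silently dispatching to a stale object.
function resolveActorForDispatch(
  currentActor: ActorRef | undefined,
  capturedActor: ActorRef | undefined,
): ActorRef {
  // Prefer the current registry entry when it is a live static instance.
  if (currentActor && currentActor.isStatic && !currentActor.isStopped) {
    return currentActor;
  }
  // Fall back to the earlier-captured reference only if it is still live.
  if (capturedActor && !capturedActor.isStopped) {
    return capturedActor;
  }
  throw new Error("no live actor available for dispatch");
}
```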


Code Quality

5. as any casts

Multiple event.data as any casts appear when calling forwardIncomingWebSocketMessage and websocket.send. These should use the correct union type (ArrayBuffer | string) or a type guard.
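
A type guard along these lines could replace the casts. This is a sketch only: `WebSocketPayload`, `isWebSocketPayload`, and `forwardMessage` are hypothetical names, and the real signatures of `forwardIncomingWebSocketMessage` and `websocket.send` may accept additional types.

```typescript
// The union suggested in the review comment above.
type WebSocketPayload = ArrayBuffer | string;

// Narrows an unknown event payload to the supported union.
function isWebSocketPayload(data: unknown): data is WebSocketPayload {
  return typeof data === "string" || data instanceof ArrayBuffer;
}

// Forwards the payload only when it has a supported type; returns whether
// the message was forwarded so the caller can log or close on a mismatch.
function forwardMessage(
  data: unknown,
  send: (payload: WebSocketPayload) => void,
): boolean {
  if (!isWebSocketPayload(data)) {
    return false;
  }
  send(data);
  return true;
}
```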

6. Unhandled throws inside onMessage static path

In #bindHibernatableConnectSocket the static path calls wsHandler.onMessage and actor.handleInboundHibernatableWebSocketMessage without a try/catch. An uncaught exception will crash the event listener. The dynamic path wraps forwarding in .catch() — the static path should do the same.
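
As an illustration of the guard suggested for the static path, a small wrapper can cover both synchronous throws and rejected promises. `guardHandler` and `MessageHandler` are hypothetical names, and the handler signature is simplified to a string payload.

```typescript
// Simplified handler signature; the real handlers take richer arguments.
type MessageHandler = (data: string) => void | Promise<void>;

// Wraps a handler so that neither a synchronous throw nor an async rejection
// escapes the event listener. Errors are routed to onError instead.
function guardHandler(
  handler: MessageHandler,
  onError: (error: unknown) => void,
): (data: string) => void {
  return (data) => {
    try {
      const result = handler(data);
      if (result instanceof Promise) {
        result.catch(onError);
      }
    } catch (error) {
      onError(error);
    }
  };
}
```

Registering `guardHandler(handler, logAndClose)` as the listener, where `logAndClose` is whatever error path the driver already uses, would keep the static and dynamic paths consistent.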

7. Double-logging in #runnerDynamicWebSocket error path

deconstructError internally logs the error, then the call site logs it again at error level before closing. This produces duplicate log entries for the same event.


Performance

8. O(n) scans in rebind helpers

Both #rebindHibernatableConnectSockets and #rebindDynamicHibernatableRunnerWebSockets iterate all bindings to find those matching actorId. For many concurrent connections across many actors, each actor wake-up is O(total bindings). Consider indexing bindings by actorId for O(1) lookup.
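
A minimal sketch of such an index, assuming each binding has a unique string ID in addition to its `actorId`. The `BindingIndex` class and its method names are illustrative and not part of the PR.

```typescript
// Two-level map: actorId -> (bindingId -> binding). Looking up the bindings
// for one actor no longer requires scanning bindings of other actors.
class BindingIndex<B> {
  private readonly byActor = new Map<string, Map<string, B>>();

  add(actorId: string, bindingId: string, binding: B): void {
    let bindings = this.byActor.get(actorId);
    if (!bindings) {
      bindings = new Map<string, B>();
      this.byActor.set(actorId, bindings);
    }
    bindings.set(bindingId, binding);
  }

  remove(actorId: string, bindingId: string): boolean {
    const bindings = this.byActor.get(actorId);
    if (!bindings) return false;
    const removed = bindings.delete(bindingId);
    // Drop the inner map once empty so idle actors do not leak entries.
    if (bindings.size === 0) this.byActor.delete(actorId);
    return removed;
  }

  forActor(actorId: string): B[] {
    const bindings = this.byActor.get(actorId);
    return bindings ? Array.from(bindings.values()) : [];
  }

  get actorCount(): number {
    return this.byActor.size;
  }
}
```

With this layout, finding an actor's bindings on wake-up is a single map lookup, and the work done is proportional to that actor's own bindings rather than to the total across all actors.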


Test Coverage

The PR description leaves all test checklist items unchecked. There are no new tests covering:

  • Hibernation and wake-up restoring an active WebSocket connection end-to-end
  • Rebind on dynamic actor restart
  • Cleanup when rebind fails partway through
  • Force-stop behavior during the shutdown drain timeout

Given the complexity of the two new binding maps and their detach/rebind/cleanup paths, the absence of tests is the highest-risk aspect of this change. At minimum, a driver test covering the hibernation-wake-message-delivery round trip would catch regressions early.


Minor / Nits

  • The let to const fix for payload in serverlessHandleStart is a good cleanup.
  • The #disposeDynamicRuntime / #disposeAllDynamicRuntimes refactor correctly replaces the silent .clear() call and improves shutdown correctness.
  • The force-stop loop added during the drain timeout is appropriate; the structured log fields look good.

@NathanFlurry NathanFlurry force-pushed the break-up/expose-hibernation-metadata branch from 26f98bc to fde1e0b Compare April 15, 2026 02:40
@NathanFlurry NathanFlurry force-pushed the break-up/restore-hibernating-sockets branch from ccc38b5 to 8293235 Compare April 15, 2026 02:40
@NathanFlurry NathanFlurry force-pushed the break-up/expose-hibernation-metadata branch from fde1e0b to 789b9cd Compare April 15, 2026 02:50
@NathanFlurry NathanFlurry force-pushed the break-up/restore-hibernating-sockets branch 2 times, most recently from 1ba6a3b to de87a8b Compare April 15, 2026 06:55
@NathanFlurry NathanFlurry changed the base branch from break-up/expose-hibernation-metadata to graphite-base/4658 April 27, 2026 05:57
@NathanFlurry NathanFlurry force-pushed the break-up/restore-hibernating-sockets branch from de87a8b to af4bf6c Compare April 27, 2026 05:57
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4658 to break-up/persist-hibernating-requests April 27, 2026 05:57
@NathanFlurry NathanFlurry marked this pull request as ready for review April 27, 2026 06:32
@NathanFlurry NathanFlurry force-pushed the break-up/persist-hibernating-requests branch from 0626b1c to c5d1d60 Compare April 27, 2026 07:08
@NathanFlurry NathanFlurry force-pushed the break-up/restore-hibernating-sockets branch from af4bf6c to 34a6f24 Compare April 27, 2026 07:08
