Skip to content

chore(rivetkit): impl follow up review#4711

Draft
NathanFlurry wants to merge 1 commit into04-22-chore_rivetkit_core_napi_typescript_follow_up_reviewfrom
04-23-chore_rivetkit_impl_follow_up_review
Draft

chore(rivetkit): impl follow up review#4711
NathanFlurry wants to merge 1 commit into04-22-chore_rivetkit_core_napi_typescript_follow_up_reviewfrom
04-23-chore_rivetkit_impl_follow_up_review

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app Bot commented Apr 23, 2026

🚅 Deployed to the rivet-pr-4711 environment in rivet-frontend

Service Status Web Updated (UTC)
website ✅ Success (View Logs) Web Apr 24, 2026 at 5:19 am
frontend-cloud ❌ Build Failed (View Logs) Web Apr 24, 2026 at 1:32 am
kitchen-sink ❌ Build Failed (View Logs) Web Apr 24, 2026 at 1:31 am
frontend-inspector ❌ Build Failed (View Logs) Web Apr 24, 2026 at 1:26 am
mcp-hub ✅ Success (View Logs) Web Apr 23, 2026 at 11:27 pm
ladle ✅ Success (View Logs) Web Apr 23, 2026 at 11:26 pm

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 23, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

Code Review: chore(rivetkit): impl follow up review

Overview

This PR is a broad follow-up cleanup across the RivetKit stack touching: CBOR serialization correctness, actor lifecycle/grace period fixes, cancel token refactoring, queue waiter race condition fix, WebSocket tunnel message draining, SQLite storage correctness (truncate cleanup + shard page filtering), and actor runtime state migration from module globals to NAPI context bags.


Positive Highlights

  • Queue completion waiter race fix (actor/queue.rs): Registering the waiter before persisting to KV is the correct ordering. The rollback path on KV failure now also correctly removes the already-registered waiter.
  • CBOR compat layer (encoding.ts, serde.ts): The encodeJsonCompatValue/reviveJsonCompatValue round-trip with prefix escaping is a clean design that avoids collisions between payload arrays and tagged tuples.
  • Module-global to NAPI runtime state bag (native.ts): Moving nativePersistStateByActorId, nativeActorVars, etc. off module-level Maps onto the per-context runtimeState() bag fixes the stale-generation problem on same-key actor restarts.
  • Cancel token refactor: Eliminating the global CANCEL_TOKENS: SccHashMap registry in favour of passing CancellationToken objects directly removes a global leak source and simplifies call sites.
  • Shard page filtering (compaction/shard.rs): The row.pgno <= head.db_size_pages guard correctly prevents truncated pages from being incorporated into shards. The new test compact_shard_discards_pages_above_eof provides direct coverage.
  • Grace period dispatch (task.rs): Allowing accepting_dispatch and fire_due_alarms to fire during SleepGrace/DestroyGrace aligns with the invariant that the actor stays alive until grace completes.
  • Gateway tunnel message drain (pegboard-gateway2): Draining buffered tunnel messages before spawning the forward task prevents a race where messages queued between resend_pending_websocket_messages and task start could be dropped.

Issues and Suggestions

1. encodeJsonCompatValue - escape logic may miss nested sentinel arrays

File: rivetkit-typescript/packages/rivetkit/src/common/encoding.ts

The escape check runs on the already-encoded output and guards encoded[0], but encoded[1] is left unescaped. A round-trip test with a payload like [["$BigInt", "42"], something] would verify that the second element of a user array cannot become a misinterpreted sentinel in reviveJsonCompatValue. Tests covering nested dollar-sign-prefixed 2-element user arrays would increase confidence.

2. reviveJsonCompatValue - unknown sentinel throws instead of degrading

File: rivetkit-typescript/packages/rivetkit/src/common/encoding.ts

An unknown sentinel from a newer peer throws rather than degrades gracefully. A comment clarifying whether this strict-reject behavior is intentional would help future maintainers understand whether forward-compatibility across protocol versions is a concern.

3. handle_stop - LiveExit::Sleep branch silently drops reply_tx

File: rivetkit-rust/packages/rivetkit-core/src/actor/task.rs

A Sleep exit in the handle_stop grace loop returns Ok(()) and drops reply_tx, leaving the caller waiting on reply_rx indefinitely or until drop. This should either be unreachable!() (if Sleep cannot occur during handle_stop) or should explicitly send an error on the reply channel to avoid a silent hang.

4. drain_ready_tunnel_messages - ordering assumption undocumented

File: engine/packages/pegboard-gateway2/src/lib.rs

The drain runs after resend_pending_websocket_messages but before the forward task spawns. A comment explaining why all pre-reconnect messages are guaranteed to already be in msg_rx at this point would help future readers assess whether any delivery window remains.

5. runtime_state - Ref drop-safety without active Env

File: rivetkit-typescript/packages/rivetkit-napi/src/actor_context.rs

The Ref<()> stored in runtime_state is released via reset_runtime_state. If ActorContext is dropped without reset_runtime_state being called (e.g. an unexpected shutdown path), the Ref<()> would drop outside an active NAPI Env, which is undefined behavior in NAPI. Consider whether this path is guarded by construction or whether Drop for ActorContextShared needs special handling.

6. #[doc(hidden)] pub on core lifecycle methods

File: rivetkit-rust/packages/rivetkit-core/src/actor/context.rs

The NAPI bridge methods (set_ready, set_started, ready, started) are now pub with #[doc(hidden)]. A brief inline comment noting these are NAPI-bridge-only would reinforce the boundary co-located with the code, since #[doc(hidden)] does not prevent external callers.

7. Test timing assumption in sleep idle window tests

File: rivetkit-rust/packages/rivetkit-core/src/actor/sleep.rs

The advance(Duration::from_millis(1)).await added to the idle window tests couples the assertion to internal timing. A comment explaining why the advance is needed (to flush the sleep_until arm after the idle_notify refactor) would prevent future readers from incorrectly removing it.


Minor / Style

  • encodeJsonCompatValue/reviveJsonCompatValue lack dedicated unit tests for the sentinel escaping logic. Round-trip tests covering BigInt, ArrayBuffer, Uint8Array, undefined, and literal dollar-sign-prefixed 2-element arrays would catch subtle edge cases.
  • invalidate_v1_migration in takeover.rs returns bool with no documentation on what false means. A rename to try_invalidate_v1_migration or an inline doc comment would clarify intent.
  • setupTest in src/test/mod.ts calls registry.start() without an idempotency guard. Tests that reuse a registry instance could start it twice.
  • drain_ready_tunnel_messages uses a loop/match/try_recv pattern. A while let Ok(msg) = msg_rx.try_recv() would be more idiomatic Rust for the happy path.

Summary

A high-quality correctness cleanup. The queue waiter ordering fix, CBOR compat layer, shard page truncation guard, and runtime state migration are all sound. The main items worth addressing before merge are: verifying CBOR sentinel escape coverage for nested 2-element arrays (issue 1), and the silent drop of reply_tx in the LiveExit::Sleep branch of handle_stop (issue 3). The NAPI Ref drop-safety concern (issue 5) is worth a second look depending on the guaranteed shutdown call path.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4711

All packages published as 0.0.0-pr.4711.c1e3bef with tag pr-4711.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-c1e3bef
docker pull rivetdev/engine:full-c1e3bef
Individual packages
npm install rivetkit@pr-4711
npm install @rivetkit/react@pr-4711
npm install @rivetkit/rivetkit-napi@pr-4711
npm install @rivetkit/workflow-engine@pr-4711

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant