Skip to content

chore(rivetkit): wasm support#4860

Draft
NathanFlurry wants to merge 42 commits into04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizationsfrom
04-29-chore_rivetkit_wasm_support
Draft

chore(rivetkit): wasm support#4860
NathanFlurry wants to merge 42 commits into04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizationsfrom
04-29-chore_rivetkit_wasm_support

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 30, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from c2da5e3 to cc67013 Compare April 30, 2026 20:24
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Review: chore(rivetkit): wasm support

This is a substantial Phase 1 PR that adds remote SQLite execution (v4 envoy protocol) and a WASM runtime path (rivetkit-wasm). The architecture is well-designed — the split into native-transport/wasm-transport features, the typed ProtocolCompatibilityError hierarchy, and the RAII inflight guard pattern are all solid. A few issues below.


Bugs / Correctness

WasmActorConfig has dead fields that are silently dropped

rivetkit-typescript/packages/rivetkit-wasm/src/lib.rs defines create_state_timeout_ms, on_create_timeout_ms, on_wake_timeout_ms, on_before_actor_start_timeout_ms, and on_request_timeout_ms in WasmActorConfig, but the From<WasmActorConfig> for ActorConfigInput conversion doesn't map them (they don't exist in ActorConfigInput). Users who set these will get silently ignored config. Either remove them from WasmActorConfig or, if they should be supported, add them to ActorConfigInput with corresponding NAPI exposure.

remote_sqlite_inflight_counter is incremented before the semaphore is acquired

In spawn_tracked_remote_sqlite_task (ws_to_tunnel_task.rs), the inflight counter increments at track_remote_sqlite_inflight before worker_permits.acquire_owned().await. If all 32 worker slots are full, the counter will be non-zero while the task is queued waiting for a permit — so actor_stopped will block for the full remote_sqlite_stop_timeout even if the actual SQL work hasn't started. This is the safer direction (avoid closing the DB while tasks are pending), but worth documenting explicitly since it can make actor stop slower than expected under load.

actor_stopped spawned unconditionally without structured cleanup on drain timeout

The caller in ToRivetEvents now unconditionally spawns actor_stopped and drops the handle. If the SQLite drain times out (the ensure! in actor_lifecycle.rs), the error is logged but the active_actors entry is never removed and the executor cell leaks for the lifetime of the connection. The previous in-line call also swallowed errors with warn!, but consider whether the executor/inflight cleanup should still run on drain timeout to avoid accumulated state.


Convention Violations

Inline #[cfg(test)] mod tests in rivetkit-sqlite-types/src/lib.rs

CLAUDE.md: Rust tests live under tests/, not inline #[cfg(test)] mod tests in src/.
The two tests (execute_result_preserves_result_and_route_metadata and execute_result_projects_query_and_exec_results) should move to rivetkit-rust/packages/rivetkit-sqlite-types/tests/.

Missing doc comment on ActorConfig::remote_sqlite

The adjacent has_database field has a full explanation of what it controls and why. remote_sqlite has none. Add a one-liner explaining it selects SqliteBackend::RemoteEnvoy vs LocalNative.


Performance / Design Notes

handle_message clones Arc<Conn> for every message

The signature change from &Conn to Arc<Conn> adds an atomic refcount inc/dec per parsed message, even for the majority of message types that never need the Arc (all existing SQLite page I/O paths just reborrow with &conn). Consider keeping the parameter as &Arc<Conn> and only cloning in the ActorStateStopped and remote SQL dispatch branches.

Global 32-worker limit per connection, not per actor

REMOTE_SQLITE_WORKER_LIMIT = 32 is per-envoy-connection. A single high-traffic actor can saturate the pool. This is a reasonable starting point, but worth a TODO comment flagging per-actor limiting as a follow-up (the spec already notes this gap).


What's well done

  • Typed ProtocolCompatibilityError (versioned.rs): replacing bail!("sqlite requests require envoy-protocol v2") with a structured error carrying feature, direction, required_version, and target_version is a clear improvement. The compat tests in remote_sql_compat.rs cover all four cross-version directions cleanly.
  • compile_error! mutual-exclusion guards in connection/mod.rs: enforcing exactly one transport feature at compile time is the right approach.
  • RAII RemoteSqliteInflightGuard: incrementing on construction and decrementing on Drop is correct and doesn't require the spawned task to manage cleanup explicitly.
  • OnceCell per (actor_id, generation) executor: lazy init on first SQL request, reused for the generation lifetime, torn down on actor stop — exactly what the spec describes.
  • runtime-import-guard.test.ts: enforcing that raw NAPI/WASM binding imports stay behind the two adapter files is an excellent lightweight architectural lint.
  • rivetkit-sqlite-types extraction: removing the duplicated stub enums (BindParam, ColumnValue, ExecuteResult) under #[cfg(not(feature = "sqlite"))] in favor of a shared crate is a clean improvement.
  • open_database_from_engine generation fence check: the ensure! on generation equality before registering the VFS is the correct guard against stale executor reuse.
  • validate_remote_sqlite_request trust boundary: the namespace + actor state + generation checks are well-placed at the envoy edge per the trust boundary conventions.

Minor

  • scripts/ralph/prd.json, scripts/ralph/progress.txt, scripts/ralph/.last-branch, and the archived copies under scripts/ralph/archive/ look like agent working state included accidentally.
  • .mcp.json deletion is unrelated to the feature; confirm it's intentional.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from cc67013 to 716eee0 Compare April 30, 2026 20:29
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: WASM Support + Remote SQLite Execution

This PR adds WASM support for rivetkit-core and introduces remote SQLite execution over the envoy protocol (v4). The architecture is well-structured and the invariants are clearly documented in .agent/specs/rivetkit-core-wasm-support.md. Overall this is a solid implementation — feedback below is mostly medium/minor.


Protocol versioning (engine/sdks/rust/envoy-protocol/src/versioned.rs)

_ => Ok(()) on owned enum variants — CLAUDE.md violation

ensure_to_envoy_v1_compatible and ensure_to_rivet_v1_compatible (and their v2/v3 variants) use _ => Ok(()) as a catch-all on v4::ToEnvoy / v4::ToRivet. Per CLAUDE.md, wildcard fall-through on owned enum types means a future protocol message variant could silently pass a compatibility gate without a compile-time reminder to update the exclusion list.

// ensure_to_envoy_v1_compatible line 128
_ => Ok(()),  // new v5 message types would slip through here

Recommendation: enumerate every variant explicitly in each ensure_*_compatible function, or at minimum document that the _ => Ok(()) arm is deliberately inclusive and describe the invariant that makes it safe (e.g. "all v4 messages not listed above existed in v1").

_ => bail!(...) on v1::ToEnvoy in convert_to_envoy_v1_to_v2

Same rule. v1:: is a published, frozen schema so new variants won't appear in practice, but the same lint applies. _ => unreachable!() would at least make the assertion explicit.


Error classification (rivetkit-rust/packages/rivetkit-core/src/actor/sqlite.rs)

remote_sqlite_error_response uses string matching to distinguish "unavailable" from "execution failed":

fn remote_sqlite_error_response(message: String) -> anyhow::Error {
    if message.contains("unavailable") || message.contains("unsupported") {
        return SqliteRuntimeError::RemoteUnavailable { reason: message }.build();
    }
    SqliteRuntimeError::RemoteExecutionFailed { message }.build()
}

This is fragile — changing the error message wording in the envoy would silently reclassify errors. Since the protocol already has typed union responses (SqliteExecResponse = SqliteExecOk | SqliteFenceMismatch | SqliteErrorResponse), consider wrapping the error earlier in the response-to-anyhow conversion with a typed sentinel (similar to how ProtocolCompatibilityError is used for the unavailable-version case) so callers can downcast_ref deterministically instead of scanning strings.


Message serialization panic (engine/sdks/rust/envoy-client/src/connection/mod.rs)

let encoded = crate::protocol::versioned::ToRivet::wrap_latest(message)
    .serialize(protocol::PROTOCOL_VERSION)
    .expect("failed to encode message");  // panics instead of returning error

The .expect here would take down the entire envoy connection on an unexpected serialization failure. A tracing::error! + early return (or propagating with ?) would be more graceful.


runtime.rs duplicated #[cfg] blocks

native-runtime and the "neither runtime feature" fallback have identical bodies for RuntimeBoxFuture, RuntimeFuture, RuntimeFutureOutput, and RuntimeSpawner::spawn. The duplication compiles correctly but creates maintenance surface. A single #[cfg(not(feature = "wasm-runtime"))] guard would cover both without repeating:

#[cfg(not(feature = "wasm-runtime"))]
pub type RuntimeBoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

Minor: commented-out code in test support

engine/packages/pegboard-envoy/tests/support/ws_to_tunnel_task.rs has dead code left over from development:

// use scc::HashMap;
// let authorized_tunnel_routes = HashMap::new();
// let authorized_tunnel_routes = HashMap::new();

These should be removed.


What looks good

  • Protocol compatibility gating: ProtocolCompatibilityError { feature: RemoteSqliteExecution, required_version: 4 } is a clean, typed mechanism; rivetkit-core correctly maps it to SqliteRuntimeError::RemoteUnavailable with downcast_ref.
  • Input validation: 1 MiB SQL limit, 1024 param limit, and 1 MiB bind-param limit in ws_to_tunnel_task.rs are applied before the request reaches the executor.
  • Indeterminate result semantics: fail_sent_remote_sqlite_requests_with_indeterminate_result correctly separates "unsent, safe to retry" from "sent, outcome unknown" on WS disconnect.
  • AsyncCounter: wait_zero correctly arms the Notify permit before the counter check, avoiding the race described in CLAUDE.md ("arm the notification before re-checking").
  • Lock choices: parking_lot::Mutex in actor/sqlite.rs has justifying comments ("Forced-sync: native SQLite handles are read from synchronous diagnostic accessors..."). scc::HashMap is used correctly for the actor map in ws_to_tunnel_task.rs.
  • Feature gating: wasm-runtime / sqlite-local gates are consistently applied and connection/mod.rs uses compile_error! to catch mutually exclusive feature combinations at build time.
  • Test coverage: remote_execution_parity.rs and rivetkit-core/tests/sqlite.rs cover protocol conversion, error mapping, and backend selection across feature combinations.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

PR #4860 Review: chore(rivetkit): wasm support

Overview

This large draft PR (~24,000 additions, ~2,600 deletions across ~100 files) adds WebAssembly support to RivetKit, with two main phases:

  • Phase 1: Remote SQLite execution via a new envoy protocol v4, allowing actors to delegate SQL to pegboard-envoy.
  • Phase 2: A wasm-bindgen-based path enabling rivetkit-core to run on edge runtimes (Cloudflare Workers, Deno Deploy, Supabase Edge Functions) without a native binary.

Issues

Medium

1. Stopping state accepts new remote SQL requests

In ws_to_tunnel_task.rs inside validate_remote_sqlite_active_generation, the Stopping arm is a no-op. The spec says actor stop rejects new SQL after stopping begins, but the implementation currently accepts new requests during Stopping. While the in-flight drain in actor_lifecycle.rs prevents SQLite from closing while requests are active, this creates a window where new requests can arrive just before the close deadline expires. The validation gate should return a protocol error during Stopping to be consistent with the spec.

2. Missing field mappings in WasmActorConfig to ActorConfigInput

WasmActorConfig exposes create_state_timeout_ms, on_create_timeout_ms, on_wake_timeout_ms, on_before_actor_start_timeout_ms, and on_request_timeout_ms, but the From<WasmActorConfig> for ActorConfigInput implementation does not wire them through. If intentionally excluded, a comment is needed; if an oversight, they should be mapped.

3. Unbounded WASM channels in connection/wasm.rs

Both event_tx and ws_tx use mpsc::unbounded_channel. In a WASM single-threaded event loop the sender and receiver share the same thread -- if the JS event loop delivers messages faster than the Rust async executor can drain them, memory grows without bound. A bounded channel with a large-but-finite capacity, or at minimum a comment justifying unbounded, is needed.

4. cleanup_old_sqlite_requests reuses the KV expiry timeout for SQLite

SQLite page-IO requests (large commits, cold reads) may legitimately take longer than the KV expiry window. Using KV_EXPIRE_MS for SQLite timeouts can silently discard in-flight requests that are still pending at the engine. A dedicated SQLITE_EXPIRE_MS constant is safer.

5. remote_sqlite_error_response string-matching heuristic

message.contains("unavailable") || message.contains("unsupported") to detect RemoteUnavailable errors is fragile. If the engine returns a message containing those substrings for an unrelated reason, it will be silently misclassified. The engine should return a structured error code rather than rely on substring matching.

Low / Style

6. Silent sleep fallback to immediate resolution

In both connection/wasm.rs and rivetkit-core/src/lib.rs::time::sleep, if Reflect::get("setTimeout") fails (non-browser WASM runtime), resolve.call0 fires immediately, turning the reconnect backoff into a busy-loop. A tracing::warn! on the fallback path would prevent silent reconnect storms in production.

7. Duplicate sleep implementation

The setTimeout-based sleep function is byte-for-byte identical in connection/wasm.rs and rivetkit-core/src/lib.rs::time. This should be consolidated into a shared utility.

8. Dead RuntimeFuture/RuntimeFutureOutput impls for the no-feature fallback case (runtime.rs)

The not(any(...)) variants are identical to the native-runtime variants, but lib.rs already has a compile_error! that fires when neither feature is active. Those fallback impls are unreachable dead code and should be removed or documented as compile-time scaffolding.

9. wasm-transport feature disables helper functions on non-wasm32 targets

send_initial_metadata, forward_to_envoy, and ws_url in connection/mod.rs are gated with cfg(any(feature = "native-transport", all(feature = "wasm-transport", target_arch = "wasm32"))). Enabling wasm-transport during a cross-compile check targeting x86_64 (common in CI) will cause a compile failure. A compile_error! or doc note explaining this constraint would prevent confusion.

10. unwrap_latest bail message lacks version number (versioned.rs)

When deserializing a V1/V2/V3 wire payload and calling unwrap_latest directly, the caller gets a generic "version not latest" error with no indication of the actual version received. The error message should include the actual version.

11. Namespace matching heuristic in validate_remote_sqlite_request needs a comment

namespace_id == conn.namespace_id.to_string() || namespace_id == conn.namespace_name.as_str() accepts either UUID or name string. The logic is safe (both sides come from the authenticated connection), but a comment explaining why both forms are accepted would reduce future reviewer confusion.


Positive Observations

  • Compile-time mutual-exclusivity guards on native-runtime + wasm-runtime and native-transport + wasm-transport feature flag combinations are correct and enforced throughout.
  • Full enum variant matching (no _ => catch-alls) on ToEnvoy/ToRivet dispatch arms in versioned.rs -- new variants added in the future will be caught at compile time, per CLAUDE.md conventions.
  • Explicit compatibility errors on V4->V3 downgrades with ensure_to_envoy_v3_compatible / ensure_to_rivet_v3_compatible guards enumerate all variants correctly.
  • MAX_REMOTE_SQL_BYTES, MAX_REMOTE_SQL_PARAMS, MAX_REMOTE_SQL_BIND_BYTES constants in ws_to_tunnel_task.rs are clearly defined and consistently enforced; saturating_add is used for per-param byte accounting on untrusted input.
  • Comprehensive remote SQLite test coverage: the four test cases cover round-trip, shutdown, sent-disconnect, and unsent-survive-reconnect transitions, plus remote_sqlite_requests_reject_protocol_v3_serialization directly validates the versioning contract.
  • ActorStateStopped handler in ws_to_tunnel_task.rs uses tokio::spawn with a warn on failure, keeping the read loop live for concurrent in-flight SQL frames.

This review was auto-generated. PR is still in DRAFT; issues are flagged for pre-merge consideration.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 716eee0 to 8da73a1 Compare April 30, 2026 20:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4860

All packages published as 0.0.0-pr.4860.f43cf47 with tag pr-4860.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-f43cf47
docker pull rivetdev/engine:full-f43cf47
Individual packages
npm install rivetkit@pr-4860
npm install @rivetkit/react@pr-4860
npm install @rivetkit/rivetkit-napi@pr-4860
npm install @rivetkit/workflow-engine@pr-4860

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from 06acbb1 to 6213c3a Compare May 2, 2026 00:13
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 6213c3a to 4966b21 Compare May 2, 2026 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant