Skip to content

chore(rivetkit): wasm support#4860

Draft
NathanFlurry wants to merge 23 commits into04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizationsfrom
04-29-chore_rivetkit_wasm_support
Draft

chore(rivetkit): wasm support#4860
NathanFlurry wants to merge 23 commits into04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizationsfrom
04-29-chore_rivetkit_wasm_support

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 30, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from c2da5e3 to cc67013 Compare April 30, 2026 20:24
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Review: chore(rivetkit): wasm support

This is a substantial Phase 1 PR that adds remote SQLite execution (v4 envoy protocol) and a WASM runtime path (rivetkit-wasm). The architecture is well-designed — the split into native-transport/wasm-transport features, the typed ProtocolCompatibilityError hierarchy, and the RAII inflight guard pattern are all solid. A few issues below.


Bugs / Correctness

WasmActorConfig has dead fields that are silently dropped

rivetkit-typescript/packages/rivetkit-wasm/src/lib.rs defines create_state_timeout_ms, on_create_timeout_ms, on_wake_timeout_ms, on_before_actor_start_timeout_ms, and on_request_timeout_ms in WasmActorConfig, but the From<WasmActorConfig> for ActorConfigInput conversion doesn't map them (they don't exist in ActorConfigInput). Users who set these will get silently ignored config. Either remove them from WasmActorConfig or, if they should be supported, add them to ActorConfigInput with corresponding NAPI exposure.

remote_sqlite_inflight_counter is incremented before the semaphore is acquired

In spawn_tracked_remote_sqlite_task (ws_to_tunnel_task.rs), the inflight counter increments at track_remote_sqlite_inflight before worker_permits.acquire_owned().await. If all 32 worker slots are full, the counter will be non-zero while the task is queued waiting for a permit — so actor_stopped will block for the full remote_sqlite_stop_timeout even if the actual SQL work hasn't started. This is the safer direction (avoid closing the DB while tasks are pending), but worth documenting explicitly since it can make actor stop slower than expected under load.

actor_stopped spawned unconditionally without structured cleanup on drain timeout

The caller in ToRivetEvents now unconditionally spawns actor_stopped and drops the handle. If the SQLite drain times out (the ensure! in actor_lifecycle.rs), the error is logged but the active_actors entry is never removed and the executor cell leaks for the lifetime of the connection. The previous in-line call also swallowed errors with warn!, but consider whether the executor/inflight cleanup should still run on drain timeout to avoid accumulated state.


Convention Violations

Inline #[cfg(test)] mod tests in rivetkit-sqlite-types/src/lib.rs

CLAUDE.md: Rust tests live under tests/, not inline #[cfg(test)] mod tests in src/.
The two tests (execute_result_preserves_result_and_route_metadata and execute_result_projects_query_and_exec_results) should move to rivetkit-rust/packages/rivetkit-sqlite-types/tests/.

Missing doc comment on ActorConfig::remote_sqlite

The adjacent has_database field has a full explanation of what it controls and why. remote_sqlite has none. Add a one-liner explaining it selects SqliteBackend::RemoteEnvoy vs LocalNative.


Performance / Design Notes

handle_message clones Arc<Conn> for every message

The signature change from &Conn to Arc<Conn> adds an atomic refcount inc/dec per parsed message, even for the majority of message types that never need the Arc (all existing SQLite page I/O paths just reborrow with &conn). Consider keeping the parameter as &Arc<Conn> and only cloning in the ActorStateStopped and remote SQL dispatch branches.

Global 32-worker limit per connection, not per actor

REMOTE_SQLITE_WORKER_LIMIT = 32 is per-envoy-connection. A single high-traffic actor can saturate the pool. This is a reasonable starting point, but worth a TODO comment flagging per-actor limiting as a follow-up (the spec already notes this gap).


What's well done

  • Typed ProtocolCompatibilityError (versioned.rs): replacing bail!("sqlite requests require envoy-protocol v2") with a structured error carrying feature, direction, required_version, and target_version is a clear improvement. The compat tests in remote_sql_compat.rs cover all four cross-version directions cleanly.
  • compile_error! mutual-exclusion guards in connection/mod.rs: enforcing exactly one transport feature at compile time is the right approach.
  • RAII RemoteSqliteInflightGuard: incrementing on construction and decrementing on Drop is correct and doesn't require the spawned task to manage cleanup explicitly.
  • OnceCell per (actor_id, generation) executor: lazy init on first SQL request, reused for the generation lifetime, torn down on actor stop — exactly what the spec describes.
  • runtime-import-guard.test.ts: enforcing that raw NAPI/WASM binding imports stay behind the two adapter files is an excellent lightweight architectural lint.
  • rivetkit-sqlite-types extraction: removing the duplicated stub enums (BindParam, ColumnValue, ExecuteResult) under #[cfg(not(feature = "sqlite"))] in favor of a shared crate is a clean improvement.
  • open_database_from_engine generation fence check: the ensure! on generation equality before registering the VFS is the correct guard against stale executor reuse.
  • validate_remote_sqlite_request trust boundary: the namespace + actor state + generation checks are well-placed at the envoy edge per the trust boundary conventions.

Minor

  • scripts/ralph/prd.json, scripts/ralph/progress.txt, scripts/ralph/.last-branch, and the archived copies under scripts/ralph/archive/ look like agent working state included accidentally.
  • .mcp.json deletion is unrelated to the feature; confirm it's intentional.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from cc67013 to 716eee0 Compare April 30, 2026 20:29
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: WASM Support + Remote SQLite Execution

This PR adds WASM support for rivetkit-core and introduces remote SQLite execution over the envoy protocol (v4). The architecture is well-structured and the invariants are clearly documented in .agent/specs/rivetkit-core-wasm-support.md. Overall this is a solid implementation — feedback below is mostly medium/minor.


Protocol versioning (engine/sdks/rust/envoy-protocol/src/versioned.rs)

_ => Ok(()) on owned enum variants — CLAUDE.md violation

ensure_to_envoy_v1_compatible and ensure_to_rivet_v1_compatible (and their v2/v3 variants) use _ => Ok(()) as a catch-all on v4::ToEnvoy / v4::ToRivet. Per CLAUDE.md, wildcard fall-through on owned enum types means a future protocol message variant could silently pass a compatibility gate without a compile-time reminder to update the exclusion list.

// ensure_to_envoy_v1_compatible line 128
_ => Ok(()),  // new v5 message types would slip through here

Recommendation: enumerate every variant explicitly in each ensure_*_compatible function, or at minimum document that the _ => Ok(()) arm is deliberately inclusive and describe the invariant that makes it safe (e.g. "all v4 messages not listed above existed in v1").

_ => bail!(...) on v1::ToEnvoy in convert_to_envoy_v1_to_v2

Same rule. v1:: is a published, frozen schema so new variants won't appear in practice, but the same lint applies. _ => unreachable!() would at least make the assertion explicit.


Error classification (rivetkit-rust/packages/rivetkit-core/src/actor/sqlite.rs)

remote_sqlite_error_response uses string matching to distinguish "unavailable" from "execution failed":

fn remote_sqlite_error_response(message: String) -> anyhow::Error {
    if message.contains("unavailable") || message.contains("unsupported") {
        return SqliteRuntimeError::RemoteUnavailable { reason: message }.build();
    }
    SqliteRuntimeError::RemoteExecutionFailed { message }.build()
}

This is fragile — changing the error message wording in the envoy would silently reclassify errors. Since the protocol already has typed union responses (SqliteExecResponse = SqliteExecOk | SqliteFenceMismatch | SqliteErrorResponse), consider wrapping the error earlier in the response-to-anyhow conversion with a typed sentinel (similar to how ProtocolCompatibilityError is used for the unavailable-version case) so callers can downcast_ref deterministically instead of scanning strings.


Message serialization panic (engine/sdks/rust/envoy-client/src/connection/mod.rs)

let encoded = crate::protocol::versioned::ToRivet::wrap_latest(message)
    .serialize(protocol::PROTOCOL_VERSION)
    .expect("failed to encode message");  // panics instead of returning error

The .expect here would take down the entire envoy connection on an unexpected serialization failure. A tracing::error! + early return (or propagating with ?) would be more graceful.


runtime.rs duplicated #[cfg] blocks

native-runtime and the "neither runtime feature" fallback have identical bodies for RuntimeBoxFuture, RuntimeFuture, RuntimeFutureOutput, and RuntimeSpawner::spawn. The duplication compiles correctly but creates maintenance surface. A single #[cfg(not(feature = "wasm-runtime"))] guard would cover both without repeating:

#[cfg(not(feature = "wasm-runtime"))]
pub type RuntimeBoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

Minor: commented-out code in test support

engine/packages/pegboard-envoy/tests/support/ws_to_tunnel_task.rs has dead code left over from development:

// use scc::HashMap;
// let authorized_tunnel_routes = HashMap::new();
// let authorized_tunnel_routes = HashMap::new();

These should be removed.


What looks good

  • Protocol compatibility gating: ProtocolCompatibilityError { feature: RemoteSqliteExecution, required_version: 4 } is a clean, typed mechanism; rivetkit-core correctly maps it to SqliteRuntimeError::RemoteUnavailable with downcast_ref.
  • Input validation: 1 MiB SQL limit, 1024 param limit, and 1 MiB bind-param limit in ws_to_tunnel_task.rs are applied before the request reaches the executor.
  • Indeterminate result semantics: fail_sent_remote_sqlite_requests_with_indeterminate_result correctly separates "unsent, safe to retry" from "sent, outcome unknown" on WS disconnect.
  • AsyncCounter: wait_zero correctly arms the Notify permit before the counter check, avoiding the race described in CLAUDE.md ("arm the notification before re-checking").
  • Lock choices: parking_lot::Mutex in actor/sqlite.rs has justifying comments ("Forced-sync: native SQLite handles are read from synchronous diagnostic accessors..."). scc::HashMap is used correctly for the actor map in ws_to_tunnel_task.rs.
  • Feature gating: wasm-runtime / sqlite-local gates are consistently applied and connection/mod.rs uses compile_error! to catch mutually exclusive feature combinations at build time.
  • Test coverage: remote_execution_parity.rs and rivetkit-core/tests/sqlite.rs cover protocol conversion, error mapping, and backend selection across feature combinations.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

PR #4860 Review: chore(rivetkit): wasm support

Overview

This is a large, multi-phase draft PR (~18,800 additions, ~2,100 deletions, ~100 files) adding WebAssembly support to RivetKit.

  • Phase 1: Remote SQLite SQL execution via a new envoy protocol v4, allowing actors to delegate SQL to pegboard-envoy.
  • Phase 2: A wasm-bindgen-based @rivetkit/rivetkit-wasm package enabling rivetkit-core to run on edge runtimes (Cloudflare Workers, Deno Deploy, Supabase Edge Functions) with no native binary.

Issues

1. Stopping state accepts new remote SQL requests

In ws_to_tunnel_task.rs inside validate_remote_sqlite_active_generation, the Stopping arm is a no-op. The spec says actor stop rejects new SQL after stopping begins, but the implementation currently accepts new requests during Stopping. While the in-flight drain in actor_lifecycle.rs prevents SQLite from closing while requests are active, this creates a window where new requests arrive just before the close deadline runs out. The validation gate should return a protocol error for Stopping to be consistent with the spec.

2. Missing field mappings in WasmActorConfig to ActorConfigInput

WasmActorConfig exposes create_state_timeout_ms, on_create_timeout_ms, on_wake_timeout_ms, on_before_actor_start_timeout_ms, and on_request_timeout_ms, but the From<WasmActorConfig> for ActorConfigInput implementation does not map them. If intentionally excluded, a comment is needed; if an oversight, they should be wired through.

3. unsafe impl Send/Sync for WasmFunction needs a cfg guard

These impls are sound for wasm32-unknown-unknown (single JS thread + LocalSet) but would be unsound on wasm32-wasi-threads or any truly multi-threaded wasm target. Adding a #[cfg(target_arch = "wasm32")] guard would make the soundness precondition explicit.

4. leak_str creates unbounded static allocations

Used to build RivetErrorSchema with &'static str fields. Leaks accumulate one entry per unique error group/code pair observed at runtime. In practice bounded, but the function has no comment explaining why leaking is necessary or what bounds the growth.

5. Indentation inconsistency in WasmServeConfig::from

Fields starting with handle_inspector_http_in_runtime in lib.rs have extra leading tabs vs. the surrounding struct literal. Cosmetic but violates Rust hard-tab consistency.

6. WASM transport (connection/wasm.rs) not implemented

connection/mod.rs references a WASM transport gated behind wasm-transport, but connection/wasm.rs does not appear in the PR. Phase 2 (WASM actors connecting to pegboard-envoy over WebSocket) is not shippable from this PR alone. Worth calling out explicitly in the PR description.


Positive Observations

  • Protocol versioning: New v4 types are additive; no existing *.bare schemas are mutated. Correct.
  • AsyncCounter: Arms the Notify permit before re-checking the counter value in wait_zero, correctly avoiding the decrement-to-zero race described in CLAUDE.md.
  • Concurrent maps: RemoteSqliteExecutors uses scc::HashMap, not Mutex<HashMap>. Correct.
  • Bounded worker pool: Semaphore(32) correctly caps concurrent remote SQL workers per connection.
  • Compile-time transport exclusion: compile_error! macros in connection/mod.rs prevent invalid native-transport + wasm-transport combinations.
  • WASM build gate script: check-rivetkit-core-wasm.sh checks both dependency tree and feature graph for native leakage, and verifies native features fail on wasm32-unknown-unknown.
  • Runtime import guard test: runtime-import-guard.test.ts statically enforces that only napi-runtime.ts and wasm-runtime.ts may import raw binding packages.
  • RegistryState match arms: All variants enumerated explicitly; no _ => fallthrough. Correct.
  • Bridge error protocol: __RIVET_ERROR_JSON__: correctly routes structured errors across the Rust/JS boundary in both directions.
  • wasm_local_sqlite error: Typed RivetError with public: true, statusCode: 400 at config time. Correct.
  • vi.waitFor comments: All calls in new test files have adjacent // lines explaining why polling is necessary.

Summary

The architecture is sound and the implementation is largely correct. Main items to address before merging: (1) the Stopping-state SQL acceptance gap vs. the spec, (2) missing WasmActorConfig field mappings, and (3) unsafe impl cfg guard. The incomplete WASM transport is expected for a draft but should be clearly scoped in the PR description.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 716eee0 to 8da73a1 Compare April 30, 2026 20:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4860

All packages published as 0.0.0-pr.4860.b5fe23b with tag pr-4860.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-b5fe23b
docker pull rivetdev/engine:full-b5fe23b
Individual packages
npm install rivetkit@pr-4860
npm install @rivetkit/react@pr-4860
npm install @rivetkit/rivetkit-napi@pr-4860
npm install @rivetkit/workflow-engine@pr-4860

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 8da73a1 to 06acbb1 Compare May 1, 2026 11:16
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 06acbb1 to 6213c3a Compare May 2, 2026 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant