feat: US-001 - Define SqliteKv trait in rivetkit-sqlite-native by NathanFlurry · Pull Request #4584 · rivet-dev/rivet

NathanFlurry · 2026-04-07T17:08:23Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

railway-app · 2026-04-07T17:08:36Z

🚅 Deployed to the rivet-pr-4584 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
website	😴 Sleeping (View Logs)	Web	Apr 8, 2026 at 9:31 am
frontend-inspector	❌ Build Failed (View Logs)	Web	Apr 7, 2026 at 5:43 pm
frontend-cloud	❌ Build Failed (View Logs)	Web	Apr 7, 2026 at 5:43 pm
kitchen-sink	❌ Build Failed (View Logs)	Web	Apr 7, 2026 at 5:43 pm
mcp-hub	✅ Success (View Logs)	Web	Apr 7, 2026 at 5:09 pm
ladle	❌ Build Failed (View Logs)	Web	Apr 7, 2026 at 5:09 pm

NathanFlurry · 2026-04-07T17:08:41Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

chore: remove udb as a dependency of envoy-client #4587 : 2 dependent PRs (#4588 , #4591 )
chore: revert actor v2 hack #4586
fix: fix pool metadata protocol version refresh, rivetkit native ws #4585
feat: US-001 - Define SqliteKv trait in rivetkit-sqlite-native #4584 👈 (View in Graphite)
feat(rust/engine-runner): minimal rust engine runner #4289
chore: delete kv channel #4583
feat(engine): rust runner #4582
chore: remove non-engine drivers #4574 : 1 other dependent PR (#4581 )
feat: dynamic actors #4397
fix: driver test suite & sqlite reliability #4559
fix: remove serverless token #4572
fix: misc token fixes #4569
fix(pb): clean up actor stop decision handling #4566
fix(envoy): use global instance, add signal handlers #4565
chore: misc fixes, add pb snapshot test #4561
fix(cache): make in memory cache global #4560
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude

Code Review

This PR introduces a new rivet-envoy-client Rust crate (a 1:1 port of the TypeScript envoy client), a rivetkit-native N-API module bridging the Rust client to JavaScript, SQLite KV trait integration, and a large restructuring of the driver test suite. The architecture is sound and the code is generally well-structured. Below are findings organized by severity.

Bugs / Correctness Issues

handle.rs — O(n²) key matching in kv_get + unnecessary pre-clone of keys

kv_get clones the keys vec before passing ownership into the request (line 99), then uses a nested loop to match response keys back to requested keys (lines 110-122). The known-acknowledged P10 note in .agent/notes/rust-envoy-client-issues.md tracks the O(n²) complexity but the pre-clone on line 99 is an extra copy that can be avoided. Since the response keys are owned after the call, you can build a HashMap from them instead:

// keys is consumed by send_kv_request; capture the count first
let key_count = keys.len();
// ... send request ...
// In the response handler, build a lookup map from response keys to values
let mut lookup: HashMap<Vec<u8>, Vec<u8>> = resp.keys.into_iter().zip(resp.values).collect();
let result: Vec<Option<Vec<u8>>> = request_keys.into_iter().map(|k| lookup.remove(&k)).collect();

This avoids both the pre-clone and the quadratic loop.

actor.rs — req.headers cloned unnecessarily (line 301)

req is owned at that point (moved into handle_req_start), so req.headers.iter().map(|(k, v)| (k.clone(), v.clone())) can be replaced with req.headers.into_iter(). The field is not used after this line.

actor.rs — double full_headers clone in handle_ws_open (lines 404-405, 428)

full_headers is cloned once into request.headers and again passed directly as headers to the websocket callback. Since request is passed before full_headers, you can reorder the callback args or move full_headers into request and reconstruct from it, but at minimum the two clones of the same map should be noted.

handle.rs — start_serverless_actor checks version equality only (lines 302-307)

The version check uses strict equality (!=), which will reject any future protocol minor-version bump. This may be intentional for security but should be called out with a comment explaining why a range check was not used.

connection.rs — expect on protocol encode (line 229)

.expect("failed to encode message");

A panic on a serialization failure in a hot path (every outbound WebSocket message) is unrecoverable. This should propagate as an error or at minimum log and close the connection gracefully.

CLAUDE.md Convention Violations

anyhow! macro used instead of .context()

The project convention is to prefer .context() over anyhow!. Multiple files use anyhow::anyhow!(...) directly:

connection.rs:79 — anyhow::anyhow!("failed to build ws request: {e}")
handle.rs:332, 350, 351 — channel-closed error paths
envoy.rs:277, 306 — shutdown paths
kv.rs:55, 129 — error/timeout paths
bridge_actor.rs:85, 107, 144 — callback channel-closed paths

Where the source is an existing ?-propagated error, use .context("...") instead. For terminal bail! calls where there is no underlying error, anyhow::bail! (not anyhow::anyhow!) is acceptable.

Arc<Mutex<HashMap>> used in bridge_actor.rs (line 19)

pub type ResponseMap = Arc<Mutex<HashMap<String, oneshot::Sender<serde_json::Value>>>>;

Per CLAUDE.md, prefer scc::HashMap or moka::Cache over Arc<Mutex<HashMap>> for concurrent containers. The ResponseMap is accessed from multiple threads (Rust async tasks and NAPI callbacks).

Arc<Mutex<_>> redundancy in context.rs (lines 15-16)

SharedContext is already behind Arc<SharedContext>. The inner Arc<Mutex<...>> wrappers on ws_tx and protocol_metadata add a second heap allocation and refcount. This was acknowledged as P8 in .agent/notes/rust-envoy-client-issues.md as "negligible," but CLAUDE.md's preference for scc structures still applies to ws_tx—a tokio::sync::watch or an scc-based pattern could replace the mutex.

Log messages with uppercase / non-code-symbol words

CLAUDE.md requires log messages to be lowercase unless mentioning specific code symbols:

bridge_actor.rs:74 — "calling JS actor_start callback via TSFN" → "calling js actor_start callback via TSFN" (or "calling JS on_actor_start callback" since TSFN is a code-adjacent acronym)
bridge_actor.rs:76 — "TSFN call returned" → acceptable since TSFN is a code symbol
kv.rs:62 (log in handle_kv_response) — "received kv response for unknown request id" — this is correct
commands.rs:10 — "received commands" — correct

Comment style violations

CLAUDE.md requires comments to be complete sentences. Some comments are fragments:

connection.rs:214 — /// Send a message over the WebSocket. Returns true if the message could not be sent. — "Returns true if the message could not be sent" is inverted/confusing; should be "Returns true if the socket was unavailable."
envoy.rs:400-401 — multi-line comment // Wait for all actors to finish. The process manager (Docker,\n// k8s, etc.) provides the ultimate shutdown deadline. — this is fine, but uses an abbreviation list style that's acceptable.
Several inline comments use // ... fragment style but are acceptable.

Performance / Design Notes

send_actor_message in actor.rs clones message_kind and msg on every call (lines 877, 880)

Every outbound tunnel message constructs msg, then immediately clones it to buffer_msg before sending (even when the WS send succeeds and buffering never happens). The buffer_msg clone could be deferred behind the if failed branch by restructuring the buffering logic, since ws_send consumes msg by value already.

kv.rs — data.clone() in send_single_kv_request (line 91)

request.data.clone() is called because request is a &mut KvRequestEntry and data must be borrowed to construct the outbound message. This requires KvRequestData: Clone. For large KV payloads this clone is on the hot retry path. Consider storing data as Option<KvRequestData> and taking it on first send to avoid the clone.

Double ws_tx lock acquisition in handle_kv_request and process_unsent_kv_requests

handle_kv_request and process_unsent_kv_requests each acquire shared.ws_tx.lock() only to check guard.is_some(), then release it before calling send_single_kv_request, which acquires the lock again. This double-acquisition is a TOCTOU window. Consider passing the lock guard down or restructuring so the send path holds the lock once.

Architecture / Structural Concerns

events.rs — dead code in handle_ack_events (lines 23-25)

if matches!(state_update.state, protocol::ActorState::ActorStateStopped(_)) {
    // Mark handle as done - actor task will finish on its own
}

This if block is empty and does nothing. It should either be removed or have the intended behavior implemented.

actor.rs — handle_ws_open pending request is not cleaned up on can_hibernate error

If the websocket callback returns an Err, the pending request entry inserted at line 390 is removed (line 543) but the request_to_actor map in EnvoyContext already has an entry for this (gateway_id, request_id) pair (inserted in tunnel.rs:handle_ws_open). That request_to_actor entry is never cleaned up on error in the actor task.

Draft PR with open blockers

The PR is in DRAFT state and .agent/notes/native-bridge-bugs.md explicitly documents two unresolved bugs:

WebSocket connections time out client-side (client never receives open event).
SQLite sqlite3_open_v2 fails with code 14 (SQLITE_CANTOPEN).

The .agent/notes/driver-test-status.md also documents that the EngineActorDriver test suite is blocked on the engine's v2 actor workflow not processing the Running event. These are significant and confirm the draft status is appropriate—this review should be revisited once those blockers are resolved.

Test coverage

The PR removes all existing engine-runner integration tests (tests/e2e_counter_runner.rs, e2e_counter_serverless.rs, e2e_counter_serverless_upsert.rs, e2e_websocket.rs) without replacing them with equivalent tests for the new Rust envoy client. The crate currently has no test coverage at the integration level.

Minor / Nits

connection.rs:257 — unwrap_or("localhost") on host extraction is a silent fallback that will produce a misleading Host header on malformed endpoints.
envoy.rs:148 — uuid::Uuid::new_v4().to_string() for envoy key generation is fine but the envoy_key field on EnvoyConfig being Option<String> means callers who forget to set it get a random UUID silently. A builder pattern or explicit required field would make this contract clearer.
.opencode/package-lock.json appears to be a stray editor config/lock file committed unintentionally.
.agent/notes/ and .agent/specs/ files are tracked per CLAUDE.md convention, which is correct.

claude · 2026-04-07T17:20:58Z

PR Review: feat: US-001 - Define SqliteKv trait / Rust envoy-client + rivetkit-native

This PR replaces the TypeScript engine/sdks/typescript/envoy-client with a new Rust implementation (engine/sdks/rust/envoy-client) and adds a new rivetkit-typescript/packages/rivetkit-native NAPI crate that bridges the Rust envoy client into Node.js. The self-contained notes in .agent/notes/ honestly document current status (actions pass, WebSocket and SQLite fail).

Bugs

B1: WebSocket outgoing message index collides with send_actor_message

In engine/sdks/rust/envoy-client/src/actor.rs, the spawned outgoing-WS task sends ToRivetWebSocketMessage with an independent idx counter starting at 0. send_actor_message also uses a per-request envoy_message_index starting at 0 via PendingRequest. Both counters advance independently producing duplicate message_index values. The comment block even allocates an Arc<AtomicU16> and then immediately does nothing with it. The .agent/notes/native-bridge-bugs.md correctly identifies this as a likely cause of the WebSocket failure. This needs a unified counter before the feature can ship.

B2: Hibernation restore silently drops all outgoing WebSocket messages

handle_hws_restore creates (hws_outgoing_tx, _hws_outgoing_rx) and passes hws_sender to the websocket() callback. The receiver is immediately dropped. Any message the actor sends during restore via that sender is silently discarded. The restore path needs to either spawn the outgoing forward task (same as the normal open path) or explicitly reject sends.

B3: shutting_down is duplicated with inconsistent types

SharedContext.shutting_down is an AtomicBool (context.rs) while EnvoyContext.shutting_down is a plain bool (envoy.rs). Only EnvoyContext.shutting_down is read in handle_shutdown; SharedContext.shutting_down is set in start_envoy_sync but never read. One of these is dead.

B4: start_envoy_js advertises async but is synchronous

In rivetkit-typescript/packages/rivetkit-native/src/lib.rs, start_envoy_js is documented as "Start the native envoy client asynchronously" but its body calls start_envoy_sync_js and returns immediately. The misleading name/doc should be corrected or the function should be removed.

B5: respond_callback silently drops responses after the first

JsEnvoyHandle::respond_callback removes the oneshot sender from response_map and sends. If JS accidentally calls respond_callback twice with the same response_id, the second call silently does nothing. The function should log a warning when no sender is found.

Dead Code / Incomplete Implementation

actor.rs: ws_msg_counter (Arc<AtomicU16>) is created and immediately unused. Remove it or connect it to the outgoing task.
handle.rs start_serverless_actor: if the envoy never starts (e.g., engine unreachable), self.started().await hangs forever. Consider adding a timeout or documenting the expectation.
bridge_actor.rs: can_hibernate always returns false but is_hibernatable is passed to the websocket() callback and silently ignored. Add debug_assert!(!is_hibernatable) to make the intent clear.

Performance

connection.rs ws_url(): calls .replace() three times, allocating on each call. A single-pass approach would be cleaner.
handle.rs kv_get: clones the full keys vec before the request just to do a linear scan for response correlation. A HashMap keyed by the key bytes would eliminate both.
kv.rs cleanup_old_kv_requests: collects expired IDs into a Vec for a two-phase delete. A retain-based approach avoids the allocation.
utils.rs BufferMap: uses cyrb53 (53-bit non-cryptographic hash) as the sole key for WebSocket connection state. A collision would route messages from one connection into another. Prefer a composite (gateway_id, request_id) key.

Security / Robustness

utils.rs cyrb53: hash collisions are possible and would cross-contaminate WebSocket connection state. Use the composite (gateway_id, request_id) tuple directly as the key.
connection.rs extract_host: uses string replace() and splits on /. If the URL ever contains a username/password, the host extraction will include user:pass@host. Use the url crate or tokio_tungstenite::tungstenite::http::Uri instead.
rivetkit-native/src/lib.rs: each call to start_envoy_sync_js creates a new tokio::runtime::Runtime. Multiple runtimes will coexist if the JS layer calls this more than once. Consider a global lazy runtime or document the single-call contract.

Code Style

Comments using em dashes violate CLAUDE.md conventions. Use periods to separate sentences instead.
bridge_actor.rs on_actor_start contains tracing::info! calls that appear to be debug instrumentation left from development. Standardize or remove.
test-envoy/src/server.rs start_serverless: if a second serverless start arrives before the first envoy is ready, the first handle is silently dropped. Consider returning a 409 if an envoy is already running.
rivetkit-typescript/packages/rivetkit-native/index.js and wrapper.js are checked in as JS in the source tree. Ensure these are generated by the build and not maintained by hand, otherwise they will drift from the Rust definitions.

Test Coverage

The PR deletes all tests from engine/sdks/rust/engine-runner/tests/ (~1,370 lines across 6 files) with no replacement. Only fixture files are added to the driver test suite. No new test files that run against the native driver are present. The PR should not merge with zero test coverage of the new Rust envoy-client code. At minimum, the test-envoy integration scenario (actions: ping, counter) should be gated in CI.

Summary

The architecture is solid and the porting approach is methodical. The two blocking issues before merge are:

WebSocket message index collision (B1) - the counter coordination is half-implemented and actively causes the WebSocket failure.
Deleted tests with no replacement - coverage gap on the most important new code.

The hibernation restore outgoing-channel issue (B2) and the BufferMap hash-collision risk are the next-highest priority items.

NathanFlurry mentioned this pull request Apr 7, 2026

feat(rust/engine-runner): minimal rust engine runner #4289

Open

11 tasks

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4584 April 7, 2026 17:08 Failure

This was referenced Apr 7, 2026

feat(engine): rust runner #4582

Open

chore: delete kv channel #4583

Open

NathanFlurry force-pushed the rivetkit-native branch from 1577bd9 to 7205731 Compare April 7, 2026 17:15

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4584 April 7, 2026 17:15 Failure

claude bot reviewed Apr 7, 2026

View reviewed changes

feat: rivetkit native

c84155e

NathanFlurry force-pushed the rivetkit-native branch from 7205731 to c84155e Compare April 7, 2026 17:42

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4584 April 7, 2026 17:42 Failure

MasterPtato mentioned this pull request Apr 8, 2026

fix: fix pool metadata protocol version refresh, rivetkit native ws #4585

Open

11 tasks

This was referenced Apr 8, 2026

chore: revert actor v2 hack #4586

Open

chore: remove udb as a dependency of envoy-client #4587

Open

chore: tunnel auth #4588

Draft

NathanFlurry marked this pull request as ready for review April 8, 2026 10:37

NathanFlurry mentioned this pull request Apr 8, 2026

chore: fix driver test suite #4591

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: US-001 - Define SqliteKv trait in rivetkit-sqlite-native#4584

feat: US-001 - Define SqliteKv trait in rivetkit-sqlite-native#4584
NathanFlurry wants to merge 1 commit into02-23-feat_rust_engine-runner_feature_parity_with_typescriptfrom
rivetkit-native

NathanFlurry commented Apr 7, 2026

Uh oh!

railway-app bot commented Apr 7, 2026 •

edited

Loading

Uh oh!

NathanFlurry commented Apr 7, 2026 •

edited

Loading

Uh oh!

claude bot left a comment

Uh oh!

claude bot commented Apr 7, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NathanFlurry commented Apr 7, 2026

Description

Type of change

How Has This Been Tested?

Checklist:

Uh oh!

railway-app bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanFlurry commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review

Bugs / Correctness Issues

CLAUDE.md Convention Violations

Performance / Design Notes

Architecture / Structural Concerns

Minor / Nits

Uh oh!

claude bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat: US-001 - Define SqliteKv trait / Rust envoy-client + rivetkit-native

Bugs

Dead Code / Incomplete Implementation

Performance

Security / Robustness

Code Style

Test Coverage

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

railway-app bot commented Apr 7, 2026 •

edited

Loading

NathanFlurry commented Apr 7, 2026 •

edited

Loading

claude bot commented Apr 7, 2026 •

edited

Loading