feat: US-001 - Define SqliteKv trait in rivetkit-sqlite-native#4584
feat: US-001 - Define SqliteKv trait in rivetkit-sqlite-native#4584NathanFlurry wants to merge 1 commit into02-23-feat_rust_engine-runner_feature_parity_with_typescriptfrom
Conversation
|
🚅 Deployed to the rivet-pr-4584 environment in rivet-frontend
|
1577bd9 to
7205731
Compare
There was a problem hiding this comment.
Code Review
This PR introduces a new rivet-envoy-client Rust crate (a 1:1 port of the TypeScript envoy client), a rivetkit-native N-API module bridging the Rust client to JavaScript, SQLite KV trait integration, and a large restructuring of the driver test suite. The architecture is sound and the code is generally well-structured. Below are findings organized by severity.
Bugs / Correctness Issues
handle.rs — O(n²) key matching in kv_get + unnecessary pre-clone of keys
kv_get clones the keys vec before passing ownership into the request (line 99), then uses a nested loop to match response keys back to requested keys (lines 110-122). The known-acknowledged P10 note in .agent/notes/rust-envoy-client-issues.md tracks the O(n²) complexity but the pre-clone on line 99 is an extra copy that can be avoided. Since the response keys are owned after the call, you can build a HashMap from them instead:
// keys is consumed by send_kv_request; capture the count first
let key_count = keys.len();
// ... send request ...
// In the response handler, build a lookup map from response keys to values
let mut lookup: HashMap<Vec<u8>, Vec<u8>> = resp.keys.into_iter().zip(resp.values).collect();
let result: Vec<Option<Vec<u8>>> = request_keys.into_iter().map(|k| lookup.remove(&k)).collect();This avoids both the pre-clone and the quadratic loop.
actor.rs — req.headers cloned unnecessarily (line 301)
req is owned at that point (moved into handle_req_start), so req.headers.iter().map(|(k, v)| (k.clone(), v.clone())) can be replaced with req.headers.into_iter(). The field is not used after this line.
actor.rs — double full_headers clone in handle_ws_open (lines 404-405, 428)
full_headers is cloned once into request.headers and again passed directly as headers to the websocket callback. Since request is passed before full_headers, you can reorder the callback args or move full_headers into request and reconstruct from it, but at minimum the two clones of the same map should be noted.
handle.rs — start_serverless_actor checks version equality only (lines 302-307)
The version check uses strict equality (!=), which will reject any future protocol minor-version bump. This may be intentional for security but should be called out with a comment explaining why a range check was not used.
connection.rs — expect on protocol encode (line 229)
.expect("failed to encode message");A panic on a serialization failure in a hot path (every outbound WebSocket message) is unrecoverable. This should propagate as an error or at minimum log and close the connection gracefully.
CLAUDE.md Convention Violations
anyhow! macro used instead of .context()
The project convention is to prefer .context() over anyhow!. Multiple files use anyhow::anyhow!(...) directly:
connection.rs:79—anyhow::anyhow!("failed to build ws request: {e}")handle.rs:332, 350, 351— channel-closed error pathsenvoy.rs:277, 306— shutdown pathskv.rs:55, 129— error/timeout pathsbridge_actor.rs:85, 107, 144— callback channel-closed paths
Where the source is an existing ?-propagated error, use .context("...") instead. For terminal bail! calls where there is no underlying error, anyhow::bail! (not anyhow::anyhow!) is acceptable.
Arc<Mutex<HashMap>> used in bridge_actor.rs (line 19)
pub type ResponseMap = Arc<Mutex<HashMap<String, oneshot::Sender<serde_json::Value>>>>;Per CLAUDE.md, prefer scc::HashMap or moka::Cache over Arc<Mutex<HashMap>> for concurrent containers. The ResponseMap is accessed from multiple threads (Rust async tasks and NAPI callbacks).
Arc<Mutex<_>> redundancy in context.rs (lines 15-16)
SharedContext is already behind Arc<SharedContext>. The inner Arc<Mutex<...>> wrappers on ws_tx and protocol_metadata add a second heap allocation and refcount. This was acknowledged as P8 in .agent/notes/rust-envoy-client-issues.md as "negligible," but CLAUDE.md's preference for scc structures still applies to ws_tx—a tokio::sync::watch or an scc-based pattern could replace the mutex.
Log messages with uppercase / non-code-symbol words
CLAUDE.md requires log messages to be lowercase unless mentioning specific code symbols:
bridge_actor.rs:74—"calling JS actor_start callback via TSFN"→"calling js actor_start callback via TSFN"(or"calling JS on_actor_start callback"since TSFN is a code-adjacent acronym)bridge_actor.rs:76—"TSFN call returned"→ acceptable since TSFN is a code symbolkv.rs:62 (log in handle_kv_response)—"received kv response for unknown request id"— this is correctcommands.rs:10—"received commands"— correct
Comment style violations
CLAUDE.md requires comments to be complete sentences. Some comments are fragments:
connection.rs:214—/// Send a message over the WebSocket. Returns true if the message could not be sent.— "Returns true if the message could not be sent" is inverted/confusing; should be "Returnstrueif the socket was unavailable."envoy.rs:400-401— multi-line comment// Wait for all actors to finish. The process manager (Docker,\n// k8s, etc.) provides the ultimate shutdown deadline.— this is fine, but uses an abbreviation list style that's acceptable.- Several inline comments use
// ...fragment style but are acceptable.
Performance / Design Notes
send_actor_message in actor.rs clones message_kind and msg on every call (lines 877, 880)
Every outbound tunnel message constructs msg, then immediately clones it to buffer_msg before sending (even when the WS send succeeds and buffering never happens). The buffer_msg clone could be deferred behind the if failed branch by restructuring the buffering logic, since ws_send consumes msg by value already.
kv.rs — data.clone() in send_single_kv_request (line 91)
request.data.clone() is called because request is a &mut KvRequestEntry and data must be borrowed to construct the outbound message. This requires KvRequestData: Clone. For large KV payloads this clone is on the hot retry path. Consider storing data as Option<KvRequestData> and taking it on first send to avoid the clone.
Double ws_tx lock acquisition in handle_kv_request and process_unsent_kv_requests
handle_kv_request and process_unsent_kv_requests each acquire shared.ws_tx.lock() only to check guard.is_some(), then release it before calling send_single_kv_request, which acquires the lock again. This double-acquisition is a TOCTOU window. Consider passing the lock guard down or restructuring so the send path holds the lock once.
Architecture / Structural Concerns
events.rs — dead code in handle_ack_events (lines 23-25)
if matches!(state_update.state, protocol::ActorState::ActorStateStopped(_)) {
// Mark handle as done - actor task will finish on its own
}This if block is empty and does nothing. It should either be removed or have the intended behavior implemented.
actor.rs — handle_ws_open pending request is not cleaned up on can_hibernate error
If the websocket callback returns an Err, the pending request entry inserted at line 390 is removed (line 543) but the request_to_actor map in EnvoyContext already has an entry for this (gateway_id, request_id) pair (inserted in tunnel.rs:handle_ws_open). That request_to_actor entry is never cleaned up on error in the actor task.
Draft PR with open blockers
The PR is in DRAFT state and .agent/notes/native-bridge-bugs.md explicitly documents two unresolved bugs:
- WebSocket connections time out client-side (client never receives
openevent). - SQLite
sqlite3_open_v2fails with code 14 (SQLITE_CANTOPEN).
The .agent/notes/driver-test-status.md also documents that the EngineActorDriver test suite is blocked on the engine's v2 actor workflow not processing the Running event. These are significant and confirm the draft status is appropriate—this review should be revisited once those blockers are resolved.
Test coverage
The PR removes all existing engine-runner integration tests (tests/e2e_counter_runner.rs, e2e_counter_serverless.rs, e2e_counter_serverless_upsert.rs, e2e_websocket.rs) without replacing them with equivalent tests for the new Rust envoy client. The crate currently has no test coverage at the integration level.
Minor / Nits
connection.rs:257—unwrap_or("localhost")on host extraction is a silent fallback that will produce a misleadingHostheader on malformed endpoints.envoy.rs:148—uuid::Uuid::new_v4().to_string()for envoy key generation is fine but theenvoy_keyfield onEnvoyConfigbeingOption<String>means callers who forget to set it get a random UUID silently. A builder pattern or explicit required field would make this contract clearer..opencode/package-lock.jsonappears to be a stray editor config/lock file committed unintentionally..agent/notes/and.agent/specs/files are tracked per CLAUDE.md convention, which is correct.
PR Review: feat: US-001 - Define SqliteKv trait / Rust envoy-client + rivetkit-nativeThis PR replaces the TypeScript BugsB1: WebSocket outgoing message index collides with In B2: Hibernation restore silently drops all outgoing WebSocket messages
B3:
B4: In B5:
Dead Code / Incomplete Implementation
Performance
Security / Robustness
Code Style
Test CoverageThe PR deletes all tests from SummaryThe architecture is solid and the porting approach is methodical. The two blocking issues before merge are:
The hibernation restore outgoing-channel issue (B2) and the |
7205731 to
c84155e
Compare

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: