Skip to content

fix(rivetkit): isolate engine envoys and propagate startup failures#4643

Merged
NathanFlurry merged 1 commit intomainfrom
fix/isolate-engine-envoys
Apr 24, 2026
Merged

fix(rivetkit): isolate engine envoys and propagate startup failures#4643
NathanFlurry merged 1 commit intomainfrom
fix/isolate-engine-envoys

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 13, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 13, 2026

PR Review: fix(rivetkit): isolate engine envoys and propagate startup failures

Overview

This PR makes two focused, related changes:

  1. Propagates envoy startup failures rather than silently swallowing them across Rust and TypeScript layers.
  2. Isolates engine envoys by always passing notGlobal: true in the EngineActorDriver, preventing multiple driver instances from sharing a global singleton envoy.

Code Quality

handle.rsstarted() return type change

Good change. Returning anyhow::Result<()> instead of () is the right call. watch::Receiver::changed() returns Err(RecvError) when all senders are dropped (i.e., the envoy task died), and previously that was silently discarded via let _ = .... The error message is clear and actionable.

envoy_handle.rs — two chained map_err

Correct and complete. The outer map_err handles JoinError from spawn(...).await, and the inner one handles anyhow::Error from handle.started(). The ? between them correctly short-circuits on join failure. No issues.

actor-driver.ts — rejection handling in .then()

Solid fix. Previously, if envoy.started() rejected, it would produce an unhandled promise rejection and leave #envoyStarted unresolved. Now startup failures are forwarded to the deferred and will surface to any caller awaiting the envoy.

actor-driver.tsnotGlobal: true

The isolation intent is clear and matches the GLOBAL_ENVOY singleton avoidance logic in envoy.rs. The field already existed in the config (wrapper.d.ts marks it optional, defaulting to false), so this is a safe, additive change. Every EngineActorDriver now owns its envoy rather than sharing one globally.

Concerns

start_envoy uses .expect() instead of returning Result

pub async fn start_envoy(config: EnvoyConfig) -> EnvoyHandle {
    let handle = start_envoy_sync(config);
    handle
        .started()
        .await
        .expect("envoy failed to start before returning handle");
    handle
}

The function signature returns a bare EnvoyHandle, so a panic is the only option without changing the return type to anyhow::Result<EnvoyHandle>. For an envoy startup failure this will crash the calling task entirely. The panic message is descriptive, but if any callers outside test code use start_envoy (vs. the sync variant), those will panic rather than receive a recoverable Result. Worth a follow-up to change the return type to Result<EnvoyHandle> if this function is called from production paths.

No tests added

The checklist item for tests is unchecked. Testing startup-failure propagation requires intentionally killing the envoy, but even a minimal test verifying that two startEnvoySync calls with notGlobal: true return distinct handles (distinct envoy keys) would strengthen confidence in the isolation change.

Connection overhead from always-isolated envoys

notGlobal: true means every EngineActorDriver instance opens its own WebSocket connection to the envoy. If multiple drivers are created (e.g., in tests or multi-tenant setups), connections multiply. This appears intentional given the PR title, but worth confirming existing callers that relied on the global singleton are not paying unexpected connection costs.

Minor

  • The notGlobal field in wrapper.d.ts is typed as notGlobal?: boolean (optional). Making it required, or at least documenting the default-to-global behavior, would help callers avoid accidentally sharing the singleton.

Summary

Clean, focused fix. Error propagation is correct at every layer. The main follow-up worth tracking is making start_envoy return Result<EnvoyHandle> to avoid a potential panic on startup failure in production paths.

@NathanFlurry NathanFlurry force-pushed the fix/isolate-engine-envoys branch from 6c170d1 to 2452fc9 Compare April 24, 2026 07:33
@NathanFlurry NathanFlurry force-pushed the test/remove-wasm-sqlite-fixtures branch from 2b872b3 to 18a97e7 Compare April 24, 2026 07:33
Base automatically changed from test/remove-wasm-sqlite-fixtures to main April 24, 2026 07:38
@NathanFlurry NathanFlurry merged commit 2452fc9 into main Apr 24, 2026
10 of 26 checks passed
@NathanFlurry NathanFlurry deleted the fix/isolate-engine-envoys branch April 24, 2026 07:39
This was referenced Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant