Skip to content

fix(envoy-client): dedupe replayed commands by index#4811

Draft
NathanFlurry wants to merge 1 commit intoengine-stabilize/envoy-client-lifecyclefrom
engine-stabilize/envoy-client-command-dedup
Draft

fix(envoy-client): dedupe replayed commands by index#4811
NathanFlurry wants to merge 1 commit intoengine-stabilize/envoy-client-lifecyclefrom
engine-stabilize/envoy-client-command-dedup

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Stack Context

Builds on #4801 (engine-stabilize/envoy-client-lifecycle), which already added
an immediate send_command_ack after every handle_commands batch. This PR
closes the residual race where a command arrives, is processed, but the ack
frame doesn't reach the engine before the WS drops — on reconnect,
pegboard-envoy re-streams the same command from UDB and the envoy processes
it twice.

Why?

EnvoyContext::handle_commands had no dedup on the receive path. Two concrete
bugs:

  1. Replayed CommandStartActor clobbered live actors. insert_actor
    unconditionally re-inserts into ctx.actors[actor_id][generation],
    dropping the existing handle, event_history, and active_http_request_count
    on the floor. The spawned actor task was orphaned.
  2. Replayed CommandStopActor re-fired ToActor::Stop. Idempotent in
    best case, double-shutdown signal in worst case.

ActorEntry::last_command_idx already existed but was only read by
send_command_ack — never checked against incoming command indices.

What?

  • EnvoyContext gains processed_command_idx: HashMap<(String, u32), i64>.
    Persists across remove_actor so a replayed start for an
    already-stopped actor cannot resurrect it.
  • handle_commands skips any command whose checkpoint.index <= last_processed
    for the (actor_id, generation) pair, then records the new index before
    dispatching.
  • New integration test under tests/command_dedup.rs covers same-index replay,
    stale lower-index replay, monotonic processing, and per-actor / per-generation
    isolation. Lives in tests/ rather than inline #[cfg(test)] mod tests
    because a pre-existing 7-vs-8 arg mismatch in actor.rs test code blocks the
    lib-test target on this branch — the integration test target compiles
    independently.

Test plan

  • cargo check -p rivet-envoy-client
  • cargo test -p rivet-envoy-client --test command_dedup (2 passed)

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 27, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant