fix(envoy-client): clear processed_command_idx after ack send by NathanFlurry · Pull Request #4812 · rivet-dev/rivet

NathanFlurry · 2026-04-27T09:56:20Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

NathanFlurry · 2026-04-27T09:57:19Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-04-27T10:01:22Z

Code Review

Overview

This PR fixes a memory management issue in envoy-client by clearing processed_command_idx entries after the command ack is sent to pegboard-envoy. The two-line change to call signature + the cleanup loop together address a scenario where the dedup map grows unboundedly for long-lived processes with many actor lifecycles.

Correctness: Scope of the Clear is Right

The cleanup loop iterates last_command_checkpoints, which is built from ctx.actors — meaning only currently active actors get their entries cleared. Entries for already-stopped actors (the ones that survive remove_actor intentionally) are untouched. This is correct per the field's doc comment:

Persists across remove_actor so a replayed CommandStartActor for an already-stopped actor cannot resurrect it.

Race Condition (Acknowledged TODO)

The TODO comment is accurate and well-written. The specific scenario:

ws_send returns (bytes accepted by OS)
processed_command_idx entries cleared ← this PR
WS drops / pegboard-envoy crashes before committing the FDB clear_range
Reconnect → pegboard-envoy replays the commands
No dedup entry → CommandStartActor re-executes → live actor gets replaced

The window is narrow (bytes in flight vs. FDB commit), but it exists. A proper fix would require an explicit ack-of-ack from pegboard-envoy before clearing. Leaving this as a TODO is acceptable, but consider filing it in .agent/todo/ so it doesn't get lost.

Minor Issues

Unnecessary full-Vec clone (commands.rs:108)

last_command_checkpoints: last_command_checkpoints.clone(),

The clone is needed because last_command_checkpoints is later iterated to build the removal keys. But cloning the whole Vec<ActorCheckpoint> (each entry contains a heap-allocated String UUID) is avoidable. Prefer extracting keys before the send to keep the original ownership clear:

// Extract keys before consuming last_command_checkpoints
let keys_to_remove: Vec<(String, u32)> = last_command_checkpoints
    .iter()
    .map(|cp| (cp.actor_id.clone(), cp.generation))
    .collect();

ws_send(
    &ctx.shared,
    protocol::ToRivet::ToRivetAckCommands(protocol::ToRivetAckCommands {
        last_command_checkpoints,  // moved, no clone
    }),
)
.await;

for key in keys_to_remove {
    ctx.processed_command_idx.remove(&key);
}

This is semantically clearer (the original goes into the wire message) and avoids allocating a second copy of the index: i64 fields.

Inner clone in the removal loop (commands.rs:124)

ctx.processed_command_idx.remove(&(cp.actor_id.clone(), cp.generation));

HashMap::remove requires an owned key here because the key type is (String, u32). If this ever becomes a hot path, HashMap<(Arc<str>, u32), i64> would allow remove with a borrow. Not a blocker for this change.

What's NOT Addressed (pre-existing)

Stopped-actor entries accumulate forever. After remove_actor, entries stay in processed_command_idx indefinitely. This is intentional to block resurrection, but there's no pruning once pegboard-envoy has confirmed it deleted those commands from FDB. This PR doesn't make it worse, but the map will keep growing for processes that run many actor lifecycles.

Checklist

Signature change &EnvoyContext → &mut EnvoyContext is minimal and correct
Only active actors (in ctx.actors) are cleared; stopped-actor dedup protection preserved
Race condition is documented with an actionable TODO
PR description not filled out (template only) — please describe the motivation
Consider tracking the ack-of-ack TODO in .agent/todo/
Minor: restructure to avoid the full-Vec clone

fix(envoy-client): clear processed_command_idx after ack send

f65d4b3

NathanFlurry mentioned this pull request Apr 27, 2026

fix(pegboard): persist and replay hibernating requests #4656

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(envoy-client): clear processed_command_idx after ack send#4812

fix(envoy-client): clear processed_command_idx after ack send#4812
NathanFlurry wants to merge 1 commit intoengine-stabilize/envoy-client-command-dedupfrom
04-27-fix_envoy-client_clear_processed_command_idx_after_ack_send

NathanFlurry commented Apr 27, 2026

Uh oh!

NathanFlurry commented Apr 27, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NathanFlurry commented Apr 27, 2026

Description

Type of change

How Has This Been Tested?

Checklist:

Uh oh!

NathanFlurry commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented Apr 27, 2026

Code Review

Overview

Correctness: Scope of the Clear is Right

Race Condition (Acknowledged TODO)

Minor Issues

What's NOT Addressed (pre-existing)

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

NathanFlurry commented Apr 27, 2026 •

edited

Loading