test(engine): stabilize runner and envoy suites by NathanFlurry · Pull Request #4809 · rivet-dev/rivet

NathanFlurry · 2026-04-27T07:16:05Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

NathanFlurry · 2026-04-27T07:16:50Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-04-27T07:21:44Z

PR 4809 Code Review: stabilize runner and envoy suites

This is a DRAFT PR. Review provided for early feedback.

Overview

Large test-infrastructure overhaul (11,713 additions, 816 deletions, 43 files). The PR:

Rewrites test_runner.rs — Native Pegboard Runner WebSocket client speaking /runners/connect directly instead of bridging through Envoy. Gives accurate coverage for the legacy runner path.
Refactors test_envoy.rs — Wraps rivet-test-envoy into an in-process Envoy struct.
Adds new Envoy test suites (tests/envoy/) — alarm, KV CRUD/list/drop/delete-range/misc, lifecycle, actor API, auth.
Stabilizes flaky runner tests — approximately 40 tests marked ignored with detailed explanations; adds retry logic for setup helpers.
Removes actor_import_export_e2e.rs.

The direction is correct: separating the two connection paths into distinct test clients ensures envoy tests exercise the envoy protocol and runner tests exercise the runner protocol.

Issues

Critical — Spin-wait in Runner::wait_ready() violates CLAUDE.md

CLAUDE.md prohibits: "Never poll a shared-state counter with loop { if ready; sleep(Nms).await; }. Pair the counter with a tokio::sync::Notify..."

The current implementation polls every 25ms using AtomicBool. Fix by adding a tokio::sync::Notify field, signalling it in handle_message when ToClientInit arrives, and awaiting notified() in wait_ready().
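
A minimal sketch of the event-driven shape this fix describes, under stated assumptions. So that it runs without an async runtime, it uses std::sync::Condvar on plain threads as a stand-in for tokio::sync::Notify; in the async Runner code the Notify suggested above would take the Condvar's place. ReadyGate, mark_ready, and the timeout parameter are hypothetical names for illustration, not taken from the PR.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical readiness gate. `Condvar` is a synchronous stand-in for
/// `tokio::sync::Notify`; it is not what the async code should use.
#[derive(Default)]
pub struct ReadyGate {
    ready: Mutex<bool>,
    signal: Condvar,
}

impl ReadyGate {
    /// Called at the point where the init message is handled:
    /// set the flag, then wake every waiter.
    pub fn mark_ready(&self) {
        *self.ready.lock().unwrap() = true;
        self.signal.notify_all();
    }

    /// Blocks until `mark_ready` has run or `timeout` elapses.
    /// Returns true if the gate is ready. There is no sleep loop:
    /// the waiter is parked until it is signalled.
    pub fn wait_ready(&self, timeout: Duration) -> bool {
        let guard = self.ready.lock().unwrap();
        let (guard, _timeout_result) = self
            .signal
            .wait_timeout_while(guard, timeout, |ready| !*ready)
            .unwrap();
        *guard
    }
}
```

The flag and the signal are kept together so that a waiter arriving after the signal still sees ready == true and returns immediately, which is the same reason the review pairs the counter with a Notify rather than replacing it.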

Medium — actors mutex held across on_stop().await

In Runner::handle_stop_actor(), the actors mutex guard remains alive while on_stop().await executes. This blocks concurrent handle_start_actor and has_actor calls. Fix by scoping the lock acquisition so the guard drops before calling the callback: take the actor out of the map inside a block that owns the guard, then call on_stop() after the block ends.
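
The scoping fix can be sketched as follows. This is a simplified, synchronous illustration: Actor, Runner, start_actor, and stop_actor here are hypothetical stand-ins rather than the PR's real types, and the callback is a plain closure instead of an async on_stop(). The point is only where the guard is dropped relative to the callback.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a test actor.
pub struct Actor {
    pub id: u64,
}

/// Hypothetical stand-in for the test runner's actor registry.
pub struct Runner {
    actors: Mutex<HashMap<u64, Actor>>,
}

impl Runner {
    pub fn new() -> Self {
        Self { actors: Mutex::new(HashMap::new()) }
    }

    pub fn start_actor(&self, id: u64) {
        self.actors.lock().unwrap().insert(id, Actor { id });
    }

    pub fn has_actor(&self, id: u64) -> bool {
        self.actors.lock().unwrap().contains_key(&id)
    }

    /// Removes the actor inside a block that owns the guard, then runs the
    /// stop callback only after the block ends and the guard is dropped.
    /// Returns false if no actor with that id was registered.
    pub fn stop_actor(&self, id: u64, on_stop: impl FnOnce(&Runner, Actor)) -> bool {
        let removed = {
            let mut actors = self.actors.lock().unwrap();
            actors.remove(&id)
        }; // guard dropped here, before the callback runs

        match removed {
            Some(actor) => {
                // The callback may take the lock again (e.g. via has_actor).
                on_stop(self, actor);
                true
            }
            None => false,
        }
    }
}
```

Because the callback receives the Runner and can call has_actor itself, any version that kept the guard alive across the callback would block on the second lock acquisition, which makes the difference easy to observe in a test.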

Medium — Misleading _kv_tx name in test_envoy.rs

In TestEnvoyCallbacks::actor_config(), the variable _kv_tx is moved into ActorConfig::kv_request_tx. It is not unused. The underscore prefix signals "intentionally unused" to readers and clippy. Rename to kv_tx.

Medium — Unexplained behaviour changes in actors_scheduling_errors.rs

The assertion "no_runners_available" is changed to "no_runner_config_configured" without explanation. Also, actor_crash_destroy_policy flips from asserting "no error on destroy" to asserting "Crashed error is set with a crash message". If these reflect real engine behaviour changes they should be documented in the PR description or in a comment.

Low — std::sync::Mutex in async code

Test actors use std::sync::Mutex for notify-sender patterns. CLAUDE.md requires parking_lot::Mutex for forced-sync contexts in async code. No guard crosses an .await so the risk is low, but the rule should be followed consistently.

Low — Approximately 40 ignored tests with no tracking path

Each ignore comment names the failure symptom clearly. However, no linked issue tracks re-enabling them. Several failures ("times out in full engine sweep") suggest a systemic setup problem; fixing it would unblock many tests at once. Consider linking a tracking issue in the ignore reason.

Low — Deprecated polling helper used in new envoy alarm tests

wait_for_actor_wake_polling documents itself as "DEPRECATED for other tests" in its docstring, yet new envoy alarm tests (basic_alarm, alarm_in_the_past, etc.) call it. Use the event-driven wait_for_actor_wake_from_alarm where possible to reduce timing flakiness.

Positive observations

Excellent test coverage in the new envoy suite: alarm edge cases (overdue, clear, replace, multiple sets, multi-cycle), KV CRUD/batch/list/range/drop, auth rejection via WebSocket close frame, pagination regression tests.
Correct separation: the native Runner client speaks the real runner protocol (/runners/connect), catching regressions the old envoy-bridged approach missed.
Retry logic in setup_test_namespace and upsert_normal_runner_config (60s timeout, 100ms backoff) addresses "replica not configured yet" races without papering them over.
Deterministic DC ordering via dcs.sort_by_key prevents tests from depending on HashMap iteration order.
Regression test alarm_overdue_during_sleep_transition_fires_via_reallocation is a well-documented, targeted regression for the Decision::Sleep overdue-alarm race.
with_auth_admin_token on TestOpts makes auth tests self-contained.
The bulk_create_actors name change (all actors share the same name, differentiated by key) better matches production actor semantics.
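
The retry behaviour noted above for setup_test_namespace and upsert_normal_runner_config (a fixed timeout with a fixed backoff between attempts) might look roughly like the sketch below. It is synchronous for simplicity, and retry_until is a hypothetical helper name; the PR's helpers are presumably async and their actual signatures are not shown in this thread.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Retries `op` until it succeeds or `timeout` has elapsed, sleeping
/// `backoff` between attempts. On timeout the most recent error is
/// returned, so the underlying failure is surfaced rather than hidden.
pub fn retry_until<T, E>(
    timeout: Duration,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Past the deadline: give up and report the last error.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Still within budget: wait, then try again.
            Err(_) => sleep(backoff),
        }
    }
}
```

With the values quoted in the review, a call would pass Duration::from_secs(60) and Duration::from_millis(100). Returning the last error on timeout is one way to keep a retry from masking a persistent failure.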

Minor style notes

get_current_timestamp_ms() in actors_alarm.rs is a one-liner over rivet_util::timestamp::now(). Calling it directly removes the indirection.
Comments in AlarmAndSleepActor::on_start say "gen 2" but the condition checks generation == 1 (runner protocol starts generations at 1). Align comments with the actual values.

Review generated by claude[bot]

NathanFlurry force-pushed the engine-stabilize/runner-envoy-tests branch from f9362ab to c12d710 on April 27, 2026 07:30

NathanFlurry force-pushed the engine-stabilize/reject-empty-actor-kv-values branch 2 times, most recently from 91328b0 to ff28b88 on April 27, 2026 08:31

NathanFlurry force-pushed the engine-stabilize/runner-envoy-tests branch from c12d710 to a2d5d39 on April 27, 2026 08:31

NathanFlurry mentioned this pull request Apr 27, 2026

fix(envoy-client): dedupe replayed commands by index #4811

Draft

2 tasks

NathanFlurry added 2 commits April 27, 2026 02:35

test(engine): stabilize runner and envoy suites

fce80f3

test(engine): ignore non-sleep crash policy envoy tests

7f0614e

NathanFlurry force-pushed the engine-stabilize/reject-empty-actor-kv-values branch from ff28b88 to a35b876 on April 27, 2026 09:56

NathanFlurry force-pushed the engine-stabilize/runner-envoy-tests branch from a2d5d39 to 7f0614e on April 27, 2026 09:56

This was referenced Apr 27, 2026

fix(sqlite-storage): allow invalidate_v1_migration on non-migrating META #4813

Open

refactor(sqlite-storage): rename SqliteOrigin variants for clarity #4814

Open

This was referenced Apr 27, 2026

refactor(sqlite-storage): extract DBHead to vbare-versioned protocol crate #4815

Open

test(engine): cover envoy Sleep crash policy wake and recovery paths #4816

Draft

NathanFlurry marked this pull request as ready for review April 27, 2026 10:01

NathanFlurry wants to merge 2 commits into engine-stabilize/reject-empty-actor-kv-values from engine-stabilize/runner-envoy-tests