
test(engine): stabilize runner and envoy suites#4809

Open
NathanFlurry wants to merge 2 commits into engine-stabilize/reject-empty-actor-kv-values from engine-stabilize/runner-envoy-tests

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 27, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@claude

claude Bot commented Apr 27, 2026

PR 4809 Code Review: stabilize runner and envoy suites

This is a DRAFT PR. Review provided for early feedback.

Overview

Large test-infrastructure overhaul (11,713 additions, 816 deletions, 43 files). The PR:

  1. Rewrites test_runner.rs as a native Pegboard Runner WebSocket client that speaks /runners/connect directly instead of bridging through Envoy, giving accurate coverage of the legacy runner path.
  2. Refactors test_envoy.rs to wrap rivet-test-envoy in an in-process Envoy struct.
  3. Adds new Envoy test suites under tests/envoy/: alarm, KV (CRUD, list, drop, delete-range, misc), lifecycle, actor API, and auth.
  4. Stabilizes flaky runner tests: marks approximately 40 tests as ignored with detailed explanations and adds retry logic to setup helpers.
  5. Removes actor_import_export_e2e.rs.

The direction is correct: separating the two connection paths into distinct test clients ensures envoy tests exercise the envoy protocol and runner tests exercise the runner protocol.

Issues

Critical — Spin-wait in Runner::wait_ready() violates CLAUDE.md

CLAUDE.md prohibits: "Never poll a shared-state counter with loop { if ready; sleep(Nms).await; }. Pair the counter with a tokio::sync::Notify..."

The current implementation polls every 25ms using AtomicBool. Fix by adding a tokio::sync::Notify field, signalling it in handle_message when ToClientInit arrives, and awaiting notified() in wait_ready().

Medium — actors mutex held across on_stop().await

In Runner::handle_stop_actor(), the actors mutex guard remains alive while on_stop().await executes. This blocks concurrent handle_start_actor and has_actor calls. Fix by scoping the lock acquisition so the guard drops before calling the callback: take the actor out of the map inside a block that owns the guard, then call on_stop() after the block ends.

Medium — Misleading _kv_tx name in test_envoy.rs

In TestEnvoyCallbacks::actor_config(), the variable _kv_tx is moved into ActorConfig::kv_request_tx. It is not unused. The underscore prefix signals "intentionally unused" to readers and clippy. Rename to kv_tx.
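For illustration, a std-only sketch of the renamed binding. ActorConfig is reduced to the one field named in the review, and KvRequest plus the std::sync::mpsc channel are hypothetical stand-ins for whatever channel type the real test_envoy.rs uses. Clippy's pedantic used_underscore_binding lint flags exactly this kind of use of an underscore-prefixed binding.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical request type standing in for the real KV request.
pub enum KvRequest {
    Get(String),
}

/// Reduced to the single field discussed in the review.
pub struct ActorConfig {
    pub kv_request_tx: Sender<KvRequest>,
}

pub fn actor_config() -> (ActorConfig, Receiver<KvRequest>) {
    // `kv_tx` is used: it moves into the config. A leading underscore
    // would wrongly tell readers (and clippy) that the binding is unused.
    let (kv_tx, kv_rx) = channel();
    (ActorConfig { kv_request_tx: kv_tx }, kv_rx)
}
```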

Medium — Unexplained behaviour changes in actors_scheduling_errors.rs

The assertion "no_runners_available" is changed to "no_runner_config_configured" without explanation. Also, actor_crash_destroy_policy flips from asserting "no error on destroy" to asserting "Crashed error is set with a crash message". If these reflect real engine behaviour changes, they should be documented in the PR description or in a comment.

Low — std::sync::Mutex in async code

Test actors use std::sync::Mutex for notify-sender patterns. CLAUDE.md requires parking_lot::Mutex for forced-sync contexts in async code. No guard crosses an .await so the risk is low, but the rule should be followed consistently.

Low — Approximately 40 ignored tests with no tracking path

Each ignore comment names the failure symptom clearly. However, no linked issue tracks re-enabling them. Several failures ("times out in full engine sweep") suggest a systemic setup problem; fixing it would unblock many tests at once. Consider linking a tracking issue in the ignore reason.

Low — Deprecated polling helper used in new envoy alarm tests

wait_for_actor_wake_polling documents itself as "DEPRECATED for other tests" in its docstring, yet new envoy alarm tests (basic_alarm, alarm_in_the_past, etc.) call it. Use the event-driven wait_for_actor_wake_from_alarm where possible to reduce timing flakiness.

Positive observations

  • Excellent test coverage in the new envoy suite: alarm edge cases (overdue, clear, replace, multiple sets, multi-cycle), KV CRUD/batch/list/range/drop, auth rejection via WebSocket close frame, pagination regression tests.
  • Correct separation: the native Runner client speaks the real runner protocol (/runners/connect), catching regressions the old envoy-bridged approach missed.
  • Retry logic in setup_test_namespace and upsert_normal_runner_config (60s timeout, 100ms backoff) addresses "replica not configured yet" races without papering them over.
  • Deterministic DC ordering via dcs.sort_by_key prevents tests from depending on HashMap iteration order.
  • Regression test alarm_overdue_during_sleep_transition_fires_via_reallocation is a well-documented, targeted regression for the Decision::Sleep overdue-alarm race.
  • with_auth_admin_token on TestOpts makes auth tests self-contained.
  • The bulk_create_actors name change (all actors share the same name, differentiated by key) better matches production actor semantics.

Minor style notes

  • get_current_timestamp_ms() in actors_alarm.rs is a one-liner over rivet_util::timestamp::now(). Calling it directly removes the indirection.
  • Comments in AlarmAndSleepActor::on_start say "gen 2" but the condition checks generation == 1 (runner protocol starts generations at 1). Align comments with the actual values.

Review generated by claude[bot]

@NathanFlurry NathanFlurry force-pushed the engine-stabilize/runner-envoy-tests branch from f9362ab to c12d710 (April 27, 2026 07:30)
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/reject-empty-actor-kv-values branch 2 times, most recently from 91328b0 to ff28b88 (April 27, 2026 08:31)
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/runner-envoy-tests branch from c12d710 to a2d5d39 (April 27, 2026 08:31)

Labels

None yet

Projects

None yet

1 participant