chore: move rivetkit to task model by NathanFlurry · Pull Request #4680 · rivet-dev/rivet

NathanFlurry · 2026-04-20T00:08:23Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

railway-app · 2026-04-20T00:08:31Z

🚅 Deployed to the rivet-pr-4680 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
kitchen-sink	❌ Build Failed (View Logs)	Web	Apr 22, 2026 at 9:04 am
website	😴 Sleeping (View Logs)	Web	Apr 22, 2026 at 8:58 am
frontend-cloud	❌ Build Failed (View Logs)	Web	Apr 22, 2026 at 8:46 am
frontend-inspector	❌ Build Failed (View Logs)	Web	Apr 22, 2026 at 8:46 am
mcp-hub	✅ Success (View Logs)	Web	Apr 22, 2026 at 8:45 am
ladle	❌ Build Failed (View Logs)	Web	Apr 20, 2026 at 12:09 am

NathanFlurry · 2026-04-20T00:08:48Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-04-20T00:12:20Z

PR #4680: chore: move rivetkit to task model - Code Review (Updated)

Note: This is a large draft PR (~40k additions, ~11k deletions). The diff exceeds GitHub's API size limit; this review covers key architectural changes.

Overview

This PR replaces the previous SleepController-centric lifecycle model with an event-driven ActorTask model:

Introduces ActorTask owning lifecycle channels and driving the actor through a tokio::select! main loop.
Replaces scattered atomic counters with WorkRegistry backed by a new AsyncCounter utility.
Introduces LifecycleEvent as an internal event bus for cross-subsystem notifications (state mutations, inspector attach, sleep ticks).
Moves the multi-phase shutdown state machine (ShutdownPhase enum) into ActorTask with explicit boxed Future steps.
Migrates active_http_request_count in envoy-client from AtomicUsize to AsyncCounter so SleepController can register a zero-notify directly.

Architecture - Positive

Event-driven drain - replacing the 10ms polling loop in wait_for_sleep_idle_window with Notify wakeups eliminates the latency floor on graceful shutdown.
AsyncCounter design - clean RAII abstraction with wait_zero(deadline) and fan-out register_zero_notify. Well-tested with start_paused = true.
Explicit ShutdownPhase state machine - boxed Future steps (SendingFinalize -> AwaitingFinalizeReply -> DrainingBefore -> ...) are clearer than the old nested timeout logic.
ConnHandles iterator - RAII read-lock-backed with #[must_use]; the doc warning against holding across .await is correct.
Test coverage in tests/modules/task.rs (2,932 lines) - comprehensive across state saves, sleep grace, inspector serialization, shutdown drain, hibernation persistence, and alarms.

Issues and Concerns

1. Global test hooks - parallel test isolation risk

task.rs defines three #[cfg(test)] process-global statics (SHUTDOWN_CLEANUP_HOOK, LIFECYCLE_EVENT_HOOK, SHUTDOWN_REPLY_HOOK). Two concurrent tests can race: test A installs a hook, test B overwrites and clears it on drop, test A fires and finds nothing. The test_hook_lock async mutex in task.rs mitigates this within that file, but other test modules reaching these globals do not share that lock.

Recommendation: verify all hook-using tests hold test_hook_lock() for the full guard lifetime, or annotate with #[tokio::test(flavor = "current_thread")].

2. AsyncCounter::increment uses Ordering::Relaxed while load uses Ordering::Acquire

A concurrent load(Acquire) in wait_zero could observe stale zero after a Relaxed increment. The Notify-loop preserves eventual correctness, but the asymmetry is a footgun. Consider Ordering::Release on increment, or add a comment explaining why Relaxed is safe.

3. lookup_http_request_counter registers a zero-notify on every call

zero_observers is a Vec that grows with each registration. The retain in decrement prunes dead Weak entries, but live duplicates accumulate and send redundant wakeups. Make register_zero_notify idempotent, or assert it is called at most once per counter.

4. ConnHandles read lock - no compile-time enforcement of the await constraint

The #[must_use] doc warns against holding across .await, but for conn in ctx.conns() { ... .await } is syntactically valid and Rust won't catch it. Current callsites all use .collect() safely, but this is a trap. Consider keeping ConnHandles crate-private and exposing ctx.conns_snapshot() -> Vec<ConnHandle> for external callers.

5. #[allow(dead_code)] on WorkRegistry signals incomplete wiring

websocket_callback and shutdown_counter are declared but their RegionGuard/CountGuard callers are not visible in this diff. Confirm this is intentional scaffolding or wire it before landing.

6. Dual sleep-path for wired vs. legacy actors could race

context.rs says sleep() is a user-facing bridge and ActorTask owns the shutdown drain. sleep.rs says start_sleep_timer is a compatibility timer for contexts not wired to ActorTask. If a wired actor can still trigger start_sleep_timer, both timers race. Gate the legacy path with a debug_assert! or remove it for ActorTask-wired contexts.

7. HWS restore error path - new early-exit before running state lacks test

Hibernation restore now runs before "send running state". A failure sends ActorStateStopped(Error) without ever sending ActorStateRunning. Confirm the engine handles "stop without running" gracefully, and add a test for this error path.

Minor / Style

sleep_tick, state_save_tick, inspector_serialize_state_tick are three identical async fn(Option<Instant>) bodies - extract a shared helper.
wait_for_count/wait_for_state test helpers use a 50x10ms busy loop; consider start_paused = true + advance() for deterministic timing.
CountGuard type alias is unused per the dead_code suppression - remove until needed.

Summary

The architectural direction is correct and code quality is high. Main items before landing:

Global test hook isolation - guard or document single-thread requirement.
AsyncCounter::increment ordering - document Relaxed rationale or bump to Release.
Dual sleep-path - gate or remove the legacy SleepController timer for ActorTask-owned contexts.
WorkRegistry dead_code - confirm scaffolding intent or wire before landing.
HWS restore error path test - add coverage for the new early-exit failure case.

railway-app Bot had a problem deploying to rivet-frontend / rivet-pr-4680 April 20, 2026 00:08 Failure

NathanFlurry mentioned this pull request Apr 20, 2026

feat(rivetkit): sqlite vfs v2 #4673

Draft

11 tasks

NathanFlurry mentioned this pull request Apr 20, 2026

chore: rivetkit to rust #4679

Draft

11 tasks

railway-app Bot had a problem deploying to rivet-frontend / rivet-pr-4680 April 21, 2026 00:44 Failure

NathanFlurry mentioned this pull request Apr 22, 2026

feat: import/export actors #4640

Draft

11 tasks

NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 5ca4ef7 to 710f2df Compare April 22, 2026 08:44

railway-app Bot had a problem deploying to rivet-frontend / rivet-pr-4680 April 22, 2026 08:44 Failure

NathanFlurry mentioned this pull request Apr 22, 2026

chore: sqlite v1 to v2 data migration #4692

Draft

11 tasks

chore: move rivetkit to task model

5b8e173

NathanFlurry force-pushed the 04-19-chore_move_rivetkit_to_task_model branch from 710f2df to 5b8e173 Compare April 22, 2026 09:04

railway-app Bot had a problem deploying to rivet-frontend / rivet-pr-4680 April 22, 2026 09:04 Failure

This was referenced Apr 23, 2026

chore: fix remaining issues with rivetkit-core #4701

Draft

chore: rivetkit core/napi/typescript follow up review #4702

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: move rivetkit to task model#4680

chore: move rivetkit to task model#4680
NathanFlurry wants to merge 1 commit into04-16-chore_rivetkit_to_rustfrom
04-19-chore_move_rivetkit_to_task_model

NathanFlurry commented Apr 20, 2026

Uh oh!

railway-app Bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

NathanFlurry commented Apr 20, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NathanFlurry commented Apr 20, 2026

Description

Type of change

How Has This Been Tested?

Checklist:

Uh oh!

railway-app Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanFlurry commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #4680: chore: move rivetkit to task model - Code Review (Updated)

Overview

Architecture - Positive

Issues and Concerns

Minor / Style

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

railway-app Bot commented Apr 20, 2026 •

edited

Loading

NathanFlurry commented Apr 20, 2026 •

edited

Loading

claude Bot commented Apr 20, 2026 •

edited

Loading