fix(rivetkit): fire abort signal on shutdown finalize by NathanFlurry · Pull Request #4957 · rivet-dev/rivet

NathanFlurry · 2026-05-04T15:39:31Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

claude · 2026-05-04T15:55:28Z

PR #4957 Review: fire abort signal on shutdown finalize

Overview

This PR changes the semantics of c.abortSignal / c.aborted: the abort signal now fires at the end of shutdown finalization, not at the start of the grace period (and not early on ctx.destroy()). The motivation is sound - the old behavior meant c.aborted was true inside onSleep/onDestroy hooks, which was confusing and unexpected for users.

Code Quality

Rust (context.rs / task.rs) - solid changes:

Removing early abort_signal.lock().cancel() from both mark_destroy_requested variants is clean and consistent with the new semantics.
The rename cancel_abort_signal_for_sleep to cancel_abort_signal_for_shutdown_finalize accurately reflects where it fires.
Wrapping shutdown work in an async block in finalize_shutdown is correct exception safety - ensures cancel_abort_signal_for_shutdown_finalize() runs even when save_final_state or finish_shutdown_cleanup_with_ctx returns an error. mark_destroy_completed is also correctly guarded on result.is_ok().

TypeScript (mod.ts) - clean:

Extracting #abortActorSignal() with an idempotency guard is good. The signal.aborted pre-check is defensive, though AbortController.abort() is already a no-op once aborted.
Moving all abort calls into finally blocks in both shutdown paths (graceful and force) guarantees the signal fires regardless of intermediate errors.
Removing the three early this.#abortController.abort() call sites (grace entry, onStop destroy path, ctx.destroy()) is a clean, consistent deletion.

Tests - correct:

task.rs: assert!(!ctx.actor_aborted()) after sleep grace starts and assert!(ctx.actor_aborted()) after finalization completes correctly validate the new signal timing.
queue.rs: Calling cancel_abort_signal_for_shutdown_finalize() directly is more precise than routing through mark_destroy_requested(), since the test only cares about queue abort behavior.

Main Concern: Tick Loop Pattern Is Broken by New Semantics

The documentation now says (lifecycle.mdx line 291):

On shutdown, the run handler is awaited during the graceful shutdown window. c.abortSignal fires after that window finishes or times out.

But the tick loop example on line 303 still uses the old pattern - a while (!c.aborted) loop that listens to c.abortSignal inside the run handler to break out:

// run handler calls c.abortSignal.addEventListener("abort", resolve, { once: true })
// inside a while (!c.aborted) loop to exit on shutdown

With the new semantics this loop will never exit cleanly during shutdown:

Shutdown starts, grace period begins.
#waitForRunHandler(graceMs) races the run promise against a deadline.
c.aborted is false, so the loop keeps ticking every second through the entire grace period.
Deadline fires, #waitForRunHandler logs "run handler did not complete in time" and moves on.
finally block fires #abortActorSignal() - the addEventListener callback in the still-live run handler fires, the loop exits - but shutdown is already complete.

This turns every tick-loop actor into a run-handler leak on every sleep/destroy cycle. The same issue applies to the second example at lifecycle.mdx line 1062.

Suggested fix: Update both tick-loop examples to reflect the new intended pattern. If the signal is now only for post-finalization cleanup (e.g. releasing external resources), show that use case instead. For breaking out of a run loop during shutdown, the docs should either recommend c.keepAwake() to let the actor settle naturally before sleep, or show a separate AbortController (as queues.mdx line 428 now correctly advises).

Minor Issues

join_aborted_run_handle name is now misleading: in the normal happy path the run handle is not aborted before joining. Consider renaming to join_run_handle.
queue_abort_signal relationship: queue_abort_signal shares a CancellationToken with abort_signal (both set together in reset_abort_signal_for_start). Cancelling abort_signal at finalize will also unblock queue waits blocked on queue_abort_signal. Worth a comment on cancel_abort_signal_for_shutdown_finalize since the name implies only the actor signal.
lifecycle.mdx shutdown sequence (line 874): Step 4 says c.abortSignal fires "as final shutdown completes" - correct but ambiguous about ordering relative to cleanupDatabase(). Based on the finally block it fires before database cleanup; adding that precision would help.

Summary

The core behavioral change is correct and well-implemented. The exception-safety refactor in finalize_shutdown and the finally-placement in TypeScript are both good. The main gap is that the tick-loop code examples in lifecycle.mdx are now semantically broken - they still rely on c.abortSignal to exit the run loop, but the signal no longer fires until after the run handler is abandoned. Those examples need updating before this merges.

claude · 2026-05-04T16:06:05Z

PR Review - fix(rivetkit) - fire abort signal on shutdown finalize

Overview

This PR unifies the abort signal firing path for both sleep and destroy shutdown modes. Previously, calling ctx.destroy() would synchronously fire the abort signal the moment user code called it. Now, the abort signal fires when grace starts, consistently for both sleep and destroy. The change aligns behavior with documented semantics and prevents surprising mid-action aborts.

What Changed

Rust (context.rs): Removes early abort_signal.cancel() from mark_destroy_requested(). Renames cancel_abort_signal_for_sleep() to cancel_actor_abort_signal() to reflect its unified purpose.
Rust (task.rs): Wraps the finalize critical section in an async block so record_shutdown_wait is always called (even on error) while mark_destroy_completed is still gated on success.
TypeScript (mod.ts): Removes the early synchronous abort in destroy(). Consolidates both sleep and destroy onto abortActorSignal() at the same point in the shutdown sequence.
Tests: Adds destroyAbortSignalActor fixture and a new test verifying c.aborted / c.abortSignal.aborted remain false immediately after c.destroy() is called from within an action.
Docs: Updates internal docs, public docs, and log messages to reflect the unified behavior.

Code Quality

Strengths:

The behavioral change is well-motivated. Firing the abort synchronously inside an action was a footgun - state mutations could be cut off mid-way by the signal.
abortActorSignal() early-returns if already aborted, which is defensive and avoids a redundant try/catch.
The async block pattern in finish_shutdown() is a slight improvement: record_shutdown_wait is now always called even when save_final_state or finish_shutdown_cleanup_with_ctx returns an error. Previously those early returns silently skipped the timing metric.
The vi.waitFor comment in the new test is properly adjacent to the call, meeting the pnpm run check:wait-for-comments requirement.

Issues and Suggestions

Minor - Warning message is less actionable after the change

Both joinRunHandleOrWarn log sites were updated from a message that told users what API to use (c.abortSignal) to a vaguer one. Consider:

run handler did not complete in time, it may have leaked - use c.abortSignal to exit long-running loops gracefully on shutdown

Minor - Queue test semantic drift

In queue.rs, the test changed from queue.mark_destroy_requested() to queue.cancel_actor_abort_signal(). Those two calls are no longer equivalent: the old call also set the destroy-requested flag. If the test was exercising that the queue correctly unblocks on a destroy request, it should also set the destroy flag, or the comment should explain why the signal-only cancel is sufficient.

Nit - abortActorSignal early return is redundant

AbortController.abort() is idempotent - calling it on an already-aborted controller is a no-op. The signal.aborted guard before calling abort() is harmless but unnecessary since the outer try/catch already handles edge cases.

Test Coverage

New fixture actor (destroyAbortSignalActor) and test directly cover the fixed behavior.
The test validates behavior persists through to actual destroy via observer.wasDestroyed.
No test covers the sleep path explicitly verifying signal timing relative to grace entry, though that path is unchanged by this PR.

Summary

Solid fix with clean implementation. The core change is correct and eliminates a surprising early-abort footgun. The two items worth addressing before merge are the less-actionable warning message and the queue test semantic drift. Everything else is minor.

NathanFlurry · 2026-05-04T16:10:56Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

NathanFlurry force-pushed the abort-signal/finalize-once branch from 11daf92 to f677b5c Compare May 4, 2026 15:43

NathanFlurry force-pushed the startup-preload/use-startup-kv branch from 3c7167b to 19adfab Compare May 4, 2026 15:43

This was referenced May 4, 2026

fix(guard): handle bypass preflight before query parse #4944

Merged

feat(kitchen-sink): add mock agent event log copy #4943

Merged

NathanFlurry force-pushed the abort-signal/finalize-once branch from f677b5c to 878a458 Compare May 4, 2026 15:56

This was referenced May 4, 2026

fix(gateway): split actor response wait errors #4945

Merged

fix(rivetkit-core): keep sleeping actors reachable during grace #4939

Merged

NathanFlurry force-pushed the abort-signal/finalize-once branch from 878a458 to c2198e9 Compare May 4, 2026 16:10

NathanFlurry force-pushed the startup-preload/use-startup-kv branch from 19adfab to 314c8ff Compare May 4, 2026 16:10

This was referenced May 4, 2026

fix(rivetkit): use startup kv preload #4956

Merged

fix(client): flatten skipReadyWait option #4955

Merged

abcxff mentioned this pull request May 4, 2026

fix(rivetkit): call importNodeDependencies() in drizzle db() to fix migration crypto error #4963

Closed

11 tasks

This was referenced May 5, 2026

chore: update comment #4966

Merged

fix(rivetkit): fix empty state [slopfix] #4967

Closed

This was referenced May 5, 2026

fix(rivetkit): hide pending connections during preflight #4969

Merged

refactor(depot-client): split embedded depot transport #4970

Merged

docs(rivetkit): document actor key overlap #4971

Closed

fix(rivetkit): align destroy abort signal timing

7764046

NathanFlurry force-pushed the abort-signal/finalize-once branch from c2198e9 to 7764046 Compare May 5, 2026 11:38

This was referenced May 5, 2026

fix(rivetkit): fix crypto dep in drizzle #4972

Merged

test(rivetkit): cover failed preflight disconnect handling #4973

Merged

fix(rivetkit): defer initialized persist until state exists #4974

Merged

fix(engine): install rustls provider with pools #4975

Merged

NathanFlurry marked this pull request as ready for review May 5, 2026 13:13

This was referenced May 5, 2026

fix(rivetkit): decode legacy v4 actor schedule args #4976

Merged

fix(sqlite): stop actors on head fence mismatch #4977

Merged

Base automatically changed from startup-preload/use-startup-kv to main May 5, 2026 14:58

NathanFlurry merged commit 7764046 into main May 5, 2026
6 of 13 checks passed

NathanFlurry deleted the abort-signal/finalize-once branch May 5, 2026 14:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rivetkit): fire abort signal on shutdown finalize#4957

fix(rivetkit): fire abort signal on shutdown finalize#4957
NathanFlurry merged 1 commit intomainfrom
abort-signal/finalize-once

NathanFlurry commented May 4, 2026

Uh oh!

claude Bot commented May 4, 2026 •

edited

Loading

Uh oh!

claude Bot commented May 4, 2026 •

edited

Loading

Uh oh!

NathanFlurry commented May 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NathanFlurry commented May 4, 2026

Description

Type of change

How Has This Been Tested?

Checklist:

Uh oh!

claude Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #4957 Review: fire abort signal on shutdown finalize

Overview

Code Quality

Main Concern: Tick Loop Pattern Is Broken by New Semantics

Minor Issues

Summary

Uh oh!

claude Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanFlurry commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 4, 2026 •

edited

Loading

claude Bot commented May 4, 2026 •

edited

Loading

NathanFlurry commented May 4, 2026 •

edited

Loading