Skip to content

fix(rivetkit): fire abort signal on shutdown finalize#4957

Merged
NathanFlurry merged 1 commit intomainfrom
abort-signal/finalize-once
May 5, 2026
Merged

fix(rivetkit): fire abort signal on shutdown finalize#4957
NathanFlurry merged 1 commit intomainfrom
abort-signal/finalize-once

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

PR #4957 Review: fire abort signal on shutdown finalize

Overview

This PR changes the semantics of c.abortSignal / c.aborted: the abort signal now fires at the end of shutdown finalization, not at the start of the grace period (and not early on ctx.destroy()). The motivation is sound - the old behavior meant c.aborted was true inside onSleep/onDestroy hooks, which was confusing and unexpected for users.


Code Quality

Rust (context.rs / task.rs) - solid changes:

  • Removing early abort_signal.lock().cancel() from both mark_destroy_requested variants is clean and consistent with the new semantics.
  • The rename cancel_abort_signal_for_sleep to cancel_abort_signal_for_shutdown_finalize accurately reflects where it fires.
  • Wrapping shutdown work in an async block in finalize_shutdown is correct exception safety - ensures cancel_abort_signal_for_shutdown_finalize() runs even when save_final_state or finish_shutdown_cleanup_with_ctx returns an error. mark_destroy_completed is also correctly guarded on result.is_ok().

TypeScript (mod.ts) - clean:

  • Extracting #abortActorSignal() with an idempotency guard is good. The signal.aborted pre-check is defensive, though AbortController.abort() is already a no-op once aborted.
  • Moving all abort calls into finally blocks in both shutdown paths (graceful and force) guarantees the signal fires regardless of intermediate errors.
  • Removing the three early this.#abortController.abort() call sites (grace entry, onStop destroy path, ctx.destroy()) is a clean, consistent deletion.

Tests - correct:

  • task.rs: assert!(!ctx.actor_aborted()) after sleep grace starts and assert!(ctx.actor_aborted()) after finalization completes correctly validate the new signal timing.
  • queue.rs: Calling cancel_abort_signal_for_shutdown_finalize() directly is more precise than routing through mark_destroy_requested(), since the test only cares about queue abort behavior.

Main Concern: Tick Loop Pattern Is Broken by New Semantics

The documentation now says (lifecycle.mdx line 291):

On shutdown, the run handler is awaited during the graceful shutdown window. c.abortSignal fires after that window finishes or times out.

But the tick loop example on line 303 still uses the old pattern - a while (!c.aborted) loop that listens to c.abortSignal inside the run handler to break out:

// run handler calls c.abortSignal.addEventListener("abort", resolve, { once: true })
// inside a while (!c.aborted) loop to exit on shutdown

With the new semantics this loop will never exit cleanly during shutdown:

  1. Shutdown starts, grace period begins.
  2. #waitForRunHandler(graceMs) races the run promise against a deadline.
  3. c.aborted is false, so the loop keeps ticking every second through the entire grace period.
  4. Deadline fires, #waitForRunHandler logs "run handler did not complete in time" and moves on.
  5. finally block fires #abortActorSignal() - the addEventListener callback in the still-live run handler fires, the loop exits - but shutdown is already complete.

This turns every tick-loop actor into a run-handler leak on every sleep/destroy cycle. The same issue applies to the second example at lifecycle.mdx line 1062.

Suggested fix: Update both tick-loop examples to reflect the new intended pattern. If the signal is now only for post-finalization cleanup (e.g. releasing external resources), show that use case instead. For breaking out of a run loop during shutdown, the docs should either recommend c.keepAwake() to let the actor settle naturally before sleep, or show a separate AbortController (as queues.mdx line 428 now correctly advises).


Minor Issues

  • join_aborted_run_handle name is now misleading: in the normal happy path the run handle is not aborted before joining. Consider renaming to join_run_handle.
  • queue_abort_signal relationship: queue_abort_signal shares a CancellationToken with abort_signal (both set together in reset_abort_signal_for_start). Cancelling abort_signal at finalize will also unblock queue waits blocked on queue_abort_signal. Worth a comment on cancel_abort_signal_for_shutdown_finalize since the name implies only the actor signal.
  • lifecycle.mdx shutdown sequence (line 874): Step 4 says c.abortSignal fires "as final shutdown completes" - correct but ambiguous about ordering relative to cleanupDatabase(). Based on the finally block it fires before database cleanup; adding that precision would help.

Summary

The core behavioral change is correct and well-implemented. The exception-safety refactor in finalize_shutdown and the finally-placement in TypeScript are both good. The main gap is that the tick-loop code examples in lifecycle.mdx are now semantically broken - they still rely on c.abortSignal to exit the run loop, but the signal no longer fires until after the run handler is abandoned. Those examples need updating before this merges.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

PR Review - fix(rivetkit) - fire abort signal on shutdown finalize

Overview

This PR unifies the abort signal firing path for both sleep and destroy shutdown modes. Previously, calling ctx.destroy() would synchronously fire the abort signal the moment user code called it. Now, the abort signal fires when grace starts, consistently for both sleep and destroy. The change aligns behavior with documented semantics and prevents surprising mid-action aborts.


What Changed

  • Rust (context.rs): Removes early abort_signal.cancel() from mark_destroy_requested(). Renames cancel_abort_signal_for_sleep() to cancel_actor_abort_signal() to reflect its unified purpose.
  • Rust (task.rs): Wraps the finalize critical section in an async block so record_shutdown_wait is always called (even on error) while mark_destroy_completed is still gated on success.
  • TypeScript (mod.ts): Removes the early synchronous abort in destroy(). Consolidates both sleep and destroy onto abortActorSignal() at the same point in the shutdown sequence.
  • Tests: Adds destroyAbortSignalActor fixture and a new test verifying c.aborted / c.abortSignal.aborted remain false immediately after c.destroy() is called from within an action.
  • Docs: Updates internal docs, public docs, and log messages to reflect the unified behavior.

Code Quality

Strengths:

  • The behavioral change is well-motivated. Firing the abort synchronously inside an action was a footgun - state mutations could be cut off mid-way by the signal.
  • abortActorSignal() early-returns if already aborted, which is defensive and avoids a redundant try/catch.
  • The async block pattern in finish_shutdown() is a slight improvement: record_shutdown_wait is now always called even when save_final_state or finish_shutdown_cleanup_with_ctx returns an error. Previously those early returns silently skipped the timing metric.
  • The vi.waitFor comment in the new test is properly adjacent to the call, meeting the pnpm run check:wait-for-comments requirement.

Issues and Suggestions

Minor - Warning message is less actionable after the change

Both joinRunHandleOrWarn log sites were updated from a message that told users what API to use (c.abortSignal) to a vaguer one. Consider:

run handler did not complete in time, it may have leaked - use c.abortSignal to exit long-running loops gracefully on shutdown

Minor - Queue test semantic drift

In queue.rs, the test changed from queue.mark_destroy_requested() to queue.cancel_actor_abort_signal(). Those two calls are no longer equivalent: the old call also set the destroy-requested flag. If the test was exercising that the queue correctly unblocks on a destroy request, it should also set the destroy flag, or the comment should explain why the signal-only cancel is sufficient.

Nit - abortActorSignal early return is redundant

AbortController.abort() is idempotent - calling it on an already-aborted controller is a no-op. The signal.aborted guard before calling abort() is harmless but unnecessary since the outer try/catch already handles edge cases.


Test Coverage

  • New fixture actor (destroyAbortSignalActor) and test directly cover the fixed behavior.
  • The test validates behavior persists through to actual destroy via observer.wasDestroyed.
  • No test covers the sleep path explicitly verifying signal timing relative to grace entry, though that path is unchanged by this PR.

Summary

Solid fix with clean implementation. The core change is correct and eliminates a surprising early-abort footgun. The two items worth addressing before merge are the less-actionable warning message and the queue test semantic drift. Everything else is minor.

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 4, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry force-pushed the abort-signal/finalize-once branch from c2198e9 to 7764046 Compare May 5, 2026 11:38
@NathanFlurry NathanFlurry marked this pull request as ready for review May 5, 2026 13:13
Base automatically changed from startup-preload/use-startup-kv to main May 5, 2026 14:58
@NathanFlurry NathanFlurry merged commit 7764046 into main May 5, 2026
6 of 13 checks passed
@NathanFlurry NathanFlurry deleted the abort-signal/finalize-once branch May 5, 2026 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant