
fix(rivetkit-core): cap SerializeState shutdown timeout at 15s#4742

Draft
NathanFlurry wants to merge 1 commit into sleep-cleanup/01-core-remove-prevent-sleep from sleep-cleanup/02-serialize-state-timeout-15s

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 24, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Apr 24, 2026

Code Review: fix(rivetkit-core): cap SerializeState shutdown timeout at 15s

Overview

This is a minimal, single-line change that halves the SERIALIZE_STATE_SHUTDOWN_SANITY_CAP constant from 30s to 15s in rivetkit-rust/packages/rivetkit-core/src/actor/task.rs. This constant serves as a hard upper-bound timeout on the SerializeState reply during save_final_state() at shutdown — a safety net to prevent a hung serialization callback from blocking the entire shutdown path indefinitely.


Correctness

The new value (15s) now aligns with the default lifecycle timeouts:

  • DEFAULT_ON_DESTROY_TIMEOUT = 15s
  • DEFAULT_SLEEP_GRACE_PERIOD = 15s

This alignment is good: it prevents the sanity cap from extending shutdown beyond the configured lifecycle window. Previously, a hung SerializeState could add up to 30 additional seconds on top of the outer shutdown grace period — longer than the grace period itself.

Potential timing concern: The sanity cap and the outer shutdown grace period are now the same duration. Depending on when save_final_state() is called within the shutdown sequence, there is a possibility that both timeouts fire concurrently rather than the inner one reliably firing first. This is not a correctness bug (both paths degrade to an empty-delta save), but it is worth verifying that the outer shutdown timeout does not race with this cap in a way that skips cleanup after save_state() returns.


Code Quality

  • The change is clean, focused, and follows the existing pattern.
  • The timeout-on-error paths already log appropriately. "serializeState timed out" is logged on cap expiry, so operators will have visibility.
  • A fallback to Vec::new() (empty deltas) means state may be partially lost on cap expiry, which is acceptable as a last-resort shutdown behavior.
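To make the cap-and-fallback behavior described above concrete, here is a minimal std-only sketch. It is not the code in task.rs: the real path is presumably async, and the function names, the `Vec<u8>` delta type, and the use of `std::sync::mpsc` are all assumptions made for illustration. Only the constant name, its 15s value, the log message, and the `Vec::new()` fallback come from this review.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Name and value taken from this PR; the definition here is a stand-in.
const SERIALIZE_STATE_SHUTDOWN_SANITY_CAP: Duration = Duration::from_secs(15);

/// Hypothetical helper: wait up to `cap` for the serialized deltas.
/// On timeout (or if the sender is gone) fall back to empty deltas so
/// shutdown can proceed instead of blocking indefinitely.
fn await_deltas_with_cap(reply: mpsc::Receiver<Vec<u8>>, cap: Duration) -> Vec<u8> {
    match reply.recv_timeout(cap) {
        Ok(deltas) => deltas,
        Err(RecvTimeoutError::Timeout) => {
            // Mirrors the log line mentioned in the review.
            eprintln!("serializeState timed out");
            Vec::new()
        }
        Err(RecvTimeoutError::Disconnected) => Vec::new(),
    }
}

/// Hypothetical wrapper that applies the production cap.
fn await_final_deltas(reply: mpsc::Receiver<Vec<u8>>) -> Vec<u8> {
    await_deltas_with_cap(reply, SERIALIZE_STATE_SHUTDOWN_SANITY_CAP)
}
```

Taking the cap as a parameter keeps the hung-callback case cheap to exercise: a test can pass a cap of a few milliseconds and hold the sender open without ever sending, rather than waiting the full 15s.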

Suggestions

  1. PR description is incomplete. The template checkboxes are all unchecked and no context was provided about why 30s was too long or what problem this fixes (e.g., actors were hanging at shutdown for 30s). A brief note on motivation would help future readers.

  2. Consider a comment next to the constant explaining its relationship to DEFAULT_ON_DESTROY_TIMEOUT / DEFAULT_SLEEP_GRACE_PERIOD, since keeping them in sync matters:

// Must not exceed the shutdown grace period (DEFAULT_ON_DESTROY_TIMEOUT / DEFAULT_SLEEP_GRACE_PERIOD).
const SERIALIZE_STATE_SHUTDOWN_SANITY_CAP: Duration = Duration::from_secs(15);
  3. No test coverage. Given this is a timing/timeout change, it would be worth adding a test that verifies shutdown completes within the expected time window even when SerializeState hangs. The test framework already has LONG_SHUTDOWN_DRAIN_WARNING_THRESHOLD suggesting this kind of test is viable.
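As a possible complement to the comment proposed in suggestion 2, the relationship between the constants could also be enforced at compile time. The sketch below assumes all three constants are `Duration` values visible from one module; their real locations and types in rivetkit-core are not shown in this PR, so the definitions here are stand-ins using the values quoted above.

```rust
use std::time::Duration;

// Stand-in definitions using the defaults quoted in this review.
const DEFAULT_ON_DESTROY_TIMEOUT: Duration = Duration::from_secs(15);
const DEFAULT_SLEEP_GRACE_PERIOD: Duration = Duration::from_secs(15);
const SERIALIZE_STATE_SHUTDOWN_SANITY_CAP: Duration = Duration::from_secs(15);

// Compile-time guard: the build fails if the cap ever exceeds either
// default lifecycle timeout.
const _: () = assert!(
    SERIALIZE_STATE_SHUTDOWN_SANITY_CAP.as_millis() <= DEFAULT_ON_DESTROY_TIMEOUT.as_millis()
        && SERIALIZE_STATE_SHUTDOWN_SANITY_CAP.as_millis()
            <= DEFAULT_SLEEP_GRACE_PERIOD.as_millis(),
    "SERIALIZE_STATE_SHUTDOWN_SANITY_CAP must not exceed the default shutdown grace period"
);
```

This only guards the defaults. If the lifecycle timeouts are configurable per actor, a runtime check against the configured values would still be needed.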

Summary

The change is correct and well-motivated: the 30s cap was counterproductively long for a shutdown path bounded at 15s by the lifecycle config. Minor concerns are the incomplete PR description and the lack of a comment tying the constant to the lifecycle defaults.

@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/01-core-remove-prevent-sleep branch from 96f71c0 to c43b558 Compare April 24, 2026 10:32
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch from f32030d to 96f22d0 Compare April 24, 2026 10:32
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/01-core-remove-prevent-sleep branch from c43b558 to f4dbc84 Compare April 24, 2026 11:48
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch from 96f22d0 to cfc540f Compare April 24, 2026 11:48
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/01-core-remove-prevent-sleep branch from f4dbc84 to b3b1cf8 Compare April 24, 2026 12:14
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch 2 times, most recently from 0c5605c to ed50b2a Compare April 24, 2026 12:32
@NathanFlurry NathanFlurry changed the base branch from sleep-cleanup/01-core-remove-prevent-sleep to graphite-base/4742 April 24, 2026 12:59
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch from ed50b2a to 9ba012c Compare April 24, 2026 13:16
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4742 to sleep-cleanup/01-core-remove-prevent-sleep April 24, 2026 13:16
@github-actions
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4742

All packages published as 0.0.0-pr.4742.801cfc2 with tag pr-4742.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-801cfc2
docker pull rivetdev/engine:full-801cfc2
Individual packages
npm install rivetkit@pr-4742
npm install @rivetkit/react@pr-4742
npm install @rivetkit/rivetkit-napi@pr-4742
npm install @rivetkit/workflow-engine@pr-4742
