Skip to content

fix(pegboard): stabilize actor2 validation and lifecycle#4804

Closed
NathanFlurry wants to merge 1 commit intoengine-stabilize/envoy-evict-actor-keysfrom
engine-stabilize/actor2-validation-lifecycle
Closed

fix(pegboard): stabilize actor2 validation and lifecycle#4804
NathanFlurry wants to merge 1 commit intoengine-stabilize/envoy-evict-actor-keysfrom
engine-stabilize/actor2-validation-lifecycle

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 27, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 27, 2026

PR Review: fix(pegboard): stabilize actor2 validation and lifecycle

Overview

This PR makes three categories of fixes to the actor2 workflow:

  1. Validation gating — adds an upfront Validate activity before state is written, so invalid actors fail fast and cleanly
  2. crash_policy propagation — fixes a hardcoded CrashPolicy::Sleep bug and wires the actual policy through the actor2 lifecycle
  3. Transition correctness — aligns handle_stopped allocation transitions with the initial-allocation path (Serverful → Starting, Serverless → Allocating) and resets stale sleep_ts/reschedule_ts on reallocation

Correctness

crash_policy propagation is a clear bug fix. build_actors_from_workflows was silently ignoring the stored policy and forcing Sleep — now it reads from state. Good.

Transition fix for Serverful (runtime.rs:624–642, 671–688): The new match block mirrors the pattern already used in the initial allocation path (runtime.rs:463–480). Setting state.transition before start_new_generation is also consistent with how the initial path does it. Both changes look correct.

sleep_ts / reschedule_ts reset in allocate (runtime.rs:301–302): Clearing these on reallocation is correct — stale timestamps from a previous generation would otherwise bleed into the new one.

Validation before InitStateAndUdbInput (mod.rs:121–148): Gating the entire workflow on a validation activity prevents half-initialized state from landing in UDB. This is the right architectural move.


Issues

Unused name field in ValidateInput (mod.rs:256–260): name is present in the struct but never read inside validate(). Since it's identical to the v1 setup at actor/setup.rs:12–16, it may be intended for future format/length checks. It currently produces dead-field noise and the compiler will warn. Either add the validation or remove the field until it's needed.

Code duplication: The entire validate function body (mod.rs:265–303) is a verbatim copy of actor/setup.rs:20–54, including the ValidateInput struct and MAX_INPUT_SIZE constant. These should live in a shared module (e.g., ops/actor/ or workflows/actor/shared.rs) to avoid the two implementations drifting.

Dead constants (mod.rs:19–20): SQLITE_SCHEMA_VERSION_V1 and SQLITE_SCHEMA_VERSION_V2 are declared pub const but have no usages anywhere in the codebase. If they're placeholder scaffolding for an upcoming migration, a comment explaining their purpose would help; otherwise remove them.

Magic number in key validation (mod.rs:291–296): The limit 1024 appears twice inline. Extract it as const MAX_KEY_SIZE: usize = 1024; alongside MAX_INPUT_SIZE for consistency.


Migration / backward compatibility

#[serde(default)] on State.crash_policy (mod.rs:43–44): Existing persisted actor2 workflows that pre-date this field will deserialize with CrashPolicy::Destroy (the #[default] variant). Previously, build_actors_from_workflows was returning CrashPolicy::Sleep for those actors. The behavioral change — from sleep-on-crash to destroy-on-crash for in-flight actors that don't have the field yet — should be explicitly confirmed as intentional, since it affects live actors surviving a rolling deploy.


Summary

The core fixes (crash_policy propagation, stale-timestamp reset, Serverful transition, validation gating) are all correct and well-targeted. The main things to address before merging: the duplicated validation code, the unused name field, the dead SQLITE_SCHEMA_VERSION constants, and explicit confirmation that the serde default for crash_policy is the desired migration behavior.

@NathanFlurry NathanFlurry force-pushed the engine-stabilize/envoy-evict-actor-keys branch from a0065e2 to ba8ebbf Compare April 27, 2026 07:30
@NathanFlurry NathanFlurry force-pushed the engine-stabilize/actor2-validation-lifecycle branch 2 times, most recently from 00a4dc6 to bea788e Compare April 27, 2026 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant