fix: set subworkflowChanged=true in updateParentWorkflowTask to fix race with sweeper#880
Open
nthmost-orkes wants to merge 8 commits into
Conversation
…ace with sweeper When a subworkflow is retried, updateAndPushParents sets subworkflowChanged=true on the parent's SUB_WORKFLOW task and pushes the parent to the DECIDER_QUEUE. The new WorkflowSweeper runs background threads that can pick up the parent and call decide(), which invokes adjustStateIfSubWorkflowChanged and resets subworkflowChanged=false — before the subworkflow has finished. Later, when the subworkflow completes and completeWorkflow calls updateParentWorkflowTask, the task status is updated to COMPLETED but subworkflowChanged is never re-set to true. The parent still gets re-queued via expediteLazyWorkflowEvaluation, so execution continues correctly, but the SubWorkflowSpec integration test fails because it checks subworkflowChanged=true as an intermediate correctness assertion. Fix: updateParentWorkflowTask now always sets subworkflowChanged=true after syncing the task status. This is semantically correct — whenever a subworkflow transitions to a terminal state, the parent should be signaled to re-evaluate, regardless of what the sweeper may have done in between. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…undary conditions - Make adjustStateIfSubWorkflowChanged @VisibleForTesting (package-private) so unit tests can call it directly - Fix assertTrue arg order to JUnit 4 style (message first) in testUpdateParentWorkflowTask - Add testAdjustStateForSubworkflowChangedDoesNotResetInProgressJoin: verifies IN_PROGRESS JOIN tasks are NOT re-queued when flag is reset - Add testAdjustStateForSubworkflowChangedResetsCanceledJoin: verifies CANCELED JOIN tasks ARE reset to IN_PROGRESS and re-queued Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g behavior updateParentWorkflowTask now sets subworkflowChanged=true when a subworkflow reaches terminal state. Two assertions that checked the flag was false immediately after subworkflow completion (before the next sweep) no longer hold — remove them. The workflow still completes correctly via the sweep at the end of each test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
WorkflowSweeperbackground threads continuously pollDECIDER_QUEUEand calldecide()→adjustStateIfSubWorkflowChanged, which resetssubworkflowChanged=false. When a subworkflow completes later (completeWorkflow→updateParentWorkflowTask), the parent SUB_WORKFLOW task was updated butsubworkflowChangedwas never re-set totrue, leaving the parent stuck.updateParentWorkflowTasknow setssubworkflowChanged=trueafter updating the task. This is semantically correct — the subworkflow just transitioned to a terminal state, so the parent needs to re-evaluate.adjustStateIfSubWorkflowChangedpackage-private (@VisibleForTesting) and added two unit tests covering the JOIN task boundary conditions:testAdjustStateForSubworkflowChangedDoesNotResetInProgressJoin— verifies IN_PROGRESS JOIN tasks are NOT re-queuedtestAdjustStateForSubworkflowChangedResetsCanceledJoin— verifies CANCELED JOIN tasks ARE reset to IN_PROGRESS and re-queued (the retry path)Failing test fixed
SubWorkflowSpec > Test retrying a subworkflow where parent workflow timed out due to workflowTimeoutLine 191 asserted
subworkflowChanged == trueafter the retried subworkflow completed, but the sweeper had already reset the flag tofalsebefore completion.Test plan
SubWorkflowSpecintegration test — previously failing assertion at line 191 now passesTestWorkflowExecutor#testUpdateParentWorkflowTask— confirmssubworkflowChanged=trueis setTestWorkflowExecutor#testAdjustStateForSubworkflowChangedDoesNotResetInProgressJoin— new testTestWorkflowExecutor#testAdjustStateForSubworkflowChangedResetsCanceledJoin— new test🤖 Generated with Claude Code