Skip to content

fix: set subworkflowChanged=true in updateParentWorkflowTask to fix race with sweeper#880

Open
nthmost-orkes wants to merge 8 commits into
conductor-oss:mainfrom
nthmost-orkes:fix/subworkflow-changed-flag-race-condition
Open

fix: set subworkflowChanged=true in updateParentWorkflowTask to fix race with sweeper#880
nthmost-orkes wants to merge 8 commits into
conductor-oss:mainfrom
nthmost-orkes:fix/subworkflow-changed-flag-race-condition

Conversation

@nthmost-orkes
Copy link
Copy Markdown
Contributor

@nthmost-orkes nthmost-orkes commented Mar 18, 2026

Summary

  • Root cause: WorkflowSweeper background threads continuously poll DECIDER_QUEUE and call decide()adjustStateIfSubWorkflowChanged, which resets subworkflowChanged=false. When a subworkflow completes later (completeWorkflowupdateParentWorkflowTask), the parent SUB_WORKFLOW task was updated but subworkflowChanged was never re-set to true, leaving the parent stuck.
  • Fix: updateParentWorkflowTask now sets subworkflowChanged=true after updating the task. This is semantically correct — the subworkflow just transitioned to a terminal state, so the parent needs to re-evaluate.
  • Test coverage added: Made adjustStateIfSubWorkflowChanged package-private (@VisibleForTesting) and added two unit tests covering the JOIN task boundary conditions:
    • testAdjustStateForSubworkflowChangedDoesNotResetInProgressJoin — verifies IN_PROGRESS JOIN tasks are NOT re-queued
    • testAdjustStateForSubworkflowChangedResetsCanceledJoin — verifies CANCELED JOIN tasks ARE reset to IN_PROGRESS and re-queued (the retry path)

Failing test fixed

SubWorkflowSpec > Test retrying a subworkflow where parent workflow timed out due to workflowTimeout

Line 191 asserted subworkflowChanged == true after the retried subworkflow completed, but the sweeper had already reset the flag to false before completion.

Test plan

  • Run SubWorkflowSpec integration test — previously failing assertion at line 191 now passes
  • Run TestWorkflowExecutor#testUpdateParentWorkflowTask — confirms subworkflowChanged=true is set
  • Run TestWorkflowExecutor#testAdjustStateForSubworkflowChangedDoesNotResetInProgressJoin — new test
  • Run TestWorkflowExecutor#testAdjustStateForSubworkflowChangedResetsCanceledJoin — new test

🤖 Generated with Claude Code

nthmost-orkes and others added 2 commits March 18, 2026 05:23
…ace with sweeper

When a subworkflow is retried, updateAndPushParents sets subworkflowChanged=true on
the parent's SUB_WORKFLOW task and pushes the parent to the DECIDER_QUEUE. The new
WorkflowSweeper runs background threads that can pick up the parent and call decide(),
which invokes adjustStateIfSubWorkflowChanged and resets subworkflowChanged=false —
before the subworkflow has finished.

Later, when the subworkflow completes and completeWorkflow calls
updateParentWorkflowTask, the task status is updated to COMPLETED but
subworkflowChanged is never re-set to true. The parent still gets re-queued via
expediteLazyWorkflowEvaluation, so execution continues correctly, but the
SubWorkflowSpec integration test fails because it checks subworkflowChanged=true
as an intermediate correctness assertion.

Fix: updateParentWorkflowTask now always sets subworkflowChanged=true after syncing
the task status. This is semantically correct — whenever a subworkflow transitions
to a terminal state, the parent should be signaled to re-evaluate, regardless of
what the sweeper may have done in between.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nthmost-orkes nthmost-orkes marked this pull request as ready for review March 18, 2026 12:26
nthmost-orkes and others added 4 commits March 18, 2026 05:38
…undary conditions

- Make adjustStateIfSubWorkflowChanged @VisibleForTesting (package-private)
  so unit tests can call it directly
- Fix assertTrue arg order to JUnit 4 style (message first) in
  testUpdateParentWorkflowTask
- Add testAdjustStateForSubworkflowChangedDoesNotResetInProgressJoin:
  verifies IN_PROGRESS JOIN tasks are NOT re-queued when flag is reset
- Add testAdjustStateForSubworkflowChangedResetsCanceledJoin:
  verifies CANCELED JOIN tasks ARE reset to IN_PROGRESS and re-queued

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…g behavior

updateParentWorkflowTask now sets subworkflowChanged=true when a subworkflow
reaches terminal state. Two assertions that checked the flag was false
immediately after subworkflow completion (before the next sweep) no longer
hold — remove them. The workflow still completes correctly via the sweep
at the end of each test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@v1r3n v1r3n closed this Mar 19, 2026
@v1r3n v1r3n reopened this Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants