
fix(safe_message_handler): use state field for ClusterShutdown instead of local variable#458

Merged
codemonkeycxy merged 2 commits into main from xinyi/safe-message-handler-bug-3wvu on Apr 27, 2026
Conversation

@codemonkeycxy
Contributor

Summary

  • ClusterManagerState.ClusterShutdown was checked in both AssignNodesToJobs and DeleteJob handlers to reject work after shutdown, but was never actually set to true
  • Only a local shouldShutdown variable was being set in the shutdown signal handler, making the guard in the update handlers dead code
  • Updates arriving during the AllHandlersFinished drain at shutdown could slip through and do real work
  • Fix: replace shouldShutdown with cm.state.ClusterShutdown as the single source of truth for both the loop break condition and the update handler guards
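The bug above can be sketched in plain Go, outside Temporal. This is a stripped-down model, not the sample's actual code: the `ClusterManager`, `ClusterManagerState`, and `AssignNodesToJobs` names follow the PR, but the signal/selector machinery is elided and the shutdown handler is reduced to a plain method.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterManagerState models the workflow state struct from the PR.
type ClusterManagerState struct {
	ClusterShutdown bool
}

type ClusterManager struct {
	state ClusterManagerState
}

// AssignNodesToJobs guards on the state field. Before the fix, the signal
// handler only set a local shouldShutdown variable, so this check could
// never fire and updates slipped through after shutdown.
func (cm *ClusterManager) AssignNodesToJobs() error {
	if cm.state.ClusterShutdown {
		return errors.New("cluster is already shut down")
	}
	return nil // do real work here
}

// handleShutdown is the fixed signal handler: it records shutdown in the
// state field, which both the run loop and the update handlers read.
func (cm *ClusterManager) handleShutdown() {
	cm.state.ClusterShutdown = true
}

func main() {
	cm := &ClusterManager{}
	fmt.Println(cm.AssignNodesToJobs()) // nil error: accepted before shutdown
	cm.handleShutdown()
	fmt.Println(cm.AssignNodesToJobs()) // non-nil error: rejected after shutdown
}
```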

Test plan

  • Run go run safe_message_handler/worker/main.go and go run safe_message_handler/starter/main.go and verify clean shutdown
  • Verify update handlers correctly reject work after ShutdownCluster signal is sent

🤖 Generated with Claude Code

ClusterShutdown in ClusterManagerState was checked by update handlers to
reject work after shutdown, but was never set — only a local shouldShutdown
variable was set. This made the shutdown guard dead code, allowing updates
to slip through during the AllHandlersFinished drain. Replace shouldShutdown
with cm.state.ClusterShutdown as the single source of truth.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Fixes shutdown guarding in the safe_message_handler example workflow by making ClusterManagerState.ClusterShutdown the single source of truth, ensuring update handlers correctly reject work after shutdown.

Changes:

  • Set cm.state.ClusterShutdown = true when the shutdown signal is received.
  • Use cm.state.ClusterShutdown (instead of a per-iteration local variable) as the run-loop break condition.


Comment on lines 295 to 304:

```diff
 selector.AddReceive(cm.shutdownCh, func(c workflow.ReceiveChannel, _ bool) {
 	c.Receive(ctx, nil)
-	shouldShutdown = true
+	cm.state.ClusterShutdown = true
 })
 selector.AddFuture(workflow.NewTimer(ctx, cm.sleepInterval), func(f workflow.Future) {
 	cm.performHealthCheck(ctx)
 })
 selector.Select(ctx)
-if shouldShutdown {
+if cm.state.ClusterShutdown {
 	break
```

Copilot AI Apr 17, 2026


Consider adding a regression test to ensure updates are rejected after ShutdownCluster is signaled. This change makes cm.state.ClusterShutdown the source of truth, but there’s currently no unit test asserting that an AssignNodesToJobs/DeleteJob update attempted after shutdown returns the expected error (especially while the workflow is draining via workflow.AllHandlersFinished).
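The shape of the regression test suggested here can be sketched in plain Go against a minimal model of the workflow state. This is an assumption-laden sketch: a real test would use the Temporal Go SDK's test environment to deliver the ShutdownCluster signal and the update, and would exercise the AllHandlersFinished drain; here the signal is simulated by setting the state field directly, and `DeleteJob` stands in for the real handler.

```go
package main

import "errors"

// ClusterManagerState models the workflow state struct from the PR.
type ClusterManagerState struct {
	ClusterShutdown bool
}

type ClusterManager struct {
	state ClusterManagerState
}

// DeleteJob stands in for the real update handler: it must refuse work
// once shutdown has been recorded in state.
func (cm *ClusterManager) DeleteJob(name string) error {
	if cm.state.ClusterShutdown {
		return errors.New("cluster is already shut down")
	}
	return nil
}

// testUpdatesRejectedAfterShutdown mimics the suggested regression test:
// simulate the ShutdownCluster signal, then assert the update is rejected.
func testUpdatesRejectedAfterShutdown() error {
	cm := &ClusterManager{}
	cm.state.ClusterShutdown = true // simulate the shutdown signal handler firing
	if err := cm.DeleteJob("job-1"); err == nil {
		return errors.New("expected DeleteJob to be rejected after shutdown")
	}
	return nil
}

func main() {
	if err := testUpdatesRejectedAfterShutdown(); err != nil {
		panic(err)
	}
}
```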

@yuandrew
Contributor

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code


@codemonkeycxy codemonkeycxy enabled auto-merge (squash) April 27, 2026 18:06
@codemonkeycxy codemonkeycxy merged commit b5cf8ca into main Apr 27, 2026
4 checks passed
@codemonkeycxy codemonkeycxy deleted the xinyi/safe-message-handler-bug-3wvu branch April 27, 2026 18:15