Fix broker stuck in SYNCHRONIZING on DB error during rollback (#4995)

Service brokers can become permanently stuck in SYNCHRONIZING state when
a database connection failure occurs while a failed job attempts to
revert the broker state. Without intervention, the broker remains
unusable even after the database recovers.

This change implements a multi-layered error handling approach:
1. Immediate rollback: best-effort reversion of the broker state in the
job's rescue block. Errors raised by the rollback itself are handled so
that they do not mask the original failure.
2. Failure recovery hook: a new recover_from_failure method, invoked when
a job transitions to the FAILED state after its retries are exhausted.
It acts as a safety net that sets the broker to SYNCHRONIZATION_FAILED
once the database is available again.
3. Conditional updates: WHERE clauses ensure that only brokers still in
SYNCHRONIZING are updated, so a newer state is never overwritten.
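The first and third layers can be sketched together in plain Ruby. This is a minimal illustration under stated assumptions, not the code from this change. FakeBrokerStore is a hypothetical in-memory stand-in for the brokers table, and its update_if method models an `UPDATE ... WHERE state = 'SYNCHRONIZING'` that returns the affected row count. BrokerJobSketch and its previous_state argument are also invented for illustration.

```ruby
# Hypothetical in-memory stand-in for the brokers table (not the real model).
class FakeBrokerStore
  class ConnectionError < StandardError; end

  attr_accessor :available # false simulates a lost database connection

  def initialize(states)
    @states = states.dup # guid => state
    @available = true
  end

  def state(guid)
    @states.fetch(guid)
  end

  # Models: UPDATE brokers SET state = to WHERE guid = ? AND state = from
  # Returns the number of affected rows (0 or 1).
  def update_if(guid, from:, to:)
    raise ConnectionError, 'database unavailable' unless available
    return 0 unless @states[guid] == from

    @states[guid] = to
    1
  end
end

# Hypothetical job illustrating the best-effort rollback (layer 1)
# guarded by a conditional update (layer 3).
class BrokerJobSketch
  attr_reader :rollback_errors

  def initialize(store, guid, previous_state)
    @store = store
    @guid = guid
    @previous_state = previous_state # assumed: the state to revert to
    @rollback_errors = []
  end

  # Runs the work. On any error, attempts a rollback, then re-raises the
  # ORIGINAL exception so the rollback cannot replace it.
  def perform
    yield
  rescue StandardError => e
    rollback_broker_state
    raise e
  end

  private

  # Only touches the broker if it is still SYNCHRONIZING. This sketch
  # rescues only the simulated connection error and records it.
  def rollback_broker_state
    @store.update_if(@guid, from: 'SYNCHRONIZING', to: @previous_state)
  rescue FakeBrokerStore::ConnectionError => e
    @rollback_errors << e.message
  end
end
```

When the store is marked unavailable, `job.perform { raise ArgumentError, 'catalog fetch failed' }` still raises that ArgumentError. The connection error is only recorded in rollback_errors, and the broker stays SYNCHRONIZING. That leftover state is what the failure recovery hook is meant to clean up.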
The failure hook infrastructure is implemented in PollableJobWrapper and
WrappingJob, so any job can implement recover_from_failure to clean up
when it transitions to permanent failure.
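The hook plumbing can be sketched in plain Ruby. The class names PollableJobWrapper and WrappingJob come from this commit, but their bodies here are simplified assumptions, not the actual implementation. UpdateBrokerJobSketch, PlainJob and run_with_retries are hypothetical. A Hash stands in for the database. The sketch also assumes the job runner calls `failure(job)` on the wrapper once retries are exhausted.

```ruby
# Simplified wrapper: delegates to the wrapped job (bodies are assumptions).
class WrappingJob
  attr_reader :handler

  def initialize(handler)
    @handler = handler
  end

  def perform
    handler.perform
  end

  # Delegate only if the wrapped job opted in; other jobs are unaffected.
  def recover_from_failure
    handler.recover_from_failure if handler.respond_to?(:recover_from_failure)
  end
end

class PollableJobWrapper < WrappingJob
  # Assumed to be called by the runner when the job is marked FAILED.
  def failure(_job)
    recover_from_failure
  end
end

# Hypothetical job that opts in. It moves the broker to
# SYNCHRONIZATION_FAILED, but only if the broker is still SYNCHRONIZING.
class UpdateBrokerJobSketch
  def initialize(brokers, guid)
    @brokers = brokers # Hash standing in for the brokers table
    @guid = guid
  end

  def perform
    raise 'broker update failed'
  end

  def recover_from_failure
    return unless @brokers[@guid] == 'SYNCHRONIZING'

    @brokers[@guid] = 'SYNCHRONIZATION_FAILED'
  end
end

# Hypothetical job that does not implement recover_from_failure.
class PlainJob
  def perform; end
end

# Hypothetical stand-in for the worker: retry, then fire the failure hook.
def run_with_retries(wrapper, max_attempts:)
  max_attempts.times do
    begin
      wrapper.perform
      return :succeeded
    rescue StandardError
      # swallow and try again
    end
  end
  wrapper.failure(nil)
  :failed
end
```

Because the delegation is guarded by respond_to?, PlainJob passes through the same wrapper without error. The state check in UpdateBrokerJobSketch leaves any broker that has already moved past SYNCHRONIZING untouched.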
Changes:
- Add PollableJobWrapper.failure hook that calls recover_from_failure
- Add WrappingJob.recover_from_failure delegation with respond_to? check
- Implement recover_from_failure in UpdateBrokerJob and
SynchronizeBrokerCatalogJob to set brokers to SYNCHRONIZATION_FAILED
- Add graceful error handling to rollback_broker_state
- Add comprehensive test coverage for all new behavior