Skip to content

Commit 089fba5

Browse files
Shivs11claude
andauthored
Fix pinned workflow trampolining via revision-based signal suppression (#9895)
## Summary - Matching now sends the real `targetDeploymentRevisionNumber` for pinned workflows instead of hardcoded 0 - Added `revision_number` field to `LastNotifiedTargetVersion` (server-internal) and `DeclinedTargetVersionUpgrade` (public API) - Case 4 in the target version change switch now uses `targetRevisionNumber <= declined.RevisionNumber` instead of version string comparison, preventing trampolining when stale matching partitions send outdated target versions ## Problem When a pinned workflow CaNs and declines a target version upgrade, a stale matching partition can send an older target version. The old case 4 compared version strings (`declined.buildId == target.buildId`), which didn't match the stale version — causing the workflow to re-signal, CaN again, and trampoline indefinitely between stale and up-to-date partitions. ## Test plan - [x] `TestStalePartition_RevisionSuppressesTrampolining` — integration test that simulates a stale partition via `rollbackTaskQueueToVersion` and verifies: - Stale partition (revision 0) is suppressed after declining at a higher revision - Genuinely new version (v4 at higher revision) correctly fires the signal - [x] Verified test **fails** when matching + history changes are reverted (matching sends 0, old string comparison) ## API repo dependency - api: `temporalio/api@trampolining-rev-number` - api-go: `temporalio/api-go@trampolining-rev-number` 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Touches pinned-workflow versioning logic across matching, history, and persistence; a bug here could change when workflows are signaled to continue-as-new or upgrade, affecting routing behavior. > > **Overview** > Prevents pinned workflows from repeatedly continue-as-new “trampolining” when task queue partitions have stale routing data by **tracking and comparing target routing-config revision numbers**. > > Matching now propagates the real `targetDeploymentRevisionNumber` for pinned workflow tasks, and history threads this through `AddWorkflowTaskStartedEvent` to persist `LastNotifiedTargetVersion.revision_number` and carry it into `DeclinedTargetVersionUpgrade` on continue-as-new. The pinned target-change decision in `workflow_task_state_machine` switches case-4 suppression from deployment version string equality to `targetRevisionNumber <= declined.RevisionNumber`, and adds an integration test (`TestStalePartition_RevisionSuppressesTrampolining`) covering stale vs fresh partition behavior. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 4114662. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 17ab93a commit 089fba5

20 files changed

Lines changed: 216 additions & 20 deletions

File tree

api/persistence/v1/executions.pb.go

Lines changed: 16 additions & 4 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

go.mod

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,7 @@ require (
6363
go.opentelemetry.io/otel/sdk v1.43.0
6464
go.opentelemetry.io/otel/sdk/metric v1.43.0
6565
go.opentelemetry.io/otel/trace v1.43.0
66-
go.temporal.io/api v1.62.10-0.20260420202918-975b82732988
66+
go.temporal.io/api v1.62.10-0.20260421204157-0617d4e3bba2
6767
go.temporal.io/auto-scaled-workers v0.0.0-20260407181057-edd947d743d2
6868
go.temporal.io/sdk v1.41.1
6969
go.uber.org/fx v1.24.0

go.sum

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -469,8 +469,8 @@ go.opentelemetry.io/proto/slim/otlp/collector/profiles/v1development v0.3.0 h1:R
469469
go.opentelemetry.io/proto/slim/otlp/collector/profiles/v1development v0.3.0/go.mod h1:I89cynRj8y+383o7tEQVg2SVA6SRgDVIouWPUVXjx0U=
470470
go.opentelemetry.io/proto/slim/otlp/profiles/v1development v0.3.0 h1:CQvJSldHRUN6Z8jsUeYv8J0lXRvygALXIzsmAeCcZE0=
471471
go.opentelemetry.io/proto/slim/otlp/profiles/v1development v0.3.0/go.mod h1:xSQ+mEfJe/GjK1LXEyVOoSI1N9JV9ZI923X5kup43W4=
472-
go.temporal.io/api v1.62.10-0.20260420202918-975b82732988 h1:ZJ1SLhzqMz62LR0nIcle5MSHJzqERBGj/tsxQfX8YTo=
473-
go.temporal.io/api v1.62.10-0.20260420202918-975b82732988/go.mod h1:iaxoP/9OXMJcQkETTECfwYq4cw/bj4nwov8b3ZLVnXM=
472+
go.temporal.io/api v1.62.10-0.20260421204157-0617d4e3bba2 h1:9s2PjMyiiRg49fmjgWDo5RIx2MuWu5K4S8a7pThNpzU=
473+
go.temporal.io/api v1.62.10-0.20260421204157-0617d4e3bba2/go.mod h1:iaxoP/9OXMJcQkETTECfwYq4cw/bj4nwov8b3ZLVnXM=
474474
go.temporal.io/auto-scaled-workers v0.0.0-20260407181057-edd947d743d2 h1:1hKeH3GyR6YD6LKMHGCZ76t6h1Sgha0hXVQBxWi3dlQ=
475475
go.temporal.io/auto-scaled-workers v0.0.0-20260407181057-edd947d743d2/go.mod h1:T8dnzVPeO+gaUTj9eDgm/lT2lZH4+JXNvrGaQGyVi50=
476476
go.temporal.io/sdk v1.41.1 h1:yOpvsHyDD1lNuwlGBv/SUodCPhjv9nDeC9lLHW/fJUA=

proto/internal/temporal/server/api/persistence/v1/executions.proto

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -333,6 +333,10 @@ message TimeSkippingInfo {
333333
// Used only within server persistence; never flows to the public API.
334334
message LastNotifiedTargetVersion {
335335
temporal.api.deployment.v1.WorkerDeploymentVersion deployment_version = 1;
336+
// Revision number of the task queue routing config at the time the
337+
// notification was sent. Carried forward to DeclinedTargetVersionUpgrade
338+
// at continue-as-new time.
339+
int64 revision_number = 2;
336340
}
337341

338342
message ExecutionStats {

service/history/api/create_workflow_util.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -113,6 +113,7 @@ func NewWorkflowWithSignal(
113113
nil,
114114
false,
115115
nil,
116+
0,
116117
)
117118
if err != nil {
118119
// Unable to add WorkflowTaskStarted event to history

service/history/api/recordworkflowtaskstarted/api.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -170,6 +170,7 @@ func Invoke(
170170
workflowLease.GetContext().UpdateRegistry(ctx),
171171
false,
172172
req.TargetDeploymentVersion,
173+
req.TaskDispatchRevisionNumber,
173174
)
174175
if err != nil {
175176
// Unable to add WorkflowTaskStarted event to history

service/history/api/respondworkflowtaskcompleted/api.go

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -603,6 +603,7 @@ func (handler *WorkflowTaskCompletedHandler) Invoke(
603603
workflowLease.GetContext().UpdateRegistry(ctx),
604604
false,
605605
nil,
606+
0,
606607
)
607608
if err != nil {
608609
return nil, err
@@ -728,6 +729,7 @@ func (handler *WorkflowTaskCompletedHandler) Invoke(
728729
workflowLease.GetContext().UpdateRegistry(ctx),
729730
false,
730731
nil,
732+
0,
731733
)
732734
if err != nil {
733735
return nil, err

service/history/api/respondworkflowtaskcompleted/api_test.go

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -728,6 +728,7 @@ func (s *WorkflowTaskCompletedHandlerSuite) createSentUpdate(tv *testvars.TestVa
728728
nil,
729729
false,
730730
nil,
731+
0,
731732
)
732733
taskToken := &tokenspb.Task{
733734
Attempt: 1,
@@ -807,6 +808,7 @@ func (s *WorkflowTaskCompletedHandlerSuite) createPausedWorkflowWithWFT(tv *test
807808
nil,
808809
false,
809810
nil,
811+
0,
810812
)
811813
_, _ = ms.AddWorkflowTaskCompletedEvent(wt, &workflowservice.RespondWorkflowTaskCompletedRequest{
812814
Identity: tv.Any().String(),
@@ -824,6 +826,7 @@ func (s *WorkflowTaskCompletedHandlerSuite) createPausedWorkflowWithWFT(tv *test
824826
nil,
825827
false,
826828
nil,
829+
0,
827830
)
828831
taskToken := &tokenspb.Task{
829832
Attempt: 1,

service/history/api/verifyfirstworkflowtaskscheduled/api_test.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -238,6 +238,7 @@ func (s *VerifyFirstWorkflowTaskScheduledSuite) TestVerifyFirstWorkflowTaskSched
238238
nil,
239239
false,
240240
nil,
241+
0,
241242
)
242243
wt.StartedEventID = workflowTasksStartEvent.GetEventId()
243244

service/history/history_engine_test.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6618,6 +6618,7 @@ func addWorkflowTaskStartedEventWithRequestID(ms historyi.MutableState, schedule
66186618
nil,
66196619
false,
66206620
nil,
6621+
0,
66216622
)
66226623

66236624
return event

0 commit comments

Comments
 (0)