Commit f8bbf7c
Standalone Activity: preserve server-generated request IDs across restarts (#9724)
## What changed?
When generating a request ID server-side, set it on the original request
struct before any cloning, so that every retry reuses the same ID.
## Why?
Without this there is a potential bug, although I have not attempted to reproduce it:
1. Request arrives at Frontend without `requestID`
2. Inside retry interceptor loop, `requestID` is generated and set on a
cloned copy
3. Request proceeds to history and starts the execution, but then some
networking condition in the cell causes `RetryableInterceptor` not to
receive the Ack (it sees a context expiry)
4. Frontend retries, **generating a new request ID**. But meanwhile the
activity has completed. This would be rare, but technically possible.
5. The default reuse policy permits a second execution to be started.
This would be a bug: the second start should have been prevented by the
request ID. If the user's activity lacks idempotency protection it will
lead to incorrectness in the user's systems.
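
The difference between the two orderings can be sketched in isolation. This is a minimal illustration, not the server code: `startRequest`, `newID`, `populateOnClone`, `populateThenClone` and `attemptIDs` are hypothetical names, and the loop only stands in for the retry interceptor calling the handler repeatedly with the same original request.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// startRequest is a hypothetical stand-in for the real start request message.
type startRequest struct {
	ActivityID string
	RequestID  string
}

// newID returns a random hex string; the real code uses a UUID.
func newID() string {
	b := make([]byte, 8)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

// populateOnClone mimics the old ordering: the ID is set on a copy,
// so the caller's request still has an empty RequestID afterwards.
func populateOnClone(req *startRequest) *startRequest {
	clone := *req
	if clone.RequestID == "" {
		clone.RequestID = newID()
	}
	return &clone
}

// populateThenClone mimics the fixed ordering: the ID is set on the
// caller's request first, so later attempts find it already populated.
func populateThenClone(req *startRequest) *startRequest {
	if req.RequestID == "" {
		req.RequestID = newID()
	}
	clone := *req
	return &clone
}

// attemptIDs calls populate once per attempt against the same original
// request, as a retry loop would, and records the ID each attempt used.
func attemptIDs(populate func(*startRequest) *startRequest, attempts int) []string {
	req := &startRequest{ActivityID: "a1"}
	ids := make([]string, 0, attempts)
	for i := 0; i < attempts; i++ {
		ids = append(ids, populate(req).RequestID)
	}
	return ids
}

func main() {
	old := attemptIDs(populateOnClone, 2)
	fixed := attemptIDs(populateThenClone, 2)
	fmt.Println("clone-first: same ID on retry?", old[0] == old[1])
	fmt.Println("populate-first: same ID on retry?", fixed[0] == fixed[1])
}
```

With the clone-first ordering each attempt carries a fresh ID, so a downstream dedup check keyed on the request ID cannot recognize the retry as a duplicate of the first attempt.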
## How did you test it?
- [x] built
- [x] added weak new unit test(s) for the Start case.
- [x] manually tested:
```diff
commit fa2476c
Author: Dan Davison <dan.davison@temporal.io>
Date: 2 days ago
Not-for-merge functional test for request-ID stability across server retries
Use a package-level atomic to fail StartActivityExecution once after the
activity is created at history, triggering the RetryableInterceptor.
Without the fix, the retry generates a new request ID and gets
ActivityExecutionAlreadyStarted. With the fix, the retry reuses the
same request ID and the dedup succeeds.
diff --git a/chasm/lib/activity/frontend.go b/chasm/lib/activity/frontend.go
index c013b17..2a7b96b 100644
--- a/chasm/lib/activity/frontend.go
+++ b/chasm/lib/activity/frontend.go
@@ -2,6 +2,7 @@
import (
"context"
+ "sync/atomic"
"github.com/google/uuid"
apiactivitypb "go.temporal.io/api/activity/v1" //nolint:importas
@@ -35,6 +36,10 @@ type FrontendHandler interface {
var ErrStandaloneActivityDisabled = serviceerror.NewUnimplemented("Standalone activity is disabled")
+// TestStartFailOnce, when set to true, causes the next StartActivityExecution to return Unavailable
+// after the activity is created. It fires once (CAS to false).
+var TestStartFailOnce atomic.Bool
+
type frontendHandler struct {
FrontendHandler
client activitypb.ActivityServiceClient
@@ -100,6 +105,13 @@ func (h *frontendHandler) StartActivityExecution(ctx context.Context, req *workf
NamespaceId: namespaceID.String(),
FrontendRequest: modifiedReq,
})
+ if err != nil {
+ return nil, err
+ }
+
+ if TestStartFailOnce.CompareAndSwap(true, false) {
+ return nil, serviceerror.NewUnavailable("test: injected failure after successful creation")
+ }
return resp.GetFrontendResponse(), err
}
diff --git a/tests/standalone_activity_test.go b/tests/standalone_activity_test.go
index 6afc7b6..d5ded28 100644
--- a/tests/standalone_activity_test.go
+++ b/tests/standalone_activity_test.go
@@ -270,6 +270,36 @@ func (s *standaloneActivityTestSuite) TestIDConflictPolicy() {
})
}
+func (s *standaloneActivityTestSuite) TestServerGeneratedRequestIDStableAcrossRetries() {
+ t := s.T()
+ ctx, cancel := context.WithTimeout(t.Context(), 10*time.Second)
+ defer cancel()
+
+ activityID := testcore.RandomizeStr(t.Name())
+ taskQueue := testcore.RandomizeStr(t.Name())
+
+ // Make the handler fail once with a retryable error so the RetryableInterceptor retries.
+ activity.TestStartFailOnce.Store(true)
+
+ resp, err := s.FrontendClient().StartActivityExecution(ctx, &workflowservice.StartActivityExecutionRequest{
+ Namespace: s.Namespace().String(),
+ ActivityId: activityID,
+ ActivityType: s.tv.ActivityType(),
+ Identity: s.tv.WorkerIdentity(),
+ Input: defaultInput,
+ TaskQueue: &taskqueuepb.TaskQueue{
+ Name: taskQueue,
+ },
+ StartToCloseTimeout: durationpb.New(defaultStartToCloseTimeout),
+ // No RequestId — server generates one.
+ })
+ // With the fix, the retry uses the same request ID, so history recognizes it as a dedup
+ // and succeeds (with Started=false). Without the fix, the retry generates a new request ID
+ // and gets ActivityExecutionAlreadyStarted.
+ require.NoError(t, err)
+ require.NotNil(t, resp)
+}
+
func (s *standaloneActivityTestSuite) TestPollActivityTaskQueue() {
t := s.T()
ctx, cancel := context.WithTimeout(t.Context(), 10*time.Second)
```
## Potential risks
Could introduce incorrectness into Standalone Activity.
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Changes request ID generation semantics for standalone activity
Start/Cancel/Terminate paths to improve deduplication across retries;
risk is moderate because it touches request mutation behavior that
affects idempotency and retry interactions.
>
> **Overview**
> Ensures standalone activity requests reuse a **single** `RequestId`
across frontend retries by generating the server-side ID *before*
cloning/mutating the request (so subsequent retry attempts see the same
ID).
>
> Removes the prior pre-mutation cloning for
`TerminateActivityExecution` and `RequestCancelActivityExecution`
request-ID population, and adds a unit test (`frontend_test.go`)
asserting `StartActivityExecution` keeps a stable `RequestId` across
multiple `validateAndPopulateStartRequest` calls.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
885f60d. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
1 parent 268e312 commit f8bbf7c
2 files changed
Lines changed: 68 additions & 8 deletions