Note over UC: Execution suspended, returns PENDING
@@ -562,36 +563,29 @@ This approach ensures suspension happens precisely when no thread can make progr
#### Advanced Feature: In-Process Completion
In scenarios where waits or step retries would normally suspend execution, but other active threads prevent suspension, the SDK automatically switches to in-process completion by polling the backend until timing conditions are met. This allows complex concurrent workflows to complete efficiently without unnecessary Lambda re-invocations or extended waiting periods.
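
As a rough illustration of the polling behaviour described above, the sketch below re-checks a readiness condition at a fixed interval instead of suspending. `InProcessPoller`, `pollUntilReady`, and the `isReady` supplier are hypothetical names rather than SDK APIs, and the SDK's actual polling strategy and backend calls may differ.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the SDK: polls until a condition holds.
class InProcessPoller {

    /**
     * Calls isReady up to maxPolls times, sleeping for interval after each
     * unsuccessful call. Returns the 1-based attempt on which isReady was
     * true, or -1 if it never was (or the thread was interrupted).
     */
    static int pollUntilReady(BooleanSupplier isReady, Duration interval, int maxPolls) {
        for (int attempt = 1; attempt <= maxPolls; attempt++) {
            if (isReady.getAsBoolean()) {
                return attempt;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a wait or retry delay that ends 50 ms from now.
        long readyAt = System.currentTimeMillis() + 50;
        int attempts = pollUntilReady(
                () -> System.currentTimeMillis() >= readyAt, Duration.ofMillis(10), 100);
        System.out.println("ready after " + attempts + " polls");
    }
}
```

A bounded `maxPolls` is used here so the sketch always terminates; whether the SDK bounds its polling is not stated in the source.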
**Complex case - Blocking on retrying operations:**
-```java
-var future1 = context.stepAsync("step1", () -> failsAndRetries());
-var result = context.step("step2", () -> future1.get() + "-processed");
-```
+### Active Thread Tracking and Operation Completion Coordination
-**Without phasers:** Simple thread counting fails because step2's thread would stay registered while blocked on `future1.get()`, preventing `activeThreads.isEmpty()` from triggering suspension → Lambda stays active during step1's retry delay instead of suspending.
+Each piece of user code (the main function body, a step body, or a child context body) runs in its own thread. The execution manager tracks the active threads. When a new step or child context is created, a new thread is created and registered with the execution manager. When user code blocks on `get()` or on a synchronous durable operation, its thread is deregistered from the execution manager. When no active thread remains, the function execution is suspended.
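
The register/deregister rule above can be sketched with a small tracker. This is a minimal illustration under assumptions: `ActiveThreadTracker` and its methods are hypothetical names, threads are identified by plain strings, and "suspension" is reduced to a flag. The SDK's execution manager is not shown in the source and may work differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of active-thread tracking; not the SDK's execution manager.
class ActiveThreadTracker {
    private final Set<String> activeThreads = new HashSet<>();
    private boolean suspended = false;

    /** Registers a thread that is about to run user code. */
    synchronized void register(String threadId) {
        activeThreads.add(threadId);
    }

    /**
     * Deregisters a thread that has blocked or finished.
     * Returns true if this call left no active thread and so triggered suspension.
     */
    synchronized boolean deregister(String threadId) {
        activeThreads.remove(threadId);
        if (activeThreads.isEmpty() && !suspended) {
            suspended = true;
            return true;
        }
        return false;
    }

    synchronized boolean isSuspended() {
        return suspended;
    }

    public static void main(String[] args) {
        ActiveThreadTracker tracker = new ActiveThreadTracker();
        tracker.register("context-root");
        tracker.register("step1");
        // The context thread blocks on get(): step1 is still active, so no suspension.
        System.out.println(tracker.deregister("context-root"));
        // step1 finishes: no active thread remains, so suspension is triggered.
        System.out.println(tracker.deregister("step1"));
    }
}
```

Both methods are `synchronized` so that the emptiness check and the suspension decision happen atomically with respect to concurrent registrations.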
-**What should happen instead:** step2's root thread must deregister when blocked, allow suspension during step1's retry, then coordinate re-registration when step1 completes with checkpointed results.
+These user threads and the system thread use `CompletableFuture` to communicate the completion of operations. When a context executes a step, the communication proceeds as shown below:
-**The problem:** When step1 retries, step2's root thread must:
-1. Deregister (to allow suspension during retry delay)
-2. Block until step1 either completes successfully or wants to suspend for another retry
-3. Re-register when step1 finishes or when resuming from suspension
-4. Ensure step1's result is checkpointed before proceeding
| 2 | checkpoint START event (synchronously or asynchronously) | (not created) | call checkpoint API |
+| 3 | create and register the Step thread | execute user code for the step | (idle) |
+| 4 | call `get()`, deregister the context thread, and wait for the CompletableFuture to complete | (continue) | (idle) |
+| 5 | (blocked) | checkpoint the step result and wait for the checkpoint call to complete | call the checkpoint API and handle the response. If the response is terminal, complete the Step operation's CompletableFuture, then register and unblock the context thread. |
+| 6 | retrieve the result of the step | deregister and terminate the Step thread | (idle) |
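
The handoff in rows 3 to 6 can be sketched with three threads and one `CompletableFuture`. This is an illustration under stated assumptions: `StepHandoffSketch` and `runStep` are hypothetical names, the checkpoint API call is simulated by a task that always yields a terminal response, and thread registration is reduced to comments.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch of the handoff in the table; checkpointing is simulated.
class StepHandoffSketch {

    static <T> T runStep(Supplier<T> userCode) {
        ExecutorService stepThread = Executors.newSingleThreadExecutor();
        ExecutorService systemThread = Executors.newSingleThreadExecutor();
        CompletableFuture<T> operation = new CompletableFuture<>();
        try {
            // Row 3: the step thread runs the user code.
            stepThread.execute(() -> {
                try {
                    T result = userCode.get();
                    // Row 5: the system thread "checkpoints" the result and, on
                    // this simulated terminal response, completes the operation.
                    CompletableFuture
                            .runAsync(() -> operation.complete(result), systemThread)
                            .join(); // the step thread waits for the checkpoint call
                } catch (RuntimeException e) {
                    operation.completeExceptionally(e);
                }
            });
            // Rows 4 and 6: the context thread waits, then retrieves the result.
            // join() is used instead of get() only to avoid checked exceptions.
            return operation.join();
        } finally {
            stepThread.shutdown();
            systemThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runStep(() -> "step1-result"));
    }
}
```

Completing the future from the system thread, rather than from the step thread, mirrors the table's rule that the context thread is unblocked only after the checkpoint call has returned.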
-**Additional complex scenarios:**
-- **Nested blocking:** Multiple threads blocking on each other's results
-- **Future operations:** `runInChildContext` with multiple child threads coordinating
-- **Race conditions:** Ensuring checkpoint completion before thread lifecycle changes
+If the user code completes quickly, the sequence can instead proceed as follows:
-These scenarios are why we chose **phasers** - a multi-party synchronization primitive that coordinates checkpoint-driven completion.
| 2 | checkpoint START event (synchronously or asynchronously) | (not created) | call checkpoint API |
+| 3 | create and register the Step thread | execute user code for the step and complete quickly | (idle) |
+| 5 | (doing other work, or starved of CPU time) | checkpoint the step result and wait for the checkpoint call to complete | call the checkpoint API and handle the response. If the response is terminal, complete the Step operation's CompletableFuture. |
+| 4 | call `get()`, which does not block because the CompletableFuture is already completed | deregister and terminate the Step thread | (idle) |
+| 6 | retrieve the result of the step | (ended) | (idle) |
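
The difference between the two orderings comes down to whether the future is already complete when `get()` is called. The sketch below is one possible way to express that, not the SDK's code: `ContextAwait.await` is a hypothetical helper, and it assumes the context thread skips deregistration on the fast path. As a simplification, the waiting thread re-registers itself here, whereas the first table has the system thread do so.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: deregister only when waiting would actually block.
class ContextAwait {

    static <T> T await(CompletableFuture<T> operation, Runnable deregister, Runnable register) {
        if (operation.isDone()) {
            return operation.join(); // already completed: returns without blocking
        }
        deregister.run();            // about to block, so stop counting as active
        try {
            return operation.join();
        } finally {
            register.run();          // simplification: done by the waiter itself
        }
    }

    public static void main(String[] args) {
        AtomicInteger deregistrations = new AtomicInteger();
        CompletableFuture<String> done = CompletableFuture.completedFuture("fast");
        String result = await(done, deregistrations::incrementAndGet, () -> {});
        System.out.println(result + ", deregistrations=" + deregistrations.get());
    }
}
```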
-See [ADR-002: Phaser-Based Operation Coordination](adr/002-phaser-based-coordination.md) for detailed implementation and usage patterns.