Status: Accepted
Date: 2026-02-16
The TypeScript and Python durable execution SDKs support child contexts via OperationType.CONTEXT, enabling isolated sub-workflows with independent operation counters and checkpoint logs. The Java SDK needs the same capability to support fan-out/fan-in, parallel processing branches, and hierarchical workflow composition.
var futureA = ctx.runInChildContextAsync("branch-a", String.class, child -> {
child.step("validate", Void.class, stepCtx -> validate(order));
child.wait(null, Duration.ofMinutes(5));
return child.step("charge", String.class, stepCtx -> charge(order));
});
var futureB = ctx.runInChildContextAsync("branch-b", String.class, child -> { ... });
var results = DurableFuture.allOf(futureA, futureB);

A child context is a CONTEXT operation in the checkpoint log with a three-phase lifecycle:
- START (fire-and-forget): marks the child context as in-progress
- Inner operations checkpoint with `parentId` set to the child context's operation ID
- SUCCEED or FAIL (blocking): finalizes the child context
Op ID | Parent ID | Type | Action | Payload
------|-----------|---------|---------|--------
3 | null | CONTEXT | START | —
3-1 | 3 | STEP | START | —
3-1 | 3 | STEP | SUCCEED | "result"
3 | null | CONTEXT | SUCCEED | "final result"
Inner operation IDs are prefixed with the parent context's operation ID, using `-` as the separator (e.g., "3-1", "3-2"). This matches the JavaScript SDK's `stepPrefix` convention and ensures global uniqueness, which is required because the backend validates type consistency by operation ID alone.
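The prefixing rule can be illustrated with a minimal sketch. The class `OperationIdSequence` and its methods are hypothetical names chosen for this example, not the SDK's actual types; the only behavior taken from this document is that a child's IDs are its own operation ID, a `-` separator, and a per-context counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not an SDK class): hands out operation IDs for one context. */
final class OperationIdSequence {
    // Empty for the root context; otherwise the owning context's operation ID plus "-".
    private final String prefix;
    // Each context counts its own operations independently, starting at 1.
    private final AtomicInteger counter = new AtomicInteger();

    private OperationIdSequence(String prefix) {
        this.prefix = prefix;
    }

    /** Sequence for the root context: "1", "2", "3", ... */
    static OperationIdSequence root() {
        return new OperationIdSequence("");
    }

    /** Sequence for a child context whose own operation ID is contextId. */
    static OperationIdSequence forChild(String contextId) {
        return new OperationIdSequence(contextId + "-");
    }

    /** Returns the next operation ID in this context. */
    String next() {
        return prefix + counter.incrementAndGet();
    }
}
```

Because a nested child is created from an ID that already carries its parent's prefix, nesting needs no extra bookkeeping in this sketch.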
- Root context: "1", "2", "3"
- Child context "1": "1-1", "1-2", "1-3"
- Nested child context "1-2": "1-2-1", "1-2-2"
A global `executionMode` doesn't work for child contexts, because a child may be replaying while the parent is already executing. Each `DurableContext` tracks its own replay state via an `isReplaying` field, initialized by checking `ExecutionManager.hasOperationsForContext(contextId)`. `StepContext` does not track replay state because steps are retried by attempt, not replayed as independent contexts.
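A minimal sketch of per-context replay state follows. The nested `Log` and `Context` classes are stand-ins invented for this example; only the `isReplaying` field and the `hasOperationsForContext(contextId)` check come from the description above, and the real `ExecutionManager` and `DurableContext` are likely to differ in shape.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReplayStateSketch {
    /** Stand-in for the checkpoint lookup; maps a context ID to the operation IDs recorded under it. */
    static final class Log {
        private final Map<String, List<String>> operationsByContext = new HashMap<>();

        void record(String contextId, List<String> operationIds) {
            operationsByContext.put(contextId, operationIds);
        }

        boolean hasOperationsForContext(String contextId) {
            List<String> ops = operationsByContext.get(contextId);
            return ops != null && !ops.isEmpty();
        }
    }

    /** Stand-in for a durable context that owns its replay flag. */
    static final class Context {
        final String contextId;
        private boolean isReplaying;

        Context(String contextId, Log log) {
            this.contextId = contextId;
            // A context starts in replay mode only if the log already holds operations for it.
            this.isReplaying = log.hasOperationsForContext(contextId);
        }

        boolean isReplaying() {
            return isReplaying;
        }

        /** Hypothetical transition, called once the cached operations are exhausted. */
        void markExecuting() {
            isReplaying = false;
        }
    }
}
```

With this layout, a parent that has already switched to executing and a child that is still replaying can coexist, since neither reads a shared flag.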
Child context user code runs in a separate thread (same pattern as StepOperation):
- `registerActiveThread` before the executor runs (on parent thread)
- `setCurrentContext` inside the executor thread
- `deregisterActiveThread` in the finally block
- `SuspendExecutionException` caught in finally (suspension already signaled)
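The ordering above can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: `ChildThreadSketch`, the string-keyed thread registry, and the use of `CompletableFuture` are all choices made for this example, and the `SuspendExecutionException` declared here is a local stand-in that reuses the name from the list.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class ChildThreadSketch {
    /** Local stand-in for the SDK's suspension signal. */
    static final class SuspendExecutionException extends RuntimeException {}

    // Hypothetical registry of contexts that currently have user code running.
    private final Set<String> activeThreads = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<String> currentContext = new ThreadLocal<>();

    void registerActiveThread(String contextId) { activeThreads.add(contextId); }
    void deregisterActiveThread(String contextId) { activeThreads.remove(contextId); }
    void setCurrentContext(String contextId) { currentContext.set(contextId); }
    int activeThreadCount() { return activeThreads.size(); }

    <T> CompletableFuture<T> runChild(Executor executor, String contextId, Supplier<T> userCode) {
        // Registered on the parent thread, before the executor picks up the task.
        registerActiveThread(contextId);
        return CompletableFuture.supplyAsync(() -> {
            // Bound inside the executor thread so inner operations see the child context.
            setCurrentContext(contextId);
            try {
                return userCode.get();
            } catch (SuspendExecutionException e) {
                // Suspension was already signaled elsewhere; swallow it here.
                return null;
            } finally {
                currentContext.remove();
                deregisterActiveThread(contextId);
            }
        }, executor);
    }
}
```

Registering before submission means the registry never observes a window in which the child has been scheduled but is not yet counted as active.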
Results < 256KB are checkpointed directly. Results ≥ 256KB trigger the ReplayChildren flow:
- SUCCEED checkpoint with empty payload + `ContextOptions { replayChildren: true }`
- On replay, child context re-executes; inner operations replay from cache
- No new SUCCEED checkpoint during reconstruction
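The size check that selects between the two paths might look like the sketch below. It assumes that "256KB" means 256 * 1024 bytes of UTF-8 serialized payload, which this document does not specify; `LargeResultSketch` and `SucceedCheckpoint` are hypothetical names for this example.

```java
import java.nio.charset.StandardCharsets;

final class LargeResultSketch {
    // Assumption: the threshold is 256 * 1024 bytes, measured on the UTF-8 serialized result.
    static final int MAX_INLINE_BYTES = 256 * 1024;

    /** Hypothetical shape of the final SUCCEED checkpoint for a child context. */
    static final class SucceedCheckpoint {
        final String payload;
        final boolean replayChildren;

        SucceedCheckpoint(String payload, boolean replayChildren) {
            this.payload = payload;
            this.replayChildren = replayChildren;
        }
    }

    static SucceedCheckpoint forResult(String serializedResult) {
        int size = serializedResult.getBytes(StandardCharsets.UTF_8).length;
        if (size < MAX_INLINE_BYTES) {
            // Small result: checkpoint the payload directly.
            return new SucceedCheckpoint(serializedResult, false);
        }
        // Large result: empty payload, flagged so that replay re-executes the child.
        return new SucceedCheckpoint("", true);
    }
}
```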
| Cached status | Behavior |
|---|---|
| SUCCEEDED | Return cached result |
| SUCCEEDED + replayChildren=true | Re-execute child to reconstruct large result |
| FAILED | Re-throw cached error |
| STARTED | Re-execute (interrupted mid-flight) |
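The table can be read as a small decision function. The sketch below encodes it directly; the enum and method names are invented for this example and do not claim to mirror the SDK's types.

```java
final class ReplayDecision {
    enum CachedStatus { SUCCEEDED, FAILED, STARTED }

    enum Action { RETURN_CACHED_RESULT, REEXECUTE_TO_RECONSTRUCT, RETHROW_CACHED_ERROR, REEXECUTE }

    /** Maps a cached CONTEXT operation's status (and its replayChildren flag) to the replay behavior. */
    static Action decide(CachedStatus status, boolean replayChildren) {
        switch (status) {
            case SUCCEEDED:
                // A large result was not stored, so the child must run again to rebuild it.
                return replayChildren ? Action.REEXECUTE_TO_RECONSTRUCT : Action.RETURN_CACHED_RESULT;
            case FAILED:
                return Action.RETHROW_CACHED_ERROR;
            case STARTED:
                // The child was interrupted mid-flight and never reached SUCCEED or FAIL.
                return Action.REEXECUTE;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }
}
```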
Rejected: Breaks operation ID uniqueness. A CONTEXT op with ID "1" and an inner STEP with ID "1" (different parentId) would trigger InvalidParameterValueException from the backend.
Rejected: Adds complexity to ExecutionManager for something that's naturally per-context. The TypeScript SDK uses per-entity replay state for the same reason.
Positive:
- Aligns with TypeScript and Python SDK implementations
- Enables fan-out/fan-in, parallel branches, hierarchical workflows
- Clean separation: each child context is self-contained
- Nested child contexts chain naturally via ID prefixing
Negative:
- More threads to coordinate
- Per-context replay state adds complexity vs. global mode
Deferred:
- Orphan detection in `CheckpointBatcher`
- `summaryGenerator` for large-result observability
- Higher-level `map`/`parallel` combinators (different `OperationSubType` values, same `CONTEXT` operation type)