# ADR-004: Child Context Execution (`runInChildContext`)

**Status:** Accepted
**Date:** 2026-02-16

## Context

The TypeScript and Python durable execution SDKs support child contexts via `OperationType.CONTEXT`. This enables isolated sub-workflows, each with its own operation counter and its own scope within the checkpoint log. The Java SDK needs the same capability to support fan-out/fan-in, parallel processing branches, and hierarchical workflow composition. For example, an order workflow can fan out into two branches and then join their results:

```java
import java.time.Duration;

// Fan-out: each branch runs in its own child context.
var futureA = ctx.runInChildContextAsync("branch-a", String.class, child -> {
    child.step("validate", Void.class, () -> { validate(order); return null; });
    child.wait(Duration.ofMinutes(5));
    return child.step("charge", String.class, () -> charge(order));
});
var futureB = ctx.runInChildContextAsync("branch-b", String.class, child -> { ... });
// Fan-in: join both branches.
var results = DurableFuture.allOf(futureA, futureB);
```

## Decision

### Child context as a `CONTEXT` operation

A child context is a `CONTEXT` operation in the checkpoint log with a three-phase lifecycle:

1. **START** (fire-and-forget): marks the child context as in-progress
2. Inner operations checkpoint with `parentId` set to the child context's operation ID
3. **SUCCEED** or **FAIL** (blocking): finalizes the child context

```
Op ID | Parent ID | Type    | Action  | Payload
------|-----------|---------|---------|---------------
3     | null      | CONTEXT | START   | -
3-1   | 3         | STEP    | START   | -
3-1   | 3         | STEP    | SUCCEED | "result"
3     | null      | CONTEXT | SUCCEED | "final result"
```

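The lifecycle and the log above can be traced with the following sketch. It is a single-threaded, in-memory illustration under stated assumptions: `CheckpointEntry`, `CheckpointLog`, `ChildContext`, and `ChildContextLifecycle` are hypothetical names, not the SDK's API, and the fire-and-forget START and blocking SUCCEED/FAIL calls are modeled as plain list appends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// One row of the checkpoint log; mirrors the columns of the log table.
record CheckpointEntry(String opId, String parentId, String type, String action, String payload) {}

// Hypothetical in-memory log standing in for the real checkpoint API.
final class CheckpointLog {
    final List<CheckpointEntry> entries = new ArrayList<>();
}

// Hypothetical child context that hands out prefixed IDs to its inner steps.
final class ChildContext {
    private final String id;
    private final CheckpointLog log;
    private int counter = 0;

    ChildContext(String id, CheckpointLog log) {
        this.id = id;
        this.log = log;
    }

    // Inner steps checkpoint with parentId set to this context's operation ID.
    String step(Supplier<String> body) {
        String opId = id + "-" + (++counter);
        log.entries.add(new CheckpointEntry(opId, id, "STEP", "START", null));
        String result = body.get();
        log.entries.add(new CheckpointEntry(opId, id, "STEP", "SUCCEED", result));
        return result;
    }
}

final class ChildContextLifecycle {
    // Runs a top-level child context (parentId = null) through START, body, SUCCEED/FAIL.
    static String run(String contextId, CheckpointLog log, Function<ChildContext, String> body) {
        // Phase 1: START. The real SDK sends this fire-and-forget; here it is just appended.
        log.entries.add(new CheckpointEntry(contextId, null, "CONTEXT", "START", null));
        try {
            // Phase 2: user code runs; its inner operations checkpoint under contextId.
            String result = body.apply(new ChildContext(contextId, log));
            // Phase 3: SUCCEED (blocking in the real SDK).
            log.entries.add(new CheckpointEntry(contextId, null, "CONTEXT", "SUCCEED", result));
            return result;
        } catch (RuntimeException e) {
            // Phase 3, failure path: FAIL records the error message, then rethrows.
            log.entries.add(new CheckpointEntry(contextId, null, "CONTEXT", "FAIL", e.getMessage()));
            throw e;
        }
    }
}
```

Running child context `"3"` with a body that performs one step returning `"result"` and then returns `"final result"` appends exactly the four rows of the log table, in the same order.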
### Operation ID prefixing

Inner operation IDs are prefixed with the parent context's operation ID, using `-` as the separator (e.g., `"3-1"`, `"3-2"`). This matches the JavaScript SDK's `stepPrefix` convention and keeps every operation ID globally unique. Uniqueness is required because the backend validates type consistency by operation ID alone.

- Root context: `"1"`, `"2"`, `"3"`
- Child context `"1"`: `"1-1"`, `"1-2"`, `"1-3"`
- Nested child context `"1-2"`: `"1-2-1"`, `"1-2-2"`

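A minimal sketch of the prefixing rule, assuming a per-context counter that starts at 1. `OperationIdGenerator`, `root`, `childOf`, and `next` are hypothetical names chosen for illustration, not the SDK's API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-context ID generator illustrating the prefixing rule.
final class OperationIdGenerator {
    private final String prefix; // null for the root context
    // Atomic as a defensive choice in case several threads allocate IDs in one context.
    private final AtomicInteger counter = new AtomicInteger();

    private OperationIdGenerator(String prefix) {
        this.prefix = prefix;
    }

    static OperationIdGenerator root() {
        return new OperationIdGenerator(null);
    }

    // A child context's IDs are prefixed with the child's own operation ID.
    static OperationIdGenerator childOf(String childOperationId) {
        return new OperationIdGenerator(childOperationId);
    }

    // "1", "2", ... at the root; "<prefix>-1", "<prefix>-2", ... inside a child.
    String next() {
        int n = counter.incrementAndGet();
        return prefix == null ? Integer.toString(n) : prefix + "-" + n;
    }
}
```

Because each context derives its IDs only from its prefix and its own call order, a child's counter starts at 1 regardless of how far the parent has progressed, and a replay that makes the same calls in the same order regenerates the same IDs.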
### Per-context replay state

A single global `executionMode` cannot describe child contexts: a child may still be replaying its checkpointed operations while its parent is already executing new ones. Each `DurableContext` therefore tracks its own replay state in an `isReplaying` field, initialized by calling `ExecutionManager.hasOperationsForContext(contextId)`.

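The sketch below shows how per-context initialization lets a child keep replaying after its parent has moved on. `ReplayLog` stands in for the `ExecutionManager` lookup, and `CheckpointedOp` and `ContextReplayState` are hypothetical. The rule for leaving replay mode (switching as soon as a context reaches an operation ID with no checkpoint) is an assumption, since this ADR only specifies how `isReplaying` is initialized.

```java
import java.util.List;
import java.util.Objects;

// A checkpointed operation and the context (parent) it was recorded in.
record CheckpointedOp(String id, String parentId) {}

// Hypothetical stand-in for the lookups ExecutionManager performs.
final class ReplayLog {
    private final List<CheckpointedOp> ops;

    ReplayLog(List<CheckpointedOp> ops) {
        this.ops = List.copyOf(ops);
    }

    // True if any checkpointed operation belongs to the given context (null = root).
    boolean hasOperationsForContext(String contextId) {
        return ops.stream().anyMatch(op -> Objects.equals(op.parentId(), contextId));
    }

    boolean contains(String opId) {
        return ops.stream().anyMatch(op -> op.id().equals(opId));
    }
}

// Hypothetical per-context replay flag, one instance per DurableContext.
final class ContextReplayState {
    private final ReplayLog log;
    private boolean replaying;

    ContextReplayState(ReplayLog log, String contextId) {
        this.log = log;
        // Each context decides on its own whether it starts out replaying.
        this.replaying = log.hasOperationsForContext(contextId);
    }

    boolean isReplaying() {
        return replaying;
    }

    // ASSUMPTION: reaching an operation with no checkpoint means this context
    // has caught up with the log and is now executing new work.
    void beforeOperation(String opId) {
        if (replaying && !log.contains(opId)) {
            replaying = false;
        }
    }
}
```

With a log containing root operations `"1"` and `"2"` and a child step `"2-1"`, both the root and child `"2"` start out replaying. If the root then reaches a new operation `"3"`, it switches to executing while child `"2"` is still replaying `"2-1"`, which is the situation a single global flag cannot represent.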
### Thread model

Child context user code runs in a separate thread, following the same pattern as `StepOperation`:
- `registerActiveThread` is called on the parent thread, before the executor runs the child
- `setCurrentContext` is called inside the executor thread
- `deregisterActiveThread` is called in the `finally` block
- `SuspendExecutionException` is caught and swallowed, since suspension has already been signaled

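The thread-handling steps above can be sketched with a plain `ExecutorService`. `ActiveThreadTracker`, `ChildContextRunner`, the `ThreadLocal` used in place of `setCurrentContext`, and this `SuspendExecutionException` class are all hypothetical stand-ins. Registration happens on the parent thread before submission; a plausible reason (an assumption, not stated in this ADR) is that it closes the window in which the child is queued but not yet counted, so the invocation cannot look idle and suspend early.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for the SDK's suspension signal.
class SuspendExecutionException extends RuntimeException {}

// Hypothetical counter of threads doing work for the current invocation.
final class ActiveThreadTracker {
    private final AtomicInteger active = new AtomicInteger();

    void registerActiveThread() { active.incrementAndGet(); }
    void deregisterActiveThread() { active.decrementAndGet(); }
    int activeCount() { return active.get(); }
}

final class ChildContextRunner {
    // Sketch of setCurrentContext: the context ID bound to the running thread.
    static final ThreadLocal<String> CURRENT_CONTEXT = new ThreadLocal<>();

    private final ActiveThreadTracker tracker;
    private final ExecutorService executor;

    ChildContextRunner(ActiveThreadTracker tracker, ExecutorService executor) {
        this.tracker = tracker;
        this.executor = executor;
    }

    <T> CompletableFuture<T> runAsync(String contextId, Supplier<T> userCode) {
        // Registered on the parent thread, before submission, so the count never
        // drops to zero while the child is queued but not yet running.
        tracker.registerActiveThread();
        CompletableFuture<T> result = new CompletableFuture<>();
        executor.execute(() -> {
            CURRENT_CONTEXT.set(contextId);
            try {
                result.complete(userCode.get());
            } catch (SuspendExecutionException e) {
                // Suspension was already signaled; swallow it and leave the
                // future incomplete, since the invocation is ending anyway.
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            } finally {
                CURRENT_CONTEXT.remove();
                tracker.deregisterActiveThread();
            }
        });
        return result;
    }
}
```

A suspended child deregisters like any other thread, but its future is left incomplete; in this sketch nothing else observes it before the invocation ends.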
### Large result handling

Serialized results smaller than 256KB are checkpointed directly. Results of 256KB or more trigger the `ReplayChildren` flow:
- The SUCCEED checkpoint carries an empty payload plus `ContextOptions { replayChildren: true }`
- On replay, the child context re-executes and its inner operations replay from cache, reconstructing the result
- No new SUCCEED checkpoint is written during reconstruction

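A sketch of the size check, assuming the threshold applies to the UTF-8 byte length of the serialized result and that 256KB means 256 * 1024 bytes. `ContextSucceed` and `LargeResultPolicy` are hypothetical names, not SDK types.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical shape of a CONTEXT operation's SUCCEED checkpoint.
record ContextSucceed(String payload, boolean replayChildren) {}

final class LargeResultPolicy {
    // ASSUMPTION: "256KB" is read as 256 * 1024 bytes of UTF-8 serialized output.
    static final int MAX_INLINE_BYTES = 256 * 1024;

    static ContextSucceed succeedCheckpoint(String serializedResult) {
        int size = serializedResult.getBytes(StandardCharsets.UTF_8).length;
        if (size < MAX_INLINE_BYTES) {
            // Small enough: the result itself is the checkpoint payload.
            return new ContextSucceed(serializedResult, false);
        }
        // Too large: store an empty payload and flag the context so replay
        // rebuilds the result by re-running the child against its cached operations.
        return new ContextSucceed("", true);
    }
}
```

Measuring bytes rather than characters matters under this assumption: 131,072 copies of a two-byte character such as `é` are exactly 256KB and take the `replayChildren` path.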
### Replay behavior

| Cached status                     | Behavior                                     |
|-----------------------------------|----------------------------------------------|
| SUCCEEDED                         | Return cached result                         |
| SUCCEEDED + `replayChildren=true` | Re-execute child to reconstruct large result |
| FAILED                            | Re-throw cached error                        |
| STARTED                           | Re-execute (interrupted mid-flight)          |

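The replay table translates into a small dispatch. `CachedStatus`, `CachedContext`, `ChildContextFailedException`, and `ChildContextReplay.replay` are hypothetical; in particular, how the real SDK reconstructs a cached error is not specified here, so this sketch simply wraps the cached message.

```java
import java.util.function.Supplier;

enum CachedStatus { STARTED, SUCCEEDED, FAILED }

// Hypothetical view of a CONTEXT operation found in the checkpoint log on replay.
record CachedContext(CachedStatus status, String payload, boolean replayChildren, String error) {}

// Hypothetical error type used to re-throw a cached failure.
class ChildContextFailedException extends RuntimeException {
    ChildContextFailedException(String message) {
        super(message);
    }
}

final class ChildContextReplay {
    // Mirrors the replay table: what a replayed CONTEXT operation does per cached status.
    static String replay(CachedContext cached, Supplier<String> reExecuteChild) {
        return switch (cached.status()) {
            // Large results were not stored, so rebuild them by re-running the child.
            case SUCCEEDED -> cached.replayChildren() ? reExecuteChild.get() : cached.payload();
            case FAILED -> throw new ChildContextFailedException(cached.error());
            // Interrupted mid-flight: run the child again; completed inner
            // operations would replay from cache (not modeled here).
            case STARTED -> reExecuteChild.get();
        };
    }
}
```

Both SUCCEEDED rows share one `case`; the `replayChildren` flag only decides whether the stored payload is returned or the child must be re-run to rebuild it.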
## Alternatives Considered

### Flatten child operations into root checkpoint log
**Rejected:** Breaks operation ID uniqueness. A CONTEXT op with ID `"1"` and an inner STEP with ID `"1"` (differing only in `parentId`) would share an ID across two operation types. Because the backend validates type consistency by operation ID alone, it would reject this with `InvalidParameterValueException`.

### Global replay state with context tracking
**Rejected:** Adds complexity to `ExecutionManager` for state that is naturally per-context. The TypeScript SDK keeps replay state per entity for the same reason.

## Consequences

**Positive:**
- Aligns with the TypeScript and Python SDK implementations
- Enables fan-out/fan-in, parallel branches, and hierarchical workflows
- Clean separation: each child context is self-contained
- Nested child contexts chain naturally via ID prefixing

**Negative:**
- More threads to coordinate: each child context adds a thread that must be registered and deregistered
- Per-context replay state adds complexity compared with a single global mode

**Deferred:**
- Orphan detection in `CheckpointBatcher`
- `summaryGenerator` for large-result observability
- Higher-level `map`/`parallel` combinators (different `OperationSubType` values, same `CONTEXT` operation type)