| 1 | +# Parallel Operations Design Plan |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Add parallel execution capability to the AWS Lambda Durable Execution SDK, allowing multiple branches to run concurrently within a single durable function execution. |
| 6 | + |
| 7 | +## API Design |
| 8 | + |
| 9 | +### User Interface |
| 10 | + |
| 11 | +```java |
| 12 | +try (var parallelContext = ctx.parallel(ParallelConfig.builder().build())) { |
| 13 | + DurableFuture<Boolean> task1 = parallelContext.branch("validate", Boolean.class, branchContext -> validate()); |
| 14 | + DurableFuture<String> task2 = parallelContext.branch("process", String.class, branchContext -> process()); |
| 15 | + parallelContext.join(); // Wait for completion based on config |
| 16 | + |
| 17 | + // Access results |
| 18 | + Boolean validated = task1.get(); |
| 19 | + String processed = task2.get(); |
| 20 | +} |
| 21 | +``` |
| 22 | + |
| 23 | +### Core Components |
| 24 | + |
| 25 | +#### 1. ParallelConfig |
| 26 | +Configuration object controlling parallel execution behavior: |
| 27 | + |
| 28 | +```java |
| 29 | +ParallelConfig config = ParallelConfig.builder() |
| 30 | + .maxConcurrency(5) // Max branches running simultaneously |
| 31 | + .minSuccessful(3) // Minimum successful branches required (-1 = all) |
| 32 | + .toleratedFailureCount(2) // Max failures before stopping execution |
| 33 | + .build(); |
| 34 | +``` |
| 35 | + |
| 36 | +**Configuration Rules:** |
| 37 | +- `maxConcurrency`: Caps the number of branches running at the same time, which bounds resource usage |
| 38 | +- `minSuccessful`: Enables "best effort" scenarios where not all branches need to succeed (`-1` requires all of them) |
| 39 | +- `toleratedFailureCount`: Enables fail-fast behavior; once failures exceed this count, no new branches are started |
| 40 | + |
| 41 | +#### 2. ParallelContext |
| 42 | +Manages the lifecycle of parallel branches: |
| 43 | + |
| 44 | +```java |
| 45 | +public class ParallelContext implements AutoCloseable { |
| 46 | + // Create branches |
| 47 | + public <T> DurableFuture<T> branch(String name, Class<T> resultType, Function<DurableContext, T> func); |
| 48 | + public <T> DurableFuture<T> branch(String name, TypeToken<T> resultType, Function<DurableContext, T> func); |
| 49 | + |
| 50 | + // Wait for completion |
| 51 | + public void join(); |
| 52 | + |
| 53 | + // AutoCloseable ensures join() is called |
| 54 | + public void close(); |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +#### 3. DurableContext Integration |
| 59 | +Add a single method to the existing `DurableContext`: |
| 60 | + |
| 61 | +```java |
| 62 | +public ParallelContext parallel(ParallelConfig config); |
| 63 | +``` |
| 64 | + |
| 65 | +## Implementation Strategy |
| 66 | + |
| 67 | +### 1. Leverage Existing Child Context Infrastructure |
| 68 | + |
| 69 | +Each parallel branch will be implemented as a `ChildContextOperation`: |
| 70 | +- **Isolation**: Each branch has its own checkpoint log |
| 71 | +- **Replay Safety**: Branches replay independently |
| 72 | +- **Error Handling**: Branch failures don't affect other branches directly |
| 73 | + |
| 74 | +### 2. Execution Flow |
| 75 | + |
| 76 | +1. **Branch Registration**: `branch()` calls create `ChildContextOperation` instances but don't execute immediately |
| 77 | +2. **Execution Start**: `join()` triggers execution of branches respecting `maxConcurrency` |
| 78 | +3. **Concurrency Control**: Use a queue to manage pending branches when `maxConcurrency` is reached |
| 79 | +4. **Completion Logic**: Monitor success/failure counts against configuration thresholds |
| 80 | +5. **Result Collection**: Return results via `DurableFuture` instances |
| 81 | + |
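The flow above can be sketched in plain Java. This is only an illustration under stated assumptions: `LazyParallelSketch` is a hypothetical name, `CompletableFuture` stands in for `DurableFuture`, a `Supplier` stands in for the `Function<DurableContext, T>` branch body, and a fixed-size thread pool (whose internal queue holds the pending branches) stands in for the real scheduling of `ChildContextOperation` instances. Checkpointing and replay are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for ParallelContext; not the SDK's implementation.
final class LazyParallelSketch {
    private final int maxConcurrency;
    private final List<Runnable> pending = new ArrayList<>();
    private final List<CompletableFuture<?>> futures = new ArrayList<>();

    LazyParallelSketch(int maxConcurrency) {
        this.maxConcurrency = maxConcurrency;
    }

    // Step 1: register the branch; nothing runs yet.
    // The name is unused here; in the real design it would identify the child context.
    <T> CompletableFuture<T> branch(String name, Supplier<T> func) {
        CompletableFuture<T> future = new CompletableFuture<>();
        pending.add(() -> {
            try {
                future.complete(func.get());
            } catch (Throwable t) {
                // A branch failure is captured in its own future.
                future.completeExceptionally(t);
            }
        });
        futures.add(future);
        return future;
    }

    // Steps 2-5: start the registered branches, at most maxConcurrency at a time,
    // and block until every one of them has finished.
    void join() {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        try {
            pending.forEach(pool::execute); // excess branches wait in the pool's queue
            pending.clear();
            for (CompletableFuture<?> f : futures) {
                f.handle((value, error) -> null).join(); // wait without rethrowing
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because registration only appends to a list, the total number of branches is known before anything starts, which is what makes the threshold checks possible at `join()` time.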
| 82 | + |
| 83 | +### 3. Error Handling Strategy |
| 84 | + |
| 85 | +**Branch-Level Failures:** |
| 86 | +- Individual branch failures are captured in their respective `DurableFuture` |
| 87 | +- They do not immediately fail the entire parallel operation |
| 88 | +- They count towards `failureCount`, which is checked against `toleratedFailureCount` |
| 89 | + |
| 90 | +**Parallel-Level Failures:** |
| 91 | +- Failures exceed `toleratedFailureCount`: Stop starting new branches and wait for the running ones to finish |
| 92 | +- Fewer than `minSuccessful` branches succeed: Throw `ParallelExecutionException` after all branches complete |
| 93 | +- Configuration validation errors: Fail immediately |
| 94 | + |
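The two threshold checks can be expressed as small pure functions. The following is a minimal sketch, not the planned `ParallelOperation` logic: `ThresholdSketch`, `Decision`, and both method names are hypothetical, and the `ParallelExecutionException` declared here is a local stand-in for the class listed under new files. It reads "exceed" as strictly greater than `toleratedFailureCount` and resolves `minSuccessful == -1` to the total branch count.

```java
// Local stand-in for the planned exception class (assumption: unchecked).
final class ParallelExecutionException extends RuntimeException {
    ParallelExecutionException(String message) {
        super(message);
    }
}

// Hypothetical helper illustrating the threshold rules; names are assumptions.
final class ThresholdSketch {
    enum Decision { CONTINUE, STOP_STARTING_NEW_BRANCHES }

    // Called each time a branch finishes: decide whether queued branches may still start.
    static Decision afterBranchFinished(int failureCount, int toleratedFailureCount) {
        return failureCount > toleratedFailureCount
                ? Decision.STOP_STARTING_NEW_BRANCHES
                : Decision.CONTINUE;
    }

    // Called once no branch is running any more: enforce minSuccessful.
    static void checkFinalOutcome(int successCount, int totalBranches, int minSuccessful) {
        int required = (minSuccessful == -1) ? totalBranches : minSuccessful;
        if (successCount < required) {
            throw new ParallelExecutionException(
                    "only " + successCount + " of the required " + required + " branches succeeded");
        }
    }
}
```

With `toleratedFailureCount = 2`, the second failure still returns `CONTINUE` and the third returns `STOP_STARTING_NEW_BRANCHES`. Branches that were never started do not count as successes in this sketch.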
| 95 | +## Key Design Decisions |
| 96 | + |
| 97 | +### 1. Build on Child Contexts |
| 98 | +- **Pros**: Reuses existing isolation and checkpointing logic |
| 99 | +- **Cons**: Each branch has overhead of a separate child context |
| 100 | +- **Decision**: Acceptable trade-off for clean isolation and replay safety |
| 101 | + |
| 102 | +### 2. Eager vs Lazy Execution |
| 103 | +- **Chosen**: Lazy execution (branches start only on `join()`) |
| 104 | +- **Rationale**: Allows all branches to be registered before execution starts, enabling better concurrency planning |
| 105 | + |
| 106 | +### 3. AutoCloseable Pattern |
| 107 | +- **Purpose**: Ensures `join()` is called even if the user forgets to call it |
| 108 | +- **Behavior**: If `close()` is called before `join()`, automatically call `join()` |
| 109 | + |
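A minimal sketch of this behavior, assuming `join()` is idempotent so that an explicit call followed by the implicit one from `close()` runs the branches only once. `AutoJoinSketch` and its `joinRuns` counter are hypothetical and exist only to make the behavior observable; the plan does not say how repeated `join()` calls are handled.

```java
// Hypothetical illustration of the close()/join() contract; not the SDK's ParallelContext.
final class AutoJoinSketch implements AutoCloseable {
    private boolean joined = false;
    int joinRuns = 0; // how many times the join work actually ran (for demonstration)

    public void join() {
        if (joined) {
            return; // assumption: a second join() is a no-op
        }
        joined = true;
        joinRuns++;
        // The real implementation would start the branches and wait for them here.
    }

    @Override
    public void close() {
        join(); // runs the join work only if the user has not already called join()
    }
}
```

One consequence worth noting: in a try-with-resources statement, an exception thrown from `close()` while the body is already failing is attached to the body's exception as a suppressed exception, so a failure raised by the implicit `join()` would not replace the original error.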
| 110 | +### 4. Configuration Validation |
| 111 | +- Validate at `ParallelConfig.build()` time: |
| 112 | + - `maxConcurrency > 0` |
| 113 | + - `minSuccessful >= -1` (where -1 means "all") |
| 114 | + - `toleratedFailureCount >= 0` |
| 115 | +  - `minSuccessful + toleratedFailureCount <= total branches` (validated at runtime, because the branch count is only known once branches are registered) |
| 116 | + |
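The validation rules above might look like the following in a builder. This is a sketch under assumptions, not the planned `ParallelConfig`: the class name `ParallelConfigSketch`, the method `validateAgainst`, the exception types, and the default values (unbounded concurrency, `-1` for `minSuccessful`, `0` tolerated failures) are all hypothetical. The plan also does not say how the sum rule applies when `minSuccessful` is `-1`, so this sketch skips that check in that case.

```java
// Hypothetical sketch; names and defaults are assumptions, not the SDK's API.
final class ParallelConfigSketch {
    final int maxConcurrency;
    final int minSuccessful;          // -1 means "all branches"
    final int toleratedFailureCount;

    private ParallelConfigSketch(int maxConcurrency, int minSuccessful, int toleratedFailureCount) {
        this.maxConcurrency = maxConcurrency;
        this.minSuccessful = minSuccessful;
        this.toleratedFailureCount = toleratedFailureCount;
    }

    static Builder builder() {
        return new Builder();
    }

    // Runtime rule: can only be checked once the number of registered branches is known.
    void validateAgainst(int totalBranches) {
        if (minSuccessful != -1
                && (long) minSuccessful + toleratedFailureCount > totalBranches) {
            throw new IllegalStateException(
                    "minSuccessful + toleratedFailureCount exceeds " + totalBranches + " branches");
        }
    }

    static final class Builder {
        private int maxConcurrency = Integer.MAX_VALUE; // assumed default: unbounded
        private int minSuccessful = -1;                 // assumed default: all
        private int toleratedFailureCount = 0;          // assumed default: none

        Builder maxConcurrency(int value) { this.maxConcurrency = value; return this; }
        Builder minSuccessful(int value) { this.minSuccessful = value; return this; }
        Builder toleratedFailureCount(int value) { this.toleratedFailureCount = value; return this; }

        // Build-time rules: everything that does not depend on the branch count.
        ParallelConfigSketch build() {
            if (maxConcurrency <= 0) {
                throw new IllegalArgumentException("maxConcurrency must be > 0");
            }
            if (minSuccessful < -1) {
                throw new IllegalArgumentException("minSuccessful must be >= -1");
            }
            if (toleratedFailureCount < 0) {
                throw new IllegalArgumentException("toleratedFailureCount must be >= 0");
            }
            return new ParallelConfigSketch(maxConcurrency, minSuccessful, toleratedFailureCount);
        }
    }
}
```

With `minSuccessful(3)` and `toleratedFailureCount(2)`, `validateAgainst(5)` passes and `validateAgainst(4)` fails, since 3 + 2 > 4.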
| 117 | +## Implementation Files |
| 118 | + |
| 119 | +### New Files to Create |
| 120 | +1. `ParallelConfig.java` - Configuration builder |
| 121 | +2. `ParallelContext.java` - User-facing parallel context |
| 122 | +3. `operation/ParallelOperation.java` - Core execution logic |
| 123 | +4. `exception/ParallelExecutionException.java` - Parallel-specific exceptions |
| 124 | + |
| 125 | +### Files to Modify |
| 126 | +1. `DurableContext.java` - Add `parallel()` method |
| 127 | +2. `DurableFuture.java` - Ensure compatibility with parallel results (likely no changes needed) |
| 128 | + |
| 129 | +## Testing Strategy |
| 130 | + |
| 131 | +### Unit Tests |
| 132 | +- `ParallelConfigTest` - Configuration validation |
| 133 | +- `ParallelOperationTest` - Core execution logic with mocked child contexts |
| 134 | + |
| 135 | +### Integration Tests |
| 136 | +- Success scenarios with various configurations |
| 137 | +- Failure scenarios (exceeding thresholds) |
| 138 | +- Concurrency limits |
| 139 | +- Replay behavior |
| 140 | + |
| 141 | +### Example Implementation |
| 142 | +- `ParallelExample.java` in examples module |
| 143 | +- Demonstrate common patterns and error handling |