Skip to content

Commit 2628d54

Browse files
committed
Merge branch 'feat/README' into 'main'
feat: added README See merge request mxschell/aws-durable-execution-sdk-java!15
2 parents 1bba0c3 + 29b4cb1 commit 2628d54

1 file changed

Lines changed: 163 additions & 49 deletions

File tree

README.md

Lines changed: 163 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -59,17 +59,37 @@ public class OrderProcessor extends DurableHandler<Order, OrderResult> {
5959
Steps execute your code and checkpoint the result. On replay, cached results are returned without re-execution.
6060

6161
```java
62-
// Basic step
62+
// Basic step (blocks until complete)
6363
var result = ctx.step("fetch-user", User.class, () -> userService.getUser(userId));
6464

65-
// Step with retry strategy
65+
// Step with custom configuration (retries, semantics, serialization)
6666
var result = ctx.step("call-api", Response.class,
6767
() -> externalApi.call(request),
6868
StepConfig.builder()
69-
.retryStrategy(RetryStrategies.Presets.DEFAULT)
69+
.retryStrategy(...)
70+
.semantics(...)
7071
.build());
7172
```
7273

74+
See [Step Configuration](#step-configuration) for retry strategies, delivery semantics, and per-step serialization options.
75+
76+
### stepAsync() and DurableFuture – Concurrent Operations
77+
78+
`stepAsync()` starts a step in the background and returns a `DurableFuture<T>` immediately. This enables concurrent execution patterns.
79+
80+
```java
81+
// Start multiple operations concurrently
82+
DurableFuture<User> userFuture = ctx.stepAsync("fetch-user", User.class,
83+
() -> userService.getUser(userId));
84+
DurableFuture<List<Order>> ordersFuture = ctx.stepAsync("fetch-orders",
85+
new TypeToken<List<Order>>() {}, () -> orderService.getOrders(userId));
86+
87+
// Both operations run in parallel
88+
// Block and get results when needed
89+
User user = userFuture.get();
90+
List<Order> orders = ordersFuture.get();
91+
```
92+
7393
### wait() – Suspend Without Cost
7494

7595
Waits suspend the function and resume after the specified duration. You're not charged during suspension.
@@ -82,27 +102,62 @@ ctx.wait(Duration.ofMinutes(30));
82102
ctx.wait("cooling-off-period", Duration.ofDays(7));
83103
```
84104

105+
## Step Configuration
106+
107+
Configure step behavior with `StepConfig`:
108+
109+
```java
110+
ctx.step("my-step", Result.class, () -> doWork(),
111+
StepConfig.builder()
112+
.retryStrategy(...) // How to handle failures
113+
.semantics(...) // At-least-once vs at-most-once
114+
.serDes(...) // Custom serialization
115+
.build());
116+
```
117+
85118
### Retry Strategies
86119

87120
Configure how steps handle transient failures:
88121

89122
```java
90-
// No retry (fail immediately)
123+
// No retry - fail immediately (default)
91124
StepConfig.builder().retryStrategy(RetryStrategies.Presets.NO_RETRY).build()
92125

93-
// Default exponential backoff
94-
StepConfig.builder().retryStrategy(RetryStrategies.Presets.DEFAULT).build()
95-
96-
// Custom strategy
126+
// Exponential backoff with jitter
97127
StepConfig.builder()
98-
.retryStrategy(RetryStrategies.exponentialBackoff()
99-
.maxAttempts(5)
100-
.initialDelay(Duration.ofSeconds(1))
101-
.maxDelay(Duration.ofMinutes(1))
102-
.build())
128+
.retryStrategy(RetryStrategies.exponentialBackoff(
129+
5, // max attempts
130+
Duration.ofSeconds(2), // initial delay
131+
Duration.ofSeconds(30), // max delay
132+
2.0, // backoff multiplier
133+
JitterStrategy.FULL)) // randomize delays
103134
.build()
104135
```
105136

137+
### Step Semantics
138+
139+
Control how steps behave when interrupted mid-execution:
140+
141+
| Semantic | Behavior | Use Case |
142+
|----------|----------|----------|
143+
| `AT_LEAST_ONCE_PER_RETRY` (default) | Re-executes step if interrupted before completion | Idempotent operations (database upserts, API calls with idempotency keys) |
144+
| `AT_MOST_ONCE_PER_RETRY` | Never re-executes; throws `StepInterruptedException` if interrupted | Non-idempotent operations (sending emails, charging payments) |
145+
146+
```java
147+
// Default: at-least-once (step may re-run if interrupted)
148+
var result = ctx.step("idempotent-update", Result.class,
149+
() -> database.upsert(record));
150+
151+
// At-most-once: step will not re-run if interrupted
152+
var result = ctx.step("send-email", Result.class,
153+
() -> emailService.send(notification),
154+
StepConfig.builder()
155+
.semantics(StepSemantics.AT_MOST_ONCE_PER_RETRY)
156+
.build());
157+
```
158+
159+
With `AT_MOST_ONCE_PER_RETRY`, if a step starts but the function is interrupted before the result is checkpointed, the SDK throws `StepInterruptedException` on replay instead of re-executing. Handle this to implement your own recovery logic.
160+
106161
### Generic Types
107162

108163
For complex generic types like `List<User>`, use `TypeToken`:
@@ -112,9 +167,65 @@ var users = ctx.step("fetch-users", new TypeToken<List<User>>() {},
112167
() -> userService.getAllUsers());
113168
```
114169

170+
## Configuration
171+
172+
Customize SDK behavior by overriding `createConfiguration()` in your handler:
173+
174+
```java
175+
@Override
176+
protected DurableConfig createConfiguration() {
177+
// Custom Lambda client with connection pooling
178+
var lambdaClient = LambdaClient.builder()
179+
.httpClient(ApacheHttpClient.builder()
180+
.maxConnections(50)
181+
.connectionTimeout(Duration.ofSeconds(30))
182+
.build())
183+
.build();
184+
185+
return DurableConfig.builder()
186+
.withDurableExecutionClient(new LambdaDurableFunctionsClient(lambdaClient))
187+
.withSerDes(new MyCustomSerDes()) // Custom serialization
188+
.withExecutorService(Executors.newFixedThreadPool(10)) // Custom thread pool
189+
.build();
190+
}
191+
```
192+
193+
| Option | Description | Default |
194+
|--------|-------------|---------|
195+
| `withDurableExecutionClient()` | Client for checkpoint operations | Auto-configured Lambda client |
196+
| `withSerDes()` | Serializer for step results | Jackson with default settings |
197+
| `withExecutorService()` | Thread pool for async step execution | Cached daemon thread pool |
198+
199+
## Error Handling
200+
201+
The SDK throws specific exceptions to help you handle different failure scenarios:
202+
203+
| Exception | When Thrown | How to Handle |
204+
|-----------|-------------|---------------|
205+
| `StepFailedException` | Step exhausted all retry attempts | Catch to implement fallback logic or let execution fail |
206+
| `StepInterruptedException` | `AT_MOST_ONCE` step was interrupted before completion | Implement manual recovery (check if operation completed externally) |
207+
| `NonDeterministicExecutionException` | Code changed between original execution and replay | Fix code to maintain determinism; don't change step order/names |
208+
209+
```java
210+
try {
211+
var result = ctx.step("charge-payment", Payment.class,
212+
() -> paymentService.charge(amount),
213+
StepConfig.builder()
214+
.semantics(StepSemantics.AT_MOST_ONCE_PER_RETRY)
215+
.build());
216+
} catch (StepInterruptedException e) {
217+
// Step started but we don't know if it completed
218+
// Check payment status externally before retrying
219+
var status = paymentService.checkStatus(transactionId);
220+
if (status.isPending()) {
221+
throw e; // Let it fail - manual intervention needed
222+
}
223+
}
224+
```
225+
115226
## Testing
116227

117-
The SDK includes testing utilities for fast, local development without deploying to AWS.
228+
The SDK includes testing utilities for both local development and cloud-based integration testing.
118229

119230
### Installation
120231

@@ -133,7 +244,7 @@ The SDK includes testing utilities for fast, local development without deploying
133244
@Test
134245
void testOrderProcessing() {
135246
var handler = new OrderProcessor();
136-
var runner = LocalDurableTestRunner.create(Order.class, handler::handleRequest);
247+
var runner = LocalDurableTestRunner.create(Order.class, handler);
137248

138249
var result = runner.runUntilComplete(new Order("order-123", items));
139250

@@ -147,14 +258,34 @@ void testOrderProcessing() {
147258
```java
148259
var result = runner.runUntilComplete(input);
149260

150-
// Check all operations
151-
for (var op : result.getOperations()) {
152-
System.out.println(op.getName() + ": " + op.getStatus());
153-
}
154-
155-
// Get specific operation
261+
// Verify specific step completed
156262
var paymentOp = result.getOperation("process-payment");
263+
assertNotNull(paymentOp);
157264
assertEquals(OperationStatus.SUCCEEDED, paymentOp.getStatus());
265+
266+
// Get step result
267+
var paymentResult = paymentOp.getStepResult(Payment.class);
268+
assertNotNull(paymentResult.getTransactionId());
269+
270+
// Inspect all operations
271+
List<TestOperation> succeeded = result.getSucceededOperations();
272+
List<TestOperation> failed = result.getFailedOperations();
273+
```
274+
275+
### Controlling Time in Tests
276+
277+
By default, `runUntilComplete()` skips wait durations. For testing time-dependent logic, disable this:
278+
279+
```java
280+
var runner = LocalDurableTestRunner.create(Order.class, handler)
281+
.withSkipTime(false); // Don't auto-advance time
282+
283+
var result = runner.run(input);
284+
assertEquals(ExecutionStatus.PENDING, result.getStatus()); // Blocked on wait
285+
286+
runner.advanceTime(); // Manually advance past the wait
287+
result = runner.run(input);
288+
assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus());
158289
```
159290

160291
### Cloud Testing
@@ -173,41 +304,20 @@ assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus());
173304

174305
## Deployment
175306

176-
### SAM Template
177-
178-
```yaml
179-
Resources:
180-
MyDurableFunction:
181-
Type: AWS::Serverless::Function
182-
Properties:
183-
PackageType: Image
184-
FunctionName: my-durable-function
185-
ImageConfig:
186-
Command: ["com.example.OrderProcessor::handleRequest"]
187-
DurableConfig:
188-
ExecutionTimeout: 3600 # Max execution time in seconds
189-
RetentionPeriodInDays: 7 # How long to keep execution history
190-
Policies:
191-
- Statement:
192-
- Effect: Allow
193-
Action:
194-
- lambda:CheckpointDurableExecutions
195-
- lambda:GetDurableExecutionState
196-
Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-durable-function"
197-
```
198-
199-
### Build and Deploy
307+
The [examples](./examples) module includes a complete SAM template and deployment instructions.
200308

201309
```bash
202-
# Build the project
310+
cd examples
203311
mvn clean package
204-
205-
# Deploy with SAM
206312
sam build
207313
sam deploy --guided
208314
```
209315

210-
See the [examples](./examples) directory for complete SAM templates and working examples.
316+
Key deployment requirements:
317+
- `DurableConfig` in SAM template with `ExecutionTimeout` and `RetentionPeriodInDays`
318+
- IAM permissions for `lambda:CheckpointDurableExecutions` and `lambda:GetDurableExecutionState`
319+
320+
See [examples/template.yaml](./examples/template.yaml) and [examples/README.md](./examples/README.md) for details.
211321

212322
## Examples
213323

@@ -217,6 +327,10 @@ See the [examples](./examples) directory for complete SAM templates and working
217327
| [WaitExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/WaitExample.java) | Using wait operations |
218328
| [RetryExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/RetryExample.java) | Configuring retry strategies |
219329
| [GenericTypesExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/GenericTypesExample.java) | Working with generic types |
330+
| [CustomConfigExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/CustomConfigExample.java) | Custom Lambda client and SerDes |
331+
| [WaitAtLeastExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/WaitAtLeastExample.java) | Concurrent stepAsync() with wait() |
332+
| [RetryInProcessExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/RetryInProcessExample.java) | In-process retry with concurrent operations |
333+
| [WaitAtLeastInProcessExample](./examples/src/main/java/com/amazonaws/lambda/durable/examples/WaitAtLeastInProcessExample.java) | Wait completes before async step (no suspension) |
220334

221335
## Requirements
222336

0 commit comments

Comments
 (0)