| 1 | +# AGENTS.md |
| 2 | + |
| 3 | +AI coding agent instructions for the AWS Lambda Durable Execution Java SDK. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +**Java SDK for AWS Lambda Durable Functions** - enables building resilient, multi-step workflows that can run for up to one year with automatic state management and failure recovery. |
| 8 | + |
| 9 | +### Key Concepts |
| 10 | + |
| 11 | +- **Checkpoint-and-replay**: Operations create checkpoints; on interruption, replay skips completed work |
| 12 | +- **Durable operations**: `step()` executes with retry, `wait()` suspends without compute charges |
| 13 | +- **Use cases**: Order processing, human approvals, AI agent workflows, distributed transactions |
| 14 | + |
| 15 | +This implements the Java version of AWS's durable execution SDK (official SDKs exist for JavaScript/TypeScript and Python). |
| 16 | + |
| 17 | +## Build & Test Commands |
| 18 | + |
| 19 | +```bash |
| 20 | +# Build all modules |
| 21 | +mvn clean install |
| 22 | + |
| 23 | +# Run unit tests only |
| 24 | +mvn test |
| 25 | + |
| 26 | +# Run specific test class |
| 27 | +mvn test -Dtest=DurableContextTest |
| 28 | + |
| 29 | +# Skip tests |
| 30 | +mvn install -DskipTests |
| 31 | +``` |
| 32 | + |
| 33 | +## Key Directories |
| 34 | + |
| 35 | +``` |
| 36 | +sdk/ # Core SDK module |
| 37 | +├── src/main/java/com/amazonaws/lambda/durable/ |
| 38 | +│ ├── DurableHandler.java # Lambda entry point (extend this) |
| 39 | +│ ├── DurableContext.java # User-facing API (step, wait) |
| 40 | +│ ├── DurableExecutor.java # Execution lifecycle |
| 41 | +│ ├── execution/ # Thread coordination, checkpointing |
| 42 | +│ ├── operation/ # StepOperation, WaitOperation |
| 43 | +│ ├── model/ # Data structures |
| 44 | +│ ├── serde/ # JSON serialization |
| 45 | +│ ├── client/ # AWS API integration |
| 46 | +│ └── exception/ # Domain exceptions |
| 47 | + |
| 48 | +sdk-testing/ # Test utilities (LocalDurableTestRunner, etc.) |
| 49 | +examples/ # Customer-facing examples with local and cloud tests |
| 50 | +sdk-integration-tests/ # Integration tests for the SDK |
| 51 | +``` |
| 52 | + |
| 53 | +## Coding Guidelines |
| 54 | + |
| 55 | +### Java Style (MUST follow) |
| 56 | + |
| 57 | +```java |
| 58 | +// USE var when type is obvious |
| 59 | +var ctx = new DurableContext(); |
| 60 | +var operations = new HashMap<Integer, Operation>(); |
| 61 | + |
| 62 | +// USE static imports for common utilities and factory methods |
| 63 | +import static org.junit.jupiter.api.Assertions.*; // Tests |
| 64 | +import static java.util.Collections.emptyList; // Factory methods |
| 65 | +import static com.amazonaws.lambda.durable.model.Status.*; // Enums |
| 66 | + |
| 67 | +// AVOID fully qualified names in code |
| 68 | +// Bad: com.amazonaws.lambda.durable.model.Status.SUCCESS |
| 69 | +// Good: import static and use SUCCESS directly |
| 70 | + |
| 71 | +// USE constructor injection |
| 72 | +public DurableExecutor(DurableExecutionClient client, SerDes serDes) { |
| 73 | + this.client = client; |
| 74 | + this.serDes = serDes; |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +### Architecture Rules |
| 79 | + |
| 80 | +- **No unnecessary interfaces** - Use concrete classes when only one implementation exists |
| 81 | +- **Constructor injection** - All dependencies via constructor, no field injection |
| 82 | +- **Defensive copies** - Copy mutable collections in constructors |
| 83 | +- **Single responsibility** - One class, one job |
| 84 | +- **Methods ≤30 lines** - Extract if longer |
| 85 | + |
| 86 | +### Package Naming |
| 87 | + |
| 88 | +Prefer descriptive domain names: `model`, `execution`, `operation`, `serde`, `exception` |
| 89 | + |
| 90 | +## Do Not |
| 91 | + |
| 92 | +- Add new dependencies without explicit approval |
| 93 | +- Create interfaces for single implementations |
| 94 | +- Write tests for POJO getters/setters |
| 95 | +- Expose mutable state via getters |
| 96 | +- Change public API signatures without instruction |
| 97 | +- Swallow exceptions silently |
| 98 | +- Use field injection |
| 99 | + |
| 100 | +## Testing Approach |
| 101 | + |
| 102 | +### Test Organization |
| 103 | + |
| 104 | +``` |
| 105 | +sdk/src/test/ # Unit tests for SDK internals |
| 106 | +├── DurableContextTest # Test DurableContext behavior |
| 107 | +├── DurableExecutorTest # Test execution lifecycle |
| 108 | +├── serde/JacksonSerDesTest # Test serialization |
| 109 | +└── retry/RetryStrategiesTest # Test retry logic |
| 110 | + |
| 111 | +sdk-integration-tests/src/test/ # Integration tests (SDK + mock AWS) |
| 112 | +├── IntegrationTest # End-to-end with LocalDurableTestRunner |
| 113 | +├── RetryIntegrationTest # Retry behavior across operations |
| 114 | +└── StepSemanticsIntegrationTest # Step execution semantics |
| 115 | + |
| 116 | +examples/src/test/ # Customer-facing examples + cloud tests |
| 117 | +├── SimpleStepExampleTest # Local test with LocalDurableTestRunner |
| 118 | +├── WaitExampleTest # Local test for wait operations |
| 119 | +└── CloudBasedIntegrationTest # Cloud tests with CloudDurableTestRunner |
| 120 | +``` |
| 121 | + |
| 122 | +### Testing Strategy |
| 123 | + |
| 124 | +**Unit Tests (sdk/src/test/)** |
| 125 | +- Test individual classes in isolation |
| 126 | +- Mock dependencies |
| 127 | +- Fast, no external dependencies |
| 128 | +- Run on every build |
| 129 | + |
| 130 | +```java |
| 131 | +@Test |
| 132 | +void stepReturnsResultOnReplay() { |
| 133 | + var context = createTestContext(completedOperations); |
| 134 | + var result = context.step("test", String.class, () -> "new"); |
| 135 | + assertEquals("cached", result); // Returns cached, doesn't re-execute |
| 136 | +} |
| 137 | +``` |
| 138 | + |
| 139 | +**Integration Tests (sdk-integration-tests/)** |
| 140 | +- Test SDK components working together |
| 141 | +- Use `LocalDurableTestRunner` (in-memory, no AWS) |
| 142 | +- Test replay, checkpointing, error handling |
| 143 | +- Run on every build |
| 144 | + |
| 145 | +```java |
| 146 | +@Test |
| 147 | +void testRetryBehavior() { |
| 148 | + var runner = LocalDurableTestRunner.create(Input.class, handler::handleRequest); |
| 149 | + var result = runner.run(new Input("test")); |
| 150 | + assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus()); |
| 151 | +} |
| 152 | +``` |
| 153 | + |
| 154 | +**Example Tests (examples/src/test/)** |
| 155 | +- Demonstrate SDK usage patterns |
| 156 | +- Local tests use `LocalDurableTestRunner` |
| 157 | +- Cloud tests use `CloudDurableTestRunner` (requires deployed Lambda) |
| 158 | +- Cloud tests are disabled by default; enable them with `-Dtest.cloud.enabled=true` |
| 159 | + |
| 160 | +```java |
| 161 | +@Test |
| 162 | +@EnabledIf("isCloudTestsEnabled") |
| 163 | +void testAgainstRealLambda() { |
| 164 | + var arn = "arn:aws:lambda:us-east-1:123456789012:function:my-fn"; |
| 165 | + var runner = CloudDurableTestRunner.create(arn, Input.class, Output.class); |
| 166 | + var result = runner.run(new Input("test")); |
| 167 | + assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus()); |
| 168 | +} |
| 169 | +``` |
| 170 | + |
| 171 | +### Test Guidelines |
| 172 | + |
| 173 | +- Test business logic, replay behavior, edge cases |
| 174 | +- Don't test POJO getters/setters |
| 175 | +- Use `LocalDurableTestRunner` for fast tests |
| 176 | +- Use `CloudDurableTestRunner` only for end-to-end validation |
| 177 | +- JUnit 5 with static imports for assertions |
| 178 | + |
| 179 | +## Architecture Essentials |
| 180 | + |
| 181 | +### Checkpoint-and-Replay |
| 182 | + |
| 183 | +1. Operations get sequential IDs |
| 184 | +2. Completed operations are stored in `ExecutionManager` |
| 185 | +3. On replay: return cached result, skip re-execution |
| 186 | +4. New operations: execute, checkpoint, continue |
| 187 | + |
| 188 | +### Key Classes |
| 189 | + |
| 190 | +| Class | Responsibility | |
| 191 | +|-------|----------------| |
| 192 | +| `DurableHandler<I,O>` | Lambda entry point, extend this | |
| 193 | +| `DurableContext` | User API: `step()`, `wait()` | |
| 194 | +| `DurableExecutor` | Orchestrates execution lifecycle | |
| 195 | +| `ExecutionManager` | Thread coordination, state management | |
| 196 | +| `CheckpointBatcher` | Batches checkpoint API calls (750KB limit) | |
| 197 | +| `StepOperation` | Executes steps with retry logic | |
| 198 | +| `WaitOperation` | Handles wait checkpointing | |
| 199 | + |
| 200 | +## Common Tasks |
| 201 | + |
| 202 | +### Add a New Operation Type |
| 203 | + |
| 204 | +1. Create class in `operation/` implementing `DurableOperation<T>` |
| 205 | +2. Add method to `DurableContext` that delegates to new operation |
| 206 | +3. Add tests for: first execution, replay, error cases |
| 207 | + |
| 208 | +### Add a Test |
| 209 | + |
| 210 | +```java |
| 211 | +@Test |
| 212 | +void descriptiveTestName() { |
| 213 | + // Given |
| 214 | + var handler = new MyHandler(); |
| 215 | + var runner = LocalDurableTestRunner.create(MyInput.class, handler::handleRequest); |
| 216 | + |
| 217 | + // When |
| 218 | + var result = runner.runUntilComplete(new MyInput("test")); |
| 219 | + |
| 220 | + // Then |
| 221 | + assertEquals(expected, result); |
| 222 | +} |
| 223 | +``` |
| 224 | + |
| 225 | +### Debug Thread Coordination |
| 226 | + |
| 227 | +When debugging concurrency issues, start with `ExecutionManager`, which holds the thread registration and coordination logic. |
| 228 | + |
| 229 | +## When Unsure |
| 230 | + |
| 231 | +- Ask clarifying questions before making assumptions |
| 232 | +- Check existing code for patterns (especially in `operation/` package) |
| 233 | +- Prefer minimal changes over large refactors |
| 234 | + |
| 235 | +## Further Reading |
| 236 | + |
| 237 | +### Official AWS SDKs |
| 238 | + |
| 239 | +- **JavaScript/TypeScript**: https://github.com/aws/aws-durable-execution-sdk-js |
| 240 | +- **Python**: https://github.com/aws/aws-durable-execution-sdk-python |
| 241 | + |
| 242 | +### AWS Documentation |
| 243 | + |
| 244 | +- [Lambda Durable Functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html) |
| 245 | +- [Durable Execution SDK](https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html) |
| 246 | +- [Best Practices](https://docs.aws.amazon.com/lambda/latest/dg/durable-best-practices.html) |
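
The four checkpoint-and-replay steps listed under "Architecture Essentials" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ReplaySketch`, its `step(Supplier)` signature, and the in-memory map are hypothetical stand-ins, not the SDK's `DurableContext` or `ExecutionManager`, and retries, serialization, and thread coordination are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, simplified model of checkpoint-and-replay; not the SDK's real classes. */
final class ReplaySketch {
    // Completed results keyed by sequential operation ID
    // (a stand-in for the state the document attributes to ExecutionManager).
    private final Map<Integer, Object> checkpoints;
    private int nextOperationId = 0;

    ReplaySketch(Map<Integer, Object> completedOperations) {
        // Defensive copy, per the "Architecture Rules" in the file above.
        this.checkpoints = new HashMap<>(completedOperations);
    }

    @SuppressWarnings("unchecked")
    <T> T step(Supplier<T> action) {
        int id = nextOperationId++;              // 1. operations get sequential IDs
        if (checkpoints.containsKey(id)) {       // 2. completed operations are stored
            return (T) checkpoints.get(id);      // 3. on replay: return cached result
        }
        T result = action.get();                 // 4. new operation: execute,
        checkpoints.put(id, result);             //    checkpoint, continue
        return result;
    }

    /** Returns a copy so callers cannot mutate internal state. */
    Map<Integer, Object> checkpoints() {
        return new HashMap<>(checkpoints);
    }
}
```

Building a second `ReplaySketch` from the first one's `checkpoints()` simulates a re-invocation after an interruption: a `step` call in the same position returns the stored value and its supplier is never run. This only works if the handler issues its steps in the same order on every invocation, which is why sequential IDs are enough to match an operation to its checkpoint in this sketch.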