Merged
70 changes: 9 additions & 61 deletions aws-lambda-durable-functions-power/POWER.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,63 +15,18 @@ author: "AWS"

Build resilient multi-step applications and AI workflows that can execute for up to 1 year while maintaining reliable progress despite interruptions.

## Onboarding
**Works best with** the [AWS MCP server](https://docs.aws.amazon.com/aws-mcp/), although the server is not required. All AWS interactions in this skill use standard AWS CLI commands that work in any environment with configured AWS credentials.

### Step 1: Validate Prerequisites
## Critical Rules

Before using AWS Lambda durable functions, verify:
Read these before writing any code. Each one is a constraint that will silently break a function if violated.

1. **AWS CLI** is installed (2.33.22 or higher) and configured:

```bash
aws --version
aws sts get-caller-identity
```

2. **Runtime environment** is ready:
- For TypeScript/JavaScript: Node.js 22+ (`node --version`)
- For Python: Python 3.11+ (`python --version`. Note that currently only Lambda runtime environments 3.13+ come with the Durable Execution SDK pre-installed. 3.11 is the min supported Python version by the Durable SDK itself, however, you could use OCI to bring your own container image with your own Python runtime + Durable SDK.)

3. **Deployment capability** exists (one of):
- AWS SAM CLI (`sam --version`) 1.153.1 or higher
- AWS CDK (`cdk --version`) v2.237.1 or higher
- Direct Lambda deployment access

## Step 2: Check user and project preferences

Ask which IaC framework to use for new projects.
Ask which programming language to use if unclear, clarify between JavaScript and TypeScript if necessary.
Ask to create a git repo for projects if one doesn't exist already.

### Error Scenarios

#### Unsupported Language

- List detected language
- State: "Durable Execution SDK is not yet available for [language]"
- Suggest supported languages as alternatives

#### Unsupported IaC Framework

- List detected framework
- State: "[framework] might not support Lambda durable functions yet"
- Suggest supported frameworks as alternatives

### Step 3: Install SDK

**For TypeScript/JavaScript:**

```bash
npm install @aws/durable-execution-sdk-js
npm install --save-dev @aws/durable-execution-sdk-js-testing
```

**For Python:**

```bash
pip install aws-durable-execution-sdk-python
pip install aws-durable-execution-sdk-python-testing
```
1. **Durable execution must be enabled at function creation time; it cannot be retrofitted.** Create a new Lambda function with durable execution turned on and migrate the logic into it. Installing the SDK and wrapping the handler of an existing function will not work.
2. **Durable functions must be invoked with a qualified ARN** — a specific version, an alias, or the literal `$LATEST` suffix. An unqualified function name will fail. See the *Invocation Requirements* section below for examples.
3. **Durable operations cannot be nested.** You cannot call `context.step()`, `context.wait()`, or `context.invoke()` from inside another step's callback. Use `context.runInChildContext()` to group operations instead.
4. **All non-deterministic code must run inside steps.** `Date.now()`, `Math.random()`, UUID generation, API calls, and database queries outside a step will produce different values on replay and corrupt execution state.
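5. **Closure mutations are lost on replay.** Return values from steps instead of mutating variables captured from the enclosing scope.
6. **Side effects outside steps repeat on every replay.** For logging, use `context.logger`, which is replay-aware.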
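
The replay rules above can be illustrated with a toy model. This is a minimal sketch, not the Durable Execution SDK: `ToyContext` and its `step` method are hypothetical stand-ins that only mimic the idea of checkpointing a step result and returning it on replay without re-running the callback.

```python
import random


class ToyContext:
    """Hypothetical stand-in for a durable context; not the real SDK."""

    def __init__(self, checkpoints=None):
        # Results recorded by earlier invocations, keyed by step name.
        self.checkpoints = dict(checkpoints or {})

    def step(self, name, fn):
        # On replay, return the recorded result instead of running fn again.
        if name not in self.checkpoints:
            self.checkpoints[name] = fn()
        return self.checkpoints[name]


def handler(context):
    counter = {"calls": 0}

    def pick_id():
        counter["calls"] += 1  # closure mutation: skipped when the step is replayed
        return random.randint(1, 1_000_000)

    inside = context.step("pick-id", pick_id)  # stable across replays
    outside = random.random()  # re-evaluated on every replay
    return inside, outside, counter["calls"]


first = ToyContext()
inside_1, outside_1, calls_1 = handler(first)

# Simulate a replay: a fresh invocation that starts from the saved checkpoints.
replay = ToyContext(first.checkpoints)
inside_2, outside_2, calls_2 = handler(replay)
```

In this model the value produced inside the step is identical in both runs, the value computed outside the step is regenerated, and the counter mutated inside the callback stays at zero during the replay because the callback never runs. That is why a step should return the data the rest of the handler needs.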

## When to Load Reference Files

Expand Down Expand Up @@ -115,13 +70,6 @@ def handler(event: dict, context: DurableContext) -> dict:
return result
```

### Critical Rules

1. **All non-deterministic code MUST be in steps** (Date.now, Math.random, API calls)
2. **Cannot nest durable operations** - use `runInChildContext` to group operations
3. **Closure mutations are lost on replay** - return values from steps
4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)

### Python API Differences

The Python SDK differs from TypeScript in several key areas:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ Advanced error handling patterns for durable functions, including timeout handli
4. Execute fallback operation in a separate step

**Important limitation:**
In TypeScript, native setTimeout (and patterns like Promise.race using it) will fail during execution replays. To create a reliable timeout that persists across execution (expands over multi invocations), always use the timeout parameter provided by waitForCallback or waitForCondition
In TypeScript, native `setTimeout` (and patterns such as `Promise.race` built on it) will fail during execution replays. To create a reliable timeout that persists across the execution, which can span multiple invocations, always use the timeout parameter provided by `waitForCallback`.

## Conditional Retry Based on Error Type

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ const results = await context.map(

// Only one item processed (assuming first succeeds)
if (results.successCount > 0) {
const match = results.getSucceeded()[0];
const match = results.succeeded()[0];
context.logger.info('Found match', { match });
}
```
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -196,7 +196,7 @@ results = context.map(
```typescript
const results = await context.map('process', items, processFunc);

console.log(results.status); // 'COMPLETED' | 'FAILED'
console.log(results.status); // 'SUCCEEDED' | 'FAILED'
console.log(results.totalCount); // Total items
console.log(results.startedCount); // Items started
console.log(results.successCount); // Successful items
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,7 @@ DurableFunction:
RetentionPeriodInDays: 1 # Short retention
Environment:
Variables:
LOG_LEVEL: DEBUG
LOG_LEVEL: DEBUG # Use INFO or higher in non-dev — DEBUG may expose step results and execution state
ENVIRONMENT: development
```

Expand Down
43 changes: 19 additions & 24 deletions aws-lambda-durable-functions-power/steering/error-handling.md
Original file line number Diff line number Diff line change
Expand Up @@ -89,15 +89,15 @@ const result = await context.step(
```python
def custom_retry(error: Exception, attempt: int) -> RetryDecision:
if hasattr(error, 'status_code') and 400 <= error.status_code < 500:
return RetryDecision(should_retry=False)
return RetryDecision.no_retry()

if attempt < 5:
return RetryDecision(
should_retry=True,
delay=Duration.from_seconds(2 ** attempt)
)

return RetryDecision(should_retry=False)
return RetryDecision.no_retry()
```
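
To make the retry policy above concrete, the sketch below tabulates the decisions it would produce. It is a simplified model under stated assumptions: the hypothetical `backoff_delay_seconds` helper returns a plain number of seconds (or `None` for "do not retry") instead of the SDK's `RetryDecision`, and attempts are assumed to be numbered from 1, which the source does not confirm.

```python
from typing import Optional


def backoff_delay_seconds(
    status_code: Optional[int], attempt: int, max_attempts: int = 5
) -> Optional[int]:
    """Return the delay before the next retry, or None to stop retrying."""
    # Client errors (4xx) will not succeed on retry, so give up immediately.
    if status_code is not None and 400 <= status_code < 500:
        return None
    # Exponential backoff: the delay doubles with each attempt.
    if attempt < max_attempts:
        return 2 ** attempt
    # Attempt budget exhausted.
    return None


# Delays for a server error (503) across attempts 1 through 5.
schedule = [backoff_delay_seconds(503, attempt) for attempt in range(1, 6)]
```

Under these assumptions a persistent server error waits 2, 4, 8, and 16 seconds before giving up on the fifth attempt, while any 4xx error stops on the first attempt.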

## Error Classification
Expand Down Expand Up @@ -221,43 +221,38 @@ def handler(event: dict, context: DurableContext) -> dict:
return {'success': True, 'order_id': shipment['order_id']}

except Exception as error:
context.logger.error('Order failed, executing compensations', error)
context.logger.error(f'Order failed, executing compensations: {error}')

for name, comp_step, resource_id in reversed(compensations):
try:
context.step(comp_step(resource_id))
except Exception as comp_error:
context.logger.error(f'Compensation {name} failed', comp_error)
context.logger.error(f'Compensation {name} failed: {comp_error}')

raise error
```
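
The compensation loop above undoes completed work in reverse order and keeps going when a single compensation fails. The following self-contained sketch shows that ordering without the SDK. The `run_saga` helper and the step names are hypothetical, and unlike the handler above it returns a log instead of re-raising the original error.

```python
def run_saga(actions):
    """Run (name, do, undo) triples in order; on failure, undo completed ones in reverse."""
    completed = []
    log = []
    try:
        for name, do, undo in actions:
            do()
            log.append(f"did {name}")
            completed.append((name, undo))
    except Exception as error:
        for name, undo in reversed(completed):
            try:
                undo()
                log.append(f"undid {name}")
            except Exception as comp_error:
                # A failed compensation is recorded but does not stop the others.
                log.append(f"undo {name} failed: {comp_error}")
        log.append(f"failed: {error}")
    return log


def fail():
    raise RuntimeError("shipping unavailable")


log = run_saga([
    ("reserve-inventory", lambda: None, lambda: None),
    ("charge-payment", lambda: None, lambda: None),
    ("create-shipment", fail, lambda: None),
])
```

Only the two actions that completed are compensated, and the most recent one is undone first.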

## Unrecoverable Errors

Configure non-retryable failures to stop execution immediately:
Mark errors as unrecoverable to stop execution immediately:

**TypeScript:**

The TypeScript SDK does not currently expose a public unrecoverable error type.
Use a no-retry strategy when a step should fail immediately.

```typescript
import { retryPresets } from '@aws/durable-execution-sdk-js';

export const handler = withDurableExecution(async (event, context: DurableContext) => {
const user = await context.step('fetch-user', async () => {
const user = await fetchUser(event.userId);

if (!user) {
// This error fails the step immediately because retryPresets.noRetry
// disables retries for this step.
throw new Error('User not found');
}

return user;
}, {
retryStrategy: retryPresets.noRetry,
});
const user = await context.step(
'fetch-user',
async () => {
const user = await fetchUser(event.userId);

if (!user) {
throw new Error('User not found');
}
return user;
},
{ retryStrategy: () => ({ shouldRetry: false }) }
);

// Continue processing...
});
Expand Down Expand Up @@ -428,7 +423,7 @@ export const handler = withDurableExecution(async (event, context: DurableContex
2. **Classify errors correctly** - distinguish retryable from non-retryable
3. **Implement compensating transactions** for distributed workflows
4. **Make errors deterministic** - same input produces same error
5. **Disable retries for non-retryable errors** to stop execution early when appropriate
5. **Use unrecoverable errors** to stop execution early when appropriate
6. **Log errors with context** using `context.logger`
7. **Handle partial failures** gracefully in batch operations
8. **Implement circuit breakers** for external service calls
Expand Down
79 changes: 74 additions & 5 deletions aws-lambda-durable-functions-power/steering/getting-started.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,81 @@

Quick start guide for building your first durable function.

## Check user and project preferences
## Onboarding

Ask which IaC framework to use for new projects.
Ask which programming language to use if unclear, clarify between JavaScript and TypeScript if necessary.
Ask to create a git repo for projects if one doesn't exist already.
### Step 1: Validate Prerequisites

Before using AWS Lambda durable functions, verify:

1. **AWS CLI** is installed (2.33.22 or higher) and configured:

```bash
aws --version
aws sts get-caller-identity
```

2. **Runtime environment** is ready:
- For TypeScript/JavaScript: Node.js 22+ (`node --version`)
- For Python: Python 3.11+ (`python --version`). Only Lambda Python runtimes 3.13 and later come with the Durable Execution SDK pre-installed. Python 3.11 is the minimum version the Durable Execution SDK itself supports; to run on a Python version older than 3.13, bring your own OCI container image with that Python runtime and the Durable Execution SDK installed.

3. **Deployment capability** exists (one of):
- AWS SAM CLI (`sam --version`) 1.153.1 or higher
- AWS CDK (`cdk --version`) v2.237.1 or higher
- Direct Lambda deployment access
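
The minimum versions listed above have to be compared numerically, not as strings (as text, `2.9.0` sorts after `2.33.22`). A minimal sketch of such a check follows. The helper names are hypothetical, and the sample strings are only illustrative of what each tool's version command prints.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the first dotted three-part version number from a command's output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(output: str, minimum: tuple) -> bool:
    """Compare versions as integer tuples so that 2.9.0 < 2.33.22."""
    return parse_version(output) >= minimum


# Illustrative outputs; the exact text printed by each tool may differ.
aws_ok = meets_minimum("aws-cli/2.33.22 Python/3.13.4", (2, 33, 22))
aws_old = meets_minimum("aws-cli/2.9.0 Python/3.11.8", (2, 33, 22))
sam_ok = meets_minimum("SAM CLI, version 1.153.1", (1, 153, 1))
node_ok = meets_minimum("v22.11.0", (22, 0, 0))
```

In practice the output string would come from running the command, for example with `subprocess.run(["aws", "--version"], capture_output=True, text=True)`.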

### Step 2: Select language and IaC framework

### Language Selection

Default: TypeScript

Override syntax:

- "use Python" → Generate Python code
- "use JavaScript" → Generate JavaScript code

When not specified, ALWAYS use TypeScript

### IaC framework selection

Default: CDK

Override syntax:

- "use CloudFormation" → Generate YAML templates
- "use SAM" → Generate YAML templates

When not specified, ALWAYS use CDK

### Error Scenarios

#### Unsupported Language

- List detected language
- State: "Durable Execution SDK is not yet available for [language]"
- Suggest supported languages as alternatives

#### Unsupported IaC Framework

- List detected framework
- State: "[framework] might not support Lambda durable functions yet"
- Suggest supported frameworks as alternatives

### Step 3: Install SDK

**For TypeScript/JavaScript:**

```bash
npm install @aws/durable-execution-sdk-js
npm install --save-dev @aws/durable-execution-sdk-js-testing
```

**For Python:**

```bash
pip install aws-durable-execution-sdk-python
pip install aws-durable-execution-sdk-python-testing
```

## Basic Handler

Expand Down Expand Up @@ -244,7 +314,6 @@ my-durable-function/
│ └── retry_strategies.py
├── tests/
│ └── test_handler.py # Tests with DurableFunctionTestRunner
│ └── test_handler.py # Tests with DurableFunctionTestRunner
├── infrastructure/
│ └── template.yaml # SAM/CloudFormation
└── pyproject.toml # Project configuration
Expand Down
16 changes: 11 additions & 5 deletions aws-lambda-durable-functions-power/steering/step-operations.md
Original file line number Diff line number Diff line change
Expand Up @@ -134,15 +134,15 @@ from aws_durable_execution_sdk_python.retries import RetryDecision

def custom_retry(error: Exception, attempt: int) -> RetryDecision:
if isinstance(error, ValidationError):
return RetryDecision(should_retry=False)
return RetryDecision.no_retry()

if attempt < 3:
return RetryDecision(
should_retry=True,
delay=Duration.from_seconds(2 ** attempt)
)

return RetryDecision(should_retry=False)
return RetryDecision.no_retry()

result = context.step(
risky_operation(),
Expand Down Expand Up @@ -214,18 +214,24 @@ import { StepSemantics } from '@aws/durable-execution-sdk-js';
const result = await context.step(
'charge-payment',
async () => chargeCard(amount),
{ semantics: StepSemantics.AtMostOncePerRetry }
{
semantics: StepSemantics.AtMostOncePerRetry,
retryStrategy: () => ({ shouldRetry: false })
}
);
```

**Python:**

```python
from aws_durable_execution_sdk_python.config import StepSemantics
from aws_durable_execution_sdk_python.config import StepSemantics, StepConfig

result = context.step(
charge_card(amount),
config=StepConfig(step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY)
config=StepConfig(
step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY,
retry_strategy=lambda error, attempt: RetryDecision.no_retry()
)
)
```

Expand Down
13 changes: 5 additions & 8 deletions aws-lambda-durable-functions-power/steering/testing-patterns.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ def test_workflow():
runner = DurableFunctionTestRunner(handler=handler)

with runner:
result = runner.run(input={'user_id': '123'}, timeout=10)
result = runner.run(input='{"user_id": "123"}', timeout=10)

assert result.status is InvocationStatus.SUCCEEDED
```
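
The updated call passes the input as a JSON string, not as a dict. Assuming the runner expects serialized JSON, as that call suggests, `json.dumps` from the standard library is a less error-prone way to build the string than writing it by hand. The `payload` and `serialized` names are illustrative.

```python
import json

payload = {"user_id": "123"}
serialized = json.dumps(payload)  # '{"user_id": "123"}'

# Round-trips back to the original dict, so the handler sees the same data.
assert json.loads(serialized) == payload
```

The resulting string can then be passed as `input=serialized` in place of the hand-written literal.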
Expand Down Expand Up @@ -272,11 +272,9 @@ it('should handle callback failure', async () => {

const executionPromise = runner.run({ payload: {} });

await new Promise(resolve => setTimeout(resolve, 100));

const callbackOp = runner.getOperation('wait-for-approval');

// Send callback failure
await callbackOp.waitForData(WaitingOperationStatus.STARTED);

await callbackOp.sendCallbackFailure(
'ApprovalDenied',
'Request was rejected'
Expand Down Expand Up @@ -320,10 +318,9 @@ it('should handle callback heartbeats', async () => {

const executionPromise = runner.run({ payload: {} });

await new Promise(resolve => setTimeout(resolve, 100));

const callbackOp = runner.getOperation('long-running-process');

await callbackOp.waitForData(WaitingOperationStatus.STARTED);

// Send heartbeats
await callbackOp.sendCallbackHeartbeat();
await runner.skipTime({ minutes: 2 });
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ Steps:

1. Fetch the execution history directly:
Run: aws lambda get-durable-execution-history --durable-execution-arn <durable-execution-arn> --region <region> --include-execution-data
Note: execution data may contain sensitive information (PII, credentials, business data). Do not display raw step results to users without reviewing content first.

2. If the command succeeds, analyze and provide a user-friendly diagnosis:
a. Report the execution status (RUNNING/SUCCEEDED/FAILED/STOPPED/TIMED_OUT)
Expand Down