diff --git a/aws-lambda-durable-functions-power/POWER.md b/aws-lambda-durable-functions-power/POWER.md index dab71ac..2fc69b7 100644 --- a/aws-lambda-durable-functions-power/POWER.md +++ b/aws-lambda-durable-functions-power/POWER.md @@ -15,63 +15,18 @@ author: "AWS" Build resilient multi-step applications and AI workflows that can execute for up to 1 year while maintaining reliable progress despite interruptions. -## Onboarding +**Works best with** the [AWS MCP server](https://docs.aws.amazon.com/aws-mcp/) but is not required. All AWS interactions in this skill use standard AWS CLI commands that work in any environment with configured AWS credentials. -### Step 1: Validate Prerequisites +## Critical Rules -Before using AWS Lambda durable functions, verify: +Read these before writing any code. Each one is a constraint that will silently break a function if violated. -1. **AWS CLI** is installed (2.33.22 or higher) and configured: - - ```bash - aws --version - aws sts get-caller-identity - ``` - -2. **Runtime environment** is ready: - - For TypeScript/JavaScript: Node.js 22+ (`node --version`) - - For Python: Python 3.11+ (`python --version`. Note that currently only Lambda runtime environments 3.13+ come with the Durable Execution SDK pre-installed. 3.11 is the min supported Python version by the Durable SDK itself, however, you could use OCI to bring your own container image with your own Python runtime + Durable SDK.) - -3. **Deployment capability** exists (one of): - - AWS SAM CLI (`sam --version`) 1.153.1 or higher - - AWS CDK (`cdk --version`) v2.237.1 or higher - - Direct Lambda deployment access - -## Step 2: Check user and project preferences - -Ask which IaC framework to use for new projects. -Ask which programming language to use if unclear, clarify between JavaScript and TypeScript if necessary. -Ask to create a git repo for projects if one doesn't exist already. 
- -### Error Scenarios - -#### Unsupported Language - -- List detected language -- State: "Durable Execution SDK is not yet available for [framework]" -- Suggest supported languages as alternatives - -#### Unsupported IaC Framework - -- List detected framework -- State: "[framework] might not support Lambda durable functions yet" -- Suggest supported frameworks as alternatives - -### Step 3: Install SDK - -**For TypeScript/JavaScript:** - -```bash -npm install @aws/durable-execution-sdk-js -npm install --save-dev @aws/durable-execution-sdk-js-testing -``` - -**For Python:** - -```bash -pip install aws-durable-execution-sdk-python -pip install aws-durable-execution-sdk-python-testing -``` +1. **Durable execution must be enabled at function creation time — it cannot be retrofitted.** A new Lambda function must be created with durable execution turned on. Migrate the logic into the new function; do not attempt to install the SDK and wrap the handler of the existing function and expect it to work. +2. **Durable functions must be invoked with a qualified ARN** — a specific version, an alias, or the literal `$LATEST` suffix. An unqualified function name will fail. See the *Invocation Requirements* section below for examples. +3. **Durable operations cannot be nested.** You cannot call `context.step()`, `context.wait()`, or `context.invoke()` from inside another step's callback. Use `context.runInChildContext()` to group operations instead. +4. **All non-deterministic code must run inside steps.** `Date.now()`, `Math.random()`, UUID generation, API calls, and database queries outside a step will produce different values on replay and corrupt execution state. +5. **Closure mutations are lost on replay** - return values from steps +6. **Side effects outside steps repeat** - use `context.logger` (replay-aware) ## When to Load Reference Files @@ -115,13 +70,6 @@ def handler(event: dict, context: DurableContext) -> dict: return result ``` -### Critical Rules - -1. 
**All non-deterministic code MUST be in steps** (Date.now, Math.random, API calls) -2. **Cannot nest durable operations** - use `runInChildContext` to group operations -3. **Closure mutations are lost on replay** - return values from steps -4. **Side effects outside steps repeat** - use `context.logger` (replay-aware) - ### Python API Differences The Python SDK differs from TypeScript in several key areas: diff --git a/aws-lambda-durable-functions-power/steering/advanced-error-handling.md b/aws-lambda-durable-functions-power/steering/advanced-error-handling.md index 5b2bf8f..1a3693b 100644 --- a/aws-lambda-durable-functions-power/steering/advanced-error-handling.md +++ b/aws-lambda-durable-functions-power/steering/advanced-error-handling.md @@ -29,7 +29,7 @@ Advanced error handling patterns for durable functions, including timeout handli 4. Execute fallback operation in a separate step **Important limitation:** -In TypeScript, native setTimeout (and patterns like Promise.race using it) will fail during execution replays. To create a reliable timeout that persists across execution (expands over multi invocations), always use the timeout parameter provided by waitForCallback or waitForCondition +In TypeScript, native setTimeout (and patterns like Promise.race using it) will fail during execution replays. 
To create a reliable timeout that persists across execution (expands over multi invocations), always use the timeout parameter provided by waitForCallback ## Conditional Retry Based on Error Type diff --git a/aws-lambda-durable-functions-power/steering/advanced-patterns.md b/aws-lambda-durable-functions-power/steering/advanced-patterns.md index 734ecf3..d8d9a75 100644 --- a/aws-lambda-durable-functions-power/steering/advanced-patterns.md +++ b/aws-lambda-durable-functions-power/steering/advanced-patterns.md @@ -227,7 +227,7 @@ const results = await context.map( // Only one item processed (assuming first succeeds) if (results.successCount > 0) { - const match = results.getSucceeded()[0]; + const match = results.succeeded()[0]; context.logger.info('Found match', { match }); } ``` diff --git a/aws-lambda-durable-functions-power/steering/concurrent-operations.md b/aws-lambda-durable-functions-power/steering/concurrent-operations.md index af87790..cdb501a 100644 --- a/aws-lambda-durable-functions-power/steering/concurrent-operations.md +++ b/aws-lambda-durable-functions-power/steering/concurrent-operations.md @@ -196,7 +196,7 @@ results = context.map( ```typescript const results = await context.map('process', items, processFunc); -console.log(results.status); // 'COMPLETED' | 'FAILED' +console.log(results.status); // 'SUCCEEDED' | 'FAILED' console.log(results.totalCount); // Total items console.log(results.startedCount); // Items started console.log(results.successCount); // Successful items diff --git a/aws-lambda-durable-functions-power/steering/deployment-iac.md b/aws-lambda-durable-functions-power/steering/deployment-iac.md index cf2c0f7..de015e0 100644 --- a/aws-lambda-durable-functions-power/steering/deployment-iac.md +++ b/aws-lambda-durable-functions-power/steering/deployment-iac.md @@ -302,7 +302,7 @@ DurableFunction: RetentionPeriodInDays: 1 # Short retention Environment: Variables: - LOG_LEVEL: DEBUG + LOG_LEVEL: DEBUG # Use INFO or higher in non-dev — DEBUG 
may expose step results and execution state ENVIRONMENT: development ``` diff --git a/aws-lambda-durable-functions-power/steering/error-handling.md b/aws-lambda-durable-functions-power/steering/error-handling.md index 5f92b09..e26245c 100644 --- a/aws-lambda-durable-functions-power/steering/error-handling.md +++ b/aws-lambda-durable-functions-power/steering/error-handling.md @@ -89,7 +89,7 @@ const result = await context.step( ```python def custom_retry(error: Exception, attempt: int) -> RetryDecision: if hasattr(error, 'status_code') and 400 <= error.status_code < 500: - return RetryDecision(should_retry=False) + return RetryDecision.no_retry() if attempt < 5: return RetryDecision( @@ -97,7 +97,7 @@ def custom_retry(error: Exception, attempt: int) -> RetryDecision: delay=Duration.from_seconds(2 ** attempt) ) - return RetryDecision(should_retry=False) + return RetryDecision.no_retry() ``` ## Error Classification @@ -221,43 +221,38 @@ def handler(event: dict, context: DurableContext) -> dict: return {'success': True, 'order_id': shipment['order_id']} except Exception as error: - context.logger.error('Order failed, executing compensations', error) + context.logger.error(f'Order failed, executing compensations: {error}') for name, comp_step, resource_id in reversed(compensations): try: context.step(comp_step(resource_id)) except Exception as comp_error: - context.logger.error(f'Compensation {name} failed', comp_error) + context.logger.error(f'Compensation {name} failed: {comp_error}') raise error ``` ## Unrecoverable Errors -Configure non-retryable failures to stop execution immediately: +Mark errors as unrecoverable to stop execution immediately: **TypeScript:** -The TypeScript SDK does not currently expose a public unrecoverable error type. -Use a no-retry strategy when a step should fail immediately. 
- ```typescript -import { retryPresets } from '@aws/durable-execution-sdk-js'; - export const handler = withDurableExecution(async (event, context: DurableContext) => { - const user = await context.step('fetch-user', async () => { - const user = await fetchUser(event.userId); - - if (!user) { - // This error fails the step immediately because retryPresets.noRetry - // disables retries for this step. - throw new Error('User not found'); - } - - return user; - }, { - retryStrategy: retryPresets.noRetry, - }); + const user = await context.step( + 'fetch-user', + async () => { + const user = await fetchUser(event.userId); + + if (!user) { + throw new Error('User not found'); + } + + return user; + }, + { retryStrategy: () => ({ shouldRetry: false }) } + ); // Continue processing... }); @@ -428,7 +423,7 @@ export const handler = withDurableExecution(async (event, context: DurableContex 2. **Classify errors correctly** - distinguish retryable from non-retryable 3. **Implement compensating transactions** for distributed workflows 4. **Make errors deterministic** - same input produces same error -5. **Disable retries for non-retryable errors** to stop execution early when appropriate +5. **Use unrecoverable errors** to stop execution early when appropriate 6. **Log errors with context** using `context.logger` 7. **Handle partial failures** gracefully in batch operations 8. **Implement circuit breakers** for external service calls diff --git a/aws-lambda-durable-functions-power/steering/getting-started.md b/aws-lambda-durable-functions-power/steering/getting-started.md index fa14c90..d0bc4c5 100644 --- a/aws-lambda-durable-functions-power/steering/getting-started.md +++ b/aws-lambda-durable-functions-power/steering/getting-started.md @@ -2,11 +2,81 @@ Quick start guide for building your first durable function. -## Check user and project preferences +## Onboarding -Ask which IaC framework to use for new projects. 
-Ask which programming language to use if unclear, clarify between JavaScript and TypeScript if necessary. -Ask to create a git repo for projects if one doesn't exist already. +### Step 1: Validate Prerequisites + +Before using AWS Lambda durable functions, verify: + +1. **AWS CLI** is installed (2.33.22 or higher) and configured: + + ```bash + aws --version + aws sts get-caller-identity + ``` + +2. **Runtime environment** is ready: + - For TypeScript/JavaScript: Node.js 22+ (`node --version`) + - For Python: Python 3.11+ (`python --version`. Note that only Lambda runtime environments 3.13+ come with the Durable Execution SDK pre-installed. 3.11 is the minimum supported Python version by the Durable Execution SDK itself — use OCI to bring your own container image with an older Python runtime + Durable Execution SDK.) + +3. **Deployment capability** exists (one of): + - AWS SAM CLI (`sam --version`) 1.153.1 or higher + - AWS CDK (`cdk --version`) v2.237.1 or higher + - Direct Lambda deployment access + +### Step 2: Select language and IaC framework + +### Language Selection + +Default: TypeScript + +Override syntax: + +- "use Python" → Generate Python code +- "use JavaScript" → Generate JavaScript code + +When not specified, ALWAYS use TypeScript + +### IaC framework selection + +Default: CDK + +Override syntax: + +- "use CloudFormation" → Generate YAML templates +- "use SAM" → Generate YAML templates + +When not specified, ALWAYS use CDK + +### Error Scenarios + +#### Unsupported Language + +- List detected language +- State: "Durable Execution SDK is not yet available for [language]" +- Suggest supported languages as alternatives + +#### Unsupported IaC Framework + +- List detected framework +- State: "[framework] might not support Lambda durable functions yet" +- Suggest supported frameworks as alternatives + +### Step 3: Install SDK + +**For TypeScript/JavaScript:** + +```bash +npm install @aws/durable-execution-sdk-js +npm install --save-dev 
@aws/durable-execution-sdk-js-testing +``` + +**For Python:** + +```bash +pip install aws-durable-execution-sdk-python +pip install aws-durable-execution-sdk-python-testing +``` ## Basic Handler @@ -244,7 +314,6 @@ my-durable-function/ │ └── retry_strategies.py ├── tests/ │ └── test_handler.py # Tests with DurableFunctionTestRunner -│ └── test_handler.py # Tests with DurableFunctionTestRunner ├── infrastructure/ │ └── template.yaml # SAM/CloudFormation └── pyproject.toml # Project configuration diff --git a/aws-lambda-durable-functions-power/steering/step-operations.md b/aws-lambda-durable-functions-power/steering/step-operations.md index 911b2d8..a357198 100644 --- a/aws-lambda-durable-functions-power/steering/step-operations.md +++ b/aws-lambda-durable-functions-power/steering/step-operations.md @@ -134,7 +134,7 @@ from aws_durable_execution_sdk_python.retries import RetryDecision def custom_retry(error: Exception, attempt: int) -> RetryDecision: if isinstance(error, ValidationError): - return RetryDecision(should_retry=False) + return RetryDecision.no_retry() if attempt < 3: return RetryDecision( @@ -142,7 +142,7 @@ def custom_retry(error: Exception, attempt: int) -> RetryDecision: delay=Duration.from_seconds(2 ** attempt) ) - return RetryDecision(should_retry=False) + return RetryDecision.no_retry() result = context.step( risky_operation(), @@ -214,18 +214,24 @@ import { StepSemantics } from '@aws/durable-execution-sdk-js'; const result = await context.step( 'charge-payment', async () => chargeCard(amount), - { semantics: StepSemantics.AtMostOncePerRetry } + { + semantics: StepSemantics.AtMostOncePerRetry, + retryStrategy: () => ({ shouldRetry: false }) + } ); ``` **Python:** ```python -from aws_durable_execution_sdk_python.config import StepSemantics +from aws_durable_execution_sdk_python.config import StepSemantics, StepConfig result = context.step( charge_card(amount), - config=StepConfig(step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY) + 
config=StepConfig( + step_semantics=StepSemantics.AT_MOST_ONCE_PER_RETRY, + retry_strategy=lambda error, attempt: RetryDecision.no_retry() + ) ) ``` diff --git a/aws-lambda-durable-functions-power/steering/testing-patterns.md b/aws-lambda-durable-functions-power/steering/testing-patterns.md index 33858a7..0e34dac 100644 --- a/aws-lambda-durable-functions-power/steering/testing-patterns.md +++ b/aws-lambda-durable-functions-power/steering/testing-patterns.md @@ -88,7 +88,7 @@ def test_workflow(): runner = DurableFunctionTestRunner(handler=handler) with runner: - result = runner.run(input={'user_id': '123'}, timeout=10) + result = runner.run(input='{"user_id": "123"}', timeout=10) assert result.status is InvocationStatus.SUCCEEDED ``` @@ -272,11 +272,9 @@ it('should handle callback failure', async () => { const executionPromise = runner.run({ payload: {} }); - await new Promise(resolve => setTimeout(resolve, 100)); - const callbackOp = runner.getOperation('wait-for-approval'); - - // Send callback failure + await callbackOp.waitForData(WaitingOperationStatus.STARTED); + await callbackOp.sendCallbackFailure( 'ApprovalDenied', 'Request was rejected' @@ -320,10 +318,9 @@ it('should handle callback heartbeats', async () => { const executionPromise = runner.run({ payload: {} }); - await new Promise(resolve => setTimeout(resolve, 100)); - const callbackOp = runner.getOperation('long-running-process'); - + await callbackOp.waitForData(WaitingOperationStatus.STARTED); + // Send heartbeats await callbackOp.sendCallbackHeartbeat(); await runner.skipTime({ minutes: 2 }); diff --git a/aws-lambda-durable-functions-power/steering/troubleshooting-executions.md b/aws-lambda-durable-functions-power/steering/troubleshooting-executions.md index 557c487..e1b4d1f 100644 --- a/aws-lambda-durable-functions-power/steering/troubleshooting-executions.md +++ b/aws-lambda-durable-functions-power/steering/troubleshooting-executions.md @@ -36,6 +36,7 @@ Steps: 1. 
Fetch the execution history directly: Run: aws lambda get-durable-execution-history --durable-execution-arn --region --include-execution-data + Note: execution data may contain sensitive information (PII, credentials, business data). Do not display raw step results to users without reviewing content first. 2. If the command succeeds, analyze and provide a user-friendly diagnosis: a. Report the execution status (RUNNING/SUCCEEDED/FAILED/STOPPED/TIMED_OUT) diff --git a/aws-lambda-durable-functions-power/steering/wait-operations.md b/aws-lambda-durable-functions-power/steering/wait-operations.md index 664c192..72a5bd1 100644 --- a/aws-lambda-durable-functions-power/steering/wait-operations.md +++ b/aws-lambda-durable-functions-power/steering/wait-operations.md @@ -56,7 +56,7 @@ const result = await context.waitForCallback( // External system calls back with: // aws lambda send-durable-execution-callback-success \ // --callback-id \ -// --payload '{"approved": true}' +// --result '{"approved": true}' ``` **Python:** @@ -86,7 +86,7 @@ result = context.wait_for_callback( ```bash aws lambda send-durable-execution-callback-success \ --callback-id \ - --payload '{"status": "approved", "comments": "Looks good"}' + --result '{"status": "approved", "comments": "Looks good"}' ``` **SDK (TypeScript):** @@ -97,7 +97,7 @@ import { LambdaClient, SendDurableExecutionCallbackSuccessCommand } from '@aws-s const client = new LambdaClient({}); await client.send(new SendDurableExecutionCallbackSuccessCommand({ CallbackId: callbackId, - Payload: JSON.stringify({ status: 'approved' }) + Result: JSON.stringify({ status: 'approved' }) })); ``` @@ -170,13 +170,12 @@ const finalState = await context.waitForCondition( { initialState: { jobId: 'job-123', status: 'pending' }, waitStrategy: createWaitStrategy({ -   maxAttempts: 60, -   initialDelaySeconds: 5, -   maxDelaySeconds: 30, -   backoffRate: 1.5, -   shouldContinuePolling: (result) => result.status !== "completed" + maxAttempts: 60, + 
initialDelay: { seconds: 5 }, + maxDelay: { seconds: 30 }, + backoffRate: 1.5, + shouldContinuePolling: (result) => result.status !== "completed" }), - timeout: { hours: 1 } } ); ``` @@ -324,13 +323,12 @@ export const handler = withDurableExecution(async (event, context: DurableContex { initialState: { jobId, status: 'running' }, waitStrategy: createWaitStrategy({ -   maxAttempts: 60, -   initialDelaySeconds: 5, -   maxDelaySeconds: 30, -   backoffRate: 1.5, -   shouldContinuePolling: (result) => result.status === "running" + maxAttempts: 60, + initialDelay: { seconds: 5 }, + maxDelay: { seconds: 30 }, + backoffRate: 1.5, + shouldContinuePolling: (result) => result.status === "running" }), - timeout: { hours: 2 } } ); @@ -390,7 +388,7 @@ try: ) except CallbackError as error: if error.error_type == 'Timeout': - context.logger.warn('Approval timed out') + context.logger.warning('Approval timed out') else: - context.logger.error('Callback failed', error) + context.logger.error(f'Callback failed: {error}') ```
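
Review note on the `wait-operations.md` hunks: with the `timeout` option removed from the `waitForCondition` examples, the wait strategy's own parameters are what bound the polling window. The sketch below estimates that window for the values used in the diff (`maxAttempts: 60`, `initialDelay` of 5 seconds, `maxDelay` of 30 seconds, `backoffRate: 1.5`). It is plain Python and does not call the SDK. The helper `backoff_delays` is hypothetical, and the formula (multiply by the rate after each attempt, cap at the maximum, no jitter, one delay per attempt) is an assumption about how `createWaitStrategy` behaves, not something the diff confirms.

```python
from typing import Iterator


def backoff_delays(
    initial_delay_s: float,
    max_delay_s: float,
    backoff_rate: float,
    max_attempts: int,
) -> Iterator[float]:
    """Yield one delay in seconds per polling attempt.

    Hypothetical helper: assumes plain exponential backoff capped at
    max_delay_s, with no jitter. The real SDK may compute delays differently.
    """
    delay = float(initial_delay_s)
    for _ in range(max_attempts):
        yield delay
        # Grow the delay for the next attempt, never exceeding the cap.
        delay = min(delay * backoff_rate, float(max_delay_s))


# Values taken from the createWaitStrategy examples in the diff.
delays = list(backoff_delays(5, 30, 1.5, 60))
total_seconds = sum(delays)

print(delays[:6])          # first few delays before the cap takes over
print(total_seconds / 60)  # approximate polling window in minutes
```

Under these assumptions the delays reach the 30-second cap on the sixth attempt and the 60 attempts add up to a little under 29 minutes, well short of the one-hour and two-hour timeouts the old examples declared. If a longer window is intended, `maxAttempts` or `maxDelay` would need to grow accordingly.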