Commit 5c2a130

feat: Replace Step Functions pattern with Lambda Durable Functions pattern
- Replace sfn-ecs-python-sam with lambda-ecs-python-sam
- Implement synchronous pattern with polling using context.wait()
- Implement callback pattern with DynamoDB state tracking
- Use Python 3.13 with container image deployment
- Support durable execution for workflows up to 1 year
- Reduce costs with no charges during context.wait() periods
1 parent ce2829f commit 5c2a130

11 files changed

Lines changed: 879 additions & 673 deletions

Lines changed: 320 additions & 0 deletions
@@ -0,0 +1,320 @@
1+
# AWS Lambda Durable Functions to Amazon ECS with Python
2+
3+
This pattern demonstrates how to invoke an Amazon ECS task from AWS Lambda Durable Functions using Python, showcasing resilient multi-step workflows with automatic checkpointing and state management.
4+
5+
Lambda Durable Functions enable you to build resilient applications that can execute for up to one year while maintaining reliable progress despite interruptions. This pattern shows two integration approaches: **synchronous (polling with durable waits)** and **callback (async with durable steps)**.
6+
7+
Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns
8+
9+
**Important:** This application uses several AWS services, and usage beyond the Free Tier incurs costs. See the [AWS Pricing page](https://aws.amazon.com/pricing/) for details.
10+
11+
## What are Lambda Durable Functions?
12+
13+
Lambda Durable Functions checkpoint the progress of a multi-step workflow so that it can resume after an interruption instead of starting over. Key features include:
14+
15+
- **Automatic Checkpointing**: Each step is automatically checkpointed, so your function can resume from the last completed step after interruptions
16+
- **Cost-Effective Waits**: During wait operations, your function suspends without incurring compute charges
17+
- **Built-in Retries**: Steps have automatic retry logic with progress tracking
18+
- **Deterministic Replay**: When resuming, completed steps use stored results instead of re-executing
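
To make checkpointing and deterministic replay concrete, here is a minimal, self-contained sketch in plain Python. It does not use the Durable Execution SDK: `CheckpointStore`, `run_step`, and `workflow` are hypothetical names invented for illustration, and the in-memory dictionary only stands in for the checkpoint storage that the service manages for you.

```python
from typing import Any, Callable, Dict, List


class CheckpointStore:
    """Hypothetical in-memory stand-in for the service's checkpoint log."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        # On replay, a completed step returns its stored result
        # instead of executing again.
        if name in self.results:
            return self.results[name]
        result = fn()
        self.results[name] = result  # checkpoint after the step succeeds
        return result


calls: List[str] = []  # records which step bodies actually executed


def start_task() -> str:
    calls.append("start_task")
    return "task-123"


def workflow(store: CheckpointStore, fail_after_start: bool) -> str:
    task_id = store.run_step("start_task", start_task)
    if fail_after_start:
        raise RuntimeError("simulated interruption")
    return store.run_step("finish", lambda: f"{task_id}:done")


store = CheckpointStore()
try:
    workflow(store, fail_after_start=True)  # first attempt is interrupted
except RuntimeError:
    pass
final = workflow(store, fail_after_start=False)  # second attempt replays
```

After the second attempt, `calls` contains `start_task` only once: the replay reused the stored result rather than starting the task again.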
19+
20+
This pattern uses the [AWS Durable Execution SDK for Python](https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html) to implement these capabilities.
21+
22+
## Security Note
23+
24+
This pattern is designed for learning and demonstration purposes. The IAM roles and security group use permissive configurations to simplify deployment and focus on the integration patterns:
25+
26+
- **Security Group**: Allows all outbound traffic (required for pulling Docker images and calling AWS APIs)
27+
- **IAM Roles**: Use wildcard (`*`) resources for ECS task management
28+
29+
**For production use**, you should:
30+
- Restrict security group egress to specific AWS service endpoints using VPC endpoints
31+
- Scope IAM policies to specific resources (task definitions, DynamoDB tables)
32+
- Implement least privilege access based on your security requirements
33+
- Consider using AWS PrivateLink for service-to-service communication
34+
- Enable VPC Flow Logs for network traffic monitoring
35+
- Package the AWS SDK in your Lambda deployment package (13-14MB) instead of relying on the Lambda-provided runtime SDK
36+
- Include the Durable Execution SDK in your deployment package for production (included in requirements.txt)
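
As one illustration of scoping IAM policies to specific resources, the sketch below builds a policy document in Python. The function name `scoped_policy` and every cluster, task family, and table name are placeholders rather than values from this pattern's template, and the action list is an assumption about what the functions need.

```python
import json
from typing import Any, Dict


def scoped_policy(region: str, account_id: str, cluster: str,
                  task_family: str, table: str) -> Dict[str, Any]:
    """Build an IAM policy document that avoids wildcard-only resources."""
    ecs = f"arn:aws:ecs:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Start only this task definition family (any revision).
                "Effect": "Allow",
                "Action": "ecs:RunTask",
                "Resource": f"{ecs}:task-definition/{task_family}:*",
            },
            {   # Describe only tasks that run in this cluster.
                "Effect": "Allow",
                "Action": "ecs:DescribeTasks",
                "Resource": f"{ecs}:task/{cluster}/*",
            },
            {   # Read and write only the callback table.
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem",
                           "dynamodb:GetItem"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}",
            },
        ],
    }


policy = scoped_policy("us-east-1", "123456789012",
                       "demo-cluster", "demo-task", "demo-callbacks")
print(json.dumps(policy, indent=2))
```

A real policy would also need `iam:PassRole` for the task and execution roles, scoped to those role ARNs.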
37+
38+
Deploy this pattern in a non-production AWS account or isolated environment for testing.
39+
40+
## Requirements
41+
42+
* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
43+
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
44+
* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
45+
* [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed
46+
47+
## Architecture
48+
49+
### Pattern 1: Synchronous (Durable Polling) Integration
50+
51+
```
52+
┌─────────────────────┐ ┌──────────────────┐ ┌─────────────┐
53+
│ Lambda Durable │ │ ECS Task │ │ CloudWatch │
54+
│ Function (Sync) │─────▶│ (Python) │─────▶│ Logs │
55+
│ │ │ │ │ │
56+
└─────────────────────┘ └──────────────────┘ └─────────────┘
57+
│ │
58+
│ Durable Wait (no charges) │
59+
└───────────────────────────────┘
60+
Polls with checkpointing
61+
```
62+
63+
**How it works:**
64+
1. Lambda durable function invokes the ECS task using `ecs:RunTask` (checkpointed step)
65+
2. Function uses `context.wait()` to pause without compute charges
66+
3. After each wait, function checks task status using `ecs:DescribeTasks` (checkpointed step)
67+
4. If interrupted, function automatically resumes from last checkpoint
68+
5. Once complete, Lambda returns the result
69+
6. The execution as a whole can run for up to 1 year (vs. 15 minutes for a standard Lambda invocation)
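
The wait-then-check loop in steps 2 to 5 can be sketched as follows. This is a plain-Python illustration, not the pattern's handler: `poll_until_stopped`, `describe`, and `wait` are hypothetical names. In the durable function, `describe` would correspond to a checkpointed step around `ecs:DescribeTasks` and `wait` to `context.wait()`.

```python
from typing import Callable, List


def poll_until_stopped(
    describe: Callable[[], str],
    wait: Callable[[int], None],
    interval_seconds: int = 5,
    max_checks: int = 1000,
) -> List[str]:
    """Wait, then check the task status; repeat until it reports STOPPED."""
    seen: List[str] = []
    for _ in range(max_checks):
        wait(interval_seconds)   # stands in for the durable wait
        status = describe()      # stands in for the checkpointed status check
        seen.append(status)
        if status == "STOPPED":
            return seen
    raise TimeoutError(f"task not stopped after {max_checks} checks")


# Usage with stubs in place of ECS and the durable wait.
statuses = iter(["PROVISIONING", "PENDING", "RUNNING", "STOPPED"])
waits: List[int] = []
history = poll_until_stopped(lambda: next(statuses), waits.append)
```

The `max_checks` bound is an addition for the sketch, so that a task that never stops cannot loop forever.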
70+
71+
**Key Durable Features:**
72+
- `@durable_execution` decorator enables durable execution
73+
- `@durable_step` decorator marks functions as checkpointed steps
74+
- `context.wait()` suspends execution without charges
75+
- Automatic replay and recovery from failures
76+
77+
**Use cases:**
78+
- Long-running tasks (hours to days)
79+
- Tasks requiring reliable progress tracking
80+
- Workflows that need automatic recovery
81+
- Cost-sensitive polling operations
82+
83+
**Advantages over standard Lambda:**
84+
- No 15-minute timeout limitation
85+
- Pay only for active execution time (not wait time)
86+
- Automatic checkpointing and recovery
87+
- Built-in retry logic
88+
89+
### Pattern 2: Callback (Durable Async) Integration
90+
91+
```
92+
┌─────────────────────┐ ┌──────────────────┐ ┌─────────────┐
93+
│ Lambda Durable │ │ ECS Task │ │ CloudWatch │
94+
│ Function (Callback)│─────▶│ (Python) │─────▶│ Logs │
95+
│ │ │ │ │ │
96+
└─────────────────────┘ └──────────────────┘ └─────────────┘
97+
│ │ │
98+
│ Checkpointed Steps │ │
99+
│ ▼ │
100+
│ ┌─────────────────┐ │
101+
└──────────────────────│ DynamoDB │◄─────────────┘
102+
│ Table │
103+
└─────────────────┘
104+
```
105+
106+
**How it works:**
107+
1. Lambda durable function creates DynamoDB record (checkpointed step)
108+
2. Lambda invokes the ECS task using `ecs:RunTask` (checkpointed step)
109+
3. Lambda updates DynamoDB with task ARN (checkpointed step)
110+
4. Lambda **returns immediately** (async pattern)
111+
5. The Python application in ECS processes the work
112+
6. When done, the ECS task updates DynamoDB with the result
113+
7. If any Lambda step fails, it is retried automatically and execution resumes from the last checkpoint
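
Step 6, where the ECS task writes its result back, might look like the sketch below. The helper `build_completion_update` is hypothetical, and the `status` attribute name is an assumption. The `executionId` key and the `result` field follow the testing commands in this README. The function only builds the arguments, so it runs without AWS access.

```python
import json
from typing import Any, Dict


def build_completion_update(execution_id: str,
                            result: Dict[str, Any]) -> Dict[str, Any]:
    """Return keyword arguments for a DynamoDB Table.update_item call."""
    return {
        "Key": {"executionId": execution_id},
        # Placeholders (#s, #r) avoid clashes with DynamoDB reserved words.
        "UpdateExpression": "SET #s = :status, #r = :result",
        "ExpressionAttributeNames": {"#s": "status", "#r": "result"},
        "ExpressionAttributeValues": {
            ":status": "COMPLETED",
            ":result": json.dumps(result),
        },
    }


kwargs = build_completion_update("exec-001", {"processed": True})
# With boto3 available, the ECS task could then call, for example:
#   boto3.resource("dynamodb").Table(table_name).update_item(**kwargs)
```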
114+
115+
**Key Durable Features:**
116+
- Each step is automatically checkpointed
117+
- If interrupted, function resumes from last completed step
118+
- No re-execution of completed steps
119+
- Reliable task initiation guaranteed
120+
121+
**Use cases:**
122+
- Fire-and-forget workflows
123+
- Asynchronous processing
124+
- When you don't need immediate results
125+
- Decoupling task execution from API responses
126+
- Workflows requiring guaranteed task initiation
127+
128+
**Advantages:**
129+
- Reliable task initiation with automatic recovery
130+
- Minimal Lambda execution time
131+
- Each step is independently retryable
132+
- Completed steps are not re-executed on replay, which guards against duplicate task creation
133+
134+
## Deployment Instructions
135+
136+
### Prerequisites
137+
138+
* A Python runtime that supports Lambda Durable Functions (Python 3.13 or 3.14)
139+
* An AWS SAM CLI version that supports `DurableConfig` and container images
140+
* Docker installed (for building Lambda container images)
141+
142+
### Step 1: Clone the Repository
143+
144+
```bash
145+
git clone https://github.com/aws-samples/serverless-patterns
146+
cd serverless-patterns/lambda-ecs-python-sam
147+
```
148+
149+
### Step 2: Build and Deploy
150+
151+
This pattern uses Lambda container images with Python 3.13 to support durable functions. Together, `sam build` and `sam deploy` will:
152+
- Build Docker images with the Durable Execution SDK
153+
- Create ECR repositories automatically
154+
- Push images to ECR
155+
- Deploy Lambda functions using the container images
156+
157+
```bash
158+
sam build
159+
sam deploy --guided
160+
```
161+
162+
During the prompts:
163+
- **Stack Name**: `lambda-ecs-durable-demo` (or your preferred name)
164+
- **AWS Region**: Your preferred region (e.g., `us-east-1`)
165+
- **Parameter VpcCIDR**: Press Enter to use default (10.0.0.0/16)
166+
- **Confirm changes before deploy**: Y
167+
- **Allow SAM CLI IAM role creation**: Y
168+
- **Disable rollback**: N
169+
- **SyncLambdaFunction has no authorization defined**: Y
170+
- **CallbackLambdaFunction has no authorization defined**: Y
171+
- **Create managed ECR repositories for all functions**: Y (required for container images)
172+
- **Save arguments to samconfig.toml**: Y
173+
174+
The deployment takes 5-10 minutes because it creates a VPC, an ECS cluster, Lambda functions, and other resources.
175+
176+
### Step 3: Note the Outputs
177+
178+
After deployment, note the following outputs:
179+
- `SyncLambdaFunctionArn` - ARN for the synchronous pattern Lambda
180+
- `CallbackLambdaFunctionArn` - ARN for the callback pattern Lambda
181+
- `CallbackTableName` - DynamoDB table for callback tracking
182+
- `ECSClusterName` - Name of the ECS cluster
183+
- `LogGroupName` - CloudWatch log group for ECS tasks
184+
185+
**Important**: When invoking durable functions, you must use a qualified function name or ARN (for example, append `:$LATEST`).
186+
187+
## How to Test
188+
189+
### Testing the Synchronous (Durable) Pattern
190+
191+
1. **Invoke the durable function asynchronously:**
192+
193+
Durable functions whose execution timeout exceeds 15 minutes must be invoked asynchronously. Use the `--invocation-type Event` flag and a qualified function name (ending in `:$LATEST`):
194+
195+
```bash
196+
aws lambda invoke \
197+
--function-name lambda-ecs-durable-demo-sync-function:\$LATEST \
198+
--invocation-type Event \
199+
--cli-binary-format raw-in-base64-out \
200+
--payload '{"message": "Hello from durable sync pattern", "processingTime": 10}' \
201+
response.json
202+
```
203+
204+
**Note**: The `$LATEST` qualifier is required for durable functions. The command writes it as `\$LATEST` so that bash does not expand `$LATEST` as a shell variable.
205+
206+
2. **Monitor the Lambda execution logs:**
207+
208+
```bash
209+
aws logs tail /aws/lambda/lambda-ecs-durable-demo-sync-function --follow
210+
```
211+
212+
You'll see:
213+
- Task starting with checkpointed step
214+
- Durable waits (no compute charges during waits)
215+
- Status checks every 5 seconds (PROVISIONING → PENDING → RUNNING → STOPPED)
216+
- Each check is a separate checkpointed operation
217+
- Final result when task completes
218+
219+
3. **View ECS task logs:**
220+
221+
```bash
222+
aws logs tail /ecs/lambda-ecs-durable-demo --follow
223+
```
224+
225+
4. **View execution in Lambda console:**
226+
227+
Navigate to the Lambda console → Your function → "Monitoring" tab → "Logs" to see the execution timeline and checkpoints.
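
The CLI invocation in step 1 can also be issued from Python. The sketch below only builds the arguments for the boto3 Lambda client's `invoke` call, so it runs without AWS access. `build_invoke_args` is a hypothetical helper, and the function name is the one used in the CLI example above.

```python
import json
from typing import Any, Dict


def build_invoke_args(function_name: str,
                      payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return keyword arguments for the boto3 Lambda client's invoke()."""
    return {
        # Qualified name; no shell escaping is needed in Python.
        "FunctionName": f"{function_name}:$LATEST",
        "InvocationType": "Event",  # asynchronous invocation
        "Payload": json.dumps(payload).encode("utf-8"),
    }


args = build_invoke_args(
    "lambda-ecs-durable-demo-sync-function",
    {"message": "Hello from durable sync pattern", "processingTime": 10},
)
# With boto3 installed and credentials configured:
#   response = boto3.client("lambda").invoke(**args)
#   response["StatusCode"] is 202 when an Event invocation is accepted
```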
228+
229+
### Testing the Callback (Durable) Pattern
230+
231+
1. **Invoke the durable function asynchronously:**
232+
233+
```bash
234+
aws lambda invoke \
235+
--function-name lambda-ecs-durable-demo-callback-function:\$LATEST \
236+
--invocation-type Event \
237+
--cli-binary-format raw-in-base64-out \
238+
--payload '{"message": "Hello from durable callback pattern", "processingTime": 30}' \
239+
response.json
240+
```
241+
242+
2. **Monitor the Lambda execution logs:**
243+
244+
```bash
245+
aws logs tail /aws/lambda/lambda-ecs-durable-demo-callback-function --follow
246+
```
247+
248+
You'll see:
249+
- DynamoDB record creation (checkpointed)
250+
- ECS task initiation (checkpointed)
251+
- Function returns immediately
252+
253+
3. **Check the status in DynamoDB:**
254+
255+
```bash
256+
# Scan the table to see all executions
257+
aws dynamodb scan --table-name lambda-ecs-durable-demo-callbacks
258+
259+
# Or get a specific execution (replace with your execution ID from logs)
260+
aws dynamodb get-item \
261+
--table-name lambda-ecs-durable-demo-callbacks \
262+
--key '{"executionId": {"S": "YOUR-EXECUTION-ID"}}'
263+
```
264+
265+
4. **Monitor ECS task logs:**
266+
267+
```bash
268+
aws logs tail /ecs/lambda-ecs-durable-demo --follow
269+
```
270+
271+
The ECS task will update DynamoDB when processing is complete. You'll see the result in the `result` field with status `COMPLETED`.
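
To check the outcome from a script, the JSON printed by `aws dynamodb get-item` can be parsed as in the sketch below. `read_callback_status` is a hypothetical helper. It assumes the item stores `status` and `result` as string attributes, which this README does not confirm.

```python
import json
from typing import Optional, Tuple


def read_callback_status(get_item_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (status, result) from `aws dynamodb get-item` JSON output."""
    if not get_item_output.strip():
        return None, None  # the CLI prints nothing when no item matches
    item = json.loads(get_item_output).get("Item", {})
    # Attributes arrive in DynamoDB JSON, e.g. {"S": "COMPLETED"}.
    status = item.get("status", {}).get("S")
    result = item.get("result", {}).get("S")
    return status, result


sample = json.dumps({
    "Item": {
        "executionId": {"S": "YOUR-EXECUTION-ID"},
        "status": {"S": "COMPLETED"},
        "result": {"S": "done"},
    }
})
status, result = read_callback_status(sample)
```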
272+
273+
## Key Differences Between Patterns
274+
275+
| Feature | Synchronous (Durable Polling) | Callback (Durable Async) |
276+
|---------|------------------------------|--------------------------|
277+
| **Execution Duration** | Up to 1 year | Up to 1 year |
278+
| **Checkpointing** | Automatic for each step | Automatic for each step |
279+
| **Wait Charges** | No charges during waits | N/A (returns immediately) |
280+
| **Polling** | Durable waits between checks | No polling needed |
281+
| **Task Awareness** | Task doesn't know about Lambda | Task updates DynamoDB |
282+
| **Complexity** | Moderate (durable steps + waits) | Moderate (durable steps + DynamoDB) |
283+
| **Use Case** | Long-running tasks needing results | Fire-and-forget workflows |
284+
| **Cost** | Pay only for active execution | Minimal (quick execution) |
285+
| **Result Retrieval** | Returned by function | Query DynamoDB |
286+
| **Reliability** | Automatic recovery from failures | Guaranteed task initiation |
287+
288+
## Benefits of Lambda Durable Functions
289+
290+
Compared to standard Lambda functions:
291+
292+
- **Extended Duration**: Execute for up to 1 year (vs 15 minutes)
293+
- **Cost Optimization**: No charges during wait operations
294+
- **Automatic Recovery**: Built-in checkpointing and replay
295+
- **Simplified Code**: No manual state management needed
296+
- **Reliable Execution**: Guaranteed progress despite interruptions
297+
- **Built-in Retries**: Automatic retry logic for steps
298+
299+
## Cleanup
300+
301+
To delete the resources:
302+
303+
```bash
304+
sam delete
305+
```
306+
307+
## Resources
308+
309+
- [AWS Lambda Durable Functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)
310+
- [Durable Execution SDK](https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html)
311+
- [AWS Lambda](https://aws.amazon.com/lambda/)
312+
- [Amazon ECS](https://aws.amazon.com/ecs/)
313+
- [Amazon DynamoDB](https://aws.amazon.com/dynamodb/)
314+
- [ECS RunTask API](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html)
315+
316+
---
317+
318+
Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
319+
320+
SPDX-License-Identifier: MIT-0
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
{
2+
"title": "AWS Lambda Durable Functions to Amazon ECS with Python",
3+
"description": "Invoke ECS tasks from Lambda Durable Functions with automatic checkpointing, state management, and resilient execution patterns",
4+
"language": "Python",
5+
"level": "300",
6+
"framework": "SAM",
7+
"introBox": {
8+
"headline": "How it works",
9+
"text": [
10+
"This pattern demonstrates AWS Lambda Durable Functions invoking Amazon ECS tasks with resilient, long-running execution capabilities:",
11+
"1. Durable Synchronous Pattern: Lambda uses checkpointed steps and durable waits to poll ECS task status. Can run for up to 1 year with automatic recovery from failures. No compute charges during wait periods.",
12+
"2. Durable Callback Pattern: Lambda uses checkpointed steps to reliably initiate ECS tasks. Each step (create record, start task, update status) is automatically checkpointed for guaranteed execution.",
13+
"The pattern uses the AWS Durable Execution SDK for Python, providing automatic state management, checkpoint-based recovery, and cost-effective long-running workflows. It includes inline Python code in ECS containers, VPC networking, and DynamoDB for callback tracking."
14+
]
15+
},
16+
"gitHub": {
17+
"template": {
18+
"repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/lambda-ecs-python-sam",
19+
"templateURL": "serverless-patterns/lambda-ecs-python-sam",
20+
"projectFolder": "lambda-ecs-python-sam",
21+
"templateFile": "template.yaml"
22+
}
23+
},
24+
"resources": {
25+
"bullets": [
26+
{
27+
"text": "Lambda Durable Functions",
28+
"link": "https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html"
29+
},
30+
{
31+
"text": "Durable Execution SDK",
32+
"link": "https://docs.aws.amazon.com/lambda/latest/dg/durable-execution-sdk.html"
33+
},
34+
{
35+
"text": "Run Amazon ECS or Fargate tasks",
36+
"link": "https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html"
37+
},
38+
{
39+
"text": "Amazon ECS Task Definitions",
40+
"link": "https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html"
41+
}
42+
]
43+
},
44+
"deploy": {
45+
"text": [
46+
"sam build",
47+
"sam deploy --guided"
48+
]
49+
},
50+
"testing": {
51+
"text": [
52+
"See the GitHub repo for detailed testing instructions."
53+
]
54+
},
55+
"cleanup": {
56+
"text": [
57+
"Delete the stack: <code>sam delete</code>"
58+
]
59+
},
60+
"authors": [
61+
{
62+
"name": "Mian Tariq",
63+
"image": "",
64+
"bio": "Senior Delivery Consultant",
65+
"linkedin": ""
66+
}
67+
]
68+
}
