Commit 9745536

Merge pull request #2996 from marcojahn/s3-lambda-textract-bedrock-durable-cdk-ts
added s3-lambda-textract-bedrock-durable-cdk-ts pattern
2 parents 58c0135 + 86f9b10 commit 9745536

11 files changed

Lines changed: 829 additions & 0 deletions
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
!jest.config.js
*.d.ts
node_modules
package-lock.json

# CDK asset staging directory
.cdk.staging
cdk.out

# Parcel default cache directory
.parcel-cache

# Mac files
.DS_Store
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# Durable Document Processing with Amazon S3, AWS Lambda, Amazon Textract, and Amazon Bedrock

This pattern demonstrates a durable document processing pipeline. When a document is uploaded to Amazon S3, a durable AWS Lambda function extracts text using the asynchronous Amazon Textract API, summarizes the content with Amazon Bedrock (Amazon Nova Lite), and stores the results in Amazon DynamoDB. The durable function uses checkpointing and `waitForCondition` to poll reliably for Textract job completion without consuming compute between polls, and it automatically resumes from its last checkpoint if interrupted.
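
The Textract polling described above can be pictured as a capped exponential backoff schedule. The sketch below computes the delay before each poll. The initial delay, multiplier, and cap are hypothetical values chosen for illustration and may not match the settings in `lambda/processor.js`. In the durable function, each delay is a period during which the invocation is suspended rather than running.

```typescript
// Hypothetical backoff settings for illustration only; the pattern's
// real values are defined in lambda/processor.js and may differ.
interface BackoffConfig {
  initialDelaySeconds: number;
  multiplier: number;
  maxDelaySeconds: number;
}

// Delay in seconds before poll attempt `attempt` (0-based):
// initial * multiplier^attempt, capped at maxDelaySeconds.
function backoffDelay(attempt: number, cfg: BackoffConfig): number {
  const raw = cfg.initialDelaySeconds * Math.pow(cfg.multiplier, attempt);
  return Math.min(raw, cfg.maxDelaySeconds);
}

const cfg: BackoffConfig = { initialDelaySeconds: 5, multiplier: 2, maxDelaySeconds: 60 };

// Delays for the first six polls: 5, 10, 20, 40, then capped at 60.
const schedule = Array.from({ length: 6 }, (_, i) => backoffDelay(i, cfg));
console.log(schedule.join(", "));
```

The cap keeps a long-running Textract job from pushing the gap between polls to many minutes, while the growing delay avoids frequent status checks early on.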

Learn more about this pattern at Serverless Land Patterns: [https://serverlessland.com/patterns/s3-lambda-textract-bedrock-durable-cdk-ts](https://serverlessland.com/patterns/s3-lambda-textract-bedrock-durable-cdk-ts)

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage. Please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed
* [Node.js and npm](https://nodejs.org/) installed
* [AWS CDK](https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html) installed
* [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) enabled for Amazon Nova Lite in your AWS Region

## Important: Bedrock Inference Profile Region Prefix

The Lambda handler uses a cross-Region inference profile ID to invoke Amazon Nova Lite. The prefix of the profile ID depends on the geography of the Region you deploy to:

| Region | Inference Profile ID |
|--------|---------------------|
| US Regions (us-east-1, us-west-2, etc.) | `us.amazon.nova-lite-v1:0` |
| EU Regions (eu-central-1, eu-west-1, etc.) | `eu.amazon.nova-lite-v1:0` |
| Asia Pacific Regions (ap-southeast-1, ap-northeast-1, etc.) | `apac.amazon.nova-lite-v1:0` |

The default in this pattern is `eu.amazon.nova-lite-v1:0`. If you deploy to a Region outside the EU, update both the `modelId` in `lambda/processor.js` and the inference profile ARN in `lib/pattern-stack.ts` to the matching profile ID.
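
To keep the `modelId` and the IAM resource ARN consistent, both can be derived from the deployment Region. The sketch below is a hypothetical helper, not code from this pattern. It maps a Region to the Nova Lite inference profile ID by geography (note that the Asia Pacific prefix is `apac.`) and builds the inference profile ARN in the form `arn:aws:bedrock:<region>:<account>:inference-profile/<profile-id>` for the standard `aws` partition. The account ID in the usage line is a placeholder.

```typescript
const NOVA_LITE_MODEL = "amazon.nova-lite-v1:0";

// Maps an AWS Region name to the Nova Lite cross-Region inference profile ID.
// Only the US, EU, and Asia Pacific geographies are handled; GovCloud and other
// Regions throw so that an unsupported deployment fails loudly.
function novaLiteProfileId(region: string): string {
  if (region.startsWith("us-gov-")) {
    throw new Error(`No Nova Lite inference profile mapping for ${region}`);
  }
  if (region.startsWith("us-")) return `us.${NOVA_LITE_MODEL}`;
  if (region.startsWith("eu-")) return `eu.${NOVA_LITE_MODEL}`;
  if (region.startsWith("ap-")) return `apac.${NOVA_LITE_MODEL}`;
  throw new Error(`No Nova Lite inference profile mapping for ${region}`);
}

// Builds the inference profile ARN (aws partition) for use in an IAM policy.
function novaLiteProfileArn(region: string, account: string): string {
  return `arn:aws:bedrock:${region}:${account}:inference-profile/${novaLiteProfileId(region)}`;
}

// Placeholder account ID for illustration.
console.log(novaLiteProfileArn("eu-central-1", "123456789012"));
```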

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal, and clone the GitHub repository:

```bash
git clone https://github.com/aws-samples/serverless-patterns
```

2. Change to the pattern directory:

```bash
cd serverless-patterns/s3-lambda-textract-bedrock-durable-cdk-ts
```

3. Install dependencies:

```bash
npm install
```

4. Deploy the CDK stack to your default AWS account and Region:

```bash
cdk deploy
```

5. Note the outputs from the CDK deployment process. These contain the resource names used for testing.

## Deployment Outputs

After deployment, CDK displays the following outputs. Save these values for testing:

| Output Key | Description | Usage |
|------------|-------------|-------|
| `DocumentBucketName` | Amazon S3 bucket for document uploads | Upload documents here to trigger processing |
| `ResultsTableName` | Amazon DynamoDB table for processing results | Query this table to see extracted text and summaries |
| `ProcessorFunctionName` | Durable AWS Lambda function name | Use for monitoring and log inspection |
| `ProcessorFunctionArn` | Durable AWS Lambda function ARN | Reference for invocation and permissions |

## How it works

![Architecture Diagram](s3-lambda-textract-bedrock-durable-cdk-ts.png)

This pattern creates a durable AWS Lambda function that implements a multi-step document processing pipeline with automatic checkpointing and resilient polling.

Architecture flow:
1. A document (PDF, PNG, or JPG) is uploaded to the Amazon S3 document bucket
2. Amazon S3 sends an event notification that triggers the durable AWS Lambda function
3. The function starts an asynchronous Amazon Textract text detection job (Step 1: `start-textract`)
4. The function polls for Textract job completion using `waitForCondition` with exponential backoff (Step 2: `wait-textract-complete`); between polls the function is suspended and does not consume compute
5. Once Textract completes, the function extracts text from the response blocks (Step 3: `extract-text`)
6. The extracted text is sent to Amazon Bedrock (Amazon Nova Lite) for summarization (Step 4: `bedrock-summarize`)
7. The summary and metadata are stored in Amazon DynamoDB (Step 5: `store-results`)
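
Step 3 (`extract-text`) works on the output of Textract's `GetDocumentTextDetection` operation, which returns a list of `Blocks` (of types such as `PAGE`, `LINE`, and `WORD`) and, for longer documents, a `NextToken` pointing to the next page of results. The sketch below assumes the handler keeps the text of `LINE` blocks in order and has already collected every result page; the actual logic in `lambda/processor.js` is not shown in this diff and may differ. Minimal local types are used instead of the AWS SDK types so the example is self-contained, and the sample data is hypothetical.

```typescript
// Minimal local types mirroring the Textract response fields used here.
interface TextractBlock {
  BlockType?: string; // e.g. "PAGE", "LINE", "WORD"
  Text?: string;
}

interface TextDetectionPage {
  Blocks?: TextractBlock[];
  NextToken?: string; // present when more result pages follow
}

// Joins the text of all LINE blocks, in order, across every result page.
// WORD blocks are skipped because their text is already contained in LINE blocks.
function extractText(pages: TextDetectionPage[]): string {
  const lines: string[] = [];
  for (const page of pages) {
    for (const block of page.Blocks ?? []) {
      if (block.BlockType === "LINE" && block.Text) {
        lines.push(block.Text);
      }
    }
  }
  return lines.join("\n");
}

// Hypothetical two-page result for illustration.
const samplePages: TextDetectionPage[] = [
  {
    Blocks: [
      { BlockType: "PAGE" },
      { BlockType: "LINE", Text: "Invoice 42" },
      { BlockType: "WORD", Text: "Invoice" },
    ],
    NextToken: "token-1",
  },
  { Blocks: [{ BlockType: "LINE", Text: "Total: 10.00 EUR" }] },
];

console.log(extractText(samplePages));
```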
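
Each step is checkpointed by the durable execution runtime. If the function is interrupted (for example, by a timeout or a transient error), it is re-invoked and completed steps are not executed again: their checkpointed results are returned instead, so processing continues from the last completed step rather than starting over.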
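
The checkpoint-and-resume behavior can be illustrated with a small in-memory model. This is a conceptual sketch only: `ReplayContext` and its `step` method are hypothetical and are not the API of the AWS Lambda Durable Execution SDK, which persists checkpoints durably instead of in a `Map`. The step names mirror those in the architecture flow, and the step bodies return placeholder strings. The first run is interrupted after `extract-text`; on the second run the two completed steps return their recorded results, and only `bedrock-summarize` executes.

```typescript
// Conceptual model only, NOT the Durable Execution SDK API.
// The checkpoint log maps step names to their recorded results.
type CheckpointLog = Map<string, unknown>;

class ReplayContext {
  private readonly log: CheckpointLog;

  constructor(log: CheckpointLog) {
    this.log = log;
  }

  // Runs `fn` the first time; on replay, returns the checkpointed result instead.
  step<T>(name: string, fn: () => T): T {
    if (this.log.has(name)) {
      return this.log.get(name) as T;
    }
    const result = fn();
    this.log.set(name, result);
    return result;
  }
}

// Counts how often each step body actually executes.
const executions: Record<string, number> = {};
function run(name: string, value: string): string {
  executions[name] = (executions[name] ?? 0) + 1;
  return value;
}

function pipeline(ctx: ReplayContext, interrupt: boolean): string {
  const jobId = ctx.step("start-textract", () => run("start-textract", "job-123"));
  const text = ctx.step("extract-text", () => run("extract-text", `text-of-${jobId}`));
  if (interrupt) throw new Error("simulated interruption");
  return ctx.step("bedrock-summarize", () => run("bedrock-summarize", `summary-of-${text}`));
}

const log: CheckpointLog = new Map();
try {
  pipeline(new ReplayContext(log), true); // first invocation fails mid-way
} catch {
  // The runtime would re-invoke the handler; the checkpoint log survives.
}
const result = pipeline(new ReplayContext(log), false); // replay
console.log(result, executions);
```

Because every step returns a recorded value on replay, the code between steps must be deterministic; otherwise a replayed run could take a different path than the original one.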

Example use cases:
- **Invoices**: Extract line items, amounts, and vendor details automatically
- **Contracts**: Identify key clauses, obligations, and renewal dates
- **Insurance documents**: Digitize forms and extract policy information
- **Compliance reports**: Flag non-compliant sections or missing fields

## Testing

### Upload a Test Document

1. Get the S3 bucket name from the stack outputs:

```bash
BUCKET_NAME=$(aws cloudformation describe-stacks \
  --stack-name S3LambdaTextractBedrockDurableStack \
  --query 'Stacks[0].Outputs[?OutputKey==`DocumentBucketName`].OutputValue' \
  --output text)
```

2. Upload a PDF, PNG, or JPG document:

```bash
aws s3 cp your-document.pdf s3://$BUCKET_NAME/
```

3. The durable Lambda function is triggered automatically. You can monitor its progress in Amazon CloudWatch Logs:

```bash
FUNCTION_NAME=$(aws cloudformation describe-stacks \
  --stack-name S3LambdaTextractBedrockDurableStack \
  --query 'Stacks[0].Outputs[?OutputKey==`ProcessorFunctionName`].OutputValue' \
  --output text)

aws logs tail /aws/lambda/$FUNCTION_NAME --follow
```

### Check Processing Results

1. Get the DynamoDB table name:

```bash
TABLE_NAME=$(aws cloudformation describe-stacks \
  --stack-name S3LambdaTextractBedrockDurableStack \
  --query 'Stacks[0].Outputs[?OutputKey==`ResultsTableName`].OutputValue' \
  --output text)
```

2. Scan the table for results (allow 1-2 minutes for processing to complete):

```bash
aws dynamodb scan --table-name $TABLE_NAME
```

3. Retrieve the result for a specific document, using its S3 object key as `documentKey`:

```bash
aws dynamodb get-item \
  --table-name $TABLE_NAME \
  --key '{"documentKey": {"S": "your-document.pdf"}}'
```

The result includes the Textract job ID, the extracted text length, the Bedrock-generated summary, and the processing timestamp.
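
Assuming `documentKey` holds the S3 object key of the uploaded file (as the `get-item` example suggests), keep in mind that Amazon S3 event notifications deliver object keys URL-encoded, with spaces encoded as `+`. A handler typically decodes the key before passing it to Textract or writing it to DynamoDB; otherwise a file named `my invoice.pdf` would be looked up as `my+invoice.pdf`. Whether `lambda/processor.js` does this is not visible in this diff. The sketch below uses hand-written minimal event types rather than `@types/aws-lambda`, and the bucket name and key are hypothetical.

```typescript
// Minimal local types for the parts of an S3 event notification used here.
interface S3EventRecord {
  s3: { bucket: { name: string }; object: { key: string } };
}

interface S3Event {
  Records: S3EventRecord[];
}

// S3 URL-encodes keys in event notifications and encodes spaces as "+".
// Replace "+" first: a literal "+" in the key arrives as "%2B" and is
// restored by decodeURIComponent afterwards.
function decodeS3Key(encodedKey: string): string {
  return decodeURIComponent(encodedKey.replace(/\+/g, " "));
}

// Returns one { bucket, key } pair per record, with the key decoded.
function objectsFromEvent(event: S3Event): { bucket: string; key: string }[] {
  return event.Records.map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeS3Key(record.s3.object.key),
  }));
}

// Hypothetical event for an upload named "Q1 invoice+2024.pdf".
const sampleEvent: S3Event = {
  Records: [
    { s3: { bucket: { name: "example-bucket" }, object: { key: "Q1+invoice%2B2024.pdf" } } },
  ],
};

console.log(objectsFromEvent(sampleEvent));
```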

## Cleanup

1. Empty the S3 bucket (using the `BUCKET_NAME` variable set in the Testing section) and delete the stack:

```bash
aws s3 rm s3://$BUCKET_NAME --recursive
cdk destroy
```

2. Confirm the deletion when prompted.

---

Copyright 2026 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/usr/bin/env node
import * as cdk from "aws-cdk-lib";
import { PatternStack } from "../lib/pattern-stack";

const app = new cdk.App();
new PatternStack(app, "S3LambdaTextractBedrockDurableStack", {
  env: {
    account: process.env.CDK_DEFAULT_ACCOUNT,
    region: process.env.CDK_DEFAULT_REGION,
  },
});
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
{
  "app": "npx ts-node --prefer-ts-exts bin/s3-lambda-textract-bedrock-durable-cdk-ts.ts",
  "watch": {
    "include": ["**"],
    "exclude": [
      "README.md",
      "cdk*.json",
      "**/*.d.ts",
      "**/*.js",
      "tsconfig.json",
      "package*.json",
      "yarn.lock",
      "node_modules",
      "cdk.out"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": ["aws", "aws-cn"]
  }
}
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
{
  "title": "Amazon S3 to AWS Lambda (Durable) to Amazon Textract and Bedrock",
  "description": "Extract text from documents with Amazon Textract and summarize with Amazon Bedrock using a durable AWS Lambda function.",
  "language": "TypeScript",
  "level": "300",
  "framework": "AWS CDK",
  "services": {
    "from": {
      "serviceName": "Amazon S3",
      "serviceURL": "/s3/"
    },
    "to": {
      "serviceName": "AWS Lambda",
      "serviceURL": "/lambda/"
    }
  },
  "patternArch": {
    "icon1": {
      "x": 10,
      "y": 50,
      "service": "s3",
      "label": "Amazon S3"
    },
    "icon2": {
      "x": 35,
      "y": 50,
      "service": "lambda",
      "label": "AWS Lambda (Durable)"
    },
    "icon3": {
      "x": 60,
      "y": 30,
      "service": "textract",
      "label": "Amazon Textract"
    },
    "icon4": {
      "x": 60,
      "y": 70,
      "service": "bedrock",
      "label": "Amazon Bedrock"
    },
    "icon5": {
      "x": 85,
      "y": 50,
      "service": "dynamodb",
      "label": "Amazon DynamoDB"
    },
    "line1": {
      "from": "icon1",
      "to": "icon2"
    },
    "line2": {
      "from": "icon2",
      "to": "icon3"
    },
    "line3": {
      "from": "icon2",
      "to": "icon4"
    },
    "line4": {
      "from": "icon2",
      "to": "icon5"
    }
  },
  "patternType": "Serverless",
  "introBox": {
    "headline": "How it works",
    "text": [
      "This pattern demonstrates a durable document processing pipeline using AWS Lambda durable functions.",
      "When a document (PDF, PNG, or JPG) is uploaded to Amazon S3, it triggers a durable Lambda function.",
      "The function starts an asynchronous Amazon Textract text detection job and polls for completion using waitForCondition with exponential backoff.",
      "Once text extraction completes, the extracted text is sent to Amazon Bedrock (Amazon Nova Lite) for summarization.",
      "Results including the summary are stored in Amazon DynamoDB.",
      "Durable functions provide automatic checkpointing, so if the function is interrupted during the long-running Textract polling, it resumes from the last checkpoint without re-executing completed steps.",
      "Example use cases: invoice processing, contract analysis, insurance document intake, and compliance review."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/s3-lambda-textract-bedrock-durable-cdk-ts",
      "templateURL": "serverless-patterns/s3-lambda-textract-bedrock-durable-cdk-ts",
      "projectFolder": "s3-lambda-textract-bedrock-durable-cdk-ts",
      "templateFile": "lib/pattern-stack.ts"
    }
  },
  "resources": {
    "bullets": [
      {
        "text": "AWS Lambda Durable Functions Documentation",
        "link": "https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html"
      },
      {
        "text": "Amazon Textract Asynchronous Operations",
        "link": "https://docs.aws.amazon.com/textract/latest/dg/async.html"
      },
      {
        "text": "Amazon Bedrock Amazon Nova Models",
        "link": "https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-nova.html"
      },
      {
        "text": "AWS Lambda Durable Execution SDK for JavaScript",
        "link": "https://github.com/aws/aws-durable-execution-sdk-js"
      }
    ]
  },
  "deploy": {
    "text": [
      "Clone the repository: <code>git clone https://github.com/aws-samples/serverless-patterns</code>",
      "Change directory: <code>cd serverless-patterns/s3-lambda-textract-bedrock-durable-cdk-ts</code>",
      "Install dependencies: <code>npm install</code>",
      "Deploy the CDK stack: <code>cdk deploy</code>"
    ]
  },
  "testing": {
    "text": [
      "Get the S3 bucket name: <code>BUCKET_NAME=$(aws cloudformation describe-stacks --stack-name S3LambdaTextractBedrockDurableStack --query 'Stacks[0].Outputs[?OutputKey==`DocumentBucketName`].OutputValue' --output text)</code>",
      "Upload a test document: <code>aws s3 cp test-document.pdf s3://$BUCKET_NAME/</code>",
      "Check DynamoDB for results: <code>TABLE_NAME=$(aws cloudformation describe-stacks --stack-name S3LambdaTextractBedrockDurableStack --query 'Stacks[0].Outputs[?OutputKey==`ResultsTableName`].OutputValue' --output text) && aws dynamodb scan --table-name $TABLE_NAME</code>"
    ]
  },
  "cleanup": {
    "text": ["Delete the stack: <code>cdk destroy</code>"]
  },
  "authors": [
    {
      "name": "Marco Jahn",
      "image": "https://sessionize.com/image/e99b-400o400o2-pqR4BacUSzHrq4fgZ4wwEQ.png",
      "bio": "Senior Solutions Architect, Amazon Web Services",
      "linkedin": "marcojahn"
    }
  ]
}
