| 1 | +# Durable Document Processing with Amazon S3, AWS Lambda, Amazon Textract, and Amazon Bedrock |
| 2 | + |
| 3 | +This pattern demonstrates a durable document processing pipeline. When a document is uploaded to Amazon S3, a durable AWS Lambda function extracts text using Amazon Textract's asynchronous API, summarizes the content with Amazon Bedrock (Amazon Nova Lite), and stores the results in Amazon DynamoDB. The durable function uses checkpointing and `waitForCondition` to poll for Textract job completion without consuming compute between polls, and it automatically resumes from the last checkpoint if interrupted. |
| 4 | + |
| 5 | +Learn more about this pattern at Serverless Land Patterns: [https://serverlessland.com/patterns/s3-lambda-textract-bedrock-durable-cdk-ts](https://serverlessland.com/patterns/s3-lambda-textract-bedrock-durable-cdk-ts) |
| 6 | + |
| 7 | +Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. |
| 8 | + |
| 9 | +## Requirements |
| 10 | + |
| 11 | +* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. |
| 12 | +* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured |
| 13 | +* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed |
| 14 | +* [Node.js and npm](https://nodejs.org/) installed |
| 15 | +* [AWS CDK](https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html) installed |
| 16 | +* [Amazon Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) enabled for Amazon Nova Lite in your AWS region |
| 17 | + |
| 18 | +## Important: Bedrock Inference Profile Region Prefix |
| 19 | + |
| 20 | +The Lambda handler uses a cross-region inference profile ID to invoke Amazon Nova Lite. The profile ID is region-specific: |
| 21 | + |
| 22 | +| Region | Inference Profile ID | |
| 23 | +|--------|---------------------| |
| 24 | +| US regions (us-east-1, us-west-2, etc.) | `us.amazon.nova-lite-v1:0` | |
| 25 | +| EU regions (eu-central-1, eu-west-1, etc.) | `eu.amazon.nova-lite-v1:0` | |
| 26 | +| AP regions (ap-southeast-1, ap-northeast-1, etc.) | `apac.amazon.nova-lite-v1:0` | |
| 27 | + |
| 28 | +The default in this pattern is `eu.amazon.nova-lite-v1:0`. If deploying to a different region, update the `modelId` in `lambda/processor.js` and the inference profile ARN in `lib/pattern-stack.ts`. |
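
The prefix rule in the table above can be sketched as a small lookup. This is a minimal illustration, not code from the pattern: `inferenceProfileIdFor` and `PROFILE_PREFIX_BY_GEO` are hypothetical names, the APAC prefix is assumed to be `apac.`, and regions outside the three geographies in the table are rejected rather than guessed.

```typescript
// Hypothetical helper (not part of the pattern's source): derive the Nova Lite
// cross-region inference profile ID from an AWS region name.
const NOVA_LITE_MODEL = "amazon.nova-lite-v1:0";

// Geography prefix of the region name -> inference profile prefix.
const PROFILE_PREFIX_BY_GEO: Record<string, string> = {
  us: "us",
  eu: "eu",
  ap: "apac", // assumption: APAC profiles use the "apac." prefix
};

function inferenceProfileIdFor(region: string): string {
  // GovCloud regions also start with "us-" but are not covered by this sketch.
  if (region.startsWith("us-gov-")) {
    throw new Error(`No inference profile mapping for region: ${region}`);
  }
  const geo = region.split("-")[0];
  const prefix = PROFILE_PREFIX_BY_GEO[geo];
  if (prefix === undefined) {
    throw new Error(`No inference profile mapping for region: ${region}`);
  }
  return `${prefix}.${NOVA_LITE_MODEL}`;
}
```

For example, `inferenceProfileIdFor("us-west-2")` yields the US profile ID from the table. Whether every region in a geography actually offers the profile should still be checked against the Bedrock documentation.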
| 29 | + |
| 30 | +## Deployment Instructions |
| 31 | + |
| 32 | +1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: |
| 33 | + |
| 34 | + ```bash |
| 35 | + git clone https://github.com/aws-samples/serverless-patterns |
| 36 | + ``` |
| 37 | + |
| 38 | +2. Change directory to the pattern directory: |
| 39 | + |
| 40 | + ```bash |
| 41 | + cd serverless-patterns/s3-lambda-textract-bedrock-durable-cdk-ts |
| 42 | + ``` |
| 43 | + |
| 44 | +3. Install dependencies: |
| 45 | + |
| 46 | + ```bash |
| 47 | + npm install |
| 48 | + ``` |
| 49 | + |
| 50 | +4. Deploy the CDK stack to your default AWS account and region: |
| 51 | + |
| 52 | + ```bash |
| 53 | + cdk deploy |
| 54 | + ``` |
| 55 | + |
| 56 | +5. Note the outputs from the CDK deployment process. These contain the resource names which are used for testing. |
| 57 | + |
| 58 | +## Deployment Outputs |
| 59 | + |
| 60 | +After deployment, CDK will display the following outputs. Save these values for testing: |
| 61 | + |
| 62 | +| Output Key | Description | Usage | |
| 63 | +|------------|-------------|-------| |
| 64 | +| `DocumentBucketName` | Amazon S3 bucket for document uploads | Upload documents here to trigger processing | |
| 65 | +| `ResultsTableName` | Amazon DynamoDB table for processing results | Query this table to see extracted text and summaries | |
| 66 | +| `ProcessorFunctionName` | Durable AWS Lambda function name | Use for monitoring and log inspection | |
| 67 | +| `ProcessorFunctionArn` | Durable AWS Lambda function ARN | Reference for invocation and permissions | |
| 68 | + |
| 69 | +## How it works |
| 70 | + |
| 71 | + |
| 72 | + |
| 73 | +This pattern creates a durable AWS Lambda function that implements a multi-step document processing pipeline with automatic checkpointing and resilient polling. |
| 74 | + |
| 75 | +Architecture flow: |
| 76 | +1. A document (PDF, PNG, or JPG) is uploaded to the Amazon S3 document bucket |
| 77 | +2. Amazon S3 sends an event notification that triggers the durable AWS Lambda function |
| 78 | +3. The function starts an asynchronous Amazon Textract text detection job (Step 1: `start-textract`) |
| 79 | +4. The function polls for Textract job completion using `waitForCondition` with exponential backoff (Step 2: `wait-textract-complete`) — the function suspends between polls without consuming compute |
| 80 | +5. Once Textract completes, the function extracts text from the response blocks (Step 3: `extract-text`) |
| 81 | +6. The extracted text is sent to Amazon Bedrock (Amazon Nova Lite) for summarization (Step 4: `bedrock-summarize`) |
| 82 | +7. The summary and metadata are stored in Amazon DynamoDB (Step 5: `store-results`) |
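
Two pieces of the flow above are pure logic and can be sketched without any AWS calls: the delay schedule behind an exponential-backoff poll (Step 2) and the text extraction from Textract blocks (Step 3). This is an illustrative sketch, not the pattern's handler code. The function names, the 5-second base delay, the 60-second cap, and the choice to keep only `LINE` blocks are all assumptions.

```typescript
// Minimal shape of a Textract block; real blocks carry more fields.
interface TextractBlock {
  BlockType?: string;
  Text?: string;
}

// Step 3 sketch: keep LINE blocks only, since WORD blocks repeat the same text.
function extractText(blocks: TextractBlock[]): string {
  return blocks
    .filter((b) => b.BlockType === "LINE" && typeof b.Text === "string")
    .map((b) => b.Text as string)
    .join("\n");
}

// Step 2 sketch: delay before poll number `attempt` (1-based), doubling each
// time up to a cap. The base and cap values are illustrative assumptions.
function backoffSeconds(attempt: number, baseSeconds = 5, maxSeconds = 60): number {
  return Math.min(maxSeconds, baseSeconds * Math.pow(2, attempt - 1));
}
```

With these defaults the first five polls would wait 5, 10, 20, 40, and 60 seconds. In the actual pattern the wait itself is handled by `waitForCondition`, so the function is suspended rather than sleeping.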
| 83 | + |
| 84 | +Each step is checkpointed by the durable execution runtime. If the function is interrupted at any point (timeout, transient error), it resumes from the last completed step rather than starting over. |
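
The resume behavior described above can be illustrated with a small in-memory simulation. This is a conceptual sketch only: it does not use the durable execution runtime or its SDK, and `CheckpointStore`, `runStep`, and `pipeline` are hypothetical names. A step's result is saved after it succeeds, and a later run reuses the saved result instead of executing the step again.

```typescript
// Conceptual simulation: an in-memory stand-in for a checkpoint store.
type CheckpointStore = Map<string, unknown>;

function runStep<T>(store: CheckpointStore, name: string, fn: () => T): T {
  if (store.has(name)) {
    return store.get(name) as T; // replay: reuse the saved result
  }
  const result = fn();
  store.set(name, result); // checkpoint only after the step succeeds
  return result;
}

// A three-step pipeline that records which steps actually executed.
function pipeline(store: CheckpointStore, calls: string[], failAt?: string): string {
  const jobId = runStep(store, "start-textract", () => {
    calls.push("start-textract");
    return "job-123";
  });
  const text = runStep(store, "extract-text", () => {
    calls.push("extract-text");
    if (failAt === "extract-text") throw new Error("simulated interruption");
    return `text for ${jobId}`;
  });
  return runStep(store, "store-results", () => {
    calls.push("store-results");
    return `stored ${text.length} chars`;
  });
}
```

Running `pipeline` once with `failAt` set to `"extract-text"` and then again with the same store executes `start-textract` only once. The second run picks up at the failed step.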
| 85 | + |
| 86 | +Example use cases: |
| 87 | +- **Invoices**: Extract line items, amounts, and vendor details automatically |
| 88 | +- **Contracts**: Identify key clauses, obligations, and renewal dates |
| 89 | +- **Insurance documents**: Digitize forms and extract policy information |
| 90 | +- **Compliance reports**: Flag non-compliant sections or missing fields |
| 91 | + |
| 92 | +## Testing |
| 93 | + |
| 94 | +### Upload a Test Document |
| 95 | + |
| 96 | +1. Get the S3 bucket name from the stack outputs: |
| 97 | + |
| 98 | + ```bash |
| 99 | + BUCKET_NAME=$(aws cloudformation describe-stacks \ |
| 100 | + --stack-name S3LambdaTextractBedrockDurableStack \ |
| 101 | + --query 'Stacks[0].Outputs[?OutputKey==`DocumentBucketName`].OutputValue' \ |
| 102 | + --output text) |
| 103 | + ``` |
| 104 | + |
| 105 | +2. Upload a PDF, PNG, or JPG document: |
| 106 | + |
| 107 | + ```bash |
| 108 | + aws s3 cp your-document.pdf s3://$BUCKET_NAME/ |
| 109 | + ``` |
| 110 | + |
| 111 | +3. The durable Lambda function is triggered automatically. You can monitor progress in CloudWatch Logs: |
| 112 | + |
| 113 | + ```bash |
| 114 | + FUNCTION_NAME=$(aws cloudformation describe-stacks \ |
| 115 | + --stack-name S3LambdaTextractBedrockDurableStack \ |
| 116 | + --query 'Stacks[0].Outputs[?OutputKey==`ProcessorFunctionName`].OutputValue' \ |
| 117 | + --output text) |
| 118 | + |
| 119 | + aws logs tail /aws/lambda/$FUNCTION_NAME --follow |
| 120 | + ``` |
| 121 | + |
| 122 | +### Check Processing Results |
| 123 | + |
| 124 | +1. Get the DynamoDB table name: |
| 125 | + |
| 126 | + ```bash |
| 127 | + TABLE_NAME=$(aws cloudformation describe-stacks \ |
| 128 | + --stack-name S3LambdaTextractBedrockDurableStack \ |
| 129 | + --query 'Stacks[0].Outputs[?OutputKey==`ResultsTableName`].OutputValue' \ |
| 130 | + --output text) |
| 131 | + ``` |
| 132 | + |
| 133 | +2. Scan the table for results (allow 1-2 minutes for processing to complete): |
| 134 | + |
| 135 | + ```bash |
| 136 | + aws dynamodb scan --table-name $TABLE_NAME |
| 137 | + ``` |
| 138 | + |
| 139 | +3. Query a specific document result: |
| 140 | + |
| 141 | + ```bash |
| 142 | + aws dynamodb get-item \ |
| 143 | + --table-name $TABLE_NAME \ |
| 144 | + --key '{"documentKey": {"S": "your-document.pdf"}}' |
| 145 | + ``` |
| 146 | + |
| 147 | +The result includes the Textract job ID, extracted text length, Bedrock-generated summary, and processing timestamp. |
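
As a rough picture of such a result record, the sketch below models the four fields named above plus the key. Only `documentKey` comes from the `get-item` command in this section. The other attribute names, and the `buildResult` helper itself, are assumptions made for illustration and may differ from what the handler actually writes.

```typescript
// Attribute names other than `documentKey` are assumptions for illustration.
interface ProcessingResult {
  documentKey: string; // key used by the get-item command above
  textractJobId: string;
  extractedTextLength: number;
  summary: string;
  processedAt: string; // ISO 8601 timestamp
}

// Hypothetical helper that assembles the record from the pipeline's outputs.
function buildResult(
  documentKey: string,
  textractJobId: string,
  text: string,
  summary: string,
  now: Date
): ProcessingResult {
  return {
    documentKey,
    textractJobId,
    extractedTextLength: text.length,
    summary,
    processedAt: now.toISOString(),
  };
}
```

Storing the text length rather than the full text keeps the item small. Check the actual table contents with the scan command above before relying on any of these names.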
| 148 | + |
| 149 | +## Cleanup |
| 150 | + |
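| 151 | +1. Delete the stack. If the deletion fails because the document bucket still contains objects, empty the bucket first with `aws s3 rm s3://$BUCKET_NAME --recursive` and run the command again: |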
| 152 | + |
| 153 | + ```bash |
| 154 | + cdk destroy |
| 155 | + ``` |
| 156 | + |
| 157 | +2. Confirm the deletion when prompted. |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +Copyright 2026 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
| 162 | + |
| 163 | +SPDX-License-Identifier: MIT-0 |