Commit 1fa6ee2

Add lambda-streaming-download-s3 pattern
1 parent fbb7016 commit 1fa6ee2

5 files changed

Lines changed: 259 additions & 0 deletions

lambda-s3-download/README.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Lambda S3 Download

This pattern deploys a Lambda function that downloads a file from a URL and uploads it to an S3 bucket using multipart upload. It streams the file in configurable chunks through `/tmp` and deletes each chunk after it is uploaded, so it can handle files larger than Lambda's memory and ephemeral storage limits, as long as the transfer completes within the function timeout.

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage. Please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
* [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed

## Deployment Instructions

1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
    ```
    git clone https://github.com/aws-samples/serverless-patterns
    ```
1. Change directory to the pattern directory:
    ```
    cd serverless-patterns/lambda-s3-download
    ```
1. Build the application:
    ```
    sam build
    ```
1. Deploy the application:
    ```
    sam deploy --guided
    ```
1. During the prompts:
    * Enter a stack name
    * Enter the desired AWS Region
    * Enter the target S3 bucket name (the bucket must already exist)
    * Allow SAM CLI to create IAM roles with the required permissions

    Once you have run `sam deploy --guided` once and saved the arguments to a configuration file (`samconfig.toml`), you can run `sam deploy` in future to reuse these defaults.

1. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing.

## How it works

The Lambda function:

1. Receives a download URL and filename via the event payload
2. Initiates an S3 multipart upload with SHA256 checksums
3. Streams the file from the URL in chunks (default 128 MB), writing each chunk to `/tmp` and uploading it as a multipart part
4. Cleans up each chunk from `/tmp` after uploading to stay within the 10 GB ephemeral storage limit
5. Completes the multipart upload and returns the S3 object checksum
6. If any step fails, aborts the multipart upload to avoid orphaned parts

The function is configured with a 15-minute timeout, 1 GB memory, and 10 GB ephemeral storage. Each chunk is read into memory before it is written to `/tmp`, so the chunk size must fit within the configured memory.
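
The limits above can be turned into a quick feasibility check before invoking the function. The following is a minimal sketch, not part of the pattern: `plan_transfer` is a hypothetical helper, it assumes every chunk except the last is full-sized, and it relies on the S3 limit of 10,000 parts per multipart upload.

```python
import math

S3_MAX_PARTS = 10_000      # S3 limit on parts per multipart upload
LAMBDA_TIMEOUT_S = 900     # timeout configured for this function (15 minutes)


def plan_transfer(file_size_bytes: int, chunk_size_mb: int = 128) -> dict:
    """Estimate the part count and the average throughput a transfer needs.

    Hypothetical helper: assumes chunk_size_mb is already within 5..5120 and
    that every chunk except the last one is full-sized.
    """
    chunk_bytes = chunk_size_mb * 1024 * 1024
    parts = max(1, math.ceil(file_size_bytes / chunk_bytes))
    return {
        "parts": parts,
        "within_part_limit": parts <= S3_MAX_PARTS,
        # Average rate (MiB/s) needed to finish download and upload in time.
        "min_throughput_mib_s": file_size_bytes / (1024 * 1024) / LAMBDA_TIMEOUT_S,
    }


plan = plan_transfer(10 * 1024**3)  # a 10 GiB file with the default chunk size
print(plan)
```

If the required throughput is higher than what the source server and S3 can sustain together, the transfer will likely hit the timeout, whatever chunk size is chosen.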

## Testing

Invoke the Lambda function with a test event, replacing `FUNCTION_NAME` with the function name or ARN from the deployment outputs:

```bash
aws lambda invoke \
  --function-name FUNCTION_NAME \
  --cli-binary-format raw-in-base64-out \
  --payload '{
    "download_url": "https://example.com/file.zip",
    "download_filename": "file.zip"
  }' \
  response.json
```
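
The function returns a `statusCode` and a JSON-encoded `body`, so the contents of `response.json` need to be decoded twice. A minimal sketch of that decoding follows; `summarize_invoke_result` is a hypothetical helper, the sample values are made up for illustration, and the sketch assumes the handler returned normally (an unhandled error produces a different payload).

```python
import json


def summarize_invoke_result(raw: str) -> dict:
    """Decode the handler's return value as written to response.json."""
    result = json.loads(raw)
    body = json.loads(result["body"])  # the body is itself a JSON string
    return {"ok": result["statusCode"] == 200, **body}


# Illustrative payload with made-up values, shaped like the handler's success response.
sample = json.dumps({
    "statusCode": 200,
    "body": json.dumps({
        "message": "file.zip uploaded successfully",
        "bucket": "example-bucket",
        "key": "file.zip",
        "checksum_sha256": "EXAMPLE-CHECKSUM",
        "parts": 3,
    }),
})
print(summarize_invoke_result(sample))
```

In practice the input would come from the file, for example `summarize_invoke_result(open("response.json").read())`.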

Optional event parameters:

| Parameter | Description | Default |
|---|---|---|
| `target_bucket` | S3 bucket name (overrides the deployed parameter). The function's IAM policy only covers the bucket set at deployment, so another bucket needs additional permissions. | Value from template parameter |
| `target_bucket_region` | S3 bucket region | Lambda's region |
| `chunk_size_mb` | Size of each download chunk in MB (clamped between 5 and 5120) | 128 |
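
The defaults and clamping in the table can be expressed as a small parsing step. This is a sketch under assumptions rather than the pattern's own code: `DownloadRequest` and `parse_event` are hypothetical names, and the slash check is an extra guard that the deployed handler does not perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadRequest:
    url: str
    filename: str
    bucket: str
    region: str
    chunk_size_mb: int


def parse_event(event: dict, default_bucket: str, default_region: str) -> DownloadRequest:
    """Apply the documented defaults and clamp chunk_size_mb to 5..5120."""
    filename = event["download_filename"]
    # Extra guard (not in the handler): reject names that would need subdirectories in /tmp.
    if "/" in filename or "\\" in filename:
        raise ValueError("download_filename must be a flat filename without slashes")
    return DownloadRequest(
        url=event["download_url"],
        filename=filename,
        bucket=event.get("target_bucket", default_bucket),
        region=event.get("target_bucket_region", default_region),
        chunk_size_mb=min(max(int(event.get("chunk_size_mb", 128)), 5), 5120),
    )


request = parse_event(
    {"download_url": "https://example.com/file.zip", "download_filename": "file.zip", "chunk_size_mb": 1},
    default_bucket="example-bucket",
    default_region="eu-west-1",
)
print(request)
```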

## Known Limitations

- The Lambda function has a 15-minute maximum timeout. If the download and upload combined take longer than that, the function will be killed mid-stream and the multipart upload will be left incomplete. Consider setting an [S3 lifecycle rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html) on the target bucket to auto-clean incomplete multipart uploads.
- The `download_filename` should be a flat filename (e.g. `file.zip`). If it contains slashes (e.g. `path/to/file.zip`), the temporary file path in `/tmp` will include subdirectories that may not exist, causing a write failure.
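
The lifecycle rule mentioned in the first limitation could be set with boto3 along the following lines. This is a sketch under assumptions: the rule ID, the one-day window, and both function names are illustrative choices, and `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration, so any existing rules would have to be merged in first.

```python
def abort_incomplete_mpu_rules(days: int = 1) -> dict:
    """Build a lifecycle configuration that aborts stale multipart uploads."""
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",  # illustrative name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # an empty prefix applies the rule to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def apply_rules(bucket: str, days: int = 1) -> None:
    """Apply the rule to a bucket. Overwrites any existing lifecycle configuration."""
    import boto3  # imported here so the rule builder stays usable without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=abort_incomplete_mpu_rules(days),
    )


print(abort_incomplete_mpu_rules())
```

Calling `apply_rules("my-target-bucket")` (a placeholder bucket name) would need credentials that allow changing the bucket's lifecycle configuration.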

## Cleanup

1. Delete the stack
    ```bash
    aws cloudformation delete-stack --stack-name STACK_NAME
    ```
1. Confirm the stack has been deleted
    ```bash
    aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,'STACK_NAME')].StackStatus"
    ```
----
Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
{
  "title": "Lambda S3 Download",
  "description": "A Lambda function that downloads a file from a URL and uploads it to S3 using multipart upload with SHA256 checksums.",
  "language": "Python",
  "level": "300",
  "framework": "SAM",
  "introBox": {
    "headline": "How it works",
    "text": [
      "This pattern deploys a Lambda function that streams a file from a URL and uploads it to an S3 bucket using multipart upload.",
      "The file is downloaded in configurable chunks (default 128 MB, clamped between 5 MB and 5 GB) and written to /tmp before being uploaded as individual parts. Each chunk is cleaned up from /tmp after upload, allowing the function to handle files larger than Lambda's memory or ephemeral storage limits.",
      "SHA256 checksums are calculated for each part and verified on completion. If any step fails, the multipart upload is automatically aborted to avoid orphaned parts."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/lambda-s3-download",
      "templateURL": "serverless-patterns/lambda-s3-download",
      "projectFolder": "lambda-s3-download",
      "templateFile": "template.yaml"
    }
  },
  "resources": {
    "bullets": [
      {
        "text": "S3 Multipart Upload Overview",
        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html"
      },
      {
        "text": "AWS Lambda - Configuring Ephemeral Storage",
        "link": "https://docs.aws.amazon.com/lambda/latest/dg/configuration-ephemeral-storage.html"
      }
    ]
  },
  "deploy": {
    "text": [
      "sam build",
      "sam deploy --guided"
    ]
  },
  "testing": {
    "text": [
      "See the GitHub repo for detailed testing instructions."
    ]
  },
  "cleanup": {
    "text": [
      "Delete the stack: <code>aws cloudformation delete-stack --stack-name STACK_NAME</code>."
    ]
  },
  "authors": [
    {
      "name": "Robert Meyer",
      "image": "https://serverlessland.com/assets/images/resources/contributors/ext-robert-meyer.jpg",
      "bio": "Robert is a Partner Solutions Architect with AWS in EMEA.",
      "linkedin": "https://www.linkedin.com/in/robert-meyer-phd-6a114a58/",
      "twitter": "@robl_on_tour"
    }
  ]
}

lambda-s3-download/src/app.py

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
import json
import os
from pathlib import Path

import boto3
import requests


def lambda_handler(event, context):

    target_bucket = event.get("target_bucket", os.environ["TARGET_BUCKET"])
    target_bucket_region = event.get("target_bucket_region", os.environ.get("AWS_REGION"))

    download_url = event["download_url"]
    download_filename = event["download_filename"]

    # Cap the chunk size at 5120 MB (5 GiB), the S3 maximum part size.
    # Floor the chunk size at 5 MB, the S3 minimum part size.
    # Each chunk is held in memory before it is written to /tmp, so the
    # chunk size must also fit within the function's configured memory.
    chunk_size_mb = min(max(int(event.get("chunk_size_mb", 128)), 5), 5120)

    # Open a multipart S3 upload request.
    s3 = boto3.client("s3", region_name=target_bucket_region)
    upload_request = s3.create_multipart_upload(Bucket=target_bucket, Key=download_filename, ChecksumAlgorithm="SHA256")
    upload_id = upload_request["UploadId"]
    part_number = 0
    parts = []

    try:
        with requests.get(download_url, stream=True) as download_request:
            # Fail early on HTTP errors instead of uploading an error page.
            download_request.raise_for_status()

            for chunk in download_request.iter_content(chunk_size=chunk_size_mb * 1024 * 1024):
                part_number = part_number + 1
                download_target = Path("/tmp", download_filename + "_" + str(part_number))

                # Write the chunk to /tmp; the with block closes the file.
                with download_target.open("wb") as download_file:
                    download_file.write(chunk)

                # Upload the chunk as one part and record its ETag and checksum.
                with download_target.open("rb") as download_file:
                    part_upload = s3.upload_part(Body=download_file, Bucket=target_bucket, Key=download_filename, PartNumber=part_number, UploadId=upload_id, ChecksumAlgorithm="SHA256")
                    parts.append({"ETag": part_upload["ETag"], "ChecksumSHA256": part_upload["ChecksumSHA256"], "PartNumber": part_number})

                # Remove the chunk so /tmp only ever holds one chunk.
                download_target.unlink()

        s3.complete_multipart_upload(Bucket=target_bucket, Key=download_filename, MultipartUpload={"Parts": parts}, UploadId=upload_id)
        object_summary = s3.get_object_attributes(Bucket=target_bucket, Key=download_filename, ObjectAttributes=["Checksum"])

        return {
            "statusCode": 200,
            "body": json.dumps({
                "message": f"{download_filename} uploaded successfully",
                "bucket": target_bucket,
                "key": download_filename,
                "checksum_sha256": object_summary["Checksum"]["ChecksumSHA256"],
                "parts": len(parts)
            })
        }

    except Exception as e:
        # Abort so that already uploaded parts are not left behind.
        s3.abort_multipart_upload(Bucket=target_bucket, Key=download_filename, UploadId=upload_id)
        return {
            "statusCode": 500,
            "body": json.dumps({"message": f"Download/Upload failed: {str(e)}"})
        }
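
In the handler above, `iter_content` returns each chunk as one `bytes` object, so peak memory grows with `chunk_size_mb`. A possible variation, sketched here under assumptions and not part of this commit, reads small blocks and spools them into part-sized files instead. `spool_parts` is a hypothetical helper; `blocks` could be something like `download_request.iter_content(chunk_size=1024 * 1024)`, and the caller would upload and delete each yielded file.

```python
from pathlib import Path
from typing import Iterable, Iterator


def spool_parts(blocks: Iterable[bytes], part_size: int, tmp_dir: str) -> Iterator[Path]:
    """Write small blocks into files of exactly part_size bytes (the last may be shorter).

    Yields each file path once the file is complete and closed.
    """
    part_number = 0
    handle = None
    path = None
    written = 0
    try:
        for block in blocks:
            view = memoryview(block)
            while len(view) > 0:
                if handle is None:
                    # Start a new part file.
                    part_number += 1
                    path = Path(tmp_dir, f"part_{part_number}")
                    handle = path.open("wb")
                    written = 0
                # Take only as much of the block as still fits into this part.
                take = min(part_size - written, len(view))
                handle.write(view[:take])
                written += take
                view = view[take:]
                if written == part_size:
                    handle.close()
                    handle = None
                    yield path
        if handle is not None:
            # Flush the final, possibly shorter, part.
            handle.close()
            handle = None
            yield path
    finally:
        # Close the open file if the consumer stops iterating early.
        if handle is not None:
            handle.close()
```

With this approach, memory use is bounded by the block size rather than the part size, at the cost of a little more bookkeeping.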
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
boto3
requests

lambda-s3-download/template.yaml

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Lambda function that downloads a file from a URL and uploads it to S3 using multipart upload

Parameters:
  TargetBucketName:
    Type: String
    Description: Name of the S3 bucket to upload files to

Resources:
  DownloadFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.12
      CodeUri: src/
      Timeout: 900
      MemorySize: 1024
      EphemeralStorage:
        Size: 10240
      Environment:
        Variables:
          TARGET_BUCKET: !Ref TargetBucketName
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref TargetBucketName

Outputs:
  DownloadFunctionArn:
    Description: Lambda function ARN
    Value: !GetAtt DownloadFunction.Arn
