Commit 25fd15e

feat: Add arm64 support, default chunk size to 512 MB, extract filename from URL fallback
1 parent 1fa6ee2 commit 25fd15e

5 files changed

Lines changed: 9 additions & 10 deletions

lambda-s3-download/README.md

Lines changed: 3 additions & 2 deletions
@@ -45,7 +45,7 @@ The Lambda function:
 
 1. Receives a download URL and filename via the event payload
 2. Initiates an S3 multipart upload with SHA256 checksums
-3. Streams the file from the URL in chunks (default 128 MB), writing each chunk to `/tmp` and uploading it as a multipart part
+3. Streams the file from the URL in chunks (default 512 MB), writing each chunk to `/tmp` and uploading it as a multipart part
 4. Cleans up each chunk from `/tmp` after uploading to stay within the 10 GB ephemeral storage limit
 5. Completes the multipart upload and returns the S3 object checksum
 6. If any step fails, aborts the multipart upload to avoid orphaned parts
@@ -73,12 +73,13 @@ Optional event parameters:
 |---|---|---|
 | `target_bucket` | S3 bucket name (overrides the deployed parameter) | Value from template parameter |
 | `target_bucket_region` | S3 bucket region | Lambda's region |
-| `chunk_size_mb` | Size of each download chunk in MB (clamped between 5 and 5120) | 128 |
+| `chunk_size_mb` | Size of each download chunk in MB (clamped between 5 and 5120) | 512 |
 
 ## Known Limitations
 
 - The Lambda function has a 15-minute maximum timeout. If the download and upload combined take longer than that, the function will be killed mid-stream and the multipart upload will be left incomplete. Consider setting an [S3 lifecycle rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html) on the target bucket to auto-clean incomplete multipart uploads.
 - The `download_filename` should be a flat filename (e.g. `file.zip`). If it contains slashes (e.g. `path/to/file.zip`), the temporary file path in `/tmp` will include subdirectories that may not exist, causing a write failure.
+- The maximum downloadable file size is limited by the 15-minute Lambda timeout, not by S3 (which supports up to 5 TB via multipart upload with 10,000 parts). In practice, Lambda can usually download roughly 55-110 GB in 15 minutes depending on network speed between Lambda and the source URL, so your mileage may vary. At the default chunk size of 512 MB, the 10,000 parts limit allows up to ~5 TB.
 
 ## Cleanup

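The size limits quoted in the new README bullet can be checked with a little arithmetic. The sketch below is illustrative and not part of the commit: the function names are hypothetical, and "MB" and "GB" are assumed to mean binary units (MiB, GiB). Under that assumption, 55-110 GB in 15 minutes corresponds to a sustained rate of roughly 63-125 MB/s, and 10,000 parts of 512 MB come to about 4.88 TiB (about 1.22 TiB at the previous 128 MB default).

```python
MIB = 1024 * 1024
S3_MAX_PARTS = 10_000        # S3 multipart upload part-count limit
LAMBDA_TIMEOUT_S = 15 * 60   # Lambda maximum timeout, in seconds


def max_bytes_by_parts(chunk_size_mb: int) -> int:
    """Largest object (in bytes) that fits into 10,000 parts of this size."""
    return chunk_size_mb * MIB * S3_MAX_PARTS


def max_bytes_by_timeout(throughput_mb_per_s: float,
                         timeout_s: int = LAMBDA_TIMEOUT_S) -> int:
    """Bytes transferable before the timeout at a sustained throughput."""
    return int(throughput_mb_per_s * MIB * timeout_s)


def implied_throughput_mb_per_s(total_gb: float,
                                timeout_s: int = LAMBDA_TIMEOUT_S) -> float:
    """Sustained MB/s needed to move total_gb within the timeout."""
    return total_gb * 1024 / timeout_s


if __name__ == "__main__":
    for gb in (55, 110):
        rate = implied_throughput_mb_per_s(gb)
        print(f"{gb} GB in 15 min needs about {rate:.1f} MB/s")
    for chunk in (128, 512):
        tib = max_bytes_by_parts(chunk) / 1024**4
        print(f"{chunk} MB parts x 10,000 = {tib:.2f} TiB")
```

At any throughput in that range the timeout bound is far smaller than the part-count bound, which is consistent with the README's statement that the timeout, not S3, is the practical limit.
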
lambda-s3-download/example-pattern.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 "headline": "How it works",
 "text": [
 "This pattern deploys a Lambda function that streams a file from a URL and uploads it to an S3 bucket using multipart upload.",
-"The file is downloaded in configurable chunks (default 128 MB, clamped between 5 MB and 5 GB) and written to /tmp before being uploaded as individual parts. Each chunk is cleaned up from /tmp after upload, allowing the function to handle files larger than Lambda's memory or ephemeral storage limits.",
+"The file is downloaded in configurable chunks (default 512 MB, clamped between 5 MB and 5 GB) and written to /tmp before being uploaded as individual parts. Each chunk is cleaned up from /tmp after upload, allowing the function to handle files larger than Lambda's memory or ephemeral storage limits.",
 "SHA256 checksums are calculated for each part and verified on completion. If any step fails, the multipart upload is automatically aborted to avoid orphaned parts."
 ]
 },

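The chunk-and-checksum flow described in the pattern text can be sketched without touching S3. This is a toy illustration under stated assumptions, not the code from `app.py`: `iter_parts`, `sha256_b64`, and `plan_upload` are hypothetical helpers, chunks are held in memory rather than written to `/tmp`, and the `Size` key exists only for illustration. The `PartNumber` and `ChecksumSHA256` keys mirror the boto3 parameter names, and S3 expects the SHA-256 digest base64-encoded.

```python
import base64
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple


def iter_parts(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs; S3 part numbers start at 1."""
    part_number = 1
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the encoding S3 uses for checksums."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def plan_upload(stream: BinaryIO, chunk_size: int) -> List[Dict]:
    """Describe each part that would be uploaded (no S3 call is made)."""
    return [
        {"PartNumber": n, "Size": len(d), "ChecksumSHA256": sha256_b64(d)}
        for n, d in iter_parts(stream, chunk_size)
    ]


if __name__ == "__main__":
    # 10 bytes split into 4-byte chunks gives three parts; the last is short.
    for part in plan_upload(io.BytesIO(b"a" * 10), chunk_size=4):
        print(part)
```

A network response stream may return fewer bytes per `read` than requested, so a real implementation would need to keep reading until a full chunk is assembled; only the final part of a multipart upload may be smaller than the 5 MB minimum.
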
lambda-s3-download/src/app.py

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,7 @@
 import json
 import os
 from pathlib import Path
+from urllib.parse import urlparse
 
 
 def lambda_handler(event, context):
@@ -11,11 +12,11 @@ def lambda_handler(event, context):
     target_bucket_region = event.get("target_bucket_region", os.environ.get("AWS_REGION"))
 
     download_url = event["download_url"]
-    download_filename = event["download_filename"]
+    download_filename = event.get("download_filename", urlparse(download_url).path.split("/")[-1])
 
     # Cap chunk size under 5 GB to be inside S3 max part size and not exhaust max Lambda memory
     # Floor chunk size at 5 MB to fit the S3 minimum part size
-    chunk_size_mb = min(max(int(event.get("chunk_size_mb", 128)), 5), 5120)
+    chunk_size_mb = min(max(int(event.get("chunk_size_mb", 512)), 5), 5120)
 
     # open a multipart s3 upload request.
     s3 = boto3.client("s3", region_name = target_bucket_region)
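One edge case in the new fallback: when the URL path is empty or ends with a slash (for example `https://example.com/files/`), `urlparse(download_url).path.split("/")[-1]` evaluates to an empty string, so `download_filename` would be empty. The sketch below shows one possible way to guard against that while keeping the same clamping logic. It is a suggestion under stated assumptions, not the committed code: `filename_from_url`, `resolve_params`, and the constant names are hypothetical.

```python
from urllib.parse import urlparse

MIN_CHUNK_MB = 5        # S3 minimum part size
MAX_CHUNK_MB = 5120     # S3 maximum part size (5 GB)
DEFAULT_CHUNK_MB = 512  # default introduced by this commit


def filename_from_url(download_url: str) -> str:
    """Return the last path segment of the URL, ignoring query and fragment."""
    name = urlparse(download_url).path.split("/")[-1]
    if not name:
        # Empty path or trailing slash: there is nothing to derive a name from.
        raise ValueError(f"cannot derive a filename from {download_url!r}")
    return name


def resolve_params(event: dict) -> tuple:
    """Extract (download_url, download_filename, chunk_size_mb) from the event."""
    download_url = event["download_url"]
    # Use the explicit filename when given, else fall back to the URL.
    download_filename = event.get("download_filename") or filename_from_url(download_url)
    requested = int(event.get("chunk_size_mb", DEFAULT_CHUNK_MB))
    chunk_size_mb = min(max(requested, MIN_CHUNK_MB), MAX_CHUNK_MB)
    return download_url, download_filename, chunk_size_mb


if __name__ == "__main__":
    print(resolve_params({"download_url": "https://example.com/files/data.zip?sig=abc"}))
```

Raising on an empty name is a design choice made for this sketch; substituting a fixed default filename would be an equally reasonable alternative.
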
Lines changed: 0 additions & 5 deletions
@@ -1,6 +1 @@
-boto3
-json
-os
-Path
 requests
-

lambda-s3-download/template.yaml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ Resources:
 Handler: app.lambda_handler
 Runtime: python3.12
 CodeUri: src/
+Architectures:
+- arm64
 Timeout: 900
 MemorySize: 1024
 EphemeralStorage:
