
Commit 6664e54

feat(lambda): add SAM template and sample events for AWS deployment (#907)
* feat(lambda): add SAM template and sample events for AWS deployment

Phase 6.2 of the distributed rendering plan (DISTRIBUTED-RENDERING-PLAN.md §15).

Reference SAM template for deploying HyperFrames distributed rendering on AWS: one Lambda function in three roles, choreographed by a Step Functions standard workflow with a Map state for parallel chunk rendering.

Resources created by the template:
- Lambda function pointing at the Phase 6.1 ZIP
- Step Functions state machine: Plan -> Map(N) RenderChunk -> Assemble
- S3 bucket for plan tarballs, chunk outputs, final mp4
- IAM role for the state machine
- CloudWatch alarm guarding against runaway chunk invocations

Retry policy: 4 attempts, 2s initial, 2x backoff, max 60s, with the typed non-retryable error codes from plan §9.3 explicitly opted out.

CodeUri points at packages/aws-lambda/dist/handler.zip; sam deploy resolves the local path and uploads to a SAM-managed bucket on first deploy.

Validated: sam validate --lint passes against the template.

This is part of the 8-PR Phase 6 stack; PR 6.2 of 8.

* fix(lambda): address PR 879 review feedback

- Add CloudWatch alarms for Lambda Errors metric (5min window, threshold 1) and Step Functions ExecutionsFailed metric. The existing runaway-invocations alarm catches too-many-calls but missed silent per-chunk failures and retry-exhaustion.
- Document VersioningConfiguration: Suspended tradeoff inline. Adopters treating the final mp4 as user-keepable should bump to Enabled.
- Cost-allocation Tags on RenderBucket + Lambda Globals.
- Lambda Tracing: Active so X-Ray spans don't terminate at the SF→Lambda boundary (the state machine already had tracing).
- State-machine top-level TimeoutSeconds: 3600 as defensive ceiling on the whole choreography; catches Plan-retry storms before they hit individual task budgets.
- AssertChunkCount Choice state: if Plan ever returns ChunkCount=0 the Map would silently iterate zero times and Assemble would receive an empty ChunkS3Uris[] producing an empty output. Fail-fast with typed PLAN_TOO_LARGE error instead.
- Architecture comment: explicit x86_64-only constraint from @sparticuz/chromium so adopters trying Graviton don't get bitten.

* docs(lambda): drop internal plan-doc refs from SAM example + template
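
The retry policy quoted in the commit message (2s initial, 2x backoff, max 60s) implies a concrete wait schedule. The sketch below is illustrative only and is not taken from the template: the function name and parameters are hypothetical, and it assumes "4 attempts" means four retries after the first try, which is how the Step Functions `MaxAttempts` field counts.

```python
def retry_delays(max_attempts=4, interval_s=2.0, backoff_rate=2.0, max_delay_s=60.0):
    """Return the wait (in seconds) before each retry.

    Assumption: each retry waits interval_s * backoff_rate**n, capped at
    max_delay_s, mirroring the Step Functions Retry fields IntervalSeconds,
    BackoffRate, and MaxDelaySeconds.
    """
    return [
        min(interval_s * backoff_rate ** attempt, max_delay_s)
        for attempt in range(max_attempts)
    ]


if __name__ == "__main__":
    # With the quoted defaults, the 60s cap is never reached.
    print(retry_delays())
    # The cap only matters for longer retry chains.
    print(retry_delays(max_attempts=7))
```

Under these assumptions a task that fails on every retry adds 30 seconds of waiting in total, well inside the one-hour state-machine timeout mentioned above.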
1 parent c50f59a commit 6664e54

7 files changed

Lines changed: 713 additions & 1 deletion


.gitignore

Lines changed: 4 additions & 1 deletion
@@ -67,7 +67,10 @@ packages/producer/src/services/fontData.generated.ts
 # Local proof / test artifacts
 qa-artifacts/
 my-video/
-examples/
+examples/*
+# Tracked OSS examples: negations override the blanket `examples/*` ignore.
+!examples/aws-lambda
+!examples/aws-lambda/**
 packages/studio/data/
 .desloppify/
 .worktrees/

examples/aws-lambda/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# SAM CLI state, written by `sam deploy --guided`; contains user choices.
samconfig.toml
.aws-sam/

examples/aws-lambda/README.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# AWS Lambda + Step Functions deployment

Reference SAM template for deploying HyperFrames distributed rendering on
AWS. One Lambda function, three roles (Plan / RenderChunk / Assemble),
choreographed by a Step Functions standard workflow with a Map state for
parallel chunk rendering.

See [`packages/aws-lambda/README.md`](../../packages/aws-lambda/README.md)
for the Lambda handler architecture.

## Prerequisites

- AWS account with IAM permissions to deploy CloudFormation stacks
  containing Lambda, Step Functions, S3, IAM, and CloudWatch resources.
- [`sam` CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html)
  installed (≥ 1.100).
- [`bun`](https://bun.sh) installed (≥ 1.3) to build the handler ZIP.

## One-shot deploy

```bash
# 1. Build the handler ZIP that `template.yaml`'s CodeUri points at.
bun install                                  # at repo root
bun run --cwd packages/aws-lambda build:zip

# 2. Deploy. First time: `--guided` to set stack name + region.
cd examples/aws-lambda
sam deploy --guided --resolve-s3
```

`--resolve-s3` lets SAM pick (or create) a per-account bucket to host the
uploaded ZIP. After the first deploy, subsequent updates can omit
`--guided` and `--resolve-s3`; SAM remembers your choices in
`samconfig.toml`.

## What gets created

| Resource                                 | Purpose                                                                                            |
| ---------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `Render Lambda`                          | Single function, handler `handler.handler`. Dispatches on `event.Action`.                          |
| `Render State Machine`                   | Step Functions standard workflow. Plan → Map(N) RenderChunk → Assemble.                            |
| `Render Bucket`                          | S3 bucket for plan tarballs, chunk outputs, and final mp4. `renders/` prefix expires after 7 days. |
| IAM role for the state machine           | Invokes the Lambda; writes CloudWatch logs; X-Ray traces.                                          |
| IAM role for the Lambda (managed by SAM) | S3 CRUD on the render bucket; CloudWatch logs.                                                     |
| Runaway-invocation alarm                 | Fires if RenderChunk runs more than `ChunkInvocationAlarmThreshold` times in an hour.              |

## Running a render

Upload your project as a `.tar.gz` archive to the render bucket, then
start a Step Functions execution:

```bash
STACK_NAME=hyperframes-render   # whatever you picked at deploy
RENDER_BUCKET=$(aws cloudformation describe-stacks \
  --stack-name "$STACK_NAME" \
  --query 'Stacks[0].Outputs[?OutputKey==`RenderBucketName`].OutputValue' \
  --output text)
STATE_MACHINE_ARN=$(aws cloudformation describe-stacks \
  --stack-name "$STACK_NAME" \
  --query 'Stacks[0].Outputs[?OutputKey==`RenderStateMachineArn`].OutputValue' \
  --output text)

# Tar + upload the project directory. The handler uses `tar` (not
# `unzip`, which Lambda's base image doesn't ship), so the on-the-wire
# archive format is `.tar.gz`.
tar -czf my-project.tar.gz -C ./my-project .
aws s3 cp my-project.tar.gz "s3://${RENDER_BUCKET}/projects/my-project.tar.gz"

# Start the execution. The input JSON tells the state machine where to
# read inputs and write outputs.
aws stepfunctions start-execution \
  --state-machine-arn "$STATE_MACHINE_ARN" \
  --input "$(cat <<EOF
{
  "ProjectS3Uri": "s3://${RENDER_BUCKET}/projects/my-project.tar.gz",
  "PlanOutputS3Prefix": "s3://${RENDER_BUCKET}/renders/$(date +%s)/",
  "OutputS3Uri": "s3://${RENDER_BUCKET}/output.mp4",
  "Config": {
    "fps": 30,
    "width": 1920,
    "height": 1080,
    "format": "mp4",
    "chunkSize": 240,
    "maxParallelChunks": 8,
    "runtimeCap": "lambda"
  }
}
EOF
)"
```

The Step Functions execution kicks off Plan, fans out RenderChunk via
the Map state, and finally runs Assemble. The final mp4 lands at
`OutputS3Uri`.

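If you would rather assemble the execution input programmatically than through a shell heredoc, something like the following may help. It is a minimal sketch: `build_execution_input` and its parameters are hypothetical names introduced here, and the field names and default config values are copied from the example input above rather than from a documented schema.

```python
import json
import time


def build_execution_input(bucket, project_key, run_id=None, output_key="output.mp4"):
    """Build the state machine input as a plain dict (hypothetical helper)."""
    # Like `$(date +%s)` in the shell example: one renders/ prefix per run.
    run_id = run_id or str(int(time.time()))
    return {
        "ProjectS3Uri": f"s3://{bucket}/{project_key}",
        "PlanOutputS3Prefix": f"s3://{bucket}/renders/{run_id}/",
        "OutputS3Uri": f"s3://{bucket}/{output_key}",
        "Config": {
            "fps": 30,
            "width": 1920,
            "height": 1080,
            "format": "mp4",
            "chunkSize": 240,
            "maxParallelChunks": 8,
            "runtimeCap": "lambda",
        },
    }


if __name__ == "__main__":
    payload = build_execution_input("my-render-bucket", "projects/my-project.tar.gz")
    # The printed JSON can be passed to `aws stepfunctions start-execution --input`.
    print(json.dumps(payload, indent=2))
```

Generating the JSON with a serializer avoids quoting mistakes that are easy to make when interpolating shell variables into a heredoc.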
## Local invocation

You can test the Lambda handler without deploying anything via SAM
local:

```bash
# Build the ZIP first.
bun run --cwd packages/aws-lambda build:zip

# Launch a local Lambda runtime emulator and run a sample plan event.
cd examples/aws-lambda
sam validate
sam local invoke RenderFunction --event sample-events/plan.json
```

The `sample-events/` directory ships small JSON payloads for each of the
three actions. They reference fake S3 URIs, which makes them useful for
sanity-checking the handler's dispatch logic but not for full end-to-end
testing (real S3 calls require credentials and a project archive that
actually exists).

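Before invoking locally, it can be handy to check that a hand-edited event still carries the fields its action expects. The sketch below is not part of the handler: `EXPECTED_KEYS` and `missing_keys` are hypothetical names, the key sets are copied from the three sample payloads in this commit, and whether the real handler treats every one of them as required is an assumption.

```python
# Keys observed in the sample events, grouped by `Action`.
# Assumption: the real handler may accept fewer or more fields than this.
EXPECTED_KEYS = {
    "plan": {"ProjectS3Uri", "PlanOutputS3Prefix", "Config"},
    "renderChunk": {"PlanS3Uri", "PlanHash", "ChunkIndex", "ChunkOutputS3Prefix", "Format"},
    "assemble": {"PlanS3Uri", "ChunkS3Uris", "AudioS3Uri", "OutputS3Uri", "Format"},
}


def missing_keys(event):
    """Return the sorted list of expected keys absent from `event`."""
    action = event.get("Action")
    if action not in EXPECTED_KEYS:
        raise ValueError(f"unknown Action: {action!r}")
    return sorted(EXPECTED_KEYS[action] - set(event))


if __name__ == "__main__":
    event = {"Action": "renderChunk", "PlanS3Uri": "s3://example-bucket/renders/sample/plan.tar.gz"}
    print(missing_keys(event))
```

An empty list means the event has the same shape as the shipped sample for that action; it says nothing about whether the referenced S3 objects exist.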
## Parameters

| Parameter                       | Default       | Notes                                                                                           |
| ------------------------------- | ------------- | ----------------------------------------------------------------------------------------------- |
| `ProjectName`                   | `hyperframes` | Prefix for created resource names.                                                              |
| `LambdaMemoryMb`                | `10240`       | Lambda memory; Lambda allocates CPU proportionally. 10 GB recommended for 1080p.                |
| `LambdaTimeoutSec`              | `900`         | Per-invocation timeout. 15 min is Lambda's hard ceiling.                                        |
| `ReservedConcurrency`           | `-1`          | Hard cap on simultaneous Lambda invocations. `-1` = unreserved. Set to e.g. `50` to bound cost. |
| `ChromeSource`                  | `sparticuz`   | Must match the `--source=` flag passed to `build-zip.ts`.                                       |
| `ChunkInvocationAlarmThreshold` | `1000`        | CloudWatch alarm threshold (RenderChunk invocations per hour).                                  |

## Cleanup

```bash
sam delete --stack-name hyperframes-render
```

S3 buckets are `Retain`ed on delete to protect rendered artifacts.
Empty + delete the bucket manually after `sam delete` if you want to
fully tear down.

## Cost model

| Service                 | Driver                                  | Approximate cost                                               |
| ----------------------- | --------------------------------------- | -------------------------------------------------------------- |
| Lambda                  | Per-invocation billed duration × memory | ≈ $0.0000167/GB-s; a 10 GB function running 5 min costs ~$0.05 |
| Step Functions Standard | Per state transition                    | $0.025/1k transitions                                          |
| S3                      | Storage + GET/PUT                       | Dominated by mp4 storage; plan tarballs expire in 7 days       |
| CloudWatch Logs         | Ingestion + storage                     | Logs are not throttled; set retention manually if cost matters |

A 60-second 1080p30 composition at default chunkSize=240 (8 chunks)
typically costs ~$0.04 in Lambda time + ~$0.001 in Step Functions.
The eval script under `scripts/eval.sh` produces real per-fixture cost
numbers when you run it against your own AWS account.

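The arithmetic behind these figures is simple enough to check by hand. The sketch below uses only the rates quoted in the table; the function names are hypothetical, and it assumes `chunkSize` is measured in frames, which is an inference from the "8 chunks" figure for a 60-second, 30 fps composition.

```python
import math

# Rates as quoted in the cost table; actual pricing varies by region and over time.
LAMBDA_USD_PER_GB_SECOND = 0.0000167
SFN_USD_PER_TRANSITION = 0.025 / 1000


def chunk_count(duration_s, fps, chunk_size):
    """Number of chunks, assuming chunk_size is a frame count."""
    return math.ceil(duration_s * fps / chunk_size)


def lambda_cost_usd(memory_gb, billed_seconds):
    """Lambda compute cost: memory (GB) x billed duration (s) x rate."""
    return memory_gb * billed_seconds * LAMBDA_USD_PER_GB_SECOND


if __name__ == "__main__":
    # 60 s at 30 fps is 1800 frames; 1800 / 240 = 7.5, rounded up to 8 chunks.
    print(chunk_count(60, 30, 240))
    # 10 GB for 5 minutes is 3000 GB-s, roughly $0.05 at the quoted rate.
    print(round(lambda_cost_usd(10, 5 * 60), 4))
```

Request charges, S3, and CloudWatch costs are left out, so treat the result as a lower bound rather than a bill estimate.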
## Troubleshooting

- **"Chrome failed to launch"**: the ZIP was likely built with the wrong
  `--source`. Match `ChromeSource` to the build flag.
- **"PLAN_HASH_MISMATCH"**: non-retryable. The plan tarball was written
  by a different version of the producer than the chunk worker is
  running. Re-plan from scratch.
- **"BROWSER_GPU_NOT_SOFTWARE"**: Chromium fell back to a hardware GL
  backend. Should not happen in Lambda (no GPU); file an issue.
- **CloudWatch alarm firing on `runaway-chunk-invocations`**: check
  the state machine execution history for an unintended Map fan-out, or
  raise the threshold if your workload genuinely exceeds it.

## What's NOT in this directory

- CDK construct shipping the same topology programmatically (follow-up).
- `hyperframes lambda deploy / render / progress / destroy` CLI (follow-up).
- Migration guide (follow-up).
- Lambda RIE local smoke harness mode (follow-up).
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
{
  "Action": "assemble",
  "PlanS3Uri": "s3://example-bucket/renders/sample/plan.tar.gz",
  "ChunkS3Uris": [
    "s3://example-bucket/renders/sample/chunks/0000.mp4",
    "s3://example-bucket/renders/sample/chunks/0001.mp4",
    "s3://example-bucket/renders/sample/chunks/0002.mp4",
    "s3://example-bucket/renders/sample/chunks/0003.mp4"
  ],
  "AudioS3Uri": null,
  "OutputS3Uri": "s3://example-bucket/renders/sample/output.mp4",
  "Format": "mp4"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
  "Action": "plan",
  "ProjectS3Uri": "s3://example-bucket/projects/sample-composition.tar.gz",
  "PlanOutputS3Prefix": "s3://example-bucket/renders/sample/",
  "Config": {
    "fps": 30,
    "width": 1920,
    "height": 1080,
    "format": "mp4",
    "chunkSize": 240,
    "maxParallelChunks": 8,
    "runtimeCap": "lambda"
  }
}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
{
  "Action": "renderChunk",
  "PlanS3Uri": "s3://example-bucket/renders/sample/plan.tar.gz",
  "PlanHash": "0000000000000000000000000000000000000000000000000000000000000000",
  "ChunkIndex": 0,
  "ChunkOutputS3Prefix": "s3://example-bucket/renders/sample/",
  "Format": "mp4"
}
