Skip to content

Commit c8000a1

Browse files
committed
fix(lambda): address PR 878 review feedback
- Verify event.PlanHash against the untarred plan.json at the handler boundary before invoking the producer primitive. Throws typed PLAN_HASH_MISMATCH on divergence so Step Functions routes it as non-retryable; previously the field was schema bloat the handler ignored, leaving enforcement entirely inside the producer. - Standardize on MiB throughout build-zip.ts, verify-zip-size.ts, and the README. Lambda's hard ceiling is 250 MiB (AWS docs label "250 MB" but use binary mebibytes); previously mixed units made the 248 MiB budget look like a ~5 MB margin instead of the 2 MiB it actually is. - stageChromeHeadlessShell now picks Chrome versions via numeric semver comparison instead of lexicographic sort+reverse — the latter would silently pick "99.x" over "131.x" once Chrome cached three-digit majors that aren't width-aligned. - Drop _setSparticuzChromiumForTests from the public index barrel. Test-only DI seam imported directly from ./chromium.js in tests. - Replace require("node:fs") inside walkSize() with the top-level fs imports — file is ESM and the same module is already imported.
1 parent b2bf830 commit c8000a1

7 files changed

Lines changed: 157 additions & 33 deletions

File tree

packages/aws-lambda/README.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -49,10 +49,10 @@ inside Step Functions' history budget (under 200 bytes per chunk).
4949

5050
The package supports two Chromium sources:
5151

52-
| Source | Default | Size | When to pick it |
53-
| ------------------------------- | ------- | ----------------- | --------------------------------------------------------------------------------------------------------------------- |
54-
| `@sparticuz/chromium` | yes | ~70 MB compressed | Lambda. Decompresses into `/tmp` at runtime; the rest of the ecosystem already uses it for headless-Chrome-in-Lambda. |
55-
| Bundled `chrome-headless-shell` | no | ~140 MB | Fallback. Used if `@sparticuz/chromium` ever drops `HeadlessExperimental.beginFrame` support. |
52+
| Source | Default | Size | When to pick it |
53+
| ------------------------------- | ------- | ------------------ | --------------------------------------------------------------------------------------------------------------------- |
54+
| `@sparticuz/chromium` | yes | ~70 MiB compressed | Lambda. Decompresses into `/tmp` at runtime; the rest of the ecosystem already uses it for headless-Chrome-in-Lambda. |
55+
| Bundled `chrome-headless-shell` | no | ~140 MiB | Fallback. Used if `@sparticuz/chromium` ever drops `HeadlessExperimental.beginFrame` support. |
5656

5757
Pick the source at build time:
5858

@@ -98,8 +98,8 @@ designed to extract cleanly into Lambda's `/var/task/`.
9898

9999
`verify:zip-size` enforces:
100100

101-
- Unzipped ≤ 248 MB (in-house budget; Lambda hard ceiling is 250 MB unzipped)
102-
- Zipped ≤ 150 MB (in-house budget; Lambda has no hard zipped cap for S3-deployed functions)
101+
- Unzipped ≤ 248 MiB (in-house budget; Lambda hard ceiling is 250 MiB unzipped — AWS docs label this "250 MB" but use binary mebibytes)
102+
- Zipped ≤ 150 MiB (in-house budget; Lambda has no hard zipped cap for S3-deployed functions)
103103

104104
CI fails the PR if either is exceeded.
105105

packages/aws-lambda/scripts/build-zip.ts

Lines changed: 46 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ import {
3535
cpSync,
3636
existsSync,
3737
mkdirSync,
38+
readdirSync,
3839
readFileSync,
3940
rmSync,
4041
statSync,
@@ -52,26 +53,27 @@ const distDir = join(packageRoot, "dist");
5253

5354
interface BuildOptions {
5455
source: "sparticuz" | "chrome-headless-shell";
55-
/** Hard upper bound on the unzipped bundle size in bytes (Lambda limit is 250 MB). */
56+
/** Hard upper bound on the unzipped bundle size in bytes (Lambda limit is 250 MiB). */
5657
maxUnzippedBytes: number;
5758
/** Hard upper bound on the ZIP file size in bytes. */
5859
maxZippedBytes: number;
5960
}
6061

6162
const DEFAULT_OPTIONS: BuildOptions = {
6263
source: "sparticuz",
63-
// Lambda's hard ceiling for ZIP-deployed functions is 250 MB unzipped.
64-
// We gate at 248 MiB to keep a couple of MiB of headroom — the
65-
// sparticuz Chrome (70 MB) + ffmpeg (80 MB) + ffprobe (62 MB) +
66-
// bundled Node deps put us close to the ceiling. Chrome itself
64+
// Lambda's hard ceiling for ZIP-deployed functions is 250 MiB unzipped
65+
// (AWS docs label it "250 MB" but the 262144000-byte value is 250
66+
// binary mebibytes). We gate at 248 MiB to keep ~2 MiB of headroom —
67+
// the sparticuz Chrome (~70 MiB) + ffmpeg (~80 MiB) + ffprobe (~62
68+
// MiB) + bundled Node deps put us close to the ceiling. Chrome itself
6769
// decompresses into Lambda's `/tmp` at cold start, which has its own
68-
// 10 GB budget, so the unzipped /var/task footprint above is what
69-
// actually competes with Lambda's 250 MB limit.
70+
// 10 GiB budget, so the unzipped /var/task footprint above is what
71+
// actually competes with Lambda's 250 MiB limit.
7072
maxUnzippedBytes: 248 * 1024 * 1024,
71-
// Lambda's only zipped-size cap is for direct console/CLI uploads (50 MB);
72-
// S3-deployed functions are bounded by the unzipped ceiling. We gate at
73-
// 150 MiB to flag a sudden bundle-size regression without false-failing
74-
// on the natural ~100 MiB sparticuz + ffmpeg payload.
73+
// Lambda's only zipped-size cap is for direct console/CLI uploads (50
74+
// MiB); S3-deployed functions are bounded by the unzipped ceiling. We
75+
// gate at 150 MiB to flag a sudden bundle-size regression without
76+
// false-failing on the natural ~100 MiB sparticuz + ffmpeg payload.
7577
maxZippedBytes: 150 * 1024 * 1024,
7678
};
7779

@@ -142,7 +144,7 @@ async function main(): Promise<void> {
142144
throw new Error(
143145
`[build-zip] unzipped bundle ${formatBytes(unzippedBytes)} exceeds limit ${formatBytes(
144146
opts.maxUnzippedBytes,
145-
)} (Lambda ZIP ceiling: 250 MB unzipped). ` +
147+
)} (Lambda ZIP ceiling: 250 MiB unzipped). ` +
146148
`Switch --source to the lighter option, or move Chrome to a Lambda Layer.`,
147149
);
148150
}
@@ -221,14 +223,14 @@ async function bundleHandler(stagingDir: string): Promise<void> {
221223
"puppeteer-core",
222224
"puppeteer",
223225
// AWS SDK v3 is pre-installed in the Lambda Node 22 runtime; mark
224-
// external so we don't double-bundle 3+ MB of SDK.
226+
// external so we don't double-bundle 3+ MiB of SDK.
225227
"@aws-sdk/client-s3",
226228
],
227229
plugins: [workspaceAliasPlugin],
228230
minify: false,
229-
// sourcemap=false: the ZIP is tight on Lambda's 250 MB unzipped cap
230-
// (Chrome 70 MB + ffmpeg 80 MB + ffprobe 62 MB + Node deps). A 4-5 MB
231-
// sourcemap puts us over. Re-enable for local debugging by passing
231+
// sourcemap=false: the ZIP is tight on Lambda's 250 MiB unzipped cap
232+
// (Chrome ~70 MiB + ffmpeg ~80 MiB + ffprobe ~62 MiB + Node deps). A
233+
// 4-5 MiB sourcemap puts us over. Re-enable for local debugging by passing
232234
// --sourcemap; the bundle's stack traces stay readable enough without
233235
// it because we don't minify.
234236
sourcemap: false,
@@ -399,8 +401,11 @@ function stageChromeHeadlessShell(stagingDir: string): void {
399401
`before --source=chrome-headless-shell.`,
400402
);
401403
}
402-
const { readdirSync } = require("node:fs") as typeof import("node:fs");
403-
const versions = readdirSync(baseDir).sort().reverse();
404+
// Sort by numeric semver descending. `sort().reverse()` is lexicographic,
405+
// which silently picks "99.0.0" over "131.0.0" once Chrome ships
406+
// three-digit majors that aren't strictly width-aligned. `compareSemver`
407+
// returns negative/zero/positive on (a, b), so descending = `b - a`.
408+
const versions = readdirSync(baseDir).sort((a, b) => compareSemver(b, a));
404409
for (const v of versions) {
405410
const candidate = join(baseDir, v, "chrome-headless-shell-linux64", "chrome-headless-shell");
406411
if (existsSync(candidate)) {
@@ -415,6 +420,28 @@ function stageChromeHeadlessShell(stagingDir: string): void {
415420
throw new Error(`[build-zip] no linux64 chrome-headless-shell binary found under ${baseDir}.`);
416421
}
417422

423+
/**
424+
* Compare two semver-shaped strings like "131.0.6778.108". Treats any
425+
* non-numeric directory name as `-Infinity` so it sorts to the bottom
426+
* (Puppeteer's cache layout sometimes includes `latest` or branch tags).
427+
* Used by `stageChromeHeadlessShell` to pick the newest cached Chrome
428+
* without tripping on the lexicographic "99 > 131" trap.
429+
*/
430+
function compareSemver(a: string, b: string): number {
431+
const partsA = a.split(".").map((s) => Number.parseInt(s, 10));
432+
const partsB = b.split(".").map((s) => Number.parseInt(s, 10));
433+
const len = Math.max(partsA.length, partsB.length);
434+
for (let i = 0; i < len; i++) {
435+
const ai = partsA[i] ?? 0;
436+
const bi = partsB[i] ?? 0;
437+
if (Number.isNaN(ai) && Number.isNaN(bi)) continue;
438+
if (Number.isNaN(ai)) return -1;
439+
if (Number.isNaN(bi)) return 1;
440+
if (ai !== bi) return ai - bi;
441+
}
442+
return 0;
443+
}
444+
418445
function zipDirectory(sourceDir: string, zipPath: string): void {
419446
const result = spawnSync("zip", ["-rq", zipPath, "."], { cwd: sourceDir, stdio: "inherit" });
420447
if (result.status !== 0) {
@@ -437,12 +464,11 @@ function directorySizeBytes(dir: string): number {
437464
}
438465

439466
function walkSize(dir: string): number {
440-
const { readdirSync, statSync: stat } = require("node:fs") as typeof import("node:fs");
441467
let total = 0;
442468
for (const entry of readdirSync(dir, { withFileTypes: true })) {
443469
const full = join(dir, entry.name);
444470
if (entry.isDirectory()) total += walkSize(full);
445-
else if (entry.isFile()) total += stat(full).size;
471+
else if (entry.isFile()) total += statSync(full).size;
446472
}
447473
return total;
448474
}

packages/aws-lambda/scripts/verify-zip-size.ts

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,8 @@
55
* Reads `dist/handler.zip.manifest.json` (written by `build-zip.ts`) and
66
* exits non-zero if either the unzipped or zipped size exceeds the
77
* declared limits. Lambda's hard ceiling for ZIP-deployed functions is
8-
* 250 MB unzipped; the in-house budget is 240 MB to keep headroom for
8+
* 250 MiB unzipped (262144000 bytes — AWS docs label it "250 MB" but use
9+
* binary mebibytes); the in-house budget is 248 MiB to keep headroom for
910
* the Chrome tarball decompression that happens at cold start.
1011
*/
1112

@@ -55,15 +56,15 @@ function main(): void {
5556
if (manifest.unzippedBytes > IN_HOUSE_UNZIPPED_LIMIT) {
5657
console.error(
5758
`[verify-zip-size] FAIL unzipped: ${formatBytes(manifest.unzippedBytes)} > ` +
58-
`${formatBytes(IN_HOUSE_UNZIPPED_LIMIT)} (in-house limit; Lambda hard ceiling is 250 MB).`,
59+
`${formatBytes(IN_HOUSE_UNZIPPED_LIMIT)} (in-house limit; Lambda hard ceiling is 250 MiB).`,
5960
);
6061
failed = true;
6162
}
6263
if (actualZipped > IN_HOUSE_ZIPPED_LIMIT) {
6364
console.error(
6465
`[verify-zip-size] FAIL zipped: ${formatBytes(actualZipped)} > ` +
65-
`${formatBytes(IN_HOUSE_ZIPPED_LIMIT)} (in-house limit; Lambda direct-upload ceiling is 50 MB, ` +
66-
`S3-deploy ceiling is 250 MB).`,
66+
`${formatBytes(IN_HOUSE_ZIPPED_LIMIT)} (in-house limit; Lambda direct-upload ceiling is 50 MiB, ` +
67+
`S3-deploy ceiling is 250 MiB).`,
6768
);
6869
failed = true;
6970
}

packages/aws-lambda/src/events.ts

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,13 @@ export interface RenderChunkEvent {
5252
Action: "renderChunk";
5353
/** S3 URI of the plan tar produced by a PlanEvent invocation. */
5454
PlanS3Uri: string;
55-
/** `PlanResult.planHash` — checked against the planDir's `plan.json` post-download. */
55+
/**
56+
* `PlanResult.planHash` from the Plan invocation. The handler verifies
57+
* this against the untarred planDir's `plan.json` before invoking the
58+
* producer, throwing a typed `PLAN_HASH_MISMATCH` on divergence so the
59+
* state machine routes it as non-retryable. Defense-in-depth — the
60+
* producer also re-checks internally.
61+
*/
5662
PlanHash: string;
5763
/** 0-based chunk index this invocation should render. */
5864
ChunkIndex: number;

packages/aws-lambda/src/handler.test.ts

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -272,6 +272,56 @@ describe("handler dispatch", () => {
272272
expect(renderChunkMock).toHaveBeenCalledTimes(1);
273273
});
274274

275+
it("rejects renderChunk when event.PlanHash diverges from plan.json", async () => {
276+
const tmpRoot = makeTmpRoot();
277+
const s3 = new FakeS3Client();
278+
// The fixture's plan.json has planHash="fakehash"; the event below
279+
// claims something else, so the handler must throw PLAN_HASH_MISMATCH
280+
// before invoking the primitive.
281+
s3.objects.set("s3://bucket/plan.tar.gz", await makeMinimalPlanTar());
282+
283+
const renderChunkMock = mock(async () => {
284+
throw new Error("primitive should not be called on a hash mismatch");
285+
});
286+
const planMock = mock(async () => {
287+
throw new Error("should not be called");
288+
});
289+
const assembleMock = mock(async () => {
290+
throw new Error("should not be called");
291+
});
292+
293+
const event: RenderChunkEvent = {
294+
Action: "renderChunk",
295+
PlanS3Uri: "s3://bucket/plan.tar.gz",
296+
PlanHash: "not-the-real-hash",
297+
ChunkIndex: 0,
298+
ChunkOutputS3Prefix: "s3://bucket/renders/abc/",
299+
Format: "mp4",
300+
};
301+
302+
let caught: unknown;
303+
try {
304+
await handler(event, {
305+
s3: s3 as unknown as import("@aws-sdk/client-s3").S3Client,
306+
primitives: {
307+
plan: planMock as unknown as typeof import("@hyperframes/producer/distributed").plan,
308+
renderChunk:
309+
renderChunkMock as unknown as typeof import("@hyperframes/producer/distributed").renderChunk,
310+
assemble:
311+
assembleMock as unknown as typeof import("@hyperframes/producer/distributed").assemble,
312+
},
313+
tmpRoot,
314+
skipChromeResolution: true,
315+
});
316+
} catch (err) {
317+
caught = err;
318+
}
319+
expect(caught).toBeInstanceOf(Error);
320+
expect((caught as Error).name).toBe("PLAN_HASH_MISMATCH");
321+
expect((caught as Error).message).toMatch(/not-the-real-hash/);
322+
expect(renderChunkMock).not.toHaveBeenCalled();
323+
});
324+
275325
it("routes Action='assemble' to the assemble primitive", async () => {
276326
const tmpRoot = makeTmpRoot();
277327
const s3 = new FakeS3Client();

packages/aws-lambda/src/handler.ts

Lines changed: 40 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
* primitive → S3 upload → return small JSON result.
1212
*/
1313

14-
import { existsSync, mkdirSync, mkdtempSync, rmSync, statSync } from "node:fs";
14+
import { existsSync, mkdirSync, mkdtempSync, readFileSync, rmSync, statSync } from "node:fs";
1515
import { tmpdir } from "node:os";
1616
import { basename, join } from "node:path";
1717
import { S3Client } from "@aws-sdk/client-s3";
@@ -315,6 +315,14 @@ async function handleRenderChunk(
315315
await downloadS3ObjectToFile(s3, event.PlanS3Uri, planTar);
316316
await untarDirectory(planTar, planDir);
317317

318+
// Verify the plan's hash matches what Step Functions told us to render.
319+
// The producer's renderChunk re-checks internally (defense-in-depth),
320+
// but doing it here at the handler boundary lets us fail before paying
321+
// the Chrome-launch + render cost on a misrouted chunk. Throws a
322+
// typed PLAN_HASH_MISMATCH that Step Functions can route as
323+
// non-retryable.
324+
verifyPlanHash(planDir, event.PlanHash);
325+
318326
const chunkOutputBase = join(
319327
work,
320328
event.Format === "png-sequence"
@@ -489,3 +497,34 @@ function cleanupDir(dir: string): void {
489497
// Best-effort — leak is preferable to crashing on success path.
490498
}
491499
}
500+
501+
/**
502+
* Read the untarred planDir's `plan.json` and assert its `planHash`
503+
* matches what the Step Functions event claims. Throws on mismatch with
504+
* a typed `PLAN_HASH_MISMATCH` error name so the state machine's typed
505+
* non-retryable list routes it correctly.
506+
*
507+
* This is defense-in-depth — the producer's `renderChunk` does the same
508+
* check internally — but performing it here lets us fail before paying
509+
* the Chrome-launch + per-frame capture cost on a misrouted chunk.
510+
*/
511+
function verifyPlanHash(planDir: string, expected: string): void {
512+
const planJsonPath = join(planDir, "plan.json");
513+
let parsed: { planHash?: unknown };
514+
try {
515+
parsed = JSON.parse(readFileSync(planJsonPath, "utf-8")) as { planHash?: unknown };
516+
} catch (err) {
517+
const msg = err instanceof Error ? err.message : String(err);
518+
const error = new Error(`PLAN_HASH_MISMATCH: failed to read ${planJsonPath}: ${msg}`);
519+
error.name = "PLAN_HASH_MISMATCH";
520+
throw error;
521+
}
522+
const actual = parsed.planHash;
523+
if (typeof actual !== "string" || actual !== expected) {
524+
const error = new Error(
525+
`PLAN_HASH_MISMATCH: event PlanHash=${expected} did not match plan.json planHash=${String(actual)}`,
526+
);
527+
error.name = "PLAN_HASH_MISMATCH";
528+
throw error;
529+
}
530+
}

packages/aws-lambda/src/index.ts

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,8 +25,10 @@ export {
2525
type RenderChunkLambdaResult,
2626
type SerializableDistributedRenderConfig,
2727
} from "./events.js";
28+
// `_setSparticuzChromiumForTests` is intentionally NOT re-exported from
29+
// the package barrel — it's a test-only DI seam. Test files import it
30+
// directly from `./chromium.js`.
2831
export {
29-
_setSparticuzChromiumForTests,
3032
type ChromeSource,
3133
resolveChromeArgs,
3234
resolveChromeExecutablePath,

0 commit comments

Comments
 (0)