Commit fdf8046

robertsLando and claude authored
feat(sea): add per-file compression to SEA archive (closes #250) (#251)
* feat(sea): add per-file compression to SEA archive (--compress Brotli/GZip/Zstd)

  Extends the existing --compress flag to enhanced SEA mode, matching what Standard mode has had for years. Each file in the SEA archive is compressed independently with gzip / brotli / zstd and decompressed lazily at first fs.readFileSync() / require(), so the cold-start cost is proportional to the files actually read, not the full archive.

  Measured on claude-code@1.0.100 (node22-linux-x64): 194 MB → 152 MB with --compress Zstd (41 MB saved, no measurable startup regression), and 194 MB → 147 MB with --compress Brotli (~3 min build). Closes most of the size gap between SEA-mode binaries and competitors like Bun.

  - lib/compress_type.ts: add Zstd = 3
  - lib/index.ts: accept "Zstd"/"zs" at --compress; refuse --compress for simple SEA mode (no walker → nothing to compress)
  - lib/producer.ts: wire Zstd compressor into Standard-mode producer too, so the flag is consistent across modes
  - lib/sea-assets.ts: compress each entry during archive write; record manifest.compression = numeric CompressType; keep stats[key].size as the uncompressed length so fs.statSync() reports the real file size
  - lib/sea.ts, lib/types.ts: thread doCompress through seaEnhanced()
  - prelude/bootstrap.js: add Zstd branch to payloadFile/payloadFileSync
  - prelude/sea-vfs-setup.js: pick a decompressor once at SEAProvider construction; decompress on first read, cache the result in _fileCache
  - test/test-93-sea-compress: build the same fixture with None/GZip/Brotli/Zstd (Zstd gated on zlib.zstdCompressSync availability) and assert every packaged binary prints identical output
  - docs: update compression.md, sea-mode.md, sea-vs-standard.md, ARCHITECTURE.md, and vs-bun-deno.md with the new feature and the re-measured claude-code numbers

  Closes #250

* docs(vs-bun-deno): note trimmed-Node build as a path to further shrink SEA binaries

  Binary-size gap to Bun isn't all archive: ~30 MB of the remaining delta is full-ICU in the stock Node binary pkg-fetch ships. Spell out that ./configure --without-intl --without-inspector --without-npm --without-corepack --fully-static (a pkg-fetch concern, not a pkg one) would close most of what's left.

* docs(vs-bun-deno): update startup times with fresh first-run measurements

  Re-ran all four pkg --sea variants on the same host with consistent methodology (first-run, cold ~/.cache/pkg, /usr/bin/time -f %e for ./binary --version). Bun/Deno rows are unchanged from the morning run.

  - None: 979 → 610 ms
  - GZip: 590 ms (new)
  - Zstd: 560 ms (new)
  - Brotli: 590 ms (new)

  Compression adds ≤0 ms vs uncompressed on this workload because claude-code's --version path only touches a handful of files, so the sync zlib/zstd decode cost is dwarfed by the startup savings from the smaller archive being memory-mapped.

* docs(vs-bun-deno): re-measure all three runtimes side by side

  Ran pkg --sea (4 codecs), bun --compile, bun --compile --bytecode, and deno compile on the same host with matching methodology (fresh fixture, cold ~/.cache/pkg, /usr/bin/time -f %e for ./bin --version first run):

      Bun                  510 ms  (108 MB)
      Bun --bytecode       530 ms  (190 MB)
      pkg --sea            560 ms  (194 MB)
      pkg --sea --zstd     570 ms  (152 MB)
      pkg --sea --gzip     580 ms  (154 MB)
      pkg --sea --brotli   590 ms  (147 MB)
      Deno                 740 ms  (183 MB)

  The previous numbers (797 Bun / 1256 Deno / 979 pkg) were measured on a different run/method, not apples-to-apples. These six are. Bun is still fastest and smallest; pkg SEA with compression is within ~60 ms of Bun while shipping stock Node.js; Deno is the slowest starter on this workload. Narrative paragraphs updated to match.

* refactor(sea): harden compression paths, unify codec picker, restore streaming

  Security / correctness:
  - prelude/sea-vfs-setup.js: cap per-file decompression via maxOutputLength and assert decompressed length matches manifest stats.size; use Number.isInteger for offset/length/size bounds (rejects NaN and non-integer floats that the prior typeof-number guard let through).
  - lib/sea-assets.ts: synthesize a stats entry for records that had STORE_CONTENT but no STORE_STAT, so every compressed stripe has an authoritative size for the runtime to cross-check against. Make resolveCompressor exhaustive: a new CompressType without a matching case now fails the build instead of shipping an archive that claims compression but contains raw bytes.

  Performance:
  - lib/sea-assets.ts: restore createReadStream path for unmodified disk-resident files; the prior always-readFileAsync forced peak RSS to grow with total asset size even when compression was disabled.
  - Resolve the decompressor/compressor exactly once per path: at module load in prelude/bootstrap.js, at SEAProvider construction in sea-vfs-setup.js, before the stripe loop in sea-assets.ts, and before Multistream in producer.ts. Fails fast when the runtime is missing a Zstd API instead of mid-stripe. Skips _fileCache entirely for uncompressed archives so archive subarrays aren't pinned unnecessarily.

  DRY / surface:
  - prelude/bootstrap-shared.js: single source of truth for COMPRESS_* constants, pickDecompressorSync/Async, and a context-aware zstdMissingError (build-host vs end-user remediation). Classical bootstrap and SEA VFS both consume it; the local zlib require in bootstrap.js is gone.
  - lib/compress_type.ts: getZstdCompressSync / getZstdCompressStream replace the duplicated 'zlib as unknown as { ... }' casts in producer.ts and sea-assets.ts and emit a single build-error string (now also includes process.version).
  - lib/help.ts: add Zstd to the --compress description and examples.
  - lib/index.ts: the 'invalid compression algorithm' error now lists the real accepted tokens (None/none, Brotli/br, GZip/gz/gzip, Zstd/zs/zstd); the compression banner goes through log.info instead of console.log.

  Tests:
  - test/test-93-sea-compress: assert each compressed binary is at least 50 KB smaller than the None build so a silent fallback to uncompressed fails the test (the prior byte-equality check couldn't detect that regression).
  - test/test-80-compression: cover --compress Zstd in the classical pipeline (lib/producer.ts and prelude/bootstrap.js zstd branches) when zlib.createZstdCompress is available on the build host.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(sea): prune dead code and optimize VFS hot paths

  Dead code:
  - Drop SeaAssetsResult.entryIsESM: seaEnhanced destructures only { assets, manifestPath } and the value is read via manifest.entryIsESM at runtime, so the return-shape field was carrying a stale copy.
  - Drop the 'syscall' parameter from SEAProvider._resolveSymlink: all five callers pass only the path, and ELOOP is rare enough that hardcoding err.syscall = 'stat' is fine.
  - Drop the 'context' parameter from pickDecompressorSync/Async and merge zstdMissingError into a single runtime-wording string: only 'runtime' was ever passed (build-side Zstd errors go through lib/compress_type.ts's own zstdBuildError).
  - Drop unused COMPRESS_GZIP/BROTLI/ZSTD exports from bootstrap-shared; callers now go through pickDecompressor and only COMPRESS_NONE is read directly by sea-vfs-setup.
  - Remove the redundant process.argv[1] = entrypoint assignment in sea-bootstrap.js; sea-bootstrap-core.js already sets it to the same value.
  - Inline the single-use ZSTD_MISSING_BUILD_REMEDIATION constant.

  Hot paths (~30K lookups per startup on large projects):
  - toManifestKey: skip the backslash→slash regex on POSIX hosts where paths already match the manifest shape; keep the replace on win32 where it's mandatory.
  - _resolveSymlink: short-circuit before entering the MAX_SYMLINK_DEPTH loop when the path isn't a symlink key (the common case).

  Comments:
  - sea-assets.ts: rename the Zstd-resolution rationale to point at zstdBuildError, which is where the wording now lives.
  - bootstrap-shared.js: tighten the COMPRESS_NONE comment now that only it is exported.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(sea-compress): fix Windows CI failure from CRLF in payload.txt

  Root cause: payload.txt starts with 0x0a; on a Windows checkout git's autocrlf converted it to 0x0d 0x0a, so PAYLOAD.slice(0, 32) contained a leading \r\n that survived in `expected` but got stripped from `actual` via the existing replace(/\r\n/g, '\n'), causing the equality assertion to fail across every Windows job.

  Fix:
  - Add .gitattributes so payload.txt is checked out LF on every platform; the SEA archive bytes are now deterministic cross-platform, which also keeps the compressed-size assertion stable.
  - Normalize CRLF in `expected` as defense-in-depth so an existing Windows clone (cloned before .gitattributes landed) still passes the test.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sea): drop redundant post-decompress size assert

  Per review feedback on PR #251: an attacker who can rewrite the SEA blob can also rewrite `manifest.stats[p].size` to match the payload they ship, so the post-decompression `buf.length === expected` check does not survive a consistent tamper. It only fires on accidental corruption, which is a narrow and unlikely case.

  Keep `maxOutputLength`: it bounds the zlib allocation up front so a blob with a plausible-but-inflated manifest can't request unbounded memory before we discover the size mismatch. That bound is cheap and standard Node zlib hygiene.

  Also keep the `stats.size` validation: `maxOutputLength` requires a finite integer, so NaN / negative / missing values must still be rejected before reaching zlib.

  Tightened the comment to reflect the actual threat model (bounded allocation vs. tamper detection) instead of the earlier bomb-defense framing.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
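
The runtime read path described in the message above (resolve the decompressor once, decompress on first read, cache the result, bound the allocation with `maxOutputLength`) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `prelude/sea-vfs-setup.js`: the `LazyArchive` class, the manifest shape (`offsets`, `stats`) and the codec ids 0 to 2 are hypothetical. Only `Zstd = 3`, the `Number.isInteger` checks, the per-file cache and the `maxOutputLength` bound are taken from the message.

```javascript
const zlib = require('node:zlib');

// Codec ids. Zstd = 3 is stated in the commit message; 0-2 are assumed for this sketch.
const COMPRESS = { NONE: 0, GZIP: 1, BROTLI: 2, ZSTD: 3 };

// Resolve the decompressor once so a runtime without Zstd fails fast,
// not in the middle of a read.
function pickDecompressorSync(compression) {
  switch (compression) {
    case COMPRESS.NONE:
      return null;
    case COMPRESS.GZIP:
      return zlib.gunzipSync;
    case COMPRESS.BROTLI:
      return zlib.brotliDecompressSync;
    case COMPRESS.ZSTD:
      if (typeof zlib.zstdDecompressSync !== 'function') {
        throw new Error(`Zstd is not available in Node.js ${process.version}`);
      }
      return zlib.zstdDecompressSync;
    default:
      throw new Error(`Unknown compression id: ${compression}`);
  }
}

// Rejects NaN, floats, negatives and non-numbers in one check.
function isNonNegativeInt(n) {
  return Number.isInteger(n) && n >= 0;
}

// Hypothetical reader over a blob of back-to-back entries.
// manifest: { compression, offsets: { key: [offset, length] }, stats: { key: { size } } }
class LazyArchive {
  constructor(blob, manifest) {
    this.blob = blob;
    this.manifest = manifest;
    this.decompress = pickDecompressorSync(manifest.compression);
    // Uncompressed archives skip the cache and hand out subarrays directly.
    this.cache = this.decompress === null ? null : new Map();
  }

  readFileSync(key) {
    if (this.cache !== null && this.cache.has(key)) return this.cache.get(key);

    const entry = this.manifest.offsets[key];
    if (!Array.isArray(entry)) throw new Error(`ENOENT: ${key}`);
    const [offset, length] = entry;
    if (!isNonNegativeInt(offset) || !isNonNegativeInt(length) ||
        offset + length > this.blob.length) {
      throw new Error(`Corrupt manifest entry for ${key}`);
    }
    const raw = this.blob.subarray(offset, offset + length);
    if (this.decompress === null) return raw;

    const stat = this.manifest.stats[key];
    const size = stat ? stat.size : undefined;
    if (!isNonNegativeInt(size)) throw new Error(`Missing or invalid size for ${key}`);

    // Bound the allocation up front. zlib requires maxOutputLength >= 1,
    // so an empty file still gets a bound of 1.
    const out = this.decompress(raw, { maxOutputLength: Math.max(size, 1) });
    this.cache.set(key, out);
    return out;
  }
}

// Usage: a one-file gzip archive built in memory.
const source = Buffer.from('module.exports = 42;\n'.repeat(100));
const packed = zlib.gzipSync(source);
const archive = new LazyArchive(packed, {
  compression: COMPRESS.GZIP,
  offsets: { '/snapshot/app/index.js': [0, packed.length] },
  stats: { '/snapshot/app/index.js': { size: source.length } },
});
console.log(archive.readFileSync('/snapshot/app/index.js').length === source.length);
```

Note that the bound only limits how much memory a read can request; as the last commit in the message points out, it is not tamper detection, because whoever rewrites the blob can rewrite the manifest sizes too.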
1 parent 0cf75a8 commit fdf8046

22 files changed

Lines changed: 568 additions & 117 deletions

docs-site/guide/compression.md

Lines changed: 29 additions & 16 deletions
@@ -1,26 +1,30 @@
 ---
 title: Compression
-description: Shrink the embedded filesystem inside your pkg binary with Brotli or GZip.
+description: Shrink the embedded filesystem inside your pkg binary with Brotli, GZip or Zstd.
 ---
 
 # Compression
 
-Pass `--compress Brotli` or `--compress GZip` to compress the contents of files stored in the executable. `-C` is a shortcut for `--compress`.
+Pass `--compress Brotli`, `--compress GZip`, or `--compress Zstd` to compress the contents of files stored in the executable. `-C` is a shortcut for `--compress`.
 
 ::: code-group
 
-```sh [Brotli (smaller)]
+```sh [Zstd (best balance)]
+pkg --compress Zstd index.js
+```
+
+```sh [Brotli (smallest)]
 pkg --compress Brotli index.js
 ```
 
-```sh [GZip (faster to decompress)]
+```sh [GZip (widely compatible)]
 pkg -C GZip index.js
 ```
 
 ```json [package.json]
 {
   "pkg": {
-    "compress": "Brotli"
+    "compress": "Zstd"
   }
 }
 ```
@@ -29,24 +33,33 @@ pkg -C GZip index.js
 
 ## How much does it save?
 
-This option can reduce the size of the embedded filesystem by up to **60%**. The exact ratio depends on your project — heavy JavaScript (libraries with long variable names) compresses well, already-minified code less so.
+This option can reduce the size of the embedded filesystem by **60-70%** on typical Node.js projects. The exact ratio depends on your project — heavy JavaScript (libraries with long variable names) compresses well, already-minified code less so.
 
-The startup time of the application may actually be **slightly reduced** — smaller disk reads often outweigh the decompression cost.
+The startup time of the application may actually be **slightly reduced** — smaller disk reads often outweigh the decompression cost, especially with Zstd.
 
-## Brotli vs GZip
+## Choosing an algorithm
 
-| Algorithm | Compression ratio | Decompression speed | Use when |
-| --------- | ----------------- | ------------------- | ------------------------------- |
-| Brotli | Higher | Slower | Binary size matters most |
-| GZip | Lower | Faster | Cold-start latency matters most |
+| Algorithm | Compression ratio | Decompression speed | Use when |
+| --------- | ----------------- | ------------------- | --------------------------------------------------- |
+| Brotli | Highest | Slowest | Binary size is the only thing that matters |
+| Zstd | High | Very fast | Balanced default — small binary and fast cold start |
+| GZip | Lower | Fast | Older Node.js runtimes without Zstd support |
 
-For most CLI tools, Brotli is the better default. For long-running services where the extra MB or two doesn't matter, GZip shaves a few ms off startup.
+For most projects, **Zstd** is the best default — near-Brotli ratios with GZip-class decompression speed.
+
+::: info Zstd availability
+Zstd uses `node:zlib`'s `zstdCompress` / `zstdDecompress`, which were added in **Node.js 22.15.0**. The build host and the packaged Node runtime must both support it. Use Brotli if you need to target older Node 22.x releases.
+:::
 
 ## SEA mode
 
-::: warning Not supported in SEA mode
-Compression is **not** available when packaging with `--sea`. The SEA binary layout uses a flat blob without per-file compression. If binary size is critical, stick with Standard mode. See [SEA vs Standard](/guide/sea-vs-standard).
-:::
+Compression also works with `--sea`. The SEA archive is compressed per-file at build time and decompressed lazily on first `fs.readFileSync` / `require()` at runtime, so only files you actually access pay the decompression cost.
+
+```sh
+pkg --sea --compress Zstd index.js
+```
+
+This closes most of the size gap between SEA-mode and Standard-mode binaries without a measurable cold-start regression for typical CLIs.
 
 ## Troubleshooting
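
The "Zstd availability" note in the compression.md diff above can be turned into a small pre-build check. The sketch below is an assumption-laden illustration, not part of pkg: `hasZstd` and `chooseCompressFlag` are hypothetical helper names, and the check only covers the Node.js process that runs it (the build host), not the packaged runtime, which has to be verified separately.

```javascript
const zlib = require('node:zlib');

// True when this Node.js process exposes the Zstd bindings in node:zlib.
function hasZstd() {
  return typeof zlib.zstdCompressSync === 'function' &&
    typeof zlib.zstdDecompressSync === 'function';
}

// Hypothetical helper: pick a --compress value for a build script,
// falling back to Brotli when Zstd is missing, as the docs note suggests.
function chooseCompressFlag() {
  return hasZstd() ? 'Zstd' : 'Brotli';
}

console.log(`Node.js ${process.version}: pkg --compress ${chooseCompressFlag()} index.js`);
```

Feature detection is used here instead of parsing `process.version`, which is the same approach the commit message describes for gating the Zstd test on `zlib.zstdCompressSync` availability.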

docs-site/guide/sea-mode.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Not supported in enhanced SEA mode (incompatible with the VFS bootstrap). Set it
 
 ## Trade-offs vs Standard mode
 
-Enhanced SEA builds faster and uses **official Node.js APIs**, but stores source in plaintext and skips compression. Workers, native addons, ESM, cross-compile and targets all work the same.
+Enhanced SEA builds faster and uses **official Node.js APIs**. Per-file compression (`--compress Brotli` / `GZip` / `Zstd`) is supported and closes most of the size gap with Standard mode. Workers, native addons, ESM, cross-compile and targets all work the same.
 
 For the full feature matrix and decision guide, see **[SEA vs Standard](/guide/sea-vs-standard)**.

docs-site/guide/sea-vs-standard.md

Lines changed: 14 additions & 15 deletions
@@ -13,7 +13,7 @@ description: The full comparison between Standard mode (patched Node.js, bytecod
 **SEA mode** runs on **stock, unmodified Node.js**. No patches. No waiting for `pkg-fetch` to catch up. Security fixes and new Node versions are available the moment Node.js itself releases them.
 :::
 
-Everything else — compression, bytecode, worker threads, native addons — flows from that one decision.
+Everything else — bytecode, worker threads, native addons, bundling strategy — flows from that one decision.
 
 ## Why stock binaries matter
 
@@ -25,26 +25,25 @@ Everything else — compression, bytecode, worker threads, native addons — flo
 
 ## Feature matrix
 
-| Feature | **Standard** | **Enhanced SEA** |
-| ------------------------------- | ---------------------------------------------------------------------- | ------------------------ |
-| **Node.js binary** | Custom patched (`pkg-fetch`) | **Stock Node.js** |
-| Source protection (V8 bytecode) | ✅ | ❌ plaintext |
-| Compression (Brotli / GZip) | ✅ | ❌ |
-| Build speed | Slower | **Faster** |
-| Cross-compile | ⚠️ broken on Node 22 ([see](/guide/targets#cross-compilation-support)) | ✅ |
-| Worker threads | ✅ | ✅ |
-| Native addons | ✅ | ✅ |
-| ESM + top-level await | Partial | ✅ every target |
-| Maintenance burden | High — patch each Node release | **Low — stock binaries** |
-| Security updates | Wait for `pkg-fetch` rebuild | **Immediate** |
-| Future path | Tied to `pkg-fetch` | Migrates to `node:vfs` |
+| Feature | **Standard** | **Enhanced SEA** |
+| ---------------------------------- | ---------------------------------------------------------------------- | ------------------------ |
+| **Node.js binary** | Custom patched (`pkg-fetch`) | **Stock Node.js** |
+| Source protection (V8 bytecode) | ✅ | ❌ plaintext |
+| Compression (Brotli / GZip / Zstd) | ✅ | ✅ |
+| Build speed | Slower | **Faster** |
+| Cross-compile | ⚠️ broken on Node 22 ([see](/guide/targets#cross-compilation-support)) | ✅ |
+| Worker threads | ✅ | ✅ |
+| Native addons | ✅ | ✅ |
+| ESM + top-level await | Partial | ✅ every target |
+| Maintenance burden | High — patch each Node release | **Low — stock binaries** |
+| Security updates | Wait for `pkg-fetch` rebuild | **Immediate** |
+| Future path | Tied to `pkg-fetch` | Migrates to `node:vfs` |
 
 ## When to pick which
 
 Pick **Standard** when:
 
 - You need **source protection** — your IP must not ship as plaintext JavaScript.
-- You need **compression** — binary size matters more than build speed.
 
 Pick **SEA** when:
