feature: Compression for memfile/rootfs assets #2034
Conversation
…ning
- Use header.HugepageSize for uncompressed fetch alignment (semantically correct)
- Stream NFS cache hits directly into ReadFrame instead of buffering in memory
- Fix timer placement to cover the full GetFrame (read + decompression)
- Fix onRead callback: nil for compressed inner calls (prevents double-invoke), pass through for uncompressed (bytes are final)
- Remove panic recovery from runFetch (never in main)
- Remove low-value chunker tests subsumed by ConcurrentStress
- Remove 4MB frame configs from benchmarks (targeting 2MB only)
- Remove unused readCacheFile function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ble NFS cache
- Remove dead flagsClient chain through chunker/build/template layers (~15 files)
- Delete ChunkerConfigFlag (unused after flagsClient removal)
- Delete mock_flagsclient_test.go
- Simplify GetUploadOptions: remove redundant intOr/strOr fallbacks (flags have defaults)
- Add GetCompressionType helper to frame_table.go, deduplicate compression type extraction
- Replace [16]byte{} with uuid.Nil and "rootfs.ext4" with storage.RootfsName in inspect-build
- Simplify UploadV4Header return pattern
- Remove onRead callback from legacy fullFetchChunker (FullFetch should not use progressive reads)
- Re-enable NFS cache in template cache.go
- Remove all fmt.Printf debug instrumentation from orchestrator
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sionType threading
Add per-build file size and SHA-256 checksum to V4 headers, eliminating
the redundant Size() network call when opening upstream data files on
the read path. Checksums are computed for free by piggybacking on
CompressStream's existing frame iteration.
Remove the separate compressionType parameter threaded through
getBuild → newStorageDiff → NewChunker; the read path now derives
compression state from the per-mapping FrameTable directly.
V4 binary format change (not yet deployed):
[Metadata] [LZ4: numBuilds, builds(uuid+size+checksum),
numMappings, mappings...]
V3 path unchanged — falls back to Size() call when size is unknown.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Merge writeFrameToCache and writeChunkToCache into unified writeToCache with lock + atomic rename, used by all three cache write paths
- Fix file descriptor leak in cache hit paths: defer f.Close() and wrap in NopCloser so ReadFrame's close doesn't double-close the fd
- Add defer uploader.Close() in CompressStream so PartUploader file handles are released on error paths between Start() and Complete()
- Make Close() idempotent via sync.Once on fsPartUploader and filePartWriter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… lev-compression-final
The SHA-256 checksum in BuildFileInfo now covers uncompressed data, making it useful for end-to-end integrity verification of the original content. Updated inspect-build to use SHA-256 (replacing MD5) and verify checksums against the header. Fixed early-return lint warnings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GetUploadOptions now accepts fileType and useCase parameters, enriching the LD evaluation context so dashboard targeting rules can differentiate (e.g. compress memfile but not rootfs, or builds but not pauses). TemplateBuild accepts per-file opts directly instead of holding an ff reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- peerserver/seekable + localDiff: use logical cache.Size(), not on-disk FileSize() (the 8x-inflated formula broke V3/V4 P2P resume).
- storage_google.StoreFile: one outer WriteFromFileSystem timer covering all branches; drop the shadowed OneShot; tag with compression.type/level.
- Privatize ResolveCompressConfig into sandbox; validate frame size against the real per-file block size from the snapshot diff header.
- CI: add lz4 matrix; propagate COMPRESS_* to the base-template build via .env.test so the fixture matches the services.
048ae83 to 7e65a1f
Additional findings (outside current diff — PR may have been updated during review):
- 🔴 packages/orchestrator/pkg/sandbox/template/peerserver/seekable_test.go:17-23 — TestSeekableSource_Size mocks the wrong method, breaking CI: the test expects diff.FileSize(), but seekableSource.Size (seekable.go:22) actually calls f.diff.Size(ctx). go test -run TestSeekableSource_Size ./packages/orchestrator/pkg/sandbox/template/peerserver/ fails with "unexpected method call: Size(*context.cancelCtx)" and "FAIL: 0 out of 1 expectation(s) were met" for the unmet FileSize(). The fix is a one-liner: change the mock to diff.EXPECT().Size(mock.Anything).Return(int64(1234), nil) (or change the production call to FileSize(), but Size(ctx) is the SeekableSource interface contract and is what other tests in the file already expect for similar code paths).
Extended reasoning...
What's broken
In packages/orchestrator/pkg/sandbox/template/peerserver/seekable_test.go:20, the test sets up the mock as:

```go
diff := buildmocks.NewMockDiff(t)
diff.EXPECT().FileSize().Return(int64(1234), nil)
```
But the production code in packages/orchestrator/pkg/sandbox/template/peerserver/seekable.go:21-23 calls Size(ctx), not FileSize():

```go
func (f *seekableSource) Size(ctx context.Context) (int64, error) {
	return f.diff.Size(ctx)
}
```
build.Diff is an interface that has both methods (Size(ctx) (int64, error) from the embedded storage.SeekableReader, plus a separate FileSize() (int64, error) method). The test mocks the wrong one of the two.
Step-by-step proof
Reproduced directly on the PR branch:
```
$ go test -run TestSeekableSource_Size ./packages/orchestrator/pkg/sandbox/template/peerserver/
--- FAIL: TestSeekableSource_Size (0.00s)
    mock.go:361: assert: mock: I don't know what to return because the method call was unexpected.
        Either do Mock.On("Size").Return(...) first, or remove the Size() call.
        This method was unexpected:
            Size(*context.cancelCtx)
        at: [...mockdiff.go:411 ...seekable.go:22 ...seekable_test.go:28]
    mockdiff.go:24: FAIL: FileSize()
        at: [...mockdiff.go:260 ...seekable_test.go:20]
    mockdiff.go:24: FAIL: 0 out of 1 expectation(s) were met.
        The code you are testing needs to make 1 more call(s).
FAIL
FAIL	github.com/e2b-dev/infra/packages/orchestrator/pkg/sandbox/template/peerserver	0.017s
```

Walking through the call:
- src.Size(t.Context()) is called (seekable_test.go:28).
- That dispatches to seekableSource.Size, which calls f.diff.Size(ctx) (seekable.go:22).
- The mock has no expectation for Size — only for FileSize() — so testify panics: Size(*context.cancelCtx) is unexpected.
- After panic recovery, mock cleanup reports the unmet FileSize() expectation as a second failure.
Why this matters / impact
CI-breaking. The PR's own diff (visible in this review) replaced the previously-correct Size(mock.Anything) expectation with FileSize(), but the production code at seekable.go:22 was not changed to match. This appears to be an in-flight refactor that was committed half-done.
Fix
The simpler/safer fix is to update the test to match the production interface contract. Size(ctx context.Context) (int64, error) is part of the storage.SeekableReader interface that build.Diff embeds and is what the rest of the codebase uses for context-cancellable size lookups. Change line 20 to:

```go
diff.EXPECT().Size(mock.Anything).Return(int64(1234), nil)
```
Alternatively, if the intent is to use the no-context FileSize() here (it returns from the local cache without an RPC), seekable.go:22 needs to be changed to return f.diff.FileSize(), with the function signature kept as Size(ctx context.Context) to satisfy the SeekableSource interface (using _ context.Context to silence the unused-param warning). Either way, the test and the production code must agree.
@cla-bot check

We require contributors to sign our Contributor License Agreement, and we don't have @github-actions[bot] on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'

The cla-bot has been summoned, and re-checked this pull request!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e65a1f6d5
```go
// IncompletePendingUpload is set on diff headers produced by ToDiffHeader and
// cleared on the finalized headers swapped in by the upload pipeline. It
// is in-memory only (never serialized), and signals that the build's data
// has not yet reached object storage — readers must serve from the local
// cache and skip FrameTable lookups for the still-missing self entry.
IncompletePendingUpload bool
```
🟡 The doc comment for IncompletePendingUpload claims it "is in-memory only (never serialized)", but the V4 wire format reserves bit 0 of a dedicated flags byte for exactly this field — serialization_v4.go:111-118 writes it and serialization_v4.go:198 reads it back. peerserver/header.go:46 explicitly forces this bit ON for V4 headers shipped over P2P, which is the wire mechanism by which peers detect an in-flight build. The flag is never persisted to long-term storage (StoreHeader refuses headers carrying it), but it IS serialized in the V4 wire format. Consider rewording to something like "never persisted to long-term storage by StoreHeader, but is serialized in the V4 wire format for in-flight P2P signaling". Documentation only — runtime behavior is correct.
Extended reasoning...
What the bug is
The comment block at packages/shared/pkg/storage/header/header.go:39-44 reads:
```go
// IncompletePendingUpload is set on diff headers produced by ToDiffHeader and
// cleared on the finalized headers swapped in by the upload pipeline. It
// is in-memory only (never serialized), and signals that the build's data
// has not yet reached object storage — readers must serve from the local
// cache and skip FrameTable lookups for the still-missing self entry.
IncompletePendingUpload bool
```

The phrase "in-memory only (never serialized)" is factually wrong for V4 headers. This is documentation only — runtime behavior is correct — but it will mislead future maintainers about the V4 wire format and the P2P contract.
Step-by-step proof
- SerializeHeader (serialization.go:8-14) dispatches to serializeV4 for any header with Metadata.Version >= 4, passing h.IncompletePendingUpload through.
- serializeV4 (serialization_v4.go:111-119) packs the flag into bit 0 of a dedicated flags byte that becomes part of the on-the-wire bytes:

```go
var flags uint8
if incomplete {
	flags |= v4FlagIncomplete
}
result := make([]byte, metadataSize+v4FlagsLen+v4SizePrefixLen+len(compressed))
copy(result, metaBuf.Bytes())
result[metadataSize] = flags
```

  v4FlagIncomplete = 1 << 0 (serialization_v4.go:24).
- deserializeV4 (serialization_v4.go:131, 198) reads the flag back:

```go
flags := blockData[0]
...
h.IncompletePendingUpload = flags&v4FlagIncomplete != 0
```

- peerserver/header.go:38-47 is the wire mechanism — it explicitly forces this bit ON for every V4 header it serves to a peer, regardless of the in-memory state:

```go
wire := *h
if wire.Metadata.Version >= header.MetadataVersionV4 {
	wire.IncompletePendingUpload = true
}
data, err := header.SerializeHeader(&wire)
```
This is the entire mechanism by which a peer learns the origin upload is still in flight.
Why the comment is technically true only for the storage path
StoreHeader (serialization.go:62-67) refuses to persist a header with the flag set:

```go
if h.IncompletePendingUpload {
	return fmt.Errorf("refusing to persist incomplete header for %s", path)
}
```

So the flag never lands in long-term object storage — that is the only sense in which it is "not serialized". But the V4 wire path absolutely encodes it.
Impact
Documentation-only — the runtime is correct. But the comment will mislead anyone trying to understand:
- the V4 wire format,
- the P2P → "switch to storage" handshake,
- why peerserver/header.go mutates the field before serialization,
- why V3 doesn't need this and V4 does.

A future maintainer trusting the "never serialized" claim might remove the flags byte, the peerserver mutation, or the StoreHeader guard, breaking the P2P in-flight signal in subtle ways.
Suggested fix
Reword the comment to reflect the actual contract, e.g.:

```go
// IncompletePendingUpload is set on diff headers produced by ToDiffHeader and
// cleared on the finalized headers swapped in by the upload pipeline. It is
// never persisted to long-term storage (StoreHeader refuses headers carrying
// it), but it IS serialized in the V4 wire format (bit 0 of the flags byte)
// to signal in-flight builds over P2P. Readers seeing this bit set must serve
// from the local cache and skip FrameTable lookups for the still-missing self
// entry.
IncompletePendingUpload bool
```
Summary
Compression for data files (memfile, rootfs). Files are broken into independently decompressible frames (2 MiB, zstd), stored in GCS alongside V4 headers with per-mapping frame tables. Fully backward-compatible: the read path auto-detects V3/V4 headers and routes compressed vs uncompressed reads per-mapping. Gated by compress-config LaunchDarkly flag (per-team/cluster/template targeting).
What changed
- FramedFile interface replaces Seekable — a unified GetFrame(ctx, offset, frameTable, decompress, buf, readSize, onRead) handles both compressed and uncompressed data
- FrameTable per mapping + BuildFileInfo (uncompressed size, SHA-256 checksum) per build; LZ4-block-compressed header blob
- Chunker with mmap cache and fetch-session dedupe, replacing streaming_chunk.go

Read path
P2P header switchover
Benchmark results
End-to-end pause/resume (BenchmarkBaseImage, 50 iterations, local disk):
Full architecture doc: docs/compression-architecture.md