- A. Architecture
- B. Biggest Changes
- C. Read Path Diagram
- D. Remaining Work
- E. Write Paths
- F. Failure Modes
- G. Cost & Benefit
- H. Grafana Metrics
- I. Rollout Strategy
Templates are stored in GCS as build artifacts. Each build produces two data files (memfile, rootfs) plus a header and metadata. Each data file can have an uncompressed variant ({buildId}/memfile) and a compressed variant ({buildId}/v4.memfile.lz4), with corresponding v3 and v4 headers.
- Data is broken into frames, each independently decompressible (LZ4 or Zstd).
- Frames are aligned to `FrameAlignmentSize` (= `MemoryChunkSize` = 4 MiB) in uncompressed space, with a minimum of 1 MB compressed and a maximum of 32 MB uncompressed (configurable).
- The v4 header embeds a `FrameTable` per mapping: `CompressionType + StartAt + []FrameSize`. The header itself is always LZ4-block-compressed, regardless of data compression type.
- The `FrameTable` is subset per mapping so each mapping carries only the frames it references.
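To make the table layout concrete, here is a minimal sketch of how a per-mapping `FrameTable` could resolve an uncompressed offset to a frame. The field names inside `FrameOffset` and `FrameSize`, and the `Locate` method, are assumptions for illustration; only `CompressionType`, `StartAt`, and `[]FrameSize` come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// FrameOffset is a position in both address spaces:
// U = uncompressed bytes, C = compressed bytes. (Hypothetical shape.)
type FrameOffset struct {
	U, C int64
}

// FrameSize holds the uncompressed and compressed length of one frame.
type FrameSize struct {
	U, C int32
}

// FrameTable mirrors the description: CompressionType + StartAt + []FrameSize.
type FrameTable struct {
	CompressionType string
	StartAt         FrameOffset
	Frames          []FrameSize
}

var ErrOutOfRange = errors.New("offset not covered by frame table")

// Locate returns the start offsets and size of the frame containing offsetU.
// The walk begins at StartAt rather than zero, because a subset table lists
// only the frames its mapping references.
func (ft *FrameTable) Locate(offsetU int64) (FrameOffset, FrameSize, error) {
	cur := ft.StartAt
	if offsetU < cur.U {
		return FrameOffset{}, FrameSize{}, ErrOutOfRange
	}
	for _, f := range ft.Frames {
		end := cur.U + int64(f.U)
		if offsetU < end {
			return cur, f, nil
		}
		cur.U = end
		cur.C += int64(f.C)
	}
	return FrameOffset{}, FrameSize{}, ErrOutOfRange
}

func main() {
	const MiB = 1 << 20
	ft := &FrameTable{
		CompressionType: "zstd",
		StartAt:         FrameOffset{U: 8 * MiB, C: 3 * MiB},
		Frames:          []FrameSize{{U: 4 * MiB, C: 1 * MiB}, {U: 8 * MiB, C: 2 * MiB}},
	}
	start, size, err := ft.Locate(13 * MiB)
	fmt.Println(start, size, err)
}
```

The compressed start offset and compressed size returned by `Locate` are what a range read against the compressed object would need.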
The most relevant change is that `FramedFile` (returned by `OpenFramedFile`) replaces the old `Seekable` (returned by `OpenSeekable`). Where `Seekable` had separate `ReadAt`, `OpenRangeReader`, and `StoreFile` methods, `FramedFile` unifies reads into a single `GetFrame(ctx, offsetU, frameTable, decompress, buf, readSize, onRead)` that handles both compressed and uncompressed data, and keeps `Size` and `StoreFile` (with optional compression via `FramedUploadOptions`). For compressed data, raw compressed frames are cached individually on NFS, keyed by `(path, frameStart, frameSize)`.
Two LaunchDarkly JSON flags control compression, with per-team/cluster/template targeting:
chunker-config (read path):
// (restart required for existing chunkers)
{
"useCompressedAssets": false, // load v4 headers, use compressed read path if available
"minReadBatchSizeKB": 16 // floor for read batch size in KB
}

compress-config (write path):
{
"compressBuilds": false, // enable compressed dual-write uploads
"compressionType": "zstd", // "lz4" or "zstd"
"level": 2, // compression level (0=fast, higher=better ratio)
"frameTargetMB": 2, // target compressed frame size in MiB
"frameMaxUncompressedMB": 16, // cap on uncompressed bytes per frame (= 4 × MemoryChunkSize)
"uploadPartTargetMB": 50, // target GCS multipart upload part size in MiB
"encoderConcurrency": 1, // goroutines per zstd encoder
"decoderConcurrency": 1 // goroutines per pooled zstd decoder
}

When an orchestrator loads a template from storage (cache miss):
- Header probe: if `useCompressedAssets`, probes for v4 and v3 headers in parallel, preferring v4. Falls back to v3 if v4 is missing.
- Asset probe: for each build referenced in header mappings, probes for 3 data variants in parallel (uncompressed, `.lz4`, `.zstd`). Missing variants are silently skipped.
- Chunker creation: one `Chunker` per `(buildId, fileType)`. The chunker's `AssetInfo` records which variants exist.
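The asset probe step can be sketched roughly as follows. This is an illustration under assumptions, not the actual `probeAssets`: `existsFunc`, the `HasLZ4`/`HasZstd` field names, and the `.zstd` object name (assumed to mirror the `.lz4` name) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// AssetInfo records which data variants exist for one (buildId, fileType).
// HasLZ4 and HasZstd are assumed field names.
type AssetInfo struct {
	HasUncompressed bool
	HasLZ4          bool
	HasZstd         bool
}

// existsFunc stands in for a storage existence check.
type existsFunc func(path string) bool

// probeAssets checks the three variants concurrently. A missing object
// simply leaves its flag false; it is not treated as an error.
func probeAssets(exists existsFunc, buildID, fileType string) AssetInfo {
	var info AssetInfo
	probes := []struct {
		path string
		dst  *bool
	}{
		{buildID + "/" + fileType, &info.HasUncompressed},
		{buildID + "/v4." + fileType + ".lz4", &info.HasLZ4},
		{buildID + "/v4." + fileType + ".zstd", &info.HasZstd}, // assumed name
	}
	var wg sync.WaitGroup
	for _, p := range probes {
		wg.Add(1)
		go func(path string, dst *bool) {
			defer wg.Done()
			*dst = exists(path) // each goroutine writes a distinct field
		}(p.path, p.dst)
	}
	wg.Wait()
	return info
}

func main() {
	store := map[string]bool{"b1/memfile": true, "b1/v4.memfile.zstd": true}
	info := probeAssets(func(p string) bool { return store[p] }, "b1", "memfile")
	fmt.Printf("%+v\n", info)
}
```

Because each probe writes its own field and the caller waits for all of them, no extra locking is needed around `AssetInfo`.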
All three consumer types share the same path at read time:
GetBlock(offset, length, ft) // was Slice()
→ header.GetShiftedMapping(offset) // in-memory → BuildMap with FrameTable
→ DiffStore.Get(buildId) // TTL cache hit → cached Chunker
→ Chunker.GetBlock(offset, length, ft)
→ mmap cache hit? return reference
→ miss: regionLock dedup → fetchSession → GetFrame → NFS cache → GCS
→ decompressed bytes written into mmap, waiters notified
- Prefetch reads 4 MiB, UFFD reads 4 KB or 2 MB (hugepage), NBD reads 4 KB.
- Frames are aligned to `MemoryChunkSize` (4 MiB), so no `GetBlock` call ever crosses a frame boundary.
- If the v4 header was loaded, each mapping carries a subset `FrameTable`; this `ft` is threaded through to `GetBlock` and routes the read to the compressed or uncompressed fetch, so no additional header fetch is needed.
- Unified Chunker: collapsed `FullFetchChunker`, `StreamingChunker`, and the `Chunker` interface back into a single concrete `Chunker` struct backed by slot-based `regionLock` for fetch deduplication; a single code path handles both compressed and uncompressed data via `GetFrame`.
- Asset probing at init: `StorageDiff.Init` now probes for all 3 data variants (uncompressed, lz4, zstd) in parallel via `probeAssets`, constructing an `AssetInfo` that the Chunker uses to route reads. This replaces the previous `OpenSeekable` single-object path.
- Upload API on TemplateBuild: moved the upload lifecycle from `Snapshot` to `TemplateBuild`, which now owns path extraction, `PendingFrameTables` accumulation, and V4 header serialization. `UploadAll` is synchronous (no internal goroutine); multi-layer builds use `UploadExceptV4Headers` + `UploadV4Header` with explicit coordination via `UploadTracker`.
- NFS cache for compressed frames: `GetFrame` on the NFS cache layer stores and retrieves individual compressed frames by `(path, frameStart, frameSize)`, with progressive decompression into mmap. Uncompressed reads use the same `GetFrame` codepath with `ft=nil`.
- FrameTable validation and testing: added `validateGetFrameParams` at the `GetFrame` entry point (alignment checks for compressed, bounds checks for uncompressed), fixed a `FrameTable.Range` bug (it was not initializing from `StartAt`), and added comprehensive `FrameTable` unit tests.
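As an illustration of the kind of entry-point checks described in the last item, here is a hedged sketch. It is not the actual `validateGetFrameParams`: the function name, its parameters, and the choice to check alignment against `MemoryChunkSize` are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

const memoryChunkSize = 4 << 20 // 4 MiB, the alignment stated in the architecture notes

var (
	errBadRange    = errors.New("offset must be non-negative and length positive")
	errUnaligned   = errors.New("compressed read: offset not aligned to chunk size")
	errOutOfBounds = errors.New("uncompressed read: range exceeds object size")
)

// validateFrameParams is a hypothetical reduction of the checks:
// alignment for compressed reads, bounds for uncompressed reads.
// objectSize is only consulted on the uncompressed path.
func validateFrameParams(compressed bool, offsetU, length, objectSize int64) error {
	if offsetU < 0 || length <= 0 {
		return errBadRange
	}
	if compressed {
		if offsetU%memoryChunkSize != 0 {
			return errUnaligned
		}
		return nil
	}
	if offsetU+length > objectSize {
		return errOutOfBounds
	}
	return nil
}

func main() {
	fmt.Println(validateFrameParams(true, 4096, 4096, 0))
	fmt.Println(validateFrameParams(false, 100, 50, 120))
	fmt.Println(validateFrameParams(true, 8<<20, 4096, 0))
}
```

Rejecting bad parameters before any cache or network access keeps malformed requests from surfacing later as harder-to-diagnose decode errors.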
flowchart TD
subgraph Consumers
NBD["NBD (4 KB)"]
UFFD["UFFD (4 KB / 2 MB)"]
PF["Prefetch (4 MiB)"]
end
NBD & UFFD & PF --> GM["header.GetShiftedMapping(offset)"]
GM -->|"BuildMap + FrameTable"| DS["DiffStore.Get(buildId)"]
DS -->|"cached Chunker"| GB["Chunker.GetBlock(offset, length, ft)"]
GB --> MC{"mmap cache hit?"}
MC -->|"hit"| REF["return []byte (reference to mmap)"]
MC -->|"miss"| RL["regionLock (dedup / wait)"]
RL --> ROUTE{"matching compressed asset exists?"}
ROUTE -->|"compressed"| GFC["GetFrame (ft, decompress=true)"]
ROUTE -->|"uncompressed"| GFU["GetFrame (ft=nil, decompress=false)"]
GFC --> NFS{"NFS cache hit?"}
GFU --> NFS
NFS -->|"hit"| WRITE["write to mmap + notify waiters"]
NFS -->|"miss"| GCS["GCS range read (C-space or U-space)"]
GCS --> DEC{"compressed?"}
DEC -->|"yes"| DECOMP["pooled zstd/lz4 decoder"]
DEC -->|"no"| STORE_NFS
DECOMP --> STORE_NFS["store frame in NFS cache"]
STORE_NFS --> WRITE
WRITE --> REF
ASCII version
NBD (4KB) UFFD (4KB/2MB) Prefetch (4MiB)
\ | /
`---------.---'--------.-----'
v v
header.GetShiftedMapping(offset)
|
v
DiffStore.Get(buildId) ──> cached Chunker
|
v
Chunker.GetBlock(offset, length, ft)
|
.------+------.
v v
[mmap hit] [mmap miss]
return ref |
regionLock (dedup/wait)
|
.--------+--------.
v v
ft != nil? ft == nil
compressed uncompressed
asset exists?
| |
v v
GetFrame GetFrame
(decompress=T) (decompress=F)
| |
'--------+-------'
|
NFS cache hit? ──yes──> write to mmap
| + notify waiters
no |
| v
GCS range read return []byte ref
(C-space / U-space)
|
compressed? ──no──> store in NFS
| |
yes v
| write to mmap
zstd/lz4 decode + notify waiters
| |
store in NFS v
| return []byte ref
v
write to mmap
+ notify waiters
|
v
return []byte ref
1. Per-artifact compression config: memfile and rootfs have different runtime requirements. The `compress-config` flag should support separate codec, level, and frame size settings per artifact type rather than applying a single config to both.
2. Verify `getFrame` timer lifecycle: audit that `Success()`/`Failure()` is always called on every code path in the storage cache's `getFrameCompressed` and `getFrameUncompressed`.
3. Feature flag to disable progressive `GetBlock` reading: add a flag that bypasses progressive reading/returning in `GetBlock` and falls back to the original whole-block fetch behavior. Useful as a fault-tolerance lever if progressive reads cause issues in production.
4. NFS write-through for compressed uploads: during `StoreFile` with compression, tee out uncompressed chunk data to NFS cache via a callback, so uncompressed `GetFrame` reads can hit cache immediately after upload without a cold GCS fetch.
5. Compressed-only write mode: add a `compress-config` flag (e.g. `"skipUncompressed": true`) that skips the uncompressed upload entirely and writes only compressed data + v4 header. Code: `TemplateBuild.UploadAll`/`UploadExceptV4Headers` currently always uploads uncompressed; gate that behind the flag. Read path: `probeAssets` already handles missing uncompressed variants, so this should work as-is. Saves the dual-write bandwidth and storage cost, but makes rollback to uncompressed reads impossible for those builds.
6. Purity enforcement (no mixed compressed/uncompressed stacks): add a `chunker-config` flag (e.g. `"requirePureCompression": true`) that, at template load time, validates that if the top-layer build has compressed assets then every ancestor build in the header's mappings also has compressed assets (and vice versa). Fail sandbox creation if the check fails rather than silently mixing. This interacts with the write path: when `requirePureCompression` is enabled and a new layer is built on top of an uncompressed parent, the build must either (a) refuse to compress, (b) refuse to start, or (c) trigger background compression of the parent chain first. Today's `probeAssets` per-build routing lets mixed stacks work; purity enforcement would intentionally break that flexibility for correctness guarantees.
7. Sync vs async layer compression: today compression is either inline (during `TemplateBuild.Upload*`, blocking the build) or fully async (background `compress-build` CLI, after the fact). Middle ground to explore:
   - Compress before upload submission: the snapshot data is already in memory/mmap after Firecracker pause. Compress frames in-process before kicking off the GCS upload, so the upload only sends compressed data (pairs with #5). Tradeoff: adds compression latency to the critical path before the sandbox can be resumed on another server.
   - Compress shortly after build completes: fire an async compression job (in-process goroutine or separate task) that runs after the uncompressed upload finishes. The sandbox is resumable immediately from uncompressed data, and compressed data appears later. But: if another build references this layer before compression finishes, the child gets an uncompressed parent, violating purity (#6). And if the sandbox is resumed from the uncompressed image on a different server while compression is in-flight, we have a race on the GCS objects.
   - Implications for purity: strict purity enforcement (#6) effectively forces synchronous compression of the entire ancestor chain before a compressed child can be built. Async compression is only safe when purity is not enforced, or when there's a coordination mechanism (e.g. a "compression pending" state that blocks child builds until the parent is compressed).
8. Storage Provider/Backend layer separation: decompose `StorageProvider` into distinct Provider (high-level: `FrameGetter`, `FileStorer`, `Blobber`) and Backend (low-level: `Basic`, `RangeGetter`, `MultipartUploaderFactory`) layers. Prerequisite for clean instrumentation wrapping.
9. OTEL instrumentation middleware (`instrumented_provider.go`, `instrumented_backend.go`): full span and metrics wrapping at both layers. ~400 lines.
10. Test coverage (~4300 lines total): chunker matrix tests (`chunk_test.go`: concurrent access, decompression stats, cross-chunker coverage), compression round-trip tests (`compress_test.go`), NFS cache with compressed data (`storage_cache_seekable_test.go`), template build upload tests (`template_build_test.go`).
Triggered by sbx.Pause() or initial template build. The orchestrator creates a Snapshot (FC memory + rootfs diffs, headers, snapfile, metadata), then constructs a TemplateBuild which owns the upload lifecycle:
- Single-layer (initial build, simple pause): `TemplateBuild.UploadAll(ctx)` is synchronous and creates its own `PendingFrameTables` internally. It uploads uncompressed data, compressed data (if the `compressBuilds` feature flag is enabled), uncompressed headers, snapfile, and metadata concurrently in an errgroup. V4 headers are finalized and uploaded only after all data uploads complete, because they depend on the `FrameTable` results.
- Multi-layer (layered build): `TemplateBuild.UploadExceptV4Headers(ctx)` uploads all data, then returns `hasCompressed`. The caller coordinates with `UploadTracker` to wait for ancestor layers, then calls `TemplateBuild.UploadV4Header(ctx)`, which reads the accumulated `PendingFrameTables` from all layers and serializes the final v4 header.
A standalone CLI tool for compressing existing uncompressed builds after the fact:
compress-build -build <uuid> [-storage gs://bucket] [-compression lz4|zstd] [-recursive]
- Reads the uncompressed data from GCS, compresses into frames, writes compressed data + v4 header back.
- `--recursive` walks header mappings to discover and compress dependency builds first (parent templates), avoiding nil-FrameTable gaps in derived templates.
- Supports `--dry-run`, `-template <alias>` (resolves via E2B API), configurable frame size and compression level.
- Idempotent: skips builds that already have compressed artifacts.
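The dependency-first, idempotent behaviour of the recursive mode can be sketched as a small traversal. The maps below are in-memory stand-ins for header mappings and a storage existence check, and `compressRecursive` is a hypothetical name; whether the real tool still walks the parents of an already-compressed build is an assumption made here.

```go
package main

import "fmt"

// compressRecursive handles the parents of build before build itself and
// skips any build that is already compressed, so a rerun does no extra work.
// Parents are walked even when build is already compressed (an assumption),
// which also repairs partially compressed chains.
func compressRecursive(build string, parents map[string][]string, compressed map[string]bool, order *[]string) {
	for _, p := range parents[build] {
		compressRecursive(p, parents, compressed, order)
	}
	if compressed[build] {
		return // idempotent: compressed artifacts already exist
	}
	compressed[build] = true // placeholder for: read, compress, write back
	*order = append(*order, build)
}

func main() {
	parents := map[string][]string{
		"child": {"base", "mid"},
		"mid":   {"base"},
	}
	compressed := map[string]bool{}

	var first []string
	compressRecursive("child", parents, compressed, &first)
	fmt.Println("first run:", first)

	var second []string
	compressRecursive("child", parents, compressed, &second)
	fmt.Println("second run:", second)
}
```

Compressing parents first means that by the time a derived build's v4 header is written, every ancestor it maps to already has a frame table.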
Corrupted compressed frame in GCS or NFS: no automatic fallback to uncompressed today. The read fails, GetBlock returns an error, and the sandbox page-faults. Unresolved: should the Chunker retry with the uncompressed variant when decompression fails and HasUncompressed is true?
Half-compressed builds (some layers have v4 header + compressed data, ancestors don't): handled by design. probeAssets finds whichever variants exist per build; each Chunker routes independently. A v4 header with a nil FrameTable for an ancestor mapping falls through to uncompressed fetch for that mapping.
NFS unavailable: compressed frames that miss NFS go straight to GCS (existing behavior). Uncompressed reads also use NFS caching with read-through and async write-back. No circuit breaker — repeated NFS timeouts will add latency to every miss until the cache recovers.
Upload path complexity: dual-write (uncompressed + compressed), PendingFrameTables accumulation, and V4 header serialization add failure surface to the build hot path. Multi-layer builds add UploadTracker coordination between layers. A compression failure during upload could fail the entire build. Back-out: set compressBuilds: false in compress-config — this disables compressed writes entirely; uncompressed uploads continue as before and the read path already handles missing compressed variants. No cleanup of already-written compressed data needed (it becomes inert).
- Should Chunker fall back to uncompressed on a corrupt V4 header or a decompression error?
Sampled from gs://e2b-staging-lev-fc-templates/ (262 builds, zstd level 2):
| Artifact | Builds sampled | Avg uncompressed | Avg compressed | Ratio |
|---|---|---|---|---|
| memfile | 191 (both variants) | 140 MiB | 35 MiB | 4.0x |
| rootfs | 153 (compressed-only) | unknown | varies | est. 2-10x (diff layers are tiny, full builds ~2x) |
During dual-write, GCS storage increases ~25% for memfile. After dropping uncompressed, net savings are ~75% for memfile. Rootfs savings depend on the mix of diff vs full builds.
New per-orchestrator CPU cost: decompressing every GCS-fetched frame. At ~35 MiB compressed per cold memfile load and zstd level 2 decode throughput of ~1-2 GB/s, each cold load burns ~20-40 ms of CPU. Scales with cold template load rate, not sandbox count. Encode cost is write-path only (build/pause), bounded by upload concurrency.
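The storage and CPU figures above follow from simple arithmetic, reproduced here as a sketch. The helper names are hypothetical, and the decode estimate applies the quoted throughput to the compressed byte count, as the text does; if the throughput figure is instead measured against decompressed output, the estimate would be correspondingly larger.

```go
package main

import "fmt"

const MiB = 1 << 20

// storageDelta returns the relative storage growth during dual-write and
// the relative saving once the uncompressed copy is dropped.
func storageDelta(uncompressed, compressed float64) (growth, saving float64) {
	return compressed / uncompressed, 1 - compressed/uncompressed
}

// decodeMillis estimates decode CPU time for n bytes at bytesPerSec.
func decodeMillis(n, bytesPerSec float64) float64 {
	return n / bytesPerSec * 1000
}

func main() {
	growth, saving := storageDelta(140*MiB, 35*MiB)
	fmt.Printf("dual-write growth: %.0f%%, saving after drop: %.0f%%\n", growth*100, saving*100)
	fmt.Printf("decode: %.0f-%.0f ms\n", decodeMillis(35*MiB, 2e9), decodeMillis(35*MiB, 1e9))
}
```

With the sampled memfile averages this gives 25% growth, 75% saving, and roughly 18 to 37 ms of decode time per cold load, in line with the rounded figures in the text.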
The main cost: mmap regions are allocated at uncompressed size but frames are fetched whole. A 4 KB NBD read triggers a full frame fetch (4-16 MiB uncompressed), filling mmap with data the sandbox may never touch. This inflates RSS and can pressure the orchestrator fleet into scaling. Mitigations: tune frameMaxUncompressedMB down, or drop unrequested bytes from the mmap after the requesting read completes.
Smaller GCS reads (4x fewer bytes) and smaller NFS cache entries reduce network bandwidth. Upload path doubles bandwidth during dual-write.
Each TimerFactory metric emits three series with the same name but different units: a duration histogram (ms), a bytes counter (By), and an ops counter. All three carry the same attributes listed below plus an automatic result = success | failure.
| Metric | What it measures | Attributes |
|---|---|---|
| `orchestrator.blocks.slices` | End-to-end `GetBlock` latency (mmap hit or remote fetch) | `compressed` (bool), `pull-type` (`local` · `remote`), `failure-reason`\* |
| `orchestrator.blocks.chunks.fetch` | Remote storage fetch (GCS range read + optional decompress) | `compressed` (bool), `failure-reason`\* |
| `orchestrator.blocks.chunks.store` | Writing fetched data into local mmap cache | none |

\* `failure-reason` values: `local-read`, `local-read-again`, `remote-read`, `cache-fetch`, `session_create`
| Metric | What it measures | Attributes |
|---|---|---|
| `orchestrator.storage.slab.nfs.read` | NFS cache read (frame or size lookup) | `operation` (`GetFrame` · `Size`) |
| `orchestrator.storage.slab.nfs.write` | NFS cache write (store frame after GCS fetch) | none |
| `orchestrator.storage.cache.ops` | NFS cache operation count | `cache_type` (`blob` · `framed_file`), `op_type`\*, `cache_hit` (bool) |
| `orchestrator.storage.cache.bytes` | NFS cache bytes transferred | `cache_type`, `op_type`\*, `cache_hit` (bool) |
| `orchestrator.storage.cache.errors` | NFS cache errors (excluding expected `ErrNotExist`) | `cache_type`, `op_type`\*, `error_type` (`read` · `write` · `write-lock`) |

\* `op_type` values: `get_frame`, `write_to`, `size`, `put`, `store_file`
| Metric | What it measures | Attributes |
|---|---|---|
| `orchestrator.storage.gcs.read` | GCS read operations | `operation` (`Size` · `WriteTo` · `GetFrame`) |
| `orchestrator.storage.gcs.write` | GCS write operations | `operation` (`Write` · `WriteFromFileSystem` · `WriteFromFileSystemOneShot`) |
- Compressed vs uncompressed latency: `orchestrator.blocks.slices` grouped by `compressed`, filtered to `result=success`
- Cache hit rate: `orchestrator.blocks.slices` where `pull-type=local` vs `pull-type=remote`
- NFS effectiveness: `orchestrator.storage.cache.ops` where `op_type=get_frame`, ratio of `cache_hit=true` to total
- GCS fetch volume: `orchestrator.storage.gcs.read` where `operation=GetFrame`, bytes counter
- Decompression overhead: `orchestrator.blocks.chunks.fetch` where `compressed=true`, compare duration histogram to `compressed=false`
TBD