
Use memfd to track sandbox memory #2522

Open
bchalios wants to merge 17 commits into main from zero-copy-pause


@bchalios bchalios commented Apr 29, 2026

What

On Linux, a memfd (created with memfd_create) is an anonymous, memory-backed file that can be used to back memory. Firecracker uses this construct when it needs to share guest memory with external processes (currently, when using vhost-user devices).

Currently, when we take a snapshot of the sandbox (for example, during PAUSE operations) we need to copy its memory using process_vm_readv. memfd allows us to do this in a more idiomatic way.

Why

memfd gives the orchestrator a direct view of the sandbox memory without having to copy memory across processes. Moreover, if the orchestrator holds a reference to the memfd, we can post-process the sandbox memory after the Firecracker process is killed. This opens up possibilities for various latency and memory utilization optimizations.

This PR changes the cache logic to copy Firecracker memory into the diff file through the memfd when the memfd is present.


cursor Bot commented Apr 29, 2026

PR Summary

Medium Risk
Changes snapshot resume and pause memory-export paths to optionally rely on Firecracker-provided memfd fds, which is a core correctness/performance area and could cause snapshot failures or fd/mmap leaks if version gating or ownership is wrong.

Overview
Potential issues: NewFromFd mmaps using int(st.Size) which can overflow on non-64-bit builds and will map the entire guest memory eagerly; the UFFD handshake assumes Firecracker sends a single control message and at most 2 fds, so any deviation will hard-fail resume; memfd ownership is transferred across several layers (UFFD swap, pause export, cache construction) and mistakes here could leak or double-close fds/mappings or silently fall back to slower process_vm_readv when memfd isn’t available.
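
The `int(st.Size)` concern above can be illustrated with a checked conversion. This is a minimal sketch, not the code in the PR: `mmapLength` and `errEmptyMapping` are hypothetical names, and the assumption is that the size comes from `fstat` as an `int64`.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyMapping is returned for sizes that mmap would reject anyway.
// (Hypothetical name, used only in this sketch.)
var errEmptyMapping = errors.New("mapping size must be positive")

// mmapLength converts a file size, as reported by fstat, into the int
// length that Go mmap wrappers expect. It fails instead of silently
// truncating when int is narrower than int64 (for example on 32-bit builds).
func mmapLength(size int64) (int, error) {
	if size <= 0 {
		return 0, errEmptyMapping
	}
	n := int(size)
	// Round-trip check: if the value changed, it did not fit in int.
	if int64(n) != size {
		return 0, fmt.Errorf("size %d does not fit in int on this platform", size)
	}
	return n, nil
}
```

On 64-bit builds the round-trip check never fires, so the guard only costs a comparison; on 32-bit builds it turns a truncated mapping into an explicit error.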

Reviewed by Cursor Bugbot for commit 18ee751.

Comment thread packages/orchestrator/pkg/sandbox/uffd/uffd.go
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/memfd.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/sandbox.go
@bchalios bchalios force-pushed the zero-copy-pause branch 2 times, most recently from 6d2b804 to 314abd0 Compare April 29, 2026 09:25
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/sandbox.go Outdated
@bchalios bchalios force-pushed the zero-copy-pause branch 5 times, most recently from 1ab27f8 to 4f0b47b Compare April 29, 2026 15:56
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/memfd.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/memfd.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@bchalios bchalios force-pushed the zero-copy-pause branch from 4f0b47b to d09dbca Compare May 4, 2026 14:46

codecov Bot commented May 4, 2026

❌ 4 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2624 | 4 | 2620 | 5 |
View the full list of 6 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.85% (Passed 231 times, Failed 767 times)

Stack Traces | 40.2s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (40.24s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 77.39% (Passed 222 times, Failed 760 times)

Stack Traces | 3.6s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox ilftjcda93xzpi94qae88
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1364}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox ilftjcda93xzpi94qae88
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1365}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command /bin/sh in sandbox i531q7nnhz8nh677vqhxn
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1366}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Sun, 17 May 2026 06:50:36 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox ilftjcda93xzpi94qae88
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (3.60s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 56.36% (Passed 391 times, Failed 505 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.45% (Passed 219 times, Failed 397 times)

Stack Traces | 7.3s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
Executing command python in sandbox iba3wt7tfp1dgkfz3buoi
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1268}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (7.30s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.17% (Passed 229 times, Failed 448 times)

Stack Traces | 83.4s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (83.41s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 66.87% (Passed 219 times, Failed 442 times)

Stack Traces | 25s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1256}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 183 MB\nFree memory before tmpfs mount: 801 MB\nMemory to use in integrity test (80% of free, min 64MB): 640 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"640+0 records in\n640+0 records out\n671088640 bytes (671 MB, 640 MiB) copied, 3.19698 s, 210 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=640\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 3.17\n\tPercent of CPU this job got: 99%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.20\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2612\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 341\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 20\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 829 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox iasf8ba3t5wjl53f1ee6b
Executing command bash in sandbox iasf8ba3t5wjl53f1ee6b (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1272}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"f5d9300f38474ce4d50056be24f0743f10caf00e75d63a18a4eb9e45bc9cf543\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox iasf8ba3t5wjl53f1ee6b
Executing command bash in sandbox iasf8ba3t5wjl53f1ee6b (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1275}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox iasf8ba3t5wjl53f1ee6b: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (25.02s)


@bchalios bchalios force-pushed the zero-copy-pause branch 2 times, most recently from 4ae4ebb to 9198a96 Compare May 4, 2026 15:13

bchalios commented May 4, 2026

Update: I've removed the logic that progressively punches holes in the memfd after copying data into the diff file. I ran some experiments and saw signs that it increases CPU utilization and slows down PAUSE and RESUME operations.

I think we can proceed with adding support for memfd and revisit hole punching after the deduplication work.

@bchalios bchalios marked this pull request as ready for review May 4, 2026 15:19
@bchalios bchalios requested review from dobrac and jakubno as code owners May 4, 2026 15:19

@bchalios bchalios requested a review from ValentaTomas May 4, 2026 15:26
Comment thread packages/orchestrator/pkg/sandbox/uffd/uffd.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/uffd/uffd.go

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bed7ac7e41

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/sandbox.go
Comment thread packages/orchestrator/pkg/sandbox/uffd/uffd.go
@bchalios bchalios force-pushed the zero-copy-pause branch 2 times, most recently from 328d277 to ad966b6 Compare May 14, 2026 16:17
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad966b6c63

Comment thread packages/orchestrator/pkg/sandbox/uffd/uffd.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/sandbox.go
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5aedb7b33

Comment thread packages/orchestrator/pkg/sandbox/uffd/uffd.go Outdated
bchalios added 5 commits May 15, 2026 14:08
Introduce a Cacher interface which abstracts the memory cache
implementation, as seen by the diff/upload layer. Currently, only the
Cache type implements it.

This is preparation for introducing a second cache type which is backed
by the memfd used to map guest memory.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
We are changing Firecracker to optionally back the guest memory with
a memfd object. When enabled, Firecracker passes the memfd file
descriptor over the UFFD Unix domain socket (UDS), alongside the UFFD
file descriptor, using SCM_RIGHTS.

Change the UFFD serve logic to also parse the memfd file descriptor.
When present, wrap the descriptor in a Memfd object. The object itself
provides an interface that lets users access the guest memory from the
memfd.

UFFD logic exposes the Memfd object over a newly added method of the
MemoryBackend interface, called Memfd(). The noop memory backend always
returns nil for now, as Firecracker might only use memfd when resuming
from a snapshot.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
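
For readers unfamiliar with SCM_RIGHTS, the descriptor hand-off described in the commit above can be sketched with the standard `syscall` package. This is an illustration under assumptions, not the orchestrator's code: `sendFds`, `recvFds` and `passPipeEnd` are hypothetical helpers, a pipe stands in for the UFFD and memfd descriptors, and the "one control message, at most `maxFds` descriptors" rule mirrors the behaviour the review summary describes rather than anything verified against Firecracker.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFds sends a one-byte payload with the descriptors attached as a
// single SCM_RIGHTS control message.
func sendFds(sock int, fds ...int) error {
	return syscall.Sendmsg(sock, []byte{0}, syscall.UnixRights(fds...), nil, 0)
}

// recvFds reads one message and returns the descriptors it carried. It
// expects exactly one control message holding between 1 and maxFds
// descriptors, and closes whatever it received otherwise.
func recvFds(sock, maxFds int) ([]int, error) {
	payload := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4*maxFds)) // 4 bytes per descriptor
	_, oobn, flags, _, err := syscall.Recvmsg(sock, payload, oob, 0)
	if err != nil {
		return nil, fmt.Errorf("recvmsg: %w", err)
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return nil, fmt.Errorf("parse control messages: %w", err)
	}
	var fds []int
	for i := range msgs {
		if got, err := syscall.ParseUnixRights(&msgs[i]); err == nil {
			fds = append(fds, got...)
		}
	}
	if flags&syscall.MSG_CTRUNC != 0 || len(msgs) != 1 || len(fds) == 0 || len(fds) > maxFds {
		for _, fd := range fds {
			syscall.Close(fd) // do not leak descriptors we will not use
		}
		return nil, errors.New("unexpected control data in handshake")
	}
	return fds, nil
}

// passPipeEnd sends the write end of a pipe across a socketpair, writes
// msg through the received duplicate, and returns what the read end sees.
func passPipeEnd(msg string) (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close()
	if err := sendFds(pair[0], int(w.Fd())); err != nil {
		return "", err
	}
	fds, err := recvFds(pair[1], 2)
	if err != nil {
		return "", err
	}
	dup := os.NewFile(uintptr(fds[0]), "received")
	defer dup.Close()
	if _, err := dup.WriteString(msg); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is why a write through it shows up on the original pipe's read end.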
Change the ExportMemory() logic to export the memory via a MemfdCache
when Firecracker has sent us a memfd file descriptor.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add unit tests for memfd and MemfdCache functionality.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a feature flag that controls whether the orchestrator will instruct
Firecracker to use memfd for backing the guest memory.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Trim verbose comments and drop trivial tests so the PR is easier to
review. The behavioral changes are:

- copyFromMemfd uses a fixed 2 MiB chunk (memfdCopyChunkSize) matching
  the source hugepage size, decoupled from cache.blockSize (which
  remains the dirty-tracking unit).
- NewCacheFromMemfd no longer logs and swallows the memfd close error;
  it returns it. The size==0 fast path is dropped; the loop is a no-op
  when the ranges sum to 0.
- UseMemFdFlag comment matches the mmap-based implementation.
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2bbbf3a.

Comment thread packages/orchestrator/pkg/sandbox/fc/memory.go Outdated
NewCacheFromMemfd already closes the memfd on every error path; calling
memfd.Close() again here is a (harmless) double-close that muddies
ownership semantics.
Trim the diff further:

- MemfdCache embeds *Cache, dropping six delegate methods and the
  custom Close (Cache.Close handles the file; the memfd is already
  closed before the wrapper is returned).
- copyFromMemfd performs one copy per range instead of a chunked
  inner loop; the memfdCopyChunkSize constant goes away. Cancellation
  checks move to range boundaries.
- Memfd.Slice uses a simple nil-check instead of sync.Once for the
  lazy mmap; no concurrent callers in either the sync or the upcoming
  async path.
- Drop TestMemfdCache_DirtyBitmap (covered by MultipleRanges + the
  BitsetRanges adapter is exercised elsewhere) and the chunk-boundary
  test (no chunked loop anymore). Consolidate SliceOutOfBounds.
NewFromFd now mmaps the fd eagerly via fstat and returns
(*Memfd, error). The size field on Memfd goes away (use len(mmap)),
the lazy-init nil-check in Slice goes away, and the explicit size
computation in uffd.go is dropped — the kernel already knows the size.

Also standardize on golang.org/x/sys/unix (drop the mixed syscall import).
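
The "fstat, then mmap eagerly" shape described in the commit above might look roughly like the following. It is a sketch under assumptions: `fdView`, `newFdView` and `readThroughView` are hypothetical names, the standard `syscall` package is used instead of `golang.org/x/sys/unix` so the example has no dependencies, and a temporary file stands in for the memfd because the standard library has no `memfd_create` wrapper.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdView is a read-only mapping of everything behind a file descriptor.
type fdView struct {
	fd   int
	data []byte
}

// newFdView sizes the mapping with fstat and maps the whole fd eagerly.
// It takes ownership of fd only on success.
func newFdView(fd int) (*fdView, error) {
	var st syscall.Stat_t
	if err := syscall.Fstat(fd, &st); err != nil {
		return nil, fmt.Errorf("fstat: %w", err)
	}
	size := int(st.Size)
	if size <= 0 || int64(size) != st.Size {
		return nil, fmt.Errorf("unmappable size %d", st.Size)
	}
	data, err := syscall.Mmap(fd, 0, size, syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, fmt.Errorf("mmap: %w", err)
	}
	return &fdView{fd: fd, data: data}, nil
}

// Slice returns the mapped bytes in [off, off+length) without copying.
func (v *fdView) Slice(off, length int64) ([]byte, error) {
	if off < 0 || length < 0 || off > int64(len(v.data))-length {
		return nil, fmt.Errorf("range at offset %d, length %d is out of bounds (size %d)", off, length, len(v.data))
	}
	return v.data[off : off+length], nil
}

// Close unmaps the view and closes the descriptor. Call it exactly once.
func (v *fdView) Close() error {
	unmapErr := syscall.Munmap(v.data)
	closeErr := syscall.Close(v.fd)
	if unmapErr != nil {
		return unmapErr
	}
	return closeErr
}

// readThroughView writes content to a temp file (standing in for the
// memfd) and reads [off, off+length) back through the mapping.
func readThroughView(content string, off, length int64) (string, error) {
	f, err := os.CreateTemp("", "fdview-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString(content); err != nil {
		f.Close()
		return "", err
	}
	// Duplicate the descriptor so the view has an fd it alone owns.
	fd, err := syscall.Dup(int(f.Fd()))
	f.Close()
	if err != nil {
		return "", err
	}
	v, err := newFdView(fd)
	if err != nil {
		syscall.Close(fd)
		return "", err
	}
	defer v.Close()
	b, err := v.Slice(off, length)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Because the length comes from `len(v.data)`, the view needs no separate size field, which matches the simplification the commit message describes.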
Memfd has a single owner across all paths: NewCacheFromMemfd consumes
it during construction, and the UFFD handshake transfers ownership via
atomic Swap. There is no path that calls Close twice on the same Memfd,
so the m.mmap=nil / m.fd=-1 / nil-check sentinels are dead weight.

Drop them and document the single-use contract.
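
The ownership transfer via atomic Swap mentioned above can be sketched with `sync/atomic`. The types here (`handle`, `slot`) are hypothetical stand-ins, not the PR's types; the point is only that swapping in nil hands the resource to exactly one caller, so no close sentinel is needed on the resource itself.

```go
package main

import "sync/atomic"

// handle stands in for a resource that must be closed exactly once.
type handle struct{ closes int }

// Close records the call so the example can count how often it ran.
func (h *handle) Close() { h.closes++ }

// slot holds at most one handle; whoever takes it becomes the sole owner.
type slot struct{ p atomic.Pointer[handle] }

// Put stores a handle in the slot.
func (s *slot) Put(h *handle) { s.p.Store(h) }

// Take atomically empties the slot and returns what was in it, or nil
// if another caller already took it.
func (s *slot) Take() *handle { return s.p.Swap(nil) }

// closeFromTwoPaths simulates two cleanup paths that both try to take
// and close the same handle, and reports how many closes happened.
func closeFromTwoPaths() int {
	h := &handle{}
	var s slot
	s.Put(h)
	for i := 0; i < 2; i++ {
		if owned := s.Take(); owned != nil {
			owned.Close()
		}
	}
	return h.closes
}
```

Only the first `Take` sees a non-nil pointer, so the second cleanup path becomes a no-op without any state inside the handle.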
Instead of materializing a []Range in the caller just to iterate it
once inside NewCacheFromMemfd, take the dirty *roaring.Bitmap and the
block size directly. Total cache size comes from cardinality * blockSize,
and copyFromMemfd iterates BitsetRanges in place. Drops the now-thin
exportMemoryFromMemfd helper in fc/memory.go.
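
The sizing and copy scheme described above (cache size equals dirty-block count times block size, one copy per contiguous range) can be sketched as follows. This is an illustration under assumptions: a sorted slice of block indices stands in for the `*roaring.Bitmap`, a byte slice stands in for the mapped memfd, and `span`, `dirtySpans` and `exportDirty` are hypothetical names.

```go
package main

import "fmt"

// span is a byte range [Start, Start+Size) in guest memory.
type span struct{ Start, Size int64 }

// dirtySpans coalesces sorted, unique dirty block indices into byte
// ranges, merging runs of adjacent blocks into one range.
func dirtySpans(blocks []uint32, blockSize int64) []span {
	var out []span
	for i, b := range blocks {
		if i > 0 && b == blocks[i-1]+1 {
			out[len(out)-1].Size += blockSize
			continue
		}
		out = append(out, span{Start: int64(b) * blockSize, Size: blockSize})
	}
	return out
}

// exportDirty packs the dirty ranges of src back to back into a buffer
// of len(blocks)*blockSize bytes, using one copy per range.
func exportDirty(src []byte, blocks []uint32, blockSize int64) ([]byte, error) {
	dst := make([]byte, int64(len(blocks))*blockSize)
	var off int64
	for _, s := range dirtySpans(blocks, blockSize) {
		if s.Start+s.Size > int64(len(src)) {
			return nil, fmt.Errorf("range at %d (size %d) exceeds source of %d bytes", s.Start, s.Size, len(src))
		}
		copy(dst[off:off+s.Size], src[s.Start:s.Start+s.Size])
		off += s.Size
	}
	return dst, nil
}
```

With a block size of 2 and dirty blocks 1, 3 and 4, the source `AABBCCDDEE` packs down to `BBDDEE`: the first range starts at a non-zero offset and the last two blocks are copied as a single range.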
The gRPC handler already injects sandbox/team/template contexts via
ctx, so team and template targeting for UseMemFdFlag already worked.
Add a sandbox-type attribute (sandbox vs build) and pass the explicit
sandboxLDContext to BoolFlag so flags can roll out to production
sandboxes separately from template-builds.
Cacher was a vague -er name for an interface with seven methods. The
type exists purely so the diff/upload layer can accept either *Cache or
*MemfdCache; DiffSource names that role.
- copyFromMemfd uses ctx.Err() instead of the select/default form.
- pauseProcessMemory runs ExportMemory before ToDiffHeader so the
  memfd is owned by ExportMemory throughout; the conditional close on
  ToDiffHeader failure goes away.
- fc.ExportMemory returns NewCacheFromMemfd directly; the "create
  MemfdCache" wrap is redundant with the inner error context.
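
The `ctx.Err()` form mentioned in the first bullet above is a compact way to poll for cancellation at range boundaries. A minimal sketch, with hypothetical names (`copyPerRange`, `copyWithCancel`) and a fixed range size standing in for the dirty ranges:

```go
package main

import (
	"context"
	"errors"
)

// copyPerRange copies src into dst one range at a time and checks for
// cancellation before each range. ctx.Err() stays nil until the context
// is cancelled or its deadline passes, so it can replace a
// select-with-default on ctx.Done().
func copyPerRange(ctx context.Context, dst, src []byte, rangeSize int) (int, error) {
	if rangeSize <= 0 {
		return 0, errors.New("rangeSize must be positive")
	}
	limit := len(src)
	if len(dst) < limit {
		limit = len(dst)
	}
	copied := 0
	for copied < limit {
		if err := ctx.Err(); err != nil {
			return copied, err
		}
		end := copied + rangeSize
		if end > limit {
			end = limit
		}
		copied += copy(dst[copied:end], src[copied:end])
	}
	return copied, nil
}

// copyWithCancel runs copyPerRange over ten bytes in ranges of four,
// cancelling the context up front when cancelFirst is set.
func copyWithCancel(cancelFirst bool) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	src := []byte("0123456789")
	return copyPerRange(ctx, make([]byte, len(src)), src, 4)
}
```

Checking once per range keeps the hot copy free of channel operations, at the cost of reacting to cancellation only between ranges.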
PR 2522 introduced MemfdCache (wrapper around *Cache) and the
DiffSource interface purely so the async-copy follow-up could attach
extra state without churning callers. PR 2522 itself never uses the
indirection — NewCacheFromMemfd returns *Cache, fc.ExportMemory
returns *block.Cache, localDiff takes *block.Cache. Drop the
scaffolding from this PR; the async PR introduces its own wrapper
when it needs the override behavior.

Also:
- Inline the copyFromMemfd loop into NewCacheFromMemfd (single use).
- Trim tests to the two that earn their keep: non-adjacent blocks and
  the non-zero range-start regression.
- Tighten the use_memfd/firecracker-fd comments.
- Loop over fds for cleanup instead of two manual closes.
FC < 1.14 rejects the use_memfd field on snapshot load
(deny_unknown_fields on MemoryBackend), so combining
FCSupportsMemfd(version) with the flag avoids hard-failing resumes
when the flag is flipped on across a heterogeneous fleet.
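
The gating described above (flag AND version support) can be sketched as below. The function names and the version string format are assumptions for illustration; the real `FCSupportsMemfd` may parse versions differently. The one detail worth showing is that the comparison must be numeric, since a string comparison would rank `1.9` above `1.14`.

```go
package main

import (
	"strconv"
	"strings"
)

// supportsMemfd reports whether a Firecracker version string such as
// "v1.14.0" is at least 1.14. Unparseable versions count as unsupported,
// so the caller falls back to the non-memfd path.
func supportsMemfd(version string) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false
	}
	return major > 1 || (major == 1 && minor >= 14)
}

// shouldUseMemfd combines the rollout flag with the version gate, so
// flipping the flag on never sends use_memfd to a version that rejects it.
func shouldUseMemfd(flagEnabled bool, version string) bool {
	return flagEnabled && supportsMemfd(version)
}
```

Treating unknown versions as unsupported keeps a mixed fleet on the existing path until each host's Firecracker version is known to accept the field.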