fix(volumes): broadcast volume directory creation to all cluster nodes #2595
AdaAibaby wants to merge 2 commits into e2b-dev:main from
Conversation
We require contributors to sign our Contributor License Agreement, and we don't have @adababys on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.
We require contributors to sign our Contributor License Agreement, and we don't have @adababys on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'
The cla-bot has been summoned, and re-checked this pull request!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bce5e5094d
```go
	return ErrClusterNotFound
}

wg, wgCtx := errgroup.WithContext(ctx)
```
Do not cancel remaining node RPCs on first failure
Using errgroup.WithContext here cancels wgCtx as soon as one node returns an error, so other in-flight or not-yet-started node RPCs can be aborted before they execute. That breaks the intended “run on all nodes” behavior and can leave partial state (for example, deleteVolume may skip cleanup on healthy nodes after one early failure). Run goroutines under a non-canceling group and aggregate errors after all node attempts complete.
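A minimal sketch of the pattern the review comment asks for, with placeholder names (broadcastAllNodes, node, fn are illustrative, not the repository's identifiers): every node RPC runs to completion against the parent context, and failures are aggregated only after all attempts finish, so one failing node never cancels the others.

```go
package handlers

import (
	"context"
	"errors"
	"sync"
)

// node stands in for the real cluster node client type.
type node struct {
	ID string
}

// broadcastAllNodes runs fn against every node and waits for all attempts
// to finish before reporting. Errors are aggregated with errors.Join, and
// the parent ctx is passed through unchanged, so a failure on one node does
// not cancel the RPCs still running on the others.
func broadcastAllNodes(ctx context.Context, nodes []node, fn func(context.Context, node) error) error {
	var (
		mu   sync.Mutex
		errs []error
		wg   sync.WaitGroup
	)
	for _, n := range nodes {
		wg.Add(1)
		go func(n node) {
			defer wg.Done()
			if err := fn(ctx, n); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(n)
	}
	wg.Wait()
	return errors.Join(errs...)
}
```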
We require contributors to sign our Contributor License Agreement, and we don't have @adababys on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
Problem
When a volume is created via POST /volumes, the directory is only created on a single randomly-selected orchestrator node (executeOnOrchestratorByClusterID uses rand.Shuffle and returns on the first success). Sandbox placement (PlaceSandbox) is independent and selects nodes based on CPU/RAM availability, with no awareness of which node holds the volume directory.
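For illustration only, a rough sketch of the single-node behavior described above, using hypothetical names (executeOnOneRandomNode, call); the real helper differs, but the shape is the same: shuffle the ready nodes and stop at the first success, so only one node ever receives the directory-creation call.

```go
package handlers

import (
	"context"
	"math/rand"
)

// executeOnOneRandomNode sketches the pre-fix behavior: try nodes in random
// order and return as soon as one call succeeds, leaving every other node
// without the volume directory.
func executeOnOneRandomNode(ctx context.Context, nodeIDs []string, call func(ctx context.Context, nodeID string) error) error {
	rand.Shuffle(len(nodeIDs), func(i, j int) { nodeIDs[i], nodeIDs[j] = nodeIDs[j], nodeIDs[i] })
	var lastErr error
	for _, id := range nodeIDs {
		err := call(ctx, id)
		if err == nil {
			return nil // first success wins; the remaining nodes are never asked
		}
		lastErr = err
	}
	return lastErr
}
```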
This causes an intermittent failure: sandboxes that land on the same
node as the volume get a working NFS mount; sandboxes that land on any
other node fail to mount because the NFS proxy calls
syscall.Mount(path, path, MS_BIND) on a path that does not exist on
that node.
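A minimal sketch of the bind mount described above, wrapped in a hypothetical helper name (bindMount); on a node where the volume directory was never created, the call fails with ENOENT and the sandbox cannot attach the volume.

```go
package nfsproxy

import (
	"fmt"
	"syscall"
)

// bindMount re-mounts path onto itself with MS_BIND, mirroring the call the
// NFS proxy makes. It can only succeed if path already exists on this node.
func bindMount(path string) error {
	if err := syscall.Mount(path, path, "", syscall.MS_BIND, ""); err != nil {
		return fmt.Errorf("bind mount %q: %w", path, err)
	}
	return nil
}
```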
A second, related bug exists in envd/internal/api/init.go: when two concurrent /init requests race on isMountingNFS, the second request silently returns nil, causing the sandbox to start successfully with no NFS mount.
Changes
volume_util.go
executeOnAllClusterNodes: fans out a gRPC call concurrently to every ready node in the cluster using errgroup, returning a combined error if any node fails.
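A hedged sketch of what such a helper can look like, assuming a hypothetical nodeClient interface and method name (CreateVolumeDir); the real signature in volume_util.go may differ. Note that errgroup.Wait returns the first non-nil error and, as the review comment above points out, the derived wgCtx is cancelled on that first failure.

```go
package handlers

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// nodeClient is a stand-in for the per-node orchestrator gRPC client.
type nodeClient interface {
	CreateVolumeDir(ctx context.Context, volumeID string) error
}

// executeOnAllClusterNodes fans the same call out to every ready node and
// returns a combined error if any node fails (illustrative sketch of the
// PR's helper, not its exact code).
func executeOnAllClusterNodes(ctx context.Context, nodes map[string]nodeClient, volumeID string) error {
	wg, wgCtx := errgroup.WithContext(ctx)
	for nodeID, client := range nodes {
		nodeID, client := nodeID, client
		wg.Go(func() error {
			if err := client.CreateVolumeDir(wgCtx, volumeID); err != nil {
				return fmt.Errorf("node %s: %w", nodeID, err)
			}
			return nil
		})
	}
	// Wait reports the first error; wgCtx has been cancelled by then,
	// which is the behavior the review comment above flags.
	return wg.Wait()
}
```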
volume_create.go
createVolume now calls executeOnAllClusterNodes instead of executeOnOrchestratorByClusterID, so the volume directory is created on every orchestrator node at creation time.
volume_delete.go
deleteVolume likewise broadcasts to all nodes to clean up the directory everywhere it was created.
init.go
setupNFS: when the isMountingNFS CAS fails (concurrent request), return an explicit error instead of nil so the caller receives a 400 and can retry, rather than silently succeeding with no mount.
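A minimal sketch of the fixed guard, assuming isMountingNFS is an atomic.Bool on the server struct (field and function names are illustrative): a failed CAS now surfaces an explicit error instead of nil, so the handler can answer with a 400 and the client can retry once the first mount attempt finishes.

```go
package api

import (
	"errors"
	"sync/atomic"
)

var errNFSMountInProgress = errors.New("another /init request is already mounting NFS")

type server struct {
	isMountingNFS atomic.Bool
}

// setupNFS returns an explicit error when a concurrent request already holds
// the mount flag, instead of silently returning nil with no mount in place.
func (s *server) setupNFS(mount func() error) error {
	if !s.isMountingNFS.CompareAndSwap(false, true) {
		return errNFSMountInProgress
	}
	defer s.isMountingNFS.Store(false)
	return mount()
}
```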
Testing
- go build ./packages/api/... ./packages/envd/... ✅
- go vet ./packages/api/internal/handlers/... ✅
- go test ✅
- Manual check: creating a volume and then creating 10 sandboxes with that volume mount all succeed with a valid NFS mount (mount | grep nfs is non-empty in every sandbox).