Surface s3fs mount failures from mountBucket #650

@aron-cf

Description

Problem

Sandbox.mountBucket() returns success even when the underlying s3fs mount silently fails. The user-facing symptom is that the mount endpoint reports {"ok":true} but no FUSE filesystem is ever attached: writes "succeed" against a plain local directory and never reach R2, and the later unmountBucket() blows up with

fusermount: entry for <mount> not found in /etc/mtab

because there was nothing to unmount. The unmount error is the first visible signal of a problem that began at mount time.

I hit this with a typo in the bucket name which caused R2 to reject the bucket check with 403 AccessDenied, but the SDK treated the mount as successful.

Root cause

In packages/sandbox/src/sandbox.ts, executeS3FSMount invokes s3fs in its default (daemonising) mode:

const mountCmd = `s3fs ${shellEscape(bucket)} ${shellEscape(mountPath)} -o ${optionsStr}`;
const result = sessionId
  ? await this.execWithSession(mountCmd, sessionId, { origin: 'internal' })
  : await this.execInternal(mountCmd);

if (result.exitCode !== 0) {
  throw new S3FSMountError(
    `S3FS mount failed: ${result.stderr || result.stdout || 'Unknown error'}`
  );
}

s3fs forks a child to run the FUSE event loop and the parent exits 0 before the bucket check completes. When the child then fails its SigV4 bucket check (auth error, wrong bucket name, network error, …) it logs Exiting FUSE event loop due to errors and dies — but the parent has long since returned success, so executeS3FSMount never throws.

I confirmed this directly by running s3fs … -o curldbg -f -d against a misnamed bucket: the foreground mount surfaces the 403 immediately, while the default backgrounded invocation exits 0 and leaves no mountpoint, no s3fs process, and no entry in /proc/self/mountinfo.

Suggested fix (from Claude)

After invoking s3fs, verify the mount was actually established before returning. The cheapest reliable check is mountpoint -q <path> with a short retry loop to absorb the brief window between the fork and the FUSE filesystem appearing in the kernel's mount table. If the path never becomes a mountpoint, throw S3FSMountError with whatever stderr was captured (and, if possible, the s3fs log) so the caller sees a real error instead of a false ok.
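
Rough shape of that check (sketch only: the result shape { exitCode, stdout, stderr } and shellEscape come from the existing executeS3FSMount code above, while waitForMountpoint and the exec callback are names invented here):

// Sketch: poll `mountpoint -q` until the FUSE mount appears in the kernel's
// mount table or a small time budget is exhausted.
async function waitForMountpoint(
  exec: (cmd: string) => Promise<{ exitCode: number; stdout: string; stderr: string }>,
  mountPath: string,
  timeoutMs = 2000,
  intervalMs = 50
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // `mountpoint -q` exits 0 only when the path is a real mountpoint.
    const check = await exec(`mountpoint -q ${shellEscape(mountPath)}`);
    if (check.exitCode === 0) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

The caller would pass execInternal or the session-bound exec, mirroring how mountCmd is already dispatched.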

A defensive secondary measure: capture s3fs stderr to a tempfile via -o logfile=… (or shell redirection) so the thrown error can include the actual reason (403 AccessDenied, unable to connect, etc.) instead of the empty stdout/stderr the parent produces.
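
Continuing the sketch above, the mount command could grow a logfile option and the verification failure could quote its tail (the log path, tail length, and exact error wording below are illustrative, not the SDK's current behaviour):

// Sketch, continuing inside executeS3FSMount: bucket, mountPath, optionsStr,
// shellEscape, S3FSMountError and the exec helper are the ones already in
// scope there; the log path is an arbitrary choice.
const logPath = `/tmp/s3fs-${Date.now()}.log`;
const mountCmd =
  `s3fs ${shellEscape(bucket)} ${shellEscape(mountPath)} -o ${optionsStr} ` +
  `-o logfile=${shellEscape(logPath)}`;

// ... run mountCmd as today, then verify instead of trusting exit code 0 ...

const mounted = await waitForMountpoint(exec, mountPath);
if (!mounted) {
  // Quote the end of the s3fs log so the thrown error names the real cause
  // (403 AccessDenied, connection failure, ...) instead of an empty stderr.
  const tail = await exec(`tail -n 20 ${shellEscape(logPath)}`);
  throw new S3FSMountError(
    `S3FS mount verification failed for ${mountPath}: ${tail.stdout.trim() || 'no s3fs output captured'}`
  );
}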

For your agent

Tasks

  • Reproduce the silent success in a unit/integration test: invoke mountBucket against a bogus bucket name and assert it throws S3FSMountError (currently it resolves); a minimal test sketch follows this list
  • In executeS3FSMount, after the s3fs exec, poll mountpoint -q <mountPath> with a small retry budget (e.g. up to ~2s, 50ms intervals) before declaring success
  • When verification fails, throw S3FSMountError and roll back the same way the existing catch does (delete password file, drop activeMounts entry); make sure the mkdir -p'd mount-point directory is also removed so we don't leave a stale empty dir behind
  • Capture s3fs stderr (logfile or piped) and include the tail of it in the thrown error message so credential / bucket / network failures are diagnosable from the API response
  • Add an E2E case that exercises a failing mount (bad bucket or bad credentials) and asserts the bridge returns 502 with a meaningful error body — the existing happy-path E2E masked this class of bug for the entire life of the feature
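
A minimal shape for the first task, assuming the tests run under vitest and that S3FSMountError is importable; getSandbox, the import path, and the exact mountBucket arguments are placeholders for whatever the existing test harness provides:

import { describe, expect, it } from 'vitest';
// Placeholder import path: adjust to wherever the package actually exports this.
import { S3FSMountError } from '@cloudflare/sandbox';

// Stand-in for the real test fixture; only the shape used below is declared.
declare function getSandbox(): Promise<{
  mountBucket(bucket: string, mountPath: string): Promise<void>;
}>;

describe('mountBucket failure surfacing', () => {
  it('throws S3FSMountError when s3fs cannot mount the bucket', async () => {
    const sandbox = await getSandbox();
    await expect(
      sandbox.mountBucket('this-bucket-does-not-exist', '/mnt/broken')
    ).rejects.toBeInstanceOf(S3FSMountError);
  });
});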
