I have done the following
Steps to reproduce
Not well understood
Current behavior
I've been seeing frequent crashes with an app that uses Containerization. The following is Claude Code's analysis of three crash reports from today:
Related: #403, #503, #552, #572
Environment
- macOS 26.4 (25E246), Apple Silicon (Mac15,3)
- Xcode 26.4 (17E192), Swift 6.3
- Containerization from commit 250546f
- grpc-swift 1.x (the version currently pinned in Containerization's Package.resolved)
- Debug build run from Xcode
Three crashes, two patterns
All three crashes are EXC_BREAKPOINT (SIGTRAP) caused by errno 9 (EBADF) on a vsock file descriptor used by the gRPC connection to the VM agent. The gRPC/NIO stack hits a precondition failure (preconditionIsNotUnacceptableErrno) when the syscall returns EBADF.
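For context, a minimal standalone illustration (not from the project) of the errno involved: any syscall on a descriptor that has already been closed fails with EBADF, which NIO's syscall wrappers treat as a programmer error and turn into exactly this kind of precondition failure.

import Darwin

// Any syscall on a closed descriptor fails with errno 9 (EBADF).
let fd = open("/dev/null", O_WRONLY)
close(fd)
var byte: UInt8 = 0
let result = write(fd, &byte, 1)
print(result == -1 && errno == EBADF)   // true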
Crash 1: 07:36 (overnight, sleep possibly involved)
- Runtime: ~14 hours (launched previous day at 17:25)
- Crash site: fcntl(F_SETNOSIGPIPE) on the vsock fd — the gRPC client was idle, and when a new exec() triggered an RPC, the ConnectionManager tried to reconnect using the original .connectedSocket(fd) target. The fd was already closed.
- Likely cause: the grpc-swift default idle timeout (30 minutes) closed the connection's socket, and with connectionBackoff = nil, the reconnection attempt reused the closed fd. Sleep/wake may also be a factor given the 14-hour runtime. (A minimal sketch of this reconnection path follows the stack below.)
Relevant stack:
0: _assertionFailure
1: preconditionIsNotUnacceptableErrno(err:where:)
2: syscall(blocking:where:_:)
3: Posix.fcntl(descriptor:command:value:)
4: BaseSocketProtocol.ignoreSIGPIPE(descriptor:)
...
12: ClientBootstrap.withConnectedSocket(_:)
...
17: ConnectionManager.startConnecting(...)
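To make this path concrete, here is an untested sketch (mine, not Containerization code) of what reconnecting with a stale .connectedSocket target amounts to at the NIO level: passing an already-closed descriptor to ClientBootstrap.withConnectedSocket should hit the same ignoreSIGPIPE/fcntl precondition as frames 3-4 above.

import NIOCore
import NIOPosix
import Darwin

let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
defer { try? group.syncShutdownGracefully() }

// A connected socket pair whose first end is closed stands in for the
// vsock fd that the idle timeout has already closed.
var fds: [Int32] = [-1, -1]
precondition(socketpair(AF_UNIX, SOCK_STREAM, 0, &fds) == 0)
let staleFD = fds[0]
close(staleFD)

// "Reconnecting" with the stale descriptor should trap in
// BaseSocketProtocol.ignoreSIGPIPE -> Posix.fcntl with errno 9 (EBADF).
_ = try? ClientBootstrap(group: group).withConnectedSocket(staleFD).wait()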
Crashes 2 and 3: 11:46 and 13:42 (no sleep)
- Runtime: ~1h 49m and ~1h 47m respectively
- Machine did not sleep during these runs
- Crash site: writev on the vsock fd — the gRPC client believed the connection was still active and attempted a new RPC. The writev on the vsock socket got EBADF.
Unlike crash 1, this is not a reconnection attempt. The gRPC client considers the connection live and tries to use it normally, but the underlying vsock fd has been invalidated by something external. (A minimal NIO-level approximation of this failure mode follows the key frames below.)
Both crashes have identical stacks. Key frames:
0: _assertionFailure
1: preconditionIsNotUnacceptableErrno(err:where:)
2: syscall(blocking:where:_:)
3: Posix.writev(descriptor:iovecs:) <-- EBADF on the vsock fd
...
57: GRPCClientChannelHandler.flush(context:) <-- gRPC initiating a new RPC
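As an untested thought experiment (again mine, not project code), this failure mode can be approximated at the NIO level by closing a channel's descriptor out from under it and then writing; the flush path should end in Posix.writev with errno 9 and the same precondition trap.

import NIOCore
import NIOPosix
import Darwin

let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
defer { try? group.syncShutdownGracefully() }

// A connected socket pair stands in for the vsock connection.
var fds: [Int32] = [-1, -1]
precondition(socketpair(AF_UNIX, SOCK_STREAM, 0, &fds) == 0)

// Wrap one end in a channel while the fd is still valid...
let channel = try ClientBootstrap(group: group).withConnectedSocket(fds[0]).wait()

// ...then invalidate the fd externally, the way something outside the
// gRPC/NIO stack appears to be invalidating the vsock fd.
close(fds[0])

var buffer = channel.allocator.buffer(capacity: 16)
buffer.writeString("ping")
// NIO still believes the channel is active; the write should reach
// Posix.writev(descriptor:iovecs:), get EBADF, and trap.
try channel.writeAndFlush(buffer).wait()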
Timeline from system logs
Crash 2 (11:46), launched at 09:57
09:57:44 VM 1 created
09:57:45 VM 2 created
(no container-related activity in between)
11:46:04 Precondition failed: unacceptable errno 9 Bad file descriptor in writev
Crash 3 (13:42), launched at 11:55
11:55:44 VM 1 created
11:55:45 VM 2 created
(no container-related activity until a third VM is created ~97 minutes later)
13:32:59 VM 3 created
13:42:09 Precondition failed: unacceptable errno 9 Bad file descriptor in writev
No TimeSyncer errors ("failed to sync time") were logged during either run, which means the TimeSyncer's agent connection (also created via dupHandle()) remained healthy throughout.
Observations
- The ~1h 47m timing is consistent. Both non-sleep crashes happened within roughly 107-109 minutes of launch. This suggests a timer or deferred operation rather than a random race.
- The fd was valid initially. The crashes happen on a vsock connection that was already established and working. Something invalidated the fd after it was in use.
- The fd was not closed by the gRPC/NIO layer. The gRPC channel is still in an active state at the time of the crash. The fd was invalidated by something outside the gRPC stack.
- connectionBackoff = nil and default idle timeout. Vminitd.Client.init sets connectionBackoff = nil but does not override connectionIdleTimeout, which defaults to 30 minutes in grpc-swift. The init process agent (created during LinuxContainer.start()) goes idle after the createProcess/startProcess RPCs complete. The 30-minute idle timeout would close that connection's socket. This explains crash 1 (the reconnection path), but 30 minutes doesn't match the ~107-minute timing of crashes 2 and 3.
- Two VMs, different crashes. The app creates two VMs at launch. Crash 1 was on one VM's gRPC client, crashes 2 and 3 were on the other's.
Questions
- Could the Virtualization framework be doing deferred cleanup of vsock endpoints after VZVirtioSocketConnection.close() is called in dupHandle()? The ~107-minute delay suggests something on a schedule rather than immediate invalidation.
- Is there any internal timeout or resource reclamation in the vsock/XPC layer that could invalidate file descriptors for established connections?
- Would disabling the grpc-swift idle timeout (setting connectionIdleTimeout to a very large value) and/or adding keepalive be worth trying to narrow down whether it's involved? (A sketch of what I mean follows these questions.)
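For the last question, this is roughly the configuration change I had in mind (a hedged sketch against the grpc-swift 1.x ClientConnection API; the function name and fd parameter are placeholders of mine, not copied from Vminitd.Client):

import GRPC
import NIOCore

// Sketch: keep connectionBackoff = nil (matching the current Vminitd.Client
// behavior as I understand it), but effectively disable the 30-minute idle
// timeout and add keepalive pings so a dead vsock connection surfaces as a
// ping timeout rather than a writev trap. `fd` stands for the duplicated
// vsock handle.
func makeAgentConnection(fd: CInt, group: EventLoopGroup) -> ClientConnection {
    var configuration = ClientConnection.Configuration.default(
        target: .connectedSocket(fd),
        eventLoopGroup: group
    )
    configuration.connectionBackoff = nil
    configuration.connectionIdleTimeout = .hours(24)       // grpc-swift default is .minutes(30)
    configuration.connectionKeepalive = ClientConnectionKeepalive(
        interval: .seconds(30),
        timeout: .seconds(10)
    )
    return ClientConnection(configuration: configuration)
}

Even if this only papers over the underlying fd invalidation, whether (and when) the keepalive ping starts failing would at least show whether the transport is being torn down on a schedule.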
Expected behavior
Shouldn't crash
Environment
Relevant log output
Code of Conduct