[Bug]: Frequent crashes due to vsock #678

@DePasqualeOrg

Description

I have done the following

  • I have searched the existing issues
  • If possible, I've reproduced the issue using the 'main' branch of this project

Steps to reproduce

Not well understood

Current behavior

I've been seeing frequent crashes in an app that uses Containerization. The following is Claude Code's analysis of three crash reports from today:

Related: #403, #503, #552, #572

Environment

  • macOS 26.4 (25E246), Apple Silicon (Mac15,3)
  • Xcode 26.4 (17E192), Swift 6.3
  • Containerization from commit 250546f
  • grpc-swift 1.x (the version currently pinned in Containerization's Package.resolved)
  • Debug build run from Xcode

Three crashes, two patterns

All three crashes are EXC_BREAKPOINT (SIGTRAP) caused by errno 9 (EBADF) on a vsock file descriptor used by the gRPC connection to the VM agent. The gRPC/NIO stack hits a precondition failure (preconditionIsNotUnacceptableErrno) when the syscall returns EBADF.
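For context, the trap comes from NIO's checked-syscall pattern, which classifies certain errno values as programmer errors. A minimal sketch of that pattern (my simplification, not NIO's actual code):

import Darwin
import Foundation

// Simplified model of NIO's syscall checking: EINTR is retried, most
// errors are thrown, but EBADF is "unacceptable" and traps, because it
// means an fd the event loop believes it owns was closed behind its back.
func checkedSyscall(_ body: () -> Int) throws -> Int {
    while true {
        let result = body()
        if result >= 0 { return result }
        switch errno {
        case EINTR:
            continue  // interrupted: retry
        case EBADF:
            preconditionFailure("unacceptable errno \(errno) Bad file descriptor")
        default:
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
        }
    }
}

This is why the failure surfaces as EXC_BREAKPOINT rather than a thrown error: from NIO's perspective, EBADF on a managed fd is unrecoverable state corruption, not an ordinary I/O failure.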

Crash 1: 07:36 (overnight, sleep possibly involved)

  • Runtime: ~14 hours (launched previous day at 17:25)
  • Crash site: fcntl(F_SETNOSIGPIPE) on the vsock fd — the gRPC client was idle, and when a new exec() triggered an RPC, the ConnectionManager tried to reconnect using the original .connectedSocket(fd) target. The fd was already closed.
  • Likely cause: The grpc-swift default idle timeout (30 minutes) closed the connection's socket, and with connectionBackoff = nil, the reconnection attempt reused the closed fd. Sleep/wake may also be a factor given the 14-hour runtime.

Relevant stack:

0: _assertionFailure
1: preconditionIsNotUnacceptableErrno(err:where:)
2: syscall(blocking:where:_:)
3: Posix.fcntl(descriptor:command:value:)
4: BaseSocketProtocol.ignoreSIGPIPE(descriptor:)
...
12: ClientBootstrap.withConnectedSocket(_:)
...
17: ConnectionManager.startConnecting(...)

Crashes 2 and 3: 11:46 and 13:42 (no sleep)

  • Runtime: ~1h 49m and ~1h 47m respectively
  • Machine did not sleep during these runs
  • Crash site: writev on the vsock fd — the gRPC client believed the connection was still active and attempted a new RPC. The writev on the vsock socket got EBADF.

Unlike crash 1, this is not a reconnection attempt. The gRPC client considers the connection live and tries to use it normally, but the underlying vsock fd has been invalidated by something external.

Both crashes have identical stacks. Key frames:

0: _assertionFailure
1: preconditionIsNotUnacceptableErrno(err:where:)
2: syscall(blocking:where:_:)
3: Posix.writev(descriptor:iovecs:)          <-- EBADF on the vsock fd
...
57: GRPCClientChannelHandler.flush(context:)  <-- gRPC initiating a new RPC
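One way to test the external-invalidation theory would be to periodically probe the vsock fd and log when it goes bad relative to the ~107-minute mark. A hypothetical diagnostic helper (the name is mine, not part of Containerization):

import Darwin

// Returns false once `fd` no longer refers to an open descriptor.
// fcntl(F_GETFD) is side-effect free and fails with EBADF exactly when
// the descriptor has been closed or invalidated.
func isFileDescriptorOpen(_ fd: Int32) -> Bool {
    if fcntl(fd, F_GETFD) != -1 { return true }
    return errno != EBADF  // failed for some other reason; fd still exists
}

Sampling this for the gRPC client's fd (and for the TimeSyncer's) would show whether the descriptor dies at a fixed offset from launch or only at the moment of the next write.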

Timeline from system logs

Crash 2 (11:46), launched at 09:57

09:57:44  VM 1 created
09:57:45  VM 2 created
          (no container-related activity in between)
11:46:04  Precondition failed: unacceptable errno 9 Bad file descriptor in writev

Crash 3 (13:42), launched at 11:55

11:55:44  VM 1 created
11:55:45  VM 2 created
          (no container-related activity until a third VM is created ~97 minutes later)
13:32:59  VM 3 created
13:42:09  Precondition failed: unacceptable errno 9 Bad file descriptor in writev

No TimeSyncer errors ("failed to sync time") were logged during either run, which means the TimeSyncer's agent connection (also created via dupHandle()) remained healthy throughout.

Observations

  1. The timing is consistent. Both non-sleep crashes happened 107-109 minutes (~1h 47m to ~1h 49m) after launch. This suggests a timer or deferred operation rather than a random race.

  2. The fd was valid initially. The crashes happen on a vsock connection that was already established and working. Something invalidated the fd after it was in use.

  3. The fd was not closed by the gRPC/NIO layer. The gRPC channel is still in an active state at the time of the crash. The fd was invalidated by something outside the gRPC stack.

  4. connectionBackoff = nil and default idle timeout. Vminitd.Client.init sets connectionBackoff = nil but does not override connectionIdleTimeout, which defaults to 30 minutes in grpc-swift (see the configuration sketch after this list). The init process agent (created during LinuxContainer.start()) goes idle after the createProcess/startProcess RPCs complete, so the 30-minute idle timeout would close that connection's socket. This explains crash 1 (the reconnection path), but 30 minutes doesn't match the ~107-minute timing of crashes 2 and 3.

  5. Two VMs, different crashes. The app creates two VMs at launch. Crash 1 was on one VM's gRPC client, crashes 2 and 3 were on the other's.
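To make observation 4 concrete, this is roughly what that configuration amounts to in grpc-swift 1.x terms (a sketch of my reading, not verbatim Vminitd.Client code; fd and group are placeholders):

import GRPC
import NIOCore
import NIOPosix

let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let fd: NIOBSDSocket.Handle = -1  // placeholder: the duplicated vsock fd

var config = ClientConnection.Configuration.default(
    target: .connectedSocket(fd),
    eventLoopGroup: group
)
config.connectionBackoff = nil
// connectionIdleTimeout is not overridden, so it stays at the grpc-swift
// default of 30 minutes. When it fires, the socket is closed, but a
// .connectedSocket target can only "reconnect" by reusing the same,
// now-closed fd: the crash-1 path.
let connection = ClientConnection(configuration: config)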

Questions

  • Could the Virtualization framework be doing deferred cleanup of vsock endpoints after VZVirtioSocketConnection.close() is called in dupHandle()? The ~107-minute delay suggests something on a schedule rather than immediate invalidation.
  • Is there any internal timeout or resource reclamation in the vsock/XPC layer that could invalidate file descriptors for established connections?
  • Would disabling the grpc-swift idle timeout (setting connectionIdleTimeout to a very large value) and/or adding keepalive be worth trying, to narrow down whether the idle machinery is involved? A sketch of that experiment follows below.
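On the last question, a minimal sketch of that experiment, reusing the placeholders from the sketch above (the durations are arbitrary values chosen for narrowing things down, not recommendations):

var experiment = ClientConnection.Configuration.default(
    target: .connectedSocket(fd),
    eventLoopGroup: group
)
experiment.connectionBackoff = nil
// Effectively disable the idle timeout so grpc-swift never closes the
// socket for inactivity.
experiment.connectionIdleTimeout = .hours(24)
// Ping periodically so a dead transport shows up as a detectable
// connection failure instead of an EBADF trap on the next write.
experiment.connectionKeepalive = ClientConnectionKeepalive(
    interval: .seconds(30),
    timeout: .seconds(10),
    permitWithoutCalls: true
)
let client = ClientConnection(configuration: experiment)

If crash 1 disappears but the ~107-minute writev crash persists, that would isolate the idle timeout as one bug and leave the external fd invalidation as a separate one.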

Expected behavior

Shouldn't crash

Environment

See above

Relevant log output

See above

Code of Conduct

  • I agree to follow this project's Code of Conduct
