
[vitest-pool-workers] DO inputGateBroken fires ~150×/run as a normal lifecycle event #13761

@webbertakken

Description

Note: This issue was written by my clanker. While I try to verify its claims as much as possible, it may not be 100% correct about everything.


What I observed

Running vitest run (no watch, clean working tree, no editor auto-save, source committed) on a Workers test suite emits ~162 occurrences of:

workerd/jsg/exception.c++:146: info: Annotating with brokenness;
internalMessage = jsg.Error: /…/src/index.ts changed,
invalidating this Durable Object.
Please retry the `DurableObjectStub#fetch()` call.;
brokennessReason = broken.inputGateBroken

The source files do not actually change during the run. The message is misleading.

Suite shape (using the recommended config)

  • 113 test files, 1322 tests
  • pool: '@cloudflare/vitest-pool-workers'
  • singleWorker: true ← documented as the recommendation for sharing module cache
  • isolatedStorage: false
  • wrangler: { configPath: './wrangler.toml' }
  • @cloudflare/vitest-pool-workers 0.12.x · wrangler 4.45.0 · vitest 3.2.4 · workerd 1.20251011.0

No WorkersPoolOptions setting (verified in dist/pool/config.d.ts) exposes a way to disable this behaviour, so this is a structural cost in the pool, not a misconfiguration.
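
For concreteness, the pool options above correspond to a vitest.config.ts along these lines (a sketch of our setup, not a prescription; defineWorkersConfig is the documented entry point):

import { defineWorkersConfig } from '@cloudflare/vitest-pool-workers/config'

export default defineWorkersConfig({
  test: {
    poolOptions: {
      workers: {
        singleWorker: true,        // documented recommendation for sharing the module cache
        isolatedStorage: false,
        wrangler: { configPath: './wrangler.toml' },
      },
    },
  },
})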

What's actually happening

In dist/worker/lib/cloudflare/test-internal.mjs, every DO method invocation goes through kEnsureInstance, which re-imports the worker module via instance.executor.executeId(specifier) (vite-node) and reference-equality-checks the constructor against the cached one:

Wrapper.prototype[kEnsureInstance] = async function () {
  // Re-import the worker's main module through vite-node on every invocation
  const mainModule = await importModule(env, mainPath)
  const constructor = mainModule[className]
  // Cache the constructor reference on first call...
  this[kInstanceConstructor] ??= constructor
  // ...and break the input gate whenever a later import returns a different reference
  if (this[kInstanceConstructor] !== constructor) {
    await ctx.blockConcurrencyWhile(() => {
      throw new Error(`${mainPath} changed, invalidating this Durable Object. ...`)
    })
  }
  // ...
}

Vite-node returns a fresh constructor reference whenever it re-evaluates the module — which it does between test files for isolation. Reference-equality fails → the DO is marked broken and the inputGateBroken exception is logged.
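
The identity failure is easy to reproduce outside the pool. A hypothetical standalone sketch, using query-string cache busting to mimic vite-node's re-evaluation (counter.mjs is an assumed file exporting class Counter):

// Two evaluations of the same source produce distinct class objects,
// so an identity check like the one in kEnsureInstance must fail.
const a = await import('./counter.mjs?v=1')  // first evaluation
const b = await import('./counter.mjs?v=2')  // fresh evaluation, new module scope

console.log(a.Counter === b.Counter)              // false: same code, different reference
console.log(new a.Counter() instanceof b.Counter) // false, for the same reason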

Why it matters

  1. Log noise: 100–200 multi-line stack traces per clean run, drowning out actual test output and making real failures harder to spot.
  2. Misleading message: "<path> changed, invalidating this Durable Object" is wrong — the file didn't change. New users hit this and assume their code is broken or their working tree is dirty when it isn't.
  3. Forces a retry wrapper on every consumer: every call site of DurableObjectStub#fetch() ends up needing retry-with-backoff. With naive backoff (e.g. 2000 ms × 3 retries), worst-case sleep on a 76-second suite is 162 × 3 × 2 s = 972 s — over 16 minutes of pure sleep added per run. Even modest backoff (200 ms) adds ~97 s.
  4. Compounds with other slowdowns: combined with a missing testTimeout and uncapped vitest forks, we observed a single test run take 5000+ s without completion.

Proposed direction (ordered by risk, low → high)

  1. Improve the error message: "<path> changed" is factually wrong when the trigger is module-cache invalidation between test files, not a source-file change. Something like "Durable Object instance invalidated by test isolation between files; please retry" would set correct expectations and stop sending users on a fruitless source-of-truth hunt.

  2. Lower the log level for this specific path — when kEnsureInstance detects a constructor mismatch, it's expected during normal test isolation, not a true exception. console.debug or a custom [vitest-pool-workers] channel would let users opt-in via a flag without flooding stderr.

  3. Make the retry transparent inside the pool: kEnsureInstance could itself drop the cached constructor and re-instantiate on mismatch, returning a fresh stub seamlessly (see the sketch below). Then user code never sees inputGateBroken during normal runs and no retry wrapper is needed at the call sites.
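
A rough sketch of what option 3 could look like inside kEnsureInstance (hypothetical: kInstance is an assumed name for the cached instance slot, and the real internals keep more state than shown):

// Sketch only: adopt the re-evaluated constructor instead of breaking the input gate.
if (this[kInstanceConstructor] !== constructor) {
  this[kInstanceConstructor] = constructor    // accept the fresh reference from vite-node
  this[kInstance] = undefined                 // drop the stale instance so it is rebuilt below
}
this[kInstance] ??= new constructor(ctx, env) // re-instantiate with the current class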

Workaround we landed

Cap the backoff in our DO-stub retry wrapper. This took the worst-case sleep added per run from ~972 s (2000 ms cap) to ~97 s (200 ms cap) without affecting production semantics; production never sees this path because real workerd doesn't re-import modules.

// apps/api/src/core/durable-objects/durableObjectClient.ts
const maxAttempts = 3
const baseBackoffMs = 100
const maxBackoffMs = 200  // was 2000 — production transient blips still recover
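
The wrapper around these constants is roughly the following. A simplified sketch with illustrative names (retryStubFetch and the string-based error matching are not our exact production code):

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function retryStubFetch(
  stub: DurableObjectStub,
  input: RequestInfo,
  init?: RequestInit,
): Promise<Response> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await stub.fetch(input, init)
    } catch (error) {
      lastError = error
      // Only retry the invalidation path; rethrow anything else immediately.
      if (!String(error).includes('invalidating this Durable Object')) throw error
      // Exponential backoff, capped so a clean test run doesn't accumulate minutes of sleep.
      await sleep(Math.min(baseBackoffMs * 2 ** attempt, maxBackoffMs))
    }
  }
  throw lastError
}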

Repro

Happy to share a minimal repro repo on request. The smallest reproducer would be a worker exporting a Durable Object that does any internal work, plus 100+ trivial test files each making a single stub.fetch() call. The error count scales roughly linearly with file count.
