
[vitest-pool-workers] DO inputGateBroken fires ~150×/run as a normal lifecycle event #13761

@webbertakken

Description

Note: This issue was written by my clanker. While I try to verify its claims as much as possible, it may not be 100% correct about everything.


What I observed

Running vitest run (no watch, clean working tree, no editor auto-save, source committed) on a Workers test suite emits ~162 occurrences of:

workerd/jsg/exception.c++:146: info: Annotating with brokenness;
internalMessage = jsg.Error: /…/src/index.ts changed,
invalidating this Durable Object.
Please retry the `DurableObjectStub#fetch()` call.;
brokennessReason = broken.inputGateBroken

The source files do not actually change during the run. The message is misleading.

Suite shape (using the recommended config)

  • 113 test files, 1322 tests
  • pool: '@cloudflare/vitest-pool-workers'
  • singleWorker: true ← documented as the recommendation for sharing module cache
  • isolatedStorage: false
  • wrangler: { configPath: './wrangler.toml' }
  • @cloudflare/vitest-pool-workers 0.12.x · wrangler 4.45.0 · vitest 3.2.4 · workerd 1.20251011.0

No WorkersPoolOptions setting (verified in dist/pool/config.d.ts) exposes a way to disable this behaviour, so this is a structural cost in the pool, not a misconfiguration.
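
For concreteness, the pool options above correspond to a vitest.config.ts along these lines (a sketch of our setup, not a prescription; defineWorkersConfig is the documented entry point):

import { defineWorkersConfig } from '@cloudflare/vitest-pool-workers/config'

export default defineWorkersConfig({
  test: {
    poolOptions: {
      workers: {
        singleWorker: true,        // documented recommendation for sharing the module cache
        isolatedStorage: false,
        wrangler: { configPath: './wrangler.toml' },
      },
    },
  },
})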

What's actually happening

In dist/worker/lib/cloudflare/test-internal.mjs, every DO method invocation goes through kEnsureInstance, which re-imports the worker module via instance.executor.executeId(specifier) (vite-node) and reference-equality-checks the constructor against the cached one:

Wrapper.prototype[kEnsureInstance] = async function () {
  // Re-import the worker's main module through vite-node on every invocation
  const mainModule = await importModule(env, mainPath)
  const constructor = mainModule[className]
  // Cache the constructor reference on first call...
  this[kInstanceConstructor] ??= constructor
  // ...and break the input gate whenever a later import returns a different reference
  if (this[kInstanceConstructor] !== constructor) {
    await ctx.blockConcurrencyWhile(() => {
      throw new Error(`${mainPath} changed, invalidating this Durable Object. ...`)
    })
  }
  // ...
}

Vite-node returns a fresh constructor reference whenever it re-evaluates the module — which it does between test files for isolation. Reference-equality fails → the DO is marked broken and the inputGateBroken exception is logged.
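
The identity failure is easy to reproduce outside the pool. A hypothetical standalone sketch, using query-string cache busting to mimic vite-node's re-evaluation (counter.mjs is an assumed file exporting class Counter):

// Two evaluations of the same source produce distinct class objects,
// so an identity check like the one in kEnsureInstance must fail.
const a = await import('./counter.mjs?v=1')  // first evaluation
const b = await import('./counter.mjs?v=2')  // fresh evaluation, new module scope

console.log(a.Counter === b.Counter)              // false: same code, different reference
console.log(new a.Counter() instanceof b.Counter) // false, for the same reason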

Why it matters

  1. Log noise: 100–200 multi-line stack traces per clean run, drowning out actual test output and making real failures harder to spot.
  2. Misleading message: "<path> changed, invalidating this Durable Object" is wrong — the file didn't change. New users hit this and assume their code is broken or their working tree is dirty when it isn't.
  3. Forces a retry wrapper on every consumer: every call site of DurableObjectStub#fetch() ends up needing retry-with-backoff. With naive backoff (e.g. 2000 ms × 3 retries), worst-case sleep on a 76-second suite is 162 × 3 × 2 s = 972 s — over 16 minutes of pure sleep added per run. Even modest backoff (200 ms) adds ~97 s.
  4. Compounds with other slowdowns: combined with a missing testTimeout and uncapped vitest forks, we observed a single test run take 5000+ s without completion.

Proposed direction (ordered by risk, low → high)

  1. Improve the error message: "<path> changed" is factually wrong when the trigger is module-cache invalidation between test files, not a source-file change. Something like "Durable Object instance invalidated by test isolation between files; please retry" would set correct expectations and stop sending users on a fruitless source-of-truth hunt.

  2. Lower the log level for this specific path — when kEnsureInstance detects a constructor mismatch, it's expected during normal test isolation, not a true exception. console.debug or a custom [vitest-pool-workers] channel would let users opt-in via a flag without flooding stderr.

  3. Make the retry transparent inside the pool: kEnsureInstance could itself drop the cached constructor and re-instantiate on mismatch, returning a fresh stub seamlessly (see the sketch below). Then user code never sees inputGateBroken during normal runs and no retry wrapper is needed at the call sites.
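
A rough sketch of what option 3 could look like inside kEnsureInstance (hypothetical: kInstance is an assumed name for the cached instance slot, and the real internals keep more state than shown):

// Sketch only: adopt the re-evaluated constructor instead of breaking the input gate.
if (this[kInstanceConstructor] !== constructor) {
  this[kInstanceConstructor] = constructor    // accept the fresh reference from vite-node
  this[kInstance] = undefined                 // drop the stale instance so it is rebuilt below
}
this[kInstance] ??= new constructor(ctx, env) // re-instantiate with the current class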

Workaround we landed

Cap the backoff in our DO-stub retry wrapper. This took the worst-case sleep added per run from ~972 s (2000 ms cap) to ~97 s (200 ms cap) without affecting production semantics; production never sees this path because real workerd doesn't re-import modules.

// apps/api/src/core/durable-objects/durableObjectClient.ts
const maxAttempts = 3
const baseBackoffMs = 100
const maxBackoffMs = 200  // was 2000 — production transient blips still recover
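
The wrapper around these constants is roughly the following. A simplified sketch with illustrative names (retryStubFetch and the string-based error matching are not our exact production code):

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function retryStubFetch(
  stub: DurableObjectStub,
  input: RequestInfo,
  init?: RequestInit,
): Promise<Response> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await stub.fetch(input, init)
    } catch (error) {
      lastError = error
      // Only retry the invalidation path; rethrow anything else immediately.
      if (!String(error).includes('invalidating this Durable Object')) throw error
      // Exponential backoff, capped so a clean test run doesn't accumulate minutes of sleep.
      await sleep(Math.min(baseBackoffMs * 2 ** attempt, maxBackoffMs))
    }
  }
  throw lastError
}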

Repro

Happy to share a minimal repro repo on request. The smallest reproducer would be a worker exporting a Durable Object that does any internal work, plus 100+ trivial test files each making a single stub.fetch() call. The error count scales roughly linearly with file count.
