Skip to content

fix(producer): guard fileServer.close in all render cleanup paths#1406

Merged
jrusso1020 merged 2 commits into
heygen-com:mainfrom
calcarazgre646:fix/producer-fileserver-close-guard
Jun 16, 2026
Merged

fix(producer): guard fileServer.close in all render cleanup paths#1406
jrusso1020 merged 2 commits into
heygen-com:mainfrom
calcarazgre646:fix/producer-fileserver-close-guard

Conversation

@calcarazgre646

@calcarazgre646 calcarazgre646 commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Problem

fileServer.close() is called bare (no try/catch) in several render cleanup paths. FileServerHandle.close tears down the underlying http.Server, whose close() throws ERR_SERVER_NOT_RUNNING if the server was already torn down (for example a cancellation path that already closed it once). An unguarded throw escapes the cleanup and masks the original plan/render result.

The immediately adjacent probe-session close already guards against exactly this, and its comment spells it out:

// Close inside a try/catch — leaking a Chrome process here would mask
// the original plan() result on cancellation paths.

The file-server closes next to it were left unguarded. There were three:

  • distributed/plan.ts (probe cleanup),
  • distributed/renderChunk.ts (finally block),
  • renderOrchestrator.ts (the in-process render success path, right before the assemble stage — a throw here would skip assembly and mask the result).

Fix

Add closeFileServerSafely(fileServer, label, log) in fileServer.ts, which wraps the close in a try/catch and logs, and route all three sites through it.

Tests

Unit tests for closeFileServerSafely: a throwing close() is swallowed and logged (does not propagate), and the happy path closes exactly once with no warning. plan test suite stays green. (The renderChunk byte-identical determinism test is host-Chrome dependent and soft-skips outside Dockerfile.test.)

`plan()` and `renderChunk()` both close the probe/chunk file server with a
bare `fileServer.close()` in their cleanup sequence. `FileServerHandle.close`
tears down the underlying http.Server, whose `close()` throws
`ERR_SERVER_NOT_RUNNING` if the server was already torn down (for example a
cancellation path that closed it once already). An unguarded throw there
escapes the cleanup and masks the original plan/render result, exactly the
failure the adjacent probe-session close already guards against with a
try/catch (its comment even spells this out).

Add `closeFileServerSafely`, which wraps the close in a try/catch and logs,
and route both cleanup sites through it so the two stay consistent and a
throwing close can never mask the real result. Covered by unit tests for
both the throwing and happy paths.
@calcarazgre646 calcarazgre646 changed the title fix(producer): guard fileServer.close in distributed cleanup paths fix(producer): guard fileServer.close in all render cleanup paths Jun 13, 2026

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed end-to-end. Small, correct defensive-hardening fix — approving.

Strengths

  • Correct root cause. FileServerHandle.close() calls http.Server.close() with no callback (fileServer.ts:671), and Node's net.Server.close() throws ERR_SERVER_NOT_RUNNING synchronously on a double-close — so a bare close in a finally/cleanup path genuinely can escape and mask the real result. A sync try/catch is exactly the right shape here.
  • Highest-value sites are renderChunk.ts:657 (bare close in a finally → classic finally-masks-the-original-exception) and the renderOrchestrator.ts:2277 success path (a throw there skips the assemble stage and fails an otherwise-successful render). Both now guarded, consistent with the adjacent probe-session try/catch the PR description cites.
  • Same-contract coverage is complete: all three direct producer close sites are routed through the guard; the cancel/error paths already go through safeCleanup / cleanupRenderResources (render/cleanup.ts); and probeStage.ts hands the handle back to the caller rather than closing it (header comment, lines 6–8) — so there's no missed site.
  • Tests pin both branches: throwing close → swallowed + logged with the [label] and error message; happy path → closes exactly once, no warning.

Nits (non-blocking)

  • Observability: the new helper logs at log.warn (fileServer.ts), but the pre-existing safeCleanup logs the same class of failure at log.debug (render/cleanup.ts:29). Since ERR_SERVER_NOT_RUNNING on a double-close is the expected, benign condition this guard exists for, the distributed/cancel paths will now emit WARN-level noise for a non-actionable event. Consider debug for consistency, or a short comment on why the success path warrants warn.
  • DRY: there are now two "swallow + log a failed close" helpers (safeCleanup vs closeFileServerSafely). The split is defensible (module layering — cleanup.ts imports orchestrator types, so importing it into fileServer.ts risks a cycle; and the new helper is sync while safeCleanup is async), but a one-line cross-reference comment would help future readers pick the right one.

Note (out of scope): packages/engine/src/services/fileServer.ts:106 has the same bare close: () => server.close() shape in a separate module — same latent throw if any engine consumer double-closes. Separate package; flagging as a possible follow-up, not a gap in this PR.

Verdict: APPROVE
Reasoning: Real synchronous-throw bug, confirmed against Node's net.Server.close() and the handle impl; all same-contract producer sites are covered; good tests; CI fully green. Only minor observability/DRY nits.

— Claude Code (pr-review)

@jrusso1020 jrusso1020 merged commit 1f9c70c into heygen-com:main Jun 16, 2026
42 of 60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants