
fix(gateway): surface missing VC++ Redistributable diagnostic (issue #884) #888

Draft
hazeone wants to merge 10 commits into main from cursor/issue-884-root-cause-ee26
Conversation


@hazeone hazeone commented Apr 22, 2026

Summary

Closes #884.

On Windows, OpenClaw 2026.4.x's bundled acpx plugin runs an embedded
ACP runtime probe at Gateway startup by spawning
npx @zed-industries/codex-acp@^0.11.x. That npm package re-executes a
Rust-native binary (@zed-industries/codex-acp-win32-x64) which depends
on the Microsoft Visual C++ 2015–2022 Redistributable
(VCRUNTIME140.dll / MSVCP140.dll / VCRUNTIME140_1.dll). When those
DLLs are missing, Windows terminates the child with
exit=3221225781 (0xC0000135 / STATUS_DLL_NOT_FOUND), the probe
never completes, and chat.history sits in "unavailable during gateway
startup" until the 35 s RPC timeout fires. All retries fail and the
chat UI appears frozen for a minute+ with no actionable hint.

This PR turns that cryptic failure into a clear, actionable prompt and
stops the chat UI from burning the full retry schedule.
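
The exit code in the summary can be checked by hand: 0xC0000135 read as an unsigned 32-bit integer is 3221225781. The following is a minimal sketch of how a stderr line carrying that code might be classified. The function name, the regular expression, and the log line format are assumptions for illustration; only the `{ code, rawLine, detail }` shape and the `ACPX_VC_REDIST_MISSING` code come from this PR, and the real detector presumably also checks that the line belongs to the acpx probe.

```typescript
// Shape taken from the PR's commit notes; field contents are illustrative.
type GatewayStartupDiagnostic = {
  code: "ACPX_VC_REDIST_MISSING";
  rawLine: string;
  detail: string;
};

// NTSTATUS STATUS_DLL_NOT_FOUND. As an unsigned 32-bit exit code it is
// 0xC0000000 (3221225472) + 0x135 (309) = 3221225781.
const STATUS_DLL_NOT_FOUND = 0xc0000135;

// Hypothetical matcher: accepts "exit=3221225781" or "exit=0xC0000135".
const EXIT_PATTERN = /exit=(0x[0-9a-f]+|\d+)/i;

function classifyStartupStderrLine(line: string): GatewayStartupDiagnostic | null {
  const match = EXIT_PATTERN.exec(line);
  if (!match) return null;
  // Number() parses both decimal and 0x-prefixed hexadecimal strings.
  const exitCode = Number(match[1]);
  if (exitCode !== STATUS_DLL_NOT_FOUND) return null;
  return {
    code: "ACPX_VC_REDIST_MISSING",
    rawLine: line,
    detail: `exit=${exitCode} (0x${exitCode.toString(16).toUpperCase()})`,
  };
}
```

Returning `null` for unrecognised lines keeps the classifier safe to run on every stderr line; callers only react when a structured diagnostic comes back.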

Related Issue(s)

Closes #884

Type of Change

  • Bug fix
  • New feature (in-app diagnostic + Doctor host check)
  • Documentation

What changed

P0 — Detect and surface the problem

  • electron/gateway/startup-stderr.ts
    • Classifies Gateway startup stderr and returns a structured
      { code, rawLine, detail } when it recognises the
      exit=3221225781 (0xC0000135 / STATUS_DLL_NOT_FOUND) crash of the
      acpx probe.
  • electron/gateway/manager.ts
    • Stores active diagnostics per Gateway session, clears on every
      start(), emits a new diagnostic event, and exposes them via
      getStatus().activeDiagnostics so renderer consumers observe them
      through the existing status flow.
  • electron/main/ipc-handlers.ts + electron/preload/index.ts +
    src/lib/host-events.ts
    • Forward a new gateway:diagnostic IPC channel to the renderer.
  • src/types/gateway.ts + src/stores/gateway.ts
    • Type the new activeDiagnostics field and subscribe to
      gateway:diagnostic as a redundant signal.
  • src/components/diagnostics/GatewayDiagnosticsBanner.tsx (new)
    • Reusable amber banner for known diagnostic codes. Offers
      "Download VC++ Redistributable" (→ https://aka.ms/vs/17/release/vc_redist.x64.exe)
      and "Learn more" (→ Microsoft docs) as primary/secondary CTAs.
  • src/pages/Chat/index.tsx
    • Renders the banner above the existing chat error bar.
  • src/i18n/locales/{en,zh,ja,ru}/common.json
    • Adds common.diagnostics.acpxVcRedistMissing.{title,body,downloadButton,learnMoreButton}.
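
The diagnostic lifecycle described for electron/gateway/manager.ts (store per session, clear on start(), emit an event, expose through getStatus()) could look roughly like the sketch below. The class name, the listener API, and the status fields other than `activeDiagnostics` are hypothetical; this is an illustration of the lifecycle, not the actual GatewayManager.

```typescript
type DiagnosticSnapshot = { code: string; detail: string };
type GatewayStatus = { state: string; activeDiagnostics?: DiagnosticSnapshot[] };
type DiagnosticListener = (diagnostic: DiagnosticSnapshot) => void;

// Hypothetical stand-in for the manager; only the lifecycle matters here.
class DiagnosticTrackingManager {
  private status: GatewayStatus = { state: "stopped" };
  private diagnostics: DiagnosticSnapshot[] = [];
  private listeners: DiagnosticListener[] = [];

  onDiagnostic(listener: DiagnosticListener): void {
    this.listeners.push(listener);
  }

  start(): void {
    // Clear on every start so stale entries do not leak into a new session.
    this.diagnostics = [];
    this.status = { state: "starting" };
  }

  recordStartupDiagnostic(diagnostic: DiagnosticSnapshot): void {
    // Replace any entry with the same code so repeated stderr lines
    // do not produce duplicate banners.
    this.diagnostics = this.diagnostics
      .filter((existing) => existing.code !== diagnostic.code)
      .concat([diagnostic]);
    for (const listener of this.listeners) listener(diagnostic);
  }

  getStatus(): GatewayStatus {
    // Every snapshot handed out carries a copy of the active diagnostics.
    return { ...this.status, activeDiagnostics: this.diagnostics.slice() };
  }
}
```

Exposing the diagnostics through getStatus() lets consumers that only poll status see them without subscribing to a second channel.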

P1 — Don't block the chat UI

  • src/stores/chat/history-startup-retry.ts
    • New hasFatalStartupDiagnostic(status) helper.
    • shouldRetryStartupHistoryLoad now bails out immediately when the
      active diagnostics contain a fatal code.
  • src/stores/chat/history-actions.ts
    • After retries exhaust, suppresses the generic "Failed to load chat
      history" error bar when a fatal diagnostic is present — the banner
      already communicates the real root cause.
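
As a rough illustration of the bail-out, the helper below checks the active diagnostics for a fatal code before any retry is scheduled. The parameter lists, the `FATAL_CODES` list, and the attempt counting are assumptions; the real shouldRetryStartupHistoryLoad likely takes more context than shown here.

```typescript
type DiagnosticSnapshot = { code: string };
type GatewayStatus = { activeDiagnostics?: DiagnosticSnapshot[] };

// Assumption: this is the only code treated as fatal today.
const FATAL_CODES: readonly string[] = ["ACPX_VC_REDIST_MISSING"];

function hasFatalStartupDiagnostic(status: GatewayStatus | undefined): boolean {
  const active = status && status.activeDiagnostics ? status.activeDiagnostics : [];
  return active.some((diagnostic) => FATAL_CODES.indexOf(diagnostic.code) !== -1);
}

// Hypothetical signature, simplified to an attempt counter.
function shouldRetryStartupHistoryLoad(
  status: GatewayStatus | undefined,
  attempt: number,
  maxAttempts: number,
): boolean {
  // A fatal diagnostic needs user action, so retrying cannot help.
  if (hasFatalStartupDiagnostic(status)) return false;
  return attempt < maxAttempts;
}
```

Checking the diagnostic first means the retry schedule is skipped even on the very first attempt.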

P1 — Sanitiser regression guards

  • tests/unit/sanitize-config.test.ts
    • Locks in that the existing sanitiser preserves
      plugins.entries.acpx.enabled=false (user opt-out) and
      plugins.entries.acpx.config.probeAgent=<custom> (probe-agent
      override). Both were already correct; tests are guards.

P2 — Doctor Windows runtime check

  • electron/utils/windows-runtime-check.ts (new)
    • Probes System32 / SysWOW64 for the three MSVC DLLs. Returns
      applicable=false on non-Windows platforms so Doctor is a no-op
      elsewhere.
  • electron/utils/openclaw-doctor.ts
    • runOpenClawDoctor / runOpenClawDoctorFix now include
      hostChecks.msvcRuntime in the result.
  • src/pages/Settings/index.tsx
    • Renders a badge + actionable hint directing the user to
      vc_redist.x64.exe when the runtime is missing. Non-Windows shows
      nothing.
  • src/i18n/locales/{en,zh,ja,ru}/settings.json
    • New developer.doctorHostChecks / doctorMsvcOk /
      doctorMsvcMissing / doctorMsvcMissingHint.
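
The host check can be sketched as a pure function that takes the platform, the Windows directory, and a file-existence callback, which keeps it testable on any OS. The function name, the result fields other than `applicable`, and the injected callback are assumptions for illustration; the DLL names and the System32 / SysWOW64 locations come from this PR.

```typescript
type MsvcRuntimeCheck =
  | { applicable: false }
  | { applicable: true; ok: boolean; missing: string[] };

const MSVC_DLLS = ["VCRUNTIME140.dll", "MSVCP140.dll", "VCRUNTIME140_1.dll"];

// Hypothetical signature: `exists` is injected so tests need no real filesystem.
function checkMsvcRuntime(
  platform: string,
  systemRoot: string,
  exists: (filePath: string) => boolean,
): MsvcRuntimeCheck {
  // Non-Windows hosts report "not applicable" so callers can render nothing.
  if (platform !== "win32") return { applicable: false };

  const dirs = [`${systemRoot}\\System32`, `${systemRoot}\\SysWOW64`];
  // A DLL counts as present if it is found in either directory.
  const missing = MSVC_DLLS.filter(
    (dll) => !dirs.some((dir) => exists(`${dir}\\${dll}`)),
  );
  return { applicable: true, ok: missing.length === 0, missing };
}
```

In a Node or Electron main process this could be called with `process.platform`, the `SystemRoot` environment variable, and `fs.existsSync`.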

Docs

  • README.md, README.zh-CN.md, README.ja-JP.md, README.ru-RU.md
    • New Troubleshooting section explaining exit=3221225781, the
      PowerShell one-liner fix, the new banner/Doctor behaviour, and the
      plugins.entries.acpx.enabled=false opt-out.

Validation

  • pnpm run lint — 0 errors (1 pre-existing warning in
    Chat/index.tsx unrelated to this PR).
  • pnpm typecheck (npx tsc --noEmit) — clean.
  • pnpm test: 542/542 passing (added 19 new tests across
    gateway-startup-stderr.test.ts, gateway-diagnostics-banner.test.tsx,
    history-startup-retry.test.ts, windows-runtime-check.test.ts, plus 2
    new tests in sanitize-config.test.ts).
  • ✅ Playwright E2E subset passes with xvfb-run -a npx playwright test:
    • tests/e2e/gateway-diagnostics-banner.spec.ts (new) — banner renders.
    • tests/e2e/chat-history-startup-retry.spec.ts — no regression.
    • tests/e2e/app-smoke.spec.ts, tests/e2e/main-navigation.spec.ts:
      no regression.
  • pnpm build:vite completes cleanly.

Visual proof

Banner rendered in the Electron app (captured via the E2E spec with
CLAWX_BANNER_SCREENSHOT=...):

ACPX_VC_REDIST_MISSING banner in Chat page

Checklist

  • I ran relevant checks/tests locally (lint, typecheck, 542 unit
    tests, 6 E2E tests).
  • I updated docs (all four README variants).
  • I verified there are no unrelated changes in this PR.
  • Added an Electron E2E spec for the new UI surface (per
    AGENTS.md "UI change validation" rule).
  • Comms paths not touched — no comms:replay / comms:compare
    needed.


cursoragent and others added 10 commits April 22, 2026 04:17
Detect the Windows STATUS_DLL_NOT_FOUND (0xC0000135 / exit=3221225781)
crash of the embedded acpx ACP runtime probe introduced in
openclaw 2026.4. This is the root cause of issue #884 where the MSVC
2015-2022 runtime is missing and chat.history stalls waiting on the
plugin to become ready.

Returns a structured { code, rawLine, detail } when recognised so
the main process and renderer can react (surface a banner, bail out
of retries, etc).  No behaviour change yet -- detector is unused.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
When the startup stderr detector fires a GatewayStartupDiagnostic, store
it on the manager, emit a 'diagnostic' event, and surface it via
getStatus().activeDiagnostics so renderer consumers pick it up through
the existing status flow without extra subscriptions.

Diagnostics are cleared at the start of every start() cycle so stale
entries from a previous session don't leak into a fresh one.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
- ipc-handlers: forward new GatewayManager 'diagnostic' event
- preload: whitelist 'gateway:diagnostic' channel
- host-events: map 'gateway:diagnostic' to the IPC channel
- types: export GatewayStartupDiagnosticSnapshot / code
- store: subscribe to diagnostic events and merge into status.activeDiagnostics

Co-authored-by: Haze <hazeone@users.noreply.github.com>
- Add reusable <GatewayDiagnosticsBanner /> component that reads
  status.activeDiagnostics from the gateway store and renders one
  amber banner per known diagnostic (ACPX_VC_REDIST_MISSING today).
- Banner offers 'Download VC++ Redistributable' CTA (aka.ms/vs/17
  /release/vc_redist.x64.exe) and a learn-more link to Microsoft
  docs, both opened via window.electron.openExternal.
- Localise banner strings in en, zh, ja, ru under common.json.
- Wire the banner into the Chat page just above the existing error bar.
- Add unit test covering render/no-render and link click behaviour.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
…active

When the Gateway stderr classifier raises a fatal diagnostic such as
ACPX_VC_REDIST_MISSING, the plugin startup will never recover without
user action (installing the VC++ Redistributable).  Stop burning 90+
seconds on the retry schedule -- the banner already communicates the
root cause.  Also suppress the generic 'Failed to load chat history'
error bar when such a diagnostic is present so the user is not
confused by two messages.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
Add two regression tests to sanitize-config covering issue #884:

- plugins.entries.acpx.enabled=false must survive sanitisation so
  users who work around the broken embedded ACP probe by disabling
  the plugin don't have it re-enabled every launch.
- plugins.entries.acpx.config.probeAgent override (e.g. switching
  the codex probe to claude) must be preserved.

Both were already correctly preserved by the production sanitiser;
these tests lock in that behaviour.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
- electron/utils/windows-runtime-check.ts: probe System32 / SysWOW64
  for VCRUNTIME140.dll, MSVCP140.dll, VCRUNTIME140_1.dll on Windows.
- openclaw-doctor: run the probe after 'openclaw doctor' completes
  and attach hostChecks.msvcRuntime to the result.
- Settings page: render a badge + hint directing the user to
  vc_redist.x64.exe when the runtime is missing.  Non-Windows users
  see nothing (applicable=false).
- Localise new strings in en, zh, ja, ru.
- Unit tests for the MSVC probe (all permutations incl. SysWOW64
  fallback).

Co-authored-by: Haze <hazeone@users.noreply.github.com>
Document issue #884 (exit=3221225781 / STATUS_DLL_NOT_FOUND from the
embedded acpx codex probe), the one-click fix (vc_redist.x64.exe), the
new in-app banner + Doctor check, and the opt-out config for users who
don't need the embedded ACP runtime.

Added to all four README variants (en, zh-CN, ja-JP, ru-RU).

Co-authored-by: Haze <hazeone@users.noreply.github.com>
- Playwright spec asserts the ACPX_VC_REDIST_MISSING banner renders
  when the main process status includes the diagnostic.
- Remove unused vi import in windows-runtime-check test to satisfy
  eslint no-unused-vars.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
Allows CI / reviewers to grab a one-off screenshot of the new diagnostic
banner for PR review without modifying the test.  No behaviour change
by default (env var is opt-in).

Co-authored-by: Haze <hazeone@users.noreply.github.com>

ashione commented Apr 22, 2026

Review comments from issue #884 follow-up:

  1. Blocking before merge: GatewayManager.recordStartupDiagnostic() emits gateway:diagnostic and then calls setStatus({}), but the current GatewayStateController callback forwards the raw status object from stateController to gateway:status. That raw status does not include activeDiagnostics, because activeDiagnostics is only added by GatewayManager.getStatus(). In the renderer, src/stores/gateway.ts replaces the whole store status on each gateway:status event, so the status event can clear the diagnostic that the previous gateway:diagnostic event just merged. The banner may flash and disappear.

Suggested fix: after assigning this.status = status in the GatewayManager constructor's emitStatus hook, emit this.getStatus() instead of the raw status, or otherwise ensure every emitted status includes active diagnostics. Please also add a regression test that simulates diagnostic event + status event ordering and asserts status.activeDiagnostics survives.
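
The ordering problem and the suggested fix can be reproduced in isolation. The sketch below models the renderer store as two pure reducers and compares a raw status snapshot with an enriched one. All function names are hypothetical stand-ins for the behaviour described in this comment, not the actual store or manager code.

```typescript
type Diagnostic = { code: string };
type GatewayStatus = { state: string; activeDiagnostics?: Diagnostic[] };

// Renderer side as described above: a diagnostic event merges into the
// status, while a status event replaces the whole status object.
function onDiagnosticEvent(status: GatewayStatus, incoming: Diagnostic): GatewayStatus {
  const others = (status.activeDiagnostics || []).filter((d) => d.code !== incoming.code);
  return { ...status, activeDiagnostics: others.concat([incoming]) };
}

function onStatusEvent(_previous: GatewayStatus, next: GatewayStatus): GatewayStatus {
  return next;
}

// Main-process side: the suggested fix enriches every emitted snapshot.
function enrichStatus(raw: GatewayStatus, active: Diagnostic[]): GatewayStatus {
  return { ...raw, activeDiagnostics: active.slice() };
}

const diagnostic: Diagnostic = { code: "ACPX_VC_REDIST_MISSING" };
const rawStatus: GatewayStatus = { state: "starting" };

// Diagnostic event arrives first and is merged into the store.
const afterDiagnostic = onDiagnosticEvent({ state: "starting" }, diagnostic);

// Buggy path: the following raw status event wipes the merged diagnostic.
const afterRawStatus = onStatusEvent(afterDiagnostic, rawStatus);

// Fixed path: the same status event, enriched before it is emitted.
const afterEnrichedStatus = onStatusEvent(
  afterDiagnostic,
  enrichStatus(rawStatus, [diagnostic]),
);
```

A regression test along these lines would assert that `activeDiagnostics` is still populated after the status event, which holds only on the enriched path.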

  2. Link polish: Microsoft Learn currently lists https://aka.ms/vc14/vc_redist.x64.exe as the latest supported x64 Visual C++ v14 Redistributable permalink: https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170. The PR uses https://aka.ms/vs/17/release/vc_redist.x64.exe; it likely works, but using the documented v14 permalink would align the banner/docs with the official table.

Validation I ran locally on origin/cursor/issue-884-root-cause-ee26:

  • pnpm vitest run tests/unit/gateway-startup-stderr.test.ts tests/unit/history-startup-retry.test.ts tests/unit/windows-runtime-check.test.ts tests/unit/gateway-diagnostics-banner.test.tsx -> passed, 4 files / 28 tests.
  • pnpm run ext:bridge && pnpm run typecheck -> passed.

@ashione ashione self-requested a review April 22, 2026 09:13

@ashione ashione left a comment


Please read the review comments and fix the related issues.


ashione commented Apr 22, 2026

Deep review pass notes:

I re-reviewed the full PR surface: Gateway stderr classifier, GatewayManager diagnostic lifecycle, IPC/preload/host-events plumbing, chat retry suppression, Settings Doctor host check, banner UI, docs, and the new unit/E2E tests. I did not find another clear blocking issue beyond the diagnostic-status overwrite already called out.

The main blocker is still the event lifecycle bug: recordStartupDiagnostic() emits gateway:diagnostic, then setStatus({}) emits a raw status snapshot without activeDiagnostics; src/stores/gateway.ts replaces the whole status on every gateway:status, so the banner can be immediately cleared. Please fix this before merge by ensuring emitted status snapshots are enriched through GatewayManager.getStatus() or equivalent, and add a regression test for diagnostic event + subsequent status event ordering.

Non-blocking but recommended:

  • Add/adjust tests so they cover the actual GatewayManager -> status event pipeline, not only the happy path where /api/gateway/status is mocked with activeDiagnostics already present. The current E2E test would pass even if the real event path drops diagnostics.
  • Align the VC++ download URL in banner/docs/settings with Microsoft Learn's current permalink (https://aka.ms/vc14/vc_redist.x64.exe) or document why the vs/17/release URL is preferred.
  • Consider whether repeated diagnostics should re-emit status when occurrence counts change. Not required for the banner, but useful if the UI/logs ever expose occurrence count.

Validation from the earlier local pass remains applicable to current head 51e6591: focused unit tests passed and typecheck passed after generating the extension bridge.

// Re-emit status so subscribers that snapshot `status.activeDiagnostics`
// pick up the change without requiring them to subscribe to the
// `diagnostic` event separately.
this.setStatus({});

This re-emitted status is the place where the diagnostic can be lost. setStatus({}) goes through GatewayStateController.emitStatus, which currently emits the raw state-controller status; that snapshot does not include activeDiagnostics because those are only added by GatewayManager.getStatus(). The renderer then replaces its whole gateway status on gateway:status, so this event can clear the diagnostic that gateway:diagnostic just merged and the banner may disappear. Please ensure emitted status snapshots are enriched with active diagnostics (for example emit this.getStatus() after assigning this.status = status) and add a regression for this ordering.

};

await installIpcMocks(app, {
gatewayStatus: diagnosticStatus,

This test seeds the renderer with an already-enriched gatewayStatus, so it exercises banner rendering but not the real failure-prone path: GatewayManager detects stderr, emits gateway:diagnostic, then emits gateway:status. Please add coverage for that event ordering (or a lower-level GatewayManager/store test) so a raw status event without activeDiagnostics cannot clear the diagnostic from the store.


Development

Successfully merging this pull request may close these issues.

[Bug]: chat history can't load, embedded acpx runtime backend probe failed: embedded ACP runtime probe failed exited before initialize completed

3 participants