fix(gateway): surface missing VC++ Redistributable diagnostic (issue #884) #888
hazeone wants to merge 10 commits into
Detect the Windows STATUS_DLL_NOT_FOUND (0xC0000135 / exit=3221225781) crash of the embedded acpx ACP runtime probe introduced in openclaw 2026.4. This is the root cause of issue #884 where the MSVC 2015-2022 runtime is missing and chat.history stalls waiting on the plugin to become ready. Returns a structured { code, rawLine, detail } when recognised so the main process and renderer can react (surface a banner, bail out of retries, etc). No behaviour change yet -- detector is unused. Co-authored-by: Haze <hazeone@users.noreply.github.com>
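A minimal sketch of how the classifier described in this commit might look. Only the diagnostic code `ACPX_VC_REDIST_MISSING`, the `{ code, rawLine, detail }` shape, the function name, and the three spellings of the exit code come from this PR; the regular expressions, the `detail` wording, and the assumption that the stderr line mentions both "acpx" and "codex" are illustrative guesses.

```typescript
// Hypothetical sketch; the real detector in electron/gateway/startup-stderr.ts
// may match different wording.
type GatewayStartupDiagnosticCode = 'ACPX_VC_REDIST_MISSING';

interface GatewayStartupDiagnostic {
  code: GatewayStartupDiagnosticCode;
  rawLine: string;
  detail: string;
}

// STATUS_DLL_NOT_FOUND as unsigned decimal, hex, and signed 32-bit decimal.
// The guards around the alternation avoid matching inside a longer number.
const DLL_NOT_FOUND_EXIT =
  /(?:^|[^0-9a-z])(?:3221225781|0xc0000135|-1073741515)(?![0-9a-f])/i;

// Assumed hints: the line names the acpx plugin and the codex adapter.
const ACPX_HINT = /acpx/i;
const CODEX_HINT = /codex/i;

function detectGatewayStartupDiagnostic(line: string): GatewayStartupDiagnostic | null {
  const recognised =
    ACPX_HINT.test(line) && CODEX_HINT.test(line) && DLL_NOT_FOUND_EXIT.test(line);
  if (!recognised) {
    return null;
  }
  return {
    code: 'ACPX_VC_REDIST_MISSING',
    rawLine: line,
    detail: 'STATUS_DLL_NOT_FOUND (0xC0000135): the MSVC 2015-2022 runtime appears to be missing',
  };
}
```

Requiring all three signals keeps the detector conservative: an unrelated process exiting with the same code, or an acpx failure with a different exit code, returns `null` and leaves the existing behaviour untouched.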
When the startup stderr detector fires a GatewayStartupDiagnostic, store it on the manager, emit a 'diagnostic' event, and surface it via getStatus().activeDiagnostics so renderer consumers pick it up through the existing status flow without extra subscriptions. Diagnostics are cleared at the start of every start() cycle so stale entries from a previous session don't leak into a fresh one. Co-authored-by: Haze <hazeone@users.noreply.github.com>
- ipc-handlers: forward new GatewayManager 'diagnostic' event
- preload: whitelist 'gateway:diagnostic' channel
- host-events: map 'gateway:diagnostic' to the IPC channel
- types: export GatewayStartupDiagnosticSnapshot / code
- store: subscribe to diagnostic events and merge into status.activeDiagnostics

Co-authored-by: Haze <hazeone@users.noreply.github.com>
- Add reusable <GatewayDiagnosticsBanner /> component that reads status.activeDiagnostics from the gateway store and renders one amber banner per known diagnostic (ACPX_VC_REDIST_MISSING today).
- Banner offers 'Download VC++ Redistributable' CTA (aka.ms/vs/17/release/vc_redist.x64.exe) and a learn-more link to Microsoft docs, both opened via window.electron.openExternal.
- Localise banner strings in en, zh, ja, ru under common.json.
- Wire the banner into the Chat page just above the existing error bar.
- Add unit test covering render/no-render and link click behaviour.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
…active

When the Gateway stderr classifier raises a fatal diagnostic such as ACPX_VC_REDIST_MISSING, the plugin startup will never recover without user action (installing the VC++ Redistributable). Stop burning 90+ seconds on the retry schedule -- the banner already communicates the root cause. Also suppress the generic 'Failed to load chat history' error bar when such a diagnostic is present so the user is not confused by two messages.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
Add two regression tests to sanitize-config covering issue #884:
- plugins.entries.acpx.enabled=false must survive sanitisation so users who work around the broken embedded ACP probe by disabling the plugin don't have it re-enabled every launch.
- plugins.entries.acpx.config.probeAgent override (e.g. switching the codex probe to claude) must be preserved.

Both were already correctly preserved by the production sanitiser; these tests lock in that behaviour.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
- electron/utils/windows-runtime-check.ts: probe System32 / SysWOW64 for VCRUNTIME140.dll, MSVCP140.dll, VCRUNTIME140_1.dll on Windows.
- openclaw-doctor: run the probe after 'openclaw doctor' completes and attach hostChecks.msvcRuntime to the result.
- Settings page: render a badge + hint directing the user to vc_redist.x64.exe when the runtime is missing. Non-Windows users see nothing (applicable=false).
- Localise new strings in en, zh, ja, ru.
- Unit tests for the MSVC probe (all permutations incl. SysWOW64 fallback).

Co-authored-by: Haze <hazeone@users.noreply.github.com>
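The probe in this commit can be sketched roughly as follows. The DLL names, the two directories, and the `applicable` flag come from the commit message; the function name `checkMsvcRuntime`, the `ok` / `missing` fields, and the injected `ProbeEnv` are assumptions made so the example stays self-contained and testable without touching the real file system.

```typescript
// Hypothetical sketch; the real electron/utils/windows-runtime-check.ts may differ.
const MSVC_DLLS = ['VCRUNTIME140.dll', 'MSVCP140.dll', 'VCRUNTIME140_1.dll'] as const;

interface MsvcRuntimeCheck {
  applicable: boolean; // false on non-Windows platforms
  ok: boolean;
  missing: string[];
}

// Injected environment (an assumption of this sketch) so the probe is a pure function.
interface ProbeEnv {
  platform: string;                      // e.g. process.platform
  systemRoot: string;                    // e.g. process.env.SystemRoot
  fileExists: (path: string) => boolean; // e.g. fs.existsSync
}

function checkMsvcRuntime(env: ProbeEnv): MsvcRuntimeCheck {
  if (env.platform !== 'win32') {
    return { applicable: false, ok: true, missing: [] };
  }
  const dirs = ['System32', 'SysWOW64'].map((dir) => `${env.systemRoot}\\${dir}`);
  // A DLL counts as present if it exists in either directory (SysWOW64 fallback).
  const missing = MSVC_DLLS.filter(
    (dll) => !dirs.some((dir) => env.fileExists(`${dir}\\${dll}`)),
  );
  return { applicable: true, ok: missing.length === 0, missing: [...missing] };
}
```

In an Electron main process the environment would presumably be wired up with `process.platform`, `process.env.SystemRoot`, and `fs.existsSync`; passing them in keeps the permutation tests mentioned above cheap to write.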
Document issue #884 (exit=3221225781 / STATUS_DLL_NOT_FOUND from the embedded acpx codex probe), the one-click fix (vc_redist.x64.exe), the new in-app banner + Doctor check, and the opt-out config for users who don't need the embedded ACP runtime. Added to all four README variants (en, zh-CN, ja-JP, ru-RU). Co-authored-by: Haze <hazeone@users.noreply.github.com>
- Playwright spec asserts the ACPX_VC_REDIST_MISSING banner renders when the main process status includes the diagnostic.
- Remove unused vi import in windows-runtime-check test to satisfy eslint no-unused-vars.

Co-authored-by: Haze <hazeone@users.noreply.github.com>
Allows CI / reviewers to grab a one-off screenshot of the new diagnostic banner for PR review without modifying the test. No behaviour change by default (env var is opt-in). Co-authored-by: Haze <hazeone@users.noreply.github.com>
Review comments from issue #884 follow-up:
Suggested fix: after assigning
Validation I ran locally on
ashione
left a comment
Read the comments and fix the related issues.
Deep review pass notes:

I re-reviewed the full PR surface: Gateway stderr classifier, GatewayManager diagnostic lifecycle, IPC/preload/host-events plumbing, chat retry suppression, Settings Doctor host check, banner UI, docs, and the new unit/E2E tests. I did not find another clear blocking issue beyond the diagnostic-status overwrite already called out.

The main blocker is still the event lifecycle bug:

Non-blocking but recommended:
Validation from the earlier local pass remains applicable to current head
// Re-emit status so subscribers that snapshot `status.activeDiagnostics`
// pick up the change without requiring them to subscribe to the
// `diagnostic` event separately.
this.setStatus({});
This re-emitted status is the place where the diagnostic can be lost. setStatus({}) goes through GatewayStateController.emitStatus, which currently emits the raw state-controller status; that snapshot does not include activeDiagnostics because those are only added by GatewayManager.getStatus(). The renderer then replaces its whole gateway status on gateway:status, so this event can clear the diagnostic that gateway:diagnostic just merged and the banner may disappear. Please ensure emitted status snapshots are enriched with active diagnostics (for example emit this.getStatus() after assigning this.status = status) and add a regression for this ordering.
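One way to read this suggestion is sketched below. It is a simplified stand-in and not the PR's code: the class `GatewayManagerSketch`, its listener list, and `recordDiagnostic` are hypothetical, and the real `GatewayManager` delegates emission to `GatewayStateController`. The point it illustrates is that every emitted snapshot is built through `getStatus()`, so `activeDiagnostics` is always present.

```typescript
// Hypothetical, simplified stand-in for the manager's status emission.
interface StartupDiagnostic {
  code: string;
  rawLine: string;
}

interface GatewayStatus {
  state: string;
  activeDiagnostics: StartupDiagnostic[];
}

type StatusListener = (status: GatewayStatus) => void;

class GatewayManagerSketch {
  private state = 'stopped';
  private diagnostics: StartupDiagnostic[] = [];
  private readonly listeners: StatusListener[] = [];

  onStatus(listener: StatusListener): void {
    this.listeners.push(listener);
  }

  // Single source of truth for what subscribers see.
  getStatus(): GatewayStatus {
    return { state: this.state, activeDiagnostics: [...this.diagnostics] };
  }

  setStatus(patch: { state?: string }): void {
    if (patch.state !== undefined) {
      this.state = patch.state;
    }
    // Emit the enriched snapshot, never the raw controller state.
    const snapshot = this.getStatus();
    for (const listener of this.listeners) {
      listener(snapshot);
    }
  }

  recordDiagnostic(diagnostic: StartupDiagnostic): void {
    this.diagnostics.push(diagnostic);
    this.setStatus({}); // re-emit so snapshot consumers pick up the change
  }

  start(): void {
    this.diagnostics = []; // stale entries must not leak into a new cycle
    this.setStatus({ state: 'starting' });
  }
}
```

With this shape, a later status change after a diagnostic still carries the diagnostic, which is the ordering a regression test would need to pin down.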
};

await installIpcMocks(app, {
  gatewayStatus: diagnosticStatus,
This test seeds the renderer with an already-enriched gatewayStatus, so it exercises banner rendering but not the real failure-prone path: GatewayManager detects stderr, emits gateway:diagnostic, then emits gateway:status. Please add coverage for that event ordering (or a lower-level GatewayManager/store test) so a raw status event without activeDiagnostics cannot clear the diagnostic from the store.
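A lower-level store test for this ordering could target a pure merge function like the one sketched here. The function `applyGatewayEvent` and the event union are hypothetical, not the PR's store API, and the merge rule (a status event without `activeDiagnostics` keeps the current list, while an explicit list replaces it) is only one possible policy.

```typescript
// Hypothetical renderer-side merge; names and policy are assumptions.
interface RendererDiagnostic {
  code: string;
  rawLine: string;
}

interface RendererGatewayStatus {
  state: string;
  activeDiagnostics?: RendererDiagnostic[];
}

type GatewayHostEvent =
  | { channel: 'gateway:status'; status: RendererGatewayStatus }
  | { channel: 'gateway:diagnostic'; diagnostic: RendererDiagnostic };

function applyGatewayEvent(
  current: RendererGatewayStatus,
  event: GatewayHostEvent,
): RendererGatewayStatus {
  if (event.channel === 'gateway:diagnostic') {
    const incoming = event.diagnostic;
    // De-duplicate by code so a repeated stderr line does not stack banners.
    const others = (current.activeDiagnostics ?? []).filter((d) => d.code !== incoming.code);
    return { ...current, activeDiagnostics: [...others, incoming] };
  }
  // A raw status without the field must not wipe diagnostics already merged;
  // an explicit list (even an empty one) is treated as authoritative.
  return {
    ...event.status,
    activeDiagnostics: event.status.activeDiagnostics ?? current.activeDiagnostics ?? [],
  };
}
```

Under this policy, clearing diagnostics relies on the main process sending an explicit empty list, for example at the start of a new `start()` cycle.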
Summary
Closes #884.
On Windows, OpenClaw 2026.4.x's bundled `acpx` plugin runs an embedded ACP runtime probe at Gateway startup by spawning `npx @zed-industries/codex-acp@^0.11.x`. That npm package re-executes a Rust-native binary (`@zed-industries/codex-acp-win32-x64`) which depends on the Microsoft Visual C++ 2015–2022 Redistributable (`VCRUNTIME140.dll` / `MSVCP140.dll` / `VCRUNTIME140_1.dll`). When those DLLs are missing, Windows terminates the child with `exit=3221225781` (`0xC0000135` / `STATUS_DLL_NOT_FOUND`), the probe never completes, and `chat.history` sits in "unavailable during gateway startup" until the 35 s RPC timeout fires. All retries fail and the chat UI appears frozen for a minute+ with no actionable hint.
This PR turns that cryptic failure into a clear, actionable prompt and
stops the chat UI from burning the full retry schedule.
Related Issue(s)
Closes #884
Type of Change
What changed
P0 — Detect and surface the problem
- `electron/gateway/startup-stderr.ts`: `detectGatewayStartupDiagnostic(line)` recognises the issue #884 stderr line (embedded acpx probe + codex adapter + DLL-not-found exit code in any of `3221225781` / `0xc0000135` / `-1073741515`) and returns a structured `{ code: 'ACPX_VC_REDIST_MISSING', ... }` diagnostic.
- `electron/gateway/manager.ts`: stores detected diagnostics, clears them at the start of every `start()`, emits a new `diagnostic` event, and exposes them via `getStatus().activeDiagnostics` so renderer consumers observe them through the existing status flow.
- `electron/main/ipc-handlers.ts` + `electron/preload/index.ts` + `src/lib/host-events.ts`: forward the new `gateway:diagnostic` IPC channel to the renderer.
- `src/types/gateway.ts` + `src/stores/gateway.ts`: add the `activeDiagnostics` field and subscribe to `gateway:diagnostic` as a redundant signal.
- `src/components/diagnostics/GatewayDiagnosticsBanner.tsx` (new): amber banner with "Download VC++ Redistributable" (→ https://aka.ms/vs/17/release/vc_redist.x64.exe) and "Learn more" (→ Microsoft docs) as primary/secondary CTAs.
- `src/pages/Chat/index.tsx`: wires the banner in just above the existing error bar.
- `src/i18n/locales/{en,zh,ja,ru}/common.json`: `common.diagnostics.acpxVcRedistMissing.{title,body,downloadButton,learnMoreButton}`.

P1 — Don't block the chat UI
- `src/stores/chat/history-startup-retry.ts`: adds a `hasFatalStartupDiagnostic(status)` helper. `shouldRetryStartupHistoryLoad` now bails out immediately when the active diagnostics contain a fatal code.
- `src/stores/chat/history-actions.ts`: suppresses the generic "Failed to load chat history" error bar when a fatal diagnostic is present, because the banner already communicates the real root cause.
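The bail-out described above might look roughly like this. The two function names and the `ACPX_VC_REDIST_MISSING` code come from this PR; the set of fatal codes, the `StatusLike` type, and the `attempt` / `maxAttempts` parameters are assumptions, since the real retry helper's signature is not shown here.

```typescript
// Hypothetical sketch of the retry short-circuit; signatures are assumed.
const FATAL_STARTUP_DIAGNOSTIC_CODES = new Set<string>(['ACPX_VC_REDIST_MISSING']);

interface StatusLike {
  activeDiagnostics?: { code: string }[];
}

function hasFatalStartupDiagnostic(status: StatusLike | null | undefined): boolean {
  const diagnostics = status?.activeDiagnostics ?? [];
  return diagnostics.some((d) => FATAL_STARTUP_DIAGNOSTIC_CODES.has(d.code));
}

function shouldRetryStartupHistoryLoad(
  status: StatusLike | null | undefined,
  attempt: number,
  maxAttempts: number,
): boolean {
  // A fatal diagnostic needs user action, so further retries only add delay.
  if (hasFatalStartupDiagnostic(status)) {
    return false;
  }
  return attempt < maxAttempts;
}
```

Keeping the fatal codes in one set means the error-bar suppression in `history-actions.ts` could call the same helper instead of duplicating the check.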
P1 — Sanitiser regression guards
- `tests/unit/sanitize-config.test.ts`: regression tests for `plugins.entries.acpx.enabled=false` (user opt-out) and `plugins.entries.acpx.config.probeAgent=<custom>` (probe-agent override). Both were already correct; tests are guards.
P2 — Doctor Windows runtime check
- `electron/utils/windows-runtime-check.ts` (new): probes `System32` / `SysWOW64` for the three MSVC DLLs. Returns `applicable=false` on non-Windows platforms so Doctor is a no-op elsewhere.
- `electron/utils/openclaw-doctor.ts`: `runOpenClawDoctor` / `runOpenClawDoctorFix` now include `hostChecks.msvcRuntime` in the result.
- `src/pages/Settings/index.tsx`: renders a badge + hint directing the user to `vc_redist.x64.exe` when the runtime is missing. Non-Windows shows nothing.
- `src/i18n/locales/{en,zh,ja,ru}/settings.json`: `developer.doctorHostChecks` / `doctorMsvcOk` / `doctorMsvcMissing` / `doctorMsvcMissingHint`.

Docs
- `README.md`, `README.zh-CN.md`, `README.ja-JP.md`, `README.ru-RU.md`: document issue #884 with the PowerShell one-liner fix, the new banner/Doctor behaviour, and the `plugins.entries.acpx.enabled=false` opt-out.

Validation
- `pnpm run lint`: 0 errors (1 pre-existing warning in `Chat/index.tsx` unrelated to this PR).
- `pnpm typecheck` (`npx tsc --noEmit`): clean.
- `pnpm test`: 542/542 passing (added 19 new tests across `gateway-startup-stderr.test.ts`, `gateway-diagnostics-banner.test.tsx`, `history-startup-retry.test.ts`, `windows-runtime-check.test.ts`, plus 2 new tests in `sanitize-config.test.ts`).
- `xvfb-run -a npx playwright test`:
  - `tests/e2e/gateway-diagnostics-banner.spec.ts` (new): banner renders.
  - `tests/e2e/chat-history-startup-retry.spec.ts`: no regression.
  - `tests/e2e/app-smoke.spec.ts`, `tests/e2e/main-navigation.spec.ts`: no regression.
- `pnpm build:vite`: clean.

Visual proof
Banner rendered in the Electron app (captured via the E2E spec with `CLAWX_BANNER_SCREENSHOT=...`):

[Image: ACPX_VC_REDIST_MISSING banner in Chat page]
Checklist
tests, 6 E2E tests).
AGENTS.md "UI change validation" rule).
`comms:replay` / `comms:compare` needed.