docs: record issue audit roadmap

dwgx · dwgx · commit c247c338310c · 2026-06-07T02:29:08.000+09:00
diff --git a/docs/MAINTAINER_NOTES.md b/docs/MAINTAINER_NOTES.md
@@ -0,0 +1,87 @@
+# Maintainer Notes
+
+These notes capture project operating rules that should survive context
+resets. They are not release notes.
+
+## Evidence Rules
+
+- Do not claim support from names, guesses, or encode/decode round trips. For
+  protocol work, require descriptor evidence, LS binary field evidence, or a
+  real redacted trace.
+- Do not widen production defaults from a single lab success. First add gated
+  smoke, logs, docs, and a rollback path.
+- Keep unsupported boundaries explicit. If a tool, model, media input, or
+  backend cannot be bridged safely, return a clear error instead of pretending
+  it is OpenAI-compatible.
+- When an issue is broad, keep it as a reproduction bucket and require logs.
+  Do not close it because a related bug was fixed elsewhere.
+
+## Native Bridge Rules
+
+- The production-default native bridge scope is the Bash family only:
+  `Bash`, `shell_command`, and `run_command`.
+- `Read`, `Grep`, `Glob`, `WebSearch`, and `WebFetch` are protocol-lab tools
+  until real traces confirm argument shape, result shape, and execution
+  boundary.
+- `WINDSURFAPI_NATIVE_TOOL_BRIDGE=all_mapped` is not a generic fix for "tools
+  not called". Use it only with explicit API key, account, model, and tool
+  gates.
+- Native bridge executes in the remote Windsurf workspace. Do not describe it
+  as local IDE/MCP/client tool execution.
+- Keep raw proto traces redacted by default. Raw string trace switches are for
+  gated lab runs only.
+
+## SWE / Special-Agent Rules
+
+- SWE-1.6 and SWE-1.6-fast are special-agent work unless a real official trace
+  proves direct Cascade chat support.
+- Do not mix SWE-1.6 with ordinary cloud catalog fixes.
+- Devin/ACP backends must be default-off, bounded, and text-only first.
+- Client-local tools and media must be rejected or explicitly bridged; never
+  silently execute them in a different workspace.
+
+## WebSearch / WebFetch Rules
+
+- Direct `GetWebSearchResults` is confirmed for WebSearch investigation.
+- No direct WebFetch/read-url API is confirmed. Do not implement one from a
+  guessed method name.
+- The observed WebFetch path is an LS `requested_interaction` plus
+  `HandleCascadeUserInteraction`, then a later trajectory step.
+- Do not bypass production VPS memory guards just to force a WebFetch canary.
+  Use an isolated memory-safe lab environment.
+
+## Release Rules
+
+- For code releases, update `package.json`, add release notes, run the focused
+  tests, run `npm run test:release`, run `npm run secret-scan`, and run full
+  shards when the blast radius is not trivial.
+- After tag push, verify GitHub CI, Release, Docker build, and deployed VPS
+  smoke before calling the release done.
+- VPS smoke should include `/health?verbose=1`, Docker image labels, `/v1/models`,
+  and one basic chat completion.
+- Verify the actual WindsurfAPI entrypoint before judging VPS health. In the
+  current VPS deployment the compose nginx entry is on `:3003`; public port 80
+  may be served by another stack and is not a WindsurfAPI health signal.
+- `/health` build metadata matters. If the commit is missing, fix build
+  metadata injection instead of relying only on image labels.
+
+## Security And Privacy Rules
+
+- Never write raw API keys, passwords, account credentials, session tokens, or
+  customer email lists into docs, release notes, issue comments, or logs.
+- Use hashes, counts, IDs, and redacted previews for diagnostics.
+- Run secret scan before release and before pushing documentation that touched
+  examples or operational notes.
+
+## Code And UI Rules
+
+- Prefer existing local helpers and patterns. Avoid new dependencies unless the
+  maintenance tradeoff is clearly worth it.
+- Keep patches scoped. Do not mix protocol reverse engineering, dashboard UI,
+  release workflow, and unrelated cleanup in one release unless there is a real
+  dependency.
+- Dashboard UI should stay operational and dense: pagination, summaries,
+  compact tables, predictable controls, and no marketing-style layout.
+- Dashboard interactions should use existing app confirmation/prompt patterns,
+  not native browser alerts.
+- Do not revert unrelated user or generated changes in the worktree.
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,15 @@
+# WindsurfAPI Docs
+
+High-signal operational documents:
+
+- [Maintainer Notes](MAINTAINER_NOTES.md): persistent quality, release,
+  security, native bridge, SWE, and WebFetch working rules.
+- [Audit 2026-06-07](audits/AUDIT_2026-06-07.md): current open issue triage,
+  priority order, SWE-1.6 plan, and WebSearch/WebFetch plan.
+- [Audit 2026-06-06](audits/AUDIT_2026-06-06.md): prior hardening audit for
+  release metadata, dashboard pagination, native bridge, and HTTP ingress.
+- [Native Bridge Protocol Notes](native-bridge-protocol-notes.md): protobuf and
+  runtime trace notes for native bridge protocol work.
+- [Dashboard i18n](dashboard-i18n.md): dashboard localization notes.
+
+Release-specific changes live under [releases](releases/).
diff --git a/docs/audits/AUDIT_2026-06-07.md b/docs/audits/AUDIT_2026-06-07.md
@@ -0,0 +1,146 @@
+# Audit 2026-06-07
+
+Scope: current GitHub issues, open PRs, recent release state, SWE-1.6,
+WebSearch/WebFetch, native bridge boundaries, and project working rules.
+
+## Baseline
+
+- Local and remote HEAD: `v2.0.142` (`72e1b9c`,
+  `fix: clean partial stream error tails`).
+- Working tree at audit start: clean.
+- Open PRs: none.
+- VPS deployment: healthy on `v2.0.142` through the WindsurfAPI compose entry
+  (`:3003` in the current VPS deployment). `/health?verbose=1` reports version
+  `2.0.142` and commit `72e1b9cf079e`; authenticated `/v1/models` and a basic
+  chat smoke returned HTTP 200 after deployment. Do not use the VPS public port
+  80 Apache/PHP 404 page as a health signal for this service.
+- Recent closed issue cluster: #191, #189, #176, and #180 were closed after
+  rate-limit / provider-deadline / cooldown handling was documented and
+  surfaced more clearly.
+- Current open issues: #190, #186, #185, #183, #178, #177, and #169.
+
+## Open Issue Triage
+
+| Issue | Current problem | Keep open because | Next evidence needed |
+| --- | --- | --- | --- |
+| #177 | Broad "model degraded / tool failures" bucket. Causes can include model family, tool schema size, tool-choice translation, prompt emulation limits, and upstream section behavior. | v2.0.141 added `ToolRoute[...]` diagnostics, but there is no single confirmed repro left to close the bucket. | Client, route, model, tool names/count, `ToolRoute[...]`, and `Probe[...]` logs for a failing request. |
+| #178 | "No tools get called" reports from Kilo Code, opencode, Codex-like clients. | The proxy can now distinguish stripped tools, forced missing tools, native gate misses, compacted preambles, and model narration without tool calls; reporters still need concrete logs. | Same as #177, plus whether native bridge was off, narrow, or `all_mapped`. |
+| #183 | Claude Code / OpenWebUI web-search flow can lose or repeat user input after search. | WebSearch/WebFetch LS-native protocol is still lab-only; the direct WebSearch API works, but WebFetch direct endpoint is not confirmed. | A memory-safe gated WebFetch/WebSearch canary with proto trace and `webFetchTrace.state`, not a production VPS guard bypass. |
+| #185 | Cursor truncation and stray JSON in streamed answers. | v2.0.142 fixed one concrete post-content error JSON tail, but upstream long-stream provider deadlines can still truncate content. | Reporter retest on `v2.0.142`; if JSON still appears, capture route, stream/non-stream, and debug request logs. |
+| #186 | Gemini / DeepSeek model request plus SWE-1.6 mention. | Normal model additions depend on Windsurf upstream/cloud catalog; SWE-1.6 is split to #190 and must not be tracked here as normal catalog work. | Upstream catalog evidence for Gemini/DeepSeek, or a model name that is present upstream but missing locally. |
+| #190 | SWE-1.6 / SWE-1.6-fast works in official tools but not in direct Cascade chat. | Direct Cascade reports unknown/missing model path behavior; this should be a special-agent / Devin / ACP POC, not a normal enum/UID fix. | Devin-capable text-only smoke, then ACP initialize/auth/session/prompt validation. |
+| #169 | Dashboard account card display mode request. | Product requirement is underspecified and lower priority than protocol/tool correctness. | Exact desired modes, for example compact rows, grouped-by-status, or grouped-by-model. |
+
+No currently open issue should be closed on the basis of the information above
+alone. Add a closing comment only when the specific acceptance condition is met
+and a released version is available.
+
+## Priority Order
+
+1. Keep #177/#178/#185 as reproduction buckets and require the new diagnostics
+   before making more tool-call claims. v2.0.141/v2.0.142 already improved
+   observability and one streaming edge; the next work is evidence collection,
+   not broad native-bridge enablement.
+2. Implement the SWE-1.6 special-agent POC as a separate backend. Start with
+   text-only Devin CLI print mode, default off, no local client tools, no media,
+   and bounded process/output limits.
+3. Continue WebSearch/WebFetch in a lab environment with enough LS memory
+   budget. Do not bypass the production VPS memory guard to force a canary.
+4. Keep Read/Grep/Glob/WebSearch/WebFetch out of the default production native
+   allowlist until protocol fields and runtime semantics are confirmed by real
+   traces.
+5. Address #169 after the protocol/tool work has stable evidence. Dashboard
+   pagination from #168 is already done; #169 is about additional view modes.
+6. Treat #186 as upstream/catalog watch work. Do not invent Gemini/DeepSeek
+   support before Windsurf exposes usable upstream catalog entries.
+
+## SWE-1.6 Plan
+
+SWE-1.6 is not a normal catalog patch. Do not "fix" it by adding or changing a
+Cascade UID unless a real official trace proves that direct Cascade accepts it.
+
+POC shape:
+
+- `WINDSURFAPI_SPECIAL_AGENT_BACKEND=devin-cli`
+- `swe-1.6` and `swe-1.6-fast` remain hidden from normal `/v1/models` unless
+  the special backend is explicitly enabled.
+- Initial mode is text-only. Requests with client-local tools or media should
+  return a clear unsupported-boundary error, not silently execute in a different
+  workspace.
+- Process management must have explicit max processes, timeout, output byte
+  limit, and account/session binding before production recommendation.
+- ACP mode is the second phase: initialize/auth/session/prompt first, then
+  permission handling. The default permission answer should be deny/cancel
+  until a safe mapping exists.
+
+Acceptance before closing #190:
+
+- A real `swe-1.6-fast` or `swe-1.6` smoke succeeds through the special-agent
+  backend in a Devin-capable environment.
+- `/health?verbose=1` exposes enough status to show that the backend is enabled
+  and bounded.
+- Negative smoke proves tools/media are rejected or handled explicitly.
+- Docs make clear this is not the same execution model as ordinary Cascade chat.
+
+## WebSearch/WebFetch Plan
+
+Current facts:
+
+- Direct `GetWebSearchResults` is confirmed and should remain the preferred
+  WebSearch investigation route.
+- No descriptor-backed direct WebFetch/read-url endpoint has been confirmed.
+  `RecordReadUrlContent` is not a fetch endpoint.
+- Official WebFetch flow appears to be LS `requested_interaction` plus
+  `HandleCascadeUserInteraction`, followed by a later trajectory step.
+- v2.0.141 added `webFetchTrace.state` summaries, but the VPS canary did not
+  send the request because LS capacity preflight refused with
+  `ls_capacity:memory_guard`.
+
+Next valid run:
+
+- Use an isolated or local environment with enough LS memory budget.
+- Gate by one API key, one account, one model, and one tool.
+- Enable proto trace and `WINDSURFAPI_NATIVE_TOOL_BRIDGE_WEBFETCH_AUTO_APPROVE`
+  only for an allowlisted safe origin such as `https://example.com`.
+- Success requires `completed_web_document` or an equivalent verified document
+  payload. `pending_permission`, `auto_run_decision_only`, natural-language
+  narration, or a repeated prompt does not count as success.
+
+Do not implement a direct WebFetch endpoint from a guessed method name. Do not
+add `WebSearch` or `WebFetch` to the production allowlist until the trace shows
+a real completed payload and the execution boundary is documented.
+
+## Native Bridge Boundary
+
+The mature production canary remains the Bash family:
+
+- `Bash`
+- `shell_command`
+- `run_command`
+
+Everything else is mapped for protocol lab work, not default production use:
+
+- `Read`
+- `Grep`
+- `Glob`
+- `WebSearch`
+- `WebFetch`
+
+Native bridge means remote Windsurf workspace execution. It is not a generic
+fix for local IDE tools, MCP tools, `apply_patch`, or arbitrary client-side
+tools. If a client mixes native-mapped tools with custom tools, prefer prompt
+emulation unless a narrow test proves the exact route.
+
+## Recent Release Context
+
+- v2.0.137: dashboard pagination, release scan hardening, bounded release gate.
+- v2.0.139: full shard stabilization and docs around native bridge boundaries.
+- v2.0.140: upstream cooldowns surfaced as real upstream cooldowns.
+- v2.0.141: `ToolRoute[...]` diagnostics and route-specific tool handling
+  improvements.
+- v2.0.142: partial stream error tails are cleaned so post-content error JSON
+  is not appended to already-visible streamed assistant content.
+
+This sequence improved observability and some concrete edge cases, but it did
+not make Read/WebFetch/SWE-1.6 production-ready. Future notes should preserve
+that distinction.
diff --git a/docs/native-bridge-protocol-notes.md b/docs/native-bridge-protocol-notes.md
@@ -286,6 +286,18 @@ valid canary must send `HandleCascadeUserInteraction` and then verify whether
 the same trajectory advances to `read_url_content.web_document`, an error step,
 or another requested interaction.
 
+State as of v2.0.141/v2.0.142:
+
+- `scripts/native-bridge-smoke.mjs` can summarize
+  `semantic.steps[].webFetchTrace.state` so a canary does not require manual
+  raw-trace reading for the first classification pass.
+- A narrow VPS WebFetch canary was prepared with API-key gating, one model, one
+  tool, and an allowlisted safe origin, but the request did not run because LS
+  capacity preflight refused with `ls_capacity:memory_guard`.
+- That memory-guard refusal is not WebFetch protocol evidence. The next valid
+  run must use an isolated or local environment with enough LS memory budget.
+  Do not bypass the production VPS guard just to force the canary.
+
 ## Direct Web Search API
 
 `GetWebSearchResults` is confirmed independently of the LS-native tool path: