feat(review): Cursor + OpenCode review engines (unified marker-review)#959
Merged
Conversation
…ew engine Makes Cursor/OpenCode first-class review engines, equal to Claude/Codex, with no duplication. Their only distinction stays the big dynamic model list. - packages/server/marker-review.ts (new): one shared module for 'marker engines' (CLIs with no schema flag -> findings parsed from a <plannotator-review-json> block). Per-engine variation is a ~6-field descriptor (binary, buildArgv, extractText, modelsArgv, parseModels, author); everything else is shared once: nullable findings, validator, line-buffered NDJSON reducer, last-block parser, transformMarkerFindings (via classifyFindingPlacement -> line/whole-file/ general, drops nothing, same as Claude), log formatter, and the split methodology/output-contract prompt. Pi-safe (zero Bun.*). - DELETED cursor-review.ts + opencode-review.ts (+ tests): ~1900 lines of duplication gone. - Finding parity: Cursor/OpenCode now emit line / whole-file / general findings. - Review profiles: composeMarkerReviewPrompt ALWAYS appends the marker output contract (even for a custom profile, or parsing would silently fail); review.ts + Pi serverReview.ts collapse two branches into one profile-aware marker branch storing reviewProfileId/Label, fail-closed by mutation. - agent-jobs (Bun+Pi): capability + model discovery loop over MARKER_ENGINES. - UI: CursorIcon + OpenCodeIcon; renderEngineSelect generalized; the review engine picker is now the same icon-button row as Tour (dropdown removed). Net +228/-1902. Typecheck green; marker-review 27 tests + affected families pass.
…ncher Replaces the placeholder geometric icons with the repo's actual brand SVGs: Cursor from packages/ui/components/icons/app/cursor.svg, OpenCode from the marketing icon-opencode-dark.svg (same vector ProviderIcons.tsx uses).
…e review selector
…ches buildReviewLaunch spread the chosen reviewProfileId for claude/codex but not cursor/opencode, so a custom review was silently dropped before reaching the server and the marker engines always ran the built-in default. Spread ...review into both branches.
…, harden Bug: - ReviewAgentJobDetailPanel: OpenCode jobs labeled 'Shell' -> 'OpenCode'. Robustness: - Marker model discovery is now lazy + cached (first /capabilities request), not a synchronous spawn during handler construction — no startup stall. Bun+Pi. - Marker validator rejects non-integer/NaN/Infinity line numbers and clamps confidence to [0,1]. - UI reconciles a stale saved Cursor/OpenCode model id against the live catalog (collapses to auto/Default) so a stale cookie can't fail a launch. Dedup: - Shared review-findings.ts: ReviewFinding + transformSeverityFindings (via classifyFindingPlacement). Claude and the marker engines now delegate to it; deleted both copies. Vendored for Pi. - Marker onJobComplete reuses the shared ingest() decoration (ingest now returns its result; the marker path adds its fail-closed check). Bun+Pi. - AgentsTab: one renderMarkerEngineConfig helper for the cursor/opencode blocks. - Engine display name moved onto the MarkerEngine descriptor (dropped the duplicate MARKER_ENGINE_NAMES map); fail-closed guard uses MARKER_ENGINES lookup. Declined: the 'rewrite all four engines into a provider-adapter registry' verdict — out of scope (rewrites working Claude/Codex/Tour; different transport). Typecheck green; 89 tests pass.
…load The stale-model reconcile effects ran before capabilities loaded, when the model list is just the fallback — so a valid saved model (e.g. a prior gpt-5.2) got wiped to auto/Default before the live catalog arrived. Gate both effects on the engine being available, like the engine-reconcile effect already does.
- extractLastMarkerBlock: a dangling opener after a complete block now fails closed (returns null) instead of falling back to the earlier block — prevents a truncated final block from surfacing stale or echoed prompt-example findings as a passed review. - opencodeParseModels: accept nested-slash model ids (e.g. OpenRouter's openrouter/deepseek/deepseek-chat-v3) instead of dropping them from the picker. - marker-review: MarkerSeverity/MarkerFinding/MarkerReviewAnnotationInput now alias the shared ReviewSeverity/ReviewFinding/ReviewAnnotationInput from review-findings.ts (no byte-identical copies that can drift). - Pi agent-jobs: genericize the buildCommand JSDoc; fix the fail-closed comment to say 'Cursor and OpenCode'. Typecheck green; 55 tests pass (incl. a new truncation test).
…dings types Self-review follow-up: ClaudeSeverity/ClaudeFinding were the third byte-identical copy of ReviewSeverity/ReviewFinding (marker's were aliased in the prior commit). Claude now aliases the shared types too — one finding shape, no drift.
…g to parse) Real Cursor run on PR #959 produced a valid review but the job FAILED: the agent was reviewing this very module, so its prose contained the bare string '<plannotator-review-json>'. The static-tag parser latched onto that mention, matched it to the real block's close, and returned prose+JSON -> JSON.parse threw -> null -> failed job. This is the sentinel-in-text risk reviewers flagged, now reproduced live. Fix: the marker payload is delimited by per-job nonce-bearing tags '<plannotator-review-json:NONCE> ... </plannotator-review-json:NONCE>'. The nonce is generated at prompt-build time (makeMarkerNonce), embedded in the output contract the model copies, and recovered from the stored job.prompt at parse time (extractMarkerNonce). Bare/echoed tags lack the nonce and are ignored; absent nonce fails closed. extractLastMarkerBlock now takes the open/close tags; parse/compose take the nonce. Threaded through Bun (review.ts) and Pi (serverReview.ts). Tests: nonce round-trip + a real-run regression (bare static sentinels in prose no longer corrupt extraction). Typecheck green; 42 marker/Pi tests pass.
… blocks the event loop discoverMarkerModels used Bun.spawnSync (Bun) / execFileSync (Pi) on the GET /api/agents/capabilities request path. With both 'agent' and 'opencode' on PATH but slow/unauthenticated, the first capabilities request blocked the single-threaded event loop for up to ~10s (2x5s), freezing all other in-flight requests (diff, annotations, AI streams). Swap to async spawn (Bun.spawn / child_process.execFile) and discover both engines in parallel via Promise.all; the handler already awaits. Same 5s timeout, same fail-to-empty behavior, no pre-warming or retries. Typecheck green; 42 tests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds Cursor CLI and OpenCode as background review engines, alongside Claude and Codex. They are full equals — default and custom reviews, line / whole-file / general findings, the same icon-button launcher — behind one shared implementation with no duplication. Their only distinction is a large, dynamically-discovered, multi-provider model list.
The engines
agent -p --mode ask --output-format stream-json --stream-partial-output --trust --sandbox enabled [--model …]opencode run --format json --agent plan [--model provider/model] --dir …Both have no schema-output flag, so findings come back as a
<plannotator-review-json>block parsed out of prose (same approach, validated against each CLI's real NDJSON envelope). That's what makes them a single "marker engine" family.One shared implementation (no duplication)
packages/server/marker-review.tsholds everything once — the line-buffered NDJSON reducer, last-block marker extraction, schema validation (nullablefile/line), the finding→annotation transform, the log formatter, and the split methodology/output-contract prompt. Per-engine variation is a ~6-field descriptor (binary,buildArgv,extractText,modelsArgv,parseModels,author). Capability detection and model discovery inagent-jobs.ts(Bun + Pi) loop overMARKER_ENGINESrather than hand-written pairs.This deleted the earlier
cursor-review.ts+opencode-review.ts(≈1,900 lines of duplication) in favor of the shared module.First-class with Claude/Codex
classifyFindingPlacement()→ line / whole-file / general, nothing dropped (identical to Claude).composeMarkerReviewPromptalways appends the marker output contract (even for a custom profile — otherwise parsing would silently fail), and the resolvedreviewProfileId/reviewProfileLabelare stored on the job.agent models/opencode models, surfaced on the capability and rendered as a dropdown (the one place they differ from Claude/Codex's curated lists).--mode ask+--sandboxfor Cursor;--agent planfor OpenCode) — not a verified sandbox (see caveats).UI
Pi parity
Mirrored across the Bun and Pi servers (
apps/pi-extension/server/*),marker-review.tsvendored viavendor.sh, and Pi-safe (zeroBun.*in the vendored module — the model-discovery spawn stays per-runtime).Verification
bun run typecheck(vendor + tsc across shared/ai/server/ui/pi-extension): green.marker-review.test.ts(shared machinery + both descriptors + nullable→general findings + profile composition) plus affected review-agent families: pass.plannotator review).Built and then run through an adversarial multi-agent review (Pi-safety, the always-appended marker contract, nullable findings, leftover duplication, UI). A subsequent fix in this PR also corrected the UI silently dropping the chosen review profile for Cursor/OpenCode launches.
--mode askand OpenCode's--agent planare documented read-only postures, not sandboxes, and were not empirically confirmed to block writes end-to-end. Ship as experimental until a human verifies on an authenticated account.Notes for reviewers
packages/server/marker-review.ts; everything else is wiring or deletion.review.ts/serverReview.tscollapse the previous two cursor/opencode branches into one profile-aware marker branch each.