feat(review): Cursor + OpenCode review engines (unified marker-review) by backnotprop · Pull Request #959 · backnotprop/plannotator

backnotprop · 2026-06-24T14:57:43Z

What

Adds Cursor CLI and OpenCode as background review engines, alongside Claude and Codex. They are full equals — default and custom reviews, line / whole-file / general findings, the same icon-button launcher — behind one shared implementation with no duplication. Their only distinction is a large, dynamically-discovered, multi-provider model list.

Supersedes #912 (the Cursor-only PR). This branch contains all of that work plus OpenCode and the unification, rebased on current main.

The engines

Cursor — agent -p --mode ask --output-format stream-json --stream-partial-output --trust --sandbox enabled [--model …]
OpenCode — opencode run --format json --agent plan [--model provider/model] --dir …

Both have no schema-output flag, so findings come back as a <plannotator-review-json> block parsed out of prose (same approach, validated against each CLI's real NDJSON envelope). That's what makes them a single "marker engine" family.

One shared implementation (no duplication)

packages/server/marker-review.ts holds everything once — the line-buffered NDJSON reducer, last-block marker extraction, schema validation (nullable file/line), the finding→annotation transform, the log formatter, and the split methodology/output-contract prompt. Per-engine variation is a ~6-field descriptor (binary, buildArgv, extractText, modelsArgv, parseModels, author). Capability detection and model discovery in agent-jobs.ts (Bun + Pi) loop over MARKER_ENGINES rather than hand-written pairs.

This deleted the earlier cursor-review.ts + opencode-review.ts (≈1,900 lines of duplication) in favor of the shared module.

First-class with Claude/Codex

Findings: route through the shared classifyFindingPlacement() → line / whole-file / general, nothing dropped (identical to Claude).
Reviews: participate in the review-profile system — built-in default or a custom review skill. composeMarkerReviewPrompt always appends the marker output contract (even for a custom profile — otherwise parsing would silently fail), and the resolved reviewProfileId/reviewProfileLabel are stored on the job.
Models: live catalog from agent models / opencode models, surfaced on the capability and rendered as a dropdown (the one place they differ from Claude/Codex's curated lists).
Failure: fail-closed by mutation (a missing/unparseable marker block fails the job; never throws).
Read-only posture: flags-only (--mode ask + --sandbox for Cursor; --agent plan for OpenCode) — not a verified sandbox (see caveats).

UI

Real Cursor and OpenCode brand marks (from the repo's existing assets) in the launcher.
The review Provider picker is the same icon-button row as Code Tour, over all four engines, sitting above the review selector.
Per-engine model controls preserved (Cursor/OpenCode get the dynamic dropdown).

Pi parity

Mirrored across the Bun and Pi servers (apps/pi-extension/server/*), marker-review.ts vendored via vendor.sh, and Pi-safe (zero Bun.* in the vendored module — the model-discovery spawn stays per-runtime).

Verification

bun run typecheck (vendor + tsc across shared/ai/server/ui/pi-extension): green.
marker-review.test.ts (shared machinery + both descriptors + nullable→general findings + profile composition) plus affected review-agent families: pass.
Built and installed locally (plannotator review).

Built and then run through an adversarial multi-agent review (Pi-safety, the always-appended marker contract, nullable findings, leftover duplication, UI). A subsequent fix in this PR also corrected the UI silently dropping the chosen review profile for Cursor/OpenCode launches.

⚠️ Honest caveats

Read-only is unverified. Cursor's --mode ask and OpenCode's --agent plan are documented read-only postures, not sandboxes, and were not empirically confirmed to block writes end-to-end. Ship as experimental until a human verifies on an authenticated account.
Not runtime-exercised through the UI by automation. The icon row, model dropdowns, and a custom review firing on Cursor/OpenCode should get a quick manual pass.

Notes for reviewers

The interesting file is packages/server/marker-review.ts; everything else is wiring or deletion.
review.ts / serverReview.ts collapse the previous two cursor/opencode branches into one profile-aware marker branch each.

…ew engine Makes Cursor/OpenCode first-class review engines, equal to Claude/Codex, with no duplication. Their only distinction stays the big dynamic model list. - packages/server/marker-review.ts (new): one shared module for 'marker engines' (CLIs with no schema flag -> findings parsed from a <plannotator-review-json> block). Per-engine variation is a ~6-field descriptor (binary, buildArgv, extractText, modelsArgv, parseModels, author); everything else is shared once: nullable findings, validator, line-buffered NDJSON reducer, last-block parser, transformMarkerFindings (via classifyFindingPlacement -> line/whole-file/ general, drops nothing, same as Claude), log formatter, and the split methodology/output-contract prompt. Pi-safe (zero Bun.*). - DELETED cursor-review.ts + opencode-review.ts (+ tests): ~1900 lines of duplication gone. - Finding parity: Cursor/OpenCode now emit line / whole-file / general findings. - Review profiles: composeMarkerReviewPrompt ALWAYS appends the marker output contract (even for a custom profile, or parsing would silently fail); review.ts + Pi serverReview.ts collapse two branches into one profile-aware marker branch storing reviewProfileId/Label, fail-closed by mutation. - agent-jobs (Bun+Pi): capability + model discovery loop over MARKER_ENGINES. - UI: CursorIcon + OpenCodeIcon; renderEngineSelect generalized; the review engine picker is now the same icon-button row as Tour (dropdown removed). Net +228/-1902. Typecheck green; marker-review 27 tests + affected families pass.

…ncher Replaces the placeholder geometric icons with the repo's actual brand SVGs: Cursor from packages/ui/components/icons/app/cursor.svg, OpenCode from the marketing icon-opencode-dark.svg (same vector ProviderIcons.tsx uses).

…e review selector

…ches buildReviewLaunch spread the chosen reviewProfileId for claude/codex but not cursor/opencode, so a custom review was silently dropped before reaching the server and the marker engines always ran the built-in default. Spread ...review into both branches.

…, harden Bug: - ReviewAgentJobDetailPanel: OpenCode jobs labeled 'Shell' -> 'OpenCode'. Robustness: - Marker model discovery is now lazy + cached (first /capabilities request), not a synchronous spawn during handler construction — no startup stall. Bun+Pi. - Marker validator rejects non-integer/NaN/Infinity line numbers and clamps confidence to [0,1]. - UI reconciles a stale saved Cursor/OpenCode model id against the live catalog (collapses to auto/Default) so a stale cookie can't fail a launch. Dedup: - Shared review-findings.ts: ReviewFinding + transformSeverityFindings (via classifyFindingPlacement). Claude and the marker engines now delegate to it; deleted both copies. Vendored for Pi. - Marker onJobComplete reuses the shared ingest() decoration (ingest now returns its result; the marker path adds its fail-closed check). Bun+Pi. - AgentsTab: one renderMarkerEngineConfig helper for the cursor/opencode blocks. - Engine display name moved onto the MarkerEngine descriptor (dropped the duplicate MARKER_ENGINE_NAMES map); fail-closed guard uses MARKER_ENGINES lookup. Declined: the 'rewrite all four engines into a provider-adapter registry' verdict — out of scope (rewrites working Claude/Codex/Tour; different transport). Typecheck green; 89 tests pass.

…load The stale-model reconcile effects ran before capabilities loaded, when the model list is just the fallback — so a valid saved model (e.g. a prior gpt-5.2) got wiped to auto/Default before the live catalog arrived. Gate both effects on the engine being available, like the engine-reconcile effect already does.

- extractLastMarkerBlock: a dangling opener after a complete block now fails closed (returns null) instead of falling back to the earlier block — prevents a truncated final block from surfacing stale or echoed prompt-example findings as a passed review. - opencodeParseModels: accept nested-slash model ids (e.g. OpenRouter's openrouter/deepseek/deepseek-chat-v3) instead of dropping them from the picker. - marker-review: MarkerSeverity/MarkerFinding/MarkerReviewAnnotationInput now alias the shared ReviewSeverity/ReviewFinding/ReviewAnnotationInput from review-findings.ts (no byte-identical copies that can drift). - Pi agent-jobs: genericize the buildCommand JSDoc; fix the fail-closed comment to say 'Cursor and OpenCode'. Typecheck green; 55 tests pass (incl. a new truncation test).

…dings types Self-review follow-up: ClaudeSeverity/ClaudeFinding were the third byte-identical copy of ReviewSeverity/ReviewFinding (marker's were aliased in the prior commit). Claude now aliases the shared types too — one finding shape, no drift.

…g to parse) Real Cursor run on PR #959 produced a valid review but the job FAILED: the agent was reviewing this very module, so its prose contained the bare string '<plannotator-review-json>'. The static-tag parser latched onto that mention, matched it to the real block's close, and returned prose+JSON -> JSON.parse threw -> null -> failed job. This is the sentinel-in-text risk reviewers flagged, now reproduced live. Fix: the marker payload is delimited by per-job nonce-bearing tags '<plannotator-review-json:NONCE> ... </plannotator-review-json:NONCE>'. The nonce is generated at prompt-build time (makeMarkerNonce), embedded in the output contract the model copies, and recovered from the stored job.prompt at parse time (extractMarkerNonce). Bare/echoed tags lack the nonce and are ignored; absent nonce fails closed. extractLastMarkerBlock now takes the open/close tags; parse/compose take the nonce. Threaded through Bun (review.ts) and Pi (serverReview.ts). Tests: nonce round-trip + a real-run regression (bare static sentinels in prose no longer corrupt extraction). Typecheck green; 42 marker/Pi tests pass.

… blocks the event loop discoverMarkerModels used Bun.spawnSync (Bun) / execFileSync (Pi) on the GET /api/agents/capabilities request path. With both 'agent' and 'opencode' on PATH but slow/unauthenticated, the first capabilities request blocked the single-threaded event loop for up to ~10s (2x5s), freezing all other in-flight requests (diff, annotations, AI streams). Swap to async spawn (Bun.spawn / child_process.execFile) and discover both engines in parallel via Promise.all; the handler already awaits. Same 5s timeout, same fail-to-empty behavior, no pre-warming or retries. Typecheck green; 42 tests pass.

backnotprop added 11 commits June 23, 2026 20:56

feat(review): add Cursor + OpenCode review engines (squashed for rebase)

26f6a9f

feat(review): label the engine picker 'Provider' and move it above th…

9c0e59c

…e review selector

backnotprop merged commit e138c82 into main Jun 24, 2026
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(review): Cursor + OpenCode review engines (unified marker-review)#959

feat(review): Cursor + OpenCode review engines (unified marker-review)#959
backnotprop merged 11 commits into
mainfrom
feat/opencode-cli

backnotprop commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

backnotprop commented Jun 24, 2026

What

The engines

One shared implementation (no duplication)

First-class with Claude/Codex

UI

Pi parity

Verification

⚠️ Honest caveats

Notes for reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant