Skip to content

feat(review): Cursor + OpenCode review engines (unified marker-review)#959

Merged
backnotprop merged 11 commits into
mainfrom
feat/opencode-cli
Jun 24, 2026
Merged

feat(review): Cursor + OpenCode review engines (unified marker-review)#959
backnotprop merged 11 commits into
mainfrom
feat/opencode-cli

Conversation

@backnotprop

Copy link
Copy Markdown
Owner

What

Adds Cursor CLI and OpenCode as background review engines, alongside Claude and Codex. They are full equals — default and custom reviews, line / whole-file / general findings, the same icon-button launcher — behind one shared implementation with no duplication. Their only distinction is a large, dynamically-discovered, multi-provider model list.

Supersedes #912 (the Cursor-only PR). This branch contains all of that work plus OpenCode and the unification, rebased on current main.

The engines

  • Cursoragent -p --mode ask --output-format stream-json --stream-partial-output --trust --sandbox enabled [--model …]
  • OpenCodeopencode run --format json --agent plan [--model provider/model] --dir …

Both have no schema-output flag, so findings come back as a <plannotator-review-json> block parsed out of prose (same approach, validated against each CLI's real NDJSON envelope). That's what makes them a single "marker engine" family.

One shared implementation (no duplication)

packages/server/marker-review.ts holds everything once — the line-buffered NDJSON reducer, last-block marker extraction, schema validation (nullable file/line), the finding→annotation transform, the log formatter, and the split methodology/output-contract prompt. Per-engine variation is a ~6-field descriptor (binary, buildArgv, extractText, modelsArgv, parseModels, author). Capability detection and model discovery in agent-jobs.ts (Bun + Pi) loop over MARKER_ENGINES rather than hand-written pairs.

This deleted the earlier cursor-review.ts + opencode-review.ts (≈1,900 lines of duplication) in favor of the shared module.

First-class with Claude/Codex

  • Findings: route through the shared classifyFindingPlacement()line / whole-file / general, nothing dropped (identical to Claude).
  • Reviews: participate in the review-profile system — built-in default or a custom review skill. composeMarkerReviewPrompt always appends the marker output contract (even for a custom profile — otherwise parsing would silently fail), and the resolved reviewProfileId/reviewProfileLabel are stored on the job.
  • Models: live catalog from agent models / opencode models, surfaced on the capability and rendered as a dropdown (the one place they differ from Claude/Codex's curated lists).
  • Failure: fail-closed by mutation (a missing/unparseable marker block fails the job; never throws).
  • Read-only posture: flags-only (--mode ask + --sandbox for Cursor; --agent plan for OpenCode) — not a verified sandbox (see caveats).

UI

  • Real Cursor and OpenCode brand marks (from the repo's existing assets) in the launcher.
  • The review Provider picker is the same icon-button row as Code Tour, over all four engines, sitting above the review selector.
  • Per-engine model controls preserved (Cursor/OpenCode get the dynamic dropdown).

Pi parity

Mirrored across the Bun and Pi servers (apps/pi-extension/server/*), marker-review.ts vendored via vendor.sh, and Pi-safe (zero Bun.* in the vendored module — the model-discovery spawn stays per-runtime).

Verification

  • bun run typecheck (vendor + tsc across shared/ai/server/ui/pi-extension): green.
  • marker-review.test.ts (shared machinery + both descriptors + nullable→general findings + profile composition) plus affected review-agent families: pass.
  • Built and installed locally (plannotator review).

Built and then run through an adversarial multi-agent review (Pi-safety, the always-appended marker contract, nullable findings, leftover duplication, UI). A subsequent fix in this PR also corrected the UI silently dropping the chosen review profile for Cursor/OpenCode launches.

⚠️ Honest caveats

  • Read-only is unverified. Cursor's --mode ask and OpenCode's --agent plan are documented read-only postures, not sandboxes, and were not empirically confirmed to block writes end-to-end. Ship as experimental until a human verifies on an authenticated account.
  • Not runtime-exercised through the UI by automation. The icon row, model dropdowns, and a custom review firing on Cursor/OpenCode should get a quick manual pass.

Notes for reviewers

  • The interesting file is packages/server/marker-review.ts; everything else is wiring or deletion.
  • review.ts / serverReview.ts collapse the previous two cursor/opencode branches into one profile-aware marker branch each.

…ew engine

Makes Cursor/OpenCode first-class review engines, equal to Claude/Codex, with
no duplication. Their only distinction stays the big dynamic model list.

- packages/server/marker-review.ts (new): one shared module for 'marker engines'
  (CLIs with no schema flag -> findings parsed from a <plannotator-review-json>
  block). Per-engine variation is a ~6-field descriptor (binary, buildArgv,
  extractText, modelsArgv, parseModels, author); everything else is shared once:
  nullable findings, validator, line-buffered NDJSON reducer, last-block parser,
  transformMarkerFindings (via classifyFindingPlacement -> line/whole-file/
  general, drops nothing, same as Claude), log formatter, and the split
  methodology/output-contract prompt. Pi-safe (zero Bun.*).
- DELETED cursor-review.ts + opencode-review.ts (+ tests): ~1900 lines of
  duplication gone.
- Finding parity: Cursor/OpenCode now emit line / whole-file / general findings.
- Review profiles: composeMarkerReviewPrompt ALWAYS appends the marker output
  contract (even for a custom profile, or parsing would silently fail); review.ts
  + Pi serverReview.ts collapse two branches into one profile-aware marker branch
  storing reviewProfileId/Label, fail-closed by mutation.
- agent-jobs (Bun+Pi): capability + model discovery loop over MARKER_ENGINES.
- UI: CursorIcon + OpenCodeIcon; renderEngineSelect generalized; the review
  engine picker is now the same icon-button row as Tour (dropdown removed).

Net +228/-1902. Typecheck green; marker-review 27 tests + affected families pass.
…ncher

Replaces the placeholder geometric icons with the repo's actual brand SVGs:
Cursor from packages/ui/components/icons/app/cursor.svg, OpenCode from the
marketing icon-opencode-dark.svg (same vector ProviderIcons.tsx uses).
…ches

buildReviewLaunch spread the chosen reviewProfileId for claude/codex but not
cursor/opencode, so a custom review was silently dropped before reaching the
server and the marker engines always ran the built-in default. Spread ...review
into both branches.
…, harden

Bug:
- ReviewAgentJobDetailPanel: OpenCode jobs labeled 'Shell' -> 'OpenCode'.

Robustness:
- Marker model discovery is now lazy + cached (first /capabilities request),
  not a synchronous spawn during handler construction — no startup stall. Bun+Pi.
- Marker validator rejects non-integer/NaN/Infinity line numbers and clamps
  confidence to [0,1].
- UI reconciles a stale saved Cursor/OpenCode model id against the live catalog
  (collapses to auto/Default) so a stale cookie can't fail a launch.

Dedup:
- Shared review-findings.ts: ReviewFinding + transformSeverityFindings (via
  classifyFindingPlacement). Claude and the marker engines now delegate to it;
  deleted both copies. Vendored for Pi.
- Marker onJobComplete reuses the shared ingest() decoration (ingest now returns
  its result; the marker path adds its fail-closed check). Bun+Pi.
- AgentsTab: one renderMarkerEngineConfig helper for the cursor/opencode blocks.
- Engine display name moved onto the MarkerEngine descriptor (dropped the
  duplicate MARKER_ENGINE_NAMES map); fail-closed guard uses MARKER_ENGINES lookup.

Declined: the 'rewrite all four engines into a provider-adapter registry' verdict
— out of scope (rewrites working Claude/Codex/Tour; different transport).

Typecheck green; 89 tests pass.
…load

The stale-model reconcile effects ran before capabilities loaded, when the model
list is just the fallback — so a valid saved model (e.g. a prior gpt-5.2) got
wiped to auto/Default before the live catalog arrived. Gate both effects on the
engine being available, like the engine-reconcile effect already does.
- extractLastMarkerBlock: a dangling opener after a complete block now fails
  closed (returns null) instead of falling back to the earlier block — prevents
  a truncated final block from surfacing stale or echoed prompt-example findings
  as a passed review.
- opencodeParseModels: accept nested-slash model ids (e.g. OpenRouter's
  openrouter/deepseek/deepseek-chat-v3) instead of dropping them from the picker.
- marker-review: MarkerSeverity/MarkerFinding/MarkerReviewAnnotationInput now
  alias the shared ReviewSeverity/ReviewFinding/ReviewAnnotationInput from
  review-findings.ts (no byte-identical copies that can drift).
- Pi agent-jobs: genericize the buildCommand JSDoc; fix the fail-closed comment
  to say 'Cursor and OpenCode'.

Typecheck green; 55 tests pass (incl. a new truncation test).
…dings types

Self-review follow-up: ClaudeSeverity/ClaudeFinding were the third byte-identical
copy of ReviewSeverity/ReviewFinding (marker's were aliased in the prior commit).
Claude now aliases the shared types too — one finding shape, no drift.
…g to parse)

Real Cursor run on PR #959 produced a valid review but the job FAILED: the
agent was reviewing this very module, so its prose contained the bare string
'<plannotator-review-json>'. The static-tag parser latched onto that mention,
matched it to the real block's close, and returned prose+JSON -> JSON.parse
threw -> null -> failed job. This is the sentinel-in-text risk reviewers flagged,
now reproduced live.

Fix: the marker payload is delimited by per-job nonce-bearing tags
'<plannotator-review-json:NONCE> ... </plannotator-review-json:NONCE>'. The nonce
is generated at prompt-build time (makeMarkerNonce), embedded in the output
contract the model copies, and recovered from the stored job.prompt at parse
time (extractMarkerNonce). Bare/echoed tags lack the nonce and are ignored;
absent nonce fails closed. extractLastMarkerBlock now takes the open/close tags;
parse/compose take the nonce. Threaded through Bun (review.ts) and Pi
(serverReview.ts).

Tests: nonce round-trip + a real-run regression (bare static sentinels in prose
no longer corrupt extraction). Typecheck green; 42 marker/Pi tests pass.
… blocks the event loop

discoverMarkerModels used Bun.spawnSync (Bun) / execFileSync (Pi) on the
GET /api/agents/capabilities request path. With both 'agent' and 'opencode' on
PATH but slow/unauthenticated, the first capabilities request blocked the
single-threaded event loop for up to ~10s (2x5s), freezing all other in-flight
requests (diff, annotations, AI streams).

Swap to async spawn (Bun.spawn / child_process.execFile) and discover both
engines in parallel via Promise.all; the handler already awaits. Same 5s
timeout, same fail-to-empty behavior, no pre-warming or retries. Typecheck
green; 42 tests pass.
@backnotprop backnotprop merged commit e138c82 into main Jun 24, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant