Skip to content

Merge main into feat/multi-device-tests#128

Open
droid-ash wants to merge 119 commits intofeat/multi-device-testsfrom
merge/main-into-multi-device-tests
Open

Merge main into feat/multi-device-tests#128
droid-ash wants to merge 119 commits intofeat/multi-device-testsfrom
merge/main-into-multi-device-tests

Conversation

@droid-ash
Copy link
Copy Markdown
Contributor

@droid-ash droid-ash commented Apr 28, 2026

Summary

Pull main (117 commits since divergence) into feat/multi-device-tests so the multi-device path works against main's reshaped runtime.

  • 1 textual conflict in packages/cli/bin/finalrun.ts — both branches restructured the same import block. Adopted main's lazy resolveLocalRuntime() + runUpgrade wiring; regrafted the multi-device branch on top.
  • Dynamic-import the heavy multi-device modules (AIAgent, MultiDeviceTestExecutor, prepareMultiDeviceSession) inside runMultiDeviceTestCommand so the standalone Bun binary stays slim for cloud-only callers.
  • Migrated the multi-device path to main's per-feature model APIs: resolveApiKeys (multi-provider) instead of resolveApiKey, and the new AIAgent({ apiKeys, defaults, features }) shape. Workspace reasoning level now shows up in the run banner and flows through to the planner.

What landed from main (high level)

  • Bun-compiled binary distribution (npm publish dropped); finalrun upgrade
  • TypeScript 6 + Vite 8 migration; report-web rewritten Next.js → Vite
  • Lazy-loaded @finalrun/local-runtime package + @finalrun/cloud-core split
  • finalrun cloud test/suite/upload commands
  • Per-feature model + reasoning effort config (workspace YAML)
  • Visual grounding fallback feature, async step callbacks, Langfuse traces
  • Windows installer (install.ps1), Mintlify docs, CHANGELOG, release workflow

Test plan

  • npm install — clean install against new workspace layout (cloud-core, local-runtime added; package-lock removed)
  • npm run build — green across all workspaces (common, cloud-core, device-node, goal-executor, report-web, cli)
  • Smoke finalrun check .finalrun/multi-device/tests/<test>.yaml against a real workspace
  • Smoke finalrun test .finalrun/multi-device/tests/<test>.yaml against two real devices

Notes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Multi-provider AI support with per-feature model & reasoning overrides
    • Cloud test submission and app upload from the CLI
    • finalrun upgrade command and Windows PowerShell installer; improved curl|sh installer flow
    • Report web UI rebuilt as a client-side SPA for local viewing
  • Bug Fixes

    • Clearer, contextual error messages for config/provider issues
  • Documentation

    • Expanded installation, upgrade, workspace config, CLI reference, and troubleshooting guides

Srinidhi G S and others added 30 commits April 8, 2026 20:45
- CLI exits immediately after submission with run ID + status URL
- Hard-checks for HTTP 201 from server (fails on anything else)
- Sends runType (single_test/multi_test/suite), name, appFilename in FormData
- Captures full process.argv as the command string
- Removes pollRunUntilFinished, RunDetailsResponse, statusIcon, sleep
- Drops 'folder' runType — collapses to 'multi_test'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lift the all-or-nothing model config into a per-feature block so the
planner can run on a strong reasoning model while specialized grounders
use cheaper ones. Unified reasoning scale (minimal|low|medium|high)
maps to each provider's native knob (Google thinkingLevel, OpenAI
reasoningEffort, Anthropic effort); minimal stays OpenAI-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

perf(planner): slim hierarchy via planner/grounder split + AI call visibility
The merge with main pulled in new _summarizePlannerRequest /
_summarizeGrounderRequest helpers that referenced the old _provider
and _modelName fields. Route them through _resolveFeatureConfig so the
log line reflects the actual provider/model used for each feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…vels

Add three labeled recipes (single model, per-feature effort tuning,
mixed providers), a provider/reasoning matrix so minimal is visible as
OpenAI-only before runtime, and the planner=medium / grounder=low
fallback when reasoning is omitted. Cross-link from cli-reference.md
so users reading only the CLI flags discover the YAML surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Vercel AI SDK's @ai-sdk/anthropic adapter silently drops Output.json()
when no schema is supplied — Anthropic has no schema-less JSON mode and
tool-calling is the only enforced path. Claude Sonnet 4.6 then free-writes
multiple candidate JSONs, breaking safeParseJSON at the planner step.

Add a per-feature zod schema (new schemas.ts), select it when the resolved
provider is anthropic, and pass through Output.object({ schema }). OpenAI
and Google keep their working schema-less Output.json() path untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lidator

Anthropic's tool-schema validator rejects `minimum`/`maximum` keywords on
the `integer` type, and zod v4's .int() emits safe-integer bounds by
default — producing HTTP 400:

    output_config.format.schema: For 'integer' type, properties maximum,
    minimum are not supported

Switch index/x/y fields to plain z.number(). Downstream parsers already
coerce where needed. Document the constraint inline so nobody
reintroduces .int().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prompts tell the model to emit {"output": {...}} as a text-JSON
convention for OpenAI/Google. Duplicating that in the zod schema meant
Claude, asked to satisfy both, nested twice:

    {"output":{"output":{"thought":...,"action":...,"remember":[]}}}

Vercel AI SDK docs confirm: a schema passed to Output.object({ schema })
describes the structured data directly — tool inputs/output_format
payloads are not wrapped. Flatten every schema to its inner shape.

The planner and grounder parsers already accept both wrapped (OpenAI
and Google) and unwrapped (Anthropic) shapes via their existing fallback
branches, so no parser changes.

Add a test that rejects a payload with an outer output wrapper so the
bug can't creep back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @ai-sdk/anthropic adapter's getModelCapabilities() carries a
hardcoded list of which models support structured output. Newer models
(e.g. claude-opus-4-7) miss the list and fall through to the catch-all
that returns supportsStructuredOutput: false, which makes `auto` fall
back to a `json` tool wrapper. The wrapper collides with the prompt's
{output: {...}} convention and Claude emits {input:{output:{...}}},
failing schema validation.

Force structuredOutputMode: 'outputFormat' on every Anthropic call.
This bypasses the SDK's stale model list entirely — Anthropic's API
itself validates whether the model supports output_config.format and
returns a clean HTTP 400 if not. Forward-compatible with every Claude
4.5+ model without any model-version checks on our side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…effect

@ai-sdk/openai v3's openai(modelId) defaults to the Chat Completions API,
which silently ignores providerOptions.openai.reasoningEffort. Reasoning
models like gpt-5.4-mini only honor reasoning effort via the Responses
API. Use openai.responses(modelId) explicitly so the effort setting is
actually applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ture

Address several CodeRabbit comments as one cohesive tightening:

- Lift parseModel (+ SUPPORTED_AI_PROVIDERS, PROVIDER_ENV_VARS,
  MODEL_FORMAT_EXAMPLE, ParsedModel, SupportedProvider) into
  @finalrun/common and add an optional label parameter for error
  context. The CLI re-exports the names so existing imports keep
  working.
- _resolveFeatureConfig now calls parseModel with a
  `features.<name>.model` label instead of open-coding a looser split,
  so invalid per-feature overrides (empty halves, unsupported
  providers, whitespace quirks) fail with the same messages as the
  --model flag and workspace-level model.
- Narrow GrounderRequest.feature from string to FeatureName and drop
  the two `as FeatureName` casts; propagate the narrowing through
  ActionExecutor._groundToPoint and _groundTraceDetail so a typo can't
  silently reach schemaForFeature or the wrong prompt.
- Align resolveApiKeys with resolveApiKey: an empty/whitespace
  --api-key value falls through to env-var lookup instead of being
  accepted as the literal key.
- Persist workspace-level reasoning and per-feature overrides in
  run-context.json via writeRunInputs so mixed-provider runs are
  reproducible from artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
droid-ash and others added 22 commits April 27, 2026 00:24
The install section was carrying details that belong elsewhere:

  - "Downloads a self-contained finalrun binary into ~/.finalrun/bin/,
    adds it to your PATH, downloads the per-platform runtime tarball,
    prompts for Android/iOS, installs host tools..." — internals of
    install.sh, already documented in the script's header comment and
    in the auto-generated release notes (.github/release-notes-template.md).
  - "For CI / non-interactive environments..." — CI flag and env-var
    documentation, already in release notes and install.sh's --help.
  - "After install, run finalrun doctor..." — diagnostic guidance that
    fits better in finalrun --help / docs/troubleshooting.md.

A README install block should answer "how do I get this on my machine"
in one or two commands, not document the installer's behavior. Trimmed
to just the install one-liners per platform plus a single sentence on
the iOS-on-Windows constraint (preempts the most common support
question without expanding the section).

Adds Windows install instructions for the first time. Until v0.1.10,
the .ps1 installer didn't exist; v0.1.10 shipped it (commit 9d5b78e on
PR #120) and the .exe / runtime tarball are now on every release. The
README continues to point only at install.sh and ignores Windows —
this catches it up.

Net: 14 lines → 6 lines, +1 platform.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README: trim install section, add Windows
Root build ran `npm run build --workspaces`, which errored on
@finalrun/local-runtime because it only defines build:tarball (a
release-time packaging step, not a tsc compile). Add --if-present so
workspaces without a build script are skipped, matching the existing
test:workspaces pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…out-build-script

Fix npm run build by skipping workspaces without a build script
install.sh / install.ps1: after the interactive flow, check whether
FINALRUN_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY
is set in the shell environment. If any is set, print one "✓ <NAME>
detected" line per key. If none is set, print a short block pointing
to FinalRun Cloud (cloud.finalrun.app) and the BYOK env vars. CI mode
is unchanged — the check only runs on the interactive path.

mintlify-docs: add configuration/cloud-api-key.mdx with an embedded
YouTube walkthrough (https://youtu.be/XDPef_yIbCU) and steps to grab
a key and set FINALRUN_API_KEY. Cross-link from ai-providers.mdx.
Add the new page to docs.json under the Configuration group.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap the cloud signup URL and both docs URLs in OSC 8 hyperlink
escapes so they're single-click in modern terminals (iTerm2,
Terminal.app, Windows Terminal, kitty, alacritty, WezTerm,
gnome-terminal). Older terminals strip the escapes and show the
bare URL, so the change degrades cleanly.

Also adds a Docs: link under the FinalRun Cloud option (mirrors
the existing one for BYOK), and aligns "Sign up at" -> "Sign up:"
so the two labels line up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OSC 8 hyperlinks aren't supported by macOS Terminal.app, which is
the most common terminal for the target audience. Plain URL text is
auto-detected as clickable (cmd-click on macOS, ctrl-click on
Windows/Linux) by every modern terminal — Terminal.app, iTerm2, VS
Code, Windows Terminal, gnome-terminal, kitty, alacritty, WezTerm —
so dropping the OSC 8 wrapping is universally compatible at the cost
of a minor styling difference in iTerm2/etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap each URL in ANSI underline (ESC[4m / ESC[24m). Universally
supported by Terminal.app, iTerm2, VS Code, Windows Terminal, and
conhost — much wider than OSC 8. Click activation still works via
each terminal's native URL detection (cmd-click on macOS, ctrl-click
on Win/Linux).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d60d

installer: detect AI provider keys; docs: add Cloud API key page
…de false positive

Skip the "Which platform(s) would you like to set up host tools for?"
prompt when every relevant platform is already detected as ready.
Detection matches the existing setup_* checks:
- Android: SDK path resolves AND scrcpy on PATH
- iOS (mac only): xcrun -f simctl succeeds AND applesimutils on PATH

When all relevant platforms are ready the installer prints one
"✓ <Platform> tools detected — skipping setup." line per platform,
sets PLATFORM_CHOICE so run_doctor still verifies, and proceeds.
Partial states (e.g. Android ready, iOS not) keep the existing
prompt — partial cases are uncommon and tracking them adds surface
area.

Also fixes a false-positive in setup_ios: the previous check
`xcode-select -p` succeeded for both full Xcode AND Command Line
Tools-only installs, so users with CLT (often installed for git)
saw a misleading "✓ Xcode detected" before the install failed
later. `xcrun -f simctl` is the canonical Xcode-vs-CLT signal —
simctl ships with Xcode and is absent from CLT-only installs, and
`xcrun -f` is a path lookup so it doesn't trigger a license check.

The "install Command Line Tools" branch in setup_ios is removed —
CLT alone never satisfies iOS testing, so suggesting it as a fix
was wrong. Error messages now distinguish the CLT-only case
(suggest `sudo xcode-select -s` after installing Xcode) from the
nothing-installed case (install Xcode from the App Store).

Same skip behavior in install.ps1 (Android-only on Windows).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atforms

installer: skip platform prompt when tools already installed; fix Xcode false positive
Drops the [Y/n] prompt for skills install and replaces it with an
auto-trigger that's smart about already-installed skills.

Behavior:
- npx missing -> warn and skip (unchanged).
- No finalrun-* skills installed globally -> 'skills add final-run/finalrun-agent
  -y -g' (full install).
- Some finalrun-* skills installed -> 'skills update -g -y <names...>' for just
  those. The skills CLI's update is internally diff-aware: it compares against
  the source repo and only downloads what's stale. Output line then branches
  on whether the run reported "up to date" -> "already up to date." vs
  "updated.".

Detection uses 'skills ls -g --json' so it works regardless of how the user
got their skills installed. Update is scoped to installed finalrun-* names so
we don't inadvertently touch other global skills (find-skills, etc.) the user
maintains.

User removals are respected: if someone removed one of the FinalRun skills on
purpose, we don't reinstall it on re-run — we only update what's there.

Same logic in install.ps1 with ConvertFrom-Json on the skills list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
installer: auto-sync FinalRun skills (no prompt, skip when up to date)
…ng users to export it

Two related UX fixes so the post-install "open a new terminal or run
export PATH=..." instruction shows up only when it's actually needed.

1. Symlink the binary into ~/.local/bin (Unix)
   The actual binary still lives at ~/.finalrun/bin/finalrun (versioned
   data isolated). install_binary now also creates ~/.local/bin/finalrun
   pointing at it. This is the convention used by claude, uv, pipx, pixi,
   mise — and ~/.local/bin is already on $PATH for the majority of Linux
   distros via /etc/profile or systemd. So Linux first-install users get
   instant gratification.

   setup_path now writes 'export PATH="$HOME/.local/bin:$PATH"' to
   .bashrc / .bash_profile / .profile / .zshrc / .zprofile and the fish
   equivalent (fish_add_path) — but only when ~/.local/bin isn't already
   on the current $PATH. Idempotent via grep on the literal marker.

2. Verify and only complain when broken
   New verify_path helper runs `command -v finalrun` and prints either
   "✓ finalrun is on your PATH." or a one-line "open a new terminal"
   hint. Replaces the unconditional 'open a new terminal, or run: export
   PATH=...' footer in both print_summary and print_ci_summary.

Windows (install.ps1) gets the analogous treatment:
- Update-UserPath now also assigns $env:Path in the running PowerShell
  session, so finalrun works immediately in the same window (irm | iex
  runs inside the user's process). New windows still pick up PATH via
  the registry write.
- New Test-FinalrunOnPath uses Get-Command finalrun to branch the
  message; replaces the static '$env:Path = ...' instruction in both
  Show-Summary and Show-CISummary.

Net result:
- Linux first install: silent ✓ (PATH already had ~/.local/bin)
- macOS first install: rc files updated, single line nudge to open a new
  terminal (instead of the long export hint)
- Windows first install: silent ✓, current shell works
- Re-runs on a working shell: silent ✓ (verification passes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today the line is a yellow ⚠ that's easy to skim past, even though
finalrun literally cannot run tests without a key. Switch to the red
✗ (the existing fail / Write-Failure helper, which prints without
exiting) and add a single explanatory clause:

    ✗ No API key detected — finalrun cannot run tests without one.

Also tightens the BYOK section header from "Prefer your own AI
provider account? Bring your own key:" to "Or bring your own
provider key:" — parallel structure with the Cloud option above it,
matches the new "this is not optional" tone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. install.ps1: split Update-UserPath into independent registry +
   session updates. The previous early-return at the registry check
   meant a current PowerShell session that lacked $BinDir wouldn't
   get patched if the registry was already correct (e.g. PS process
   started before a previous installer run, or the session PATH was
   manually pruned). Now the two checks run independently — registry
   update for new windows, $env:Path update for the current window.

2. install.sh: track a single PATH_TARGET_DIR and reuse it across
   install_binary, setup_path, and verify_path. Previously, when
   `ln -sf` to ~/.local/bin failed (read-only, exotic FS), setup_path
   still wrote ~/.local/bin to shell rcs and verify_path still nudged
   the user toward ~/.local/bin — but no finalrun lived there. Now
   PATH_TARGET_DIR falls back to $FINALRUN_DIR/bin on symlink failure
   and the rest of the flow follows that single source of truth.

Smoke-tested all four branches: ~/.local/bin success, fallback to
~/.finalrun/bin, skip-when-in-PATH, idempotent re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…link

installer: symlink into ~/.local/bin and verify PATH instead of telling users to export it
Pin typescript to ^6.0.3 in every workspace package so npm hoists one
major version for builds and tooling. Extend tsconfig.base.json with
typeRoots targeting the repo @types folder and types: [`*`] so
TypeScript 6 still loads ambient typings (Node globals, tests) after its
types default stopped auto-including packages under node_modules/@types.

Made-with: Cursor
Make `npm i && npm run build` work from a fully clean state. Three
issues surfaced after the TS 6 unification commit and Vite 8 bump:

- tsconfig.base.json typeRoots was workspace-relative only, so only
  workspaces that nested their own @types/ (report-web) found Node
  types; the rest hit TS2591 on `node:*` imports. Add the repo-root
  path so hoisted @types are visible from every workspace.
- Add @types/node at the root so a modern version hoists, instead of
  relying on an ancient transitive @types/node@12 that predates the
  `node:` module-name protocol.
- Vite 8 lists rolldown and lightningcss as regular deps, but npm
  workspace hoisting drops them. Declare both explicitly in
  report-web (rolldown was already there; lightningcss was missing).

Also replace the brittle hoisted-package check in
scripts/ensure-dev-install.mjs with a simple root node_modules check
so it stops false-firing on packages npm chose to nest.
Unify TypeScript 6 across workspaces and fix TS 6 typings defaults.
Reconciles the multi-device test path (parallel lanes, shared memory)
with main's lazy local-runtime split, per-feature model/reasoning
config, and Bun-binary distribution.

Conflict resolved in packages/cli/bin/finalrun.ts:
- Adopt main's lazy resolveLocalRuntime() + runUpgrade wiring as the base.
- Drop eager imports of AIAgent / MultiDeviceTestExecutor /
  prepareMultiDeviceSession; dynamic-import them inside
  runMultiDeviceTestCommand so cloud-only invocations don't pay the cost.
- Replace resolveApiKey (singular) with resolveApiKeys (multi-provider)
  on the multi-device path so it picks up workspace feature overrides.
- Construct AIAgent with the new { apiKeys, defaults, features } shape;
  surface workspace reasoning level in the run banner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d3b9619-1874-482b-9c46-be8ae00d4f3e

📥 Commits

Reviewing files that changed from the base of the PR and between 08da896 and 3b764c8.

📒 Files selected for processing (1)
  • mintlify-docs/installation.mdx
✅ Files skipped from review due to trivial changes (1)
  • mintlify-docs/installation.mdx

📝 Walkthrough

Walkthrough

Adds release infrastructure and distribution changes (Bun-compiled binaries, installer/upgrade, Changesets, GitHub release workflow), refactors CLI for multi-provider and per-feature model/reasoning support, introduces cloud submission package, migrates report UI to a Vite SPA with JSON APIs served by the CLI, and expands documentation (Mintlify site + guides).

Changes

Cohort / File(s) Summary
Release & Changesets
/.changeset/README.md, .changeset/config.json, .changeset/publish-common-report-web.md, .github/workflows/release.yml, .github/release-notes-template.md, RELEASING.md, CHANGELOG.md
Adds Changesets config and release docs; introduces manual GitHub Actions release workflow producing multi-platform Bun binaries, runtime tarballs, artifacts, and generated release notes.
Root & Monorepo Manifest
package.json, .gitignore
Bumps root version to 0.1.10, updates workspaces (adds cloud-core/local-runtime), adds Changesets scripts, tightens Node engine, and ignores additional build artifacts.
CLI Packaging & Scripts
packages/cli/package.json, packages/cli/scripts/*, packages/cli/LICENSE, packages/cli/README.md
Switches CLI distribution model (private Bun binary), removes old npm-focused packaging scripts, adds copy/build helpers, new test runner script, adds license and README updates for binary install flow.
Local Runtime / Runtime Tarballs
packages/local-runtime/*, packages/local-runtime/scripts/buildRuntimeTarball.mjs
Adds local-runtime package and tarball builder producing per-platform runtime bundles with manifest/sha256.
CLI Commands & Runtime Resolver
packages/cli/bin/finalrun.ts, packages/cli/src/localRuntime.ts, packages/cli/src/upgradeCommand.ts
Resolves and lazily loads local runtime, adds upgrade command invoking platform installers, routes report server/doctor via local runtime, and handles LocalRuntimeMissingError with recovery instructions.
Multi-provider & Reasoning Config
packages/common/src/constants.ts, packages/cli/src/env.ts, packages/cli/src/apiKey.ts, packages/cli/src/workspace.ts, packages/cli/src/env.test.ts
Centralizes feature/provider constants and types, adds parseModel/parseReasoningLevel, introduces resolveApiKeys for multi-provider API keys, and extends workspace config to support reasoning and per-feature features overrides.
Test Execution & Session Config
packages/cli/src/testRunner.ts, packages/cli/src/sessionRunner.ts, packages/cli/src/reportWriter.ts, packages/cli/src/goalRunner.test.ts, packages/cli/src/testRunner.test.ts
Re-shapes runner/session configs from single apiKey/provider/modelName to { apiKeys, defaults, features }, threads reasoning/features into persisted run inputs and executor calls, updates tests accordingly.
Report Server & View Models
packages/cli/src/reportServer.ts, packages/cli/src/reportViewModel.ts, packages/cli/src/reportArtifactStream.ts
Moves from server-side HTML rendering to serving JSON view models (/api/report/index, /api/report/runs/:runId) and static SPA assets; centralizes artifact path validation and range-serving; view-model loaders enrich manifests with snapshot YAML and device-log tails.
Report Web — Vite SPA & UI Library
packages/report-web/package.json, packages/report-web/index.html, packages/report-web/src/*, packages/report-web/scripts/*
Migrates report-web from Next.js to a client-side Vite SPA and a reusable UI library (/ui, /ui/styles.css, /routes), adds React components, UI utilities, icons, log parsing, and build/style scripts; removes server-side templates and Next routes.
Report Web — Removed Server Templates / Tests
packages/report-web/src/renderers.ts, packages/report-web/src/reportTemplate.ts, packages/report-web/src/reportIndexTemplate.ts, plus related tests and Next routes
Deletes legacy server-side HTML templates, renderer modules, and their tests as rendering moves to the SPA and CLI JSON APIs.
Cloud Submission Package
packages/cloud-core/*
Adds @finalrun/cloud-core with submitRun and uploadApp implementations: ZIP assembly, multipart uploads, timeouts, error handling, and public types.
Goal Executor: multi-provider & tracing
packages/goal-executor/src/ai/AIAgent.ts, packages/goal-executor/src/ai/schemas.ts, packages/goal-executor/src/ai/VisualGrounder.ts, packages/goal-executor/src/ai/*test.ts
Refactors AIAgent to accept { apiKeys, defaults, features }, apply per-feature reasoning, validate minimal reasoning constraints, adds Zod schemas for structured output, and threads per-LLM-call LLMCallTrace through planner/grounder.
Tracing across executor
packages/goal-executor/src/trace.ts, packages/goal-executor/src/ActionExecutor.ts, packages/goal-executor/src/TestExecutor.ts, packages/goal-executor/src/index.ts
Introduces LLMCallTrace type and accumulates/propagates llmCalls through ActionExecutor and TestExecutor outputs; exports trace type from package.
Report UI Controller & Client-side behavior
packages/report-web/src/ui/client/runDetailController.ts, packages/report-web/src/ui/components/*, packages/report-web/src/ui/*
Adds client-side DOM controller, components (VideoPanel, DeviceLogPanel, StepButton, StatusPill, SummaryCard, etc.), formatting, icons, and log parsing utilities for the SPA.
Docs & Mintlify Site
mintlify-docs/*, docs/*, README.md, CONTRIBUTING.md
Adds comprehensive Mintlify docs (installation, quickstart, configuration, CLI reference, reports, AI providers, troubleshooting), updates root README and contributing guidance to reflect new distribution model and workspace layout.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (CLI)
    participant CLI as finalrun CLI
    participant WorkspaceConfig as Workspace Config
    participant GoalExecutor as Goal Executor
    participant AIAgent as AI Agent
    participant LLM as LLM Providers (OpenAI, Google, Anthropic)

    User->>CLI: finalrun test --model provider/model
    CLI->>WorkspaceConfig: Read .finalrun/config.yaml
    WorkspaceConfig-->>CLI: config + features overrides
    CLI->>CLI: resolveApiKeys(env, requiredProviders)
    CLI->>GoalExecutor: runTests(apiKeys, defaults, features)
    GoalExecutor->>AIAgent: create agent(apiKeys, defaults, features)
    loop per step
      GoalExecutor->>AIAgent: planner call (feature)
      AIAgent->>LLM: call planner LLM (provider/model/reasoning)
      LLM-->>AIAgent: response + llmCall
      AIAgent-->>GoalExecutor: PlannerResponse + llmCall
      GoalExecutor->>AIAgent: grounder call (feature)
      AIAgent->>LLM: call grounder LLM (provider/model/reasoning)
      LLM-->>AIAgent: response + llmCall
      AIAgent-->>GoalExecutor: GrounderResponse + llmCall
      GoalExecutor->>GoalExecutor: accumulate llmCalls
    end
    GoalExecutor-->>CLI: run result with step.llmCalls
    CLI->>CLI: write run manifest (include reasoning, features)
Loading
sequenceDiagram
    participant User as User (Browser)
    participant CLI as finalrun CLI / Report Server
    participant SPA as report-web SPA
    participant FS as Artifacts on disk

    User->>CLI: finalrun start-server
    CLI->>CLI: serve SPA shell + /api/report endpoints
    User->>Browser: open http://localhost:PORT
    Browser->>CLI: GET / (SPA shell)
    Browser->>SPA: load JS bundle, mount app
    SPA->>CLI: GET /api/report/index
    CLI->>FS: read run index, manifests, snapshot YAML
    CLI-->>SPA: JSON index view-model
    SPA->>CLI: GET /api/report/runs/:runId
    CLI->>FS: read run manifest, device logs, snapshot YAML
    CLI-->>SPA: JSON run manifest
    SPA->>Browser: render run detail, request /artifacts/.. (Range)
    Browser->>CLI: GET /artifacts/... (Range header)
    CLI->>FS: validate path, stream byte range
    CLI-->>Browser: 206 + Content-Range
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • arnoldlaishram
  • srinidhi-lwt

Poem

🐰 I hopped through tarballs, docs, and code so neat,

Binaries bundled, traces captured—what a feat!
From CLI to cloud and a React web art,
I stitched reasoning maps and gave each feature heart.
Hooray — I twitch my nose and celebrate this release treat!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch merge/main-into-multi-device-tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
packages/goal-executor/src/TestExecutor.ts (1)

633-659: ⚠️ Potential issue | 🟠 Major

Propagate llmCalls through the early-return paths too.

The new aggregation only reaches the fall-through action path. Planner completion/failure and terminal action failures still push step results without llmCalls, so observability data is lost on those branches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/TestExecutor.ts` around lines 633 - 659, The step
LLm calls aggregation (stepLLMCalls) is only added to the fall-through
stepResult construction, so branches that return early (planner
completion/failure and terminal action failures) omit llmCalls; update the
early-return branches in TestExecutor (where plannerResponse and actionResult
produce AgentActionResult objects) to build the same stepLLMCalls array from
plannerResponse.llmCall and actionResult.llmCalls and include it in those step
result objects (same conditional spread used in the fall-through). Ensure you
reference the same fields (plannerResponse.llmCall, actionResult.llmCalls) and
include the llmCalls property when constructing any AgentActionResult (e.g., the
objects built around stepResult, planner completion/failure handlers, and
terminal error returns) so observability is consistent.
README.md (2)

90-92: ⚠️ Potential issue | 🟡 Minor

Use the correct command name here.

The sentence attributes test execution to /finalrun-generate-test, but the example below uses /finalrun-use-cli. That is misleading for readers.

✏️ Suggested fix
-`/finalrun-generate-test` skill validates and run the test once your tests are generated.
+`/finalrun-use-cli` skill runs the generated test once your specs are ready.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 90 - 92, The README's "Run tests" paragraph mentions
the wrong command name (/finalrun-generate-test) compared to the example that
uses /finalrun-use-cli; update the text so the command reference matches the
example (or update the example to use /finalrun-generate-test) — edit the
heading block that currently contains `/finalrun-generate-test` to instead show
`/finalrun-use-cli` (or make the example and paragraph consistently use
`/finalrun-generate-test`) so both the sentence and example align.

111-113: ⚠️ Potential issue | 🟡 Minor

Rewrite the auto-trigger paragraph for clarity.

It is a run-on sentence and still has the everytime typo. Split it into shorter instructions.

✏️ Suggested rewrite
-Add this **[content](docs/autotrigger-finalrun.md)** to your **AGENTS.md** to auto-trigger Finalrun, so you don’t have to explicitly ask your agent to generate and run tests everytime. This will also let your agent fix issues if there is any error while development
+Add [docs/autotrigger-finalrun.md](docs/autotrigger-finalrun.md) to your `AGENTS.md` to auto-trigger FinalRun. You do not need to ask your agent to generate and run tests every time. It can also fix issues while you develop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 111 - 113, Rewrite the paragraph under the "How to
auto trigger Finalrun to generate and test once feature development is
completed" section so it's two concise sentences: state that adding
docs/autotrigger-finalrun.md to AGENTS.md will enable Finalrun to auto-trigger,
and clarify that this avoids manually asking the agent to generate and run tests
every time; also fix the typo "everytime" -> "every time" and keep the link to
docs/autotrigger-finalrun.md and the reference to AGENTS.md intact.
packages/cli/README.md (2)

111-113: ⚠️ Potential issue | 🟡 Minor

Tighten the auto-trigger instructions.

The sentence is a run-on and everytime should be every time. Splitting it into two sentences will make it easier to follow.

✏️ Suggested rewrite
-Add this **[content](docs/autotrigger-finalrun.md)** to your **AGENTS.md** to auto-trigger Finalrun, so you don’t have to explicitly ask your agent to generate and run tests everytime. This will also let your agent fix issues if there is any error while development
+Add [docs/autotrigger-finalrun.md](docs/autotrigger-finalrun.md) to your `AGENTS.md` to auto-trigger FinalRun. You do not need to ask your agent to generate and run tests every time. It can also fix issues while you develop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/README.md` around lines 111 - 113, Split the run-on sentence
into two clearer sentences and correct "everytime" to "every time": change the
text to something like "Add this content to your AGENTS.md to auto-trigger
Finalrun. This ensures you don't need to explicitly ask your agent to generate
and run tests every time and allows the agent to attempt fixes if errors occur
during development." Update the README.md phrase accordingly.

91-92: ⚠️ Potential issue | 🟡 Minor

Use the correct skill name here.

/finalrun-generate-test generates and validates specs; it does not run the test itself. This sentence points readers at the wrong workflow.

✏️ Suggested fix
-`/finalrun-generate-test` skill validates and run the test once your tests are generated.
+`/finalrun-use-cli` skill runs the generated test once your specs are ready.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/README.md` around lines 91 - 92, The README incorrectly states
that the `/finalrun-generate-test` skill "validates and run the test"; update
the sentence to reflect the actual behavior by saying `/finalrun-generate-test`
generates and validates specs but does not execute tests (or replace with the
correct skill name if a different skill actually runs tests). Locate the string
`/finalrun-generate-test` in the README and change the wording to explicitly
state it "generates and validates specs" and remove or correct the part about
running tests so the workflow is not misrepresented.
mintlify-docs/CONTRIBUTING.md (1)

1-35: ⚠️ Potential issue | 🟠 Major

Convert this guide to Mintlify’s MDX page format.

If this is meant to ship in the docs site, rename it to CONTRIBUTING.mdx and add YAML frontmatter. The current .md page does not match the docs convention.

As per coding guidelines, pages should be MDX files with YAML frontmatter in Mintlify documentation projects.

🛠️ Suggested frontmatter starter
---
title: Contribute to the documentation
description: How to contribute to the FinalRun docs
---
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/CONTRIBUTING.md` around lines 1 - 35, Rename CONTRIBUTING.md to
CONTRIBUTING.mdx, convert the content to MDX by adding YAML frontmatter at the
top (e.g., title: "Contribute to the documentation" and a short description),
and ensure the existing Markdown sections (How to contribute, Writing
guidelines) remain below the frontmatter unchanged; update any internal links
(like development.mdx) if needed so they resolve correctly in the MDX site
build.
🟡 Minor comments (22)
packages/cli/src/reportArtifactStream.ts-52-77 (1)

52-77: ⚠️ Potential issue | 🟡 Minor

Keep HEAD error responses bodyless.

writeErrorHtml() always emits HTML, so HEAD requests for missing artifacts still get a response body. Pass headOnly through the error path and suppress the body there too.

Suggested fix
   try {
     resolvedPath = resolveArtifactPath(artifactsDir, relativePath);
   } catch (error) {
-    writeErrorHtml(response, 404, 'Artifact Not Found', (error as Error).message);
+    writeErrorHtml(response, 404, 'Artifact Not Found', (error as Error).message, headOnly);
     return;
   }
@@
     writeErrorHtml(
       response,
       404,
       'Artifact Not Found',
       error instanceof Error ? error.message : String(error),
+      headOnly,
     );
     return;
   }
@@
 function writeErrorHtml(
   response: ServerResponse,
   status: number,
   title: string,
   message: string,
+  headOnly = false,
 ): void {
   response.writeHead(status, {
     'Content-Type': 'text/html; charset=utf-8',
     'Cache-Control': 'no-store',
   });
-  response.end(renderHtmlErrorPage({ title, message }));
+  response.end(headOnly ? undefined : renderHtmlErrorPage({ title, message }));
 }

Also applies to: 240-250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/reportArtifactStream.ts` around lines 52 - 77, The error
paths call writeErrorHtml() unconditionally and therefore send HTML bodies for
HEAD requests; change those calls to pass the headOnly flag through so
writeErrorHtml(response, 404, 'Artifact Not Found', message, headOnly) (or
equivalent signature) is used in both the resolvedPath resolve catch and the
realpath/stat catch, and update any other error-path calls (the secondary block
around the realpath/stat and the other block at the later error handling
mentioned) to forward the same headOnly boolean so the helper can suppress the
response body for HEAD requests; locate and update calls to writeErrorHtml and
ensure the helper signature and call sites (e.g., in reportArtifactStream and
other error paths) accept and use the headOnly parameter.
packages/report-web/scripts/build-styles.mjs-21-31 (1)

21-31: ⚠️ Potential issue | 🟡 Minor

Report the stylesheet count correctly.

parts.length includes markers and blank separators, so the log message overstates how many stylesheets were written. Use files.length or a dedicated counter if you want to report source files.

Suggested fix
- console.log(`✓ wrote ${path.relative(root, outPath)} (${parts.length} sections)`);
+ console.log(`✓ wrote ${path.relative(root, outPath)} (${files.length} source files)`);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/scripts/build-styles.mjs` around lines 21 - 31, The log
currently reports the number of "sections" using parts.length which counts
markers and blank lines; change the console.log to report the actual number of
source stylesheets by using files.length (or maintain a dedicated counter)
instead of parts.length—update the console.log call that references outPath and
parts.length so it uses files.length (or the new counter) to correctly report
how many stylesheets were written.
packages/report-web/src/ui/logs.ts-52-72 (1)

52-72: ⚠️ Potential issue | 🟡 Minor

Ignore invalid recordingStartedAt values instead of filtering out valid lines.

If recordingStartedAt is malformed, recStartMs becomes NaN, and the current comparison drops timestamped log lines unexpectedly. Treat non-finite values like an absent start time.

Suggested fix
   const recStartMs = recordingStartedAt ? new Date(recordingStartedAt).getTime() : undefined;
+  const hasValidRecordingStart = Number.isFinite(recStartMs);
   return logText
     .split('\n')
     .filter((line) => {
       if (line.length === 0) return false;
-      if (recStartMs === undefined) return true;
+      if (!hasValidRecordingStart) return true;
       const ts = parseLogTimestamp(line, recordingStartedAt);
       if (!ts) return true;
       const tsMs = new Date(ts).getTime();
       return !Number.isFinite(tsMs) || tsMs >= recStartMs;
     })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/src/ui/logs.ts` around lines 52 - 72, The function
parseDeviceLogLines currently computes recStartMs from recordingStartedAt but
leaves it as NaN for malformed timestamps, causing valid lines to be filtered
out; change recStartMs calculation in parseDeviceLogLines so that if new
Date(recordingStartedAt).getTime() is not finite you set recStartMs to undefined
(treat non-finite the same as absent), then keep the existing filter logic that
checks recStartMs === undefined and compares tsMs >= recStartMs; reference
symbols: parseDeviceLogLines, recStartMs, parseLogTimestamp.
mintlify-docs/AGENTS.md-1-13 (1)

1-13: ⚠️ Potential issue | 🟡 Minor

Tighten the setup note to match the docs style guide.

The intro uses third person and leaves command/file references unformatted. Keep it in second person and wrap docs.json, mint dev, and mint broken-links in code formatting.

Proposed edit
-> **First-time setup**: Customize this file for your project. Prompt the user to customize this file for their project.
+> **First-time setup**: Customize this file for your project.
@@
-- Configuration lives in docs.json
-- Run mint dev to preview locally
-- Run mint broken-links to check links
+- Configuration lives in `docs.json`
+- Run `mint dev` to preview locally
+- Run `mint broken-links` to check links
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/AGENTS.md` around lines 1 - 13, The "First-time setup" intro is
written in third person and leaves command/file refs unformatted; rewrite that
paragraph to address the reader in second person, tighten the prompt to
customize the file for their project, and wrap the references `docs.json`, `mint
dev`, and `mint broken-links` in inline code formatting; update the "First-time
setup" heading and its sentence to something like a brief instruction to
customize this file for your project while keeping the rest of the
"Documentation project instructions" content unchanged.
mintlify-docs/configuration/cloud-api-key.mdx-35-37 (1)

35-37: ⚠️ Potential issue | 🟡 Minor

Avoid overwriting the workspace .env.

Using > will replace any existing .env file and can silently drop unrelated variables. Use an append/edit flow instead.

📝 Safer example
-    echo "FINALRUN_API_KEY=your-key-here" > .env
+    printf '\nFINALRUN_API_KEY=%s\n' "your-key-here" >> .env
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/configuration/cloud-api-key.mdx` around lines 35 - 37, The
current example overwrites the workspace .env using the snippet echo
"FINALRUN_API_KEY=your-key-here" > .env; change it to an append/edit flow so
existing variables aren’t lost (use the shell append operator or a safe
file-edit method to add the FINALRUN_API_KEY line to .env instead of replacing
it), and update the documented example accordingly so it appends rather than
truncates the file.
packages/cli/src/env.test.ts-47-65 (1)

47-65: ⚠️ Potential issue | 🟡 Minor

Add a labeled assertion for the required-model path.

The new label-prefix checks never exercise parseModel(undefined, label), so a regression there would still pass this suite.

🧪 Suggested additional assertion
   assert.throws(
     () => parseModel(undefined),
     /--model is required\. Use provider\/model, for example google\/gemini-3-flash-preview\. Supported providers: openai, google, anthropic\./,
   );
+  assert.throws(
+    () => parseModel(undefined, 'features.planner.model'),
+    /features\.planner\.model.*--model is required\./,
+  );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/env.test.ts` around lines 47 - 65, Add a test that calls
parseModel(undefined, 'features.planner.model') and assert it throws an error
that includes the provided label; specifically add an assert.throws(() =>
parseModel(undefined, 'features.planner.model'),
/features\.planner\.model.*--model is required\./) (or a regex that ensures the
thrown message contains the label and the required-model text) next to the
existing assertions so the labeled required-model path is exercised; reference
parseModel and the test in packages/cli/src/env.test.ts.
mintlify-docs/README.md-29-39 (1)

29-39: ⚠️ Potential issue | 🟡 Minor

Add language tags to the fenced commands.

The bare fences will trip markdownlint’s MD040 rule. Mark them as bash/sh so the page lints cleanly.

💡 Suggested fix
-```
+```bash
 npm i -g mint
-```
+```

-```
+```bash
 mint dev
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/README.md` around lines 29 - 39, The fenced code blocks
containing the shell commands should include a language tag to satisfy
markdownlint MD040; update the two backtick fences that wrap the lines with the
commands "npm i -g mint" and "mint dev" so they read ```bash (or ```sh) instead
of plain ``` to mark them as shell code blocks; locate the fenced blocks around
those exact command lines in README.md and replace the opening ``` with ```bash
for each block.
mintlify-docs/running/cli-reference.mdx-88-90 (1)

88-90: ⚠️ Potential issue | 🟡 Minor

Clarify the --api-key limitation.

This reads like a universal override, but multi-provider runs reject --api-key. Add that restriction so users do not hit an unexpected validation error.

✏️ Proposed wording
-  Override the provider API key for this run. Use `--api-key` for one-off runs; for persistent configuration set the provider environment variable (`GOOGLE_API_KEY`, `OPENAI_API_KEY`, or `ANTHROPIC_API_KEY`).
+  Override the provider API key for a single-provider run. For persistent configuration, set the provider environment variable (`GOOGLE_API_KEY`, `OPENAI_API_KEY`, or `ANTHROPIC_API_KEY`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/running/cli-reference.mdx` around lines 88 - 90, Update the
ParamField for "--api-key" to clarify its limitation: change the description
text in the ParamField (ParamField path="--api-key") to state that --api-key
only overrides the provider API key for single-provider runs and will be
rejected for multi-provider runs, and point users to set provider environment
variables (GOOGLE_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY) for persistent or
multi-provider configurations; keep the existing examples (one-off vs
persistent) but add a short sentence noting the validation error occurs for
multi-provider runs.
mintlify-docs/tests/suites.mdx-35-60 (1)

35-60: ⚠️ Potential issue | 🟡 Minor

Use the suite manifest path in the examples.

The page says suite manifests live under .finalrun/suites/, but the commands show only auth_smoke.yaml. That makes the path convention look ambiguous.

📝 Proposed doc fix
- finalrun suite auth_smoke.yaml --platform android --model google/gemini-3-flash-preview
+ finalrun suite .finalrun/suites/auth_smoke.yaml --platform android --model google/gemini-3-flash-preview
- finalrun check --suite auth_smoke.yaml
+ finalrun check --suite .finalrun/suites/auth_smoke.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/tests/suites.mdx` around lines 35 - 60, Examples use bare
filenames (auth_smoke.yaml) which contradict the stated convention that
manifests live under .finalrun/suites/; update the command examples to use the
full suite manifest path (e.g., .finalrun/suites/auth_smoke.yaml) in all three
places: the two finalrun suite examples and the finalrun check example so the
docs consistently demonstrate the one-to-one mapping between feature folders and
suite files.
mintlify-docs/configuration/ai-providers.mdx-31-33 (1)

31-33: ⚠️ Potential issue | 🟡 Minor

Don't overwrite .env in the examples.

If .env already exists, these snippets will clobber whatever is already there. Use >> or printf for additive setup, or say explicitly that the snippet creates a brand-new file.

Suggested fix
-echo "GOOGLE_API_KEY=your-key-here" > .env
+echo "GOOGLE_API_KEY=your-key-here" >> .env

Also applies to: 73-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/configuration/ai-providers.mdx` around lines 31 - 33, The
example currently overwrites .env by using the single-redirection form (`echo
"GOOGLE_API_KEY=your-key-here" > .env`); change those snippets to append safely
(use append redirection or a non-destructive method like `printf` with >>) or
explicitly state that the command creates a brand-new file so readers aren't
surprised, and update the other occurrences mentioned (the block around lines
73-85) to the same non-clobbering approach; look for the exact snippet string
`echo "GOOGLE_API_KEY=your-key-here" > .env` and replace or annotate it
accordingly.
packages/report-web/src/ui/components/SummaryCard.tsx-21-24 (1)

21-24: ⚠️ Potential issue | 🟡 Minor

Use a block wrapper for the label/value pair.

A <span> cannot legally contain <div> children, and browsers may reparent the nodes. That can produce layout or hydration quirks.

Suggested fix
-      <span>
+      <div className="summary-card-copy">
         <div className="summary-card-label">{label}</div>
         <div className="summary-card-value">{value}</div>
-      </span>
+      </div>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/src/ui/components/SummaryCard.tsx` around lines 21 - 24,
The wrapper around the label/value pair in SummaryCard.tsx is a <span>
containing <div> children which is invalid HTML; update the SummaryCard
component to use a block-level wrapper (e.g., replace the outer <span> with a
<div> or semantic container like <section> or <article>) so the inner <div
className="summary-card-label"> and <div className="summary-card-value"> remain
valid and hydration/layout quirks are avoided; adjust or add a wrapper class
name if needed to preserve styling.
packages/cli/src/workspace.ts-458-461 (1)

458-461: ⚠️ Potential issue | 🟡 Minor

Reject null feature overrides instead of skipping them.

features.planner: null currently disappears without validation, which hides malformed YAML and makes typos harder to spot. Let the plain-object check fail fast.

Suggested fix
   for (const [featureKey, rawOverride] of Object.entries(value)) {
-    if (rawOverride === undefined || rawOverride === null) {
+    if (rawOverride === undefined) {
       continue;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/workspace.ts` around lines 458 - 461, In the loop iterating
Object.entries(value) in workspace.ts (variables featureKey and rawOverride),
stop silently skipping nulls: change the check so only undefined is ignored and
explicitly throw a validation error when rawOverride === null (e.g., include
featureKey in the message), ensuring malformed YAML like features.planner: null
fails fast and triggers the existing plain-object validation path rather than
disappearing.
mintlify-docs/quickstart.mdx-34-39 (1)

34-39: ⚠️ Potential issue | 🟡 Minor

Create .finalrun/env before writing dev.yaml.

The guide tells users to write .finalrun/env/dev.yaml, but Step 2 only creates .finalrun/tests. A clean workspace will fail on the first redirect.

Suggested fix
     mkdir -p .finalrun/tests
+    mkdir -p .finalrun/env

Also applies to: 92-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/quickstart.mdx` around lines 34 - 39, The documentation step
"Create the workspace structure" currently only creates `.finalrun/tests` but
omits creating the `.finalrun/env` directory required before writing `dev.yaml`;
update the step to also create the `.finalrun/env` directory (and optionally
create an empty `dev.yaml` placeholder) so the redirect and subsequent
instructions won't fail, and mirror the same change for the repeated section
referenced around the later block (the other instance covering lines 92-98).
mintlify-docs/configuration/workspace.mdx-7-27 (1)

7-27: ⚠️ Potential issue | 🟡 Minor

Rewrite the opening in second person.

The intro still reads like product copy instead of user-facing instructions. Please address the reader directly and keep the sentences shorter here.

As per coding guidelines, mintlify-docs/**/*.{md,mdx}: Use active voice and second person ('you') in documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/configuration/workspace.mdx` around lines 7 - 27, Rewrite the
opening paragraph of workspace.mdx into second person and active voice so it
addresses the reader directly (use "you" and short sentences); update the lines
that describe the workspace root and `.finalrun/` folder and replace passive
phrases like "is anchored to" and "holds your configuration file" with active
phrasing (e.g., "Your FinalRun project lives in a workspace root — the directory
that contains the `.finalrun/` folder. You keep `config.yaml`, tests in
`tests/`, optional `suites/`, and `env/` there."); keep the workspace layout
block but ensure preceding explanatory sentences are concise and instructive,
referring to `config.yaml`, `tests/`, `suites/`, and `env/` by name so readers
can locate them.
packages/cli/src/apiKey.ts-41-53 (1)

41-53: ⚠️ Potential issue | 🟡 Minor

Normalize --api-key before treating it as set.

A whitespace-only value still passes the truthy check here, so an accidental --api-key=" " will bypass env lookup and fail later as an auth error. Trim the input first, and mirror the same normalization in resolveApiKey above.

🔧 Suggested fix
 export function resolveApiKey(params: {
   env: Pick<CliEnv, 'get'>;
   provider: string;
   providedApiKey?: string;
 }): string {
-  if (params.providedApiKey) {
-    return params.providedApiKey;
+  const providedApiKey = params.providedApiKey?.trim();
+  if (providedApiKey) {
+    return providedApiKey;
   }
@@
 export function resolveApiKeys(params: {
   env: Pick<CliEnv, 'get'>;
   providers: Iterable<string>;
   providedApiKey?: string;
 }): Record<string, string> {
@@
-  if (params.providedApiKey) {
+  const providedApiKey = params.providedApiKey?.trim();
+  if (providedApiKey) {
     if (providers.length > 1) {
       throw new Error(
@@
     }
-    return { [providers[0]!]: params.providedApiKey };
+    return { [providers[0]!]: providedApiKey };
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/apiKey.ts` around lines 41 - 53, Normalize and trim the CLI
--api-key input before treating it as present: change the truthy check on
params.providedApiKey in the block that returns { [providers[0]!]:
params.providedApiKey } to use a trimmed version (mirror the normalization used
in resolveApiKey) so whitespace-only strings are treated as unset; then use the
trimmed value when returning the per-provider map and when performing the
providers.length > 1 validation to ensure consistent behavior with
resolveApiKey.
mintlify-docs/configuration/environments.mdx-7-40 (1)

7-40: ⚠️ Potential issue | 🟡 Minor

Use the project’s “tests” terminology consistently.

Lines 7 and 40 say “test specs”, but the rest of the product has been moving toward just “tests”. Keeping one term here will make the docs easier to scan and less internally inconsistent.

Based on learnings: Define and document project-specific terminology and preferred usage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/configuration/environments.mdx` around lines 7 - 40, Replace
the inconsistent phrase "test specs" with the preferred project term "tests" in
this document: update the sentence "Separating environments lets you run the
same test specs against development, staging, and production..." to use "tests",
and change "In your test specs, you can reference these values as..." to "In
your tests, you can reference these values as..."; scan the rest of the file for
any remaining "test specs" occurrences and standardize them to "tests" to
maintain consistent terminology.
packages/local-runtime/scripts/buildRuntimeTarball.mjs-35-35 (1)

35-35: ⚠️ Potential issue | 🟡 Minor

Use an OS-safe basename in the .sha256 sidecar.

Line 190 uses tarballPath.split('/').pop() which fails on Windows hosts where paths use backslashes. This causes the checksum file to record the full path instead of the tarball filename, breaking the standard checksum-file format. Use basename() from node:path instead, which handles both POSIX and Windows path separators correctly.

Suggested fix
-import { dirname, join, relative, resolve } from 'node:path';
+import { basename, dirname, join, relative, resolve } from 'node:path';
@@
 writeFileSync(
   `${tarballPath}.sha256`,
-  `${tarballSha}  ${tarballPath.split('/').pop()}\n`,
+  `${tarballSha}  ${basename(tarballPath)}\n`,
   'utf8',
 );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/local-runtime/scripts/buildRuntimeTarball.mjs` at line 35, The
.sha256 sidecar currently uses tarballPath.split('/').pop(), which breaks on
Windows; import basename from node:path (add basename to the existing import
list) and replace tarballPath.split('/').pop() with basename(tarballPath) so the
sidecar records only the tarball filename across OSes; update the import in the
top-level import { dirname, join, relative, resolve } to include basename and
change the place where tarballPath is processed to call basename(tarballPath).
packages/cli/src/cloudRunner.ts-22-29 (1)

22-29: ⚠️ Potential issue | 🟡 Minor

Trim FINALRUN_API_KEY before validating it.

A whitespace-only value currently passes Line 23 and then fails downstream. Normalize before validation to fail fast with the intended message.

Minimal fix
 function requireApiKey(): string {
-  const key = process.env['FINALRUN_API_KEY'] ?? '';
+  const key = (process.env['FINALRUN_API_KEY'] ?? '').trim();
   if (!key) {
     throw new Error(
       'FINALRUN_API_KEY is not set. Get your API key from the FinalRun Cloud dashboard and set it:\n' +
       '  export FINALRUN_API_KEY=fr_your_key_here',
     );
   }
   return key;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cloudRunner.ts` around lines 22 - 29, Trim and normalize the
FINALRUN_API_KEY value before validating it: when reading
process.env['FINALRUN_API_KEY'] into the variable key, call .trim() (or
otherwise remove surrounding whitespace) and then check if the trimmed key is
empty; if it is, throw the existing Error message. Update the code around the
variable key and its return so downstream callers receive a trimmed, validated
API key.
packages/cloud-core/src/upload.ts-29-34 (1)

29-34: ⚠️ Potential issue | 🟡 Minor

Timeout validation does not match the declared contract.

Line 29 currently allows fractional values, but the error message on Line 31 says the env var must be a positive integer. Enforce integer validation (or relax the message).

🔧 Proposed fix
-  if (!Number.isFinite(parsed) || parsed <= 0) {
+  if (!Number.isInteger(parsed) || parsed <= 0) {
     throw new Error(
       `Invalid ${envVar}=${JSON.stringify(raw)}: must be a positive integer (milliseconds).`,
     );
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cloud-core/src/upload.ts` around lines 29 - 34, The timeout
validation currently allows fractional milliseconds (it only checks
Number.isFinite(parsed) and parsed > 0) but the error message expects a positive
integer; update the check to require an integer (e.g., use
Number.isInteger(parsed) && parsed > 0) so parsed is a positive integer, and
keep the error message using envVar and raw as-is; ensure the logic that returns
parsed still returns the integer value.
packages/cli/bin/finalrun.ts-250-250 (1)

250-250: ⚠️ Potential issue | 🟡 Minor

Cloud upload help text is too narrow for iOS artifact formats.

Line 250 mentions .apk or .app, but the upload path also supports iOS package-style inputs (for example .ipa / .app.zip). The current text can send users down the wrong path.

📝 Proposed fix
-  .requiredOption('--app <path>', 'Path to the .apk or .app to upload')
+  .requiredOption('--app <path>', 'Path to the app binary to upload (.apk, .ipa, or .app.zip)')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/bin/finalrun.ts` at line 250, Update the CLI help text for the
.requiredOption('--app <path>', ...) declaration to list all supported mobile
artifact formats (e.g., ".apk", ".ipa", ".app", ".app.zip" or "zipped .app")
instead of only ".apk or .app"; locate the call to .requiredOption('--app
<path>' in finalrun.ts and replace its description string with a concise,
inclusive phrase such as "Path to the Android (.apk) or iOS (.ipa, .app,
.app.zip) artifact to upload".
mintlify-docs/faq.mdx-7-102 (1)

7-102: ⚠️ Potential issue | 🟡 Minor

Align FAQ prose with doc voice guidelines (you + shorter sentences).

Lines 7-102 include multiple long, multi-idea sentences and product-centric phrasing (“FinalRun is…”) instead of directly addressing the reader. Please rewrite these sections in second person and split compound statements into shorter single-idea sentences.

As per coding guidelines: "Use active voice and second person ('you') in documentation" and "Keep sentences concise — one idea per sentence".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mintlify-docs/faq.mdx` around lines 7 - 102, The FAQ prose uses
product-centric, multi-idea sentences; rewrite the content inside each Accordion
(e.g., the sections titled "Do I need to create a FinalRun account?", "Which AI
providers does FinalRun support?", "Does FinalRun work with real devices or only
emulators?", etc.) into second-person voice ("you") and break compound sentences
into short, single-idea sentences—prefer active voice and concise lines (one
idea per sentence). Update wording across all Accordion blocks between the
opening <AccordionGroup> and closing tag to use "you" (e.g., "You don't need an
account.") and split long statements into separate sentences (e.g., provider
support, env vars, and override flags each on their own sentence), keeping
meaning unchanged.
packages/cloud-core/src/submit.ts-58-63 (1)

58-63: ⚠️ Potential issue | 🟡 Minor

Keep timeout validation consistent with the declared contract.

Line 60 says “positive integer”, but the check currently allows fractional values. Either update wording or enforce Number.isInteger(parsed).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cloud-core/src/submit.ts` around lines 58 - 63, The validation for
FINALRUN_SUBMIT_TIMEOUT_MS currently allows fractional milliseconds though the
error says "positive integer"; update the check in submit.ts to require an
integer by adding Number.isInteger(parsed) to the condition (e.g., if
(!Number.isFinite(parsed) || !Number.isInteger(parsed) || parsed <= 0) throw new
Error(...)), and ensure the thrown message still references
FINALRUN_SUBMIT_TIMEOUT_MS and "positive integer (milliseconds)" via
JSON.stringify(raw).
🧹 Nitpick comments (3)
packages/report-web/index.html (1)

7-12: Consider self-hosting the font for stricter deployments.

Loading DM Sans from Google Fonts adds a runtime dependency on a third-party origin. If this app needs to work in offline, air-gapped, or strict-CSP environments, self-hosting the font would be more robust.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/index.html` around lines 7 - 12, The HTML currently loads
DM Sans from Google Fonts via the three <link> elements (preconnect to
fonts.googleapis.com and fonts.gstatic.com and the href to the Google-hosted
stylesheet); to self-host, download the DM Sans font files (woff/woff2) and
create a local CSS `@font-face` (e.g., assets/fonts/dm-sans.css) referencing those
files, place the font binaries in your static assets (e.g., assets/fonts/),
replace the external <link href="https://fonts.googleapis.com/..."> with a local
<link rel="stylesheet" href="/assets/fonts/dm-sans.css"> and remove the
preconnect links (or adjust CSP) and ensure your CSP/static file serving allows
those assets to be served.
packages/goal-executor/package.json (1)

20-29: Migrate .passthrough() to z.looseObject() for Zod 4 compatibility.

zod@^4.1.8 marks .passthrough() as deprecated (though still functional); the recommended replacement is z.looseObject(). The schemas file uses .passthrough() extensively (9 occurrences). typescript@^6.0.3 is compatible with the existing tsconfig.base.json (which already enforces strict: true and modern defaults). Update the schema definitions to use z.looseObject() where .passthrough() is currently applied to align with Zod 4 best practices and eliminate deprecation warnings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/package.json` around lines 20 - 29, Several schema
files use the deprecated Zod method .passthrough(); search for the ~9
occurrences and replace each schema call that ends with .passthrough() with the
Zod 4-compatible z.looseObject() form (e.g., where you have
SomeSchema.passthrough(), convert to z.looseObject(SomeSchema) or apply
z.looseObject() to the object schema creation), ensuring the Zod import (z from
'zod') remains available and that any chained validators are correctly
preserved; run TypeScript compile to catch any type differences and adjust the
affected schema variable names (e.g., the schemas referenced in your codebase)
to match the new z.looseObject usage.
packages/cli/src/reportViewModel.ts (1)

8-9: Consider centralizing report view-model types to remove lockstep drift risk.

Both CLI and report-web define the same interfaces and rely on manual alignment. Moving these types to a shared package/module would prevent silent contract drift.

Also applies to: 24-60

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/reportViewModel.ts` around lines 8 - 9, Extract the
duplicated report view-model interfaces defined in
packages/cli/src/reportViewModel.ts (lines ~24-60) and
packages/report-web/src/artifacts.ts into a single shared module (e.g.,
packages/report-types or packages/shared/reportViewModel) and export them; then
replace the local interface declarations in reportViewModel.ts and artifacts.ts
with imports from the new shared module, update package.json/tsconfig paths or
workspace references so both packages consume the shared package, and remove the
old duplicate definitions to prevent future drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9ba50296-12a3-4e37-9579-fd3466f86fc8

📥 Commits

Reviewing files that changed from the base of the PR and between 562ca36 and 08da896.

⛔ Files ignored due to path filters (3)
  • mintlify-docs/logo/finalrun-logo-dark-theme.png is excluded by !**/*.png
  • mintlify-docs/logo/finalrun-logo.png is excluded by !**/*.png
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (143)
  • .changeset/README.md
  • .changeset/config.json
  • .changeset/publish-common-report-web.md
  • .github/release-notes-template.md
  • .github/workflows/release.yml
  • .gitignore
  • CHANGELOG.md
  • CONTRIBUTING.md
  • README.md
  • RELEASING.md
  • docs/cli-reference.md
  • docs/codebase-walkthrough.md
  • docs/configuration.md
  • docs/environment.md
  • docs/troubleshooting.md
  • mintlify-docs/.atlas-analysis.json
  • mintlify-docs/.mintignore
  • mintlify-docs/AGENTS.md
  • mintlify-docs/CONTRIBUTING.md
  • mintlify-docs/LICENSE
  • mintlify-docs/README.md
  • mintlify-docs/configuration/ai-providers.mdx
  • mintlify-docs/configuration/cloud-api-key.mdx
  • mintlify-docs/configuration/environments.mdx
  • mintlify-docs/configuration/workspace.mdx
  • mintlify-docs/docs.json
  • mintlify-docs/faq.mdx
  • mintlify-docs/index.mdx
  • mintlify-docs/installation.mdx
  • mintlify-docs/quickstart.mdx
  • mintlify-docs/running/ai-agent-skills.mdx
  • mintlify-docs/running/cli-reference.mdx
  • mintlify-docs/running/reports.mdx
  • mintlify-docs/tests/placeholders.mdx
  • mintlify-docs/tests/suites.mdx
  • mintlify-docs/tests/yaml-format.mdx
  • mintlify-docs/troubleshooting.mdx
  • package.json
  • packages/cli/LICENSE
  • packages/cli/README.md
  • packages/cli/bin/finalrun.ts
  • packages/cli/package.json
  • packages/cli/scripts/cleanupPackage.mjs
  • packages/cli/scripts/copyReportApp.mjs
  • packages/cli/scripts/installAssets.mjs
  • packages/cli/scripts/installAssets.test.mjs
  • packages/cli/scripts/preparePackage.mjs
  • packages/cli/scripts/runTests.mjs
  • packages/cli/src/apiKey.test.ts
  • packages/cli/src/apiKey.ts
  • packages/cli/src/cloudRunner.ts
  • packages/cli/src/contentTypes.ts
  • packages/cli/src/env.test.ts
  • packages/cli/src/env.ts
  • packages/cli/src/goalRunner.test.ts
  • packages/cli/src/localRuntime.ts
  • packages/cli/src/reportArtifactStream.ts
  • packages/cli/src/reportIndexTemplate.ts
  • packages/cli/src/reportServer.ts
  • packages/cli/src/reportTemplate.ts
  • packages/cli/src/reportTemplates.test.ts
  • packages/cli/src/reportViewModel.ts
  • packages/cli/src/reportWriter.ts
  • packages/cli/src/runtimePaths.ts
  • packages/cli/src/sessionRunner.ts
  • packages/cli/src/testRunner.test.ts
  • packages/cli/src/testRunner.ts
  • packages/cli/src/upgradeCommand.ts
  • packages/cli/src/workspace.test.ts
  • packages/cli/src/workspace.ts
  • packages/cloud-core/package.json
  • packages/cloud-core/src/index.ts
  • packages/cloud-core/src/submit.ts
  • packages/cloud-core/src/upload.ts
  • packages/cloud-core/tsconfig.json
  • packages/common/package.json
  • packages/common/src/constants.ts
  • packages/device-node/package.json
  • packages/goal-executor/package.json
  • packages/goal-executor/src/ActionExecutor.ts
  • packages/goal-executor/src/TestExecutor.ts
  • packages/goal-executor/src/ai/AIAgent.test.ts
  • packages/goal-executor/src/ai/AIAgent.ts
  • packages/goal-executor/src/ai/VisualGrounder.ts
  • packages/goal-executor/src/ai/schemas.test.ts
  • packages/goal-executor/src/ai/schemas.ts
  • packages/goal-executor/src/index.ts
  • packages/goal-executor/src/trace.ts
  • packages/local-runtime/.gitignore
  • packages/local-runtime/package.json
  • packages/local-runtime/scripts/buildRuntimeTarball.mjs
  • packages/report-web/app/artifacts/[...artifactPath]/route.ts
  • packages/report-web/app/health/route.ts
  • packages/report-web/app/route.ts
  • packages/report-web/app/runs/[runId]/route.ts
  • packages/report-web/index.html
  • packages/report-web/next-env.d.ts
  • packages/report-web/next.config.ts
  • packages/report-web/package.json
  • packages/report-web/scripts/build-styles.mjs
  • packages/report-web/scripts/scope-css.mjs
  • packages/report-web/src/artifactRoute.test.ts
  • packages/report-web/src/artifacts.test.ts
  • packages/report-web/src/artifacts.ts
  • packages/report-web/src/css.d.ts
  • packages/report-web/src/fetchers.ts
  • packages/report-web/src/main.tsx
  • packages/report-web/src/renderers.test.ts
  • packages/report-web/src/renderers.ts
  • packages/report-web/src/routes/StandaloneReportApp.tsx
  • packages/report-web/src/routes/index.ts
  • packages/report-web/src/ui/client/runDetailController.ts
  • packages/report-web/src/ui/components/DetailSectionCard.tsx
  • packages/report-web/src/ui/components/DeviceLogPanel.tsx
  • packages/report-web/src/ui/components/RunContextSummary.tsx
  • packages/report-web/src/ui/components/SegmentSummary.tsx
  • packages/report-web/src/ui/components/StatusPill.tsx
  • packages/report-web/src/ui/components/StepButton.tsx
  • packages/report-web/src/ui/components/SummaryCard.tsx
  • packages/report-web/src/ui/components/TestDetailSection.tsx
  • packages/report-web/src/ui/components/TintedPngIcon.tsx
  • packages/report-web/src/ui/components/VideoPanel.tsx
  • packages/report-web/src/ui/format.ts
  • packages/report-web/src/ui/icons.ts
  • packages/report-web/src/ui/index.ts
  • packages/report-web/src/ui/logs.ts
  • packages/report-web/src/ui/pages/RunDetailView.tsx
  • packages/report-web/src/ui/pages/RunIndexView.tsx
  • packages/report-web/src/ui/routes.ts
  • packages/report-web/src/ui/styles/run-detail.css
  • packages/report-web/src/ui/styles/run-index.css
  • packages/report-web/src/ui/styles/shared.css
  • packages/report-web/src/ui/viewModel.ts
  • packages/report-web/tsconfig.app.json
  • packages/report-web/tsconfig.json
  • packages/report-web/tsconfig.lib.json
  • packages/report-web/tsup.config.ts
  • packages/report-web/vite.config.ts
  • scripts/build-binary.sh
  • scripts/ensure-dev-install.mjs
  • scripts/install.ps1
  • scripts/install.sh
  • tsconfig.base.json
💤 Files with no reviewable changes (17)
  • packages/report-web/next-env.d.ts
  • packages/report-web/next.config.ts
  • packages/report-web/app/runs/[runId]/route.ts
  • packages/report-web/app/health/route.ts
  • packages/cli/src/reportIndexTemplate.ts
  • packages/cli/scripts/cleanupPackage.mjs
  • packages/report-web/src/artifacts.test.ts
  • packages/report-web/app/artifacts/[...artifactPath]/route.ts
  • packages/report-web/src/artifactRoute.test.ts
  • packages/cli/scripts/installAssets.test.mjs
  • packages/report-web/src/renderers.ts
  • packages/report-web/app/route.ts
  • packages/cli/src/reportTemplates.test.ts
  • packages/report-web/src/renderers.test.ts
  • packages/cli/scripts/preparePackage.mjs
  • packages/cli/scripts/installAssets.mjs
  • packages/cli/src/reportTemplate.ts

Comment thread .gitignore
# Dependencies
node_modules/
packages/*/node_modules/
package-lock.json
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not ignore the root package-lock.json.

This repo uses npm ci, so the lockfile needs to stay tracked for reproducible installs and workspace dependency updates. Ignoring it will make dependency bumps harder to review and can leave CI without the exact resolved tree.

Proposed fix
-package-lock.json
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
package-lock.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 4, Remove the ignore entry "package-lock.json" from the
ignore list, ensure the repository tracks the root package-lock.json (if it was
previously ignored run git rm --cached package-lock.json then git add
package-lock.json and commit), and update the ignore rules so only any
intentionally ignored lockfiles (e.g., for nested workspaces) remain excluded.
This will restore tracking of the root lockfile used by npm ci for reproducible
installs.

Comment thread CONTRIBUTING.md
Comment on lines +13 to +19
End-user install (binary, not npm):

```sh
npm install -g @finalrun/finalrun-agent
curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash
```

The CLI is no longer published to npm — it ships as a Bun-compiled binary plus a per-platform runtime tarball uploaded to GitHub Releases. See [RELEASING.md](./RELEASING.md) for how a release is cut.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid recommending curl | bash from main.

Linking end users to a mutable branch means the installer can change after this doc ships, which is a supply-chain risk. Pin the install script to a release tag and require checksum/signature verification before asking people to execute it.

Proposed fix
-curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash
+curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/<release-tag>/scripts/install.sh | bash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CONTRIBUTING.md` around lines 13 - 19, Update the "End-user install" section
to stop recommending an unpinned curl | bash from main: replace the current curl
URL with an example pinned release URL (pointing at a release tag and the
specific scripts/install.sh file) and add explicit steps requiring users to
download the script, verify its checksum/signature (mention verifying an
accompanying SHA256 checksum file and/or GPG signature), and only then run it
(e.g., run sh ./install.sh after verification). Reference the existing installer
filename (scripts/install.sh) and the release artifact concept (GitHub Releases
/ release tag) so editors can implement a pinned URL and add checksum/signature
verification instructions instead of piping from main.

Comment thread mintlify-docs/installation.mdx Outdated
Comment thread packages/cli/package.json
Comment on lines +33 to +34
"build": "npm run clean && tsc && node ./scripts/copyReportApp.mjs && node -e \"require('fs').copyFileSync('package.json','dist/package.json')\"",
"clean": "rm -rf dist proto tsconfig.tsbuildinfo",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make the clean step cross-platform.

Line 34 uses rm -rf, which breaks npm run build in the default Windows npm shell. That’s a real regression now that the CLI advertises Windows support.

Suggested fix
   "scripts": {
     "build": "npm run clean && tsc && node ./scripts/copyReportApp.mjs && node -e \"require('fs').copyFileSync('package.json','dist/package.json')\"",
-    "clean": "rm -rf dist proto tsconfig.tsbuildinfo",
+    "clean": "node -e \"const fs=require('fs'); for (const p of ['dist','proto','tsconfig.tsbuildinfo']) fs.rmSync(p,{recursive:true,force:true});\"",
     "start": "tsx bin/finalrun.ts",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"build": "npm run clean && tsc && node ./scripts/copyReportApp.mjs && node -e \"require('fs').copyFileSync('package.json','dist/package.json')\"",
"clean": "rm -rf dist proto tsconfig.tsbuildinfo",
"build": "npm run clean && tsc && node ./scripts/copyReportApp.mjs && node -e \"require('fs').copyFileSync('package.json','dist/package.json')\"",
"clean": "node -e \"const fs=require('fs'); for (const p of ['dist','proto','tsconfig.tsbuildinfo']) fs.rmSync(p,{recursive:true,force:true});\"",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/package.json` around lines 33 - 34, The "clean" npm script uses
a Unix-only command ("rm -rf") which breaks on Windows; replace it with a
cross-platform remover and update package.json scripts accordingly: add a dev
dependency like "rimraf" (or use "node -e" fs.rmdirSync with recursive true) and
change the "clean" script value to use rimraf (e.g., "rimraf dist proto
tsconfig.tsbuildinfo") so the existing "build" script that calls "npm run clean"
works on all platforms; make the change to the "clean" script entry in
package.json and ensure the devDependency is added so CI and local devs can run
it.

Comment on lines +71 to +77
try {
writeJson(response, 200, await loadReportRunManifestViewModel(runId, context));
} catch (error) {
writeJson(response, 404, {
status: 'error',
message: error instanceof Error ? error.message : String(error),
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid mapping all run-manifest loader failures to 404.

The current catch block converts every exception to a 404, which hides real server/data errors as “not found.” Restrict 404 mapping to the specific not-found error type and return 500 for unexpected failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/reportServer.ts` around lines 71 - 77, The catch block
currently maps every failure from loadReportRunManifestViewModel to a 404;
change it so only the specific "not found" error maps to 404 and all other
errors return 500. Concretely, inside the catch for writeJson(...) /
loadReportRunManifestViewModel(runId, context) detect the not-found condition
(e.g. error instanceof NotFoundError or error.code === 'NOT_FOUND' or whatever
project-specific sentinel your loaders use), and call writeJson(response, 404,
{status: 'error', message: error.message}) only for that case; for all other
exceptions call writeJson(response, 500, {status: 'error', message: error
instanceof Error ? error.message : String(error)}) so real server/data errors
are surfaced correctly.

Comment on lines +34 to +60
export interface LLMCallTrace {
/** AI provider: 'openai' | 'google' | 'anthropic'. */
provider: string;
/** Full model name, e.g. 'gpt-4.1-mini', 'gemini-2.0-flash'. */
model: string;
/** Logical feature the call served: 'planner', 'grounder', 'visual_grounder', etc. */
feature: string;
/** Full prompt as the provider saw it — array of role/content messages (includes any base64 images inline). */
prompt: unknown;
/** Raw model response text. */
completion: string;
/** Normalized token counts (Langfuse canonical names — input/output/total). */
usage: {
input: number;
output: number;
total: number;
/** Only present if the provider reported cache-read input tokens > 0. */
input_cached_tokens?: number;
};
/** ISO-8601 timestamp when the call started. */
startedAt: string;
/** ISO-8601 timestamp when the call returned or errored. */
completedAt: string;
/** Wall-clock duration of the LLM call in ms. */
durationMs: number;
/** Provider error message, if the call threw. */
statusMessage?: string;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep raw prompt/completion payloads out of telemetry by default.

This trace type makes it easy to ship secrets, PII, and base64 images verbatim to observability backends, which widens the compliance scope and can balloon payload size. Make raw capture opt-in or redact before exporting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/trace.ts` around lines 34 - 60, The LLMCallTrace
interface currently includes raw prompt and completion payloads (prompt,
completion) which can leak PII/secrets and large base64 images; make raw capture
opt-in and export redacted defaults by marking prompt and completion as
optional/removed from the default shape and adding
sanitizedPrompt/sanitizedCompletion (string) fields, then gate inclusion of raw
prompt/completion behind an explicit runtime flag (e.g., ENABLE_RAW_LLM_TRACING)
or an optIn parameter and ensure any trace exporter/serializer checks that flag
and performs redaction (strip base64 images, trim/limit length, remove PII)
before sending; update places that construct LLMCallTrace to populate sanitized*
by default and only attach raw prompt/completion when the opt-in flag is true.

Comment on lines +40 to +102
function scopeCss(src) {
const lines = src.split('\n');
const outLines = [];
// Buffer selector lines until we hit the `{` that closes the selector list.
// This handles both `selector {` on one line and multi-line comma lists.
let selectorBuf = '';

for (let i = 0; i < lines.length; i++) {
const line = lines[i];
const trimmed = line.trim();

if (selectorBuf) {
// Accumulating a multi-line selector list.
selectorBuf += ' ' + trimmed;
if (trimmed.endsWith('{')) {
outLines.push(prefixSelectorLine(selectorBuf));
selectorBuf = '';
}
continue;
}

// Start of an at-rule block — pass through untouched.
if (trimmed.startsWith('@')) {
outLines.push(line);
continue;
}

// Closing brace, blank, comment, or declaration inside a rule — pass through.
if (
!trimmed ||
trimmed.startsWith('}') ||
trimmed.startsWith('/*') ||
trimmed.startsWith('*') ||
!trimmed.endsWith('{') && !trimmed.endsWith(',')
) {
outLines.push(line);
continue;
}

// Single-line selector (`foo {`) or start of multi-line (`foo,`).
if (trimmed.endsWith('{')) {
outLines.push(prefixSelectorLine(trimmed));
} else {
// Line ends with `,` — start buffering.
selectorBuf = trimmed;
}
}

return outLines.join('\n');
}

function prefixSelectorLine(selectorLine) {
// Strip trailing `{`, split by comma, prefix each, rejoin.
const open = selectorLine.lastIndexOf('{');
const selectors = selectorLine.slice(0, open).trim();
const prefixed = selectors
.split(',')
.map((s) => s.trim())
.filter(Boolean)
.map((s) => `${PREFIX}${s}`)
.join(',\n');
return `${prefixed} {`;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a real CSS parser here.

This line-based heuristic will mis-scope common selectors like :root and *, and it will also rewrite @keyframes frames as if they were normal selectors. That can leak styles or break animations. A parser-based transform is much safer than patching these cases piecemeal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/scripts/scope-css.mjs` around lines 40 - 102, The current
line-based scopeCss implementation incorrectly rewrites complex selectors and
at-rule frames; replace the ad-hoc logic in scopeCss and prefixSelectorLine with
a proper CSS AST transform using a parser (e.g., postcss with
postcss-selector-parser or css-tree): parse the stylesheet, walk rules and only
prefix selectors on Rule nodes (skip AtRule nodes like `@keyframes`, `@font-face`
and their child nodes, and skip pseudo-only selectors like :root and the
universal selector when appropriate), update selector strings via a selector AST
to add PREFIX to each simple selector while preserving combinators, media
queries, comments and formatting, then stringify the AST back to CSS. Ensure
PREFIX is applied consistently and keep existing behavior for untouched lines by
returning the parser-produced CSS.

Comment on lines +96 to +105
const params = useParams<{ runId: string }>();
const navigate = useNavigate();
const runId = params.runId ?? '';

const state = useAsyncResource(() => dataSource.fetchRun(runId), [runId]);

if (!runId) {
return <ErrorShell title="Missing run id" message="No run id was provided in the URL." />;
}
if (state.status === 'pending') return <LoadingShell />;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid firing fetchRun('') when runId is missing.

The current order still invokes the async loader for an empty runId, then renders the “Missing run id” shell. This causes unnecessary network/error traffic.

✅ Suggested fix (add `enabled` flag)
 function RunDetailRoute({
@@
   const runId = params.runId ?? '';
-
-  const state = useAsyncResource(() => dataSource.fetchRun(runId), [runId]);
+  const hasRunId = runId.length > 0;
+  const state = useAsyncResource(() => dataSource.fetchRun(runId), [runId], hasRunId);
 
-  if (!runId) {
+  if (!hasRunId) {
     return <ErrorShell title="Missing run id" message="No run id was provided in the URL." />;
   }
@@
-function useAsyncResource<T>(load: () => Promise<T>, deps: unknown[]): AsyncState<T> {
+function useAsyncResource<T>(load: () => Promise<T>, deps: unknown[], enabled = true): AsyncState<T> {
   const [state, setState] = useState<AsyncState<T>>({ status: 'pending' });
 
   useEffect(() => {
+    if (!enabled) {
+      return;
+    }
     let cancelled = false;

Also applies to: 124-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/src/routes/StandaloneReportApp.tsx` around lines 96 -
105, The async loader is invoked with an empty runId because useAsyncResource(()
=> dataSource.fetchRun(runId), [runId]) is called unconditionally; change it to
only run when runId is present by either moving the runId empty-check before
calling useAsyncResource or by using the hook's "enabled" (or equivalent) flag
to disable execution when runId === '' so dataSource.fetchRun is not called with
an empty string; update the instances around useAsyncResource / state (and the
second occurrence noted at lines ~124-145) to reference runId and avoid firing
the fetch until runId is truthy.

Comment on lines +136 to +167
export function selectStep(testId: string, stepIndex: number): void {
const test = testMap[testId];
const step = test?.steps?.[stepIndex];
const container = document.querySelector('[data-step-detail="' + testId + '"]') as HTMLElement | null;
if (!container || !step) {
return;
}

setSelectedStep(testId, stepIndex);
syncRecording(container, test, step);
}

function setSelectedStep(testId: string, stepIndex: number): void {
for (const button of document.querySelectorAll('[data-test-id="' + testId + '"][data-step-index]')) {
button.classList.toggle('is-selected', Number((button as HTMLElement).dataset.stepIndex) === stepIndex);
}
}

function selectNearestStepForTime(testId: string, targetSeconds: number): void {
const test = testMap[testId];
if (!test) {
return;
}

const nearestStepIndex = findNearestStepIndex(test, targetSeconds);
if (nearestStepIndex === null) {
return;
}

const step = test.steps[nearestStepIndex];
const container = document.querySelector('[data-step-detail="' + testId + '"]') as HTMLElement | null;
if (!container || !step) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Escape testId before using it in CSS selectors.

Direct interpolation into selectors can break when IDs contain quotes/special characters, causing selection logic to fail.

🛠️ Suggested fix
 export function selectStep(testId: string, stepIndex: number): void {
   const test = testMap[testId];
   const step = test?.steps?.[stepIndex];
-  const container = document.querySelector('[data-step-detail="' + testId + '"]') as HTMLElement | null;
+  const safeTestId = CSS.escape(testId);
+  const container = document.querySelector(`[data-step-detail="${safeTestId}"]`) as HTMLElement | null;
@@
 function setSelectedStep(testId: string, stepIndex: number): void {
-  for (const button of document.querySelectorAll('[data-test-id="' + testId + '"][data-step-index]')) {
+  const safeTestId = CSS.escape(testId);
+  for (const button of document.querySelectorAll(`[data-test-id="${safeTestId}"][data-step-index]`)) {
     button.classList.toggle('is-selected', Number((button as HTMLElement).dataset.stepIndex) === stepIndex);
   }
 }

Also applies to: 149-150

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/src/ui/client/runDetailController.ts` around lines 136 -
167, The selectors in selectStep, setSelectedStep and selectNearestStepForTime
interpolate raw testId into querySelector/querySelectorAll which breaks for IDs
with quotes/special characters; update those calls to escape testId (use
CSS.escape(testId) or a small utility wrapper) before building selectors so
'[data-step-detail="' + CSS.escape(testId) + '"]' and '[data-test-id="' +
CSS.escape(testId) + '"][data-step-index]' are used; apply the same change to
all occurrences around the querySelector/querySelectorAll calls in selectStep,
setSelectedStep and selectNearestStepForTime.

Comment on lines +18 to +19
style={inlineStyleFromString(summaryIconStyle(tone))}
dangerouslySetInnerHTML={{ __html: iconSvg }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Don't inject raw SVG HTML from a public prop.

dangerouslySetInnerHTML is an XSS boundary here. Unless every caller is trusted, iconSvg can carry arbitrary markup into the page. Prefer a trusted React icon node or sanitize the SVG before it reaches this component.

🧰 Tools
🪛 ast-grep (0.42.1)

[warning] 18-18: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
- https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/report-web/src/ui/components/SummaryCard.tsx` around lines 18 - 19,
The component is injecting raw SVG via dangerouslySetInnerHTML using the iconSvg
prop (used alongside summaryIconStyle and inlineStyleFromString), which is an
XSS risk; change the API to accept a trusted ReactNode/icon component instead of
raw HTML (update the SummaryCard prop type and callers to pass a React element)
or, if you must accept strings, sanitize iconSvg inside SummaryCard (e.g., with
a vetted sanitizer like DOMPurify) before rendering and remove
dangerouslySetInnerHTML usage; ensure summaryIconStyle/inlineStyleFromString
still apply to the rendered element.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants