
[evals] Onboarding flow #2103

Open
miguelg719 wants to merge 3 commits into main from miguelgonzalez/stg-1901-evals-onboarding-and-readme

miguelg719 (Collaborator) commented May 11, 2026

why

Improving the evals onboarding experience

what changed

Added a welcome screen shown on the first TUI run after install, and a new evals doctor subcommand that triages common installation and setup issues
Screenshot 2026-05-11 at 12 08 41 PM

test plan

  • 500+ lines of new tests
  • Existing test suite passes

Summary by cubic

Adds a first-run onboarding flow to the evals REPL and a new evals doctor/health command to check environment, config, and discovery health. Adds a REPL-only quiet mode and persists a _meta first-run marker in evals.config.json so the welcome shows once.
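A minimal sketch of the _meta first-run marker, assuming evals.config.json is a plain JSON object whose other keys must survive the write. The _meta.firstRunCompletedAt and version fields come from the summary; the helper names below (isFirstRunConfig, withFirstRunMarker, readConfigFile, markFirstRunCompleteAt) are hypothetical and take a file path rather than the entryDir used by the PR's isFirstRun/markFirstRunComplete.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical shape of evals.config.json; only `_meta` matters here.
interface EvalsConfig {
  _meta?: { firstRunCompletedAt?: string; version?: string };
  [key: string]: unknown;
}

// Pure check: it is a first run when no completion timestamp is recorded.
function isFirstRunConfig(config: EvalsConfig): boolean {
  return !config._meta?.firstRunCompletedAt;
}

// Pure update: returns a copy with the marker set; all other keys are kept as-is.
function withFirstRunMarker(
  config: EvalsConfig,
  version: string,
  now: Date = new Date(),
): EvalsConfig {
  return {
    ...config,
    _meta: { ...config._meta, firstRunCompletedAt: now.toISOString(), version },
  };
}

// Hypothetical file wrapper: a missing file is treated as an empty config.
function readConfigFile(path: string): EvalsConfig {
  return existsSync(path)
    ? (JSON.parse(readFileSync(path, "utf8")) as EvalsConfig)
    : {};
}

// Writes the marker once; an existing marker (and its timestamp) is left untouched.
function markFirstRunCompleteAt(path: string, version: string): void {
  const config = readConfigFile(path);
  if (!isFirstRunConfig(config)) return;
  const updated = withFirstRunMarker(config, version);
  writeFileSync(path, JSON.stringify(updated, null, 2) + "\n");
}
```

Splitting the pure update from the file I/O keeps the marker logic easy to test, and the early return means a later session never overwrites the original completion timestamp.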

  • New Features

    • REPL onboarding: one-time extended welcome; suppress with --quiet/-q (REPL-only) or EVALS_NO_WELCOME=1. Banner prints art only. When not showing the welcome, prints a single inline warning only if no provider keys are set; otherwise shows a compact tip.
    • First-run state: writes _meta.firstRunCompletedAt (with version) to evals.config.json; the marker is preserved across CLI rebuilds. The REPL marks completion before showing the prompt unless --quiet is set; direct (argv) invocations mark only after real commands. Help-only paths (including nested help under config/experiments) and doctor do not mark.
    • evals doctor/health: human-readable output or --json; verdicts ok|warn|fail map to exit codes 0|0|1, except that --json always exits 0. A hidden --probe flag verifies an OpenAI key. The report includes runtime (Node, Stagehand, mode), config (path/defaults/core), task discovery, and a key matrix (OpenAI/Anthropic/Google/Browserbase/Braintrust) with source provenance and Browserbase alias hints. The command is listed in help and callable from the REPL.
  • Migration

    • Run evals doctor to verify setup. Set OPENAI_API_KEY/ANTHROPIC_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY (or GEMINI_API_KEY) and BRAINTRUST_API_KEY; for env=browserbase, set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID (or BB_*).
    • To skip onboarding on launch, use --quiet or set EVALS_NO_WELCOME=1.
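The doctor verdict and exit-code rules in the summary (and in the architecture diagram further down) can be sketched as a pure function. This is an illustration under assumptions, not the PR's computeVerdict: the KeyStatus shape is hypothetical, and treating either missing Browserbase value as a failure when env=browserbase is one reading of "missing BB".

```typescript
type Verdict = "ok" | "warn" | "fail";

// Hypothetical input: which keys resolved, regardless of source.
interface KeyStatus {
  providers: { openai: boolean; anthropic: boolean; google: boolean };
  browserbaseApiKey: boolean;
  browserbaseProjectId: boolean;
  braintrust: boolean;
}

interface VerdictResult {
  verdict: Verdict;
  reasons: string[];
}

function computeVerdict(keys: KeyStatus, env: "local" | "browserbase"): VerdictResult {
  const fail: string[] = [];
  const warn: string[] = [];
  const providerCount = Object.values(keys.providers).filter(Boolean).length;
  const bbCount = [keys.browserbaseApiKey, keys.browserbaseProjectId].filter(Boolean).length;

  // fail: no model provider at all, or env=browserbase without both BB values.
  if (providerCount === 0) fail.push("no model provider API key is set");
  if (env === "browserbase" && bbCount < 2) {
    fail.push("env=browserbase requires BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID");
  } else if (bbCount === 1) {
    // warn: a partial Browserbase setup outside env=browserbase.
    warn.push("only one of BROWSERBASE_API_KEY / BROWSERBASE_PROJECT_ID is set");
  }
  // warn: results cannot be sent to Braintrust.
  if (!keys.braintrust) warn.push("BRAINTRUST_API_KEY is not set");

  if (fail.length > 0) return { verdict: "fail", reasons: fail };
  if (warn.length > 0) return { verdict: "warn", reasons: warn };
  return { verdict: "ok", reasons: [] };
}

// ok|warn|fail -> 0|0|1 for human output; --json always exits 0.
function exitCodeFor(verdict: Verdict, json: boolean): number {
  if (json) return 0;
  return verdict === "fail" ? 1 : 0;
}
```

Keeping --json at exit code 0 lets scripts parse the report without treating a failing environment as a crashed command; the verdict field carries the outcome instead.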

Written for commit 372afd3. Summary will update on new commits.

miguelg719 marked this pull request as ready for review May 11, 2026 05:20

changeset-bot Bot commented May 11, 2026

⚠️ No Changeset found

Latest commit: 372afd3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

cubic-dev-ai Bot (Contributor) left a comment


4 issues found across 14 files

Confidence score: 3/5

  • There is moderate merge risk because packages/evals/tui/commands/doctor.ts surfaces raw exception messages to users/JSON output (severity 7/10, high confidence), which can expose unsanitized internal error details.
  • packages/evals/tui/welcomeStatus.ts has a logic/comment mismatch (|| vs the stated “all present values are alias-only” invariant), creating a concrete behavior regression risk in status reporting.
  • packages/evals/tui/repl.ts and packages/evals/tui/commands/doctor.ts each have user-facing consistency issues (first-run onboarding being permanently suppressed in quiet mode, and env snapshot vs probe disagreement for OPENAI_API_KEY).
  • Pay close attention to packages/evals/tui/commands/doctor.ts, packages/evals/tui/welcomeStatus.ts, and packages/evals/tui/repl.ts - sanitize surfaced errors and align boolean/env/onboarding logic with intended behavior.
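One possible direction for the P1 sanitization finding in doctor.ts: print a fixed, user-safe message per failed check and keep raw exception text out of both human and --json output. The check names, message strings, and safeErrorMessage helper are hypothetical; surfacing a short Node error code such as ENOENT is an assumption about what is safe to show.

```typescript
// Hypothetical checks the doctor might report failures for.
type DoctorCheck = "config" | "discovery" | "probe";

const SAFE_MESSAGES: Record<DoctorCheck, string> = {
  config: "could not read evals.config.json",
  discovery: "task discovery failed",
  probe: "OpenAI key probe failed",
};

// Node system errors carry a short `code` (e.g. "ENOENT"); only a strict
// uppercase token is accepted so free-form text cannot leak through it.
function errorCode(err: unknown): string | undefined {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    if (typeof code === "string" && /^[A-Z0-9_]{1,32}$/.test(code)) return code;
  }
  return undefined;
}

// Never includes err.message, which can contain absolute paths,
// request URLs, or echoed key material.
function safeErrorMessage(check: DoctorCheck, err: unknown): string {
  const code = errorCode(err);
  return code ? `${SAFE_MESSAGES[check]} (${code})` : SAFE_MESSAGES[check];
}
```

If the raw error is still useful for debugging, it could go to an opt-in debug log, but in this sketch it never reaches anything the doctor prints or serializes.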
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/evals/tui/commands/doctor.ts">

<violation number="1" location="packages/evals/tui/commands/doctor.ts:163">
P1: Custom agent: **Exception and error message sanitization**

Raw exception messages are surfaced to users/JSON output without sanitization.</violation>

<violation number="2" location="packages/evals/tui/commands/doctor.ts:395">
P2: The probe reads `process.env.OPENAI_API_KEY` directly, but `snapshotEnv()` also checks the package `.env` file. If the key exists only in `packages/evals/.env`, the doctor will show it as "✓ set" while the probe fails with an auth error because it never sees the actual value.</violation>
</file>

<file name="packages/evals/tui/repl.ts">

<violation number="1" location="packages/evals/tui/repl.ts:103">
P2: `markFirstRunComplete(entryDir)` is called even in `quiet` mode, permanently suppressing the onboarding welcome for future interactive sessions despite the user never having seen it. Consider moving this call inside the `if (!quiet)` block so the marker is only set when the user actually had a chance to see (or dismiss) the welcome.</violation>
</file>

<file name="packages/evals/tui/welcomeStatus.ts">

<violation number="1" location="packages/evals/tui/welcomeStatus.ts:130">
P2: Logic does not match its stated invariant. Using `||` makes `viaAlias` true when *any* present value is alias-only, not when *all* present values are alias-only as the comment specifies. If the intent is "all present BB values come from aliases", replace `||` with a conjunction that checks each present value independently.</violation>
</file>
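A sketch of the invariant the welcomeStatus.ts finding asks for: viaAlias should be true only when at least one Browserbase value is present and every present value came from an alias alone. The summary names the aliases only as BB_*, so alias names are passed in as parameters rather than guessed; ResolvedValue, resolve, and viaAliasAll are hypothetical names.

```typescript
// One logical setting that may come from a primary env var or an alias.
interface ResolvedValue {
  present: boolean;
  fromAlias: boolean; // true only when an alias, not the primary name, supplied it
}

// Hypothetical resolver: the primary name wins; aliases are a fallback.
// Empty strings count as unset.
function resolve(
  env: Record<string, string | undefined>,
  primary: string,
  aliases: string[],
): ResolvedValue {
  if (env[primary]) return { present: true, fromAlias: false };
  const hit = aliases.some((name) => Boolean(env[name]));
  return { present: hit, fromAlias: hit };
}

// "All present values are alias-only": ignore absent values, require at
// least one present value, then check each present value independently.
function viaAliasAll(values: ResolvedValue[]): boolean {
  const present = values.filter((v) => v.present);
  return present.length > 0 && present.every((v) => v.fromAlias);
}
```

With an || over the two values, a mix such as the primary API key plus an aliased project ID would still report viaAlias; the every() form reports it only when nothing came from a primary name.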
Architecture diagram
sequenceDiagram
    participant CLI as CLI entry
    participant Doctor as Doctor Command
    participant Welcome as Welcome State
    participant REPL as REPL
    participant Config as Config RW
    participant Env as Env Snapshot
    participant Discovery as Task Discovery
    participant Build as Build Script

    Note over CLI,Build: NEW: REPL first-run onboarding + doctor health command

    CLI->>CLI: parse args (--quiet/-q, EVALS_NO_WELCOME)

    alt REPL launch (no args or only --quiet/-q flags)
        CLI->>REPL: startRepl(entryDir, { quiet })
        alt --quiet or EVALS_NO_WELCOME=1
            REPL->>REPL: skip all chrome, show prompt directly
        else first run
            REPL->>Welcome: isFirstRun(entryDir)
            Welcome->>Config: readConfig(entryDir) → read _meta
            Config-->>Welcome: _meta (or empty)
            Welcome-->>REPL: true (no firstRunCompletedAt)
            REPL->>Env: snapshotEnv() → check provider+BB+braintrust keys
            REPL->>Discovery: discoverTasks(tasksRoot)
            Discovery-->>REPL: registry with task count
            REPL->>REPL: printExtendedWelcome(health, registry)
        else returning user (not first run)
            REPL->>Env: snapshotEnv()
            alt zero provider keys
                REPL->>REPL: renderInlineWarning() → yellow warning line
            end
            REPL->>REPL: printTipLine()
        end
        REPL->>Welcome: markFirstRunComplete(entryDir) → write _meta
        Welcome->>Config: readConfig + set _meta.firstRunCompletedAt
        Config-->>Welcome: config saved
        REPL-->>CLI: REPL loop started
    else doctor/health command
        CLI->>Doctor: handleDoctor(subArgs, entryDir)
        Doctor->>Config: readConfig → resolveConfigPath
        Doctor->>Env: snapshotEnv() → full key matrix
        Doctor->>Discovery: discoverTasks(tasksRoot)
        Doctor->>Doctor: readStagehandVersion()
        Doctor->>Doctor: computeVerdict(keys, config, discovery)
        alt verdict = fail
            Doctor->>Doctor: reasons = zero providers or missing BB for env=browserbase
        else verdict = warn
            Doctor->>Doctor: reasons = partial BB or no braintrust
        else verdict = ok
            Doctor->>Doctor: reasons = []
        end
        alt --json flag
            Doctor->>Doctor: print JSON report (always exit 0)
        else human output
            Doctor->>Doctor: print formatted report
            alt verdict = fail
                Doctor-->CLI: exit 1
            else
                Doctor-->CLI: exit 0
            end
        end
    end

    Note over CLI: First-run marker only written after real commands (not help/doctor)

    alt command was 'run', 'list', 'config' (non-help), 'new', or unknown target
        CLI->>CLI: shouldMarkFirstRun = true
        CLI->>CLI: execute command
        CLI->>Welcome: markFirstRunComplete(entryDir) [in finally block]
    else command was help invocation or doctor
        CLI->>CLI: shouldMarkFirstRun = false → skip marker
    end

    Note over Build: Build script preserves _meta across rebuilds

    Build->>Config: read dist evals.config.json
    alt existing._meta present
        Build->>Build: merge dist._meta into source config
        Build->>Config: write merged config (preserves first-run state)
    end
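The argv-mode marking rule at the bottom of the diagram can be expressed as a small predicate plus a finally block. The help spellings (help, --help, -h), the quiet-flag check, and the runCli wrapper are assumptions for illustration; the command names and the finally-block placement come from the diagram and summary.

```typescript
const HELP_FLAGS = new Set(["help", "--help", "-h"]); // assumed help spellings
const QUIET_FLAGS = new Set(["--quiet", "-q"]);

// Should a direct (argv) invocation write the first-run marker?
function shouldMarkFirstRun(argv: string[]): boolean {
  // No args or only quiet flags launches the REPL, which handles marking itself.
  if (argv.every((arg) => QUIET_FLAGS.has(arg))) return false;
  const [command, ...rest] = argv;
  if (HELP_FLAGS.has(command)) return false;
  if (command === "doctor" || command === "health") return false;
  // Nested help such as `config --help` or `experiments help` is help-only too.
  if (rest.some((arg) => HELP_FLAGS.has(arg))) return false;
  // run, list, config, new, or an unknown target.
  return true;
}

// Hypothetical wrapper: the marker is written even if the command throws.
async function runCli(
  argv: string[],
  execute: (argv: string[]) => Promise<void>,
  markComplete: () => void,
): Promise<void> {
  const shouldMark = shouldMarkFirstRun(argv);
  try {
    await execute(argv);
  } finally {
    if (shouldMark) markComplete();
  }
}
```

Deciding up front, before the command runs, keeps the rule independent of whether the command succeeds.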

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/evals/tui/commands/doctor.ts
Comment thread packages/evals/tui/repl.ts Outdated
Comment thread packages/evals/tui/welcomeStatus.ts Outdated
Comment thread packages/evals/tui/commands/doctor.ts Outdated