[evals] Onboarding flow #2103
Open
miguelg719 wants to merge 3 commits into
4 issues found across 14 files
Confidence score: 3/5
- There is moderate merge risk because `packages/evals/tui/commands/doctor.ts` surfaces raw exception messages to users/JSON output (severity 7/10, high confidence), which can expose unsanitized internal error details. `packages/evals/tui/welcomeStatus.ts` has a logic/comment mismatch (`||` vs. the stated "all present values are alias-only" invariant), creating a concrete behavior regression risk in status reporting. `packages/evals/tui/repl.ts` and `packages/evals/tui/commands/doctor.ts` each have user-facing consistency issues (first-run onboarding being permanently suppressed in quiet mode, and env snapshot vs. probe disagreement for `OPENAI_API_KEY`).
- Pay close attention to `packages/evals/tui/commands/doctor.ts`, `packages/evals/tui/welcomeStatus.ts`, and `packages/evals/tui/repl.ts`: sanitize surfaced errors and align boolean/env/onboarding logic with intended behavior.
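One way to read the "sanitize surfaced errors" request is sketched below. This is not the PR's code: `sanitizeErrorMessage`, the secret patterns, and the 200-character cap are all assumptions made for illustration. The idea is to keep only the first line of the message, mask anything that looks like a credential, and bound the length before printing or serializing it.

```typescript
// Hypothetical helper (not from the PR): turns an unknown thrown value into a
// short, user-safe string before it reaches terminal or JSON output.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9_-]{8,}/g, // assumed shape of an OpenAI-style key
  /Bearer\s+[A-Za-z0-9._-]+/gi, // Authorization header values
];

function sanitizeErrorMessage(err: unknown, maxLength = 200): string {
  const raw = err instanceof Error ? err.message : String(err);
  // Keep only the first line so stack frames and multi-line payloads are dropped.
  let message = raw.split("\n")[0] ?? "";
  for (const pattern of SECRET_PATTERNS) {
    message = message.replace(pattern, "[redacted]");
  }
  // Bound the length so large response bodies are not echoed back verbatim.
  return message.length > maxLength
    ? message.slice(0, maxLength) + "..."
    : message;
}
```

A caller would pass the caught value through this helper instead of reading `err.message` directly. The pattern list is deliberately small and would need to match whatever credentials the real command can encounter.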
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/evals/tui/commands/doctor.ts">
<violation number="1" location="packages/evals/tui/commands/doctor.ts:163">
P1: Custom agent: **Exception and error message sanitization**
Raw exception messages are surfaced to users/JSON output without sanitization.</violation>
<violation number="2" location="packages/evals/tui/commands/doctor.ts:395">
P2: The probe reads `process.env.OPENAI_API_KEY` directly, but `snapshotEnv()` also checks the package `.env` file. If the key exists only in `packages/evals/.env`, the doctor will show it as "✓ set" while the probe fails with an auth error because it never sees the actual value.</violation>
</file>
<file name="packages/evals/tui/repl.ts">
<violation number="1" location="packages/evals/tui/repl.ts:103">
P2: `markFirstRunComplete(entryDir)` is called even in `quiet` mode, permanently suppressing the onboarding welcome for future interactive sessions despite the user never having seen it. Consider moving this call inside the `if (!quiet)` block so the marker is only set when the user actually had a chance to see (or dismiss) the welcome.</violation>
</file>
<file name="packages/evals/tui/welcomeStatus.ts">
<violation number="1" location="packages/evals/tui/welcomeStatus.ts:130">
P2: Logic does not match its stated invariant. Using `||` makes `viaAlias` true when *any* present value is alias-only, not when *all* present values are alias-only as the comment specifies. If the intent is "all present BB values come from aliases", replace `||` with a conjunction that checks each present value independently.</violation>
</file>
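The difference between the "any" and "all" readings in the `welcomeStatus.ts` finding can be sketched as follows. The `KeySource` type and both function names are hypothetical and only illustrate the invariant the reviewer describes; the actual data shape in the PR may differ.

```typescript
// Hypothetical shape: where a resolved Browserbase value came from.
type KeySource = "canonical" | "alias" | "missing";

// "All" reading: true only when at least one value is present and every
// present value was resolved through an alias, never a canonical name.
function allPresentViaAlias(sources: KeySource[]): boolean {
  const present = sources.filter((s) => s !== "missing");
  return present.length > 0 && present.every((s) => s === "alias");
}

// "Any" reading: what a chain of `||` checks computes.
function anyPresentViaAlias(sources: KeySource[]): boolean {
  return sources.some((s) => s === "alias");
}
```

The two functions disagree exactly on mixed input such as one canonical value and one alias value, which is the case the review comment flags.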
Architecture diagram
sequenceDiagram
participant CLI as CLI entry
participant Doctor as Doctor Command
participant Welcome as Welcome State
participant REPL as REPL
participant Config as Config RW
participant Env as Env Snapshot
participant Discovery as Task Discovery
participant Build as Build Script
Note over CLI,Build: NEW: REPL first-run onboarding + doctor health command
CLI->>CLI: parse args (--quiet/-q, EVALS_NO_WELCOME)
alt REPL launch (no args or only --quiet/-q flags)
CLI->>REPL: startRepl(entryDir, { quiet })
alt --quiet or EVALS_NO_WELCOME=1
REPL->>REPL: skip all chrome, show prompt directly
else first run
REPL->>Welcome: isFirstRun(entryDir)
Welcome->>Config: readConfig(entryDir) → read _meta
Config-->>Welcome: _meta (or empty)
Welcome-->>REPL: true (no firstRunCompletedAt)
REPL->>Env: snapshotEnv() → check provider+BB+braintrust keys
REPL->>Discovery: discoverTasks(tasksRoot)
Discovery-->>REPL: registry with task count
REPL->>REPL: printExtendedWelcome(health, registry)
else returning user (not first run)
REPL->>Env: snapshotEnv()
alt zero provider keys
REPL->>REPL: renderInlineWarning() → yellow warning line
end
REPL->>REPL: printTipLine()
end
REPL->>Welcome: markFirstRunComplete(entryDir) → write _meta
Welcome->>Config: readConfig + set _meta.firstRunCompletedAt
Config-->>Welcome: config saved
REPL-->>CLI: REPL loop started
else doctor/health command
CLI->>Doctor: handleDoctor(subArgs, entryDir)
Doctor->>Config: readConfig → resolveConfigPath
Doctor->>Env: snapshotEnv() → full key matrix
Doctor->>Discovery: discoverTasks(tasksRoot)
Doctor->>Doctor: readStagehandVersion()
Doctor->>Doctor: computeVerdict(keys, config, discovery)
alt verdict = fail
Doctor->>Doctor: reasons = zero providers or missing BB for env=browserbase
else verdict = warn
Doctor->>Doctor: reasons = partial BB or no braintrust
else verdict = ok
Doctor->>Doctor: reasons = []
end
alt --json flag
Doctor->>Doctor: print JSON report (always exit 0)
else human output
Doctor->>Doctor: print formatted report
alt verdict = fail
Doctor-->CLI: exit 1
else
Doctor-->CLI: exit 0
end
end
end
Note over CLI: First-run marker only written after real commands (not help/doctor)
alt command was 'run', 'list', 'config' (non-help), 'new', or unknown target
CLI->>CLI: shouldMarkFirstRun = true
CLI->>CLI: execute command
CLI->>Welcome: markFirstRunComplete(entryDir) [in finally block]
else command was help invocation or doctor
CLI->>CLI: shouldMarkFirstRun = false → skip marker
end
Note over Build: Build script preserves _meta across rebuilds
Build->>Config: read dist evals.config.json
alt existing._meta present
Build->>Build: merge dist._meta into source config
Build->>Config: write merged config (preserves first-run state)
end
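The verdict and exit-code branches in the diagram above could look roughly like this. It is a sketch under stated assumptions: the `HealthInput` shape and the reason strings are invented for illustration, and treating a partially configured Browserbase pair as a failure when `env` is `browserbase` is one interpretation of "missing BB for env=browserbase", not something the PR text confirms.

```typescript
type Verdict = "ok" | "warn" | "fail";

// Hypothetical input shape; the real doctor command works from a key matrix.
interface HealthInput {
  providerKeyCount: number;
  env: string;
  browserbaseApiKey: boolean;
  browserbaseProjectId: boolean;
  braintrustKey: boolean;
}

interface VerdictResult {
  verdict: Verdict;
  reasons: string[];
}

function computeVerdict(input: HealthInput): VerdictResult {
  const bbPresent =
    Number(input.browserbaseApiKey) + Number(input.browserbaseProjectId);

  // fail: zero providers, or Browserbase credentials missing for env=browserbase.
  const failReasons: string[] = [];
  if (input.providerKeyCount === 0) failReasons.push("no provider keys set");
  if (input.env === "browserbase" && bbPresent < 2) {
    failReasons.push("Browserbase credentials missing for env=browserbase");
  }
  if (failReasons.length > 0) return { verdict: "fail", reasons: failReasons };

  // warn: partial Browserbase credentials, or no Braintrust key.
  const warnReasons: string[] = [];
  if (bbPresent === 1) warnReasons.push("only one Browserbase credential set");
  if (!input.braintrustKey) warnReasons.push("no Braintrust key set");
  if (warnReasons.length > 0) return { verdict: "warn", reasons: warnReasons };

  return { verdict: "ok", reasons: [] };
}

// --json always exits 0; human output exits 1 only on fail.
function exitCodeFor(verdict: Verdict, json: boolean): number {
  if (json) return 0;
  return verdict === "fail" ? 1 : 0;
}
```

Collecting all fail reasons before returning means the report can list every blocking problem at once rather than only the first one found.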
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
why
Improving the evals onboarding experience
what changed
Added a welcome screen on first run after install in the TUI, and a net-new `evals doctor` subcommand to triage common issues with installation/setup
test plan
Summary by cubic
Adds a first-run onboarding flow to the `evals` REPL and a new `evals doctor`/`health` command to check environment, config, and discovery health. Adds a REPL-only quiet mode and persists a `_meta` first-run marker in `evals.config.json` so the welcome shows once.
New Features
- `--quiet`/`-q` (REPL-only) or `EVALS_NO_WELCOME=1`. Banner prints art only. When not showing the welcome, prints a single inline warning only if no provider keys are set; otherwise shows a compact tip.
- `_meta.firstRunCompletedAt` (with version) to `evals.config.json`; preserved across CLI rebuilds. REPL marks completion pre-prompt unless `--quiet`; argv marks only after real commands. Help-only paths and `doctor` (including nested help under `config`/`experiments`) do not mark.
- `evals doctor`/`health`: human output and `--json` (always exits 0); verdicts `ok|warn|fail` with exit codes `0|0|1`. Hidden `--probe` verifies an OpenAI key. Report includes runtime (Node, Stagehand, mode), config (path/defaults/core), task discovery, and a key matrix (OpenAI/Anthropic/Google/Browserbase/Braintrust) with source provenance and Browserbase alias hints. Listed in help and callable from the REPL.
Migration
- `evals doctor` to verify setup. Set `OPENAI_API_KEY`/`ANTHROPIC_API_KEY`/`GOOGLE_GENERATIVE_AI_API_KEY` (or `GEMINI_API_KEY`) and `BRAINTRUST_API_KEY`; for `env=browserbase`, set `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` (or `BB_*`).
- `--quiet` or set `EVALS_NO_WELCOME=1`.
Written for commit 372afd3. Summary will update on new commits.
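The "argv marks only after real commands" rule could be expressed as a small predicate, sketched below. The function body, the `--help`/`-h` flag spellings, and the treatment of a bare `help` token anywhere in the arguments are assumptions; only the command names and the rule that help paths and `doctor`/`health` do not mark come from the summary.

```typescript
// Assumed help spellings; the real CLI may accept different ones.
const HELP_TOKENS = new Set(["--help", "-h", "help"]);
// Commands that never write the first-run marker, per the summary.
const NON_MARKING_COMMANDS = new Set(["doctor", "health"]);

function shouldMarkFirstRun(argv: string[]): boolean {
  const [command, ...rest] = argv;
  // No command means a REPL launch, which handles its own marker.
  if (command === undefined) return false;
  if (NON_MARKING_COMMANDS.has(command) || HELP_TOKENS.has(command)) {
    return false;
  }
  // Nested help (for example under config/experiments) does not mark either.
  return !rest.some((arg) => HELP_TOKENS.has(arg));
}
```

With this shape, `run`, `list`, `new`, non-help `config` calls, and unknown targets all return `true`, so the caller can write the marker in a `finally` block after the command executes.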