
feat(frontend): Run-on modes in the evaluator creation drawer (shared controls)#4557

Merged
mmabrouk merged 4 commits into release/v0.102.0 from fe-feat/evaluator-drawer-run-on
Jun 8, 2026

Conversation

@mmabrouk mmabrouk commented Jun 5, 2026

Member

Why

The Run on selector (test case / app output / trace) was only wired into the full-page evaluator playground. The evaluator-creation drawer still hardcoded runDisabled={!hasAppConnected} and only showed the test-set dropdown after an app was connected — so in the drawer you were forced to pick an app even when you wanted to run the evaluator directly on a test case. The drawer had silently drifted out of sync with the page.

What

Rather than paste the run-on wiring into the drawer (a fourth copy), this extracts the logic the page and drawer were already duplicating and shares it:

  • useEvaluatorRunControls() — one hook for the app adapter, app-select handler, run-on mode + handlePickRunOn, and the run gate (runDisabled = runOnMode === "app" && !hasAppConnected).
  • EvaluatorRunControls — the run-on selector + app picker + disconnect affordance + test-set dropdown, as one cluster used by both the page header and the drawer header, so they can't diverge again.
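A framework-free sketch of the state the hook is described as bundling may help reviewers. It is not the actual hook: `createRunControls` and `disconnectApp` are hypothetical names, and of the mode values only `"app"` appears in this PR (`"testcase"` and `"trace"` are assumed spellings). The app adapter is left out.

```typescript
// Assumed mode values: only "app" is confirmed by the PR description.
type RunOnMode = "testcase" | "app" | "trace";

interface RunControls {
  readonly runOnMode: RunOnMode;
  readonly hasAppConnected: boolean;
  readonly runDisabled: boolean;
  handlePickRunOn(mode: RunOnMode): void;
  handleAppSelect(appId: string): void;
  disconnectApp(): void; // hypothetical name for the disconnect affordance
}

// Hypothetical stand-in for useEvaluatorRunControls(), without React or atoms.
function createRunControls(initialMode: RunOnMode): RunControls {
  let mode: RunOnMode = initialMode;
  let connectedAppId: string | null = null;

  return {
    get runOnMode(): RunOnMode {
      return mode;
    },
    get hasAppConnected(): boolean {
      return connectedAppId !== null;
    },
    // The run gate from the description: only "app" mode requires an app.
    get runDisabled(): boolean {
      return mode === "app" && connectedAppId === null;
    },
    handlePickRunOn(next: RunOnMode): void {
      mode = next;
    },
    handleAppSelect(appId: string): void {
      connectedAppId = appId;
    },
    disconnectApp(): void {
      connectedAppId = null;
    },
  };
}
```

Because `runDisabled` is derived from the mode and the connection state rather than stored, any surface that consumes the same controls object gets the same gate.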

Result:

  • Page: behavior-preserving (just sources its controls from the shared hook/cluster).
  • Drawer: gains all three run-on modes, the run-on selector, a disconnect affordance, and an always-available test-set dropdown. Test-case mode now runs without forcing an app — the bug is fixed.
  • Removes the appWorkflowAdapter / handleAppSelect / evaluator-node-lookup triplication across the page body, drawer header, and drawer body.

Net: 218 insertions / 274 deletions across 5 files (2 new, 3 slimmed).

Notes

  • runOnMode stays persisted per project (shared by page and drawer); the per-evaluator question is tracked separately for a later PR, as discussed.
  • runDisabled only manifests where the run panel renders (the page and the expanded drawer); the collapsed/config-only drawer ignores it, unchanged.

Stacked on

Based on fe-fix/app-workflow-router-unification-regression-fix (the merged evaluator-playground branch, which already contains the page-side run-on feature from #4553).

Test plan

  • Open the New Evaluation flow → create-evaluator drawer → switch Run on to "Run directly on a test case": the test-case editor is usable and runs without selecting an app.
  • Switch to "Run on an app output" with no app: the run panel shows the "Select an app" empty state; pick an app → it runs.
  • Confirm the full-page evaluator playground is unchanged (modes, default, dark mode, disconnect).

The Run-on selector (test case / app output / trace) was only wired into the
full-page evaluator playground. The evaluator-creation drawer still hardcoded
`runDisabled={!hasAppConnected}` and only showed the test-set dropdown after an
app was connected, so it forced the user to pick an app even when they wanted to
run the evaluator directly on a test case.

Rather than copy the run-on wiring into the drawer (a fourth duplicate), extract
the shared logic the page and drawer were already duplicating:

- useEvaluatorRunControls(): app adapter, app-select handler, run-on mode +
  handlePickRunOn, and the runDisabled gate (runOnMode === 'app' && !hasAppConnected).
- EvaluatorRunControls: the run-on selector + app picker + disconnect + test-set
  cluster, shared by the page header and the drawer header so they can't drift.

The page is behavior-preserving; the drawer gains all three modes, the run-on
selector, a disconnect affordance, and an always-available test-set dropdown.
This also removes the adapter/handleAppSelect/evaluator-node triplication across
the page body, drawer header, and drawer body.

vercel Bot commented Jun 5, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jun 8, 2026 12:10pm |


@dosubot dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label Jun 5, 2026

coderabbitai Bot commented Jun 5, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b24abe20-60b2-4159-a643-79b5089dea82

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions Bot commented Jun 5, 2026

Contributor

Railway Preview Environment

Preview URL: https://gateway-production-d7bf.up.railway.app/w
Project: agenta-oss-pr-4557
Image tag: pr-4557-1bda40a
Status: Deployed
Railway logs: Open logs
Workflow logs: View workflow run
Updated at: 2026-06-05T13:54:26.050Z

The creation drawer renders inside EvaluationRunsTableStoreProvider, a scoped
jotai store that mirrors only a handful of global atoms. The playground state,
however, runs on the default store (the playground package uses
getDefaultStore() throughout). So in the drawer the run-on mode was read/written
in the scoped store while the playground lived in the default store — the two
split, and switching to test-case mode never reached the run panel: it stayed
stuck on the 'Select an app' empty state.

Read and write all run-on / playground atoms through getDefaultStore() in
useEvaluatorRunControls, mirroring the existing workaround in
usePreviewVariantConfig and TestsetCells. On the full page (no scoped store)
this is a no-op; in the drawer it aligns run-on state with the playground so
test-case mode shows the inputs/outputs as it does on the page.
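The failure mode hypothesised in this commit can be sketched with a toy store. This is a simplified stand-in and not the jotai API: `MiniStore`, `RUN_ON_KEY`, `panelShowsSelectApp`, the `"app"` default, and the `"testcase"` / `"trace"` mode names are all assumptions for illustration. A later commit in this PR reports that the hypothesis did not match the actual cause, so read this as a picture of the reasoning only.

```typescript
type RunOnMode = "testcase" | "app" | "trace";

const RUN_ON_KEY = "runOnMode"; // hypothetical key

// Toy store: each instance holds its own values, like separate atom stores would.
class MiniStore {
  private readonly values = new Map<string, RunOnMode>();

  get(key: string): RunOnMode {
    // Assumed default: "app" when nothing has been written yet.
    return this.values.get(key) ?? "app";
  }

  set(key: string, value: RunOnMode): void {
    this.values.set(key, value);
  }
}

const defaultStore = new MiniStore(); // where the playground is said to read
const scopedStore = new MiniStore(); // what the drawer's provider would supply

// The run panel consults one store to decide whether to show "Select an app".
function panelShowsSelectApp(store: MiniStore, hasAppConnected: boolean): boolean {
  return store.get(RUN_ON_KEY) === "app" && !hasAppConnected;
}

// Hypothesised bug: the selector writes to the scoped store,
// while the panel keeps reading the default store.
scopedStore.set(RUN_ON_KEY, "testcase");
const stuckOnEmptyState = panelShowsSelectApp(defaultStore, false);

// Workaround described above: write through the default store instead.
defaultStore.set(RUN_ON_KEY, "testcase");
const fixedAfterWorkaround = panelShowsSelectApp(defaultStore, false);
```

In this toy model `stuckOnEmptyState` is `true` and `fixedAfterWorkaround` is `false`, which is the split the commit message describes.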
@dosubot dosubot Bot added size:XL (This PR changes 500-999 lines, ignoring generated files.) and removed size:L (This PR changes 100-499 lines, ignoring generated files.) labels Jun 5, 2026
@ardaerzin ardaerzin self-requested a review June 8, 2026 08:52
@mmabrouk mmabrouk changed the base branch from fe-fix/app-workflow-router-unification-regression-fix to release/v0.102.0 June 8, 2026 09:08
@mmabrouk mmabrouk marked this pull request as draft June 8, 2026 09:09
@mmabrouk mmabrouk marked this pull request as ready for review June 8, 2026 09:10
The evaluator drawer rendered by WorkflowRevisionDrawerWrapper reimplemented
the run panel gate as `runDisabled={!hasAppConnected}`, ignoring the run-on
mode. So switching its Run-on selector to 'test case' updated the header while
the panel kept showing the 'Select an app' empty state and demanding an app —
the page and creation drawer respected the mode, only this third surface didn't.

Route it through the shared useEvaluatorRunControls hook (+ SelectAppEmptyState
and the prop-less EvaluatorPlaygroundHeader), the same wiring the page and the
creation drawer use, so the gate is `runOnMode === 'app' && !hasAppConnected`
everywhere and the three surfaces can't drift again. Removes this drawer's
duplicated app adapter / app-select / run-gate logic.
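To make the difference between the two gates concrete, the sketch below enumerates every mode and connection state and records where they disagree. The function names and the `"testcase"` / `"trace"` mode values are assumptions; only the two boolean expressions come from this PR.

```typescript
// Assumed mode values: only "app" is confirmed by the commit message.
type RunOnMode = "testcase" | "app" | "trace";

// Gate the drawer reimplemented: ignores the run-on mode entirely.
const legacyRunDisabled = (hasAppConnected: boolean): boolean => !hasAppConnected;

// Gate from the shared hook: only "app" mode requires a connected app.
const sharedRunDisabled = (mode: RunOnMode, hasAppConnected: boolean): boolean =>
  mode === "app" && !hasAppConnected;

const modes: RunOnMode[] = ["testcase", "app", "trace"];

// Collect every combination where the two gates give different answers.
const disagreements: Array<{ mode: RunOnMode; hasAppConnected: boolean }> = [];
for (const mode of modes) {
  for (const hasAppConnected of [false, true]) {
    if (legacyRunDisabled(hasAppConnected) !== sharedRunDisabled(mode, hasAppConnected)) {
      disagreements.push({ mode, hasAppConnected });
    }
  }
}
```

Under these assumptions the gates differ only when no app is connected and the mode is not `"app"`, which is the state where the panel kept demanding an app.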

Also drop the getDefaultStore() patch from useEvaluatorRunControls: runtime
debugging proved these surfaces are not in a scoped store (the drawer that was
broken is WorkflowRevisionDrawerWrapper, not the scoped-store CreateEvaluator
drawer), so the override was a no-op based on a wrong hypothesis.
@mmabrouk mmabrouk merged commit fa7b3be into release/v0.102.0 Jun 8, 2026
19 of 20 checks passed

Labels

Frontend, size:XL (This PR changes 500-999 lines, ignoring generated files.)


2 participants