Fix SDK eval breadcrumbs for SDK/custom runs #4578

Closed
Rajesh270712 wants to merge 3 commits into Agenta-AI:main from Rajesh270712:reputation/REP-PR-005-sdk-eval-breadcrumb

Conversation

@Rajesh270712

Summary

This is a follow-up to #4552 after the maintainer requested the repository template and visual proof.

This PR fixes the SDK/custom eval run details breadcrumb for #4549.

SDK eval result pages could fall back to the Auto Evals breadcrumb because the eval run details route normalized or omitted the custom run kind. The change preserves the shared evaluation run kind through the details page, maps custom to SDK Evals, and keeps SDK/custom runs on the non-human metric-column path while preserving human eval behavior.
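The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the members of `EvaluationRunKind`, the `Human Evals` label, the `buildEvalBreadcrumb` helper, and the `/evaluations` base path are all assumptions made for the example. Only the `custom` to `SDK Evals` mapping, the `kind=custom` back link, and the `Auto Evals` fallback come from the description.

```typescript
// Sketch only: the union members and the "Human Evals" label are assumptions,
// not taken from the Agenta source.
type EvaluationRunKind = "auto" | "human" | "custom"

const BREADCRUMB_LABELS: Record<EvaluationRunKind, string> = {
    auto: "Auto Evals",
    human: "Human Evals", // assumed label
    custom: "SDK Evals",
}

// Narrow an untyped route or query value to a known run kind.
function isEvaluationRunKind(value: unknown): value is EvaluationRunKind {
    return (
        typeof value === "string" &&
        Object.prototype.hasOwnProperty.call(BREADCRUMB_LABELS, value)
    )
}

interface Breadcrumb {
    label: string
    href: string
}

// Hypothetical helper: keeps a recognized kind as-is and only falls back to
// "auto" when the value is missing or unknown. basePath is an assumed route.
function buildEvalBreadcrumb(kindParam: unknown, basePath = "/evaluations"): Breadcrumb {
    const kind: EvaluationRunKind = isEvaluationRunKind(kindParam) ? kindParam : "auto"
    return {label: BREADCRUMB_LABELS[kind], href: `${basePath}?kind=${kind}`}
}
```

With this shape, `buildEvalBreadcrumb("custom")` yields the `SDK Evals` label and a link carrying `kind=custom`. The `Auto Evals` fallback applies only when the kind is absent or unrecognized, not when `custom` is passed.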

Fixes #4549

Testing

Verified locally

  • npx --yes pnpm@11.1.2 exec tsx --tsconfig oss/tsconfig.json --test oss/src/components/EvalRunDetails/utils/evaluationMetricColumns.test.ts - passed, 2 tests, 0 failures.
  • npx --yes pnpm@11.1.2 --filter @agenta/oss lint - passed, with warnings about the Node version (Node 26 in use, Node 24 expected) and the Next lint deprecation.
  • npx --yes pnpm@11.1.2 exec prettier --check <changed files> - passed during QA.
  • git diff --check origin/main...HEAD - passed.

Added or updated tests

  • Added oss/src/components/EvalRunDetails/utils/evaluationMetricColumns.test.ts covering metric-column selection for SDK/custom and human evals.

QA follow-up

  • QA approved the branch as ready after branch, diff, identity, focused test, lint, and formatting checks.
  • Full npx --yes pnpm@11.1.2 --filter @agenta/oss types:check still exits 1 on existing repository baseline diagnostics in this environment; no remaining focused diagnostic was tied to the SDK/custom metric helper path.

Demo

This proof shows the patched SDK/custom breadcrumb rendering as SDK Evals and linking back with kind=custom. It is local component-level proof because seeded local auth/API data was unavailable for a full product-session recording.

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

@dosubot (bot) added labels on Jun 8, 2026: size:L (This PR changes 100-499 lines, ignoring generated files.), bug (Something isn't working), Frontend, tests
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4870a3b4-bcc8-4527-9463-f73aff49caae

📥 Commits

Reviewing files that changed from the base of the PR and between 98b8a9d and 2df47bf.

📒 Files selected for processing (14)
  • web/oss/src/components/EvalRunDetails/Table.tsx
  • web/oss/src/components/EvalRunDetails/atoms/metricProcessor.ts
  • web/oss/src/components/EvalRunDetails/atoms/table/types.ts
  • web/oss/src/components/EvalRunDetails/components/FocusDrawer.tsx
  • web/oss/src/components/EvalRunDetails/components/Page.tsx
  • web/oss/src/components/EvalRunDetails/components/columnVisibility/ColumnVisibilityPopoverContent.tsx
  • web/oss/src/components/EvalRunDetails/evaluationPreviewTableStore.ts
  • web/oss/src/components/EvalRunDetails/hooks/usePreviewColumns.tsx
  • web/oss/src/components/EvalRunDetails/state/evalType.ts
  • web/oss/src/components/EvalRunDetails/test.tsx
  • web/oss/src/components/EvalRunDetails/utils/buildPreviewColumns.tsx
  • web/oss/src/components/EvalRunDetails/utils/buildSkeletonColumns.ts
  • web/oss/src/components/EvalRunDetails/utils/evaluationMetricColumns.test.ts
  • web/oss/src/components/EvalRunDetails/utils/evaluationMetricColumns.ts

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for "custom" evaluation type in evaluation run details.
  • Refactor

    • Standardized metric column selection logic for consistent behavior across evaluation types.
  • Tests

    • Added unit tests for metric column selection functionality.

Walkthrough

This PR consolidates evaluation type handling in the EvalRunDetails component tree by replacing scattered string-literal unions with a shared EvaluationRunKind type, introducing centralized metric column selection utilities, and refactoring components to use typed helpers instead of inline branching on evaluation type.

Changes

Evaluation Type Standardization

| Layer / File(s) | Summary |
| --- | --- |
| Metric column selection utilities: web/oss/src/components/EvalRunDetails/utils/evaluationMetricColumns.ts, ...test.ts | New utility module exports usesHumanMetricColumns, usesAutoMetricColumns, and selectStaticMetricColumnsForEvaluationType to encapsulate metric column selection logic; tests verify selection for custom and human evaluation kinds. |
| Core type definitions: web/oss/src/components/EvalRunDetails/atoms/table/types.ts, state/evalType.ts, evaluationPreviewTableStore.ts, atoms/metricProcessor.ts | EvaluationTableColumn.visibleFor, PreviewEvaluationType, EvaluationPreviewMeta.evaluationType, and metric processor options all adopt EvaluationRunKind instead of local string unions. |
| Top-level component props: web/oss/src/components/EvalRunDetails/components/Page.tsx, test.tsx | EvalRunPreviewPage and EvalRunTestPage update their evaluation-type props to EvaluationRunKind; breadcrumb mapping adds support for the custom evaluation kind. |
| Column processing utilities and hooks: web/oss/src/components/EvalRunDetails/utils/buildPreviewColumns.tsx, buildSkeletonColumns.ts, hooks/usePreviewColumns.tsx | Utility functions and hooks refactored to use EvaluationRunKind types and delegate metric selection to selectStaticMetricColumnsForEvaluationType, removing inline evaluationType branching. |
| Table and presentation components: web/oss/src/components/EvalRunDetails/Table.tsx, components/columnVisibility/ColumnVisibilityPopoverContent.tsx, components/FocusDrawer.tsx | Table and column-visibility components now use EvaluationRunKind props and call metric selection helpers; FocusDrawer reorganizes imports to source MetricColumnDefinition from shared entities. |
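As a rough illustration of the helper module summarized above, the selection logic might look like the sketch below. The three function names come from the walkthrough, but their signatures, the shape of `MetricColumnDefinition`, and the members of `EvaluationRunKind` are assumptions made for this example and may not match the actual module.

```typescript
// Sketch only: signatures and type shapes are assumed, not taken from the PR.
type EvaluationRunKind = "auto" | "human" | "custom"

interface MetricColumnDefinition {
    key: string
    title: string
}

interface StaticMetricColumns {
    auto: MetricColumnDefinition[]
    human: MetricColumnDefinition[]
}

// Only human evaluations take the human metric-column path.
const usesHumanMetricColumns = (kind: EvaluationRunKind): boolean => kind === "human"

// Every other kind, including SDK/custom runs, takes the non-human path.
const usesAutoMetricColumns = (kind: EvaluationRunKind): boolean =>
    !usesHumanMetricColumns(kind)

// Single decision point that callers can use instead of branching inline.
function selectStaticMetricColumnsForEvaluationType(
    kind: EvaluationRunKind,
    columns: StaticMetricColumns,
): MetricColumnDefinition[] {
    return usesHumanMetricColumns(kind) ? columns.human : columns.auto
}
```

Centralizing the branch in one function means a newly added run kind defaults to the non-human columns unless it is explicitly treated as human, which matches the behavior the PR description gives for SDK/custom runs.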

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 60.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Fix SDK eval breadcrumbs for SDK/custom runs' clearly and specifically summarizes the main change: fixing breadcrumb rendering for SDK/custom evaluation runs. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the problem (breadcrumb fallback), solution (preserving evaluation run kind), and testing/verification details. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@mmabrouk
Member

mmabrouk commented Jun 8, 2026

Thank you @Rajesh270712

The GIF is a local reproduction of the issue outside of Agenta.

Please share a demo of a deployed version of Agenta where the issue is fixed. Unfortunately, we don't accept PRs from contributors who have not deployed the software and tested their contributions against it.

@mmabrouk mmabrouk closed this Jun 8, 2026
Development

Successfully merging this pull request may close these issues.

Fix SDK eval breadcrumbs showing Auto Evals

2 participants