
feat: add shared metric contract for scorer functions #950

Draft

SandyChapman wants to merge 1 commit into dev/0.3.0 from schapman/feat/shared-metric-contract

Conversation

@SandyChapman

Expose MetricInput -> MetricResult types and adapt decorated scorers via to_metric() so Evaluator OSS scorers can share a runtime contract with platform integrations while preserving BYOB scorer compatibility.
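For orientation, here is a rough sketch of the shape such a contract could take. score_names and the async compute_scores call are visible in the diff later in this thread; every other member and field name below is an illustrative assumption, not the PR's actual definition.

```python
from typing import Any, Protocol


class MetricInput(Protocol):
    # Field names are assumptions; the PR defines the real MetricInput.
    response: str
    metadata: dict[str, Any]


class MetricResult(Protocol):
    # Assumed shape: per-score values keyed by score name.
    scores: dict[str, float]


class Metric(Protocol):
    # score_names and compute_scores appear in the quoted diff below;
    # metric_type is discussed later in the thread.
    score_names: list[str]
    metric_type: str

    async def compute_scores(self, metric_input: MetricInput) -> MetricResult: ...
```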

@SandyChapman requested a review from wprazuch on April 29, 2026 16:12

copy-pr-bot Bot commented Apr 29, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 90a186e5-8aeb-4ff2-ab6f-3920a098e79a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions bot added the tests label on Apr 29, 2026
Comment on lines +262 to +279
to_metric = getattr(self._defn.scorer_fn, "to_metric", None)
if callable(to_metric):
    metric = cast(ScorerFunctionMetric[ScorerConfig], to_metric()).bind_raw_config(
        config=self._defn.extra,
        sandbox=sandbox,
        target=expected,
    )
    metric_input = _metric_input_from_verify(
        response=response,
        metadata=meta,
    )
    result = await metric.compute_scores(metric_input)
    return _metric_result_to_verify_result(
        metric=metric,
        result=result,
        benchmark_name=self._defn.name,
        response=response,
    )
Author

This is just illustrative of how the verify func could use the Metric version of the scorer.

 class BenchmarkDefinition:
     name: str
-    dataset: str | Callable[[], list[dict]]
+    dataset: str | Callable[..., list[dict[str, Any]]]
Author

There's a bit of diff noise in the PR as I address typechecking errors reported by ty and pyright.

) -> Callable[[ScorerCallable[ConfigT]], ScorerCallable[ConfigT]]: ...


def scorer(
Author

The adjustments to the scorer decorator provide a few additional features (a usage sketch follows the list):

  1. the ability to specify a schema (via a Pydantic BaseModel) that validates incoming config objects.
  2. static type safety for the Config type as well, such that the to_metric function returns a ScorerFunctionMetric whose generic parameter matches the passed config.
  3. a structured definition of outputs, which is required to support the to_metric call because score_names is a required part of the Metric protocol.
  4. metric_type, an optional label for the metric that is also needed to refer to it (for instance) via an API call or in the DB. By default it is generated if not specified, but it can be set manually in case the code gets moved or a module renamed (the generated name uses the package name).
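Here is a self-contained sketch of the behaviour described above. Everything in it is a stand-in: the decorator argument names (config_schema, outputs, metric_type), the ScorerInput fields, and the ScorerFunctionMetric shape are assumptions for illustration, not the PR's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

from pydantic import BaseModel

ConfigT = TypeVar("ConfigT", bound=BaseModel)


@dataclass
class ScorerInput(Generic[ConfigT]):
    # Stand-in for the PR's ScorerInput; field names are assumptions.
    response: str
    target: str
    config: ConfigT


class ScorerFunctionMetric(Generic[ConfigT]):
    # Stand-in for the adapter returned by to_metric().
    def __init__(self, fn: Callable, score_names: list[str], metric_type: str) -> None:
        self.fn = fn
        self.score_names = score_names
        self.metric_type = metric_type


def scorer(*, config_schema: type[ConfigT], outputs: list[str], metric_type: str | None = None):
    # Stand-in decorator illustrating points 1-4 above; not the PR's real signature.
    def wrap(fn):
        fn.config_schema = config_schema  # 1. schema used to validate incoming config
        fn.to_metric = lambda: ScorerFunctionMetric(  # 2./3. typed adapter exposing score_names
            fn,
            score_names=outputs,
            metric_type=metric_type or f"{fn.__module__}.{fn.__name__}",  # 4. generated default
        )
        return fn
    return wrap


class ExactMatchConfig(BaseModel):
    case_sensitive: bool = True


@scorer(config_schema=ExactMatchConfig, outputs=["exact_match"], metric_type="demo.exact_match")
def exact_match(inp: ScorerInput[ExactMatchConfig]) -> dict[str, float]:
    pred, ref = inp.response, inp.target
    if not inp.config.case_sensitive:
        pred, ref = pred.lower(), ref.lower()
    return {"exact_match": float(pred == ref)}
```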

Contributor

Suggestion: metric_type should become required, since the auto-generated default can introduce bugs.


 @dataclass
-class ScorerInput:
+class ScorerInput(Generic[ConfigT]):
Author

Genericize ScorerInput to allow specifying a strongly typed config object; a plain dict is still accepted.
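A tiny illustration of what that allows, reusing stand-in names (the actual field set and class layout may differ):

```python
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

from pydantic import BaseModel

ConfigT = TypeVar("ConfigT")


@dataclass
class ScorerInput(Generic[ConfigT]):
    response: str    # assumed field name
    config: ConfigT  # a validated BaseModel or a plain dict


class MyConfig(BaseModel):
    threshold: float = 0.5


typed: ScorerInput[MyConfig] = ScorerInput(response="ok", config=MyConfig(threshold=0.8))
loose: ScorerInput[dict[str, Any]] = ScorerInput(response="ok", config={"threshold": 0.8})
```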

@SandyChapman force-pushed the schapman/feat/shared-metric-contract branch from 4b8f615 to dd8aeb8 on April 29, 2026 18:02
Comment thread on src/nemo_evaluator/scoring/metric.py (Outdated)
model_config = ConfigDict(extra="forbid")

scores: list[ScoreOutputSpec] = Field(min_length=1)
annotations: list[AnnotationOutputSpec] = Field(default_factory=list)
Contributor

todo: check if we can have one type here (rather than separate ScoreOutputSpec and AnnotationOutputSpec)

@SandyChapman force-pushed the schapman/feat/shared-metric-contract branch from dd8aeb8 to 62efcfa on April 30, 2026 16:07