Commit 4b8f615

feat: add shared metric contract for scorer functions
Expose MetricInput -> MetricResult types and adapt decorated scorers via to_metric() so Evaluator OSS scorers can share a runtime contract with platform integrations while preserving BYOB scorer compatibility.
1 parent d9337e6 commit 4b8f615
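
The commit message describes adapting a decorated scorer function to a shared `MetricInput -> MetricResult` contract through `to_metric()`. The excerpt does not show the actual type definitions, so the following is only a minimal sketch of that adapter pattern. Every name prefixed with `Sketch`/`sketch_`, and every field on those classes, is a hypothetical stand-in rather than the `nemo_evaluator` API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SketchMetricInput:
    """Hypothetical stand-in for MetricInput: a dataset row plus a candidate output."""
    row: Dict[str, Any]
    candidate: str
    config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchMetricResult:
    """Hypothetical stand-in for MetricResult: named numeric scores."""
    scores: Dict[str, float]


class SketchScorerFunctionMetric:
    """Wraps a dict-returning scorer so it satisfies input -> result."""

    def __init__(self, fn: Callable[..., Dict[str, Any]], name: str) -> None:
        self.fn = fn
        self.name = name

    def compute(self, metric_input: SketchMetricInput) -> SketchMetricResult:
        raw = self.fn(metric_input.candidate, metric_input.row, metric_input.config)
        # Normalize whatever the scorer returned (bools, ints) to floats.
        return SketchMetricResult(scores={k: float(v) for k, v in raw.items()})


def sketch_scorer(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Leaves the function callable as before and attaches to_metric()."""
    fn.to_metric = lambda: SketchScorerFunctionMetric(fn, fn.__name__)
    return fn


@sketch_scorer
def exact(candidate: str, row: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    return {"correct": candidate.strip() == row["target"]}
```

In this sketch the decorated function keeps returning a plain dict when called directly, which mirrors the compatibility goal stated in the commit message, while `exact.to_metric().compute(...)` yields the typed result.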

9 files changed

Lines changed: 1224 additions & 28 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,13 @@
 
 ## 0.13.0 (unreleased)
 
+### Shared Metric Contract
+
+- Added public `MetricInput -> MetricResult` scorer/metric runtime types and `ScorerFunctionMetric`.
+- Extended BYOB `@scorer` with typed scorer metadata and `to_metric()` while preserving current dict scorer behavior.
+- Added optional `config_schema` support for typed scorer configs while keeping raw dict configs as the default.
+- Split typed scorer config binding into strict `bind(config=ConfigModel(...))` and coercive `bind_raw_config(config={...})` paths.
+
 ### Adapter Proxy (Breaking — replaces LiteLLM)
 
 - **LiteLLM removed**: The `litellm` dependency, `proxy` and `proxy-full` extras, and `litellm_settings` config field are all removed. The adapter proxy is now built-in with zero external proxy dependencies.
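
The changelog entry above distinguishes a strict `bind(config=ConfigModel(...))` path from a coercive `bind_raw_config(config={...})` path. The sketch below illustrates one way such a split could behave, using only the standard library. `LengthConfig`, `SketchConfigurableScorer`, and the coercion rule (calling each field's annotated type on the raw value) are assumptions made for illustration and are not taken from the commit.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Type


@dataclass(frozen=True)
class LengthConfig:
    """Hypothetical typed config; the field names are illustrative only."""
    min_ratio: float = 0.5
    max_tokens: int = 32


class SketchConfigurableScorer:
    """Hypothetical scorer wrapper with a strict and a coercive binding path."""

    def __init__(self, config_schema: Type[Any]) -> None:
        self.config_schema = config_schema
        self.config: Optional[Any] = None

    def bind(self, config: Any) -> "SketchConfigurableScorer":
        # Strict path: only an instance of the declared schema is accepted.
        if not isinstance(config, self.config_schema):
            raise TypeError(
                f"expected {self.config_schema.__name__}, got {type(config).__name__}"
            )
        self.config = config
        return self

    def bind_raw_config(self, config: Dict[str, Any]) -> "SketchConfigurableScorer":
        # Coercive path: reject unknown keys, then cast each value with the
        # field's annotated type (a deliberately simple rule for this sketch).
        field_types = {f.name: f.type for f in fields(self.config_schema)}
        unknown = set(config) - set(field_types)
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        coerced = {key: field_types[key](value) for key, value in config.items()}
        return self.bind(self.config_schema(**coerced))
```

Under these assumptions, `bind_raw_config({"min_ratio": "0.8"})` ends with the same typed config as `bind(LengthConfig(min_ratio=0.8))`, while passing a raw dict to `bind` raises `TypeError`. Routing the coercive path through the strict one keeps a single place where the typed invariant is checked.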

src/nemo_evaluator/__init__.py

Lines changed: 30 additions & 0 deletions
@@ -31,7 +31,22 @@
     VLMSolver,
 )
 from nemo_evaluator.scoring import (
+    AnnotationOutputSpec,
+    CandidateOutput,
+    DatasetRow,
+    Metric,
+    MetricDescriptor,
+    MetricInput,
+    MetricOutputSpec,
+    MetricResult,
+    MetricScorerFunction,
     ScorerInput,
+    ScorerCallable,
+    ScorerConfig,
+    ScorerFunctionMetric,
+    ScorerReturn,
+    ScoreOutputSpec,
+    ScoreValue,
     answer_line,
     code_sandbox,
     code_sandbox_async,
@@ -65,6 +80,21 @@
     "benchmark",
     "scorer",
     "ScorerInput",
+    "Metric",
+    "DatasetRow",
+    "CandidateOutput",
+    "MetricInput",
+    "ScoreOutputSpec",
+    "AnnotationOutputSpec",
+    "MetricOutputSpec",
+    "MetricDescriptor",
+    "ScoreValue",
+    "MetricResult",
+    "MetricScorerFunction",
+    "ScorerCallable",
+    "ScorerConfig",
+    "ScorerFunctionMetric",
+    "ScorerReturn",
     # Scoring primitives
     "exact_match",
     "multichoice_regex",
