Commit 62efcfa

feat: add shared metric contract for scorer functions

Expose MetricInput -> MetricResult types and adapt decorated scorers via to_metric() so Evaluator OSS scorers can share a runtime contract with platform integrations while preserving BYOB scorer compatibility.

1 parent d9337e6

9 files changed: 1344 additions & 44 deletions
CHANGELOG.md (7 additions & 0 deletions)
@@ -2,6 +2,13 @@
 
 ## 0.13.0 (unreleased)
 
+### Shared Metric Contract
+
+- Added public `MetricInput -> MetricResult` scorer/metric runtime types and `ScorerFunctionMetric`.
+- Extended BYOB `@scorer` with typed scorer metadata and `to_metric()` while preserving current dict scorer behavior.
+- Added optional `config_schema` support for typed scorer configs while keeping raw dict configs as the default.
+- Split typed scorer config binding into strict `bind(config=ConfigModel(...))` and coercive `bind_raw_config(config={...})` paths.
+
 ### Adapter Proxy (Breaking — replaces LiteLLM)
 
 - **LiteLLM removed**: The `litellm` dependency, `proxy` and `proxy-full` extras, and `litellm_settings` config field are all removed. The adapter proxy is now built-in with zero external proxy dependencies.
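The commit message describes the core of this change: a decorated scorer keeps its dict-returning BYOB behavior, while `to_metric()` adapts it to the typed `MetricInput -> MetricResult` contract. A minimal self-contained sketch of that pattern follows; the field names (`response`, `target`, `scores`) and the decorator body are illustrative stand-ins, not the actual `nemo_evaluator` definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stand-ins for the new runtime types.
@dataclass
class MetricInput:
    response: str
    target: str

@dataclass
class MetricResult:
    scores: dict[str, float] = field(default_factory=dict)

def scorer(fn: Callable[..., dict[str, Any]]):
    """Toy BYOB-style decorator: the wrapped function keeps returning a
    plain dict, and to_metric() adapts it to the typed contract."""
    def to_metric() -> Callable[[MetricInput], MetricResult]:
        def metric(inp: MetricInput) -> MetricResult:
            raw = fn(response=inp.response, target=inp.target)
            return MetricResult(scores={k: float(v) for k, v in raw.items()})
        return metric
    fn.to_metric = to_metric  # attach the adapter without changing fn itself
    return fn

@scorer
def exact_match(response: str, target: str) -> dict[str, float]:
    return {"accuracy": 1.0 if response.strip() == target.strip() else 0.0}

# Dict scorer behavior is preserved...
assert exact_match(response="42", target="42") == {"accuracy": 1.0}
# ...while the typed runtime contract is available via to_metric().
metric = exact_match.to_metric()
result = metric(MetricInput(response="41", target="42"))
```

The point of the adapter-on-top design is that existing dict scorers stay callable exactly as before; only callers that want the shared contract pay for the typed wrapper.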

src/nemo_evaluator/__init__.py (47 additions & 11 deletions)
@@ -16,22 +16,30 @@
 
 __version__ = "0.12.0"
 
+from nemo_evaluator.engine.eval_loop import run_evaluation
+from nemo_evaluator.engine.model_client import ModelClient
 from nemo_evaluator.environments.base import EvalEnvironment, SeedResult, VerifyResult
 from nemo_evaluator.environments.custom import benchmark, scorer
 from nemo_evaluator.environments.registry import get_environment, list_environments, load_benchmark_file, register
-from nemo_evaluator.engine.eval_loop import run_evaluation
-from nemo_evaluator.engine.model_client import ModelClient
-from nemo_evaluator.solvers import (
-    ChatSolver,
-    CompletionSolver,
-    NatSolver,
-    OpenClawSolver,
-    Solver,
-    SolveResult,
-    VLMSolver,
-)
 from nemo_evaluator.scoring import (
+    BooleanValue,
+    CandidateOutput,
+    ContinuousScore,
+    DatasetRow,
+    DiscreteScore,
+    Label,
+    Metric,
+    MetricDescriptor,
+    MetricInput,
+    MetricOutput,
+    MetricOutputSpec,
+    MetricResult,
+    MetricScorerFunction,
+    ScorerCallable,
+    ScorerConfig,
+    ScorerFunctionMetric,
     ScorerInput,
+    ScorerReturn,
     answer_line,
     code_sandbox,
     code_sandbox_async,
@@ -40,6 +48,16 @@
     multichoice_regex,
     needs_judge,
     numeric_match,
+    score_names_from_output_spec,
+)
+from nemo_evaluator.solvers import (
+    ChatSolver,
+    CompletionSolver,
+    NatSolver,
+    OpenClawSolver,
+    Solver,
+    SolveResult,
+    VLMSolver,
 )
 
 __all__ = [
@@ -65,6 +83,24 @@
     "benchmark",
     "scorer",
     "ScorerInput",
+    "Metric",
+    "BooleanValue",
+    "DatasetRow",
+    "CandidateOutput",
+    "ContinuousScore",
+    "DiscreteScore",
+    "Label",
+    "MetricInput",
+    "MetricOutput",
+    "MetricOutputSpec",
+    "MetricDescriptor",
+    "MetricResult",
+    "MetricScorerFunction",
+    "ScorerCallable",
+    "ScorerConfig",
+    "ScorerFunctionMetric",
+    "ScorerReturn",
+    "score_names_from_output_spec",
     # Scoring primitives
     "exact_match",
     "multichoice_regex",
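The changelog also mentions splitting typed config binding into a strict `bind(config=ConfigModel(...))` path and a coercive `bind_raw_config(config={...})` path. A minimal sketch of what that split could look like follows; `TypedScorer` and `ThresholdConfig` are hypothetical illustrations, not the library's actual classes.

```python
from dataclasses import dataclass

# Hypothetical typed config schema (the changelog's `config_schema`).
@dataclass(frozen=True)
class ThresholdConfig:
    threshold: float

class TypedScorer:
    """Toy sketch of the two binding paths."""

    def __init__(self, config_schema):
        self.config_schema = config_schema
        self.config = None

    def bind(self, config):
        # Strict path: only an instance of the declared schema is accepted.
        if not isinstance(config, self.config_schema):
            raise TypeError(f"expected {self.config_schema.__name__}")
        self.config = config
        return self

    def bind_raw_config(self, config: dict):
        # Coercive path: a raw dict is validated/coerced into the schema.
        self.config = self.config_schema(**config)
        return self

s = TypedScorer(config_schema=ThresholdConfig)
s.bind(config=ThresholdConfig(threshold=0.5))   # strict: typed instance only
s.bind_raw_config(config={"threshold": 0.5})    # coercive: raw dict accepted
```

Keeping the two paths separate lets strict callers fail fast on wrong types while raw-dict configs (the current default, per the changelog) keep working through the coercive entry point.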
