Commit a394a42
feat(scoring): add metric abstractions for NEL/NMP interop
Introduces src/nemo_evaluator/scoring/contracts.py (~460 LOC) — the shared
contract layer between NEL and downstream metric providers (NMP's
nemo_evaluator_sdk, third-party plugins).
This implements the reshape discussed in steps.md:
* DROPPED from Metric Protocol: metric(item, sample, trace) -> float | bool
  Per Sandy Chapman + Voytek Prazuch: redundant with compute_scores.
  Concrete classes that keep it as a private helper still satisfy the
  Protocol structurally; consumers must not rely on it.
* CHANGED signature: compute_scores now takes a single MetricInput
  (aliased to ScorerInput, NEL's native BYOB input dataclass) instead
  of a pair of item/sample dicts. Unifies function-style and
  object-style runtime inputs.
* ADDED TemplateMetric base class: subclasses declare a Pydantic config
  and implement _score(MetricInput) -> float. Default compute_scores
  wraps _score in a single-score MetricResult; default score_names
  returns [self.type]. Reduces per-metric boilerplate to ~20-30 LOC.
* ADDED @register_metric class decorator + get_metric / list_metrics
  lookup helpers. Registers classes by their 'type' identifier (read
  from Pydantic field default or plain ClassVar attribute).
* ADDED metric_as_scorer(metric) bridge: adapts an object-style Metric
  to NEL's function-style Scorer callable, so object-style metrics can
  register in NEL's _SCORER_REGISTRY without glue code. Uses a thread
  with a fresh event loop when called inside an existing loop
  (notebook-safe).
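For reviewers, the "thread with a fresh event loop" behaviour of the bridge can be sketched roughly as below. This is a stdlib-only illustration, not the code in contracts.py. It assumes compute_scores is a coroutine (implied by the event-loop handling, but not stated here). MetricInput, ExactMatch, _run_coro_blocking and metric_as_scorer_sketch are hypothetical stand-ins, and a plain dict is used in place of MetricResult.

```python
import asyncio
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class MetricInput:
    """Hypothetical stand-in for NEL's ScorerInput (real fields not shown here)."""
    response: str
    target: str


class ExactMatch:
    """Toy object-style metric; assumes compute_scores is a coroutine."""
    type = "exact_match"

    def score_names(self) -> List[str]:
        return [self.type]

    async def compute_scores(self, inp: MetricInput) -> Dict[str, float]:
        return {self.type: float(inp.response.strip() == inp.target.strip())}


def _run_coro_blocking(coro_factory: Callable[[], Any]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run is safe.
        return asyncio.run(coro_factory())

    # A loop is already running (e.g. a notebook cell). asyncio.run would
    # raise here, so run the coroutine on a fresh loop in a worker thread.
    box: Dict[str, Any] = {}

    def worker() -> None:
        try:
            box["value"] = asyncio.run(coro_factory())
        except BaseException as exc:  # re-raised in the calling thread
            box["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in box:
        raise box["error"]
    return box["value"]


def metric_as_scorer_sketch(metric: Any) -> Callable[[MetricInput], dict]:
    """Adapt an object-style metric to a function-style scorer callable."""
    def scorer(inp: MetricInput) -> dict:
        return _run_coro_blocking(lambda: metric.compute_scores(inp))
    return scorer


scorer = metric_as_scorer_sketch(ExactMatch())
plain_call = scorer(MetricInput("42", "42"))


async def _call_inside_loop() -> dict:
    # Simulates a caller that already has a running event loop.
    return scorer(MetricInput("42", "43"))


nested_call = asyncio.run(_call_inside_loop())
```

Note that the calling thread blocks on thread.join() while the worker runs, so the outer loop makes no progress during scoring. That is acceptable for a notebook cell but would stall a busy async server.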
ERD (narrative):
MetricInput (= ScorerInput)
  |-- consumed by --> Scorer: Callable[[MetricInput], dict]
  |-- consumed by --> Metric: Protocol(type, compute_scores, score_names)
MetricResult (= MetricOutput)
  |<-- returned by --- Metric.compute_scores
TemplateMetric (Pydantic BaseModel, implements Metric)
  |-- concrete base; users subclass for ~20-30 LOC metrics
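The relationships in the ERD can be approximated in a few lines. This sketch is illustrative only: it uses dataclasses and a plain base class where the real module uses Pydantic, it assumes compute_scores is async, and the field names on MetricInput and MetricResult are invented. TemplateMetricSketch, the registry helpers and ExactMatch are local stand-ins that reuse this commit's names; they are not imports from contracts.py.

```python
import asyncio
from dataclasses import dataclass
from typing import ClassVar, Dict, List, Protocol, Type, runtime_checkable


@dataclass
class MetricInput:
    """Stand-in for ScorerInput; real fields are not shown in this message."""
    response: str
    target: str


@dataclass
class MetricResult:
    """Stand-in for MetricOutput: score name -> value."""
    scores: Dict[str, float]


@runtime_checkable
class Metric(Protocol):
    """Structural contract: type, compute_scores, score_names."""
    type: str

    async def compute_scores(self, inp: MetricInput) -> MetricResult: ...
    def score_names(self) -> List[str]: ...


class TemplateMetricSketch:
    """Base class: subclasses only implement _score."""
    type: ClassVar[str] = "template"

    def _score(self, inp: MetricInput) -> float:
        raise NotImplementedError

    async def compute_scores(self, inp: MetricInput) -> MetricResult:
        # Default: wrap the single float in a one-entry result.
        return MetricResult(scores={self.type: float(self._score(inp))})

    def score_names(self) -> List[str]:
        return [self.type]


_REGISTRY: Dict[str, Type[TemplateMetricSketch]] = {}


def register_metric(cls):
    """Class decorator: register a metric class under its 'type' identifier."""
    _REGISTRY[cls.type] = cls
    return cls


def get_metric(name: str):
    return _REGISTRY[name]


def list_metrics() -> List[str]:
    return sorted(_REGISTRY)


@register_metric
class ExactMatch(TemplateMetricSketch):
    type = "exact_match"

    def _score(self, inp: MetricInput) -> float:
        return float(inp.response.strip() == inp.target.strip())

    def metric(self, item, sample, trace=None):
        # Helper dropped from the Protocol; keeping it does not break conformance.
        return item == sample


metric = get_metric("exact_match")()
result = asyncio.run(metric.compute_scores(MetricInput("42", " 42 ")))
```

ExactMatch never inherits from Metric. The isinstance check passes purely on the presence of the three members, which is why a concrete class that still carries the dropped metric() helper keeps conforming.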
Tests: 24 new tests in tests/test_scoring/test_contracts.py covering
input/output aliases, Pydantic result types with NaN serialization, all
four Protocols (Metric, CorpusMetric, MetricWithSecrets,
MetricWithPreflight), TemplateMetric default + override, register_metric
/ get_metric / list_metrics, and the metric_as_scorer bridge for both
single-score and multi-score metrics. All pass.
Design rationale + migration plan for NMP Platform is in
.claude/tasks/nel_nmp_integration/approach3_design.md.
No breaking changes in NEL — this is a new module next to existing
scoring/ utilities. The breaking change lands on NMP SDK (a follow-up PR
on NVIDIA-NeMo/Platform will import these contracts and adapt SDK's
concrete metrics to the new compute_scores signature).
Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
1 parent d9337e6 commit a394a42
3 files changed
Lines changed: 831 additions & 0 deletions