feat(scoring): add ergonomics helpers for TemplateMetric authoring by wprazuch · Pull Request #931 · NVIDIA-NeMo/Evaluator

wprazuch · 2026-04-23T09:49:31Z

Completes the ~20-30 LOC authoring promise by giving TemplateMetric subclasses ready-made helpers. Stacked on wprazuch/metric-abstractions.

New:

- TemplateMetric._render(template, input): Jinja2 rendering with both NEL-native ({{ response }}, {{ target }}, {{ metadata.* }}) and SDK-native ({{ output_text }}, {{ reference }}) variable names, so templates authored against either vocabulary work unchanged. Strict undefined-variable handling raises an error instead of silently rendering an empty string.
- CorpusTemplateMetric(TemplateMetric): base class for metrics with both row-level and corpus-level scores. Subclasses implement _score() and _corpus_score(); the defaults wrap each result in a MetricResult. score_names() includes both '<type>' and '<type>_corpus'. Empty inputs -> None.
- SecretsMixin: mixin that satisfies the MetricWithSecrets protocol. Subclasses declare secret_env_vars: ClassVar[tuple[str, ...]]. Secrets are eagerly loaded from os.environ at construction, with async resolve_secrets() as a fallback path (NMP Platform flow). Resolved values are stored as SecretStr private attrs; get_secret(env_var) returns the plaintext value or None.
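To make the `_render` behavior in the list above concrete, here is a minimal sketch of dual-vocabulary Jinja2 rendering with strict undefined handling. It is not the PR's implementation: `ScoringInput` and the free function `render` are hypothetical stand-ins, and the pairing of `output_text` with `response` and `reference` with `target` is an assumption based on the names alone. Only the Jinja2 calls (`Environment`, `StrictUndefined`, `UndefinedError`) are real library API.

```python
from dataclasses import dataclass, field
from typing import Any

from jinja2 import Environment, StrictUndefined, UndefinedError


@dataclass
class ScoringInput:
    """Hypothetical stand-in for the metric input object (not the real NEL type)."""

    response: str
    target: str
    metadata: dict[str, Any] = field(default_factory=dict)


# StrictUndefined makes any use of a missing variable raise at render time
# instead of rendering as an empty string.
_ENV = Environment(undefined=StrictUndefined)


def render(template: str, input: ScoringInput) -> str:
    """Render `template` with both variable vocabularies bound to the same values."""
    context = {
        # NEL-native names
        "response": input.response,
        "target": input.target,
        "metadata": input.metadata,
        # SDK-native aliases (assumed mapping)
        "output_text": input.response,
        "reference": input.target,
    }
    return _ENV.from_string(template).render(**context)


sample = ScoringInput(response="Paris", target="paris", metadata={"lang": "en"})
nel_text = render("{{ response }} vs {{ target }} ({{ metadata.lang }})", sample)
sdk_text = render("{{ output_text }} vs {{ reference }}", sample)

try:
    render("{{ not_a_variable }}", sample)
    missing_raised = False
except UndefinedError:
    missing_raised = True
```

Binding both vocabularies into one context keeps templates portable at the cost of a slightly larger namespace; a stricter design could reject templates that mix the two.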

Tests (+18 new, 42 total):

- _render: NEL-native names, SDK-native aliases, metadata/config access, StrictUndefined raises on missing variables.
- CorpusTemplateMetric: satisfies both Metric + CorpusMetric protocols, row-level default, corpus-level default, empty-inputs, score_names includes both.
- SecretsMixin: satisfies MetricWithSecrets, declares env vars, reads env at construction, async resolver fills gaps, resolver is skipped when already loaded.
- Target ergonomics proof: _TinyLengthMetric in ~15 LOC of user code demonstrates the authoring pattern.
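The authoring pattern that the ergonomics test refers to could look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's code: `MetricResultSketch` and `CorpusTemplateMetricSketch` are hypothetical stand-ins for `MetricResult` and `CorpusTemplateMetric`, the method signatures are invented for the example, and the "empty inputs -> None" rule is assumed to apply to the corpus-level call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MetricResultSketch:
    """Hypothetical stand-in for MetricResult."""

    name: str
    value: float


class CorpusTemplateMetricSketch:
    """Hypothetical base class: wraps subclass hooks in result objects."""

    type: str = "metric"

    def score_names(self) -> list[str]:
        # One row-level name and one corpus-level name.
        return [self.type, f"{self.type}_corpus"]

    def score(self, response: str, target: str) -> MetricResultSketch:
        return MetricResultSketch(self.type, self._score(response, target))

    def corpus_score(
        self, rows: Sequence[tuple[str, str]]
    ) -> Optional[MetricResultSketch]:
        if not rows:  # assumed reading of "empty inputs -> None"
            return None
        return MetricResultSketch(f"{self.type}_corpus", self._corpus_score(rows))

    def _score(self, response: str, target: str) -> float:
        raise NotImplementedError

    def _corpus_score(self, rows: Sequence[tuple[str, str]]) -> float:
        raise NotImplementedError


# User code: the subclass only supplies the two scoring hooks.
class TinyLengthMetric(CorpusTemplateMetricSketch):
    type = "length_ratio"

    def _score(self, response: str, target: str) -> float:
        return len(response) / max(len(target), 1)

    def _corpus_score(self, rows: Sequence[tuple[str, str]]) -> float:
        total_response = sum(len(r) for r, _ in rows)
        total_target = sum(len(t) for _, t in rows)
        return total_response / max(total_target, 1)


metric = TinyLengthMetric()
rows = [("abcd", "ab"), ("xy", "wxyz")]
row_result = metric.score(*rows[0])
corpus_result = metric.corpus_score(rows)
empty_result = metric.corpus_score([])
```

In this sketch the user-facing subclass is about ten lines, which is the kind of budget the description has in mind; everything else lives in the base class.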

NMP can pick these up as they land. The contract API is stable — helpers are additive (subclassing-based), so SDK concrete metrics can adopt them incrementally without breakage.
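For the platform-side flow, the two-step secret loading described for SecretsMixin (environment first, async resolver only for gaps) might be sketched as follows. The class below is a hypothetical stand-in: the real `resolve_secrets()` signature is not shown in this PR description, so the `resolver` callable is an assumption, and plain strings are used where the PR stores `SecretStr` values. The `DEMO_*` variable names are invented for the example.

```python
import asyncio
import os
from typing import Awaitable, Callable, ClassVar, Optional


class SecretsMixinSketch:
    """Illustrative stand-in, not the PR's SecretsMixin."""

    secret_env_vars: ClassVar[tuple[str, ...]] = ()

    def __init__(self) -> None:
        # Eager path: take whatever is already present in the environment.
        self._secrets: dict[str, str] = {
            name: os.environ[name]
            for name in self.secret_env_vars
            if name in os.environ
        }

    async def resolve_secrets(
        self, resolver: Callable[[str], Awaitable[Optional[str]]]
    ) -> None:
        # Fallback path: ask the resolver only for names that are still missing.
        for name in self.secret_env_vars:
            if name in self._secrets:
                continue
            value = await resolver(name)
            if value is not None:
                self._secrets[name] = value

    def get_secret(self, env_var: str) -> Optional[str]:
        return self._secrets.get(env_var)


class JudgeMetric(SecretsMixinSketch):
    secret_env_vars = ("DEMO_JUDGE_KEY", "DEMO_EXTRA_KEY")


os.environ["DEMO_JUDGE_KEY"] = "from-env"
os.environ.pop("DEMO_EXTRA_KEY", None)

asked: list[str] = []


async def fake_resolver(name: str) -> Optional[str]:
    # Records which names the mixin had to look up remotely.
    asked.append(name)
    return "from-resolver"


metric = JudgeMetric()
asyncio.run(metric.resolve_secrets(fake_resolver))
```

Because the resolver is consulted only for missing names, a metric that already has its secrets in the environment never triggers a remote lookup, which matches the "resolver is skipped when already loaded" test case.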

copy-pr-bot · 2026-04-23T09:49:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-23T09:49:38Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e25a603c-6b7c-4a79-88f5-ecf99466765e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>

github-actions Bot added the tests label Apr 23, 2026

wprazuch force-pushed the wprazuch/metric-abstractions branch from a394a42 to c89d9b9 on April 23, 2026 11:46

wprazuch force-pushed the wprazuch/metric-abstractions-ergonomics branch from e5f90a3 to 78bccb8 on April 23, 2026 11:46

wprazuch closed this May 4, 2026

wprazuch wants to merge 1 commit into wprazuch/metric-abstractions from wprazuch/metric-abstractions-ergonomics
