feat(scoring): expose item/sample/jinja_context on ScorerInput by wprazuch · Pull Request #937 · NVIDIA-NeMo/Evaluator

wprazuch · 2026-04-24T09:07:50Z

Phase 1 of the ScorerInput → NMP-shape migration ("Option B"): add derived properties so NMP-style metrics can read the input in their native vocabulary without breaking any existing NEL scorer code.

Changes on ScorerInput:

- New property .item: returns {"reference": target, **metadata}. Gives NMP metric code natural access: sample.item["reference"], sample.item["custom_dataset_field"], etc.
- New property .sample: returns {"output_text": response, "response": response}. NMP's canonical inference-payload shape.
- New method .jinja_context(): builds a Jinja rendering context exposing both NEL-native ({{ response }}, {{ target }}, {{ metadata.x }}) and NMP-native ({{ item.reference }}, {{ sample.output_text }}) vocabularies. Both template idioms render against the same context.
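A minimal sketch of the derived views described above, for illustration only. `ScorerInputSketch` is a hypothetical stand-in, not the real `ScorerInput`: it omits `config` and `sandbox`, and the set of keys returned by `jinja_context()` is assumed from the template variables named in this description.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScorerInputSketch:
    """Hypothetical stand-in for ScorerInput (config and sandbox omitted)."""

    response: str
    target: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def item(self) -> dict[str, Any]:
        # A new dict is built on every access. Because metadata is unpacked
        # after "reference", a metadata key named "reference" would win.
        return {"reference": self.target, **self.metadata}

    @property
    def sample(self) -> dict[str, Any]:
        # The response is exposed under both keys.
        return {"output_text": self.response, "response": self.response}

    def jinja_context(self) -> dict[str, Any]:
        # Assumed key set: NEL-native names plus the two NMP-native views.
        return {
            "response": self.response,
            "target": self.target,
            "metadata": self.metadata,
            "item": self.item,
            "sample": self.sample,
        }


inp = ScorerInputSketch(response="4", target="4", metadata={"category": "math"})
print(inp.item["reference"], inp.item["category"], inp.sample["output_text"])
```

One consequence of the `{"reference": target, **metadata}` expression is worth noting: a dataset field that is itself called `reference` would shadow the target in `.item`.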

Changes on TemplateMetric:

- _render(template, input) now uses input.jinja_context() internally. Concrete TemplateMetric subclasses authored against either vocabulary work without modification.
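To illustrate why both template idioms can render against one context, here is a small sketch using the `jinja2` package directly. The `render` helper and the hand-built `context` dict are hypothetical; the actual `_render` implementation is not shown in this PR description and may differ (for example, in how the Jinja environment is configured).

```python
from jinja2 import Template


def render(template: str, context: dict) -> str:
    # Hypothetical helper: render a template string against a context dict.
    return Template(template).render(**context)


# Hand-built context with the shape this PR describes for jinja_context().
context = {
    "response": "Paris",
    "target": "Paris",
    "metadata": {"category": "geography"},
    "item": {"reference": "Paris", "category": "geography"},
    "sample": {"output_text": "Paris", "response": "Paris"},
}

# Jinja falls back from attribute lookup to key lookup, so dotted access
# such as item.reference works on plain dicts.
nel_style = render("{{ response }} vs {{ target }} ({{ metadata.category }})", context)
nmp_style = render("{{ sample.output_text }} vs {{ item.reference }} ({{ item.category }})", context)
mixed = render("{{ sample.output_text }} vs {{ target }}", context)

assert nel_style == nmp_style
```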

No breaking change. Existing fields (response, target, metadata, config, sandbox) remain authoritative — item/sample are derived views. 109 ScorerInput-dependent tests pass (contracts, ergonomics, byob, scorer_input_nmp_shape, benchmark_definitions, scoring_code_execution, environments).

Roadmap: the primary fields will flip at v1.0 — item/sample become authoritative, response/target/metadata/config become legacy accessors. This PR is the safe intermediate step that unblocks NMP's SDK migration to use input.item[...] / input.sample[...] directly from day one.

Tests: 14 new tests in test_scorer_input_nmp_shape.py covering:

- .item / .sample derivation + field isolation
- .jinja_context() exposes all NEL + NMP vocabularies
- NEL-native, NMP-native, and mixed templates render correctly
- All existing .response / .target / .metadata / .config / .sandbox access works unchanged
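The checks listed above might look roughly like the following. This is a sketch against a hypothetical `ScorerInputSketch` stand-in, not the contents of test_scorer_input_nmp_shape.py, and "field isolation" is interpreted here in two assumed ways: each view carries only its own side of the data, and mutating a view does not write back to the primary fields.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScorerInputSketch:
    """Hypothetical stand-in for ScorerInput, reduced to three fields."""

    response: str
    target: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def item(self) -> dict[str, Any]:
        return {"reference": self.target, **self.metadata}

    @property
    def sample(self) -> dict[str, Any]:
        return {"output_text": self.response, "response": self.response}


def check_derivation() -> None:
    inp = ScorerInputSketch(response="42", target="42", metadata={"split": "dev"})
    assert inp.item == {"reference": "42", "split": "dev"}
    assert inp.sample == {"output_text": "42", "response": "42"}


def check_field_isolation() -> None:
    inp = ScorerInputSketch(response="42", target="41", metadata={"split": "dev"})
    # Dataset-side keys stay out of sample; model output stays out of item.
    assert "reference" not in inp.sample
    assert "output_text" not in inp.item
    # Mutating a derived view does not write back to the primary fields.
    inp.item["split"] = "test"
    assert inp.metadata == {"split": "dev"}


def check_legacy_access_unchanged() -> None:
    inp = ScorerInputSketch(response="42", target="41", metadata={"split": "dev"})
    assert (inp.response, inp.target, inp.metadata) == ("42", "41", {"split": "dev"})


check_derivation()
check_field_isolation()
check_legacy_access_unchanged()
```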

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>

copy-pr-bot · 2026-04-24T09:07:53Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-24T09:07:58Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: abc4134c-5abe-4771-b3d7-72a1fae7c8bf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions Bot added the tests label Apr 24, 2026

wprazuch changed the title ~~feat(scoring): expose item/sample/jinja_context on ScorerInput (NMP-s…~~ feat(scoring): expose item/sample/jinja_context on ScorerInput Apr 24, 2026

wprazuch closed this May 4, 2026

wprazuch wants to merge 1 commit into wprazuch/metric-abstractions-byob from wprazuch/metric-input-item-sample
