feat(scoring): expose item/sample/jinja_context on ScorerInput #937

Closed
wprazuch wants to merge 1 commit into wprazuch/metric-abstractions-byob from wprazuch/metric-input-item-sample

wprazuch commented Apr 24, 2026

Phase 1 of the ScorerInput → NMP-shape migration ("Option B"): add derived properties so NMP-style metrics can read the input in their native vocabulary without breaking any existing NEL scorer code.

Changes on ScorerInput:

  • New property .item: returns {"reference": target, **metadata}. Gives NMP metric code natural access: sample.item["reference"], sample.item["custom_dataset_field"], etc.

  • New property .sample: returns {"output_text": response, "response": response}. NMP's canonical inference-payload shape.

  • New method .jinja_context(): builds a Jinja rendering context exposing both NEL-native ({{ response }}, {{ target }}, {{ metadata.x }}) and NMP-native ({{ item.reference }}, {{ sample.output_text }}) vocabularies. Both template idioms render against the same context.
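
The three additions above can be pictured with a minimal sketch. This is not the PR's code: the `ScorerInput` below is a trimmed, hypothetical stand-in (the real class also carries `config` and `sandbox`), and the exact set of keys in the context is an assumption based on the vocabularies named in the description.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScorerInput:
    """Hypothetical stand-in for illustration; not the real NEL class."""

    response: str
    target: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def item(self) -> dict[str, Any]:
        # Derived view: a fresh dict on every access. Because metadata is
        # unpacked last, a "reference" key in metadata would override target.
        return {"reference": self.target, **self.metadata}

    @property
    def sample(self) -> dict[str, Any]:
        # Both keys point at the same model output.
        return {"output_text": self.response, "response": self.response}

    def jinja_context(self) -> dict[str, Any]:
        # One context carrying both vocabularies (key set assumed).
        return {
            "response": self.response,
            "target": self.target,
            "metadata": self.metadata,
            "item": self.item,
            "sample": self.sample,
        }


example = ScorerInput(response="4", target="4", metadata={"category": "math"})
context = example.jinja_context()
```

In this sketch `example.item` and `example.sample` are rebuilt from the primary fields on each access, which is one way to keep `response`, `target`, and `metadata` authoritative.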

Changes on TemplateMetric:

  • _render(template, input) now uses input.jinja_context() internally. Concrete TemplateMetric subclasses authored against either vocabulary work without modification.
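
As a rough illustration of why both idioms render identically, the sketch below passes one context to `jinja2.Template.render`. The `ScorerInput` and `TemplateMetric` classes here are simplified, hypothetical stand-ins, and the real `_render` may build or cache templates differently.

```python
from dataclasses import dataclass, field
from typing import Any

from jinja2 import Template


@dataclass
class ScorerInput:
    """Hypothetical stand-in; only what the rendering example needs."""

    response: str
    target: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)

    def jinja_context(self) -> dict[str, Any]:
        return {
            # NEL-native names
            "response": self.response,
            "target": self.target,
            "metadata": self.metadata,
            # NMP-native names, derived from the fields above
            "item": {"reference": self.target, **self.metadata},
            "sample": {"output_text": self.response, "response": self.response},
        }


class TemplateMetric:
    """Hypothetical stand-in showing only the rendering step."""

    def _render(self, template: str, input: ScorerInput) -> str:
        # Jinja resolves `item.reference` on a dict by falling back to
        # subscript lookup, so dotted access works for both vocabularies.
        return Template(template).render(**input.jinja_context())


metric = TemplateMetric()
scorer_input = ScorerInput(response="Paris", target="Paris", metadata={"x": "geo"})

nel_text = metric._render("{{ response }} vs {{ target }} [{{ metadata.x }}]", scorer_input)
nmp_text = metric._render(
    "{{ sample.output_text }} vs {{ item.reference }} [{{ item.x }}]", scorer_input
)
```

Both calls render the same string here, and a template mixing the two vocabularies would resolve against the same context.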

No breaking change. Existing fields (response, target, metadata, config, sandbox) remain authoritative — item/sample are derived views. 109 ScorerInput-dependent tests pass (contracts, ergonomics, byob, scorer_input_nmp_shape, benchmark_definitions, scoring_code_execution, environments).

Roadmap: the primary fields will flip at v1.0 — item/sample become authoritative, response/target/metadata/config become legacy accessors. This PR is the safe intermediate step that unblocks NMP's SDK migration to use input.item[...] / input.sample[...] directly from day one.

Tests: 14 new tests in test_scorer_input_nmp_shape.py covering:

  • .item / .sample derivation + field isolation
  • .jinja_context() exposes all NEL + NMP vocabularies
  • NEL-native, NMP-native, and mixed templates render correctly
  • All existing .response / .target / .metadata / .config / .sandbox access works unchanged
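
The contents of test_scorer_input_nmp_shape.py are not shown on this page. The sketch below is one plausible reading of "derivation + field isolation", written against a hypothetical stand-in class: mutating a derived view should not leak back into the primary fields, and the two views should not share keys.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScorerInput:
    """Hypothetical stand-in; the real class is defined in the PR's codebase."""

    response: str
    target: Any = None
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def item(self) -> dict[str, Any]:
        return {"reference": self.target, **self.metadata}

    @property
    def sample(self) -> dict[str, Any]:
        return {"output_text": self.response, "response": self.response}


def test_item_mutation_does_not_touch_metadata() -> None:
    scorer_input = ScorerInput(response="42", target="42", metadata={"difficulty": "easy"})
    view = scorer_input.item
    view["difficulty"] = "hard"  # mutate the derived (shallow) copy only
    assert scorer_input.metadata == {"difficulty": "easy"}
    assert scorer_input.item["difficulty"] == "easy"


def test_views_do_not_share_keys() -> None:
    scorer_input = ScorerInput(response="42", target="42", metadata={"difficulty": "easy"})
    assert "reference" not in scorer_input.sample
    assert "output_text" not in scorer_input.item


def test_legacy_access_unchanged() -> None:
    scorer_input = ScorerInput(response="42", target="41", metadata={})
    assert (scorer_input.response, scorer_input.target) == ("42", "41")
```

The functions follow pytest naming so they could be collected as-is, but they are plain functions and can also be called directly.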

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
copy-pr-bot Bot commented Apr 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: abc4134c-5abe-4771-b3d7-72a1fae7c8bf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot added the tests label Apr 24, 2026
wprazuch changed the title from "feat(scoring): expose item/sample/jinja_context on ScorerInput (NMP-s…" to "feat(scoring): expose item/sample/jinja_context on ScorerInput" Apr 24, 2026
wprazuch closed this May 4, 2026