feat(scoring): expose item/sample/jinja_context on ScorerInput #937
Closed
wprazuch wants to merge 1 commit intowprazuch/metric-abstractions-byobfrom
Closed
feat(scoring): expose item/sample/jinja_context on ScorerInput #937wprazuch wants to merge 1 commit intowprazuch/metric-abstractions-byobfrom
wprazuch wants to merge 1 commit intowprazuch/metric-abstractions-byobfrom
Conversation
…hape)
Phase 1 of the ScorerInput → NMP-shape migration ("Option B"): add
derived properties so NMP-style metrics can read the input in their
native vocabulary without breaking any existing NEL scorer code.
Changes on ScorerInput:
- New property .item: returns {"reference": target, **metadata}.
Gives NMP metric code natural access: sample.item["reference"],
sample.item["custom_dataset_field"], etc.
- New property .sample: returns {"output_text": response, "response": response}.
NMP's canonical inference-payload shape.
- New method .jinja_context(): builds a Jinja rendering context exposing
both NEL-native ({{ response }}, {{ target }}, {{ metadata.x }}) and
NMP-native ({{ item.reference }}, {{ sample.output_text }}) vocabularies.
Both template idioms render against the same context.
Changes on TemplateMetric:
- _render(template, input) now uses input.jinja_context() internally.
Concrete TemplateMetric subclasses authored against either vocabulary
work without modification.
No breaking change. Existing fields (response, target, metadata, config,
sandbox) remain authoritative — item/sample are derived views. 109
ScorerInput-dependent tests pass (contracts, ergonomics, byob,
scorer_input_nmp_shape, benchmark_definitions, scoring_code_execution,
environments).
Roadmap: the primary fields will flip at v1.0 — item/sample become
authoritative, response/target/metadata/config become legacy accessors.
This PR is the safe intermediate step that unblocks NMP's SDK migration
to use input.item[...] / input.sample[...] directly from day one.
Tests: 14 new tests in test_scorer_input_nmp_shape.py covering:
- .item / .sample derivation + field isolation
- .jinja_context() exposes all NEL + NMP vocabularies
- NEL-native, NMP-native, and mixed templates render correctly
- All existing .response / .target / .metadata / .config / .sandbox access
works unchanged
Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Phase 1 of the ScorerInput → NMP-shape migration ("Option B"): add derived properties so NMP-style metrics can read the input in their native vocabulary without breaking any existing NEL scorer code.
Changes on ScorerInput:
New property .item: returns {"reference": target, **metadata}. Gives NMP metric code natural access: sample.item["reference"], sample.item["custom_dataset_field"], etc.
New property .sample: returns {"output_text": response, "response": response}. NMP's canonical inference-payload shape.
New method .jinja_context(): builds a Jinja rendering context exposing both NEL-native ({{ response }}, {{ target }}, {{ metadata.x }}) and NMP-native ({{ item.reference }}, {{ sample.output_text }}) vocabularies. Both template idioms render against the same context.
Changes on TemplateMetric:
No breaking change. Existing fields (response, target, metadata, config, sandbox) remain authoritative — item/sample are derived views. 109 ScorerInput-dependent tests pass (contracts, ergonomics, byob, scorer_input_nmp_shape, benchmark_definitions, scoring_code_execution, environments).
Roadmap: the primary fields will flip at v1.0 — item/sample become authoritative, response/target/metadata/config become legacy accessors. This PR is the safe intermediate step that unblocks NMP's SDK migration to use input.item[...] / input.sample[...] directly from day one.
Tests: 14 new tests in test_scorer_input_nmp_shape.py covering: