Skip to content

feat(scoring): add ergonomics helpers for TemplateMetric authoring#931

Closed
wprazuch wants to merge 1 commit into
wprazuch/metric-abstractionsfrom
wprazuch/metric-abstractions-ergonomics
Closed

feat(scoring): add ergonomics helpers for TemplateMetric authoring#931
wprazuch wants to merge 1 commit into
wprazuch/metric-abstractionsfrom
wprazuch/metric-abstractions-ergonomics

Conversation

@wprazuch
Copy link
Copy Markdown
Contributor

Completes the ~20-30 LOC authoring promise by giving TemplateMetric subclasses ready-made helpers. Stacked on wprazuch/metric-abstractions.

New:

  • TemplateMetric._render(template, input) — Jinja2 rendering with both NEL-native ({{ response }}, {{ target }}, {{ metadata.* }}) and SDK-native ({{ output_text }}, {{ reference }}) variable names, so templates authored against either vocabulary work unchanged. Strict undefined-variable handling raises instead of silently rendering empty.

  • CorpusTemplateMetric(TemplateMetric) — base class for metrics with both row-level and corpus-level scores. Subclasses implement _score() and _corpus_score(); defaults wrap each in a MetricResult. score_names() includes both '' and '_corpus'. Empty inputs -> None.

  • SecretsMixin — mixin that satisfies MetricWithSecrets protocol. Subclasses declare secret_env_vars: ClassVar[tuple[str, ...]]. Secrets are eagerly loaded from os.environ at construction, with async resolve_secrets() as a fallback path (NMP Platform flow). Resolved values are stored as SecretStr private attrs; get_secret(env_var) returns the plaintext value or None.

Tests (+18 new, 42 total):

  • _render: NEL-native names, SDK-native aliases, metadata/config access, StrictUndefined raises on missing variables.
  • CorpusTemplateMetric: satisfies both Metric + CorpusMetric protocols, row-level default, corpus-level default, empty-inputs, score_names includes both.
  • SecretsMixin: satisfies MetricWithSecrets, declares env vars, reads env at construction, async resolver fills gaps, resolver is skipped when already loaded.
  • Target ergonomics proof: _TinyLengthMetric in ~15 LOC of user code demonstrates the authoring pattern.

NMP can pick these up as they land. The contract API is stable — helpers are additive (subclassing-based), so SDK concrete metrics can adopt them incrementally without breakage.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e25a603c-6b7c-4a79-88f5-ecf99466765e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch wprazuch/metric-abstractions-ergonomics

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the tests label Apr 23, 2026
@wprazuch wprazuch force-pushed the wprazuch/metric-abstractions branch from a394a42 to c89d9b9 Compare April 23, 2026 11:46
Completes the ~20-30 LOC authoring promise by giving TemplateMetric
subclasses ready-made helpers. Stacked on wprazuch/metric-abstractions.

New:
- TemplateMetric._render(template, input) — Jinja2 rendering with both
  NEL-native ({{ response }}, {{ target }}, {{ metadata.* }}) and
  SDK-native ({{ output_text }}, {{ reference }}) variable names, so
  templates authored against either vocabulary work unchanged. Strict
  undefined-variable handling raises instead of silently rendering empty.

- CorpusTemplateMetric(TemplateMetric) — base class for metrics with
  both row-level and corpus-level scores. Subclasses implement _score()
  and _corpus_score(); defaults wrap each in a MetricResult. score_names()
  includes both '<type>' and '<type>_corpus'. Empty inputs -> None.

- SecretsMixin — mixin that satisfies MetricWithSecrets protocol.
  Subclasses declare secret_env_vars: ClassVar[tuple[str, ...]]. Secrets
  are eagerly loaded from os.environ at construction, with async
  resolve_secrets() as a fallback path (NMP Platform flow). Resolved
  values are stored as SecretStr private attrs; get_secret(env_var)
  returns the plaintext value or None.

Tests (+18 new, 42 total):
- _render: NEL-native names, SDK-native aliases, metadata/config access,
  StrictUndefined raises on missing variables.
- CorpusTemplateMetric: satisfies both Metric + CorpusMetric protocols,
  row-level default, corpus-level default, empty-inputs, score_names
  includes both.
- SecretsMixin: satisfies MetricWithSecrets, declares env vars, reads
  env at construction, async resolver fills gaps, resolver is skipped
  when already loaded.
- Target ergonomics proof: _TinyLengthMetric in ~15 LOC of user code
  demonstrates the authoring pattern.

NMP can pick these up as they land. The contract API is stable — helpers
are additive (subclassing-based), so SDK concrete metrics can adopt them
incrementally without breakage.

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
@wprazuch wprazuch force-pushed the wprazuch/metric-abstractions-ergonomics branch from e5f90a3 to 78bccb8 Compare April 23, 2026 11:46
@wprazuch wprazuch closed this May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant