feat(evaluator): add Trial→Intake boundary mapping module (D8) by SandyChapman · Pull Request #443 · NVIDIA-NeMo/nemo-platform

SandyChapman · 2026-06-24T19:35:34Z

What

Adds plugins/nemo-evaluator/src/nemo_evaluator/intake/mapping.py — the single pure layer (D8, AALGO-289) that translates Evaluator vocabulary into the platform SDK's typed Intake request params. D3/D4/D5 obtain their request shapes and field names only from here, so a later glossary rename is a one-file change.

Three pure functions:

trial_to_atif_ingest(trial, ...) -> AtifCreateParams — minimal single-step trajectory from trial.output until D2 trace normalization; defaults agent.version (design §3.9 #6).
score_to_evaluator_results(score, *, session_id, span_id) -> list[EvaluatorResultCreateParams] — one row per MetricOutput, name="{metric_type}.{output}". span_id is a caller-supplied parameter because it's server-assigned and only knowable after the ATIF ingest + span lookup — the orchestration (loop trials → POST atif → resolve span → coerce scores) belongs to the D3/D9 adapter, not this pure module.
run_task_to_experiment_context(trial, *, experiment_id) -> ExperimentContextParam — lean {experiment_id, test_case_id}.
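For reviewers who want the shape at a glance, here is a minimal sketch of the row naming and the experiment context. It is not the code in mapping.py: the `*Stub` dataclasses, their field names (`output`, `value`, `metric_type`, `outputs`, `test_case_id`), and the helper names `score_to_rows` / `experiment_context` are hypothetical stand-ins, and plain dicts stand in for the SDK TypedDicts.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetricOutputStub:
    """Hypothetical stand-in for MetricOutput (field names are assumptions)."""
    output: str
    value: Any


@dataclass
class TaskScoreStub:
    """Hypothetical stand-in for AgentEvalTaskScore."""
    metric_type: str
    outputs: list[MetricOutputStub] = field(default_factory=list)


@dataclass
class TrialStub:
    """Hypothetical stand-in for AgentEvalTrial."""
    test_case_id: str


def score_to_rows(score: TaskScoreStub, *, session_id: str, span_id: str) -> list[dict[str, Any]]:
    # One row per metric output, named "{metric_type}.{output}".
    # span_id is passed in by the caller; this function never looks it up.
    return [
        {
            "name": f"{score.metric_type}.{out.output}",
            "session_id": session_id,
            "span_id": span_id,
            "value": out.value,
        }
        for out in score.outputs
    ]


def experiment_context(trial: TrialStub, *, experiment_id: str) -> dict[str, str]:
    # Lean context: only the two identifiers, nothing else.
    return {"experiment_id": experiment_id, "test_case_id": trial.test_case_id}


score = TaskScoreStub(
    "exact_match",
    [MetricOutputStub("score", 1.0), MetricOutputStub("reason", "matched")],
)
rows = score_to_rows(score, session_id="sess-1", span_id="span-1")
print([row["name"] for row in rows])  # ['exact_match.score', 'exact_match.reason']
```

Because both helpers are pure, the adapter can call them in any order once it has resolved the span, and tests need no client or network fixture.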

Design notes

Returns the generated nemo-platform-sdk *CreateParams TypedDicts, not hand-shaped dicts. At runtime they're plain dicts the adapter splats into client.intake.ingest.atif.create(**body); statically, ty checks our field names / literals / nested shapes against the real generated schema, so an API change surfaces as a type error instead of drifting silently. Imports the SDK client types (nemo_platform.types.intake.*, already a plugin dep), never the Intake service (nmp.intake.*).
CATEGORICAL coercion is deferred — a category and free text are indistinguishable at the value level today (both arrive as str/Label), so everything string-valued maps to TEXT until a real signal exists.
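As an illustration of the two notes above, a sketch of the coercion under stated assumptions. `ResultValueFields`, `coerce_metric_value`, and `RootWrapper` are hypothetical names, and encoding BOOLEAN as 1.0/0.0 in `value` is an assumption rather than something the SDK schema is known to require. The parts taken from this PR are the three data types, the `.root` unwrap, and strings falling through to TEXT.

```python
from typing import Any, Literal, TypedDict


class ResultValueFields(TypedDict, total=False):
    """Hypothetical subset of the value-carrying fields on a result row."""
    data_type: Literal["BOOLEAN", "NUMERIC", "TEXT"]
    value: float
    string_value: str


class RootWrapper:
    """Hypothetical stand-in for a root-model style wrapper exposing `.root`."""

    def __init__(self, root: Any) -> None:
        self.root = root


def coerce_metric_value(raw: Any) -> ResultValueFields:
    # Unwrap a `.root` wrapper if present; plain values pass through unchanged.
    raw = getattr(raw, "root", raw)
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(raw, bool):
        return {"data_type": "BOOLEAN", "value": 1.0 if raw else 0.0}
    if isinstance(raw, (int, float)):
        return {"data_type": "NUMERIC", "value": float(raw)}
    # Categories and free text look the same here, so both become TEXT.
    return {"data_type": "TEXT", "string_value": str(raw)}


print(coerce_metric_value(RootWrapper(True)))
print(coerce_metric_value(3))
print(coerce_metric_value("helpful"))
```

The return value is annotated as a TypedDict but is an ordinary dict at runtime, so it can be merged into a request body with `**`. A type checker can still flag a misspelled key or an unknown `data_type` literal.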

Tests

Every data_type coercion currently in use (BOOLEAN, NUMERIC, TEXT; CATEGORICAL is deferred) + the MetricOutput.value.root unwrap.
trial_to_atif_ingest shape, version defaulting, missing-output, final_metrics.
score_to_evaluator_results naming, one-row-per-output, comment-from-diagnostic.
Import-hygiene guardrail: nothing under nemo_evaluator.intake imports the Intake service.
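The import-hygiene guardrail can be pictured as a static scan. The sketch below is an assumption about the approach, not the contents of test_import_hygiene.py: `FORBIDDEN_PREFIXES` and `forbidden_imports` are hypothetical names, and the prefix list covers only the Intake service package and httpx.

```python
import ast

# Assumed deny-list: the Intake service package and the HTTP client library.
FORBIDDEN_PREFIXES = ("nmp.intake", "httpx")


def _is_forbidden(module: str) -> bool:
    # Match the package itself or any submodule, but not look-alike names
    # such as "httpx_helpers".
    return any(module == p or module.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden module names imported by a piece of Python source."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also check "module.name" so `from nmp import intake` is caught.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue  # not an import, or a relative import inside the package
        hit = next((n for n in names if _is_forbidden(n)), None)
        if hit is not None:
            found.append(hit)
    return found


print(forbidden_imports("from nmp.intake.service import Client\n"))
```

A real test would read each .py file under the intake package, call a function like this on its text, and fail if any file returns a non-empty list. Parsing with `ast` rather than importing the modules keeps the check free of side effects.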

Verification

ruff (style + format), copyright headers, no-nmp_common-in-plugins guard, ty, and 21 unit tests all green.

Refs: AALGO-289. Informs D3/D4/D5/D9.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added support for sending evaluator runs and scores into Intake with richer metadata, including session tracking, experiment context, agent details, and final metrics.
- Metric outputs are now converted more reliably across boolean, numeric, and text values.
Bug Fixes
- Improved handling of missing outputs and optional fields so generated ingestion payloads are more consistent.

Adds plugins/nemo-evaluator/src/nemo_evaluator/intake/mapping.py: the single pure layer that translates Evaluator vocabulary (AgentEvalTrial, AgentEvalTaskScore, MetricOutput) into the platform SDK's typed Intake request params, so the D3/D4/D5 write-adapters share one source of request shapes.

- trial_to_atif_ingest -> AtifCreateParams (minimal single-step trajectory until D2 trace normalization; defaults agent.version per design §3.9 #6).
- score_to_evaluator_results -> list[EvaluatorResultCreateParams], one row per MetricOutput, name='{metric_type}.{output}', span_id supplied by the caller (resolved post-ingest; the adapter owns that orchestration).
- run_task_to_experiment_context -> ExperimentContextParam (lean {experiment_id, test_case_id}).

Returns the generated nemo-platform-sdk *CreateParams TypedDicts (runtime dicts, statically checked against the real schema) rather than hand-shaped dicts; imports the SDK client types, never the Intake service (nmp.intake.*). CATEGORICAL coercion is intentionally deferred (strings -> TEXT) until a real signal exists.

Includes unit tests for all coercions + the .root unwrap and an import-hygiene guardrail.

Refs: AALGO-289

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Sandy Chapman <schapman@nvidia.com>

coderabbitai · 2026-06-24T19:39:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e1d8c744-4095-4b46-819e-3750fc02b81b

📥 Commits

Reviewing files that changed from the base of the PR and between d9e1851 and f937fa5.

📒 Files selected for processing (3)

plugins/nemo-evaluator/src/nemo_evaluator/intake/mapping.py
plugins/nemo-evaluator/tests/intake/test_import_hygiene.py
plugins/nemo-evaluator/tests/intake/test_mapping.py

📝 Walkthrough

Adds Intake mapping helpers that convert evaluator trials and scores into ATIF ingest payloads and evaluator result rows, plus tests for session IDs, experiment context, value coercion, and import restrictions.

Changes

Intake mapping boundary

Layer / File(s)	Summary
Module contract and trial ingest `plugins/nemo-evaluator/src/nemo_evaluator/intake/mapping.py`	Defines ATIF constants, session and experiment-context helpers, and the trial-to-ATIF ingest payload builder.
Score result coercion `plugins/nemo-evaluator/src/nemo_evaluator/intake/mapping.py`	Maps task scores into evaluator result rows and classifies metric values into BOOLEAN, NUMERIC, or TEXT fields.
Mapping unit tests `plugins/nemo-evaluator/tests/intake/test_mapping.py`	Covers session IDs, experiment context, ATIF ingest payloads, and evaluator result coercion and diagnostics.
Import hygiene guardrail `plugins/nemo-evaluator/tests/intake/test_import_hygiene.py`	Scans the intake package for forbidden Intake service, transport, and HTTPX imports and fails on any matches.

Sequence Diagram(s)

sequenceDiagram
  participant AgentEvalTrial
  participant trial_to_atif_ingest
  participant run_task_to_experiment_context
  participant AtifCreateParams
  participant AgentEvalTaskScore
  participant score_to_evaluator_results
  participant _coerce_metric_value
  participant EvaluatorResultCreateParams

  AgentEvalTrial->>trial_to_atif_ingest: build ATIF ingest payload
  trial_to_atif_ingest->>run_task_to_experiment_context: derive experiment_context
  run_task_to_experiment_context-->>trial_to_atif_ingest: experiment_id, test_case_id
  trial_to_atif_ingest->>AtifCreateParams: assemble schema, agent, step, metrics

  AgentEvalTaskScore->>score_to_evaluator_results: map score outputs
  loop each output
    score_to_evaluator_results->>_coerce_metric_value: unwrap and classify value
    _coerce_metric_value-->>score_to_evaluator_results: data_type, value, string_value
    score_to_evaluator_results->>EvaluatorResultCreateParams: emit result row
  end

Possibly related PRs

NVIDIA-NeMo/nemo-platform#339: Adds the AgentEvalTrial and AgentEvalTaskScore domain models consumed by this mapping layer.

Suggested labels

feat

Suggested reviewers

ngoncharenko
asutermo

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.73% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding the Trial→Intake boundary mapping module.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-06-24T19:45:20Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	20908/27474	76.1%	61.2%
Integration Tests	12109/26243	46.1%	19.5%

SandyChapman requested review from a team as code owners June 24, 2026 19:35

github-actions Bot added the feat label Jun 24, 2026

SandyChapman requested review from arpitsardhana and ngoncharenko June 24, 2026 19:36

SandyChapman wants to merge 1 commit into main from aalgo-289-intake-mapping-module/schapman
