Fix MLflow auto-export in evaluation skill template by cjluo-nv · Pull Request #1586 · NVIDIA/Model-Optimizer

cjluo-nv · 2026-06-01T18:45:05Z

What does this PR do?

Type of change: documentation

Fixes two issues in the evaluation skill that caused completed eval runs to silently not upload to MLflow (found while running a real GPQA eval — the run finished successfully but never appeared in MLflow):

Missing auto-export trigger. NEL only auto-exports when execution.auto_export.destinations is set; the export.mlflow block alone just configures the exporter. The skill's shortcut said to "copy the export.mlflow block" without calling out the trigger, so a config could end up with the block but no upload.
Interpolated export values break at submit time. With auto-export enabled, NEL resolves the export.mlflow block at submit time in a scope that does not include deployment / evaluation, so ${deployment.served_model_name} / ${evaluation.*} cross-references fail hard with Interpolation key '...' not found. (example_eval.yaml shipped this interpolated pattern with auto-export, so it would hit the error.)

Changes:

example_eval.yaml: mark the auto_export.destinations trigger as REQUIRED; switch the export.mlflow block to literal values (CHANGEME-served-model-name placeholders) with a comment explaining why cross-refs break. ${oc.env:USER} (env interpolation) is retained since it resolves fine.
SKILL.md: rewrite the MLflow shortcut step to require both the trigger and literal export values, and warn against ${deployment.*} / ${evaluation.*} cross-references.

Usage

N/A — documentation/skill-template only.

Testing

pre-commit run passes on both files (markdownlint + YAML format). Verified the literal-export approach uploads correctly by exporting a real run to MLflow.

Before your PR is "Ready for review"

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A (documentation)
Did you update Changelog?: N/A (skill docs)
Did you get Claude approval on this PR?: ❌ (pending)

Additional Information

Separate from #1583 (parallelism reference); both touch the evaluation skill but are independent.

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Clarified MLflow auto-export: auto-export must be explicitly enabled for uploads and the MLflow export block is ignored otherwise.
- Example config now uses literal experiment name, description, and tags (no cross-scope interpolation) and shows concrete sampling values.
- Warned to keep sampling params consistent between tags and runtime params and to set the tracking URI.

The export.mlflow block alone does not upload runs: NEL only auto-exports when execution.auto_export.destinations is set. And with auto-export on, NEL resolves the export block at submit time in a scope without deployment / evaluation, so ${deployment.*} / ${evaluation.*} cross-references fail with "Interpolation key '...' not found". - example_eval.yaml: mark the auto_export.destinations trigger REQUIRED; switch the export.mlflow block to literal values (CHANGEME placeholders) with a note on why interpolation cross-refs break at submit time. - SKILL.md: rewrite the MLflow shortcut step to require both the trigger and literal export values, and warn against deployment/evaluation cross-refs (${oc.env:USER} is fine). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

coderabbitai · 2026-06-01T18:45:19Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 98f135e0-2479-4d0c-8622-8d9e4e606b13

📥 Commits

Reviewing files that changed from the base of the PR and between e3d5530 and 5c2e28b.

📒 Files selected for processing (2)

.claude/skills/evaluation/SKILL.md
.claude/skills/evaluation/recipes/examples/example_eval.yaml

✅ Files skipped from review due to trivial changes (1)

.claude/skills/evaluation/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (1)

.claude/skills/evaluation/recipes/examples/example_eval.yaml

📝 Walkthrough

Walkthrough

Updated documentation and example YAML to clarify MLflow auto-export configuration. Documentation specifies the two required config pieces (auto_export trigger and export.mlflow block), prohibits cross-reference interpolation in export metadata, and reminds users to set tracking_uri. Example YAML now uses literal placeholder and parameter values instead of interpolating from deployment and evaluation config.

Changes

MLflow Auto-Export Documentation and Configuration

Layer / File(s)	Summary
Documentation of auto-export defaults and requirements `.claude/skills/evaluation/SKILL.md`	Step 1 instructions updated to explicitly document that MLflow auto-export is enabled by default, requires `execution.auto_export.destinations: [mlflow]` trigger and `export.mlflow` block with literal metadata values, forbids `${deployment.}`/`${evaluation.}` interpolation, and reminds users to set `tracking_uri` in Step 4.
Example auto-export trigger configuration `.claude/skills/evaluation/recipes/examples/example_eval.yaml`	`auto_export` block adjusted with comments clarifying it is the required trigger and explaining that `export.mlflow` is ignored unless auto-export is enabled.
Example MLflow export metadata with literal values `.claude/skills/evaluation/recipes/examples/example_eval.yaml`	`export.mlflow` block re-templated to replace `${deployment.served_model_name}` and `${evaluation.nemo_evaluator_config.config.params.*}` interpolations with literal `CHANGEME-served-model-name` placeholder and literal sampling parameter values (`'1.0'`, `'0.95'`, `'65536'`) in experiment_name, description, and tags fields.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: fixing MLflow auto-export configuration in the evaluation skill template by addressing missing auto-export triggers and interpolation failures.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR modifies only documentation (SKILL.md) and configuration (example_eval.yaml) files; no Python code changes present, so security anti-pattern check does not apply.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch skill/mlflow-autoexport-fix

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

.claude/skills/evaluation/recipes/examples/example_eval.yaml (1)
106-106: ⚡ Quick win

Consider adding validation for CHANGEME placeholder values.

The CHANGEME-served-model-name placeholder is clear and prominent, but if users forget to replace it, MLflow will accept the run with incorrect metadata. While the documentation instructs users to substitute actual values, runtime validation could prevent this mistake.

Optional: Consider whether NEL or a pre-submit hook could validate that CHANGEME placeholders have been replaced, or add a more prominent warning in the YAML comments.

Also applies to: 114-114
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/evaluation/recipes/examples/example_eval.yaml at line 106,
The experiment_name field currently uses the literal placeholder
"CHANGEME-served-model-name" which may be left unchanged; add runtime validation
to reject or warn when experiment_name contains the substring "CHANGEME" (or
exactly matches that pattern) before creating the MLflow run. Update the code
path that reads/uses experiment_name (where experiment_name is referenced) to
detect this placeholder and either throw an error or log a fatal message
instructing the user to replace it (and do the same check for the other
occurrence noted at the same file); optionally add a pre-submit hook or CI lint
rule to scan the YAML for "CHANGEME" placeholders.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.claude/skills/evaluation/recipes/examples/example_eval.yaml:
- Line 106: The experiment_name field currently uses the literal placeholder
"CHANGEME-served-model-name" which may be left unchanged; add runtime validation
to reject or warn when experiment_name contains the substring "CHANGEME" (or
exactly matches that pattern) before creating the MLflow run. Update the code
path that reads/uses experiment_name (where experiment_name is referenced) to
detect this placeholder and either throw an error or log a fatal message
instructing the user to replace it (and do the same check for the other
occurrence noted at the same file); optionally add a pre-submit hook or CI lint
rule to scan the YAML for "CHANGEME" placeholders.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1da619c2-ad96-43fa-811c-a1ee882a7ffa

📥 Commits

Reviewing files that changed from the base of the PR and between b3d83ba and e3d5530.

📒 Files selected for processing (2)

.claude/skills/evaluation/SKILL.md
.claude/skills/evaluation/recipes/examples/example_eval.yaml

The export.mlflow temperature/top_p/max_new_tokens literals duplicate the top-level evaluation params and can drift. They are the only queryable record of sampling in MLflow (NEL does not log them as run params), so they are worth keeping — but since they cannot be interpolated under auto-export, the comment in example_eval.yaml and the SKILL.md guidance now require updating them in the same edit as the params, else MLflow misreports the run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

codecov · 2026-06-01T19:11:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.64%. Comparing base (f99d83e) to head (5c2e28b).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1586      +/-   ##
==========================================
- Coverage   76.67%   76.64%   -0.04%     
==========================================
  Files         478      478              
  Lines       52393    52988     +595     
==========================================
+ Hits        40174    40614     +440     
- Misses      12219    12374     +155

Flag	Coverage Δ
unit	`53.61% <ø> (+0.09%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

meenchen

Bot review — DM the bot to share feedback.

Docs-only fix to the evaluation skill template. The two changes are consistent and well-explained: (1) auto_export.destinations is annotated as the required trigger, and (2) the export.mlflow block switches from ${deployment.*} / ${evaluation.*} cross-refs (which fail at NEL submit time, where that scope isn't available) to CHANGEME-… literals plus ${oc.env:USER} (env-var interpolation, which still resolves). SKILL.md Step 2 shortcut is updated to match, including the drift-warning to keep the literal temperature / top_p / max_new_tokens tags in sync with the top-level params. Author reports verifying upload with a real run. No licensing, code, or architecture impact.

github-actions · 2026-06-01T20:12:02Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-01 20:11 UTC

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 1, 2026

View reviewed changes

cjluo-nv requested review from chadvoegele, kaix-nv and meenchen June 1, 2026 20:07

meenchen approved these changes Jun 1, 2026

View reviewed changes

cjluo-nv merged commit 2b7668d into main Jun 1, 2026
33 checks passed

cjluo-nv deleted the skill/mlflow-autoexport-fix branch June 1, 2026 20:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix MLflow auto-export in evaluation skill template#1586

Fix MLflow auto-export in evaluation skill template#1586
cjluo-nv merged 2 commits into
mainfrom
skill/mlflow-autoexport-fix

cjluo-nv commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

codecov Bot commented Jun 1, 2026

Uh oh!

meenchen left a comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cjluo-nv commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 1, 2026

Codecov Report

Uh oh!

meenchen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cjluo-nv commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading