add multi-gpu COMET judge by lilithgrigoryan · Pull Request #1456 · NVIDIA-NeMo/Skills

lilithgrigoryan · 2026-05-20T19:46:11Z

This PR introduces multi-gpu inference of COMET models.
It additionally sets default COMET judge scoring to bfloat16 instead of float32

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

coderabbitai · 2026-05-20T19:48:38Z

📝 Walkthrough

Walkthrough

This PR extends COMET evaluation with precision control and distributed execution support. The COMET evaluator gains PyTorch 2.6+ compatibility, a precision-to-dtype mapping, and distributed-aware output writing. A new --precision CLI parameter allows dtype selection. The judge pipeline system is updated to pass precision and batch-size configuration through to the COMET evaluator during task orchestration.

Changes

COMET Evaluator Precision and Distributed Execution

Layer / File(s)	Summary
PyTorch Compatibility, Precision Mapping, and Distributed Infrastructure `nemo_skills/evaluation/evaluator/comet.py`	Monkeypatches `torch.load` to force `weights_only=False` for PyTorch 2.6+ compatibility. Introduces `_PRECISION_TO_DTYPE` mapping and global-rank-zero helper. Updates `load_comet_model` to accept optional `dtype` parameter, load checkpoint on CPU, and cast weights before device placement. Updates `process_file` to validate and load JSONL records directly from `input_file`.
COMET Inference with GPU Awareness and Distributed Output `nemo_skills/evaluation/evaluator/comet.py`	Sets GPU count from `torch.cuda.device_count()` and passes it to `comet_model.predict`. Implements distributed-aware output: skips result writing on non-zero global ranks and writes JSONL with embedded `comet` scores only on rank 0.
CLI Precision Parameter and Main Integration `nemo_skills/evaluation/evaluator/comet.py`	Adds `--precision` argument with values matching the dtype mapping. Updates `main()` to call `load_comet_model` with dtype derived from parsed `--precision` selection.
Judge Pipeline Configuration and COMET Evaluator Invocation `nemo_skills/pipeline/eval.py`, `nemo_skills/pipeline/judges/comet_judge.py`	eval.py parses `judge_pipeline_kwargs` and merges into `judge_pipeline_args` before invoking custom `judge_step_fn`. comet_judge derives `precision` and `batch_size` from `judge_pipeline_args` and appends corresponding CLI flags. Command execution uses lock/ready-file wrapper for safe concurrent pip-install, then runs evaluator with assembled arguments. Slurm task sets `num_tasks` based on `judge_server_gpus`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title 'add multi-gpu COMET judge' accurately captures the main purpose of the pull request, which is enabling multi-GPU support for COMET model inference across all three modified files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch lgrigoryan/multi-device-comet

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

nemo_skills/evaluation/evaluator/comet.py (1)

124-125: 💤 Low value

Unused loop variable sample.

The loop unpacks sample but only uses idx to index into data. Use _ to indicate the variable is intentionally unused.

Suggested fix

-    for idx, sample in enumerate(data):
+    for idx, _ in enumerate(data):
         data[idx]["comet"] = comet_scores[idx]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemo_skills/evaluation/evaluator/comet.py` around lines 124 - 125, The
for-loop in comet.py currently unpacks an unused variable `sample` (for idx,
sample in enumerate(data))—replace `sample` with `_` so it reads `for idx, _ in
enumerate(data):` to signal the variable is intentionally unused; keep the rest
of the body that assigns `data[idx]["comet"] = comet_scores[idx]` unchanged and
ensure variable names `idx`, `data`, and `comet_scores` are left intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@nemo_skills/evaluation/evaluator/comet.py`:
- Around line 124-125: The for-loop in comet.py currently unpacks an unused
variable `sample` (for idx, sample in enumerate(data))—replace `sample` with `_`
so it reads `for idx, _ in enumerate(data):` to signal the variable is
intentionally unused; keep the rest of the body that assigns `data[idx]["comet"]
= comet_scores[idx]` unchanged and ensure variable names `idx`, `data`, and
`comet_scores` are left intact.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 11b5cbb3-ba72-41bb-8b71-adef82956d7c

📥 Commits

Reviewing files that changed from the base of the PR and between 467b057 and 5d03073.

📒 Files selected for processing (3)

nemo_skills/evaluation/evaluator/comet.py
nemo_skills/pipeline/eval.py
nemo_skills/pipeline/judges/comet_judge.py

lilithgrigoryan added 11 commits May 13, 2026 17:38

add test benchmark

6cc348a

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

adjusting the prompt

2116a34

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

hardcode

ef3f194

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

8gpu run

fa12d26

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

set num gpus to 8

fe11b9f

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

add lock for installing unbabel comet

d23bfb1

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

fix loading model and IO

2caafae

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

fix writing

91875d5

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

add bfloat16

561b142

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

clean up

b8b5955

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

restore batch size, add proper hook

5d03073

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

lilithgrigoryan added the run GPU tests label May 20, 2026

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

lilithgrigoryan changed the title ~~Lgrigoryan/multi device comet~~ add multi-gpu COMET judge May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add multi-gpu COMET judge#1456

add multi-gpu COMET judge#1456
lilithgrigoryan wants to merge 11 commits into
mainfrom
lgrigoryan/multi-device-comet

lilithgrigoryan commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lilithgrigoryan commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lilithgrigoryan commented May 20, 2026 •

edited

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading