Skip to content

add multi-gpu COMET judge#1456

Open
lilithgrigoryan wants to merge 11 commits into
mainfrom
lgrigoryan/multi-device-comet
Open

add multi-gpu COMET judge#1456
lilithgrigoryan wants to merge 11 commits into
mainfrom
lgrigoryan/multi-device-comet

Conversation

@lilithgrigoryan
Copy link
Copy Markdown
Collaborator

@lilithgrigoryan lilithgrigoryan commented May 20, 2026

This PR introduces multi-gpu inference of COMET models.
It additionally sets default COMET judge scoring to bfloat16 instead of float32

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR extends COMET evaluation with precision control and distributed execution support. The COMET evaluator gains PyTorch 2.6+ compatibility, a precision-to-dtype mapping, and distributed-aware output writing. A new --precision CLI parameter allows dtype selection. The judge pipeline system is updated to pass precision and batch-size configuration through to the COMET evaluator during task orchestration.

Changes

COMET Evaluator Precision and Distributed Execution

Layer / File(s) Summary
PyTorch Compatibility, Precision Mapping, and Distributed Infrastructure
nemo_skills/evaluation/evaluator/comet.py
Monkeypatches torch.load to force weights_only=False for PyTorch 2.6+ compatibility. Introduces _PRECISION_TO_DTYPE mapping and global-rank-zero helper. Updates load_comet_model to accept optional dtype parameter, load checkpoint on CPU, and cast weights before device placement. Updates process_file to validate and load JSONL records directly from input_file.
COMET Inference with GPU Awareness and Distributed Output
nemo_skills/evaluation/evaluator/comet.py
Sets GPU count from torch.cuda.device_count() and passes it to comet_model.predict. Implements distributed-aware output: skips result writing on non-zero global ranks and writes JSONL with embedded comet scores only on rank 0.
CLI Precision Parameter and Main Integration
nemo_skills/evaluation/evaluator/comet.py
Adds --precision argument with values matching the dtype mapping. Updates main() to call load_comet_model with dtype derived from parsed --precision selection.
Judge Pipeline Configuration and COMET Evaluator Invocation
nemo_skills/pipeline/eval.py, nemo_skills/pipeline/judges/comet_judge.py
eval.py parses judge_pipeline_kwargs and merges into judge_pipeline_args before invoking custom judge_step_fn. comet_judge derives precision and batch_size from judge_pipeline_args and appends corresponding CLI flags. Command execution uses lock/ready-file wrapper for safe concurrent pip-install, then runs evaluator with assembled arguments. Slurm task sets num_tasks based on judge_server_gpus.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'add multi-gpu COMET judge' accurately captures the main purpose of the pull request, which is enabling multi-GPU support for COMET model inference across all three modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lgrigoryan/multi-device-comet

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
nemo_skills/evaluation/evaluator/comet.py (1)

124-125: 💤 Low value

Unused loop variable sample.

The loop unpacks sample but only uses idx to index into data. Use _ to indicate the variable is intentionally unused.

Suggested fix
-    for idx, sample in enumerate(data):
+    for idx, _ in enumerate(data):
         data[idx]["comet"] = comet_scores[idx]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemo_skills/evaluation/evaluator/comet.py` around lines 124 - 125, The
for-loop in comet.py currently unpacks an unused variable `sample` (for idx,
sample in enumerate(data))—replace `sample` with `_` so it reads `for idx, _ in
enumerate(data):` to signal the variable is intentionally unused; keep the rest
of the body that assigns `data[idx]["comet"] = comet_scores[idx]` unchanged and
ensure variable names `idx`, `data`, and `comet_scores` are left intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@nemo_skills/evaluation/evaluator/comet.py`:
- Around line 124-125: The for-loop in comet.py currently unpacks an unused
variable `sample` (for idx, sample in enumerate(data))—replace `sample` with `_`
so it reads `for idx, _ in enumerate(data):` to signal the variable is
intentionally unused; keep the rest of the body that assigns `data[idx]["comet"]
= comet_scores[idx]` unchanged and ensure variable names `idx`, `data`, and
`comet_scores` are left intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 11b5cbb3-ba72-41bb-8b71-adef82956d7c

📥 Commits

Reviewing files that changed from the base of the PR and between 467b057 and 5d03073.

📒 Files selected for processing (3)
  • nemo_skills/evaluation/evaluator/comet.py
  • nemo_skills/pipeline/eval.py
  • nemo_skills/pipeline/judges/comet_judge.py

@lilithgrigoryan lilithgrigoryan changed the title Lgrigoryan/multi device comet add multi-gpu COMET judge May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant