docs: Add Tutorial proposal for RLHF using genrm_compare resources server. by ffrujeri · Pull Request #1046 · NVIDIA-NeMo/Gym

ffrujeri · 2026-04-09T21:44:28Z

What does this PR do?

Adds documentation and tooling for GRPO / RLHF-style training with NeMo Gym’s GenRM compare server, HelpSteer3 → JSONL conversion, a Nemotron 3 Super NeMo RL example recipe, tests, and doc/navigation updates.

To build the docs and visualize locally please do:

cd docs
make docs-live

Issues

Closes #354

Usage

Convert HelpSteer3 (preference subset) to NeMo Gym JSONL (from repo root, with Gym dev env / uv):

uv run python resources_servers/genrm_compare/scripts/helpsteer3_to_nemo_gym_jsonl.py \
  --output-dir data/helpsteer3_gym

Run unit tests for the conversion helpers:

uv run pytest tests/test_helpsteer3_to_nemo_gym_jsonl.py -q

NeMo RL: Use or compose from the example recipe (paths relative to the bundled Gym tree):

examples/nemo_rl/grpo_nemotron3_super_genrm_helpsteer3_pipeclean.yaml

Set data.train.data_path / data.validation.data_path to the converted JSONL files and configure the GenRM checkpoint in env.nemo_gym.genrm_model (or your launcher’s genrm_model_name).

Docs: Build or read the tutorial:

Source: docs/training-tutorials/nemo-rl-grpo/rlhf-genrm-helpsteer3.md
After make docs-html under docs/: training-tutorials/nemo-rl-grpo/rlhf-genrm-helpsteer3.html

Additional Information

Tutorial (rlhf-genrm-helpsteer3.md): HelpSteer3 formatting, wiring resources_servers/genrm_compare/configs/genrm_compare.yaml + genrm_model, batch / num_rollouts_per_prompt alignment, Arena-Hard v2 evaluation pointers, Nemotron 3 Nano vs Super note.
Example config (examples/nemo_rl/grpo_nemotron3_super_genrm_helpsteer3_pipeclean.yaml): Pipeclean-scale NeMo RL YAML aligned with large-cluster GRPO + GenRM judge layout; policy default HF id nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16.
Script (resources_servers/genrm_compare/scripts/helpsteer3_to_nemo_gym_jsonl.py): Truncates context to the last user turn (GenRM requirement), HTML-unescapes content, optional --max-samples for smoke tests.
Tests (tests/test_helpsteer3_to_nemo_gym_jsonl.py): Covers trim, unescape, JSON-string context, and full row shape.
Navigation: Training tutorial index + NeMo RL GRPO toctree card; docs/conf.py redirect tutorials/nemo-rl-grpo/rlhf-genrm-helpsteer3.html; model recipes “See also”; resources_servers/genrm_compare/README.md links to the tutorial and example YAML.

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>

copy-pr-bot · 2026-04-09T21:44:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Add Tutorial proposal for RLHF using genrm_compare resources server.

43d5bc6

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>

ffrujeri marked this pull request as draft April 9, 2026 21:44

ffrujeri changed the title ~~Add Tutorial proposal for RLHF using genrm_compare resources server.~~ docs: Add Tutorial proposal for RLHF using genrm_compare resources server. Apr 9, 2026

ffrujeri mentioned this pull request Apr 9, 2026

Docs + Environment pattern: RLHF #354

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: Add Tutorial proposal for RLHF using genrm_compare resources server.#1046

docs: Add Tutorial proposal for RLHF using genrm_compare resources server.#1046
ffrujeri wants to merge 1 commit intomainfrom
ffrujeri/rlhf-genrm-tutorial

ffrujeri commented Apr 9, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ffrujeri commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Issues

Usage

Additional Information

Uh oh!

copy-pr-bot Bot commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ffrujeri commented Apr 9, 2026 •

edited

Loading