Skip to content

docs: Add Tutorial proposal for RLHF using genrm_compare resources server.#1046

Draft
ffrujeri wants to merge 1 commit intomainfrom
ffrujeri/rlhf-genrm-tutorial
Draft

docs: Add Tutorial proposal for RLHF using genrm_compare resources server.#1046
ffrujeri wants to merge 1 commit intomainfrom
ffrujeri/rlhf-genrm-tutorial

Conversation

@ffrujeri
Copy link
Copy Markdown
Contributor

@ffrujeri ffrujeri commented Apr 9, 2026

What does this PR do?

Adds documentation and tooling for GRPO / RLHF-style training with NeMo Gym’s GenRM compare server, HelpSteer3 → JSONL conversion, a Nemotron 3 Super NeMo RL example recipe, tests, and doc/navigation updates.

To build the docs and visualize locally please do:

cd docs
make docs-live

Issues

Closes #354

Usage

Convert HelpSteer3 (preference subset) to NeMo Gym JSONL (from repo root, with Gym dev env / uv):

uv run python resources_servers/genrm_compare/scripts/helpsteer3_to_nemo_gym_jsonl.py \
  --output-dir data/helpsteer3_gym

Run unit tests for the conversion helpers:

uv run pytest tests/test_helpsteer3_to_nemo_gym_jsonl.py -q

NeMo RL: Use or compose from the example recipe (paths relative to the bundled Gym tree):

  • examples/nemo_rl/grpo_nemotron3_super_genrm_helpsteer3_pipeclean.yaml

Set data.train.data_path / data.validation.data_path to the converted JSONL files and configure the GenRM checkpoint in env.nemo_gym.genrm_model (or your launcher’s genrm_model_name).

Docs: Build or read the tutorial:

  • Source: docs/training-tutorials/nemo-rl-grpo/rlhf-genrm-helpsteer3.md
  • After make docs-html under docs/: training-tutorials/nemo-rl-grpo/rlhf-genrm-helpsteer3.html

Additional Information

  • Tutorial (rlhf-genrm-helpsteer3.md): HelpSteer3 formatting, wiring resources_servers/genrm_compare/configs/genrm_compare.yaml + genrm_model, batch / num_rollouts_per_prompt alignment, Arena-Hard v2 evaluation pointers, Nemotron 3 Nano vs Super note.
  • Example config (examples/nemo_rl/grpo_nemotron3_super_genrm_helpsteer3_pipeclean.yaml): Pipeclean-scale NeMo RL YAML aligned with large-cluster GRPO + GenRM judge layout; policy default HF id nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16.
  • Script (resources_servers/genrm_compare/scripts/helpsteer3_to_nemo_gym_jsonl.py): Truncates context to the last user turn (GenRM requirement), HTML-unescapes content, optional --max-samples for smoke tests.
  • Tests (tests/test_helpsteer3_to_nemo_gym_jsonl.py): Covers trim, unescape, JSON-string context, and full row shape.
  • Navigation: Training tutorial index + NeMo RL GRPO toctree card; docs/conf.py redirect tutorials/nemo-rl-grpo/rlhf-genrm-helpsteer3.html; model recipes “See also”; resources_servers/genrm_compare/README.md links to the tutorial and example YAML.

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ffrujeri ffrujeri marked this pull request as draft April 9, 2026 21:44
@ffrujeri ffrujeri changed the title Add Tutorial proposal for RLHF using genrm_compare resources server. docs: Add Tutorial proposal for RLHF using genrm_compare resources server. Apr 9, 2026
@ffrujeri ffrujeri mentioned this pull request Apr 9, 2026
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docs + Environment pattern: RLHF

1 participant