Skip to content

Docs + Environment pattern: RLHF #354

@bxyu-nvidia

Description

@bxyu-nvidia

Use cases, pain points, and background

Description:

Design:
We probably need to make some generic reward model client that can be shared infra for all RLHF environments.

Out of scope:

Acceptance Criteria:

  • Gym spins up a reward model locally like in the local vLLM model flow
  • Replicate the current Nemotron RLHF process

Metadata

Metadata

Assignees

Labels

core-infraHelpful infrastructure
No fields configured for Feature.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions