**Use cases, pain points, and background**

**Description**:

**Design**:
We probably need to make some generic reward model client that can be shared infra for all RLHF environments.

**Out of scope**:

**Acceptance Criteria**:
- [ ] Gym spins up a reward model locally like in the local vLLM model flow
- [ ] Replicate the current Nemotron RLHF process
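
To make the Design note more concrete, here is a minimal sketch of what a generic reward model client could look like. Everything in it is an assumption for discussion, not existing Gym code: the names `RewardRequest`, `RewardModelClient`, `CallableRewardClient`, and `pick_best` are hypothetical, and the toy scorer stands in for a call to a locally served reward model.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


@dataclass(frozen=True)
class RewardRequest:
    """One prompt/response pair to be scored (hypothetical schema)."""
    prompt: str
    response: str


class RewardModelClient(Protocol):
    """Interface that every RLHF environment would depend on (hypothetical)."""

    def score(self, requests: Sequence[RewardRequest]) -> list[float]:
        ...


class CallableRewardClient:
    """Adapter that wraps any scoring function.

    A real backend would replace `score_fn` with a request to the
    locally served reward model.
    """

    def __init__(self, score_fn: Callable[[str, str], float]) -> None:
        self._score_fn = score_fn

    def score(self, requests: Sequence[RewardRequest]) -> list[float]:
        return [float(self._score_fn(r.prompt, r.response)) for r in requests]


def pick_best(client: RewardModelClient, prompt: str, candidates: Sequence[str]) -> str:
    """Example environment-side use: return the highest-scoring candidate."""
    scores = client.score([RewardRequest(prompt, c) for c in candidates])
    return candidates[scores.index(max(scores))]


# Toy stand-in scorer (longer responses score higher), for illustration only.
toy_client = CallableRewardClient(lambda prompt, response: len(response))
```

Environments would depend only on the `RewardModelClient` interface, so swapping the toy scorer for a served model would not require changes on the environment side.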