This repository was archived by the owner on Nov 19, 2025. It is now read-only.

feat: minor adjustments to the reward model training #540

Merged
ko3n1g merged 1 commit into main from jveronvialar/rm-minor-adjustments
Apr 28, 2025

@jveronvialard
Collaborator

What does this PR do?

This PR makes minor adjustments to the reward model training related to the usage of validation_drop_last.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
@jveronvialard jveronvialard added the Run CICD Set + un-set to retrigger (add after r*.*.* labels) label Apr 23, 2025
@jveronvialard jveronvialard changed the title minor adjustments for reward model training feat: minor adjustments to the reward model training Apr 23, 2025
@jveronvialard jveronvialard added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Apr 23, 2025
Collaborator

@odelalleau odelalleau left a comment

lgtm, thanks!

@jveronvialard jveronvialard added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Apr 24, 2025
@jveronvialard jveronvialard added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Apr 25, 2025
@ko3n1g ko3n1g merged commit 2fbaed1 into main Apr 28, 2025
44 of 52 checks passed
@ko3n1g ko3n1g deleted the jveronvialar/rm-minor-adjustments branch April 28, 2025 16:49

Labels

Algorithms Run CICD Set + un-set to retrigger (add after r*.*.* labels)

3 participants