
feat: Adopt resumption feature of online data mixing #617

Merged
dushyantbehl merged 9 commits into foundation-model-stack:main from kmehant:odm-plugin-resume-3 on Oct 9, 2025

Conversation

@kmehant (Collaborator) commented Oct 8, 2025

Changes

In this PR, we move the construction of resume_from_checkpoint to the top of the function so that it can be used in odm_config. This PR also depends on the merge of foundation-model-stack/fms-acceleration#155.

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
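A minimal sketch of the reordering, assuming a simplified training entry point: the resume checkpoint is resolved first, so any config built afterwards (here a plain dict standing in for odm_config) can read it. The names last_checkpoint, train, the resume flag, and the dict shape are hypothetical illustrations, not the fms-hf-tuning code; last_checkpoint is a simplified stand-in for transformers' get_last_checkpoint.

```python
import os
import re
import tempfile
from typing import Optional

# Hypothetical, simplified stand-in for transformers' get_last_checkpoint:
# return the "checkpoint-<step>" subdirectory with the highest step, or None.
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def last_checkpoint(folder: str) -> Optional[str]:
    candidates = [
        name
        for name in os.listdir(folder)
        if _CKPT_RE.match(name) and os.path.isdir(os.path.join(folder, name))
    ]
    if not candidates:
        return None
    # Compare by integer step, not by string.
    best = max(candidates, key=lambda n: int(_CKPT_RE.match(n).group(1)))
    return os.path.join(folder, best)


def train(output_dir: str, resume: bool = True) -> dict:
    # Step 1: resolve the checkpoint before any plugin config is built.
    resume_from_checkpoint = None
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
        if resume:  # hypothetical flag for this sketch
            resume_from_checkpoint = last_checkpoint(output_dir)

    # Step 2: hypothetical ODM config that can now see the resume point.
    odm_config = {"resume_from_checkpoint": resume_from_checkpoint}

    # Step 3: dataset and trainer construction would follow here.
    return odm_config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        os.makedirs(os.path.join(run_dir, "checkpoint-100"))
        os.makedirs(os.path.join(run_dir, "checkpoint-20"))
        print(train(run_dir))  # resume point ends in checkpoint-100
```

Sorting by the integer step rather than the directory name matters here: compared as strings, checkpoint-20 would sort after checkpoint-100.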
github-actions Bot commented Oct 8, 2025

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions Bot added the feat label Oct 8, 2025
Comment thread on tuning/sft_trainer.py:

    resume_from_checkpoint = None
    if train_args.output_dir:
        os.makedirs(train_args.output_dir, exist_ok=True)
Collaborator:

Why do we need to create the output directory? Doesn't the trainer create it already?

Collaborator Author:

get_last_checkpoint depends on output_dir already existing. This is existing code; I just moved it higher up in this function :)
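Why the directory must exist first can be sketched as follows. As far as we know, transformers' get_last_checkpoint begins by listing the folder with os.listdir, which raises FileNotFoundError for a directory that has not been created yet. The helpers checkpoint_dirs and checkpoint_dirs_after_makedirs are hypothetical illustrations of that behavior, not library functions.

```python
import os
import tempfile
from typing import List


def checkpoint_dirs(folder: str) -> List[str]:
    # Like get_last_checkpoint, start by listing the folder; this raises
    # FileNotFoundError if the folder does not exist yet.
    return sorted(n for n in os.listdir(folder) if n.startswith("checkpoint-"))


def checkpoint_dirs_after_makedirs(folder: str) -> List[str]:
    # Creating the directory first (idempotent thanks to exist_ok=True)
    # turns a fresh run into "no checkpoints" instead of an exception.
    os.makedirs(folder, exist_ok=True)
    return checkpoint_dirs(folder)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        fresh_run = os.path.join(root, "fresh_run")
        try:
            checkpoint_dirs(fresh_run)
        except FileNotFoundError:
            print("listing a missing output_dir raises FileNotFoundError")
        print(checkpoint_dirs_after_makedirs(fresh_run))  # prints []
```

Because exist_ok=True makes the call a no-op when the directory is already there, calling it early costs nothing even if the trainer would create the directory later.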

kmehant force-pushed the odm-plugin-resume-3 branch from 7df2365 to 82495d3 on October 9, 2025, 06:42
dushyantbehl merged commit 452c13b into foundation-model-stack:main on Oct 9, 2025
9 checks passed


2 participants