Hi NeMo AutoModel team,
Thanks a lot for this excellent repository. I think the project is very well designed, and the provided recipes are very useful.
I noticed that there are already many fine-tuning recipes in the repo, while the number of pretraining recipes seems relatively limited. My impression is that fine-tuning and pretraining recipes may share a lot of common structure, so I am wondering how to properly adapt a fine-tuning recipe into a pretraining recipe.
Could you please share some suggestions on which parts typically need to change? For instance, I assume the modifications may involve areas such as:
- training data format and pipeline
- objective / loss configuration
- optimizer and scheduler
- model initialization
- masking / packing strategy
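To make the question more concrete, here is a minimal, hypothetical sketch of how I currently understand the masking/packing difference between the two setups. This is not based on the actual NeMo AutoModel API; the function names and the `-100` ignore index are just illustrative assumptions:

```python
# Hypothetical sketch (not the NeMo AutoModel API): contrasting
# SFT-style loss masking with pretraining-style concatenate-and-pack.

def sft_labels(prompt_ids, answer_ids, ignore_index=-100):
    """Fine-tuning: mask the prompt tokens so loss is computed only on the answer."""
    return [ignore_index] * len(prompt_ids) + list(answer_ids)

def pack_documents(docs, seq_len, eos_id):
    """Pretraining: concatenate documents with an EOS separator and slice
    the token stream into fixed-length blocks (loss on every token)."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    # Drop the trailing remainder that does not fill a complete block.
    n_blocks = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)]

# Pretraining-style: two documents packed into fixed-length blocks.
blocks = pack_documents([[1, 2, 3], [4, 5]], seq_len=4, eos_id=0)
print(blocks)   # [[1, 2, 3, 0]]

# Fine-tuning-style: prompt positions masked out of the loss.
labels = sft_labels([7, 8], [9])
print(labels)   # [-100, -100, 9]
```

Is this roughly the right mental model for what the data pipeline and loss configuration would need to look like in a pretraining recipe, or does the repo handle packing and masking at a different layer?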
I would really appreciate any guidance, recommendations, or pointers to relevant examples.
Thank you very much for your time and for maintaining this great repo.