Hi NeMo AutoModel team,
Thanks a lot for this excellent repository. I think the project is very well designed, and the provided recipes are very useful.
I noticed that there are already many fine-tuning recipes in the repo, while the number of pretraining recipes seems relatively limited. My impression is that fine-tuning and pretraining recipes may share a lot of common structure, so I am wondering how to properly adapt a fine-tuning recipe into a pretraining recipe.
Could you please share some suggestions on which parts typically need to change? For instance, I assume the modifications may involve areas such as:
- training data format and pipeline
- objective / loss configuration
- optimizer and scheduler
- model initialization
- masking / packing strategy
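To make the question more concrete, here is a minimal, hypothetical sketch of how I currently understand the masking/packing difference between the two setups. This is not based on the actual NeMo AutoModel API; the function names and the `-100` ignore index are just illustrative assumptions:

```python
# Hypothetical sketch (not the NeMo AutoModel API): contrasting
# SFT-style loss masking with pretraining-style concatenate-and-pack.

def sft_labels(prompt_ids, answer_ids, ignore_index=-100):
    """Fine-tuning: mask the prompt tokens so loss is computed only on the answer."""
    return [ignore_index] * len(prompt_ids) + list(answer_ids)

def pack_documents(docs, seq_len, eos_id):
    """Pretraining: concatenate documents with an EOS separator and slice
    the token stream into fixed-length blocks (loss on every token)."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    # Drop the trailing remainder that does not fill a complete block.
    n_blocks = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)]

# Pretraining-style: two documents packed into fixed-length blocks.
blocks = pack_documents([[1, 2, 3], [4, 5]], seq_len=4, eos_id=0)
print(blocks)   # [[1, 2, 3, 0]]

# Fine-tuning-style: prompt positions masked out of the loss.
labels = sft_labels([7, 8], [9])
print(labels)   # [-100, -100, 9]
```

Is this roughly the right mental model for what the data pipeline and loss configuration would need to look like in a pretraining recipe, or does the repo handle packing and masking at a different layer?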
I would really appreciate any guidance, recommendations, or pointers to relevant examples.
Thank you very much for your time and for maintaining this great repo.