Guidance on adapting a fine-tuning recipe into a pretraining recipe #1746

@wswaq

Description

Hi NeMo AutoModel team,

Thanks a lot for this excellent repository. I think the project is very well designed, and the provided recipes are very useful.

I noticed that the repo already contains many fine-tuning recipes, while pretraining recipes seem relatively limited. My impression is that the two kinds of recipes share a lot of common structure, so I am wondering how to properly adapt a fine-tuning recipe into a pretraining recipe.

Could you please share some suggestions on what parts should typically be changed?

For instance, I assume the modifications may involve things like:

  • training data format and pipeline
  • objective / loss configuration
  • optimizer and scheduler
  • model initialization
  • masking / packing strategy
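To make the question more concrete, here is a small, self-contained sketch of two of the differences above: label masking (fine-tuning typically computes loss only on response tokens) versus document packing (pretraining typically concatenates documents and trains on every token). This is illustrative pure-Python pseudocode, not NeMo AutoModel code; the function names and the `IGNORE_INDEX` convention are my own assumptions.

```python
IGNORE_INDEX = -100  # a common convention for tokens excluded from the loss


def sft_labels(prompt_ids, response_ids):
    """Fine-tuning style: loss only on the response tokens,
    with the prompt masked out via IGNORE_INDEX."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def pack_documents(docs, eos_id, seq_len):
    """Pretraining style: concatenate documents separated by EOS and
    slice into fixed-length sequences; every token contributes to the loss."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)
    n = (len(stream) // seq_len) * seq_len  # drop the ragged tail
    return [stream[i:i + seq_len] for i in range(0, n, seq_len)]
```

In both cases the model would still shift inputs and labels by one position for next-token prediction; the difference is which positions are masked and how the raw data is batched.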

I would really appreciate any guidance, recommendations, or pointers to relevant examples.

Thank you very much for your time and for maintaining this great repo.
