[support] SFT config recipe for different hardware (H100 instead of B200) to tune Nemotron-3-super #3463

@pthongpramoo

Description

User problem

Dear experts,
We are in the process of running SFT on the Nemotron-3-super model with Megatron-Bridge on 16 nodes of 8x H100 GPUs.
The SEQ_LENGTH setting is 262144, which is longer than the default configuration.
Could you please advise, given this hardware (16 nodes of 8x H100 GPUs) and SEQ_LENGTH=262144,
what the proper parallelism should be, or what we can start with?

Thank you

Desired outcome

Please give us guidelines for setting up the configuration and parallelism,
or ways to calculate the configuration.
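
To make the "ways to calculate" part concrete, here is a minimal sketch of the kind of calculation we have in mind. It only enumerates parallelism layouts that are arithmetically valid for 128 GPUs, and it rests on assumptions that are not confirmed for this recipe: that the world size factors as TP x PP x CP x DP, that tensor parallelism should stay within one 8-GPU node, and that the sequence length must be divisible by 2 x CP. The function name and dictionary keys are hypothetical, not Megatron-Bridge config fields. Expert parallelism is ignored, and no memory estimate is made, so whether a layout fits in 80 GB still has to be checked by running it.

```python
from itertools import product


def candidate_layouts(world_size, gpus_per_node, seq_length):
    """List (tp, pp, cp, dp) splits with tp * pp * cp * dp == world_size.

    Assumptions (not taken from any official recipe):
      - only power-of-two sizes are considered,
      - tensor parallelism stays inside one node (tp <= gpus_per_node),
      - seq_length must be divisible by 2 * cp.
    """
    powers = [2 ** i for i in range(world_size.bit_length())]  # 1, 2, 4, ...
    layouts = []
    for tp, pp, cp in product(powers, repeat=3):
        model_parallel = tp * pp * cp
        if tp > gpus_per_node:
            continue  # keep TP traffic on intra-node links
        if model_parallel > world_size or world_size % model_parallel:
            continue  # must divide the GPU count evenly
        if seq_length % (2 * cp):
            continue  # assumed divisibility rule for context parallelism
        layouts.append({
            "tp": tp,
            "pp": pp,
            "cp": cp,
            "dp": world_size // model_parallel,
            # tokens of one sequence held by each context-parallel rank
            "tokens_per_cp_rank": seq_length // cp,
        })
    return layouts


if __name__ == "__main__":
    layouts = candidate_layouts(world_size=16 * 8, gpus_per_node=8,
                                seq_length=262144)
    # Show the layouts that shard the long sequence most aggressively first.
    for layout in sorted(layouts, key=lambda l: l["tokens_per_cp_rank"])[:10]:
        print(layout)
```

A list like this only narrows the search space. Guidance on which of these layouts is a sensible starting point for H100 80GB at this sequence length is what we are asking for.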

Alternatives or workarounds considered

No response

Affected area

area:model

Urgency / use case

Blocking current work

Environment

16 nodes of 8x H100 80GB.
The training environment we are using is nvcr.io/nvidia/nemo:26.02.nemotron_3_super
