User problem
Dear experts,
We are running SFT on the nemotron-3-super model with megatron-bridge on 16 nodes of 8x H100 GPUs.
The SEQ_LENGTH is set to 262144, which is longer than the default configuration.
Could you please advise on a setup for this amount of hardware (16 nodes of 8x H100 GPUs) with SEQ_LENGTH=262144?
What should the parallelism configuration be, or what would be a reasonable starting point?
Thank you
Desired outcome
Please give us guidelines for setting up the configuration and parallelism,
or a way to calculate the configuration.
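For reference, the only constraint we are sure of is that the product of the tensor (TP), pipeline (PP), context (CP), and data (DP) parallel sizes must equal the total GPU count. The sketch below enumerates layouts that satisfy it for our 128 GPUs and shows how many tokens of one sequence each context-parallel rank would hold. The function name `candidate_layouts`, the power-of-two search space, and the rule of keeping TP within one node are our own assumptions, not megatron-bridge settings. It ignores expert parallelism and does not estimate memory, so it only narrows the search.

```python
from itertools import product

NODES = 16
GPUS_PER_NODE = 8
SEQ_LENGTH = 262144


def candidate_layouts(world_size, seq_length, gpus_per_node):
    """List (tp, pp, cp, dp) layouts with tp * pp * cp * dp == world_size."""
    sizes = [1, 2, 4, 8, 16, 32, 64, 128]  # assumed power-of-two search space
    layouts = []
    for tp, pp, cp in product(sizes, repeat=3):
        # Assumption: keep tensor parallelism inside a single node (NVLink domain).
        if tp > gpus_per_node:
            continue
        model_parallel = tp * pp * cp
        # The remaining GPUs form the data-parallel dimension, so the
        # model-parallel product must divide the world size evenly.
        if model_parallel > world_size or world_size % model_parallel:
            continue
        # Each context-parallel rank holds an equal slice of the sequence.
        if seq_length % cp:
            continue
        layouts.append({
            "tp": tp,
            "pp": pp,
            "cp": cp,
            "dp": world_size // model_parallel,
            "tokens_per_cp_rank": seq_length // cp,
        })
    return layouts


WORLD_SIZE = NODES * GPUS_PER_NODE
LAYOUTS = candidate_layouts(WORLD_SIZE, SEQ_LENGTH, GPUS_PER_NODE)

if __name__ == "__main__":
    for layout in LAYOUTS:
        print(layout)
```

What we cannot work out ourselves is which of these layouts actually fits in 80 GB per GPU at this sequence length, and which settings (for example activation recomputation) are recommended alongside it.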
Alternatives or workarounds considered
No response
Affected area
area:model
Urgency / use case
Blocking current work
Environment
16 nodes of 8x H100 80GB.
The training container we are using is nvcr.io/nvidia/nemo:26.02.nemotron_3_super