fix: stop default collater from adding padding_mask#1652

Open
zeel2104 wants to merge 2 commits into NVIDIA-NeMo:main from zeel2104:zeel2104/fix-padding-mask-collator-clean
Conversation

@zeel2104
Contributor

@zeel2104 zeel2104 commented Apr 2, 2026

What does this PR do ?

Stop default_collater from creating a padding_mask for generic padded batches, so models that do not accept that argument no longer need downstream filtering and do not crash.

Changelog

  • Remove synthesized padding_mask creation from nemo_automodel.components.datasets.utils.default_collater
  • Update the default collator unit test to validate the new batch contract
  • Keep the fix scoped to the generic collator path without changing THD/context-parallel handling
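The changed contract of the generic collator can be illustrated with a minimal sketch (hypothetical helper names; not the actual `nemo_automodel.components.datasets.utils.default_collater` implementation): variable-length samples are padded to the batch maximum, and the returned batch no longer contains a synthesized `padding_mask` key.

```python
import torch

def default_collater_sketch(samples, pad_token_id=0, label_ignore_index=-100):
    """Pad a list of {"input_ids", "labels"} dicts to the batch max length.

    Hypothetical sketch of the post-fix contract: the returned batch
    holds only the padded tensors, with no synthesized "padding_mask".
    """
    max_len = max(len(s["input_ids"]) for s in samples)

    def pad(seq, value):
        # Right-pad each sequence up to the longest one in the batch.
        return seq + [value] * (max_len - len(seq))

    return {
        "input_ids": torch.tensor(
            [pad(s["input_ids"], pad_token_id) for s in samples]
        ),
        "labels": torch.tensor(
            [pad(s["labels"], label_ignore_index) for s in samples]
        ),
    }
```

Under this contract, models whose `forward()` does not accept `padding_mask` receive only keys they understand, so no downstream filtering is needed.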

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Signed-off-by: Zeel <desaizeel2128@gmail.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@hemildesai hemildesai left a comment


Hi, thanks a lot for your contribution.

We need the padding mask in the batch to correctly calculate aux_loss for MoEs. Right now, the padding mask needs to be passed explicitly to handle some corner cases. As a result, I think an appropriate fix would be to add padding_mask to the batch if the model supports it, and skip it otherwise.

Signed-off-by: Zeel <desaizeel2128@gmail.com>
@zeel2104
Contributor Author

zeel2104 commented Apr 2, 2026

@hemildesai
Thanks for the clarification. I’ve updated the fix accordingly.

default_collater still produces padding_mask, but the dataloader path now only keeps/passes it when the target model supports padding_mask in forward(). For models that do not support it, the batch drops padding_mask before the forward path.

This keeps the MoE aux-loss path intact while avoiding crashes for models that cannot consume the argument.

I also added targeted coverage for:

  • detecting whether a model supports padding_mask
  • keeping padding_mask for supporting models
  • stripping it for unsupported models

Let me know if anything more needs to be done.
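The keep-or-strip behaviour described above can be sketched as follows (hypothetical helper names; the actual NeMo-Automodel code may differ): inspect the model's `forward` signature, then drop `padding_mask` from the batch before the forward path when the model cannot accept it.

```python
import inspect

def model_supports_padding_mask(model) -> bool:
    """Return True if the model's forward() accepts a padding_mask argument.

    Hypothetical detection helper; a catch-all **kwargs parameter is also
    treated as supporting padding_mask.
    """
    params = inspect.signature(model.forward).parameters
    return "padding_mask" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

def filter_batch_for_model(batch: dict, model) -> dict:
    """Drop padding_mask before the forward path for unsupported models."""
    if "padding_mask" in batch and not model_supports_padding_mask(model):
        batch = {k: v for k, v in batch.items() if k != "padding_mask"}
    return batch
```

This keeps the MoE aux-loss path intact for models that consume `padding_mask` while avoiding unexpected-keyword-argument crashes for models that do not.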

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Apr 4, 2026
@akoumpa
Contributor

akoumpa commented Apr 6, 2026

@adil-a + @hemildesai can you provide guidance? Thank you.

@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Apr 6, 2026
@chtruong814 chtruong814 added waiting-for-customer Waiting for response from the original author needs-follow-up Issue needs follow-up and removed waiting-for-customer Waiting for response from the original author labels Apr 14, 2026