Fix missing attention_mask in calibration dataloader #1261
Open
When include_labels=False (the default for PTQ calibration), get_dataset_dataloader was only returning input_ids and discarding the attention_mask produced by the tokenizer. This caused HF models to create a full causal mask, allowing padding tokens to participate in attention during calibration and skewing quantization statistics.

Include attention_mask alongside input_ids so the model correctly ignores padding tokens during calibration forward passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
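The change described above can be illustrated with a minimal sketch. This is not the code from modelopt/torch/utils/dataset_utils.py: `toy_tokenize` and `select_calibration_fields` are hypothetical names, the toy tokenizer uses word lengths as token ids, and right padding is an assumption made only for the example.

```python
def toy_tokenize(texts, pad_id=0):
    """Hypothetical stand-in for a tokenizer call with padding=True.

    Token ids are just word lengths; shorter rows are right-padded (assumption).
    """
    ids = [[len(word) for word in text.split()] for text in texts]
    max_len = max(len(row) for row in ids)
    return {
        "input_ids": [row + [pad_id] * (max_len - len(row)) for row in ids],
        "attention_mask": [[1] * len(row) + [0] * (max_len - len(row)) for row in ids],
    }


def select_calibration_fields(batch_encoded, include_labels=False):
    """Pick the fields that are handed to the model for one batch.

    Illustrative only: the include_labels=False path keeps attention_mask
    next to input_ids instead of returning input_ids alone.
    """
    if include_labels:
        # The labels path is described as preserving the full encoded dict.
        return dict(batch_encoded)
    return {key: batch_encoded[key] for key in ("input_ids", "attention_mask")}


encoded = toy_tokenize(["a bb ccc", "dddd"])
batch = select_calibration_fields(encoded)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The second sample is shorter than the first, so its row is padded and the mask marks the padded positions with 0. With a single sample per batch the mask would be all ones, which matches the observation that the problem only shows up with batched, variable-length inputs.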
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #1261      +/-   ##
==========================================
- Coverage   76.91%   76.90%   -0.02%
==========================================
  Files         350      350
  Lines       40481    40859     +378
==========================================
+ Hits        31137    31423     +286
- Misses       9344     9436      +92
sugunav14 approved these changes Apr 14, 2026
meenchen approved these changes Apr 14, 2026
ChenhanYu approved these changes Apr 14, 2026
Summary
- When include_labels=False (the default for PTQ calibration), get_dataset_dataloader was discarding the attention_mask produced by the tokenizer and only returning input_ids.
- Without attention_mask, HuggingFace models create a full causal mask, causing padding tokens to participate in attention during calibration and skewing quantization statistics.
- Include attention_mask alongside input_ids so the model correctly ignores padding tokens during calibration forward passes.

Details
In modelopt/torch/utils/dataset_utils.py, the tokenizer call at line 387 with padding=True produces both input_ids and attention_mask. The include_labels=True path (line 406) already preserves the full batch_encoded dict including attention_mask. However, the include_labels=False path was only keeping input_ids "for backward compatibility."

During the calibration forward loop (_forward_loop → _process_batch), the batch dict is unpacked as **kwargs into model.forward(). Without attention_mask, HF models default to attending to all positions including padding, which pollutes calibration statistics.

Practical impact: With batch_size=1 there is no padding so the bug is invisible. With larger batch sizes and variable-length samples, shorter sequences get padded and the effect grows.

Test plan
- tests/unit/torch/utils/test_dataset_utils.py

🤖 Generated with Claude Code
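To make the effect described under Details concrete, here is a small, self-contained sketch of what happens when a batch dict is unpacked as keyword arguments with and without attention_mask. `toy_forward` is a hypothetical stand-in for model.forward(), not a HuggingFace API; it computes softmax attention weights for one query position. The example assumes left padding, so that the last real token can see the padded positions under a causal mask. The PR does not state which padding side is used.

```python
import math


def toy_forward(scores, attention_mask=None):
    """Hypothetical stand-in for model.forward(): softmax over key positions.

    If attention_mask is omitted, every key position is attended to,
    which is the situation described for batches that lack the mask.
    """
    if attention_mask is None:
        attention_mask = [1] * len(scores)
    exps = [math.exp(s) if keep else 0.0 for s, keep in zip(scores, attention_mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Attention scores of the last query position over four keys:
# two padding positions followed by two real tokens (left padding, an assumption).
mask = [0, 0, 1, 1]
old_batch = {"scores": [0.2, 0.2, 0.5, 1.0]}
new_batch = {"scores": [0.2, 0.2, 0.5, 1.0], "attention_mask": mask}

old_weights = toy_forward(**old_batch)  # batch without attention_mask
new_weights = toy_forward(**new_batch)  # batch with attention_mask

# Share of attention weight that lands on padding positions in each case.
pad_share_old = sum(w for w, keep in zip(old_weights, mask) if not keep)
pad_share_new = sum(w for w, keep in zip(new_weights, mask) if not keep)
print(f"weight on padding without mask: {pad_share_old:.3f}")
print(f"weight on padding with mask:    {pad_share_new:.3f}")
```

Without the mask, part of the attention weight goes to padding positions, so the activations observed during calibration differ from those the same sample would produce unpadded. With the mask, the padding share is zero and the weights are renormalized over real tokens only.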