
Commit c40a374

cjluo-nv and claude committed
Fix missing attention_mask in calibration dataloader
When include_labels=False (the default for PTQ calibration), get_dataset_dataloader was only returning input_ids and discarding the attention_mask produced by the tokenizer. This caused HF models to create a full causal mask, allowing padding tokens to participate in attention during calibration and skewing quantization statistics.

Include attention_mask alongside input_ids so the model correctly ignores padding tokens during calibration forward passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
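The failure mode described above can be illustrated with plain softmax attention. This is a minimal pure-Python sketch, not modelopt or HF code: the scores and mask values are invented for illustration. Without a mask, the padding position receives nonzero attention weight; with an attention_mask, its score is set to negative infinity and its weight drops to zero.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def masked_attn_weights(scores, mask):
    # mask[j] == 0 marks a padding position; its score becomes -inf,
    # so it contributes exactly zero attention weight after softmax.
    masked = [s if mask[j] else float("-inf") for j, s in enumerate(scores)]
    return softmax(masked)

scores = [2.0, 1.0, 3.0]                 # raw attention scores for one query
full = softmax(scores)                   # no mask: padding (index 2) attends
masked = masked_attn_weights(scores, [1, 1, 0])  # attention_mask zeroes it out
```

Here `full[2]` is nonzero (the padding token pollutes the activation statistics that calibration observes), while `masked[2]` is exactly zero.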
1 parent 5523505 commit c40a374

1 file changed

modelopt/torch/utils/dataset_utils.py

Lines changed: 9 additions & 2 deletions
```diff
@@ -405,8 +405,15 @@ def get_dataset_dataloader(
         )
         tokenized_dataset = _CustomDataset(batch_encoded)
     else:
-        # For backward compatibility, if labels are not needed, we only return the input_ids.
-        tokenized_dataset = _CustomDataset({"input_ids": batch_encoded["input_ids"]})
+        # Always include attention_mask so the model correctly ignores padding tokens
+        # during calibration. Without it, HF models create a full causal mask and
+        # padding tokens participate in attention, skewing calibration statistics.
+        tokenized_dataset = _CustomDataset(
+            {
+                "input_ids": batch_encoded["input_ids"],
+                "attention_mask": batch_encoded["attention_mask"],
+            }
+        )
 
     calib_dataloader = DataLoader(tokenized_dataset, batch_size=batch_size, shuffle=False)
 
```
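To see what the fixed branch preserves, here is a minimal sketch using a stand-in for modelopt's `_CustomDataset`. The class body and the token values are assumptions for illustration, not the library's actual implementation; the dict literal mirrors what a HF tokenizer returns with padding enabled.

```python
class _CustomDataset:
    # Minimal stand-in (assumption) for modelopt's dataset wrapper:
    # holds a dict of parallel lists and yields one dict per sample.
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data["input_ids"])

    def __getitem__(self, i):
        return {k: v[i] for k, v in self.data.items()}

# Illustrative tokenizer output with padding: 0 is a pad token here,
# and attention_mask marks real (1) vs. padded (0) positions.
batch_encoded = {
    "input_ids": [[5, 6, 0], [7, 8, 9]],
    "attention_mask": [[1, 1, 0], [1, 1, 1]],
}

# After the fix, both keys survive into the calibration dataloader,
# so each sample carries its mask. Before, only input_ids was kept.
tokenized_dataset = _CustomDataset(
    {
        "input_ids": batch_encoded["input_ids"],
        "attention_mask": batch_encoded["attention_mask"],
    }
)
sample = tokenized_dataset[0]
```

Each batch the dataloader yields now contains `attention_mask`, which HF models accept in `forward` and use to exclude padding positions from attention.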
