Commit c40a374
Fix missing attention_mask in calibration dataloader
When include_labels=False (the default for PTQ calibration),
get_dataset_dataloader was only returning input_ids and discarding
the attention_mask produced by the tokenizer. This caused HF models
to create a full causal mask, allowing padding tokens to participate
in attention during calibration and skewing quantization statistics.
Include attention_mask alongside input_ids so the model correctly
ignores padding tokens during calibration forward passes.
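The fixed behavior can be sketched as follows. This is a minimal illustration of the pattern the commit describes, not the library's actual `get_dataset_dataloader` implementation; the function name, arguments, and collate logic here are assumptions for illustration only.

```python
# Illustrative sketch (names are hypothetical, not the real modelopt API):
# when include_labels is False, keep the tokenizer's attention_mask alongside
# input_ids instead of discarding it, so padded positions are masked out
# during calibration forward passes.
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_calib_dataloader(encodings, batch_size=2):
    """encodings: dict from a HF tokenizer with 'input_ids' and 'attention_mask'."""
    input_ids = torch.tensor(encodings["input_ids"])
    attention_mask = torch.tensor(encodings["attention_mask"])
    dataset = TensorDataset(input_ids, attention_mask)

    def collate(batch):
        ids, masks = zip(*batch)
        # Before the fix, only input_ids was returned here; HF models then
        # built a full causal mask and attended to padding tokens.
        return {
            "input_ids": torch.stack(ids),
            "attention_mask": torch.stack(masks),
        }

    return DataLoader(dataset, batch_size=batch_size, collate_fn=collate)


# Example: two sequences, the second padded to length 4.
enc = {
    "input_ids": [[5, 6, 7, 8], [9, 10, 0, 0]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 0, 0]],
}
batch = next(iter(make_calib_dataloader(enc, batch_size=2)))
```

With the mask included, `model(**batch)` lets Hugging Face models zero out the padded positions instead of constructing a full causal mask over them.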
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
1 file changed: 9 additions & 2 deletions
(Diff content not captured in this extract: lines 408-409 of the changed file were replaced by new lines 408-416.)