feat: add QAT pipeline #265
Merged
irisliu10
reviewed
Mar 19, 2026
yghstill
reviewed
Mar 19, 2026
from torch.utils.data import Dataset, IterableDataset

class QATDataset(IterableDataset):
@@ -0,0 +1,350 @@
# Copyright 2025 Tencent Inc. All Rights Reserved.

@dataclass
class TrainingConfig:
Collaborator
name: Union[str, List[str]]
QAT: Optional[QATTrainingConfig] = None
Put the QAT-specific hyperparameters in QATTrainingConfig.
Everything else is shared.
yghstill
approved these changes
Mar 23, 2026
This PR introduces Quantization-Aware Training (QAT) as a new compression method in AngelSlim. QAT inserts fake-quantization operations into the forward pass so that the model learns its quantization parameters during training, which typically yields better accuracy in low-bit settings (e.g., W4A8, INT4) than post-training quantization (PTQ) alone.
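For readers new to the term, a fake-quantization operation rounds a value onto a low-bit integer grid and immediately maps it back to floating point, so the rest of the forward pass sees the quantization error. The sketch below is a minimal pure-Python illustration, not AngelSlim's implementation; the function name, the signed 4-bit range, and the per-tensor scale are assumptions. Training frameworks usually pair this step with a straight-through estimator for gradients, which is omitted here.

```python
def fake_quantize(values, scale, zero_point=0, num_bits=4):
    """Quantize floats to a signed integer grid, then dequantize them again.

    Hypothetical helper for illustration only; not part of AngelSlim.
    """
    qmin = -(2 ** (num_bits - 1))      # -8 for 4 bits
    qmax = 2 ** (num_bits - 1) - 1     # 7 for 4 bits
    result = []
    for value in values:
        q = round(value / scale) + zero_point   # project onto the integer grid
        q = max(qmin, min(qmax, q))             # clamp to the representable range
        result.append((q - zero_point) * scale) # map back to floating point
    return result


if __name__ == "__main__":
    # 5.0 is outside the 4-bit range at this scale, so it saturates at qmax * scale.
    print(fake_quantize([0.26, -0.9, 5.0], scale=0.5))
```

The learnable quantities in QAT are the scale and zero_point arguments: a better scale reduces both rounding error and saturation.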
Training Modes
End-to-End (end2end): Uses HuggingFace Seq2SeqTrainer with a custom AdamW optimizer targeting only quantization parameters (scale, zero_point).
Blockwise (blockwise): Trains each Transformer block independently using MSE loss between full-precision and quantized block outputs.
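The blockwise objective can be illustrated with a toy example: for each block, compare the full-precision output with the output produced by fake-quantized weights and keep the quantization scale that minimizes the MSE. This sketch swaps gradient-based training for a plain search over candidate scales, and the "block" is a single dot product rather than a Transformer block; every name in it is hypothetical.

```python
def fake_quantize_weights(weights, scale, num_bits=4):
    """Symmetric fake quantization of a weight vector (illustrative helper)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]


def block_forward(weights, inputs):
    """Toy block: one dot product per input row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in inputs]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_block_scale(weights, inputs, candidate_scales):
    """Return (scale, loss) minimizing MSE against the full-precision output."""
    target = block_forward(weights, inputs)  # full-precision reference
    best_scale, best_loss = None, float("inf")
    for scale in candidate_scales:
        quantized_out = block_forward(fake_quantize_weights(weights, scale), inputs)
        loss = mse(target, quantized_out)
        if loss < best_loss:
            best_scale, best_loss = scale, loss
    return best_scale, best_loss


def fit_blockwise(blocks, inputs, candidate_scales):
    """Fit each block independently; no loss is shared across blocks."""
    return [fit_block_scale(weights, inputs, candidate_scales) for weights in blocks]


if __name__ == "__main__":
    inputs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    blocks = [[0.5, -1.0, 1.5], [0.3, 0.6, -0.9]]
    print(fit_blockwise(blocks, inputs, candidate_scales=[0.3, 0.5]))
```

Because each block has its own reference output, blocks can be processed one at a time, which keeps memory use bounded by a single block rather than the whole model.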
Model Conversion & Save
convert(): Replaces all QuantLinear modules with inference-ready QDQModule (from the existing PTQ module), extracting learned weight_scale and input_scale from trained quantizers.
save(): Supports two formats:
"fake": Saves the raw state_dict via torch.save(), which is useful for resuming from a checkpoint.
"real": Delegates to the model-specific save function (e.g., vLLM/TRT-LLM compatible formats).
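The two save formats amount to a dispatch on a format string. The sketch below shows that shape only; the function name, its parameters, and the injected saver callables are assumptions for illustration and do not reflect the actual save() signature in this PR. In a real setup the "fake" saver would wrap torch.save() and the "real" saver would be the model-specific export function.

```python
from typing import Any, Callable, Dict

Saver = Callable[[Dict[str, Any], str], None]


def save_quantized(state: Dict[str, Any], path: str, save_format: str,
                   fake_saver: Saver, real_saver: Saver) -> str:
    """Route a save request to the matching backend (hypothetical helper)."""
    if save_format == "fake":
        fake_saver(state, path)   # raw state dict, suited to resuming training
    elif save_format == "real":
        real_saver(state, path)   # deployment format, model-specific
    else:
        raise ValueError(f"unknown save_format: {save_format!r}")
    return save_format


if __name__ == "__main__":
    calls = []
    save_quantized({"w": 1}, "ckpt.pt", "fake",
                   fake_saver=lambda s, p: calls.append(("fake", p)),
                   real_saver=lambda s, p: calls.append(("real", p)))
    print(calls)
```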
Key configuration sections:
training.plugin_config: Plugin toggles (enable_scale, enable_rotation), per-plugin quant_config overrides
training.hf_args (end2end): Full HuggingFace Seq2SeqTrainingArguments
training.block_wise_config (blockwise): epochs, batch_size, quant_lr, weight_lr, min_lr_factor
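As a rough picture of how these sections fit together, the configuration can be written as a nested mapping. Only the key names below come from this description; the nesting of quant_config under plugin_config, all values, and the helper function are assumptions. The hf_args entries are ordinary Seq2SeqTrainingArguments fields.

```python
# Hypothetical values throughout; only the key names come from the PR description.
qat_config = {
    "training": {
        "plugin_config": {
            "enable_scale": True,
            "enable_rotation": False,
            "quant_config": {},  # per-plugin overrides; contents not specified here
        },
        # Used by the end2end mode.
        "hf_args": {
            "output_dir": "./qat_output",
            "learning_rate": 1e-5,
            "num_train_epochs": 1,
            "per_device_train_batch_size": 1,
        },
        # Used by the blockwise mode.
        "block_wise_config": {
            "epochs": 2,
            "batch_size": 4,
            "quant_lr": 1e-4,
            "weight_lr": 1e-5,
            "min_lr_factor": 0.1,
        },
    }
}

BLOCKWISE_KEYS = ("epochs", "batch_size", "quant_lr", "weight_lr", "min_lr_factor")


def missing_blockwise_keys(config):
    """Return the block_wise_config keys that are absent (illustrative check)."""
    section = config.get("training", {}).get("block_wise_config", {})
    return [key for key in BLOCKWISE_KEYS if key not in section]


if __name__ == "__main__":
    print(missing_blockwise_keys(qat_config))
```

Keeping separate quant_lr and weight_lr entries allows the quantization parameters and the model weights to be updated at different learning rates.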
Documentation
A comprehensive user guide has been added at docs/source/features/quantization/qat.md.