Commit 3e8fc7b
authored
Quant in checkpoint dtype (#18781)
Switches order in etLLM so we quantize in checkpoint dtype and then cast
to dtype-override. This can prevent underflowing on scales.
Also exposes ability to turn HQQ on/off.
Export:
```
python -m extension.llm.export.export_llm \
base.model_class=phi_4_mini \
base.params=examples/models/phi_4_mini/config/config.json \ model.use_kv_cache=true \
model.use_sdpa_with_kv_cache=true \
model.dtype_override=fp32 \ export.output_dir=/tmp/phi_4_mini_no_hqq \ export.output_name=model.pte \ export.max_seq_length=2048 \ export.max_context_length=2048 \ quantization.qmode=8da4w \ quantization.group_size=32 "quantization.embedding_quantize='8,0'" quantization.use_hqq=False \
backend.xnnpack.enabled=true \
backend.xnnpack.extended_ops=true
```
Phi4 output:
```
<|im_start|>system
You are a highly capable, helpful, and honest AI assistant designed to provide clear, accurate, and thoughtful responses to a wide range of questions. Your primary goal is to assist users by offering information, explanations, and guidance in a manner that is respectful, unbiased, and safe. Always strive to be as helpful as possible, but never provide content that is harmful, unethical, offensive, or illegal. If a question is unclear, nonsensical, or based on incorrect premises, politely explain the issue rather than attempting to answer inaccurately. If you do not know the answer to a question, it is better to admit uncertainty than to provide false or misleading information. When appropriate, include examples, analogies, or step-by-step reasoning to enhance understanding. Your responses should be positive, inclusive, and supportive, fostering a constructive and informative interaction.<|im_end|>
<|im_start|>user
Please answer the following question in detail and provide relevant context, examples, and explanations where possible: What are some of the most important considerations when designing a machine learning system for real-world applications? Discuss potential challenges, best practices, and how to ensure ethical and responsible use.<|im_end|>
<|im_start|>assistant
Designing a machine learning system for real-world applications involves various considerations to ensure the system is effective, fair, and secure. Some of the most important considerations include data quality and sourcing, model choice and design, evaluation and validation, interpretability and transparency, and ensuring fairness and avoiding biases.
Data quality and sourcing involve ensuring data is of high quality, representative of the target application, and properly curated and preprocessed to remove noise and biases.
Model choice and design involve selecting an appropriate model for the application, understanding the strengths and limitations of different models, and understanding the application domain and data.
Model evaluation and validation involve properly training and tuning the model on a training set and properly validating and testing the model on a separate validation set to avoid data leakage and
```
Related work: improvement in torchao's HQQ algorithm that helps with
Phi4's model distribution: pytorch/ao#42591 parent 74403e2 commit 3e8fc7b
File tree
3 files changed
+18
-9
lines changed- examples/models/llama
- source_transformation
- extension/llm/export/config
3 files changed
+18
-9
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
743 | 743 | | |
744 | 744 | | |
745 | 745 | | |
746 | | - | |
747 | | - | |
748 | | - | |
749 | | - | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
750 | 749 | | |
751 | 750 | | |
752 | 751 | | |
| |||
791 | 790 | | |
792 | 791 | | |
793 | 792 | | |
| 793 | + | |
794 | 794 | | |
795 | 795 | | |
796 | 796 | | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
797 | 801 | | |
798 | 802 | | |
799 | 803 | | |
| |||
1736 | 1740 | | |
1737 | 1741 | | |
1738 | 1742 | | |
1739 | | - | |
1740 | | - | |
| 1743 | + | |
1741 | 1744 | | |
1742 | 1745 | | |
1743 | 1746 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
755 | 755 | | |
756 | 756 | | |
757 | 757 | | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
758 | 766 | | |
759 | 767 | | |
760 | 768 | | |
761 | 769 | | |
762 | 770 | | |
763 | 771 | | |
764 | 772 | | |
765 | | - | |
766 | 773 | | |
767 | 774 | | |
768 | 775 | | |
| |||
817 | 824 | | |
818 | 825 | | |
819 | 826 | | |
820 | | - | |
821 | 827 | | |
822 | 828 | | |
823 | 829 | | |
824 | 830 | | |
825 | 831 | | |
826 | | - | |
827 | 832 | | |
828 | 833 | | |
829 | 834 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
429 | 429 | | |
430 | 430 | | |
431 | 431 | | |
| 432 | + | |
432 | 433 | | |
433 | 434 | | |
434 | 435 | | |
| |||
0 commit comments