Releases: foundation-model-stack/fms-model-optimizer
v0.3.0
Highlights
- AIU support: a new example for converting models for AIU (see the examples/AIU_CONVERSION folder) and new add-ons for fms.
- A Triton kernel for specialized matmul HW simulation and verification.
- Microscaling (MX) format support, integrating functionality from Microsoft's mx package (see examples/MX for more details).
- Other upgrades and improvements:
  - qmodel_prep tracing is faster; for example, for Llama3-70B the tracing time has dropped from ~20 min to ~2 min.
  - Base dependencies upgraded to torch 2.5 and Python 3.12, and migrated from auto_gptq to gptqmodel.
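For readers new to microscaling: MX formats group values into small blocks that share one power-of-two scale, so each element can be stored in a narrow format. Below is a minimal pure-Python sketch of that idea. It assumes 32-element blocks, signed 8-bit elements, and a shared scale chosen as the smallest power of two that avoids clipping. `mx_like_quantize` and `mx_like_dequantize` are hypothetical helpers, not part of fms-model-optimizer or the mx package, and the sketch does not reproduce the exact MX element encodings.

```python
import math

BLOCK_SIZE = 32  # assumption: one shared scale per 32 values
QMAX = 127       # assumption: elements stored as signed 8-bit integers in [-127, 127]


def mx_like_quantize(values, block_size=BLOCK_SIZE):
    """Split values into blocks; return one (exponent, int_elements) pair per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        amax = max(abs(v) for v in chunk)
        if amax == 0:
            exponent = 0
        else:
            # Smallest power-of-two scale 2**exponent with amax / 2**exponent <= QMAX.
            exponent = math.ceil(math.log2(amax / QMAX))
        scale = 2.0 ** exponent
        elements = [max(-QMAX, min(QMAX, round(v / scale))) for v in chunk]
        blocks.append((exponent, elements))
    return blocks


def mx_like_dequantize(blocks):
    """Rebuild floats by multiplying each element by its block's shared scale."""
    out = []
    for exponent, elements in blocks:
        scale = 2.0 ** exponent
        out.extend(q * scale for q in elements)
    return out
```

Because the shared scale is a power of two, rescaling a block amounts to an exponent shift rather than a general multiply. The per-element rounding error is at most half of that block's scale.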
What's Changed
- Add spell checker to alleviate spelling errors by @hickeyma in #32
- chore: Remove Makefile by @hickeyma in #37
- chore: Add GitHub badges to project README by @hickeyma in #38
- ci: Replace coverage with pytest-cov plugin for code coverage by @hickeyma in #39
- fix: Update the quantization notebook tutorial by @hickeyma in #41
- Small updates to the docs by @hickeyma in #40
- fix: Error in quantization notebook tutorial when retrieving image by @hickeyma in #42
- OptArguments by @tharapalanivel in #43
- Add logging and tests for run_quant.py by @tharapalanivel in #44
- Aiu addons by @andrea-fasoli in #46
- fix Qbmm tracing issue by @chichun-charlie-liu in #47
- Add build backend by @tharapalanivel in #50
- Add mypy static checker tool by @hickeyma in #49
- [utils] check if folder exists before attempting to create directory by @kcirred in #52
- fms_mo docker image by @tharapalanivel in #48
- Lint for fx by @tharapalanivel in #54
- Update accelerate requirement from !=0.34,<1.1,>=0.20.3 to >=0.20.3,!=0.34,<1.4 by @dependabot in #56
- Update transformers requirement from <4.48,>=4.45 to >=4.45,<4.49 by @dependabot in #55
- Add FP/INT triton kernels and unit tests, also update QAT example by @chichun-charlie-liu in #58
- ci: Add workflow for PR labels by @tharapalanivel in #57
- fix: Fix labelpr workflow by @tharapalanivel in #63
- feat: added granite support; fixed adapters to ignore model_config by @JRosenkranz in #53
- fix: Triton kernel bug fix by @chichun-charlie-liu in #61
- feat: Support for int8 smoothquant by @andrea-fasoli in #65
- test: Unit test int8 by @andrea-fasoli in #62
- fix: Bug fix and minor changes on triton kernel by @chichun-charlie-liu in #69
- fix: handle linear_type callable at int8 linear instantiation by @andrea-fasoli in #68
- fix: Multiple bug fixes by @chichun-charlie-liu in #70
- feat: improve transformers tracing for last layers by @chichun-charlie-liu in #72
- fix: In the DQ example, the context manager detected the wrong frame when nbits_kvcache=8 by @chichun-charlie-liu in #74
- fix: Fix build and check packages flow by @tharapalanivel in #79
- fix: make triton optional for systems without GPUs by @chichun-charlie-liu in #78
- fix: Fix a bug that prevented Dynamo from working with PT 2.5.1 by @chichun-charlie-liu in #81
- test: int8 unit tests for aiu add-ons by @iqbal-saraf in #77
- feat: confirmed py3.12 with pt2.5.1 by @chichun-charlie-liu in #83
- fix: finish missed items for upgrading to python 3.12 by @chichun-charlie-liu in #84
- fix: minor fix from last PR regarding py3.12 upgrades by @chichun-charlie-liu in #85
- feat: Update accelerate requirement from !=0.34,<1.4,>=0.20.3 to >=0.20.3,!=0.34,<1.7 by @dependabot in #86
- feat: Update transformers requirement from <4.49,>=4.45 to >=4.45,<4.51 by @dependabot in #80
- feat: Adjust the triton matmul kernel to be closer to HW behavior by @chichun-charlie-liu in #82
- fix: fix QBmm detection and default behavior by @chichun-charlie-liu in #87
- feat: expand detection of data types in model size estimation by @andrea-fasoli in #88
- fix: Fix push to pypi flow by @tharapalanivel in #90
- feat: int8 granite addon by @andrea-fasoli in #92
- feat: INT8 LLM TP>1 enablement by @andrea-fasoli in #94
- dependencies: Update transformers requirement from <4.51,>=4.45 to >=4.45,<4.52 by @dependabot in #91
- dependencies: Update triton requirement from <3.2,>=3.0 to >=3.0,<3.4 by @dependabot in #93
- feat: Update syntax of custom torch ops by @andrea-fasoli in #96
- feat: add granite architecture support for DQ with smoothquant by @andrea-fasoli in #101
- feat: trimming config save by @BrandonGroth in #103
- feat: Add int8 sd conversion function for aiu by @andrea-fasoli in #95
- fix: Config save cleanup by @BrandonGroth in #113
- feat: add verbosity to smoothquant during conversion for AIU by @andrea-fasoli in #115
- feat: Conversion example by @andrea-fasoli in #118
- feat: adjust int8 triton to enable msb/lsb truncation by @chichun-charlie-liu in #120
- feat: mx integration by @chichun-charlie-liu in #110
- feat: GPTQModel Migration by @tharapalanivel in #102
- fix: disable granite in custom gptq as gptqmodel already supports it, fix … by @chichun-charlie-liu in #130
- test: Add tests for save_for_aiu functionality w/ tiny models by @BrandonGroth in #126
- fix: Fix a typo in the GPTQ example README.md by @chichun-charlie-liu in #132
- docs: Fix README typo by @tharapalanivel in #135
- build: Update test/verification section of PR template by @tharapalanivel in #136
- fix: Fix versioning by @tharapalanivel in #137
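Several of the changes above add or adjust INT8 Triton matmul kernels. As a rough mental model (an assumption, not the kernels' actual code), an INT8 matmul multiplies signed 8-bit operands, accumulates in 32-bit integers, and rescales the result back to floating point. The pure-Python reference below sketches that computation with hypothetical per-tensor scales `sx` and `sw`. A slow reference like this can serve as a baseline when checking a kernel's output.

```python
QMAX = 127  # symmetric int8 range [-127, 127] used in this sketch
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def quantize_int8(matrix, scale):
    """Symmetric per-tensor quantization of a 2-D list of floats to int8 values."""
    return [[max(-QMAX, min(QMAX, round(v / scale))) for v in row] for row in matrix]


def int8_matmul_reference(qx, qw, sx, sw):
    """Multiply int8 matrices with integer accumulation, then rescale by sx * sw."""
    k = len(qw)
    if any(len(row) != k for row in qx):
        raise ValueError("inner dimensions must match")
    out = []
    for row in qx:
        out_row = []
        for j in range(len(qw[0])):
            acc = sum(row[t] * qw[t][j] for t in range(k))
            # Python ints never overflow, so check the result would fit a 32-bit accumulator.
            if not INT32_MIN <= acc <= INT32_MAX:
                raise OverflowError("accumulator exceeds int32 range")
            out_row.append(acc * sx * sw)
        out.append(out_row)
    return out
```

In use, the scales would typically be set to `max|x| / 127` for each tensor. The result then approximates the floating-point product to within the rounding error of the two quantized operands.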
New Contributors
- @chichun-charlie-liu made their first contribution in #47
- @kcirred made their first contribution in #52
- @JRosenkranz made their first contribution in #53
- @iqbal-saraf made their first contribution in #77
- @BrandonGroth made their first contribu...
v0.2.0
This is the first release of FMS Model Optimizer. It provides the following core functionality:
- Python API to enable model quantization: With a few added lines of code, operations are replaced at the module level and/or function level.
- Robust: Verified for INT8/INT4 quantization on important vision, speech, NLP, object detection, and LLM models.
- Flexible: Options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer parameter-group settings.
- State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
- Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
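The INT quantizers listed above share a common building block: clip a tensor to a range set by a clipping value (the clip_val mentioned above) and round it onto a uniform integer grid. Below is a minimal sketch, assuming plain per-tensor fake quantization (quantize, then immediately dequantize) in pure Python. `fake_quantize` is a hypothetical helper. It does not implement the learned clip values of PACT+ or the statistics-based clip selection of SAWB+, only the clip-and-round step they build on.

```python
def fake_quantize(values, clip_val, nbits, symmetric=True):
    """Quantize-dequantize a list of floats onto a uniform nbits grid.

    symmetric=True : clip to [-clip_val, clip_val] with 2**(nbits-1) - 1 positive levels
    symmetric=False: clip to [0, clip_val] (PACT-style, e.g. after ReLU) with 2**nbits - 1 levels
    """
    if clip_val <= 0:
        raise ValueError("clip_val must be positive")
    if symmetric:
        levels = 2 ** (nbits - 1) - 1
        lo = -clip_val
    else:
        levels = 2 ** nbits - 1
        lo = 0.0
    scale = clip_val / levels  # spacing between adjacent grid points
    out = []
    for v in values:
        clipped = min(max(v, lo), clip_val)
        out.append(round(clipped / scale) * scale)
    return out
```

The choice of clip_val trades clipping error (values outside the range saturate) against rounding error (a wider range means coarser grid spacing). This is why initializing it well matters.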
What's Changed
- Initial setup by @tharapalanivel in #1
- Initial commit for optimization techniques by @tharapalanivel in #9
- Add dynamic build versioning by @hickeyma in #12
- [ci]: Restructure GitHub workflows by @hickeyma in #13
- Clear notebook output by @tharapalanivel in #15
- Improve README readability by @tharapalanivel in #19
- Change project name to correspond to pypi package name by @hickeyma in #18
- Set smoothq_alpha as buffer by @andrea-fasoli in #20
- Fix device for smoothquant activation scales by @andrea-fasoli in #21
- test: Add checks for unit tests that require NVIDIA GPU by @hickeyma in #14
- tox: Add base Python version to tox environment by @hickeyma in #24
- Fix symmetric behavior (issue #22) by @andrea-fasoli in #26
- ci: Add Ruff for lint and code formatting by @hickeyma in #30
- Update pre-commit requirement from <4.0,>=3.0.4 to >=3.0.4,<5.0 by @dependabot in #16
- doc: Update dev env section of the contributing guide by @hickeyma in #29
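Several of the v0.2.0 fixes touch SmoothQuant (smoothq_alpha and the activation scales). In the SmoothQuant paper's formulation, each input channel j gets a smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha). Activation column j is divided by s_j and weight row j is multiplied by it, so the layer output is unchanged while activation outliers shrink. Below is a minimal pure-Python sketch of that formulation, with Y = X W, X of shape (tokens, in_features), and W of shape (in_features, out_features). The function names and the `EPS` guard are hypothetical, and this is not fms-model-optimizer's implementation.

```python
EPS = 1e-5  # guards against division by zero for all-zero channels


def smoothing_factors(x, w, alpha=0.5):
    """s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha) for each input channel j."""
    factors = []
    for j in range(len(w)):
        act_max = max(max(abs(row[j]) for row in x), EPS)
        w_max = max(max(abs(v) for v in w[j]), EPS)
        factors.append(act_max ** alpha / w_max ** (1 - alpha))
    return factors


def apply_smoothing(x, w, s):
    """Divide activation column j by s_j and multiply weight row j by s_j."""
    x_smooth = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_smooth = [[v * s[j] for v in w[j]] for j in range(len(w))]
    return x_smooth, w_smooth


def matmul(a, b):
    """Plain 2-D matrix product for lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]
```

With alpha = 0.5, each channel's smoothed activation column and weight row end up with the same maximum magnitude, sqrt(max|X_j| * max|W_j|). A larger alpha moves more of the quantization difficulty from the activations onto the weights.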
New Contributors
- @tharapalanivel made their first contribution in #1
- @hickeyma made their first contribution in #12
- @andrea-fasoli made their first contribution in #20
- @dependabot made their first contribution in #16
Full Changelog: https://github.com/foundation-model-stack/fms-model-optimizer/commits/v0.2.0