Releases: foundation-model-stack/fms-model-optimizer

v0.3.0

10 Jun 16:01
7467f68

Highlights

  1. AIU support: a new example for converting models for AIU (see the examples/AIU_CONVERSION folder) and new add-ons for fms.
  2. Triton kernel for simulating and verifying specialized matmul hardware.
  3. Microscaling (MX) format support, integrating functionality from the Microsoft mx package (see examples/MX for more details).
  4. Other upgrades and improvements:
    • Faster qmodel_prep tracing, e.g., for Llama3-70B the tracing time dropped from ~20 min to ~2 min.
    • Upgraded base dependencies to torch 2.5 and Python 3.12, and migrated from auto_gptq to gptqmodel.
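
For readers new to microscaling, the idea behind item 3 can be sketched in a few lines: a block of values shares one power-of-two scale, and each element is stored in a narrow format. The snippet below is a simplified illustration only, assuming signed-integer elements and a scale chosen from the block maximum. The function names are hypothetical, and it is not the API of the mx package or of fms-model-optimizer (see examples/MX for the actual usage).

```python
import math

def mx_quantize_block(values, elem_bits=8):
    """Quantize one block to signed integers that share a power-of-two scale.

    Hypothetical helper for illustration; returns (shared_exp, quantized_ints).
    """
    qmax = 2 ** (elem_bits - 1) - 1              # e.g. 127 for 8-bit elements
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Smallest exponent e such that max_abs / 2**e still fits in [-qmax, qmax].
    shared_exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** shared_exp
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return shared_exp, quantized

def mx_dequantize_block(shared_exp, quantized):
    """Recover approximate real values from the shared exponent and integers."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]

def mx_quantize(values, block_size=32, elem_bits=8):
    """Split a flat list into blocks and quantize each block independently."""
    return [
        mx_quantize_block(values[i:i + block_size], elem_bits)
        for i in range(0, len(values), block_size)
    ]
```

Because each block carries its own scale, an outlier only degrades the precision of the block it sits in, not the whole tensor.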

v0.2.0

13 Dec 17:50
e8bc88e

This is the first release of FMS Model Optimizer. It provides the core functionality:

  • Python API to enable model quantization: With the addition of a few lines of code, module-level and/or function-level operations are replaced with their quantized counterparts.
  • Robust: Verified for INT8 and INT4 quantization on important vision, speech, NLP, object detection, and large language models.
  • Flexible: Options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision setting, and optimizer param group setting.
  • State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
  • Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
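
To make the clip_val and PACT references above more concrete, here is a minimal sketch of clipped uniform activation quantization in the style of the original PACT formulation: values are clipped to [0, clip_val] and rounded to evenly spaced levels. It is an assumption-laden illustration on plain Python floats. The function name is hypothetical, and the PACT+ variant shipped in this library may differ in its details.

```python
def pact_quantize(x, clip_val, num_bits=8):
    """Clip x to [0, clip_val], then round to one of 2**num_bits uniform levels.

    Hypothetical helper for illustration; in PACT, clip_val is a learned
    parameter, while here it is simply passed in.
    """
    levels = 2 ** num_bits - 1            # number of steps between 0 and clip_val
    step = clip_val / levels              # spacing between adjacent levels
    clipped = min(max(x, 0.0), clip_val)  # saturate values outside the range
    return round(clipped / step) * step   # snap to the nearest level
```

A smaller clip_val gives finer resolution inside the range at the cost of saturating more large activations, which is why its initialization matters.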

What's Changed

Full Changelog: https://github.com/foundation-model-stack/fms-model-optimizer/commits/v0.2.0