Commit 4ab07b5

release v4.1 (ModelCloud#1757)
* release v4.1
* Update version.py
1 parent da42682 commit 4ab07b5

2 files changed

Lines changed: 2 additions & 5 deletions

README.md

Lines changed: 1 addition & 4 deletions
@@ -17,10 +17,7 @@
 </p>
 
 ## Latest News
-* 09/03/2025 4.1.0-dev `main`: ✨ Meituan LongCat Flash Chat model support.
-* 09/02/2025 4.1.0-dev `main`: ✨ Llama 4 (BF16 converted) model support.
-* 09/01/2025 4.1.0-dev `main`: ✨ GPT-OSS (BF16 converted) model support.
-* 08/25/2025 4.1.0-dev `main`: ✨ GLM-4.5-Air model support. New experiemental `mock_quantization` config to skip complex computational code paths during quantization to accelerate model quant testing.
+* 09/04/2025 [4.1.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v4.1.0): ✨ Meituan LongCat Flash Chat, Llama 4, GPT-OSS (BF16), and GLLM-4.5-Air support. New experiemental `mock_quantization` config to skip complex computational code paths during quantization to accelerate model quant testing.
 * 08/21/2025 [4.0.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v4.0.0): 🎉 New Group Aware Reordering (GAR) support. New models support: Bytedance Seed-OSS, Baidu Ernie, Huawei PanGu, Gemma3, Xiaomi Mimo, Qwen 3/MoE, Falcon H1, GPT-Neo. Memory leak and multiple model compatibility fixes related to Transformers >= 4.54. Python >= 3.13t free-threading support added with near N x GPU linear scaling for quantization of MoE models and also linear N x Cpu Core scaling of packing stage. Early access Pytorch 2.8 fused-ops on Intel XPU for up to 50% speedup.
 * 08/19/2025 4.0.0-dev `main`: Fix quantization memory usage due to some model's incorrect application of `config.use_cache` during inference. Fixed `Transformers` >= 4.54.0 compat which changed layer forward return signature for some models.
 * 08/18/2025 4.0.0-dev `main`: GPT-Neo model support. Memory leak fix in error capture (stacktrace) and fixed `lm_head` quantization compatibility for many models.

gptqmodel/version.py

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = "4.1.0-dev"
+__version__ = "4.1.0"
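
The only functional change in this file is dropping the `-dev` suffix from `__version__`. As a rough illustration of how downstream code might tell a development build from a tagged release, here is a minimal sketch. The helpers `parse_version` and `is_dev_build` are hypothetical and are not part of GPTQModel; the regex assumes the simple `MAJOR.MINOR.PATCH[-suffix]` shape seen in this diff and nothing more general.

```python
import re

# Hypothetical helpers for illustration only; not part of GPTQModel.
# Assumes version strings shaped like "4.1.0" or "4.1.0-dev".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")


def parse_version(version: str):
    """Split a version string into a numeric (major, minor, patch) tuple
    and an optional suffix (None when no suffix is present)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, suffix = match.groups()
    return (int(major), int(minor), int(patch)), suffix


def is_dev_build(version: str) -> bool:
    """Return True when the version carries a "dev" suffix."""
    _, suffix = parse_version(version)
    return suffix == "dev"
```

With these helpers, the old value `"4.1.0-dev"` would be reported as a development build and the new value `"4.1.0"` would not, while both share the same numeric release tuple.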
