diff --git a/README.md b/README.md
index 03f6980188..5bf765fcae 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
 
 2026
 
+- \[2026/04\] The LMDeploy project on PyPI has reached its storage quota, so pre-built wheels for new releases cannot be uploaded for the time being. You can download packages from the [GitHub Releases](https://github.com/InternLM/lmdeploy/releases) page or install from source instead. We will update this notice when wheel uploads to PyPI resume. Affected versions: >=0.12.2
 - \[2026/02\] Support [Qwen3.5](https://huggingface.co/collections/Qwen/qwen35)
 - \[2026/02\] Support [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) 4bit symmetric/asymmetric quantization. Refer [here](./docs/en/quantization/llm_compressor.md) for detailed guide
@@ -228,7 +229,7 @@ Since v0.3.0, the default prebuilt package is compiled on **CUDA 12**. Starting
 If you are using a GeForce RTX 50 series graphics card, please install the LMDeploy prebuilt package compiled with **CUDA 12.8** as follows:
 
 ```shell
-export LMDEPLOY_VERSION=0.12.2
+export LMDEPLOY_VERSION=0.12.3
 export PYTHON_VERSION=312
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
diff --git a/README_zh-CN.md b/README_zh-CN.md
index c8eb2735cd..bfbbfb17c1 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
 
 2026
 
+- \[2026/04\] 由于 LMDeploy 在 PyPI 上的项目存储配额已满,新版本目前无法上传预编译安装包(wheels)。用户可以通过 [GitHub Releases](https://github.com/InternLM/lmdeploy/releases) 页面下载安装包,或者通过源码安装等方式使用最新版本;预编译包恢复上传后我们会另行通知。受影响版本:>=0.12.2
 - \[2026/02\] 支持 [Qwen3.5](https://huggingface.co/collections/Qwen/qwen35)
 - \[2026/02\] 支持 [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) 4bit 对称和非对称量化。具体操作指南详见[此处](./docs/zh_cn/quantization/llm_compressor.md)
@@ -230,7 +231,7 @@ pip install lmdeploy
 若使用 GeForce RTX 50 系列显卡,请按照如下方式安装基于 **CUDA 12.8** 编译的 LMDeploy 预编译包。
 
 ```shell
-export LMDEPLOY_VERSION=0.12.2
+export LMDEPLOY_VERSION=0.12.3
 export PYTHON_VERSION=312
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
diff --git a/lmdeploy/version.py b/lmdeploy/version.py
index 0fc4e9fb8a..79be99e310 100644
--- a/lmdeploy/version.py
+++ b/lmdeploy/version.py
@@ -1,6 +1,6 @@
 # Copyright (c) OpenMMLab. All rights reserved.
-__version__ = '0.12.2'
+__version__ = '0.12.3'
 short_version = __version__