[ROCm] Enable bitsandbytes quantization support on ROCm#34688

Merged
vllm-bot merged 9 commits into vllm-project:main from Abdennacer-Badaoui:bnb-support-in-rocm
Feb 21, 2026

Conversation

@Abdennacer-Badaoui
Contributor

@Abdennacer-Badaoui Abdennacer-Badaoui commented Feb 17, 2026

Description:

Summary

Test plan

  • pytest tests/models/test_transformers.py::test_quantization passes locally on MI325X (gfx942)

@mergify
Contributor

mergify Bot commented Feb 17, 2026

Documentation preview: https://vllm--34688.org.readthedocs.build/en/34688/

@mergify mergify Bot added the documentation (Improvements or additions to documentation), ci/build, and rocm (Related to AMD ROCm) labels Feb 17, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Feb 17, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enables bitsandbytes quantization support on ROCm by updating the bitsandbytes dependency to a version that supports it, removing test skips, and adjusting version checks in the code. The changes look good and are consistent with the goal of the PR. I've identified one area for improvement regarding code duplication in the version checking logic, which should be refactored to improve maintainability.

Comment on lines 446 to 460
+ min_version = "0.49.2" if current_platform.is_rocm() else "0.46.1"
  try:
      import bitsandbytes

-     if version.parse(bitsandbytes.__version__) < version.parse("0.46.1"):
+     if version.parse(bitsandbytes.__version__) < version.parse(min_version):
          raise ImportError(
              "bitsandbytes version is wrong. Please "
-             "install bitsandbytes>=0.46.1."
+             f"install bitsandbytes>={min_version}."
          )
  except ImportError as err:
      raise ImportError(
-         "Please install bitsandbytes>=0.46.1 via "
-         "`pip install bitsandbytes>=0.46.1` to use "
+         f"Please install bitsandbytes>={min_version} via "
+         f"`pip install bitsandbytes>={min_version}` to use "
          "bitsandbytes quantizer."
      ) from err

high

This version check logic is duplicated from BitsAndBytesLinearMethod.__init__ (lines 186-200). To improve maintainability and avoid future inconsistencies, consider extracting this logic into a shared helper function at the module level.

For example:

def _check_bitsandbytes_version():
    min_version = "0.49.2" if current_platform.is_rocm() else "0.46.1"
    try:
        import bitsandbytes
        from packaging import version

        if version.parse(bitsandbytes.__version__) < version.parse(min_version):
            raise ImportError(
                "bitsandbytes version is wrong. Please "
                f"install bitsandbytes>={min_version}."
            )
    except ImportError as err:
        raise ImportError(
            f"Please install bitsandbytes>={min_version} via "
            f"`pip install bitsandbytes>={min_version}` to use "
            "bitsandbytes quantizer."
        ) from err

Then both __init__ methods can simply call _check_bitsandbytes_version().
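
As a rough illustration of the refactor suggested above, the sketch below factors the platform-dependent minimum and the comparison into one function that two classes call from `__init__`. It is not the code that was merged. The class names, the `installed` and `is_rocm` parameters, and `parse_version` are hypothetical, and `parse_version` is a simplified stdlib stand-in for `packaging.version.parse` that only reads the leading numeric components.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Simplified stand-in for packaging.version.parse (numeric prefix only)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def min_bitsandbytes_version(is_rocm: bool) -> str:
    # Minimums as quoted in the diff above: 0.49.2 on ROCm, 0.46.1 otherwise.
    return "0.49.2" if is_rocm else "0.46.1"


def check_bitsandbytes_version(installed: str, is_rocm: bool) -> None:
    """Hypothetical pure variant of the helper: raise ImportError if too old."""
    minimum = min_bitsandbytes_version(is_rocm)
    if parse_version(installed) < parse_version(minimum):
        # The requirement is quoted so a shell does not treat ">" as a redirect.
        raise ImportError(
            f"Please install bitsandbytes>={minimum} via "
            f"`pip install 'bitsandbytes>={minimum}'` to use "
            "bitsandbytes quantizer."
        )


class LinearMethodSketch:
    """Hypothetical stand-in for the first class that needs the check."""

    def __init__(self, installed: str, is_rocm: bool) -> None:
        check_bitsandbytes_version(installed, is_rocm)


class OtherMethodSketch:
    """Hypothetical stand-in for the second class that needs the check."""

    def __init__(self, installed: str, is_rocm: bool) -> None:
        check_bitsandbytes_version(installed, is_rocm)
```

Passing the installed version and the platform flag as arguments is a choice made here so the check can be exercised without a GPU or bitsandbytes installed; the helper proposed in the review reads both from the environment instead.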

@hmellor
Member

hmellor commented Feb 17, 2026

I like Gemini's suggestion. Could you extract the check to a method as described in its review comment?

@Abdennacer-Badaoui
Contributor Author

Yes of course :)

@hmellor hmellor enabled auto-merge (squash) February 17, 2026 11:12
@github-actions github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Feb 17, 2026
@hmellor
Member

hmellor commented Feb 17, 2026

cc @AndreasKaratzas can you confirm if any of the AMD failures are caused by this PR or if they already existed?

@Titus-von-Koeller

Titus-von-Koeller commented Feb 19, 2026

Hey all, thanks to everyone!

LGTM from me as well.

@AndreasKaratzas
Member

cc @AndreasKaratzas can you confirm if any of the AMD failures are caused by this PR or if they already existed?

Sorry for the delay. I will try to look into it today.

@AndreasKaratzas
Member

@Abdennacer-Badaoui Could you please rebase before I take a look into AMD CI failures?

@AndreasKaratzas
Member

AndreasKaratzas commented Feb 19, 2026

Also, can we add a test that runs a bitsandbytes model if the bitsandbytes package is found? This would give us immediate feedback on the correctness of bitsandbytes on ROCm.

Also, let's add the package requirement to rocm-test.txt as well in this PR.

EDIT: Oops, I missed the transformers test there with bitsandbytes. Do you think we need any other bitsandbytes correctness test?
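
One way to express "run only if the package is found" is sketched below using the standard library. The helper names `package_available` and `should_run_bnb_test` are hypothetical and are not taken from the vLLM test suite.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    # find_spec returns None when no importable top-level module has this name.
    return importlib.util.find_spec(name) is not None


def should_run_bnb_test() -> bool:
    # Hypothetical gate: run the bitsandbytes model test only when the
    # package is installed in the current environment.
    return package_available("bitsandbytes")
```

In a pytest suite the same intent is often written as `pytest.importorskip("bitsandbytes")`, which skips the test when the import fails.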

auto-merge was automatically disabled February 20, 2026 10:25

Head branch was pushed to by a user without write access

Comment thread tests/models/test_transformers.py Outdated
Comment on lines +175 to +180
@pytest.mark.parametrize(
"model",
[
("unsloth/tinyllama-bnb-4bit"),
],
)
Member

Could we add this as a parametrisation to the test_quantization test instead of creating a new one? Then we have a case for online quantisation and pre-quantised for bnb

Contributor Author

Yes, it would be cleaner. Thanks
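
The case table for such a parametrisation might look roughly like the sketch below. Only `unsloth/tinyllama-bnb-4bit` comes from this thread. The unquantized model name is a placeholder, and the `(model, kwargs)` shape, the `quantization` key, and `case_id` are assumptions made for illustration rather than the actual contents of `test_quantization`.

```python
from typing import Any

# (model, extra engine kwargs) pairs; the shape is an assumption for this sketch.
QUANTIZATION_CASES: list[tuple[str, dict[str, Any]]] = [
    # Online quantisation: full-precision weights quantised at load time.
    # The checkpoint name is a hypothetical placeholder.
    ("example-org/unquantized-model", {"quantization": "bitsandbytes"}),
    # Pre-quantised 4-bit checkpoint named in the review thread above.
    ("unsloth/tinyllama-bnb-4bit", {}),
]


def case_id(model: str, kwargs: dict[str, Any]) -> str:
    """Build a readable test id such as 'tinyllama-bnb-4bit-prequantized'."""
    mode = "online" if kwargs.get("quantization") else "prequantized"
    return f"{model.split('/')[-1]}-{mode}"
```

A table like this could be handed to `pytest.mark.parametrize` together with `ids=[case_id(m, k) for m, k in QUANTIZATION_CASES]`, so a single test function covers both loading paths.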

@Abdennacer-Badaoui
Contributor Author

Abdennacer-Badaoui commented Feb 20, 2026

@AndreasKaratzas
test_quantization (in the transformers tests) now covers both in-flight quantization and pre-quantized 4-bit checkpoints for bitsandbytes, comparing the auto and transformers backends for logprob consistency. I think this gives us good coverage for now.
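
For readers unfamiliar with this kind of check, a simplified sketch of a logprob consistency comparison between two backends follows. It is not vLLM's test helper. The function name and the input shape (one `(chosen_token, top_logprobs)` pair per generation step) are assumptions made for illustration.

```python
def logprobs_close(run_a, run_b) -> bool:
    """Compare two greedy generations step by step.

    Each run is a list of (chosen_token, top_logprobs) pairs, where
    top_logprobs maps candidate tokens to their log-probabilities.
    """
    for (tok_a, top_a), (tok_b, top_b) in zip(run_a, run_b):
        if tok_a == tok_b:
            continue
        # First divergence: accept it only if each backend's choice is still
        # among the other backend's top candidates. Later steps are not
        # comparable because the two prefixes now differ.
        return tok_a in top_b and tok_b in top_a
    return True
```

The idea is that small numerical differences between backends may flip the choice between two near-tied tokens, which should not fail the test, whereas a token the other backend does not rank among its top candidates indicates a real mismatch.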

Comment thread vllm/model_executor/layers/quantization/bitsandbytes.py Outdated
@AndreasKaratzas
Member

This appears to have been done already

@hmellor Version is there, but not pinned.

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
@hmellor
Member

hmellor commented Feb 21, 2026

This appears to have been done already

@hmellor Version is there, but not pinned.

Oh my mistake, I misunderstood what you meant

@vllm-bot vllm-bot merged commit 8dc8a99 into vllm-project:main Feb 21, 2026
109 of 114 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Feb 21, 2026
@dosubot

dosubot Bot commented Feb 21, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Feb 22, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
jmamou pushed a commit to jmamou/vllm that referenced this pull request Feb 23, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
jiangkuaixue123 pushed a commit to jiangkuaixue123/vllm that referenced this pull request Apr 28, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026
…#34688)

Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>

Labels

ci/build; documentation (Improvements or additions to documentation); ready (ONLY add when PR is ready to merge/full CI is needed); rocm (Related to AMD ROCm)

Projects

Status: Done

5 participants