System Info
Issue: Installing the latest multi-backend-refactor branch fails on an AMD GPU. After switching to the ROCm/bitsandbytes repo and using the rocm_enabled_multi_backend branch, the installation succeeds. Could you please check whether I selected the right branch? Thanks so much!
Official Repo:
https://github.com/bitsandbytes-foundation/bitsandbytes/tree/multi-backend-refactor
Test Environment:
AMD MI300X GPU
Docker image:
ROCm 6.4 is the latest AMD ROCm release.
docker pull rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1
Reproduction
How to reproduce:
Step by Step:
ROCm 6.4 is the latest AMD ROCm release.
docker pull rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri -v /:/workspace --group-add video --ipc=host --name bitsandbytes01 rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1
Inside the container:
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
Compile & install
apt-get install -y build-essential cmake
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install -e .
Output after the installation finishes:
Successfully built bitsandbytes
Installing collected packages: bitsandbytes
Attempting uninstall: bitsandbytes
Found existing installation: bitsandbytes 1.0.0
Uninstalling bitsandbytes-1.0.0:
Successfully uninstalled bitsandbytes-1.0.0
Successfully installed bitsandbytes-1.0.0
Verify the installation once done.
python -m bitsandbytes
root@93db47d5b637:/var/lib/jenkins/bitsandbytes# python -m bitsandbytes
Could not load bitsandbytes native library: /var/lib/jenkins/bitsandbytes/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi. If you use Intel CPU or XPU, please pip install intel_extension_for_pytorch
Traceback (most recent call last):
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/cextension.py", line 115, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/cextension.py", line 86, in get_native_library
dll = ct.cdll.LoadLibrary(str(binary_path))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/ctypes/__init__.py", line 460, in LoadLibrary
return self._dlltype(name)
^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /var/lib/jenkins/bitsandbytes/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='64', rocm_version_tuple=(6, 4)
PyTorch settings found: ROCM_VERSION=64
The directory listed in your path is found to be non-existent: /opt/ompi/lib
The directory listed in your path is found to be non-existent: /opt/ompi
The directory listed in your path is found to be non-existent: /opt/ucx
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with cmake -DCOMPUTE_BACKEND=hip -S ..
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/diagnostics/main.py", line 73, in main
sanity_check()
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
adam.step()
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/optim/optimizer.py", line 484, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/optim/optimizer.py", line 292, in step
self.update_step(group, p, gindex, pindex)
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/optim/optimizer.py", line 522, in update_step
F.optimizer_update_32bit(
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/functional.py", line 1266, in optimizer_update_32bit
return backends[g.device.type].optimizer_update_32bit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/lib/jenkins/bitsandbytes/bitsandbytes/backends/cuda.py", line 780, in optimizer_update_32bit
optim_func = str2optimizer32bit[optimizer_name][0]
^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
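To make it easier to compare the two builds, the unresolved symbol names can be pulled out of the output above (or out of `ldd -r` output for the .so) instead of being read by eye. The following is a minimal Python sketch, not part of bitsandbytes: `undefined_symbols` and `demangle` are hypothetical helpers, and the only assumptions are that the text contains the `undefined symbol: NAME` pattern shown in the log and that `c++filt` may or may not be installed.

```python
import re
import shutil
import subprocess

# Matches the "undefined symbol: NAME" pattern seen in the loader error above.
_UNDEFINED = re.compile(r"undefined symbol:\s*([A-Za-z_][\w$.@]*)")


def undefined_symbols(text: str) -> list[str]:
    """Return the unique symbol names reported as undefined, in order of appearance."""
    names: list[str] = []
    for match in _UNDEFINED.finditer(text):
        # Drop a sentence-ending period that may directly follow the symbol.
        name = match.group(1).rstrip(".")
        if name not in names:
            names.append(name)
    return names


def demangle(symbol: str) -> str:
    """Demangle a C++ symbol with c++filt if it is installed; otherwise return it unchanged."""
    tool = shutil.which("c++filt")
    if tool is None:
        return symbol
    result = subprocess.run([tool, symbol], capture_output=True, text=True, check=False)
    return result.stdout.strip() or symbol


if __name__ == "__main__":
    # Shortened copy of the first error line from the log above.
    log = (
        "Could not load bitsandbytes native library: libbitsandbytes_rocm64.so: "
        "undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16"
        "Li2EEvPT_S2_PfS3_ffffffiffbi. If you use Intel CPU or XPU"
    )
    for name in undefined_symbols(log):
        print(demangle(name))
```

Judging by the mangled name alone, the missing symbol looks like the host-side stub for a `kOptimizer32bit1State` kernel instantiated for `hip_bfloat16`. The later `NameError` for `str2optimizer32bit` is presumably a follow-on effect of the native library not loading, not a separate bug.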
Debugging details:
If we install from the latest ROCm/bitsandbytes repo instead, the installation succeeds.
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri -v /:/workspace --group-add video --ipc=host --name bitsandbytes01 rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1
Inside the container:
git clone -b rocm_enabled_multi_backend https://github.com/ROCm/bitsandbytes.git
cd bitsandbytes
git checkout rocm_enabled_multi_backend
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S . # Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific GPU architectures
make
pip install .
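The optional `-DBNB_ROCM_ARCH` flag mentioned in the comment above takes a semicolon-separated list of GPU architectures, which is easy to get wrong when quoting by hand. As a rough sketch (the helper name `cmake_configure_args` is hypothetical, and only the flags already shown in this report are used), the configure command could be assembled like this:

```python
import shlex


def cmake_configure_args(backend="hip", rocm_archs=None, source_dir="."):
    """Build the cmake configure command as an argument list.

    rocm_archs is an optional list such as ["gfx90a", "gfx942"]; when given,
    it is joined with ';' and passed as -DBNB_ROCM_ARCH.
    """
    args = ["cmake", f"-DCOMPUTE_BACKEND={backend}"]
    if rocm_archs:
        args.append("-DBNB_ROCM_ARCH=" + ";".join(rocm_archs))
    args += ["-S", source_dir]
    return args


if __name__ == "__main__":
    # MI300X is a gfx942 device, so target only that architecture here.
    command = cmake_configure_args(rocm_archs=["gfx942"])
    # shlex.join quotes any argument that needs it for a POSIX shell.
    print(shlex.join(command))
```

Passing the result to `subprocess.run` as a list avoids shell quoting of the `;` separator altogether.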
Verify the installation once done:
python -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='64', rocm_version_tuple=(6, 4)
PyTorch settings found: ROCM_VERSION=64
The directory listed in your path is found to be non-existent: /opt/ompi/lib
The directory listed in your path is found to be non-existent: /opt/ompi
The directory listed in your path is found to be non-existent: /opt/ucx
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
SUCCESS!
Installation was successful!
Expected behavior
Could you please check the difference between the two branches so that multi-backend-refactor can be installed successfully in the latest ROCm 6.4 environment? Thanks so much!