
Commit 252e6ff

TimDettmers and claude committed
docs: Update issue patterns from 43 recently closed issues
Add 11 new pattern sections and expand existing ones based on a review of all 43 issues closed Feb 12-14.

New patterns: TensorFlow/non-PyTorch frameworks, DeepSpeed ZeRO-3 incompatibility, CPU optimizer requests, ROCm build issues, Colab runtime restart, CMake + CUDA 13 arch mismatch, EOL glibc platforms, prepare_model_for_kbit_training memory, insufficient-info triage, quantized output quality (NaN), and 4-bit weight loading.

Updated existing patterns for third-party apps, FSDP, questions, and unrelated errors with new examples and cross-references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 88c6c71 commit 252e6ff

File tree

1 file changed: +116 −6 lines


agents/issue_patterns.md

Lines changed: 116 additions & 6 deletions
@@ -69,7 +69,7 @@ These are the single largest category of issues. Most are environment problems o
 
 ### Third-party application issues
 
-**How to identify:** User is running Automatic1111, Forge UI, ComfyUI, kohya_ss, or similar Stable Diffusion tools. The error occurs inside bitsandbytes but is caused by the app pinning old bnb versions or misconfiguring the environment. Minimal or no diagnostic info. Often no bnb version specified.
+**How to identify:** User is running Automatic1111, Forge UI, ComfyUI, kohya_ss/kohya-trainer, MimicMotion, or similar Stable Diffusion / fine-tuning tools. The error occurs inside bitsandbytes but is caused by the app pinning old bnb versions or misconfiguring the environment. Minimal or no diagnostic info. Often no bnb version specified. May also manifest as errors from other libraries (e.g., diffusers API changes like `AttributeError: module diffusers.models has no attribute unet_2d_condition`) that the user files against bnb because it was in their stack. Another variant: ComfyUI + Triton on Windows, which isn't officially supported by Triton — the user sees a bnb library loading error but the root cause is the app's dependency packaging. Device placement errors (tensors on different devices) from ComfyUI's execution pipeline also fall in this category.
 
 **Resolution:** These are dependency management issues in third-party apps. Close with a note to report to the app's issue tracker and upgrade bitsandbytes.

@@ -85,9 +85,18 @@ These are the single largest category of issues. Most are environment problems o
 **Closing template:**
 
 > Closing this issue. This error message originates from the `transformers` library, not from bitsandbytes. Upgrading `transformers` to the latest version resolves it.
 
+### TensorFlow / non-PyTorch frameworks
+
+**How to identify:** User mentions TensorFlow, JAX (without explicit bnb JAX support), or other non-PyTorch frameworks. They may be searching for CUDA runtime DLLs like `cudart64_118.dll` for TensorFlow GPU support and conflate it with bitsandbytes. Bitsandbytes only works with PyTorch (>= 2.2.2).
+
+**Resolution:** Close, noting that bitsandbytes is PyTorch-only.
+
+**Closing template:**
+
+> Closing this issue. Bitsandbytes is only compatible with PyTorch (>= 2.2.2) and does not support TensorFlow or other frameworks. The issue you're describing appears to be related to your [TensorFlow/other] setup rather than bitsandbytes.
+
 ### Unrelated errors filed against bitsandbytes
 
-**How to identify:** The traceback's root cause is in another library (sentencepiece, diffusers, ONNX, etc.) but the user filed it here because bitsandbytes appeared somewhere in their stack. Look at the actual exception — if it's about tokenizer parsing, model loading from a different library, or API changes in diffusers/transformers, it's not a bnb issue.
+**How to identify:** The traceback's root cause is in another library (sentencepiece, diffusers, ONNX, etc.) but the user filed it here because bitsandbytes appeared somewhere in their stack. Look at the actual exception — if it's about tokenizer parsing (e.g., `could not parse ModelProto from tokenizer.model` — that's sentencepiece), model loading from a different library, or API changes in diffusers/transformers, it's not a bnb issue.
 
 **Closing template:**
 
 > Closing this issue. The error originates in [library name], not in bitsandbytes. Please report it to the appropriate issue tracker.
@@ -102,14 +111,115 @@ These are the single largest category of issues. Most are environment problems o
 
 ### Questions filed as bugs
 
-**How to identify:** The issue asks about NF4 internals (offset value, data format, quantile bins), how quantization works, or how to use a feature. Often has the `Question` label. No actual error or bug report.
+**How to identify:** The issue asks about NF4 internals (offset value, data format, quantile bins), how quantization works, or how to use a feature. Often has the `Question` label. No actual error or bug report. Common specific questions:
+- How NF4 values are derived from `create_normal_map` and why they differ slightly from recomputing (floating-point rounding; the hardcoded values are canonical and avoid a scipy runtime dependency).
+- Whether NF4 is a floating-point format with sign/exponent/mantissa bits — it is not; NF4 is a lookup table of 16 quantile-based values, not an IEEE-style float format.
+- How `Linear8bitLt`'s `threshold` parameter works — users often assume it operates on **weights**, but it actually controls outlier detection on **activations** (inputs). Columns where activation magnitude exceeds the threshold are computed in fp16; the rest use int8.
+- How to inspect which columns were quantized vs. kept in fp16 after a forward pass.
+- Requests for per-layer mixed quantization (different bit widths for different layers, like llama.cpp's approach) — not currently supported.
 
 **Resolution:** If answered in comments or by the reporter themselves, close. If useful, convert to a discussion. Consider whether the question reveals a documentation gap worth addressing.
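The activation-outlier rule behind `threshold` can be sketched in a few lines of plain Python. This is an illustrative toy (the helper name and list-of-rows representation are invented for the demo; the real decomposition happens inside bitsandbytes' int8 matmul kernels on GPU tensors):

```python
def split_outlier_columns(activations, threshold=6.0):
    """Mimic LLM.int8() outlier detection: a COLUMN of the activation
    matrix is an outlier if any entry's magnitude exceeds `threshold`.
    Outlier columns are computed in fp16; the rest go through int8.
    `activations` is a list of rows. Toy sketch, not the bnb kernel."""
    num_cols = len(activations[0])
    outlier_cols = [
        j for j in range(num_cols)
        if any(abs(row[j]) > threshold for row in activations)
    ]
    int8_cols = [j for j in range(num_cols) if j not in outlier_cols]
    return outlier_cols, int8_cols

acts = [
    [0.5, -7.2, 1.1],   # column 1 contains an outlier (-7.2)
    [0.3,  2.0, -0.9],
]
fp16_cols, int8_cols = split_outlier_columns(acts, threshold=6.0)
print(fp16_cols, int8_cols)  # [1] [0, 2]
```

Note the decision depends only on the inputs flowing through the layer, never on the weights, which is the usual point of confusion.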

-### FSDP + bitsandbytes optimizers
+### FSDP + bitsandbytes (optimizers and quantized models)
 
-**How to identify:** Errors when using bnb optimizers (Adam8bit, PagedAdamW, etc.) with FSDP or FSDP2. Common errors include `AssertionError` in `_convert_all_state_info`, `AttributeError: 'int' object has no attribute 'cpu'`, or `illegal memory access`.
+**How to identify:** Errors when using bnb optimizers (Adam8bit, PagedAdamW, etc.) with FSDP or FSDP2. Common errors include `AssertionError` in `_convert_all_state_info`, `AttributeError: 'int' object has no attribute 'cpu'`, or `illegal memory access`. Also includes errors loading 8-bit quantized models with FSDP, e.g., `Must flatten tensors with uniform dtype but got torch.float16 and torch.int8` — FSDP cannot handle the mixed-dtype parameter groups that LLM.int8() quantization produces. FSDP2 optimizer state checkpointing (saving/resuming optimizer state with bf16 + an 8-bit optimizer) also fails with assertion errors, and paged optimizers (PagedAdamW) fail with FSDP when resuming from a checkpoint.
 
-**Current status:** FSDP support for bnb optimizers is a known gap. The maintainer has stated this repeatedly. Track via #1633 (open, Contributions Welcome). Historical context in #89 (closed).
+**Current status:** FSDP support for bnb optimizers is a known gap. The maintainer has stated this repeatedly. LLM.int8() with FSDP1 is not supported and unlikely to be worked on. Track via #1633 (open, Contributions Welcome). Historical context in #89 (closed). Recent duplicates: #1732 (FSDP2 checkpointing), #1709 (FSDP1 + int8 model loading), #1381 (paged optimizer + FSDP checkpoint resume), #1403 (FSDP2 + 8-bit optimizer).
 
 **Resolution:** Close as duplicate of #1633, noting that FSDP optimizer support is not yet available.
+
+### DeepSpeed ZeRO-3 + quantized models
+
+**How to identify:** Errors when using `deepspeed.zero.Init` (ZeRO-3) with bitsandbytes-quantized models. Typically occurs when trying to combine ZeRO-3 weight partitioning with pre-quantized weights or `load_in_4bit`/`load_in_8bit`.
+
+**What happened:** ZeRO-3's weight partitioning mechanism is incompatible with pre-quantized weights. The `zero.Init` context manager expects to shard standard floating-point parameters, but quantized weights have a different internal structure. This is a limitation of the transformers + DeepSpeed integration, not a bitsandbytes bug per se.
+
+**Resolution:** Close, noting that ZeRO-3 `zero.Init` does not support quantized weights. Users should use ZeRO-2 or load the model without ZeRO-3 `zero.Init`.
+
+**Closing template:**
+
+> Closing this issue. DeepSpeed ZeRO-3's `zero.Init` does not support bitsandbytes-quantized weights. The weight partitioning mechanism expects standard floating-point parameters. Consider using ZeRO stage 1 or 2 instead, or loading the model outside of `zero.Init`.
+
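The suggested workaround amounts to dropping down to a ZeRO stage that never partitions parameters. A minimal sketch of the relevant DeepSpeed config fragment (the keys are standard DeepSpeed config keys; batch size and offload settings here are arbitrary placeholders):

```python
# ZeRO stage 2 shards optimizer state and gradients but NOT parameters,
# so quantized weights are never partitioned by zero.Init.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder value
    "zero_optimization": {
        "stage": 2,   # stage 3 (+ zero.Init) is the combination that breaks
        "offload_optimizer": {"device": "cpu"},  # optional ZeRO-Offload
    },
    "bf16": {"enabled": True},
}
print(ds_config["zero_optimization"]["stage"])  # 2
```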
+### CPU optimizer support requests
+
+**How to identify:** Feature request asking for 8-bit or other low-bit optimizers to run on CPU (no CUDA). Common use case: DeepSpeed ZeRO-Offload, where optimizer states are offloaded to CPU. Users want reduced memory for CPU-side optimizer states (e.g., 8-bit Adam on CPU for full fine-tuning of large models).
+
+**Current status:** CPU optimizer support is tracked in #1226 (open). Recent duplicate: #1402.
+
+**Resolution:** Close as duplicate of #1226.
+
+### ROCm / AMD GPU build issues
+
+**How to identify:** Build failure when compiling bitsandbytes from source with the ROCm/HIP backend. Common errors include "Failed to find ROCm root directory" or hipcc-related failures. Often caused by incomplete or broken ROCm installations rather than bnb bugs.
+
+**Resolution:** Verify the ROCm installation is complete and `ROCM_HOME`/`HIP_PATH` are set correctly. Upgrading ROCm often resolves the issue. If the user has a valid ROCm setup and still fails, it may be a real build bug.
+
+**Closing template:**
+
+> Closing this issue. The build failure appears to be caused by an incomplete or misconfigured ROCm installation. Please ensure ROCm is installed correctly, `ROCM_HOME` and `HIP_PATH` are set, and `hipcc` is functional. Upgrading to a recent ROCm version (6.3+) often resolves these issues.
+
+### Colab / Jupyter runtime not restarted after upgrade
+
+**How to identify:** `ImportError: cannot import name 'sync_gpu' from 'bitsandbytes.utils'` or similar errors where a function exists in the installed version but not in the loaded module. The user upgraded bitsandbytes via `pip install` in a Colab or Jupyter notebook but didn't restart the runtime/kernel. The old `.pyc` files or already-imported modules remain in memory, causing version mismatches between submodules (e.g., `optimizer.py` from the new version references `sync_gpu` but `utils.py` from the old version is still loaded).
+
+**Resolution:** Instruct the user to restart their Colab runtime / Jupyter kernel after upgrading bitsandbytes. Also check for outdated dependency versions (e.g., old PEFT).
+
+**Closing template:**
+
+> Closing this issue. The `ImportError` indicates a version mismatch caused by upgrading bitsandbytes without restarting your Colab runtime / Jupyter kernel. After running `pip install -U bitsandbytes`, you must restart the runtime so that all modules are reloaded from the new version. Also consider upgrading related packages (peft, transformers, accelerate) to their latest versions.
+
+### CMake + CUDA version architecture mismatch (source builds)
+
+**How to identify:** Build failure when compiling bitsandbytes from source with CUDA 13+ and CMake < 3.31.9. CMake tries to compile for Maxwell, Pascal, or Volta architectures that CUDA 13 dropped. Error messages reference unsupported `sm_` values or nvcc compilation failures for old compute capabilities.
+
+**What happened:** CMake versions before 3.31.9 don't know which GPU architectures were removed in CUDA 13. CMake's `CMAKE_CUDA_ARCHITECTURES` auto-detection includes architectures that the installed CUDA toolkit no longer supports, causing compilation failures. This is a CMake bug/limitation, not a bitsandbytes bug.
+
+**Resolution:** Upgrade CMake to 3.31.9+, or manually specify supported architectures with `-DCOMPUTE_CAPABILITY=`.
+
+**Closing template:**
+
+> Closing this issue. CMake versions before 3.31.9 don't know which architectures CUDA 13 dropped, so they attempt to compile for unsupported targets (Maxwell, Pascal, Volta). The fix is to either upgrade CMake to 3.31.9+ or manually specify your target architectures with `-DCOMPUTE_CAPABILITY=75;80;86` (or whichever you need). This is a CMake limitation, not a bitsandbytes bug.
+
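Manually pinning architectures boils down to filtering the auto-detected list to capabilities the toolkit still accepts. A sketch, assuming sm_75 (Turing) is the oldest target CUDA 13 keeps, consistent with the Maxwell/Pascal/Volta drop and the `75;80;86` example above (the helper name and detected list are invented for illustration):

```python
CUDA13_MIN_SM = 75  # assumed cutoff: pre-Turing targets were dropped

def supported_archs(detected):
    """Keep only compute capabilities a CUDA 13 toolkit can still target."""
    return [sm for sm in detected if sm >= CUDA13_MIN_SM]

# What an old CMake's auto-detection might emit on a mixed cluster:
detected = [52, 61, 70, 75, 80, 86, 90]
keep = supported_archs(detected)
print(";".join(str(sm) for sm in keep))  # 75;80;86;90 -> -DCOMPUTE_CAPABILITY value
```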
+### EOL platforms / old glibc preventing upgrades
+
+**How to identify:** User is on CentOS 7, RHEL 7, or another EOL Linux distribution with glibc < 2.24. They cannot install bitsandbytes > 0.42.x from PyPI because the published wheels require glibc >= 2.24 (`manylinux_2_24`). They're stuck on old versions and hitting all the legacy `cuda_setup/main.py` bugs.
+
+**What happened:** Modern bitsandbytes wheels are built with `manylinux_2_24`, which requires glibc >= 2.24. EOL platforms like CentOS 7 (glibc 2.17) can't use them. The user can't upgrade past the broken 0.42.x versions without upgrading their OS or building from source.
+
+**Resolution:** Close, noting that EOL platforms can't be officially supported. Suggest building from source or upgrading the OS.
+
+**Closing template:**
+
+> Closing this issue. The bitsandbytes wheels on PyPI require glibc >= 2.24, which means EOL platforms like CentOS 7 cannot install modern versions. We recommend upgrading your OS or building bitsandbytes from source. See the [installation docs](https://huggingface.co/docs/bitsandbytes/main/en/installation) for source build instructions.
+
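Whether a box can use the `manylinux_2_24` wheels can be checked from Python; `platform.libc_ver()` reports the glibc version on Linux. A quick sketch (the `glibc_at_least` helper is a throwaway name; the (2, 24) floor comes from the wheel tag above):

```python
import platform

def glibc_at_least(version: str, minimum=(2, 24)) -> bool:
    """True when a dotted glibc version string meets the wheel's floor."""
    return tuple(int(p) for p in version.split(".")[:2]) >= minimum

# CentOS 7 ships glibc 2.17 -- too old for manylinux_2_24 wheels.
print(glibc_at_least("2.17"))  # False
print(glibc_at_least("2.35"))  # True

# On the reporter's machine (Linux only; returns ("", "") elsewhere):
libc, ver = platform.libc_ver()
if libc == "glibc" and ver:
    print("modern wheels ok:", glibc_at_least(ver))
```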
+### `prepare_model_for_kbit_training` memory concerns
+
+**How to identify:** User reports that NF4/4-bit quantized model + LoRA uses more memory than expected, sometimes even more than bf16. The traceback or description references `prepare_model_for_kbit_training` from PEFT. Users expect quantization to always reduce memory but find backpropagation memory is higher than anticipated.
+
+**What happened:** `prepare_model_for_kbit_training` intentionally casts adapter (LoRA) weights to float32 for training stability, which increases memory vs. keeping them in bf16. Additionally, quantized models still need to dequantize during the forward pass, and gradient computation through the dequantization step has its own memory overhead. This is by-design behavior in PEFT, not a bitsandbytes bug.
+
+**Resolution:** Close, noting this is expected behavior. Users can skip `prepare_model_for_kbit_training` and call `model.gradient_checkpointing_enable()` directly if they want to trade off training stability for lower memory.
+
+**Closing template:**
+
+> Closing this issue. The higher-than-expected memory usage is by design — `prepare_model_for_kbit_training` (from PEFT) casts adapter weights to float32 for training stability. You can skip it and call `model.gradient_checkpointing_enable()` directly if you prefer lower memory at the cost of potential training instability. This is a PEFT behavior, not a bitsandbytes issue.
+
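The fp32 cast alone doubles the adapter's weight memory (4 bytes per element vs. 2). A back-of-the-envelope sketch (the parameter count is an arbitrary example, not tied to any specific model):

```python
def adapter_mib(num_params: int, bytes_per_param: int) -> float:
    """Adapter weight memory in MiB for a given element width."""
    return num_params * bytes_per_param / 2**20

lora_params = 40_000_000  # example adapter size

print(f"bf16: {adapter_mib(lora_params, 2):.0f} MiB")  # ~76 MiB
print(f"fp32: {adapter_mib(lora_params, 4):.0f} MiB")  # ~153 MiB after the cast
```

This covers only the adapter weights; optimizer states and the dequantization overhead during backprop add further memory on top.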
+### Insufficient information / no reproduction
+
+**How to identify:** Issue reports an error but provides no bitsandbytes version, no `python -m bitsandbytes` output, no minimal reproduction code, or no response to maintainer follow-up questions. May also include screenshot-only bug reports where the image is inaccessible, bare model support requests with no detail (e.g., just a model name with "supported?"), or vague performance complaints without measurements.
+
+**Resolution:** Ask for specifics. If no response after a reasonable period, close.
+
+**Closing template:**
+
+> Closing this issue due to insufficient information to reproduce or investigate. If you're still experiencing this problem, please open a new issue with: (1) the output of `python -m bitsandbytes`, (2) your full environment details (OS, Python, PyTorch, GPU), and (3) a minimal code snippet that reproduces the error.
+
+### Quantized model output quality (NaN, large numeric differences)
+
+**How to identify:** User reports NaN values in model logits/outputs after 8-bit or 4-bit quantization, or reports that quantized model outputs are very different from the unquantized model. Often on old bitsandbytes versions (0.42.x or earlier). May also be caused by using float16 instead of bfloat16 on Ampere+ GPUs.
+
+**Resolution:** Ask the user to upgrade bitsandbytes and try with `torch_dtype=torch.bfloat16`. If on the latest version with bfloat16 and the issue persists with a minimal repro, it may be a real bug. Otherwise close.
+
+**Closing template:**
+
+> Closing this issue. NaN or large numeric differences in quantized outputs are often caused by using an old bitsandbytes version or float16 dtype. Please upgrade to the latest bitsandbytes and use `torch_dtype=torch.bfloat16`. If the issue persists, please open a new issue with a minimal reproduction.
+
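The float16-vs-bfloat16 point comes down to dynamic range: fp16 overflows to inf (and then NaN through subsequent ops) above 65504, while bf16 shares fp32's exponent range. The maxima follow from the formats' bit layouts; nothing here is bitsandbytes-specific, just IEEE-style arithmetic:

```python
def max_finite(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value of a binary float format:
    (2 - 2**-mantissa_bits) * 2**(exponent_bias)."""
    return (2 - 2 ** -mant_bits) * 2.0 ** (2 ** (exp_bits - 1) - 1)

fp16_max = max_finite(5, 10)  # 65504.0 -- easily exceeded by activations
bf16_max = max_finite(8, 7)   # ~3.39e38 -- same exponent range as fp32
print(fp16_max, bf16_max)
```

This is why switching the compute dtype to bfloat16 so often makes the NaNs disappear on Ampere+ hardware.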
+### 4-bit model loading drops certain weights
+
+**How to identify:** Certain model architectures lose specific weights when loaded with `load_in_4bit=True` via transformers. The saved model's `state_dict` is missing expected keys (e.g., `decoder.lm_head.weight`). Works correctly without quantization. Typically affects models with tied/shared weights or non-standard architectures (e.g., VisionEncoderDecoder, Donut).
+
+**What happened:** The transformers `load_in_4bit` integration may not correctly handle tied weights or non-standard model architectures. Weights that are shared or aliased in the original model may get dropped during the quantization loading process.
+
+**Resolution:** This is likely a transformers integration issue. Check if the model architecture has tied weights. Suggest filing against transformers if it's a loading issue in their quantization code path.
