Bad image output for Flux.2-dev using quantization and separate prompt encoding sequence

### Describe the bug

Generating an image with `black-forest-labs/FLUX.2-dev` in diffusers, using int8 quantization and separate stages for prompt encoding and transformer inference, produces a bad output image that looks like a random checkerboard. The same workflow works fine with all other large models I have tried (QwenImage, z-image, Flux.1-dev, stable-diffusion-3-5-large).

System:
 OS: Fedora 
 Kernel: x86_64 Linux 7.0.8-100.fc43.x86_64
 CPU: Intel Xeon Silver 4114 @ 40x 3GHz [46.0°C]
 GPU: AMD Radeon Pro W7900 (radeonsi, navi31, LLVM 21.1.8, DRM 3.64, 7.0.8-100.fc43.x86_64)
 RAM: 321061MiB

Tested with the `rocm/pytorch` Docker image, using these tags:
- latest (as of May 20, 2026)
- rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.7.1

Model resulting in bad output:
- black-forest-labs/FLUX.2-dev

The included reproduction script runs prompt encoding and inference with int8 quantization, with the two stages explicitly separated by unloading everything in between.

Output image:

<img width="1024" height="1024" alt="Image" src="https://github.com/user-attachments/assets/d43b26af-16e4-4959-b3a2-0533d966788a" />

### Reproduction

```python
import gc

import diffusers
import torch
import transformers

# tested with these docker images (rocm/pytorch):
#   rocm/pytorch:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.7.1
#   rocm/pytorch:latest (as of 2026-05-20)
# Where latest was at pytorch version 2.8.0

# this seems to make no difference on the output or performance
# torch.backends.cuda.enable_mem_efficient_sdp(False)

model = "black-forest-labs/FLUX.2-dev"
outfile = "cool-cat.png"
prompt = "A cat with a banjo"

print("==== Phase 1: text encoder ====")
print("Loading text encoder (quantization config: llm_int8)...")
te_qconfig = transformers.BitsAndBytesConfig(
    load_in_8bit=True,
)
text_encoder = transformers.Mistral3ForConditionalGeneration.from_pretrained(
    model,
    subfolder="text_encoder",
    quantization_config=te_qconfig,
    tie_word_embeddings=False,
    torch_dtype=torch.bfloat16,
)

print("Building prompt-encoder pipeline (with quantization)...")
encoder_pipeline = diffusers.Flux2Pipeline.from_pretrained(
    model,
    text_encoder=text_encoder,
    transformer=None,
    vae=None,
    torch_dtype=torch.bfloat16,
)

encoder_pipeline.to("cuda")

print("Encoding prompt...")
with torch.no_grad():
    prompt_embeds, text_ids = encoder_pipeline.encode_prompt(prompt=prompt)

print("Unloading prompt-encoder pipeline...")
del encoder_pipeline
del text_encoder
gc.collect()
torch.cuda.empty_cache()

print("==== Phase 2: inference ====")
print("Loading transformer (quantization config: llm_int8)...")
tr_qconfig = diffusers.BitsAndBytesConfig(
    load_in_8bit=True,
)
transformer = diffusers.Flux2Transformer2DModel.from_pretrained(
    model,
    subfolder="transformer",
    quantization_config=tr_qconfig,
    torch_dtype=torch.bfloat16,
)

print("Building inference pipeline (with quantization)...")
pipeline = diffusers.Flux2Pipeline.from_pretrained(
    model,
    text_encoder=None,
    tokenizer=None,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipeline = pipeline.to("cuda")

print("Running inference...")
result = pipeline(prompt_embeds=prompt_embeds)

print(f"Saving image to {outfile}...")
result.images[0].save(outfile)

print("Done.")
```
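One possible way to narrow down which phase is at fault would be to compare the `prompt_embeds` from the int8 text encoder against embeddings produced by an unquantized (bfloat16) encoder for the same prompt. The sketch below is only a comparison helper; `compare_embeddings` is a hypothetical name that is not part of diffusers or transformers, and I have not run this comparison on Flux.2 yet.

```python
import torch
import torch.nn.functional as F


def compare_embeddings(reference: torch.Tensor, candidate: torch.Tensor) -> dict:
    """Compare two embedding tensors of identical shape.

    Hypothetical helper: returns the cosine similarity of the flattened
    tensors and the largest absolute element-wise difference.
    """
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {tuple(reference.shape)} vs {tuple(candidate.shape)}"
        )
    # Work in float32 on the CPU so the comparison itself adds no fp16/bf16 error.
    ref = reference.detach().float().cpu().flatten()
    cand = candidate.detach().float().cpu().flatten()
    cosine = F.cosine_similarity(ref, cand, dim=0).item()
    max_abs_diff = (ref - cand).abs().max().item()
    return {"cosine": cosine, "max_abs_diff": max_abs_diff}
```

If the int8 embeddings were close to the reference (cosine near 1.0), that would point at the transformer phase rather than the text encoder; a low similarity would point the other way.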

### Logs

```shell
# python test-flux2-int8.py
==== Phase 1: text encoder ====
Loading text encoder (quantization config: llm_int8)...
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 585/585 [04:38<00:00,  2.10it/s]
Building prompt-encoder pipeline (with quantization)...
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.44it/s]
Encoding prompt...
[transformers] Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
/opt/venv/lib/python3.12/site-packages/bitsandbytes/autograd/_functions.py:123: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Unloading prompt-encoder pipeline...
==== Phase 2: inference ====
Loading transformer (quantization config: llm_int8)...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [06:17<00:00, 53.96s/it]
Building inference pipeline (with quantization)...
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.51it/s]
Running inference...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [05:16<00:00,  6.33s/it]
Saving image to cool-cat.png...
Done.
```
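The `MatMul8bitLt` warning in the log says that inputs are cast from bfloat16 to float16 during quantization. float16 has a much smaller range than bfloat16 (its largest finite value is 65504), so one thing that might be worth checking is whether any tensor contains NaN/Inf values, or values that would overflow in float16. This is an assumption on my part, not a confirmed cause. The helpers below are hypothetical names for a minimal diagnostic sketch.

```python
import torch


def summarize_tensor(name: str, t: torch.Tensor) -> dict:
    """Return basic health statistics for a tensor (hypothetical helper)."""
    t32 = t.detach().float().cpu()
    finite = torch.isfinite(t32)
    return {
        "name": name,
        "dtype": str(t.dtype),
        "shape": tuple(t.shape),
        "nan_count": int(torch.isnan(t32).sum()),
        "inf_count": int(torch.isinf(t32).sum()),
        # Largest magnitude among finite elements; NaN if none are finite.
        "abs_max_finite": float(t32[finite].abs().max()) if finite.any() else float("nan"),
    }


def count_fp16_overflow(t: torch.Tensor) -> int:
    """Count finite elements that become non-finite when cast to float16."""
    t32 = t.detach().float().cpu()
    as_fp16 = t32.to(torch.float16)
    return int((torch.isfinite(t32) & ~torch.isfinite(as_fp16)).sum())
```

In the reproduction script this could be called right after Phase 1, for example `print(summarize_tensor("prompt_embeds", prompt_embeds))`, to see whether the embeddings handed to the transformer are already broken.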

### System Info

- 🤗 Diffusers version: 0.38.0
- Platform: Linux-7.0.8-100.fc43.x86_64-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.12.3
- PyTorch version (GPU?): 2.8.0+rocm7.0.0.git64359f59 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 1.15.0
- Transformers version: 5.8.1
- Accelerate version: 1.13.0
- PEFT version: 0.19.1
- Bitsandbytes version: 0.49.2
- Safetensors version: 0.8.0-rc.0
- xFormers version: not installed
- Accelerator: NA

System:
 OS: Fedora 
 Kernel: x86_64 Linux 7.0.8-100.fc43.x86_64
 Shell: zsh 5.9
 Resolution: 10240x2880
 DE: GNOME 49.7
 WM: Mutter
 WM Theme: Adwaita
 GTK Theme: Adwaita [GTK2/3]
 Icon Theme: Adwaita
 Font: Adwaita Sans 11
 CPU: Intel Xeon Silver 4114 @ 40x 3GHz [46.0°C]
 GPU: AMD Radeon Pro W7900 (radeonsi, navi31, LLVM 21.1.8, DRM 3.64, 7.0.8-100.fc43.x86_64)
 RAM: 321061MiB

### Who can help?

This is a general-use issue about regular inference with a base model.

@sayakpaul @DN6 

Bad image output for Flux.2-dev using quantization and separate prompt encoding sequence #13772
