Bad image output for Flux.2-dev using quantization and separate prompt encoding sequence #13772

@john-tecplot

Describe the bug

Image generation with black-forest-labs/FLUX.2-dev in diffusers, using int8 quantization and separate stages for prompt encoding and transformer inference, produces bad output: a random checkerboard image. The same workflow works fine with all other large models I have tried (QwenImage, z-image, Flux.1-dev, stable-diffusion-3-5-large).

System:
OS: Fedora
Kernel: x86_64 Linux 7.0.8-100.fc43.x86_64
CPU: Intel Xeon Silver 4114 @ 40x 3GHz [46.0°C]
GPU: AMD Radeon Pro W7900 (radeonsi, navi31, LLVM 21.1.8, DRM 3.64, 7.0.8-100.fc43.x86_64)
RAM: 321061MiB

Using docker images: rocm/pytorch
tags tested:

  • latest (as of May 20, 2026)
  • rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.7.1

Model resulting in bad output:

  • black-forest-labs/FLUX.2-dev

The included reproduction script runs prompt encoding and inference with int8 quantization as two explicitly separate stages, unloading everything in between.

Output image:

[Attached image not preserved here: the random checkerboard output described above]

Reproduction

import gc

import diffusers
import torch
import transformers

# tested with these docker images (rocm/pytorch):
#   rocm/pytorch:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.7.1
#   rocm/pytorch:latest (as of 2026-05-20)
# Where latest was at pytorch version 2.8.0

# this seems to make no difference on the output or performance
# torch.backends.cuda.enable_mem_efficient_sdp(False)

model = "black-forest-labs/FLUX.2-dev"
outfile = "cool-cat.png"
prompt = "A cat with a banjo"

print("==== Phase 1: text encoder ====")
print("Loading text encoder (quantization config: llm_int8)...")
te_qconfig = transformers.BitsAndBytesConfig(
    load_in_8bit=True,
)
text_encoder = transformers.Mistral3ForConditionalGeneration.from_pretrained(
    model,
    subfolder="text_encoder",
    quantization_config=te_qconfig,
    tie_word_embeddings=False,
    torch_dtype=torch.bfloat16,
)

print("Building prompt-encoder pipeline (with quantization)...")
encoder_pipeline = diffusers.Flux2Pipeline.from_pretrained(
    model,
    text_encoder=text_encoder,
    transformer=None,
    vae=None,
    torch_dtype=torch.bfloat16,
)

encoder_pipeline.to("cuda")

print("Encoding prompt...")
with torch.no_grad():
    prompt_embeds, text_ids = encoder_pipeline.encode_prompt(prompt=prompt)

print("Unloading prompt-encoder pipeline...")
del encoder_pipeline
del text_encoder
gc.collect()
torch.cuda.empty_cache()

print("==== Phase 2: inference ====")
print("Loading transformer (quantization config: llm_int8)...")
tr_qconfig = diffusers.BitsAndBytesConfig(
    load_in_8bit=True,
)
transformer = diffusers.Flux2Transformer2DModel.from_pretrained(
    model,
    subfolder="transformer",
    quantization_config=tr_qconfig,
    torch_dtype=torch.bfloat16,
)

print("Building inference pipeline (with quantization)...")
pipeline = diffusers.Flux2Pipeline.from_pretrained(
    model,
    text_encoder=None,
    tokenizer=None,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

pipeline = pipeline.to("cuda")

print("Running inference...")
result = pipeline(prompt_embeds=prompt_embeds)

print(f"Saving image to {outfile}...")
result.images[0].save(outfile)

print("Done.")
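Since the prompt embeddings are the only data carried from phase 1 into phase 2, one cheap check is whether they are already damaged before the transformer runs. The following is a minimal sketch, not part of the original reproduction; `summarize_tensor` is a hypothetical helper name, and it only uses standard torch calls.

```python
import torch


def summarize_tensor(name: str, t: torch.Tensor) -> dict:
    """Hypothetical helper: basic health statistics for a tensor."""
    # Work on a float32 CPU copy so the check itself cannot overflow.
    t32 = t.detach().float().cpu()
    finite = torch.isfinite(t32)
    return {
        "name": name,
        "shape": tuple(t.shape),
        "dtype": str(t.dtype),
        "num_nan": int(torch.isnan(t32).sum()),
        "num_inf": int(torch.isinf(t32).sum()),
        # Largest finite magnitude; NaN if no element is finite.
        "abs_max_finite": float(t32[finite].abs().max()) if finite.any() else float("nan"),
    }
```

In the script above this could be called right after `encode_prompt`, for example `print(summarize_tensor("prompt_embeds", prompt_embeds))`. Any NaN or Inf there would point at the text-encoder stage rather than the transformer; clean embeddings would suggest looking at phase 2 instead.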

Logs

# python test-flux2-int8.py
==== Phase 1: text encoder ====
Loading text encoder (quantization config: llm_int8)...
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 585/585 [04:38<00:00,  2.10it/s]
Building prompt-encoder pipeline (with quantization)...
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.44it/s]
Encoding prompt...
[transformers] Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
/opt/venv/lib/python3.12/site-packages/bitsandbytes/autograd/_functions.py:123: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Unloading prompt-encoder pipeline...
==== Phase 2: inference ====
Loading transformer (quantization config: llm_int8)...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 7/7 [06:17<00:00, 53.96s/it]
Building inference pipeline (with quantization)...
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.51it/s]
Running inference...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [05:16<00:00,  6.33s/it]
Saving image to cool-cat.png...
Done.
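The `MatMul8bitLt` warning in the log says bfloat16 inputs are cast to float16. I have not confirmed this is related to the bad output, but float16 has a much smaller range than bfloat16 (largest finite value 65504), so large activations can overflow to infinity in that cast. A small standalone sketch of the effect, using made-up values rather than anything from the model:

```python
import torch

# Illustrative values only; these are not taken from FLUX.2-dev activations.
x = torch.tensor([1.0, 60000.0, 70000.0], dtype=torch.bfloat16)

# The same kind of cast the warning describes: bfloat16 -> float16.
y = x.to(torch.float16)

# True where a finite bfloat16 value became infinite in float16.
overflowed = torch.isinf(y) & torch.isfinite(x)
print(overflowed.tolist())
```

Whether the activations in this model actually reach that range is an open question; this only shows that the cast is lossy for large magnitudes.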

System Info

  • 🤗 Diffusers version: 0.38.0
  • Platform: Linux-7.0.8-100.fc43.x86_64-x86_64-with-glibc2.39
  • Running on Google Colab?: No
  • Python version: 3.12.3
  • PyTorch version (GPU?): 2.8.0+rocm7.0.0.git64359f59 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 1.15.0
  • Transformers version: 5.8.1
  • Accelerate version: 1.13.0
  • PEFT version: 0.19.1
  • Bitsandbytes version: 0.49.2
  • Safetensors version: 0.8.0-rc.0
  • xFormers version: not installed
  • Accelerator: NA

System:
OS: Fedora
Kernel: x86_64 Linux 7.0.8-100.fc43.x86_64
Shell: zsh 5.9
Resolution: 10240x2880
DE: GNOME 49.7
WM: Mutter
WM Theme: Adwaita
GTK Theme: Adwaita [GTK2/3]
Icon Theme: Adwaita
Font: Adwaita Sans 11
CPU: Intel Xeon Silver 4114 @ 40x 3GHz [46.0°C]
GPU: AMD Radeon Pro W7900 (radeonsi, navi31, LLVM 21.1.8, DRM 3.64, 7.0.8-100.fc43.x86_64)
RAM: 321061MiB

Who can help?

This is a general-use issue about regular inference with a base model.

@sayakpaul @DN6
