Reduce FLUX int8 test peak memory with sequential offload#13776

Open: jiqing-feng wants to merge 4 commits into huggingface:main from jiqing-feng:test_xpu

@jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented May 21, 2026

Summary

Update the slow FLUX bitsandbytes int8 tests to use sequential CPU offload instead of model CPU offload.

enable_model_cpu_offload() moves one entire sub-model onto the GPU at a time. For black-forest-labs/FLUX.1-dev, this can run out of memory on cards with 24 GB or less, even when the T5 encoder and transformer are loaded from the pre-quantized int8 test checkpoint. Sequential CPU offload keeps peak memory lower by materializing one layer at a time, which lets the int8 FLUX tests run in more constrained environments.
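
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. The layer sizes are hypothetical placeholders, not measured FLUX.1-dev numbers, and the model ignores activations and allocator overhead. It only illustrates why placing one layer at a time bounds peak weight memory by the largest layer rather than by the largest sub-model.

```python
# Toy estimate of peak weight memory under the two offload strategies.
# All sizes are hypothetical (in GB) and chosen only for illustration.

def peak_model_offload(submodels):
    """Model offload: one whole sub-model is resident on the device at a time."""
    return max(sum(layers) for layers in submodels.values())


def peak_sequential_offload(submodels):
    """Sequential offload: only one layer is resident on the device at a time."""
    return max(max(layers) for layers in submodels.values())


# Hypothetical per-layer weight sizes for each sub-model of a pipeline.
submodels = {
    "text_encoder_2": [0.25] * 24,
    "transformer": [0.5] * 40,
    "vae": [0.1] * 4,
}

model_peak = peak_model_offload(submodels)
sequential_peak = peak_sequential_offload(submodels)

print(f"model offload peak:      {model_peak} GB")
print(f"sequential offload peak: {sequential_peak} GB")
```

The trade-off is speed: sequential offload moves weights between CPU and device far more often, which is usually acceptable for a slow test but not for latency-sensitive inference.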

The LoRA-loading assertion tolerance is also relaxed from 1e-3 to 2e-3 to account for small backend-specific numerical differences observed in the slow int8 path.
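
For readers unfamiliar with the metric, the sketch below shows what a cosine-distance tolerance check looks like. It is a plain-Python illustration, not the helper used by the test suite; the function names `cosine_distance` and `within_tolerance` and the sample vectors are made up for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def within_tolerance(output, expected, tol=2e-3):
    """True when the output slice is close enough to the expected slice."""
    return cosine_distance(output, expected) < tol


# Hypothetical slices whose distance falls between the old and new tolerance.
expected = [1.0, 0.0]
output = [1.0, 0.055]

print(within_tolerance(output, expected, tol=1e-3))  # old tolerance
print(within_tolerance(output, expected, tol=2e-3))  # relaxed tolerance
```

A distance of 0 means the two vectors point in the same direction, so a tolerance of 2e-3 still requires the outputs to be nearly identical in direction.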

Changes

  • Switch SlowBnb8bitFluxTests setup from enable_model_cpu_offload() to enable_sequential_cpu_offload().
  • Document why sequential offload is needed for the FLUX int8 slow tests.
  • Relax the test_lora_loading cosine-distance tolerance to 2e-3.

Validation

Run the affected slow tests:

RUN_SLOW=1 python -m pytest \
  tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitFluxTests::test_quality \
  tests/quantization/bnb/test_mixed_int8.py::SlowBnb8bitFluxTests::test_lora_loading \
  -x -s

Observed result:

2 passed

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@github-actions (bot) added the tests and size/S (PR with diff < 50 LOC) labels May 21, 2026
@jiqing-feng changed the title from "Fix OOM on int8 tests" to "Reduce FLUX int8 test peak memory with sequential offload" May 21, 2026
@jiqing-feng
Contributor Author

Requires the change in huggingface/accelerate#4044 to be merged.
