
[new-model] Port FLUX.1-dev T2I to FastVideo#1228

Open
Ishxn20 wants to merge 2 commits into hao-ai-lab:main from Ishxn20:flux-dev

Conversation

@Ishxn20
Contributor

@Ishxn20 Ishxn20 commented Apr 11, 2026

Summary

Adds FLUX.1-dev text-to-image support to FastVideo: Diffusers-aligned packed latents, FlowMatch mu, CLIP pooled + T5 sequence conditioning, embedded guidance, and optional true CFG via true_cfg_scale, plus registry wiring for black-forest-labs/FLUX.1-dev. Also includes parity, loader, pipeline-smoke, and SSIM test hooks, a minimal example script, and a contributor-oriented test layout (pipeline smoke and checkpoint loader tests under tests/local_tests/).

What changed

  • Model: FluxTransformer2DModel and FluxDiTConfig with Diffusers-compatible forward (txt_ids / img_ids, timestep scaling, guidance when guidance_embeds).
  • Pipeline: FluxPipeline and FLUX stages (pack/unpack, latent image ids, scheduler mu, denoise loop, VAE denormalize, 5D image output).
  • Config / sampling: FluxPipelineConfig, FluxSamplingParam (defaults aligned with FLUX.1-dev).
  • Registry: T2I registration and path detectors for FLUX checkpoints.
  • Example: examples/inference/basic/flux_dev_t2i.py.
  • Tests: DiT parity (fastvideo/tests/transformers/test_flux.py), SSIM (fastvideo/tests/ssim/test_flux_t2i_similarity.py), local pipeline smoke (tests/local_tests/pipelines/test_flux_dev_pipeline_smoke.py), local component loaders (tests/local_tests/flux/test_flux_dev_component_loaders.py).
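The pack/unpack stage above refers to FLUX's 2×2 packing of VAE latents into transformer tokens. A minimal NumPy sketch of that round trip (illustrative helper names, not FastVideo's actual stage API):

```python
import numpy as np

def pack_latents(latents: np.ndarray) -> np.ndarray:
    """Pack (B, C, H, W) VAE latents into (B, H/2 * W/2, C * 4) tokens.

    Mirrors the 2x2 pixel-shuffle-style packing FLUX applies before the
    transformer; each token gathers a 2x2 spatial patch across channels.
    """
    b, c, h, w = latents.shape
    assert h % 2 == 0 and w % 2 == 0, "latent height/width must be even"
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (B, H/2, W/2, C, 2, 2)
    return x.reshape(b, (h // 2) * (w // 2), c * 4)

def unpack_latents(packed: np.ndarray, h: int, w: int) -> np.ndarray:
    """Invert pack_latents back to (B, C, H, W)."""
    b, _, cf = packed.shape
    c = cf // 4
    x = packed.reshape(b, h // 2, w // 2, c, 2, 2)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # (B, C, H/2, 2, W/2, 2)
    return x.reshape(b, c, h, w)
```

The round trip is lossless, which is what lets the pipeline pack before the denoise loop and unpack before VAE decode.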

How to test

From repo root (requires CUDA and official_weights/FLUX.1-dev or FLUX_DEV_ROOT / FLUX_TRANSFORMER_PATH where noted):

FASTVIDEO_ATTENTION_BACKEND=TORCH_SDPA pytest \
  fastvideo/tests/transformers/test_flux.py \
  tests/local_tests/flux/test_flux_dev_component_loaders.py \
  tests/local_tests/pipelines/test_flux_dev_pipeline_smoke.py \
  -v

Copilot AI review requested due to automatic review settings April 11, 2026 05:32
@mergify mergify Bot added scope: inference Inference pipeline, serving, CLI scope: infra CI, tests, Docker, build labels Apr 11, 2026
@mergify
Contributor

mergify Bot commented Apr 11, 2026

⚠️ PR title format required

Your PR title must start with a type tag in brackets. Examples:

  • [feat] Add new model support
  • [bugfix] Fix VAE tiling corruption
  • [refactor] Restructure training pipeline
  • [perf] Optimize attention kernel
  • [ci] Update test infrastructure
  • [docs] Add inference guide
  • [misc] Clean up configs
  • [new-model] Port Flux2 to FastVideo

Valid tags: feat, feature, bugfix, fix, refactor, perf, ci, doc, docs, misc, chore, kernel, new-model

Please update your PR title and the merge protection check will pass automatically.

@mergify mergify Bot added the scope: model Model architecture (DiTs, encoders, VAEs) label Apr 11, 2026
@mergify
Contributor

mergify Bot commented Apr 11, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for:

  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed

This rule is failing:

  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the FLUX.1-dev text-to-image model. It includes the implementation of the FLUX transformer architecture with joint and single-stream attention, specialized pipeline stages for latent packing and flow matching, and configuration updates to handle embedded guidance. Feedback focuses on simplifying redundant model layers, correcting timestep scaling in the forward context, and improving error handling by replacing assertions with explicit exceptions in pipeline stages.

Comment on lines +153 to +158
        self.to_out = nn.ModuleList(
            [
                ReplicatedLinear(self.inner_dim, dim, bias=True),
                nn.Dropout(0.0),
            ]
        )
Contributor


medium

The nn.Dropout(0.0) is a no-op and adds unnecessary overhead to the model's forward pass. Additionally, using nn.ModuleList for a single projection layer is redundant. It is recommended to simplify self.to_out to a direct ReplicatedLinear layer and update the forward call accordingly.

        self.to_out = ReplicatedLinear(self.inner_dim, dim, bias=True)

Comment on lines +209 to +210
img_out, _ = self.to_out[0](img_out)
img_out = self.to_out[1](img_out)
Contributor


medium

If self.to_out is simplified to a ReplicatedLinear layer as suggested, the forward call should be updated to call the layer directly.

Suggested change
img_out, _ = self.to_out[0](img_out)
img_out = self.to_out[1](img_out)
img_out, _ = self.to_out(img_out)

    get_forward_context()
    forward_context = nullcontext()
except AssertionError:
    ts0 = int(timestep[0].item()) if timestep.numel() > 0 else 0
Contributor


medium

The fallback calculation for ts0 uses the scaled timestep (in range [0, 1]) directly as an integer, which will result in either 0 or 1. Since the forward pass later scales this by 1000 (line 516), ts0 should be scaled by 1000 here to correctly represent the raw timestep in the forward context (e.g., for TeaCache or logging).

Suggested change
ts0 = int(timestep[0].item()) if timestep.numel() > 0 else 0
ts0 = int(timestep[0].item() * 1000) if timestep.numel() > 0 else 0

    ),
):
    if use_true_cfg:
        assert neg_enc is not None and neg_pooled is not None
Contributor


medium

Using assert for state validation in pipeline stages is discouraged because assertions can be disabled in production environments (using the -O flag). It is safer to raise a RuntimeError with a descriptive message to handle cases where the required conditioning embeddings are missing.

                    if neg_enc is None or neg_pooled is None:
                        raise RuntimeError(
                            "True CFG requires negative prompt embeddings "
                            "(neg_enc and neg_pooled) to be populated in batch.extra.")
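For context, "true CFG" here is ordinary two-pass classifier-free guidance, in contrast to FLUX's embedded guidance conditioning. A pure-Python sketch of the combination step (hypothetical names; the real denoise loop operates on tensors):

```python
def apply_true_cfg(pos_pred: float, neg_pred: float, true_cfg_scale: float) -> float:
    """Extrapolate from the negative (unconditional) prediction toward the
    positive (conditional) one. A scale of 1.0 reduces to the positive
    prediction alone; larger scales push further from the negative."""
    return neg_pred + true_cfg_scale * (pos_pred - neg_pred)
```

This is why both neg_enc and neg_pooled must be present whenever true CFG is enabled: the negative prediction is an input to every denoise step, not an optional refinement.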


Copilot AI left a comment


Pull request overview

This PR adds FLUX.1-dev text-to-image support to FastVideo, including a Diffusers-aligned FLUX transformer implementation, a composed FLUX pipeline with packed-latent + FlowMatch scheduling, and associated registration, tests, and an example script.

Changes:

  • Introduces FLUX model + pipeline configs and wiring to support black-forest-labs/FLUX.1-dev for T2I.
  • Adds FLUX pipeline stages (conditioning, packed-latent prep, FlowMatch mu, denoise loop, VAE decode) and a new FluxPipeline.
  • Adds parity + SSIM + local smoke/loader tests, and extends SSIM utilities to support single-frame image outputs.
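The "FlowMatch mu" referenced above is the resolution-dependent timestep shift used by flow-matching schedulers: mu is interpolated linearly between a base and a max shift according to the packed image sequence length. A sketch mirroring the Diffusers-style helper (the default values are assumptions drawn from FLUX.1-dev scheduler configs):

```python
def calculate_shift(image_seq_len: int,
                    base_seq_len: int = 256,
                    max_seq_len: int = 4096,
                    base_shift: float = 0.5,
                    max_shift: float = 1.15) -> float:
    """Linearly interpolate the scheduler shift (mu) by sequence length:
    short packed sequences get base_shift, long ones get max_shift."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```

Larger images produce longer packed-token sequences and therefore a larger mu, which shifts more denoising effort toward high-noise timesteps.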

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 7 comments.

Summary per file:
tests/local_tests/pipelines/test_flux_dev_pipeline_smoke.py Local end-to-end FLUX T2I smoke test from a local checkpoint.
tests/local_tests/flux/test_flux_dev_component_loaders.py Local loader smoke tests for FLUX tokenizers/encoders/VAE/scheduler.
tests/local_tests/flux/__init__.py Marks local FLUX tests as a package.
fastvideo/tests/utils.py Extends SSIM utilities to read images as 1-frame clips.
fastvideo/tests/transformers/test_flux.py Adds FastVideo vs Diffusers parity test for FluxTransformer2DModel.
fastvideo/tests/ssim/test_flux_t2i_similarity.py Adds FLUX T2I SSIM gate using .png outputs.
fastvideo/tests/ssim/inference_similarity_utils.py Generalizes SSIM harness from “video” to “media” (video or image).
fastvideo/registry.py Registers FLUX pipeline + sampling param and adds model detectors for discovery.
fastvideo/pipelines/stages/flux_stages.py Implements FLUX pipeline stages (pack/unpack, mu shifting, denoise, decode).
fastvideo/pipelines/pipeline_batch_info.py Adds embedded-guidance and true-CFG knobs to ForwardBatch.
fastvideo/pipelines/basic/flux/flux_pipeline.py Adds composed FluxPipeline definition and stage wiring.
fastvideo/pipelines/basic/flux/__init__.py Package marker for the FLUX pipeline.
fastvideo/models/dits/flux.py Adds FluxTransformer2DModel implementation compatible with Diffusers weights.
fastvideo/configs/sample/flux.py Adds FLUX sampling defaults aligned to FLUX.1-dev.
fastvideo/configs/sample/base.py Adds embedded-guidance and true-CFG fields to global sampling params.
fastvideo/configs/pipelines/flux.py Adds FluxPipelineConfig (encoders, precisions, tokenization defaults).
fastvideo/configs/models/dits/flux.py Adds FluxDiTConfig and arch config for the FLUX transformer.
examples/inference/basic/basic_flux_dev.py Adds a minimal CLI example to run FLUX.1-dev T2I and save PNGs.


Comment on lines +95 to +100
args = FastVideoArgs(
model_path=_FLUX_DEV_ROOT,
pipeline_config=_FluxDevLoaderPipelineConfig(),
hsdp_shard_dim=1,
pin_cpu_memory=False,
)

Copilot AI Apr 11, 2026


FastVideoArgs defaults enable text_encoder_cpu_offload (and can trigger FSDP2 fully_shard wrapping for CLIP/T5 since their configs define _fsdp_shard_conditions). In that case te_clip.__class__.__name__ / te_t5.__class__.__name__ won’t end with CLIPTextModel/T5EncoderModel, making these assertions flaky. Either disable text encoder CPU offload explicitly in this test (and/or FSDP inference), or unwrap the underlying module before asserting its type/name.

Comment on lines +118 to +120
assert te_clip.__class__.__name__.endswith("CLIPTextModel")
assert te_t5.__class__.__name__.endswith("T5EncoderModel")


Copilot AI Apr 11, 2026


This class-name assertion is not robust when encoders are wrapped by FSDP2 during CPU offload (default FastVideoArgs behavior). Consider unwrapping the wrapped module (or asserting on a stable attribute/config) instead of relying on __class__.__name__ suffix matching.

Suggested change

assert te_clip.__class__.__name__.endswith("CLIPTextModel")
assert te_t5.__class__.__name__.endswith("T5EncoderModel")

def _unwrap_wrapped_module(module: object) -> object:
    current = module
    seen: set[int] = set()
    while id(current) not in seen:
        seen.add(id(current))
        wrapped = getattr(current, "module", None)
        if wrapped is not None and wrapped is not current:
            current = wrapped
            continue
        wrapped = getattr(current, "_fsdp_wrapped_module", None)
        if wrapped is not None and wrapped is not current:
            current = wrapped
            continue
        break
    return current

te_clip_unwrapped = _unwrap_wrapped_module(te_clip)
te_t5_unwrapped = _unwrap_wrapped_module(te_t5)
assert te_clip_unwrapped.__class__.__name__.endswith("CLIPTextModel")
assert te_t5_unwrapped.__class__.__name__.endswith("T5EncoderModel")

Comment on lines +62 to +73
"""Require height/width divisible by 16 (VAE scale × 2 for FLUX packing)."""

def forward(
self,
batch: ForwardBatch,
fastvideo_args: FastVideoArgs,
) -> ForwardBatch:
if (batch.height is not None and batch.width is not None
and (batch.height % 16 != 0 or batch.width % 16 != 0)):
raise ValueError(
"FLUX expects height and width divisible by 16 "
f"(VAE latent grid × 2× packing); got {batch.height}×{batch.width}."

Copilot AI Apr 11, 2026


FluxInputValidationStage hard-codes height/width divisibility by 16, but later stages derive the VAE spatial compression ratio from fastvideo_args.pipeline_config.vae_config.arch_config.spatial_compression_ratio. If that ratio differs from 8, the correct constraint is height % (2*spatial_ratio) == 0 and width % (2*spatial_ratio) == 0 (2x comes from FLUX packing), otherwise invalid inputs can pass this stage and fail later.

Suggested change

"""Require height/width divisible by 16 (VAE scale × 2 for FLUX packing)."""

def forward(
    self,
    batch: ForwardBatch,
    fastvideo_args: FastVideoArgs,
) -> ForwardBatch:
    if (batch.height is not None and batch.width is not None
            and (batch.height % 16 != 0 or batch.width % 16 != 0)):
        raise ValueError(
            "FLUX expects height and width divisible by 16 "
            f"(VAE latent grid × 2× packing); got {batch.height}×{batch.width}."

"""Require height/width divisible by 2 × VAE spatial compression ratio."""

def forward(
    self,
    batch: ForwardBatch,
    fastvideo_args: FastVideoArgs,
) -> ForwardBatch:
    arch_config = getattr(
        getattr(getattr(fastvideo_args, "pipeline_config", None), "vae_config", None),
        "arch_config", None)
    spatial_compression_ratio = getattr(arch_config, "spatial_compression_ratio", 8)
    required_divisibility = 2 * spatial_compression_ratio
    if (batch.height is not None and batch.width is not None
            and (batch.height % required_divisibility != 0
                 or batch.width % required_divisibility != 0)):
        raise ValueError(
            "FLUX expects height and width divisible by "
            f"{required_divisibility} (2 × VAE spatial compression ratio "
            f"{spatial_compression_ratio}); got {batch.height}×{batch.width}."

Comment on lines 99 to 106
@@ -88,25 +106,29 @@ def _assert_similarity(
raise FileNotFoundError(error_msg)

Copilot AI Apr 11, 2026


The raised error message still says “Reference video folder…” even though this helper now supports image (non-video) media. Update the message (and the download instructions text if needed) to refer to “reference media” to avoid confusing failures when running T2I SSIM.

Comment thread fastvideo/tests/utils.py
Comment on lines 74 to 78
@@ -58,8 +77,8 @@ def compute_video_ssim_torchvision(video1_path, video2_path, use_ms_ssim=True):
if not os.path.exists(video2_path):
raise FileNotFoundError(f"Video2 not found: {video2_path}")

Copilot AI Apr 11, 2026


Now that compute_video_ssim_torchvision accepts image paths too, the FileNotFoundError messages (“Video1 not found”, “Video2 not found”) are misleading. Consider switching these to “Media1/Media2 not found” (or include “video/image”) to match the updated behavior.

Comment on lines +1 to +6
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

import argparse
import contextlib

Copilot AI Apr 11, 2026


PR description says the example is examples/inference/basic/flux_dev_t2i.py, but the added example file here is named basic_flux_dev.py. Please align the filename/path in the PR description (or rename/move the script) so contributors can find it easily.



def test_flux_dev_pipeline_short_run_finite_output(
        monkeypatch: pytest.MonkeyPatch) -> None:

Copilot AI Apr 11, 2026


Function signature indentation is inconsistent with the surrounding codebase’s typical formatting (and may fail auto-formatting checks). Consider reformatting this definition to standard 4-space continuation indentation (e.g., Black-compatible).

Suggested change
monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch: pytest.MonkeyPatch,
) -> None:

@mergify
Contributor

mergify Bot commented Apr 11, 2026

Pre-commit checks failed

Hi @Ishxn20, the pre-commit checks have failed. To fix them locally:

# Install pre-commit if you haven't already
uv pip install pre-commit
pre-commit install

# Run all checks and auto-fix what's possible
pre-commit run --all-files

Common fixes:

  • yapf: yapf -i <file> (formatting)
  • ruff: ruff check --fix <file> (linting)
  • codespell: codespell --write-changes <file> (spelling)

After fixing, commit and push the changes. The checks will re-run automatically.

For future commits, pre-commit will run automatically on changed files before each commit.

@Ishxn20 Ishxn20 changed the title Added Flux dev pipeline [new-model] Port FLUX.1-dev T2I to FastVideo Apr 11, 2026
@mergify mergify Bot added the type: new-model New model support label Apr 11, 2026
@alexzms alexzms self-requested a review May 1, 2026 03:51
@mergify
Contributor

mergify Bot commented May 1, 2026

This PR has merge conflicts with the base branch. Please rebase:

git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease

@mergify mergify Bot added the needs-rebase PR has merge conflicts label May 1, 2026