Fix redundant Z-Image terminal timestep #13730

Merged
dg845 merged 7 commits into huggingface:main from rootonchair:fix/z-image-redundant-timestep on May 29, 2026
Conversation

@rootonchair

@rootonchair rootonchair commented May 12, 2026

Contributor

What does this PR do?

Summary

This PR fixes a redundant terminal denoising step in Z-Image pipelines.

Previously, Z-Image pipelines mutated the scheduler with scheduler.sigma_min = 0.0.
With FlowMatchEulerDiscreteScheduler, this caused the generated timestep/sigma schedule to include a model-forward step at terminal sigma 0.0, while the scheduler also appended its own terminal 0.0. The final denoising step therefore became a no-op transition from 0.0 -> 0.0.

This change makes Z-Image pipelines compute their default sigma schedule up front and pass it through the existing sigmas argument, instead of mutating scheduler.sigma_min.

Behavior

Old Schedule

| Requested steps | Scheduler sigmas | Effective updates |
| --- | --- | --- |
| 9 | [1.0, 0.875, ..., 0.125, 0.0, 0.0] | 8 meaningful + 1 no-op |

New Schedule

| Requested steps | Scheduler sigmas | Effective updates |
| --- | --- | --- |
| 8 | [1.0, 0.875, ..., 0.125, 0.0] | 8 meaningful |
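The two schedules above can be reproduced with a small NumPy sketch. This is an illustration rather than the pipeline code: `old_schedule`, `new_schedule` and `count_updates` are hypothetical helper names, the time shift is left out (as in the tables), and the post-PR default is assumed to be a linspace from 1.0 down to `1 / num_steps`, which is what the "New Schedule" row implies.

```python
import numpy as np


def old_schedule(num_steps: int) -> np.ndarray:
    """Pre-PR behaviour as described above: sigma_min forced to 0.0."""
    sigmas = np.linspace(1.0, 0.0, num_steps)  # already ends at 0.0
    return np.append(sigmas, 0.0)  # the scheduler appends its own terminal 0.0


def new_schedule(num_steps: int) -> np.ndarray:
    """Assumed post-PR default: stop at 1 / num_steps, then append the terminal 0.0."""
    sigmas = np.linspace(1.0, 1.0 / num_steps, num_steps)
    return np.append(sigmas, 0.0)


def count_updates(sigmas: np.ndarray) -> tuple:
    """Return (meaningful, no_op) update counts; a no-op is a step with dt == 0."""
    dt = np.diff(sigmas)
    no_op = int(np.sum(dt == 0.0))
    return len(dt) - no_op, no_op


old = old_schedule(9)
new = new_schedule(8)
print("old:", old, count_updates(old))
print("new:", new, count_updates(new))
```

Under these assumptions the old 9-step schedule and the new 8-step schedule share the same sigmas, and the only difference is the trailing duplicate 0.0.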

Result Check

Generated 1024x1024 images with Tongyi-MAI/Z-Image-Turbo, prompt "dance monkey", seed 0.

New 8 steps

[Image: new_8_steps_1024]

Old 9 steps

[Image: old_9_steps_1024]

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

@dg845 dg845 requested review from dg845 and sayakpaul May 26, 2026 00:54

@sayakpaul sayakpaul left a comment

Member

Thanks for your efforts.

I ran this script to check whether the outputs of this branch diverge from those of the main branch, and they do.

Could we get an explanation of why that is expected?

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rootonchair

Contributor Author

Hi @sayakpaul, thank you for reviewing the work. Could you specify which script to run? I think the divergence comes from the fact that the main branch runs 8 updating steps plus 1 extra model run that does not update the latents (since sigma = 0 at both step 8 and step 9), whereas with the fix all 9 steps run the model and update the latents.

@sayakpaul

Member

> Could you specify which script to run?

I ran this script to check whether the outputs of this branch diverge from those of the main branch, and they do.

@rootonchair

Contributor Author

Could you check the link? Because I got: https://pastebin.com/q8aFG47V

@dg845

dg845 commented May 28, 2026

Collaborator

I think the discrepancy in the script is explained by the fact that a num_inference_steps-length timestep schedule after the PR changes is equivalent to a num_inference_steps + 1-length timestep schedule before the changes (e.g. on main). So for example, 8-step inference after the PR is equivalent to 9-step inference before the PR.
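The equivalence can be checked by hand. The derivation below assumes that the pre-PR default schedule is a linspace from 1 to 0 and that the post-PR default is a linspace from 1 to 1/N (an assumption consistent with the schedules in the PR description), both taken before any time shift:

```latex
\begin{aligned}
\text{main, } N+1 \text{ steps:}\quad
  & \sigma_i = 1 - \frac{i}{N}, \quad i = 0, \dots, N, \quad \text{then append } 0 \\
\text{PR, } N \text{ steps:}\quad
  & \sigma_i = 1 - i \cdot \frac{1 - 1/N}{N - 1} = 1 - \frac{i}{N}, \quad i = 0, \dots, N-1, \quad \text{then append } 0
\end{aligned}
```

On main the last generated value is already \sigma_N = 0, so the appended 0 duplicates it. The first N + 1 sigmas agree in both cases, and an elementwise time shift that maps 0 to 0 preserves that agreement.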

Z-Image test script (scripts/zimage_test_tensor.py):
import argparse

import torch

from diffusers import ZImagePipeline


DEFAULT_T2I_PROMPT = "一幅为名为“造相「Z-IMAGE-TURBO」”的项目设计的创意海报。画面巧妙地将文字概念视觉化:一辆复古蒸汽小火车化身为巨大的拉链头,正拉开厚厚的冬日积雪,展露出一个生机盎然的春天。"


def compare(path_a, path_b):
    """Load two saved latent tensors and assert that they match within atol=1e-5."""
    image_a = torch.load(path_a, weights_only=True)
    image_b = torch.load(path_b, weights_only=True)
    diff = (image_a - image_b).abs()
    print(f"a={path_a}  b={path_b}")
    print(f"  max abs diff:  {diff.max().item():.6e}")
    print(f"  mean abs diff: {diff.mean().item():.6e}")
    torch.testing.assert_close(image_a, image_b, rtol=0.0, atol=1e-5)
    print("ASSERTION PASSED: outputs match within atol=1e-5.")


def main(args):
    if args.compare:
        compare(args.base_tensor_path, args.new_tensor_path)
        return

    pipe = ZImagePipeline.from_pretrained(args.model_id, torch_dtype=args.dtype)
    pipe.to(device=args.device)

    prompt = args.prompt
    if prompt is None:
        prompt = DEFAULT_T2I_PROMPT

    # Request raw latents so the comparison is made before VAE decoding.
    image_latents = pipe(
        prompt=prompt,
        height=args.height,
        width=args.width,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        generator=torch.Generator(device=args.device).manual_seed(args.seed),
        output_type="latent",
        return_dict=False,
    )

    if args.test == "base":
        save_path = args.base_tensor_path
    else:
        save_path = args.new_tensor_path
    torch.save(image_latents[0], save_path)
    print(f"Saved output to {save_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("--test", type=str, choices=["base", "new"], default="base")
    parser.add_argument("--compare", action="store_true")

    parser.add_argument("--model_id", type=str, default="Tongyi-MAI/Z-Image-Turbo")

    parser.add_argument("--device", type=str, default="cuda")
    parser.add_argument("--dtype", type=str, default="bf16")

    parser.add_argument("--prompt", type=str, default=None)

    parser.add_argument("--num_inference_steps", type=int, default=9)
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--guidance_scale", type=float, default=0.0)
    parser.add_argument("--seed", type=int, default=42)

    parser.add_argument("--base_tensor_path", type=str, default="zimage_main.pt")
    parser.add_argument("--new_tensor_path", type=str, default="zimage_pr.pt")

    args = parser.parse_args()
    # Map the CLI string to a torch dtype; anything other than "bf16" falls back to float32.
    args.dtype = torch.bfloat16 if args.dtype == "bf16" else torch.float32

    main(args)
# On the main branch
$ python scripts/zimage_test_tensor.py --test base --num_inference_steps 9
# On the PR branch
$ python scripts/zimage_test_tensor.py --test new --num_inference_steps 8
$ python scripts/zimage_test_tensor.py --compare
...
a=zimage_main.pt  b=zimage_pr.pt
  max abs diff:  0.000000e+00
  mean abs diff: 0.000000e+00
ASSERTION PASSED: outputs match within atol=1e-5.

If we print out the timesteps and sigmas from the 9-step run on main, we get

Timesteps: tensor([1000.0000,  954.5454,  900.0000,  833.3333,  750.0000,  642.8571, 500.0000,  300.0000,    0.0000], device='cuda:0')
Timesteps length: 9
Sigmas: tensor([1.0000, 0.9545, 0.9000, 0.8333, 0.7500, 0.6429, 0.5000, 0.3000, 0.0000, 0.0000], device='cuda:0')
Sigmas length: 10

and for the 8-step run on the PR branch we get

Timesteps: tensor([1000.0000,  954.5454,  900.0000,  833.3333,  750.0000,  642.8571, 500.0000,  300.0000], device='cuda:0')
Timesteps length: 8
Sigmas: tensor([1.0000, 0.9545, 0.9000, 0.8333, 0.7500, 0.6429, 0.5000, 0.3000, 0.0000], device='cuda:0')
Sigmas length: 9
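The printed values for the PR branch can be reproduced offline with NumPy. This is a sketch under stated assumptions, not the scheduler's code: the shift value of 3.0 and the 1000 training timesteps are inferred from the printed numbers rather than read from the scheduler config, the static shift formula is the one commonly used for flow-matching schedules, and `shift_sigmas` and `pr_sigmas` are hypothetical helper names.

```python
import numpy as np

SHIFT = 3.0  # assumption: inferred from the printed sigmas, not read from the config
NUM_TRAIN_TIMESTEPS = 1000  # assumption: inferred from the printed timesteps


def shift_sigmas(sigmas: np.ndarray, shift: float = SHIFT) -> np.ndarray:
    """Static time shift: sigma' = shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)


def pr_sigmas(num_steps: int) -> np.ndarray:
    """Assumed post-PR default: linspace from 1 to 1 / num_steps, shifted, plus terminal 0.0."""
    base = np.linspace(1.0, 1.0 / num_steps, num_steps)
    return np.append(shift_sigmas(base), 0.0)


sigmas = pr_sigmas(8)
timesteps = sigmas[:-1] * NUM_TRAIN_TIMESTEPS  # one timestep per model forward pass

print("Timesteps:", np.round(timesteps, 4), "length:", len(timesteps))
print("Sigmas:", np.round(sigmas, 4), "length:", len(sigmas))
```

The resulting arrays have lengths 8 and 9. The 9-step schedule on main would be the same sigmas with one more 0.0 at the end.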

The sigmas are the same except for an extra 0.0 at the end introduced by scheduler.sigma_min = 0.0 in main. Since the last two sigmas are 0.0 on the main schedule, the sample will not be changed by the last scheduler step (dt == 0 below):

prev_sample = sample + dt * model_output

but we will spend an extra transformer forward pass calculating model_output for the extra 0.0 in the main timestep schedule.
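To make the cost concrete, here is a toy Euler loop that applies the update rule above and counts forward passes. It is only a sketch: `toy_model` is a hypothetical stand-in for the transformer, `euler_sample` is not the scheduler's API, and the sigma values are copied from the printed output above.

```python
import numpy as np


def toy_model(sample: np.ndarray, sigma: float) -> np.ndarray:
    """Hypothetical stand-in for the transformer forward pass."""
    return np.tanh(sample) + sigma


def euler_sample(sigmas: np.ndarray, model, x0: np.ndarray):
    """Run prev_sample = sample + dt * model_output over the schedule."""
    sample = x0.copy()
    forward_passes = 0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        model_output = model(sample, sigma)  # one forward pass per timestep
        forward_passes += 1
        dt = sigma_next - sigma  # dt == 0 for the duplicated terminal sigma
        sample = sample + dt * model_output
    return sample, forward_passes


# Sigmas copied from the printed output: main has one extra trailing 0.0.
main_sigmas = np.array([1.0, 0.9545, 0.9, 0.8333, 0.75, 0.6429, 0.5, 0.3, 0.0, 0.0])
pr_sigmas = main_sigmas[:-1]

x0 = np.random.default_rng(0).standard_normal(4)
out_main, passes_main = euler_sample(main_sigmas, toy_model, x0)
out_pr, passes_pr = euler_sample(pr_sigmas, toy_model, x0)

print("forward passes:", passes_main, "vs", passes_pr)
print("outputs identical:", np.array_equal(out_main, out_pr))
```

Both runs end on the same sample, but the main schedule pays for one more forward pass whose output is multiplied by dt = 0.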

@sayakpaul

Member

That makes sense. Thanks for the wonderful explanation!

Comment thread src/diffusers/modular_pipelines/z_image/before_denoise.py Outdated

@dg845 dg845 left a comment

Collaborator

Thanks for the PR! I think we should also update the Z-Image examples so that they use the same number of effective timesteps as before (e.g. if an example used num_inference_steps=9 before the PR, we should change it to use num_inference_steps=8 instead).

rootonchair and others added 2 commits May 28, 2026 15:37
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
@rootonchair

Contributor Author

> I think we should also update the Z-Image examples so that they use the same number of effective timesteps as before (e.g. if an example used num_inference_steps=9 before the PR, we should change it to use num_inference_steps=8 instead).

@dg845 thank you for reviewing. Sure, I will update the existing examples for z-image too

@rootonchair rootonchair requested a review from dg845 May 28, 2026 09:05
@github-actions github-actions bot added the documentation label May 28, 2026

@dg845 dg845 left a comment

Collaborator

Thanks!

@dg845

dg845 commented May 29, 2026

Collaborator

Merging as the CI failures are unrelated.

@dg845 dg845 merged commit 32ecbe3 into huggingface:main May 29, 2026
17 of 19 checks passed
@rootonchair rootonchair deleted the fix/z-image-redundant-timestep branch May 29, 2026 10:13
Labels

documentation (Improvements or additions to documentation), modular-pipelines, pipelines, size/M (PR with diff < 200 LOC)

4 participants