Fix redundant Z-Image terminal timestep #13730
sayakpaul
left a comment
Thanks for your efforts.
I ran this script to check whether the outputs of this branch diverge from those of the main branch, and they do.
Could we get an explanation of why that is expected?
Hi @sayakpaul, thank you for reviewing the work. Could you specify which script to run? I think the divergence comes from the fact that the
Could you check the link? Because I got:
I think the discrepancy in the script is explained by the fact that a
Z-Image Test Script
import argparse

import torch

from diffusers import ZImagePipeline

DEFAULT_T2I_PROMPT = "一幅为名为“造相「Z-IMAGE-TURBO」”的项目设计的创意海报。画面巧妙地将文字概念视觉化:一辆复古蒸汽小火车化身为巨大的拉链头,正拉开厚厚的冬日积雪,展露出一个生机盎然的春天。"


def compare(path_a, path_b):
    image_a = torch.load(path_a, weights_only=True)
    image_b = torch.load(path_b, weights_only=True)
    diff = (image_a - image_b).abs()
    print(f"a={path_a} b={path_b}")
    print(f"  max abs diff: {diff.max().item():.6e}")
    print(f"  mean abs diff: {diff.mean().item():.6e}")
    torch.testing.assert_close(image_a, image_b, rtol=0.0, atol=1e-5)
    print("ASSERTION PASSED: outputs match within atol=1e-5.")


def main(args):
    if args.compare:
        compare(args.base_tensor_path, args.new_tensor_path)
        return

    pipe = ZImagePipeline.from_pretrained(args.model_id, torch_dtype=args.dtype)
    pipe.to(device=args.device)

    prompt = args.prompt
    if prompt is None:
        prompt = DEFAULT_T2I_PROMPT

    image_latents = pipe(
        prompt=prompt,
        height=args.height,
        width=args.width,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        generator=torch.Generator(device=args.device).manual_seed(args.seed),
        output_type="latent",
        return_dict=False,
    )

    if args.test == "base":
        save_path = args.base_tensor_path
    else:
        save_path = args.new_tensor_path
    torch.save(image_latents[0], save_path)
    print(f"Saved output to {save_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--test", type=str, choices=["base", "new"], default="base")
    parser.add_argument("--compare", action="store_true")
    parser.add_argument("--model_id", type=str, default="Tongyi-MAI/Z-Image-Turbo")
    parser.add_argument("--device", type=str, default="cuda")
    parser.add_argument("--dtype", type=str, default="bf16")
    parser.add_argument("--prompt", type=str, default=None)
    parser.add_argument("--num_inference_steps", type=int, default=9)
    parser.add_argument("--height", type=int, default=1024)
    parser.add_argument("--width", type=int, default=1024)
    parser.add_argument("--guidance_scale", type=float, default=0.0)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--base_tensor_path", type=str, default="zimage_main.pt")
    parser.add_argument("--new_tensor_path", type=str, default="zimage_pr.pt")
    args = parser.parse_args()
    args.dtype = torch.bfloat16 if args.dtype == "bf16" else torch.float32
    main(args)

# On the main branch
>>> python scripts/zimage_test_tensor.py --test base --num_inference_steps 9
# On the PR branch
>>> python scripts/zimage_test_tensor.py --test new --num_inference_steps 8
>>> python scripts/zimage_test_tensor.py --compare
...
a=zimage_main.pt b=zimage_pr.pt
max abs diff: 0.000000e+00
mean abs diff: 0.000000e+00
ASSERTION PASSED: outputs match within atol=1e-5.
If we print out the
Timesteps: tensor([1000.0000, 954.5454, 900.0000, 833.3333, 750.0000, 642.8571, 500.0000, 300.0000, 0.0000], device='cuda:0')
Timesteps length: 9
Sigmas: tensor([1.0000, 0.9545, 0.9000, 0.8333, 0.7500, 0.6429, 0.5000, 0.3000, 0.0000, 0.0000], device='cuda:0')
Sigmas length: 10
and for the
Timesteps: tensor([1000.0000, 954.5454, 900.0000, 833.3333, 750.0000, 642.8571, 500.0000, 300.0000], device='cuda:0')
Timesteps length: 8
Sigmas: tensor([1.0000, 0.9545, 0.9000, 0.8333, 0.7500, 0.6429, 0.5000, 0.3000, 0.0000], device='cuda:0')
Sigmas length: 9
The but we will spend an extra transformer forward pass calculating
That makes sense. Thanks for the wonderful explanation!
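The explanation above can be checked with a toy loop. This is a sketch and not the diffusers implementation: it assumes the flow-matching Euler update `sample + (sigma_next - sigma) * model_output`, uses the rounded sigma values printed in this thread, and substitutes a hypothetical `toy_model` for the transformer. Under those assumptions, the old 9-step schedule and the new 8-step schedule end at the same sample, and the old one makes one extra model call for a step with `dt = 0`.

```python
# Sigma values copied (rounded) from the schedules printed in this thread.
OLD_SIGMAS = [1.0, 0.9545, 0.9, 0.8333, 0.75, 0.6429, 0.5, 0.3, 0.0, 0.0]
NEW_SIGMAS = [1.0, 0.9545, 0.9, 0.8333, 0.75, 0.6429, 0.5, 0.3, 0.0]


def toy_model(sample, sigma):
    """Hypothetical stand-in for the transformer: any deterministic function works."""
    return [0.5 * x - sigma for x in sample]


def euler_step(sample, model_output, sigma, sigma_next):
    """Assumed flow-matching Euler update: x_next = x + (sigma_next - sigma) * v."""
    dt = sigma_next - sigma
    return [x + dt * v for x, v in zip(sample, model_output)]


def denoise(sigmas, sample):
    """Run one model call per consecutive sigma pair and count the calls."""
    calls = 0
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        model_output = toy_model(sample, sigma)
        calls += 1
        sample = euler_step(sample, model_output, sigma, sigma_next)
    return sample, calls


start = [0.25, -1.0, 2.0]
old_result, old_calls = denoise(OLD_SIGMAS, start)
new_result, new_calls = denoise(NEW_SIGMAS, start)
print(old_calls, new_calls, old_result == new_result)  # prints: 9 8 True
```

The final pair in `OLD_SIGMAS` is `(0.0, 0.0)`, so the ninth step multiplies the model output by zero and leaves the sample unchanged.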
dg845
left a comment
Thanks for the PR! I think we should also update the Z-Image examples so that they use the same number of effective timesteps as before (e.g. if an example used num_inference_steps=9 before the PR, we should change it to use num_inference_steps=8 instead).
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
@dg845 thank you for reviewing. Sure, I will update the existing Z-Image examples too.
Merging as the CI failures are unrelated.
What does this PR do?
Summary
This PR fixes a redundant terminal denoising step in Z-Image pipelines.
Previously, Z-Image pipelines mutated the scheduler with scheduler.sigma_min = 0.0. With FlowMatchEulerDiscreteScheduler, this caused the generated timestep/sigma schedule to include a model-forward step at terminal sigma 0.0, while the scheduler also appended its own terminal 0.0. The final denoising step therefore became a no-op transition from 0.0 -> 0.0.
This change makes Z-Image pipelines compute their default sigma schedule up front and pass it through the existing sigmas argument, instead of mutating scheduler.sigma_min.
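As a rough illustration of the two schedules, the sketch below rebuilds the sigma values printed in the conversation. It is not the pipeline or scheduler code: the helper names are hypothetical, and both the `shift=3.0` value and the linear grids are assumptions, chosen only because they reproduce the printed numbers for 9 and 8 steps.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (pure-Python helper)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def shift_sigma(sigma, shift):
    """Assumed time-shift formula: shift * s / (1 + (shift - 1) * s)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def old_schedule(num_inference_steps, shift=3.0):
    """Old behavior (assumed): grid runs down to 0.0, then a terminal 0.0 is appended."""
    grid = linspace(1.0, 0.0, num_inference_steps)
    return [shift_sigma(s, shift) for s in grid] + [0.0]


def new_schedule(num_inference_steps, shift=3.0):
    """New behavior (assumed): grid stops at 1/N, then a terminal 0.0 is appended."""
    grid = linspace(1.0, 1.0 / num_inference_steps, num_inference_steps)
    return [shift_sigma(s, shift) for s in grid] + [0.0]


old = old_schedule(9)  # 10 sigmas, ending in 0.0, 0.0
new = new_schedule(8)  # 9 sigmas, ending in 0.3, 0.0
print([round(s, 4) for s in old])
print([round(s, 4) for s in new])
```

Under these assumptions, `old_schedule(9)` without its last entry matches `new_schedule(8)`, which is why 8 steps on the PR branch correspond to 9 steps on main.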
Behavior
Old Schedule
New Schedule
Result Check
Generated 1024x1024 images with Tongyi-MAI/Z-Image-Turbo, prompt "dance monkey", seed 0.
New 8 steps
Old 9 steps
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul