perf: improve HunyuanVideo1.5 I2V runtime and VAE decode controls#1201

Open
starrkk wants to merge 5 commits into
ModelTC:mainfrom
starrkk:codex/hunyuan-vae-i2v-runtime-optimizations

@starrkk starrkk commented Jun 30, 2026

Summary

  • enable Hygon 8-card HunyuanVideo1.5 I2V runtime compatibility fixes
  • add VAE rank-0 postprocess helpers and output cropping support
  • add optional VAE decode controls, detail timing, and convolution-shape logging
  • include the Hygon DCU SLA top-k environment fix required by this runtime path

Why

This groups the HunyuanVideo1.5 I2V runtime changes that were validated together for 8-card Hygon DCU inference. This is intentionally opened as a draft because it is broader than the smaller PRs and may be easier to review after splitting further.

Validation

  • branch rebuilt on latest ModelTC/LightX2V:main (89dfa833)
  • git diff --check passed for the PR branch
  • validated as part of the HunyuanVideo1.5 I2V 8-card benchmark path on Hygon DCU

zhenggf added 3 commits June 30, 2026 11:50
(cherry picked from commit d60b8f32c7787054faba8fbacaf5c38fac3ffbfb)
(cherry picked from commit e8ee93a79bd20dce2d084e992a8e140710f2c9b6)
(cherry picked from commit b066001a517b59e5ddbf8f7dcce4a14a017be46d)

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several enhancements and compatibility fixes across the repository, including conditional pipeline imports, backward-compatible unpadding for attention layers, VAE post-processing utilities (such as cropping and rank-0 post-processing skips), detailed timing logs, and fallback support for 4D tensors in SDPA. The code review identified three issues: a runtime AttributeError due to the non-existent is_cpu attribute on PyTorch tensors, a critical layout detection bug in _spatial_dims when the frame count is 16 or 32, and a potential AttributeError when accessing seq_p_group directly.

Comment on lines +116 to +119
if cu_seqlens_q is not None and cu_seqlens_q.is_cpu:
    cu_seqlens_q = cu_seqlens_q.to(q_flat.device, non_blocking=True)
if cu_seqlens_kv is not None and cu_seqlens_kv.is_cpu:
    cu_seqlens_kv = cu_seqlens_kv.to(k_flat.device, non_blocking=True)

critical

torch.Tensor.is_cpu is not available in every PyTorch build (it was only added in newer releases), so accessing cu_seqlens_q.is_cpu or cu_seqlens_kv.is_cpu can raise an AttributeError at runtime on older versions.

Checking the device type with .device.type == "cpu" works regardless of the PyTorch version.

Suggested change
- if cu_seqlens_q is not None and cu_seqlens_q.is_cpu:
-     cu_seqlens_q = cu_seqlens_q.to(q_flat.device, non_blocking=True)
- if cu_seqlens_kv is not None and cu_seqlens_kv.is_cpu:
-     cu_seqlens_kv = cu_seqlens_kv.to(k_flat.device, non_blocking=True)
+ if cu_seqlens_q is not None and cu_seqlens_q.device.type == "cpu":
+     cu_seqlens_q = cu_seqlens_q.to(q_flat.device, non_blocking=True)
+ if cu_seqlens_kv is not None and cu_seqlens_kv.device.type == "cpu":
+     cu_seqlens_kv = cu_seqlens_kv.to(k_flat.device, non_blocking=True)
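
One way to avoid depending on whether is_cpu exists in a given PyTorch build is a small helper that prefers the attribute when it is present and otherwise falls back to device.type. The sketch below is illustrative only and is not part of this PR: the helper name is_on_cpu is hypothetical, and the tensors are replaced by simple stand-in objects so the example runs without PyTorch installed.

```python
from types import SimpleNamespace


def is_on_cpu(tensor_like):
    """Return True if the object reports that it lives on the CPU.

    Uses `is_cpu` when the attribute exists and otherwise falls back to
    `device.type`, so the check does not depend on which one is available.
    """
    flag = getattr(tensor_like, "is_cpu", None)
    if flag is not None:
        return bool(flag)
    return tensor_like.device.type == "cpu"


# Hypothetical stand-ins for tensors (assumption: only `.is_cpu` and
# `.device.type` matter for this check).
without_is_cpu = SimpleNamespace(device=SimpleNamespace(type="cpu"))
with_is_cpu = SimpleNamespace(is_cpu=False, device=SimpleNamespace(type="cuda"))

print(is_on_cpu(without_is_cpu))  # True
print(is_on_cpu(with_is_cpu))     # False
```

With a helper like this, the two guards above would read `if cu_seqlens_q is not None and is_on_cpu(cu_seqlens_q):` and likewise for cu_seqlens_kv.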

Comment on lines +58 to +66
def _spatial_dims(video):
    if video.ndim != 5:
        return -2, -1
    # VAE tensors are usually B,C,T,H,W before wan_vae_to_comfy and
    # B,T,H,W,C afterwards. In both layouts, H/W are the two dims before
    # channels only for the postprocessed form; rank0 crop is done before it.
    if video.shape[1] in (1, 3, 4, 16, 32):
        return 3, 4
    return 2, 3

critical

There is a critical layout detection bug in _spatial_dims.

If the video tensor is in B,T,H,W,C layout (postprocessed form) and the number of frames T is 16 or 32 (which are extremely common video lengths), video.shape[1] will be 16 or 32.
Since 16 and 32 are in (1, 3, 4, 16, 32), the function will return 3, 4 (treating W and C as spatial dimensions) instead of 2, 3 (treating H and W as spatial dimensions). This will cause incorrect cropping of the channel dimension and corrupt the output.

To make this robust, we can check if the last dimension is the channel dimension (typically 1, 3, 4, 16, 32) and is smaller than or equal to the second dimension (T).

Suggested change
- def _spatial_dims(video):
-     if video.ndim != 5:
-         return -2, -1
-     # VAE tensors are usually B,C,T,H,W before wan_vae_to_comfy and
-     # B,T,H,W,C afterwards. In both layouts, H/W are the two dims before
-     # channels only for the postprocessed form; rank0 crop is done before it.
-     if video.shape[1] in (1, 3, 4, 16, 32):
-         return 3, 4
-     return 2, 3
+ def _spatial_dims(video):
+     if video.ndim != 5:
+         return -2, -1
+     # B,T,H,W,C layout: last dimension is channels (typically 1, 3, 4, 16, 32)
+     # and is smaller than or equal to the second dimension (T).
+     if video.shape[-1] in (1, 3, 4, 16, 32) and video.shape[1] >= video.shape[-1]:
+         return 2, 3
+     # B,C,T,H,W layout: second dimension is channels
+     if video.shape[1] in (1, 3, 4, 16, 32):
+         return 3, 4
+     return 2, 3
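
The layout problem can be reproduced on plain shape tuples, without PyTorch. The sketch below re-implements the original check and the suggested check on shapes only, then compares them with a variant that takes the layout as an explicit argument. The function names, the layout strings "BCTHW" and "BTHWC", and the example shapes are assumptions made for illustration and are not code from this PR.

```python
CHANNEL_SIZES = (1, 3, 4, 16, 32)


def spatial_dims_original(shape):
    """Shape-only version of the check under review."""
    if len(shape) != 5:
        return -2, -1
    if shape[1] in CHANNEL_SIZES:
        return 3, 4
    return 2, 3


def spatial_dims_suggested(shape):
    """Shape-only version of the check from the suggested change."""
    if len(shape) != 5:
        return -2, -1
    if shape[-1] in CHANNEL_SIZES and shape[1] >= shape[-1]:
        return 2, 3
    if shape[1] in CHANNEL_SIZES:
        return 3, 4
    return 2, 3


def spatial_dims_explicit(shape, layout):
    """Hypothetical alternative: the caller states the layout, e.g. 'BTHWC'."""
    if len(shape) != len(layout):
        raise ValueError(f"shape {shape} does not match layout {layout!r}")
    return layout.index("H"), layout.index("W")


# B,T,H,W,C with 16 frames: the original check picks (W, C) instead of (H, W).
bthwc_16_frames = (1, 16, 480, 832, 3)
print(spatial_dims_original(bthwc_16_frames))           # (3, 4)
print(spatial_dims_suggested(bthwc_16_frames))          # (2, 3)
print(spatial_dims_explicit(bthwc_16_frames, "BTHWC"))  # (2, 3)

# Hypothetical B,C,T,H,W latent whose width happens to be 16: the suggested
# heuristic picks (T, H), while the explicit layout still gives (H, W).
bcthw_latent = (1, 32, 5, 30, 16)
print(spatial_dims_suggested(bcthw_latent))             # (2, 3)
print(spatial_dims_explicit(bcthw_latent, "BCTHW"))     # (3, 4)
```

The second case suggests that a check based only on dimension sizes can stay ambiguous for some shapes. If the caller already knows whether wan_vae_to_comfy has run, passing that information in removes the guesswork.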

@starrkk starrkk marked this pull request as ready for review June 30, 2026 09:46
helloyongyang commented Jul 1, 2026

@starrkk
Could you elaborate on the optimization details and provide test data from the ablation experiment?

In addition, please minimize the use of environment variables as switches and use JSON-formatted configuration files whenever possible.

Also, please pay attention to the code format:

pip install ruff pre-commit

pre-commit run --all-files
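
As a rough illustration of the request to prefer JSON configuration over environment-variable switches, the sketch below resolves a boolean switch from a parsed JSON config first and only falls back to an environment variable when the key is absent. The key names, the environment variable name, and the helper resolve_switch are all hypothetical and are not taken from LightX2V or this PR.

```python
import json
import os

# Hypothetical switch names and defaults, for illustration only.
DEFAULTS = {"vae_detail_timing": False, "vae_log_conv_shapes": False}
TRUTHY = ("1", "true", "yes", "on")


def resolve_switch(config, key, env_name=None, env=None):
    """Resolve a boolean switch: JSON config first, then env var, then default."""
    if key in config:
        return bool(config[key])
    env = os.environ if env is None else env
    if env_name is not None and env_name in env:
        return env[env_name].strip().lower() in TRUTHY
    return DEFAULTS[key]


# In practice the text would come from a JSON config file on disk.
config = json.loads('{"vae_detail_timing": true}')

print(resolve_switch(config, "vae_detail_timing"))  # True
print(resolve_switch(config, "vae_log_conv_shapes",
                     env_name="HYPOTHETICAL_LOG_CONV",
                     env={"HYPOTHETICAL_LOG_CONV": "1"}))  # True
print(resolve_switch(config, "vae_log_conv_shapes", env={}))  # False
```

Keeping the environment variable as a fallback rather than the primary switch keeps existing launch scripts working while the JSON file stays the single documented place for settings.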
