feat: add Hygon DCU INT8 hipBLASLt GEMM path by starrkk · Pull Request #1199 · ModelTC/LightX2V

starrkk · 2026-06-30T05:01:28Z

Summary

- Add an int8-vllm-hygon-dcu MM weight backend that uses the Hygon DCU hipBLASLt W8A8 channelwise GEMM
- Support auto-quantized BF16 weights and an optional selective BF16 fallback
- Expose helpers for sharing activation quantization across repeated projections
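As a reading aid for the first bullet, here is a minimal pure-Python sketch of what a W8A8 channelwise GEMM computes. It assumes symmetric INT8 quantization with one scale per output channel of the weight and one dynamically computed scale per token (row) of the activation. The function names are hypothetical, and this is a numerical reference only, not the hipBLASLt kernel or the code in this PR.

```python
# Pure-Python numerical reference for a W8A8 channelwise GEMM (illustrative
# only; hypothetical names, not the hipBLASLt kernel or the PR's code).


def quantize_rows_int8(rows):
    """Symmetric per-row INT8 quantization.

    Returns (q_rows, scales) such that rows[i][j] ~= q_rows[i][j] * scales[i].
    """
    q_rows, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_channelwise_gemm(a_q, scale_a, w_q, scale_w, bias=None):
    """y[m][n] = (sum_k a_q[m][k] * w_q[n][k]) * scale_a[m] * scale_w[n] + bias[n].

    a_q: INT8 activations [tokens][in_features], one scale per token.
    w_q: INT8 weights [out_features][in_features] (nn.Linear layout),
         one scale per output channel.
    """
    out = []
    for m, a_row in enumerate(a_q):
        out_row = []
        for n, w_row in enumerate(w_q):
            acc = sum(x * w for x, w in zip(a_row, w_row))  # exact integer accumulation
            y = acc * scale_a[m] * scale_w[n]  # dequantize with both scales
            if bias is not None:
                y += bias[n]
            out_row.append(y)
        out.append(out_row)
    return out


x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]  # activations [tokens][in_features]
w = [[0.1, 0.2, -0.3], [1.0, -0.5, 0.25]]  # weights [out_features][in_features]

w_q, w_scale = quantize_rows_int8(w)  # per-output-channel scales, computed once at load
x_q, x_scale = quantize_rows_int8(x)  # per-token scales, computed on every forward
y = w8a8_channelwise_gemm(x_q, x_scale, w_q, w_scale)
```

In this sketch the weight is quantized once, while the activation is quantized on every call; that per-call cost is presumably what the shared-activation helpers in the third bullet are meant to amortize.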

Why

This enables faster INT8 GEMM execution on Hygon DCU while keeping the path gated behind the quantization scheme and explicit dependency checks.
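One way such gating could look, combined with the selective BF16 fallback from the summary, is sketched below. Only the scheme string comes from the PR; the glob-pattern fallback list, the function names, and the dependency module name are hypothetical assumptions, not the PR's actual configuration keys or checks.

```python
import fnmatch
import importlib.util

INT8_SCHEME = "int8-vllm-hygon-dcu"  # scheme name taken from the PR summary


def dependency_available(module_name):
    """True if a top-level module is importable (checked without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def choose_weight_path(layer_name, scheme, bf16_fallback_patterns, dep_module):
    """Hypothetical per-layer policy for the Hygon INT8 backend.

    Returns None when the scheme is not handled by this backend, "bf16" for
    layers configured to fall back, and "int8" otherwise. A missing kernel
    dependency is reported with a clear error instead of a silent fallback.
    """
    if scheme != INT8_SCHEME:
        return None
    if any(fnmatch.fnmatchcase(layer_name, pattern) for pattern in bf16_fallback_patterns):
        return "bf16"
    if not dependency_available(dep_module):
        raise ImportError(f"quant scheme {scheme!r} requires {dep_module!r}, which is not installed")
    return "int8"


# "json" stands in for the real kernel package, whose module name is not shown here.
print(choose_weight_path("blocks.0.attn.to_q", INT8_SCHEME, ["*.proj_out"], "json"))  # int8
print(choose_weight_path("blocks.0.proj_out", INT8_SCHEME, ["*.proj_out"], "json"))  # bf16
```

Checking the fallback list before the dependency means a layer pinned to BF16 never needs the INT8 kernel; raising rather than silently degrading keeps a misconfigured environment visible.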

Validation

- Branch rebuilt on the latest ModelTC/LightX2V:main (89dfa833)
- git diff --check passed on the PR branch
- Validated as part of the HunyuanVideo1.5 I2V 8-card benchmark path on Hygon DCU

(cherry picked from commit a3a1a1f870b768929d8ca073f0c74added572087)

Add reusable quantized-input helpers for Hygon DCU W8A8 dynamic activation GEMMs, and support selective BF16 fallback for configured INT8 weights. (cherry picked from commit 58dab25b69c41c6ec9a24df0fe584ca93534eacc)
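The idea behind reusable quantized inputs can be illustrated with a small sketch: when several projections (for example Q, K and V) consume the same activation, quantizing it once and passing the INT8 tensor and its scales to every GEMM gives identical results with fewer quantization passes. All names and values below are hypothetical; the PR's actual helper signatures are not shown on this page.

```python
# Hypothetical sketch: quantize a shared activation once, reuse it for several projections.

quant_calls = 0  # counts activation quantization passes


def quantize_rows_int8(rows):
    """Symmetric per-row (per-token) INT8 quantization -> (q_rows, scales)."""
    global quant_calls
    quant_calls += 1
    q_rows, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(x_q, x_scale, w_q, w_scale):
    """Channelwise W8A8 matmul on pre-quantized inputs, dequantized to float."""
    return [
        [sum(a * b for a, b in zip(x_row, w_row)) * x_scale[m] * w_scale[n]
         for n, w_row in enumerate(w_q)]
        for m, x_row in enumerate(x_q)
    ]


def project_separately(x, weights):
    """Baseline: every projection re-quantizes the same input."""
    return [int8_linear(*quantize_rows_int8(x), w_q, w_s) for w_q, w_s in weights]


def project_shared(x, weights):
    """Shared: quantize the input once and pass (x_q, x_scale) to every projection."""
    x_q, x_scale = quantize_rows_int8(x)
    return [int8_linear(x_q, x_scale, w_q, w_s) for w_q, w_s in weights]


x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
# Pre-quantized Q/K/V weights with per-output-channel scales (made-up values).
weights = [
    ([[10, -20, 30]], [0.01]),
    ([[-5, 15, 25]], [0.02]),
    ([[7, 7, -7]], [0.03]),
]

quant_calls = 0
separate = project_separately(x, weights)
calls_separate = quant_calls

quant_calls = 0
shared = project_shared(x, weights)
calls_shared = quant_calls
```

Because activation quantization is deterministic, both paths produce the same outputs; the shared path simply does one quantization pass instead of one per projection.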

gemini-code-assist

Code Review

This pull request introduces a selective BF16 fallback mechanism and integrates hipblaslt_w8a8_channelwise_gemm for Hygon DCU into the quantization pipeline, along with helper functions for managing weights, biases, and environment flags. It also updates the module loading logic to support custom load functions. The review feedback highlights opportunities to improve performance and robustness: caching the cast weight tensor in the BF16 fallback path to avoid redundant casts, removing redundant .contiguous() calls on already-contiguous weights during GEMM execution, and checking that module.weight exists before using it, to prevent a potential AttributeError when loading auto-quantized biases.

gemini-code-assist · 2026-06-30T05:03:00Z

+    def _apply_bf16(self, input_tensor):
+        weight = self.weight
+        if weight.dtype != input_tensor.dtype:
+            weight = weight.to(input_tensor.dtype)
+        bias = _bias_or_none(self, input_tensor.dtype)
+        return F.linear(input_tensor, weight, bias)


Casting a large weight tensor on every single forward pass is extremely inefficient and will cause significant GPU memory churn and latency overhead. We should cache the cast weight back to self.weight so that subsequent forward passes can reuse it directly.

Suggested change

-    def _apply_bf16(self, input_tensor):
-        weight = self.weight
-        if weight.dtype != input_tensor.dtype:
-            weight = weight.to(input_tensor.dtype)
-        bias = _bias_or_none(self, input_tensor.dtype)
-        return F.linear(input_tensor, weight, bias)
+    def _apply_bf16(self, input_tensor):
+        if self.weight.dtype != input_tensor.dtype:
+            self.weight = self.weight.to(input_tensor.dtype)
+        bias = _bias_or_none(self, input_tensor.dtype)
+        return F.linear(input_tensor, self.weight, bias)
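A variant of this suggestion keeps the original weight and caches one cast copy per input dtype. Overwriting self.weight in place keeps a single copy in memory, but it recasts whenever the input dtype alternates and permanently replaces the original tensor, which can lose information if the new dtype cannot represent the original values exactly. A per-dtype cache avoids both at the cost of extra memory for each dtype seen. The sketch below uses a hypothetical CountingWeight stand-in instead of a real tensor so the number of casts can be observed; the class and method names are illustrative, not part of the PR.

```python
class CountingWeight:
    """Hypothetical tensor stand-in that records how many times it is cast."""

    def __init__(self, dtype):
        self.dtype = dtype
        self.casts = 0

    def to(self, dtype):
        self.casts += 1
        return CountingWeight(dtype)


class BF16FallbackWeight:
    """Sketch: keep the original weight, cache one cast copy per input dtype."""

    def __init__(self, weight):
        self.weight = weight  # original weight, never overwritten
        self._cast_cache = {}

    def weight_for(self, dtype):
        if self.weight.dtype == dtype:
            return self.weight
        if dtype not in self._cast_cache:
            self._cast_cache[dtype] = self.weight.to(dtype)  # cast once per dtype
        return self._cast_cache[dtype]


w = CountingWeight("bfloat16")
layer = BF16FallbackWeight(w)
for _ in range(5):  # five forward passes with float16 inputs
    layer.weight_for("float16")
```

In the real method this would presumably be called as F.linear(input_tensor, self.weight_for(input_tensor.dtype), bias), with weight_for being a hypothetical name.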

gemini-code-assist · 2026-06-30T05:03:00Z

+        bias = bias.to(torch.float32)
+    elif hasattr(module, "infer_dtype"):
+        bias = bias.to(module.infer_dtype)
+    module.bias = bias.to(module.weight.device)


To prevent a potential AttributeError if module.weight is None or not yet initialized, we should safely check for its existence and fall back to the bias tensor's own device.

Suggested change

-    module.bias = bias.to(module.weight.device)
+    device = module.weight.device if getattr(module, "weight", None) is not None else bias.device
+    module.bias = bias.to(device)
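The getattr form in this suggestion covers two failure modes at once: a module with no weight attribute at all and one whose weight is None. A bare module.weight is not None check would still raise AttributeError in the first case. A minimal sketch, using types.SimpleNamespace objects as hypothetical stand-ins for the module and tensors:

```python
from types import SimpleNamespace


def bias_target_device(module, bias):
    """Pick the weight's device when a weight exists, else keep the bias's own device."""
    weight = getattr(module, "weight", None)  # None if the attribute is missing or unset
    return weight.device if weight is not None else bias.device


bias = SimpleNamespace(device="cpu")  # hypothetical stand-in for a bias tensor
loaded = SimpleNamespace(weight=SimpleNamespace(device="cuda:0"))
unset = SimpleNamespace(weight=None)
missing = SimpleNamespace()  # no weight attribute at all

print(bias_target_device(loaded, bias), bias_target_device(unset, bias), bias_target_device(missing, bias))
# cuda:0 cpu cpu
```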

gemini-code-assist · 2026-06-30T05:03:00Z

+        _, output_tensor = hipblaslt_gemm(
+            a=input_tensor_quant.contiguous(),
+            b=self.weight.contiguous(),
+            scale_a=input_tensor_scale.contiguous(),
+            scale_b=self.weight_scale.contiguous(),


Since self.weight and self.weight_scale are already made contiguous during the load() phase (via _make_weight_contiguous), calling .contiguous() on them during every forward pass is redundant and adds unnecessary overhead.

Suggested change

-        _, output_tensor = hipblaslt_gemm(
-            a=input_tensor_quant.contiguous(),
-            b=self.weight.contiguous(),
-            scale_a=input_tensor_scale.contiguous(),
-            scale_b=self.weight_scale.contiguous(),
+        _, output_tensor = hipblaslt_gemm(
+            a=input_tensor_quant.contiguous(),
+            b=self.weight,
+            scale_a=input_tensor_scale.contiguous(),
+            scale_b=self.weight_scale,

zhenggf added 2 commits June 30, 2026 11:50

Enable-Hygon-DCU-INT8-hipBLASLt-GEMM

197934a

hygon-dcu: add shared int8 activation helpers

0cc4605

gemini-code-assist Bot reviewed Jun 30, 2026

fix: tighten Hygon INT8 GEMM fallback handling

1213895

starrkk wants to merge 3 commits into ModelTC:main from starrkk:codex/hygon-dcu-int8-hipblaslt
