fix ub overflow in move_cache_dynamic_last_kernel_h_block kernel #477

Open
Li-brua wants to merge 1 commit into sgl-project:main from Li-brua:fix-ub-overflow


Li-brua commented May 15, 2026

Fix Triton compilation failure caused by UB overflow in
move_cache_dynamic_last_kernel_h_block on Ascend NPUs. See issue: sgl-project/sglang#25330

The previous implementation only tiled the H dimension while fully materializing
V x K blocks:

BLOCK_V = triton.next_power_of_2(V)
BLOCK_K = triton.next_power_of_2(K)

For large V/K values, this could generate oversized temporary buffers during
BiShengHIR lowering and fail with:

ub overflow, requires 2097152 bits while 1572864 bits available
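To put the numbers in the error message into perspective, here is a rough sketch of the arithmetic. The helper `next_power_of_2` is a pure-Python stand-in for `triton.next_power_of_2`, and the shape (V = 200, K = 256) and the 32-bit element type are assumptions chosen for illustration; the PR does not state the shapes or dtypes of the failing case, and the compiler may allocate further temporaries.

```python
def next_power_of_2(n: int) -> int:
    """Pure-Python stand-in for triton.next_power_of_2 (assumes n >= 1)."""
    return 1 << (n - 1).bit_length()


def bits_to_kib(bits: int) -> int:
    """Convert a size in bits to KiB."""
    return bits // 8 // 1024


def full_tile_bits(v: int, k: int, elem_bits: int = 32) -> int:
    """Bits needed for one fully materialized BLOCK_V x BLOCK_K tile."""
    block_v = next_power_of_2(v)
    block_k = next_power_of_2(k)
    return block_v * block_k * elem_bits


# Figures copied from the error message.
REQUIRED_BITS = 2097152
AVAILABLE_BITS = 1572864

print(bits_to_kib(REQUIRED_BITS), "KiB required")    # 256 KiB
print(bits_to_kib(AVAILABLE_BITS), "KiB available")  # 192 KiB

# Hypothetical shape: V = 200 is padded up to 256 by next_power_of_2.
hypothetical_bits = full_tile_bits(v=200, k=256)
print(hypothetical_bits > AVAILABLE_BITS)  # True: one such tile alone does not fit
```

Note that rounding to the next power of two can nearly double a dimension, so the materialized tile may be considerably larger than V x K itself.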

This PR additionally tiles the V and K dimensions to reduce per-tile UB usage and adds upper bounds on the Triton block sizes to avoid the overflow.
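A minimal sketch of the block-size selection this describes, written in plain Python so it runs without Triton. The caps `max_block_v` and `max_block_k`, the helper names, and the 32-bit element size are hypothetical; the actual bounds used in the PR are not shown in this description.

```python
def next_power_of_2(n: int) -> int:
    """Pure-Python stand-in for triton.next_power_of_2 (assumes n >= 1)."""
    return 1 << (n - 1).bit_length()


def cdiv(a: int, b: int) -> int:
    """Ceiling division, the same role triton.cdiv plays when sizing a grid."""
    return (a + b - 1) // b


def pick_tiles(v: int, k: int, max_block_v: int = 64, max_block_k: int = 64):
    """Clamp the block sizes and return how many tiles cover V and K.

    The caps of 64 are assumed values for illustration only.
    """
    block_v = min(next_power_of_2(v), max_block_v)
    block_k = min(next_power_of_2(k), max_block_k)
    return block_v, block_k, cdiv(v, block_v), cdiv(k, block_k)


AVAILABLE_BITS = 1572864  # from the error message

block_v, block_k, tiles_v, tiles_k = pick_tiles(v=256, k=256)
per_tile_bits = block_v * block_k * 32  # assumes 32-bit elements

print(block_v, block_k, tiles_v, tiles_k)  # 64 64 4 4
print(per_tile_bits <= AVAILABLE_BITS)     # True
```

With capped blocks, the kernel would loop over (or launch a grid of) `tiles_v * tiles_k` tiles and mask the out-of-range elements in the last tile of each dimension, trading a larger number of smaller loads and stores for a bounded per-tile footprint.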


Li-brua force-pushed the fix-ub-overflow branch from 7f5e6d6 to b19891c on May 15, 2026 02:15
