[https://nvbugs/6276981][fix] Force the q-split + allgather code path whenever q_split_eligible=True (drop… by tensorrt-cicd · Pull Request #15474 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-06-18T07:24:35Z

Summary

Root cause: When q_split_eligible=True but apply_q_split=False (chunk smaller than the threshold), each TP rank ran fp8_mqa_logits independently on the full chunk. The DeepGEMM kernel is not bit-exact across launches, so per-rank topk indices diverged. Downstream MLA attention then attended to different KV positions on different ranks, corrupting KV-cache writes.
Fix: Force the q-split + allgather code path whenever q_split_eligible=True (drop the chunk_num_token >= q_split_threshold gate). The per-token canonical owner from slice + allgather erases per-rank nondeterminism.
Automated fix generated by repair-bot
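
The gate change described above can be sketched as follows. This is a minimal illustration, not the actual TensorRT-LLM code: the function names and the concrete values (4 ranks, a threshold of 1024, a 128-token chunk) are hypothetical, and the eligibility conditions are taken from the commit message (TP > 1, no attention DP, a negative q_split_threshold disables eligibility).

```python
def q_split_eligible(tp_size: int, attention_dp: bool, q_split_threshold: int) -> bool:
    # Assumed eligibility rule, per the commit message: TP > 1, no attention DP,
    # and q_split_threshold < 0 disables the feature entirely.
    return tp_size > 1 and not attention_dp and q_split_threshold >= 0


def apply_q_split_before(eligible: bool, chunk_num_token: int, q_split_threshold: int) -> bool:
    # Old gate: chunks below the threshold fell through to the path where
    # every rank computes logits for the full chunk on its own.
    return eligible and chunk_num_token >= q_split_threshold


def apply_q_split_after(eligible: bool, chunk_num_token: int, q_split_threshold: int) -> bool:
    # New gate: eligibility alone selects the q-split + allgather path;
    # chunk size no longer matters.
    return eligible


# Hypothetical configuration: 4 TP ranks, threshold 1024, a 128-token chunk.
eligible = q_split_eligible(tp_size=4, attention_dp=False, q_split_threshold=1024)
small_chunk_before = apply_q_split_before(eligible, 128, 1024)
small_chunk_after = apply_q_split_after(eligible, 128, 1024)
```

Under these assumptions the small chunk skips q-split with the old gate and takes it with the new one, while a negative threshold keeps both gates off.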

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6276981

Summary by CodeRabbit

Release Notes

Performance
- Refined sparse attention path selection logic during prefill operations with distributed tensor configurations. Chunk processing now consistently applies distributed synchronization when specific eligibility criteria are met.
Tests
- Re-enabled a previously skipped test case covering multi-GPU inference with chunked prefill, strengthening quality assurance for distributed configurations.

coderabbitai · 2026-06-18T07:28:42Z

Caution

Review failed

An error occurred during the review process. Please try again later.


…n eligible

When the indexer chunked-prefill is gated by q_split_eligible (TP > 1, no attention DP) but apply_q_split is False (chunk smaller than q_split_threshold), every TP rank computes the full chunk's topk indices independently via fp8_mqa_logits / fp8_fp4_mqa_logits. Those DeepGEMM kernels are not bit-exact across launches, so per-rank topk indices diverge for the same tokens. The downstream MLA attention then attends to different KV positions on different ranks, corrupting KV-cache writes. Short generations (MMLU's 2-token answers) hide it; long ones (GSM8K's 256 tokens) compound it into garbage and 0% accuracy.

Force the q-split + allgather path whenever eligible: small chunks pay a microscopic allgather instead of redundant per-rank logits compute, and the per-token canonical owner from the slice + allgather erases any rank-local nondeterminism before downstream layers read the indices. q_split_threshold < 0 still fully disables eligibility.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
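
The failure mode and the fix can be reproduced in a toy, single-process simulation. This is a sketch under stated assumptions, not the real kernels or collectives: `noisy_logits` is a hypothetical stand-in for a non-bit-exact logits kernel (it injects a tiny rank-dependent perturbation into two near-tied scores), and the allgather is modeled as a plain concatenation in rank order.

```python
def topk_indices(scores, k):
    # Indices of the k largest scores; ties go to the lower index.
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def noisy_logits(token, rank):
    # Hypothetical stand-in for a kernel that is not bit-exact across launches:
    # KV positions 0 and 1 are nearly tied, and a rank-dependent perturbation
    # of 1e-6 is enough to flip their order.
    scores = [1.0, 1.0 + 1e-7, 0.5, 0.2]
    scores[0] += (rank % 2) * 1e-6
    return scores


def independent_path(tokens, tp_size, k):
    # Old small-chunk behavior: every rank scores the full chunk on its own.
    return [[topk_indices(noisy_logits(t, r), k) for t in tokens]
            for r in range(tp_size)]


def q_split_allgather_path(tokens, tp_size, k):
    # Each rank scores only its contiguous slice of tokens, so every token
    # has exactly one owner rank.
    per_rank = -(-len(tokens) // tp_size)  # ceiling division
    slices = [
        [topk_indices(noisy_logits(t, r), k)
         for t in tokens[r * per_rank:(r + 1) * per_rank]]
        for r in range(tp_size)
    ]
    # Modeled allgather: concatenate slices in rank order; every rank
    # receives the same gathered result.
    gathered = [idx for s in slices for idx in s]
    return [list(gathered) for _ in range(tp_size)]


tokens = list(range(4))
independent = independent_path(tokens, tp_size=2, k=1)
gathered = q_split_allgather_path(tokens, tp_size=2, k=1)
```

In this toy setup the two ranks disagree on every token in the independent path, while after the slice + allgather both ranks hold one identical index list. The gathered indices still carry each owner's perturbation; what the fix removes is the disagreement between ranks, not the noise itself.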

tensorrt-cicd requested a review from a team as a code owner June 18, 2026 07:24

tensorrt-cicd requested a review from PerkzZheng June 18, 2026 07:24

tensorrt-cicd assigned nvxuanyuc Jun 18, 2026

github-actions Bot assigned tensorrt-cicd Jun 18, 2026

tensorrt-cicd force-pushed the repair-bot-bug6276981 branch 3 times, most recently from 290fdfd to 9b45f57 Compare June 25, 2026 11:00

tensorrt-cicd force-pushed the repair-bot-bug6276981 branch from 9b45f57 to 69844b1 Compare June 28, 2026 23:06

tensorrt-cicd wants to merge 1 commit into NVIDIA:main from tensorrt-cicd:repair-bot-bug6276981
