
[#14679][fix] Fix fused-QKV TP sharding for Phi-3/Phi-4 #15475

Open
guan404ming wants to merge 1 commit into NVIDIA:main from guan404ming:fix/autodeploy-phi4-fused-qkv-tp-sharding

Conversation

guan404ming commented Jun 18, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Enabled Microsoft Phi-4 model variants (microsoft/phi-4, microsoft/Phi-4-reasoning, microsoft/Phi-4-reasoning-plus) for automatic deployment.
  • Bug Fixes

    • Fixed tensor parallel sharding computation for fused weight operations to ensure correct dimension calculations.

Description

Closes #14679

_determine_fused_weight_dims computed the q/k/v split sizes but never returned them, so tensor-parallel column sharding of a fused qkv_proj received None instead of the split sizes. This broke Phi-3/Phi-4 at TP≥2 with a reduction-dimension mismatch ([s44*s70, 3840] X [2560, 5120]).
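
To make the role of the split sizes concrete, here is a minimal pure-Python sketch of column-sharding a fused QKV projection. It is not the code in sharding.py: the helper names are hypothetical, and the sizes q=5120, k=v=1280 are an assumption, chosen so that the fused width is 7680 and each of two ranks holds 3840 output columns, matching the 3840 in the error above.

```python
from typing import List, Tuple


def split_aware_shard(dims: List[int], rank: int, world_size: int) -> List[Tuple[int, int]]:
    """Return the [start, end) output-column ranges of a fused weight owned by `rank`.

    `dims` holds the widths of the fused sections, e.g. [q, k, v].
    Each section is sharded on its own, so every rank keeps a slice of q, k and v.
    """
    ranges = []
    offset = 0
    for width in dims:
        if width % world_size != 0:
            raise ValueError(f"section width {width} not divisible by {world_size}")
        per_rank = width // world_size
        start = offset + rank * per_rank
        ranges.append((start, start + per_rank))
        offset += width
    return ranges


def naive_shard(total: int, rank: int, world_size: int) -> Tuple[int, int]:
    """Contiguous chunking that ignores section boundaries."""
    per_rank = total // world_size
    return (rank * per_rank, (rank + 1) * per_rank)


# Assumed sizes (hypothetical, not stated in the PR): q=5120, k=1280, v=1280.
dims = [5120, 1280, 1280]
print(split_aware_shard(dims, rank=0, world_size=2))  # [(0, 2560), (5120, 5760), (6400, 7040)]
print(split_aware_shard(dims, rank=1, world_size=2))  # [(2560, 5120), (5760, 6400), (7040, 7680)]
print(naive_shard(sum(dims), rank=0, world_size=2))   # (0, 3840): q columns only
```

Both schemes give each rank 3840 columns, but only the split-aware one leaves every rank with a usable q, k and v slice. Under the assumed sizes, the per-rank q width of 2560 is consistent with the 2560 in the [2560, 5120] operand of the error.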

Test Coverage

  • test_determine_fused_weight_dims_qkv -> New regression test. Exports a fused-QKV block and asserts _determine_fused_weight_dims returns the [q, k, v] split sizes (not None).
  • test_tp_sharding.py -> Guards that the broader column/row TP-sharding path still produces correct sharded outputs.
  • Manually tested on a local L4 GPU; worked as expected.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@guan404ming guan404ming requested a review from a team as a code owner June 18, 2026 10:29
Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
@guan404ming guan404ming force-pushed the fix/autodeploy-phi4-fused-qkv-tp-sharding branch from 6f022ca to 302dcbe Compare June 23, 2026 15:38
coderabbitai Bot commented Jun 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 27bdac07-4361-4aab-8911-16473de8c9db

📥 Commits

Reviewing files that changed from the base of the PR and between beb922f and 302dcbe.

📒 Files selected for processing (3)
  • examples/auto_deploy/model_registry/models.yaml
  • tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py
  • tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py

📝 Walkthrough


Fixes a bug in _determine_fused_weight_dims where the function fell through without returning its computed fused_weight_dims, yielding None to callers and causing shape mismatches. The fix corrects the return annotation and adds an explicit return. A regression test is added, and the three disabled Phi-4 model registry entries are re-enabled.
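
The failure mode described here is a general Python pitfall: a function whose body never reaches a return statement yields None. The sketch below reproduces the pattern with hypothetical stand-in functions (they are not the real _determine_fused_weight_dims, which operates on FX graph nodes, and the "more than one slice" condition is an assumption made for illustration). It also shows the kind of assertion a regression test can make.

```python
from typing import List, Optional


def fused_dims_buggy(slice_widths: List[int]) -> None:
    """Stand-in for the pre-fix behaviour: computes the dims, then falls through."""
    fused_weight_dims = list(slice_widths) if len(slice_widths) > 1 else None
    # No return statement: callers receive None regardless of what was computed.


def fused_dims_fixed(slice_widths: List[int]) -> Optional[List[int]]:
    """Stand-in for the post-fix behaviour: returns the dims, or None if not fused."""
    fused_weight_dims = list(slice_widths) if len(slice_widths) > 1 else None
    return fused_weight_dims


widths = [64, 16, 16]  # hypothetical q, k, v widths
print(fused_dims_buggy(widths))  # None
print(fused_dims_fixed(widths))  # [64, 16, 16]

# Regression-style check: the split sizes must come back, not None.
assert fused_dims_fixed(widths) == [64, 16, 16]
```

Annotating the return type as Optional[List[int]] rather than None also lets a static type checker flag callers that use the result without handling the None case.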

Changes

Phi-4 Shape Mismatch Fix

| Layer / File(s) | Summary |
| --- | --- |
| _determine_fused_weight_dims bug fix and regression test: tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py, tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py | Return type annotation changed from None to Optional[List[int]] and an explicit return fused_weight_dims added. A new _FusedQKVProj model and test_determine_fused_weight_dims_qkv regression test export an FX graph, locate QKV slice nodes, and assert the returned split sizes [q, k, v] are correct. |
| Re-enable Phi-4 model registry entries: examples/auto_deploy/model_registry/models.yaml | microsoft/phi-4, microsoft/Phi-4-reasoning, and microsoft/Phi-4-reasoning-plus are uncommented and activated with config_id: default_ws_2 and yaml_extra pointing to dashboard_default.yaml and world_size_2.yaml. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly identifies the fix for fused-QKV TP sharding issue affecting Phi-3/Phi-4 models, directly corresponding to the main change in the changeset. |
| Description check | ✅ Passed | The PR description includes a clear explanation of the issue, the solution, comprehensive test coverage details, and a completed PR checklist following the template requirements. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #14679 by fixing the _determine_fused_weight_dims function to return split sizes, enabling proper TP sharding for Phi-4 models, and re-enabling the three disabled model entries. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the fused-QKV TP sharding bug: the core fix in sharding.py, regression test in test_tp_sharding.py, and model registry updates are all within scope. |


@guan404ming


Hi @govind-ramnarayan, could you help take a look at this one? Thanks!

Development

Successfully merging this pull request may close these issues.

[bug][AutoDeploy]: Phi-4 path fails with a shape mismatch error
