[#14679][fix] Fix fused-QKV TP sharding for Phi-3/Phi-4#15475
[#14679][fix] Fix fused-QKV TP sharding for Phi-3/Phi-4#15475guan404ming wants to merge 1 commit into
Conversation
Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
6f022ca to
302dcbe
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughFixes a bug in ChangesPhi-4 Shape Mismatch Fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Hi @govind-ramnarayan could you help take a look at this one, thanks! |
Summary by CodeRabbit
Release Notes
New Features
microsoft/phi-4,microsoft/Phi-4-reasoning,microsoft/Phi-4-reasoning-plus) for automatic deployment.Bug Fixes
Description
close #14679
_determine_fused_weight_dimscomputed the q/k/v split sizes but never returned them, so tensor-parallel column sharding of a fused qkv_proj got None and broke Phi-3/Phi-4 at TP≥2 (reduction-dim mismatch [s44*s70, 3840] X [2560, 5120]).Test Coverage
test_determine_fused_weight_dims_qkv-> New regression test. Exports a fused-QKV block and asserts_determine_fused_weight_dimsreturns the [q, k, v] split sizes (not None).test_tp_sharding.py-> Guards that the broader column/row TP-sharding path still produces correct sharded outputsPR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.