[None][chore] Allow fp8 per-tensor base weights for MoE LoRA#15528
[None][chore] Allow fp8 per-tensor base weights for MoE LoRA#15528brb-nv wants to merge 1 commit into
Conversation
0c686b2 to
9028723
Compare
9028723 to
cde2b98
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #56244 [ run ] triggered by Bot. Commit: |
📝 WalkthroughWalkthroughExtends MoE LoRA to support per-tensor FP8 (qdq) quantized base weights. Python validation gains a ChangesMoE LoRA FP8 Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/unittest/_torch/lora/test_moe_lora_validator.py (1)
93-147: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a test for the new quant-wrapper normalization path.
These cases only exercise plain
QuantModeinputs, buttensorrt_llm/_torch/peft/lora/validation.pynow has a separate.objsnormalization branch. Please add a tiny fake wrapper intests/unittest/_torch/lora/test_moe_lora_validator.pyand cover at least one accepted mix ([QuantMode(0), _FP8_QDQ]) and one rejected mix ([_FP8_QDQ, _FP8_BLOCK_SCALE]) so that branch cannot regress silently.As per path instructions, “Act as a QA engineer reviewing test changes and coverage” and “Keep feedback actionable”; coverage here is insufficient for the new wrapper path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/lora/test_moe_lora_validator.py` around lines 93 - 147, Add coverage for the new quant-wrapper normalization branch in check_moe_lora_supported: the current tests only pass plain QuantMode values, so create a small fake wrapper with an objs field in test_moe_lora_validator.py and use it to exercise the .objs path in tensorrt_llm/_torch/peft/lora/validation.py. Add one passing case with a mixed list such as QuantMode(0) plus _FP8_QDQ, and one failing case with _FP8_QDQ plus _FP8_BLOCK_SCALE, so the wrapper normalization logic is validated for both accepted and rejected inputs.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/defs/llmapi/test_llm_api_pytorch_moe_lora.py`:
- Around line 212-217: The mixed-batch no-LoRA row is only being checked for
non-empty output, which can miss cross-request adapter contamination. Update the
assertions in the relevant mixed-batch test cases that use out_tokens and
base_tokens so the rank-0/no-LoRA row is compared directly against the
standalone base_tokens baseline, while keeping the existing per-LoRA row checks
intact. Use the mixed-batch assertions around out_tokens[0] and the standalone
baseline variables to ensure any accidental adapter application fails loudly.
In `@tests/integration/test_lists/test-db/l0_h100.yml`:
- Line 259: The pre-merge H100 test list still only covers the Qwen bf16 path,
so add the new Mixtral routed-expert FP8 case to the pre-merge tier in the
test-db list, or include an equivalent pre-merge FP8 test alongside it. Update
the relevant entry in the H100 list used by the auto-run CI test-db so the FP8
enablement is exercised before merge, and ensure the chosen test name matches
the existing Mixtral FP8 case referenced in the integration suite.
In `@tests/unittest/llmapi/test_llm_pytorch.py`:
- Around line 1096-1100: The module defines TestLlmError twice, so the later
class overwrites the earlier one and pytest only collects one suite. Rename one
of the TestLlmError classes in tests/unittest/llmapi/test_llm_pytorch.py,
keeping the existing error-path tests like test_max_num_token_check under
distinct class names so both suites are discoverable. Verify collection includes
both classes and that coverage is sufficient after the rename.
---
Nitpick comments:
In `@tests/unittest/_torch/lora/test_moe_lora_validator.py`:
- Around line 93-147: Add coverage for the new quant-wrapper normalization
branch in check_moe_lora_supported: the current tests only pass plain QuantMode
values, so create a small fake wrapper with an objs field in
test_moe_lora_validator.py and use it to exercise the .objs path in
tensorrt_llm/_torch/peft/lora/validation.py. Add one passing case with a mixed
list such as QuantMode(0) plus _FP8_QDQ, and one failing case with _FP8_QDQ plus
_FP8_BLOCK_SCALE, so the wrapper normalization logic is validated for both
accepted and rejected inputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4522f74d-ca40-4f79-9f77-7659393a283a
📒 Files selected for processing (11)
cpp/tensorrt_llm/thop/moeOp.cpptensorrt_llm/_torch/models/modeling_mixtral.pytensorrt_llm/_torch/peft/lora/validation.pytests/integration/defs/llmapi/test_llm_api_pytorch_moe_lora.pytests/integration/test_lists/qa/llm_function_core.txttests/integration/test_lists/test-db/l0_a100.ymltests/integration/test_lists/test-db/l0_h100.ymltests/integration/test_lists/waives.txttests/unittest/_torch/lora/test_moe_lora_op.pytests/unittest/_torch/lora/test_moe_lora_validator.pytests/unittest/llmapi/test_llm_pytorch.py
💤 Files with no reviewable changes (2)
- tests/integration/test_lists/test-db/l0_a100.yml
- tests/integration/test_lists/waives.txt
| assert out_tokens[0], "No-LoRA row in the mixed batch produced no tokens." | ||
| for i in range(len(lora_requests)): | ||
| assert out_tokens[i + 1] != base_tokens, ( | ||
| f"Quantized routed-expert MoE LoRA adapter {i} produced output " | ||
| "identical to the base model; it was not applied." | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Assert that the mixed-batch no-LoRA row matches the standalone baseline.
Both sites only check that out_tokens[0] is non-empty. That misses the isolation failure these tests are meant to catch: the rank-0 row can accidentally pick up another request's adapter and still pass. Compare out_tokens[0] to base_tokens in both places so cross-request contamination fails loudly.
Suggested fix
- assert out_tokens[0], "No-LoRA row in the mixed batch produced no tokens."
+ assert out_tokens[0] == base_tokens, (
+ "No-LoRA row in the mixed batch diverged from the standalone base output."
+ )- assert out_tokens[0], "No-LoRA row in the mixed batch produced no tokens."
+ assert out_tokens[0] == base_tokens, (
+ "No-LoRA row in the mixed batch diverged from the standalone base output."
+ )As per path instructions, “Act as a QA engineer reviewing test changes and coverage” and “Keep feedback actionable”; coverage is insufficient until the no-LoRA row is validated against the base output.
Also applies to: 345-353
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/defs/llmapi/test_llm_api_pytorch_moe_lora.py` around lines
212 - 217, The mixed-batch no-LoRA row is only being checked for non-empty
output, which can miss cross-request adapter contamination. Update the
assertions in the relevant mixed-batch test cases that use out_tokens and
base_tokens so the rank-0/no-LoRA row is compared directly against the
standalone base_tokens baseline, while keeping the existing per-LoRA row checks
intact. Use the mixed-batch assertions around out_tokens[0] and the standalone
baseline variables to ensure any accidental adapter application fails loudly.
Source: Path instructions
| - unittest/llmapi/test_llm_pytorch.py -m "part2" | ||
| - unittest/llmapi/test_llm_pytorch.py -m "part3" | ||
| - unittest/llmapi/test_llm_pytorch.py::test_qwen_moe_routed_expert_multi_lora_varying_ranks TIMEOUT (90) | ||
| - llmapi/test_llm_api_pytorch_moe_lora.py::test_qwen_moe_routed_expert_multi_lora_varying_ranks TIMEOUT (90) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Promote an FP8 Mixtral case into pre-merge H100 coverage.
The pre-merge H100 entry still exercises only the Qwen bf16 path, while the new Mixtral routed-expert FP8 case is post-merge only. That leaves the PR’s headline FP8 enablement unguarded before merge. Please add one FP8 Mixtral case to the pre-merge tier, or add an equivalent pre-merge FP8 test.
As per path instructions, files under tests/integration/test_lists/test-db/** “drive the auto-run CI test-db”; coverage here is insufficient for the new FP8 path.
Also applies to: 334-334
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/integration/test_lists/test-db/l0_h100.yml` at line 259, The pre-merge
H100 test list still only covers the Qwen bf16 path, so add the new Mixtral
routed-expert FP8 case to the pre-merge tier in the test-db list, or include an
equivalent pre-merge FP8 test alongside it. Update the relevant entry in the
H100 list used by the auto-run CI test-db so the FP8 enablement is exercised
before merge, and ensure the chosen test name matches the existing Mixtral FP8
case referenced in the integration suite.
Source: Path instructions
| class TestLlmError: | ||
|
|
||
| @pytest.mark.part3 | ||
| def test_max_num_token_check(self): | ||
| """ LLM should raise error when got prompt length exceed the valid range. """ | ||
| """LLM should raise error when got prompt length exceed the valid range.""" |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Rename one of the TestLlmError classes so both suites are collected.
This module defines TestLlmError twice. The second definition at Line 1394 overwrites the earlier one at Line 1096 during import, so pytest only collects the later class and silently drops the earlier error-path coverage in tests/unittest/llmapi/test_llm_pytorch.py. Coverage is insufficient until both classes are discoverable.
Proposed fix
-class TestLlmError:
+class TestLlmErrorMaxNumTokens:As per path instructions, "tests/**: Act as a QA engineer reviewing test changes and coverage for TensorRT-LLM. Keep feedback actionable: suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR."
Also applies to: 1394-1398
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unittest/llmapi/test_llm_pytorch.py` around lines 1096 - 1100, The
module defines TestLlmError twice, so the later class overwrites the earlier one
and pytest only collects one suite. Rename one of the TestLlmError classes in
tests/unittest/llmapi/test_llm_pytorch.py, keeping the existing error-path tests
like test_max_num_token_check under distinct class names so both suites are
discoverable. Verify collection includes both classes and that coverage is
sufficient after the rename.
Source: Path instructions
|
PR_Github #56244 [ run ] completed with state
|
cde2b98 to
a0d30c4
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #56418 [ run ] triggered by Bot. Commit: |
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
a0d30c4 to
ca2bb08
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #56435 [ run ] triggered by Bot. Commit: |
|
PR_Github #56418 [ run ] completed with state |
|
PR_Github #56435 [ run ] completed with state
|
Description
This MR does the following in context of supporting MoE lora:
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.Summary by CodeRabbit
New Features
Bug Fixes