[https://nvbugs/5969216][fix] Ministral3 loading fix #12743
evezhier wants to merge 9 commits into NVIDIA:main
Conversation
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Force-pushed fc87519 to c66b9a2
📝 Walkthrough
The changes extend the Hugging Face weight loader to support consolidated checkpoint selection via a new …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_mistral.py (1)
373-379: ⚠️ Potential issue | 🔴 Critical
Text-only mistral_large_3 requests will now fail. Line 379 makes self.text_processor a MistralCommonImageProcessor, but the text-only path later calls it without an images argument. Since MistralCommonImageProcessor.__call__ requires images, text-only preprocessing now raises TypeError.
Minimal fix:
- def __call__(self, text, images, **kwargs):
+ def __call__(self, text, images=None, **kwargs):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_mistral.py` around lines 373 - 379, The current branch for model_type == "mistral_large_3" assigns self.text_processor to a MistralCommonImageProcessor which requires an images arg and breaks text-only paths; instead instantiate or assign a text-only processor to self.text_processor (e.g., a MistralTextProcessor or tokenizer-backed text processor) while keeping self._processor as the image-capable MistralCommonImageProcessor so text-only preprocessing (calls to self.text_processor(...)) do not require an images parameter.
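The signature change the review suggests can be sketched in isolation. The class body below is a stand-in (the real MistralCommonImageProcessor in TensorRT-LLM does far more); it only demonstrates why defaulting `images=None` restores the text-only path.

```python
# Sketch of the suggested fix; the class body is a stand-in for the real
# MistralCommonImageProcessor and only demonstrates the signature change.
class MistralCommonImageProcessor:
    def __call__(self, text, images=None, **kwargs):
        # With images optional, text-only callers may simply omit the
        # argument instead of raising TypeError.
        if images is None:
            return {"text": text}
        return {"text": text, "images": images}

processor = MistralCommonImageProcessor()
assert processor("hello") == {"text": "hello"}               # text-only path works
assert processor("hello", images=["img0"])["images"] == ["img0"]
```

The alternative the comment prefers (a separate text-only processor on `self.text_processor`) avoids the optional argument entirely, at the cost of maintaining two processor objects.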
🧹 Nitpick comments (1)
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py (1)
7-47: Extend this regression test to cover the new mapper paths. This test only exercises rename_by_params_map(). The PR also changes permute_qk() and adds the k_fake_quantizer.qscale_act/v_fake_quantizer.qscale_act mappings, but neither path is asserted here, so the new loading logic can still regress without failing this test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py` around lines 7 - 47, The test currently only covers rename_by_params_map but must also exercise the new permute_qk path and the added k_fake_quantizer.qscale_act / v_fake_quantizer.qscale_act mappings; update test_rename_by_params_map to include source keys for both "layers.0.attention.k_fake_quantizer.qscale_act" and "layers.0.attention.v_fake_quantizer.qscale_act" (so they map to the new targets in mapper.mistral_llm_mapping) and add an assertion that mapper.permute_qk was applied for Q/K tensors by invoking mapper.permute_qk on a mock q/k pair (or include q/k-shaped tensors in input_weights and assert their shapes/ordering in result) using the MistralWeightMapper instance and the mistral_llm_mapping so the permute_qk code path and the kv fake-quantizer mappings are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Line 509: The condition using getattr(config, "input_processor_type", None) in
("mistral_large_3") uses substring membership because ("mistral_large_3") is a
string; change it to an exact equality check by comparing getattr(config,
"input_processor_type", None) == "mistral_large_3" so only the exact value
matches (update the conditional in modeling_mistral.py where
input_processor_type is checked).
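The gotcha behind this finding is easy to demonstrate: parentheses without a trailing comma do not create a tuple, so `in` falls back to substring matching on the string.

```python
# ("mistral_large_3") is just a parenthesized string, not a 1-tuple,
# so `x in ("mistral_large_3")` is a substring test via str.__contains__.
assert ("mistral_large_3") == "mistral_large_3"

# Substring membership: any fragment of the name matches, which is the bug.
assert "large_3" in ("mistral_large_3")

# A real tuple (note the comma) or plain equality gives exact matching only.
assert "large_3" not in ("mistral_large_3",)
assert "mistral_large_3" in ("mistral_large_3",)
assert "large_3" != "mistral_large_3"
```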
In `@tensorrt_llm/_torch/pyexecutor/config_utils.py`:
- Around line 284-288: The code is assigning the string "MinistralConfig" to
config_class and then calling .from_pretrained on that string; replace this with
the actual HF class and call its from_pretrained method — change the branch
where model_type == "mistral3" to use Mistral3Config (imported from
transformers) and call Mistral3Config.from_pretrained(model_name_or_path,
**kwargs) to produce model_config; ensure the top of the module imports
Mistral3Config (or conditionally import it) instead of relying on a string in
the variable config_class.
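A minimal sketch of the pattern the comment asks for, with a dummy class standing in for transformers' Mistral3Config (which is not imported here): dispatch on class objects, so from_pretrained is a real classmethod rather than an attribute lookup on a string.

```python
class DummyMistral3Config:
    """Stand-in for transformers.Mistral3Config (assumption, not imported)."""

    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs):
        # The real method would load config.json from the checkpoint.
        cfg = cls()
        cfg.name_or_path = model_name_or_path
        return cfg

# Map model_type to class objects, never to class-name strings.
CONFIG_CLASSES = {"mistral3": DummyMistral3Config}

def load_model_config(model_type, model_name_or_path, **kwargs):
    config_class = CONFIG_CLASSES[model_type]  # a class, not "MinistralConfig"
    return config_class.from_pretrained(model_name_or_path, **kwargs)

cfg = load_model_config("mistral3", "some/checkpoint")  # hypothetical path
assert isinstance(cfg, DummyMistral3Config)

# The original bug in one line: strings have no from_pretrained method.
assert not hasattr("MinistralConfig", "from_pretrained")
```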
In `@tensorrt_llm/llmapi/llm_utils.py`:
- Line 434: Replace the truthiness check on
hf_quant_config.get("weight_block_size") with an explicit presence check so a
present-but-falsey value still selects block-scales instead of falling through
to QuantAlgo.FP8; specifically, update the conditional around hf_quant_config in
llm_utils.py (the block that decides between block-scales and QuantAlgo.FP8) to
use either "'weight_block_size' in hf_quant_config" or
"hf_quant_config.get('weight_block_size') is not None" when deciding the code
path for block-scales handling.
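The distinction the comment draws can be shown directly: a key holding a falsey value (such as an empty list) passes a presence check but fails a truthiness check, so the two conditionals select different branches.

```python
# A present-but-falsey value defeats a truthiness check on dict.get().
hf_quant_config = {"weight_block_size": []}  # key present, value falsey

# Truthiness check: evaluates False, wrongly falling through to FP8 handling.
assert not hf_quant_config.get("weight_block_size")

# Presence checks: correctly select the block-scales branch.
assert "weight_block_size" in hf_quant_config
assert hf_quant_config.get("weight_block_size") is not None
```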
In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`:
- Around line 1-4: Add the standard NVIDIA file header to the top of this new
test file so it includes the repository copyright/ownership notice and updated
year; modify the file
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py to
prepend the standard NVIDIA header block above the imports (before pytest,
torch, and the MistralWeightMapper import) ensuring the header format matches
other files in the repo.
---
Outside diff comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Around line 373-379: The current branch for model_type == "mistral_large_3"
assigns self.text_processor to a MistralCommonImageProcessor which requires an
images arg and breaks text-only paths; instead instantiate or assign a text-only
processor to self.text_processor (e.g., a MistralTextProcessor or
tokenizer-backed text processor) while keeping self._processor as the
image-capable MistralCommonImageProcessor so text-only preprocessing (calls to
self.text_processor(...)) do not require an images parameter.
---
Nitpick comments:
In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`:
- Around line 7-47: The test currently only covers rename_by_params_map but must
also exercise the new permute_qk path and the added k_fake_quantizer.qscale_act
/ v_fake_quantizer.qscale_act mappings; update test_rename_by_params_map to
include source keys for both "layers.0.attention.k_fake_quantizer.qscale_act"
and "layers.0.attention.v_fake_quantizer.qscale_act" (so they map to the new
targets in mapper.mistral_llm_mapping) and add an assertion that
mapper.permute_qk was applied for Q/K tensors by invoking mapper.permute_qk on a
mock q/k pair (or include q/k-shaped tensors in input_weights and assert their
shapes/ordering in result) using the MistralWeightMapper instance and the
mistral_llm_mapping so the permute_qk code path and the kv fake-quantizer
mappings are covered.
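A hedged sketch of the extra coverage the comment asks for. The real MistralWeightMapper API is not reproduced here: rename_by_params_map below is a stand-in that applies a source-to-target key map, and the mapping targets are illustrative, not the actual entries of mistral_llm_mapping.

```python
def rename_by_params_map(params_map, weights):
    """Stand-in for the mapper's renaming step: substring-based key rewrite."""
    renamed = {}
    for key, value in weights.items():
        for src, dst in params_map.items():
            if src in key:
                key = key.replace(src, dst)
        renamed[key] = value
    return renamed

# Illustrative mapping entries; the real targets live in mistral_llm_mapping.
params_map = {
    "k_fake_quantizer.qscale_act": "k_scale",
    "v_fake_quantizer.qscale_act": "v_scale",
}
weights = {
    "layers.0.attention.k_fake_quantizer.qscale_act": 1.0,
    "layers.0.attention.v_fake_quantizer.qscale_act": 2.0,
}
result = rename_by_params_map(params_map, weights)
assert result == {
    "layers.0.attention.k_scale": 1.0,
    "layers.0.attention.v_scale": 2.0,
}
```

A real test would build the same kind of input dict, run it through the MistralWeightMapper instance, and additionally assert on permute_qk for q/k-shaped tensors.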
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1610eca5-a029-4955-ae8f-b7436983c76c
📒 Files selected for processing (9)
tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py
tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py
tensorrt_llm/_torch/models/modeling_mistral.py
tensorrt_llm/_torch/pyexecutor/config_utils.py
tensorrt_llm/_torch/pyexecutor/resource_manager.py
tensorrt_llm/llmapi/llm_utils.py
tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py
SimengLiu-nv left a comment:
The kvcache related changes in tensorrt_llm/_torch/pyexecutor/resource_manager.py look good to me.
/bot run --disable-fail-fast
PR_Github #42754 [ run ] triggered by Bot. Commit:
Description
Fixes weight loading for Ministral3 model family.
Test Coverage
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py -- new
tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py -- modified
Numerical correctness tests will be added when the HF format is supported.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Improvements
Tests