
[https://nvbugs/5969216][fix] Ministral3 loading fix#12743

Open
evezhier wants to merge 9 commits into NVIDIA:main from evezhier:fix/ministral-loading

Conversation

@evezhier
Collaborator

@evezhier evezhier commented Apr 3, 2026

Description

Fixes weight loading for Ministral3 model family.

Test Coverage

  • tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py (new)
  • tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py (modified)
  • Numerical correctness tests, once the HF format is supported

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for consolidated weight file loading in Mistral models.
    • Enhanced Mistral3 configuration and weight loading capabilities.
  • Bug Fixes

    • Fixed KV cache size calculation for Multi-Head Latent Attention (MLA) configurations.
  • Improvements

    • Refined FP8 quantization handling with block-scale support for Ministral models.
    • Improved weight permutation and mapping for Mistral model architectures.
  • Tests

    • Expanded test coverage for weight loading and mapping functionality.

evezhier added 6 commits April 9, 2026 11:58
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
@evezhier evezhier force-pushed the fix/ministral-loading branch from fc87519 to c66b9a2 Compare April 9, 2026 19:13
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
@evezhier evezhier marked this pull request as ready for review April 9, 2026 19:50
@evezhier evezhier requested review from a team as code owners April 9, 2026 19:50
@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

The changes extend the Hugging Face weight loader to support consolidated checkpoint selection via a new use_consolidated parameter, update Mistral checkpoint and weight handling to use this parameter while refactoring the weight permutation logic, and add supporting configuration and resource-management updates for Mistral3 model variants.

Changes

Cohort / File(s) Summary
Weight Loading Configuration
tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py, tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py
Extended HF weight loader signature with use_consolidated: bool = False parameter controlling SafeTensors file selection (consolidated vs. sharded). Mistral checkpoint loader now passes use_consolidated=True to parent loader.
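For intuition about the selection behavior the flag controls, here is a rough sketch (the helper name and globbing logic are assumptions for illustration, not TRT-LLM's actual implementation): Mistral releases often ship both a single consolidated SafeTensors file and sharded files, and loading both would duplicate weights, so the flag must pick exactly one layout.

```python
from pathlib import Path

def select_safetensors_files(checkpoint_dir: str, use_consolidated: bool = False) -> list:
    """Pick exactly one SafeTensors layout from a checkpoint directory.

    Hypothetical sketch: returns consolidated*.safetensors when requested
    and present, otherwise the sharded *.safetensors files.
    """
    root = Path(checkpoint_dir)
    consolidated = sorted(str(p) for p in root.glob("consolidated*.safetensors"))
    sharded = sorted(
        str(p) for p in root.glob("*.safetensors") if "consolidated" not in p.name
    )
    return consolidated if use_consolidated and consolidated else sharded
```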
Mistral Weight Processing
tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py, tensorrt_llm/_torch/models/modeling_mistral.py
Replaced callback-based _permute_qk with a standalone permute_qk(weights, config) method iterating all weight entries. Updated quantization scale key remapping (k_scale/v_scale → attn.k_scale/attn.v_scale). Modified Mistral3VLM.load_weights() to conditionally invoke weight_mapper.permute_qk() for LLM and vision tower weights.
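The permutation itself is not shown in this summary. For intuition, here is a generic sketch (NumPy for brevity; the function name and exact layout are assumptions, since the PR's permute_qk(weights, config) operates on the whole weight dict) of the rotary-dimension reordering commonly applied to Q/K projection weights when converting Mistral-format checkpoints to the HF layout:

```python
import numpy as np

def permute_qk_weight(w: np.ndarray, n_heads: int, head_dim: int) -> np.ndarray:
    """Reorder rows so rotary dims go from interleaved (x0, y0, x1, y1, ...)
    to blocked (x0, x1, ..., y0, y1, ...) within each attention head."""
    out_dim, in_dim = w.shape
    return (
        w.reshape(n_heads, head_dim // 2, 2, in_dim)  # split rows into (pair, component)
         .transpose(0, 2, 1, 3)                       # swap pair and component axes
         .reshape(out_dim, in_dim)
    )
```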
Configuration & Infrastructure
tensorrt_llm/_torch/pyexecutor/config_utils.py, tensorrt_llm/_torch/pyexecutor/resource_manager.py, tensorrt_llm/llmapi/llm_utils.py
Added mistral3 model type handling in config loading. Refined KVCache MLA detection to check both hasattr and is not None for kv_lora_rank. Updated FP8 quantization branching on weight_block_size presence.
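The refined MLA detection boils down to a null-safe attribute read. A minimal sketch, assuming a config object with an optional kv_lora_rank attribute (the helper name is hypothetical):

```python
from types import SimpleNamespace

def is_mla(config) -> bool:
    # Equivalent to: hasattr(config, "kv_lora_rank") and config.kv_lora_rank is not None
    return getattr(config, "kv_lora_rank", None) is not None
```

MLA-style configs set kv_lora_rank to an integer; non-MLA configs either omit the attribute or leave it None, and both must be treated as "not MLA".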
Test Coverage
tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py, tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py
Extended weight loader test with use_consolidated parameterization. Added new test module for MistralWeightMapper.rename_by_params_map() with fixture-driven weight renaming validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 5.88%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description provides a clear issue statement ("Fixes weight loading for Mistral3 model family") and identifies test coverage locations, but lacks a detailed explanation of what the issue was and how the solution addresses it. Resolution: expand the description to explain the specific weight loading issue encountered and how the changes (consolidated flag, permutation logic refactoring) resolve it.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title clearly identifies the main change: fixing weight loading for the Ministral3 model family, which aligns with the core modifications across weight loading and mapping logic.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_mistral.py (1)

373-379: ⚠️ Potential issue | 🔴 Critical

Text-only mistral_large_3 requests will now fail.

Line 379 makes self.text_processor a MistralCommonImageProcessor, but the text-only path later calls it without an images argument. Since MistralCommonImageProcessor.__call__ requires images, text-only preprocessing now raises TypeError.

Minimal fix
-    def __call__(self, text, images, **kwargs):
+    def __call__(self, text, images=None, **kwargs):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_mistral.py` around lines 373 - 379, The
current branch for model_type == "mistral_large_3" assigns self.text_processor
to a MistralCommonImageProcessor which requires an images arg and breaks
text-only paths; instead instantiate or assign a text-only processor to
self.text_processor (e.g., a MistralTextProcessor or tokenizer-backed text
processor) while keeping self._processor as the image-capable
MistralCommonImageProcessor so text-only preprocessing (calls to
self.text_processor(...)) do not require an images parameter.
🧹 Nitpick comments (1)
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py (1)

7-47: Extend this regression to cover the new mapper paths.

This test only exercises rename_by_params_map(). The PR also changes permute_qk() and adds the k_fake_quantizer.qscale_act / v_fake_quantizer.qscale_act mappings, but neither path is asserted here, so the new loading logic can still regress without failing this test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`
around lines 7 - 47, The test currently only covers rename_by_params_map but
must also exercise the new permute_qk path and the added
k_fake_quantizer.qscale_act / v_fake_quantizer.qscale_act mappings; update
test_rename_by_params_map to include source keys for both
"layers.0.attention.k_fake_quantizer.qscale_act" and
"layers.0.attention.v_fake_quantizer.qscale_act" (so they map to the new targets
in mapper.mistral_llm_mapping) and add an assertion that mapper.permute_qk was
applied for Q/K tensors by invoking mapper.permute_qk on a mock q/k pair (or
include q/k-shaped tensors in input_weights and assert their shapes/ordering in
result) using the MistralWeightMapper instance and the mistral_llm_mapping so
the permute_qk code path and the kv fake-quantizer mappings are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Line 509: The condition using getattr(config, "input_processor_type", None) in
("mistral_large_3") uses substring membership because ("mistral_large_3") is a
string; change it to an exact equality check by comparing getattr(config,
"input_processor_type", None) == "mistral_large_3" so only the exact value
matches (update the conditional in modeling_mistral.py where
input_processor_type is checked).
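The pitfall behind this comment is a Python literal subtlety: ("mistral_large_3") is just a parenthesized string, not a one-element tuple, so `in` performs substring matching. A minimal demonstration:

```python
value = "large_3"

# ("mistral_large_3") is a string, so `in` checks for a substring: likely a bug.
assert value in ("mistral_large_3")

# A trailing comma makes it a tuple, so `in` checks membership.
assert value not in ("mistral_large_3",)

# Exact equality, as the review suggests, only matches the full value.
assert value != "mistral_large_3"
```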

In `@tensorrt_llm/_torch/pyexecutor/config_utils.py`:
- Around line 284-288: The code is assigning the string "MinistralConfig" to
config_class and then calling .from_pretrained on that string; replace this with
the actual HF class and call its from_pretrained method — change the branch
where model_type == "mistral3" to use Mistral3Config (imported from
transformers) and call Mistral3Config.from_pretrained(model_name_or_path,
**kwargs) to produce model_config; ensure the top of the module imports
Mistral3Config (or conditionally import it) instead of relying on a string in
the variable config_class.

In `@tensorrt_llm/llmapi/llm_utils.py`:
- Line 434: Replace the truthiness check on
hf_quant_config.get("weight_block_size") with an explicit presence check so a
present-but-falsey value still selects block-scales instead of falling through
to QuantAlgo.FP8; specifically, update the conditional around hf_quant_config in
llm_utils.py (the block that decides between block-scales and QuantAlgo.FP8) to
use either "'weight_block_size' in hf_quant_config" or
"hf_quant_config.get('weight_block_size') is not None" when deciding the code
path for block-scales handling.
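The distinction this comment draws can be shown with a toy config dict (the key name comes from the summary; the value is made up):

```python
hf_quant_config = {"weight_block_size": []}  # key present, value falsey

# Truthiness check: treats the present-but-empty value like a missing key.
assert not hf_quant_config.get("weight_block_size")

# Presence checks, as the review recommends: both still detect the key.
assert "weight_block_size" in hf_quant_config
assert hf_quant_config.get("weight_block_size") is not None
```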

In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`:
- Around line 1-4: Add the standard NVIDIA file header to the top of this new
test file so it includes the repository copyright/ownership notice and updated
year; modify the file
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py to
prepend the standard NVIDIA header block above the imports (before pytest,
torch, and the MistralWeightMapper import) ensuring the header format matches
other files in the repo.

---

Outside diff comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Around line 373-379: The current branch for model_type == "mistral_large_3"
assigns self.text_processor to a MistralCommonImageProcessor which requires an
images arg and breaks text-only paths; instead instantiate or assign a text-only
processor to self.text_processor (e.g., a MistralTextProcessor or
tokenizer-backed text processor) while keeping self._processor as the
image-capable MistralCommonImageProcessor so text-only preprocessing (calls to
self.text_processor(...)) do not require an images parameter.

---

Nitpick comments:
In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`:
- Around line 7-47: The test currently only covers rename_by_params_map but must
also exercise the new permute_qk path and the added k_fake_quantizer.qscale_act
/ v_fake_quantizer.qscale_act mappings; update test_rename_by_params_map to
include source keys for both "layers.0.attention.k_fake_quantizer.qscale_act"
and "layers.0.attention.v_fake_quantizer.qscale_act" (so they map to the new
targets in mapper.mistral_llm_mapping) and add an assertion that
mapper.permute_qk was applied for Q/K tensors by invoking mapper.permute_qk on a
mock q/k pair (or include q/k-shaped tensors in input_weights and assert their
shapes/ordering in result) using the MistralWeightMapper instance and the
mistral_llm_mapping so the permute_qk code path and the kv fake-quantizer
mappings are covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1610eca5-a029-4955-ae8f-b7436983c76c

📥 Commits

Reviewing files that changed from the base of the PR and between f8d2090 and d22ecdd.

📒 Files selected for processing (9)
  • tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
  • tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py
  • tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_mistral.py
  • tensorrt_llm/_torch/pyexecutor/config_utils.py
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py
  • tensorrt_llm/llmapi/llm_utils.py
  • tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py
  • tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py

Collaborator

@SimengLiu-nv SimengLiu-nv left a comment


The kvcache related changes in tensorrt_llm/_torch/pyexecutor/resource_manager.py look good to me.

Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
@evezhier
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42754 [ run ] triggered by Bot. Commit: 83347a7 Link to invocation

@evezhier evezhier enabled auto-merge (squash) April 10, 2026 22:51