
[https://nvbugs/5969216][fix] Ministral3 loading fix#12743

Open
evezhier wants to merge 9 commits into NVIDIA:main from evezhier:fix/ministral-loading

Conversation

@evezhier
Collaborator

@evezhier evezhier commented Apr 3, 2026

Description

Fixes weight loading for Ministral3 model family.

Test Coverage

  • tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py (new)
  • tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py (modified)
  • Numerical correctness tests, once the HF format is supported

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for consolidated weight file loading in Mistral models.
    • Enhanced Mistral3 configuration and weight loading capabilities.
  • Bug Fixes

    • Fixed KV cache size calculation for Multi-Head Latent Attention (MLA) configurations.
  • Improvements

    • Refined FP8 quantization handling with block-scale support for Ministral models.
    • Improved weight permutation and mapping for Mistral model architectures.
  • Tests

    • Expanded test coverage for weight loading and mapping functionality.

evezhier added 6 commits April 9, 2026 11:58
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
@evezhier evezhier force-pushed the fix/ministral-loading branch from fc87519 to c66b9a2 Compare April 9, 2026 19:13
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
@evezhier evezhier marked this pull request as ready for review April 9, 2026 19:50
@evezhier evezhier requested review from a team as code owners April 9, 2026 19:50
@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

The changes extend the Hugging Face weight loader to support consolidated checkpoint selection via a new use_consolidated parameter, update Mistral checkpoint and weight handling to use this parameter while refactoring the weight permutation logic, and add supporting configuration and resource-management updates for Mistral3 model variants.

Changes

Cohort / File(s) Summary
Weight Loading Configuration
tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py, tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py
Extended HF weight loader signature with use_consolidated: bool = False parameter controlling SafeTensors file selection (consolidated vs. sharded). Mistral checkpoint loader now passes use_consolidated=True to parent loader.
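For intuition about the selection behavior the flag controls, here is a rough sketch (the helper name and globbing logic are assumptions for illustration, not TRT-LLM's actual implementation): Mistral releases often ship both a single consolidated SafeTensors file and sharded files, and loading both would duplicate weights, so the flag must pick exactly one layout.

```python
from pathlib import Path

def select_safetensors_files(checkpoint_dir: str, use_consolidated: bool = False) -> list:
    """Pick exactly one SafeTensors layout from a checkpoint directory.

    Hypothetical sketch: returns consolidated*.safetensors when requested
    and present, otherwise the sharded *.safetensors files.
    """
    root = Path(checkpoint_dir)
    consolidated = sorted(str(p) for p in root.glob("consolidated*.safetensors"))
    sharded = sorted(
        str(p) for p in root.glob("*.safetensors") if "consolidated" not in p.name
    )
    return consolidated if use_consolidated and consolidated else sharded
```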
Mistral Weight Processing
tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py, tensorrt_llm/_torch/models/modeling_mistral.py
Replaced callback-based _permute_qk with a standalone permute_qk(weights, config) method iterating all weight entries. Updated quantization scale key remapping (k_scale/v_scale → attn.k_scale/attn.v_scale). Modified Mistral3VLM.load_weights() to conditionally invoke weight_mapper.permute_qk() for LLM and vision tower weights.
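The permutation itself is not shown in this summary. For intuition, here is a generic sketch (NumPy for brevity; the function name and exact layout are assumptions, since the PR's permute_qk(weights, config) operates on the whole weight dict) of the rotary-dimension reordering commonly applied to Q/K projection weights when converting Mistral-format checkpoints to the HF layout:

```python
import numpy as np

def permute_qk_weight(w: np.ndarray, n_heads: int, head_dim: int) -> np.ndarray:
    """Reorder rows so rotary dims go from interleaved (x0, y0, x1, y1, ...)
    to blocked (x0, x1, ..., y0, y1, ...) within each attention head."""
    out_dim, in_dim = w.shape
    return (
        w.reshape(n_heads, head_dim // 2, 2, in_dim)  # split rows into (pair, component)
         .transpose(0, 2, 1, 3)                       # swap pair and component axes
         .reshape(out_dim, in_dim)
    )
```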
Configuration & Infrastructure
tensorrt_llm/_torch/pyexecutor/config_utils.py, tensorrt_llm/_torch/pyexecutor/resource_manager.py, tensorrt_llm/llmapi/llm_utils.py
Added mistral3 model type handling in config loading. Refined KVCache MLA detection to check both hasattr and is not None for kv_lora_rank. Updated FP8 quantization branching on weight_block_size presence.
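The refined MLA detection boils down to a null-safe attribute read. A minimal sketch, assuming a config object with an optional kv_lora_rank attribute (the helper name is hypothetical):

```python
from types import SimpleNamespace

def is_mla(config) -> bool:
    # Equivalent to: hasattr(config, "kv_lora_rank") and config.kv_lora_rank is not None
    return getattr(config, "kv_lora_rank", None) is not None
```

MLA-style configs set kv_lora_rank to an integer; non-MLA configs either omit the attribute or leave it None, and both must be treated as "not MLA".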
Test Coverage
tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py, tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py
Extended weight loader test with use_consolidated parameterization. Added new test module for MistralWeightMapper.rename_by_params_map() with fixture-driven weight renaming validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 5.88%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description provides a clear issue statement ("Fixes weight loading for Mistral3 model family") and identifies test coverage locations, but lacks a detailed explanation of what the issue was and how the solution addresses it. Resolution: expand the description to explain the specific weight loading issue encountered and how the changes (consolidated flag, permutation logic refactoring) resolve it.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title clearly identifies the main change: fixing weight loading for the Ministral3 model family, which aligns with the core modifications across weight loading and mapping logic.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_mistral.py (1)

373-379: ⚠️ Potential issue | 🔴 Critical

Text-only mistral_large_3 requests will now fail.

Line 379 makes self.text_processor a MistralCommonImageProcessor, but the text-only path later calls it without an images argument. Since MistralCommonImageProcessor.__call__ requires images, text-only preprocessing now raises TypeError.

Minimal fix
-    def __call__(self, text, images, **kwargs):
+    def __call__(self, text, images=None, **kwargs):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_mistral.py` around lines 373 - 379, The
current branch for model_type == "mistral_large_3" assigns self.text_processor
to a MistralCommonImageProcessor which requires an images arg and breaks
text-only paths; instead instantiate or assign a text-only processor to
self.text_processor (e.g., a MistralTextProcessor or tokenizer-backed text
processor) while keeping self._processor as the image-capable
MistralCommonImageProcessor so text-only preprocessing (calls to
self.text_processor(...)) do not require an images parameter.
🧹 Nitpick comments (1)
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py (1)

7-47: Extend this regression to cover the new mapper paths.

This test only exercises rename_by_params_map(). The PR also changes permute_qk() and adds the k_fake_quantizer.qscale_act / v_fake_quantizer.qscale_act mappings, but neither path is asserted here, so the new loading logic can still regress without failing this test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`
around lines 7 - 47, The test currently only covers rename_by_params_map but
must also exercise the new permute_qk path and the added
k_fake_quantizer.qscale_act / v_fake_quantizer.qscale_act mappings; update
test_rename_by_params_map to include source keys for both
"layers.0.attention.k_fake_quantizer.qscale_act" and
"layers.0.attention.v_fake_quantizer.qscale_act" (so they map to the new targets
in mapper.mistral_llm_mapping) and add an assertion that mapper.permute_qk was
applied for Q/K tensors by invoking mapper.permute_qk on a mock q/k pair (or
include q/k-shaped tensors in input_weights and assert their shapes/ordering in
result) using the MistralWeightMapper instance and the mistral_llm_mapping so
the permute_qk code path and the kv fake-quantizer mappings are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Line 509: The condition using getattr(config, "input_processor_type", None) in
("mistral_large_3") uses substring membership because ("mistral_large_3") is a
string; change it to an exact equality check by comparing getattr(config,
"input_processor_type", None) == "mistral_large_3" so only the exact value
matches (update the conditional in modeling_mistral.py where
input_processor_type is checked).
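The pitfall behind this comment is a Python literal subtlety: ("mistral_large_3") is just a parenthesized string, not a one-element tuple, so `in` performs substring matching. A minimal demonstration:

```python
value = "large_3"

# ("mistral_large_3") is a string, so `in` checks for a substring: likely a bug.
assert value in ("mistral_large_3")

# A trailing comma makes it a tuple, so `in` checks membership.
assert value not in ("mistral_large_3",)

# Exact equality, as the review suggests, only matches the full value.
assert value != "mistral_large_3"
```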

In `@tensorrt_llm/_torch/pyexecutor/config_utils.py`:
- Around line 284-288: The code is assigning the string "MinistralConfig" to
config_class and then calling .from_pretrained on that string; replace this with
the actual HF class and call its from_pretrained method — change the branch
where model_type == "mistral3" to use Mistral3Config (imported from
transformers) and call Mistral3Config.from_pretrained(model_name_or_path,
**kwargs) to produce model_config; ensure the top of the module imports
Mistral3Config (or conditionally import it) instead of relying on a string in
the variable config_class.

In `@tensorrt_llm/llmapi/llm_utils.py`:
- Line 434: Replace the truthiness check on
hf_quant_config.get("weight_block_size") with an explicit presence check so a
present-but-falsey value still selects block-scales instead of falling through
to QuantAlgo.FP8; specifically, update the conditional around hf_quant_config in
llm_utils.py (the block that decides between block-scales and QuantAlgo.FP8) to
use either "'weight_block_size' in hf_quant_config" or
"hf_quant_config.get('weight_block_size') is not None" when deciding the code
path for block-scales handling.
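The distinction this comment draws can be shown with a toy config dict (the key name comes from the summary; the value is made up):

```python
hf_quant_config = {"weight_block_size": []}  # key present, value falsey

# Truthiness check: treats the present-but-empty value like a missing key.
assert not hf_quant_config.get("weight_block_size")

# Presence checks, as the review recommends: both still detect the key.
assert "weight_block_size" in hf_quant_config
assert hf_quant_config.get("weight_block_size") is not None
```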

In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`:
- Around line 1-4: Add the standard NVIDIA file header to the top of this new
test file so it includes the repository copyright/ownership notice and updated
year; modify the file
tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py to
prepend the standard NVIDIA header block above the imports (before pytest,
torch, and the MistralWeightMapper import) ensuring the header format matches
other files in the repo.

---

Outside diff comments:
In `@tensorrt_llm/_torch/models/modeling_mistral.py`:
- Around line 373-379: The current branch for model_type == "mistral_large_3"
assigns self.text_processor to a MistralCommonImageProcessor which requires an
images arg and breaks text-only paths; instead instantiate or assign a text-only
processor to self.text_processor (e.g., a MistralTextProcessor or
tokenizer-backed text processor) while keeping self._processor as the
image-capable MistralCommonImageProcessor so text-only preprocessing (calls to
self.text_processor(...)) do not require an images parameter.

---

Nitpick comments:
In `@tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py`:
- Around line 7-47: The test currently only covers rename_by_params_map but must
also exercise the new permute_qk path and the added k_fake_quantizer.qscale_act
/ v_fake_quantizer.qscale_act mappings; update test_rename_by_params_map to
include source keys for both "layers.0.attention.k_fake_quantizer.qscale_act"
and "layers.0.attention.v_fake_quantizer.qscale_act" (so they map to the new
targets in mapper.mistral_llm_mapping) and add an assertion that
mapper.permute_qk was applied for Q/K tensors by invoking mapper.permute_qk on a
mock q/k pair (or include q/k-shaped tensors in input_weights and assert their
shapes/ordering in result) using the MistralWeightMapper instance and the
mistral_llm_mapping so the permute_qk code path and the kv fake-quantizer
mappings are covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1610eca5-a029-4955-ae8f-b7436983c76c

📥 Commits

Reviewing files that changed from the base of the PR and between f8d2090 and d22ecdd.

📒 Files selected for processing (9)
  • tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
  • tensorrt_llm/_torch/models/checkpoints/mistral/checkpoint_loader.py
  • tensorrt_llm/_torch/models/checkpoints/mistral/weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_mistral.py
  • tensorrt_llm/_torch/pyexecutor/config_utils.py
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py
  • tensorrt_llm/llmapi/llm_utils.py
  • tests/unittest/_torch/models/checkpoints/hf/test_weight_loader.py
  • tests/unittest/_torch/models/checkpoints/mistral/test_weight_mapper.py

Collaborator

@SimengLiu-nv SimengLiu-nv left a comment


The kvcache related changes in tensorrt_llm/_torch/pyexecutor/resource_manager.py look good to me.

Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
Signed-off-by: Olya Kozlova <okozlova@nvidia.com>
@evezhier
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42754 [ run ] triggered by Bot. Commit: 83347a7 Link to invocation

@evezhier evezhier enabled auto-merge (squash) April 10, 2026 22:51