Refine _extract_layer_prefixes to better handle mtp modules by Edwardf0t1 · Pull Request #1124 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-03-26T01:51:38Z

What does this PR do?

Type of change: Bug fix

example_utils.py — _extract_layer_prefixes

Added extraction of the top-level MTP module prefix (e.g., mtp from mtp.fc.weight). Previously it only captured mtp.layers.0 because it looked exclusively for *.layers. patterns.

Now it returns {"mtp", "mtp.layers.0"} instead of just {"mtp.layers.0"}.
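The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in example_utils.py: the function name `extract_layer_prefixes_sketch` and the sample keys are hypothetical, and it assumes the keys passed in are already restricted to MTP weights.

```python
def extract_layer_prefixes_sketch(keys):
    """Collect MTP prefixes from weight names (illustrative only)."""
    prefixes = set()
    for key in keys:
        parts = key.split(".")
        # Capture "<...>.layers.<N>" when a numeric index follows "layers".
        for i, part in enumerate(parts[:-1]):
            if part == "layers" and parts[i + 1].isdigit():
                prefixes.add(".".join(parts[: i + 2]))
                break
        # Also capture the top-level segment so non-layer weights
        # such as mtp.fc.weight are covered.
        if parts:
            prefixes.add(parts[0])
    return prefixes


# Hypothetical MTP weight names.
keys = [
    "mtp.fc.weight",
    "mtp.norm.weight",
    "mtp.layers.0.self_attn.q_proj.weight",
]
print(sorted(extract_layer_prefixes_sketch(keys)))  # ['mtp', 'mtp.layers.0']
```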

hf_ptq.py — quantization exclusion pattern

Updated pattern construction to handle single-component prefixes:

mtp.layers.0 (multi-component) → layers.0 (same as before)
mtp (single-component) → mtp (new — covers mtp.fc, mtp.norm, etc.)
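The mapping listed above could look something like the sketch below. The helper name `exclusion_core_sketch` is hypothetical, and how hf_ptq.py wraps the result in wildcards is not shown in this description, so only the component selection is illustrated.

```python
def exclusion_core_sketch(prefix):
    """Return the part of an MTP prefix used to build an exclusion pattern."""
    parts = prefix.split(".")
    if len(parts) >= 2:
        # Multi-component prefix: keep the last two components.
        return ".".join(parts[-2:])
    # Single-component prefix: use it unchanged.
    return prefix


print(exclusion_core_sketch("mtp.layers.0"))  # layers.0
print(exclusion_core_sketch("mtp"))           # mtp
```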


Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Bug Fixes
- Improved quantization exclusion so all relevant model parameters, including top-level non-layer components, are correctly identified and skipped during quantization.
- Broadened exclusion matching to use full-prefix patterns, reducing incorrect inclusions/exclusions of parameters during optimization.

coderabbitai · 2026-03-26T01:51:51Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77eb05be-075b-4308-9e91-66eeb8a2b55d

📥 Commits

Reviewing files that changed from the base of the PR and between fac8927 and 3d9d337.

📒 Files selected for processing (2)

examples/llm_ptq/example_utils.py
examples/llm_ptq/hf_ptq.py

✅ Files skipped from review due to trivial changes (1)

examples/llm_ptq/example_utils.py

📝 Walkthrough

Adds top-level mtp module prefixes to the extracted prefix set and changes quantization-exclusion pattern generation to match whole prefixes (wildcard around full prefix) instead of deriving patterns from only the last two components.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Prefix extraction `examples/llm_ptq/example_utils.py` | In `_extract_layer_prefixes(keys)`, always include the top-level `mtp` segment (`parts[0]`) when collecting layer prefixes so non-layer MTP parameters (e.g., `mtp.fc`, `mtp.norm`) are returned in the prefix set. |
| Quantization exclusion pattern `examples/llm_ptq/hf_ptq.py` | When `full_model` supplies `mtp_layer_prefixes`, build exclusion patterns as `*{prefix}*` (wildcard around the entire prefix) and set `quant_cfg["quant_cfg"][pattern] = {"enable": False}` for each pattern instead of using only the last two dot-separated components. |
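A rough sketch of the whole-prefix exclusion summarized in the table. The helper name `add_mtp_exclusions_sketch` and the quantizer names are hypothetical, and glob-style matching is emulated with Python's `fnmatch` as an assumption about how the patterns are interpreted.

```python
from fnmatch import fnmatchcase


def add_mtp_exclusions_sketch(quant_cfg, mtp_layer_prefixes):
    """Disable quantization for every name containing an MTP prefix."""
    for prefix in sorted(mtp_layer_prefixes):
        # Wildcards on both sides of the full prefix.
        quant_cfg["quant_cfg"][f"*{prefix}*"] = {"enable": False}
    return quant_cfg


cfg = add_mtp_exclusions_sketch({"quant_cfg": {}}, {"mtp", "mtp.layers.0"})

# Hypothetical quantizer names, used only to show which ones match.
names = ["mtp.fc.weight_quantizer", "model.layers.0.mlp.weight_quantizer"]
disabled = [n for n in names if any(fnmatchcase(n, p) for p in cfg["quant_cfg"])]
print(disabled)  # ['mtp.fc.weight_quantizer']
```

Under this assumption, a pattern built from the full prefix does not touch `model.layers.0`, whereas a pattern built from only `layers.0` would.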

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change in the pull request: refining the `_extract_layer_prefixes` function to better handle MTP modules, which aligns with the primary modifications across both modified files. |
| Security Anti-Patterns | ✅ Passed | Code changes in example_utils.py and hf_ptq.py involve only string manipulation and prefix extraction with no unsafe deserialization, hardcoded trust settings, eval/exec operations, nosec bypasses, or new dependencies. |

github-actions · 2026-03-26T01:55:49Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-06 20:12 UTC

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 369-370: The code currently adds parts[0] into mtp_layer_prefixes
for any dotted name, which can collect non-MTP roots like "model"; update the
logic where parts is computed so you only add the prefix when it is an actual
MTP root (e.g., check parts[0] == "mtp" or otherwise validate that the token
represents an mtp root) before calling mtp_layer_prefixes.add(parts[0]),
ensuring only real "mtp" top-level prefixes are captured.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17929680-e2d3-448f-af63-da3b24ba1883

📥 Commits

Reviewing files that changed from the base of the PR and between 291498b and 011a2a6.

📒 Files selected for processing (2)

examples/llm_ptq/example_utils.py
examples/llm_ptq/hf_ptq.py

coderabbitai · 2026-03-26T01:56:41Z

examples/llm_ptq/example_utils.py

+            if parts:
+                mtp_layer_prefixes.add(parts[0])


⚠️ Potential issue | 🟠 Major

Restrict top-level prefix capture to actual mtp roots.

At Line 369–Line 370, adding parts[0] unconditionally can capture non-MTP roots (e.g., "model"), which then propagates into very broad exclusion patterns and can disable quantization for unintended modules.

💡 Suggested fix

-            if parts:
-                mtp_layer_prefixes.add(parts[0])
+            mtp_root_idx = next((i for i, p in enumerate(parts) if p == "mtp"), None)
+            if mtp_root_idx is not None:
+                mtp_layer_prefixes.add(".".join(parts[: mtp_root_idx + 1]))
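To make the difference between the two variants concrete, here is a small standalone comparison. The function names and sample keys are hypothetical; only the `next(...)` expression is taken from the suggestion itself.

```python
def top_level_unconditional(key):
    """Variant under review: always take the first dotted component."""
    parts = key.split(".")
    return parts[0] if parts else None


def mtp_root_restricted(key):
    """Suggested variant: only return a prefix that ends at an 'mtp' component."""
    parts = key.split(".")
    mtp_root_idx = next((i for i, p in enumerate(parts) if p == "mtp"), None)
    if mtp_root_idx is None:
        return None
    return ".".join(parts[: mtp_root_idx + 1])


for key in ["mtp.fc.weight", "model.mtp.fc.weight", "model.layers.0.mlp.weight"]:
    print(key, top_level_unconditional(key), mtp_root_restricted(key))
```

For a non-MTP key such as `model.layers.0.mlp.weight`, the unconditional variant yields `model`, while the restricted variant yields nothing to add.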

codecov · 2026-03-26T02:04:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.05%. Comparing base (df80a0f) to head (3d9d337).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1124      +/-   ##
==========================================
+ Coverage   74.77%   76.05%   +1.28%     
==========================================
  Files         351      351              
  Lines       40072    40072              
==========================================
+ Hits        29964    30478     +514     
+ Misses      10108     9594     -514

Flag	Coverage Δ
examples	`43.85% <ø> (+3.63%)`	⬆️
unit	`54.74% <ø> (-0.01%)`	⬇️

vadiklyutiy

LGTM

examples/llm_ptq/hf_ptq.py

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Edwardf0t1 requested a review from a team as a code owner March 26, 2026 01:51

Edwardf0t1 requested a review from cjluo-nv March 26, 2026 01:51

coderabbitai bot reviewed Mar 26, 2026

vadiklyutiy reviewed Apr 1, 2026

This was referenced Apr 1, 2026

[Bugfix] Enable MTP for the official Qwen3.5 NVFP4 checkpoint vllm-project/vllm#38650

Closed

[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 vllm-project/vllm#38832

Merged

Edwardf0t1 requested a review from meenchen April 6, 2026 04:14

meenchen reviewed Apr 6, 2026

meenchen approved these changes Apr 6, 2026

Edwardf0t1 added 2 commits April 6, 2026 12:12

refine _extract_layer_prefixes to better handle mtp modules

f279522

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

update

3d9d337

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Edwardf0t1 force-pushed the zhiyu/exclude-mtp-in-config branch from fac8927 to 3d9d337 Compare April 6, 2026 19:13

Edwardf0t1 enabled auto-merge (squash) April 6, 2026 19:14

Edwardf0t1 merged commit c542c09 into main Apr 6, 2026
43 checks passed

Edwardf0t1 deleted the zhiyu/exclude-mtp-in-config branch April 6, 2026 20:10
