fix hybrid model subblock param counting: all FFN sizes reported identical params#1258

Open
j-rausch wants to merge 3 commits into feature/puzzletron from jrausch/nemotron-h-fix-pattern-truncation

Conversation

@j-rausch
Contributor

@j-rausch j-rausch commented Apr 14, 2026

Summary

  • On hybrid models (e.g. Nemotron-H), calculate_subblock_params builds a 1-layer model to count per-layer params by setting num_hidden_layers=1.
  • It left hybrid_override_pattern at full length, so the 1-layer model always built layer 0 (pattern[0] = Mamba): every FFN variant reported the same Mamba param count regardless of intermediate_size.
  • This made MIP unable to differentiate between FFN sizes.

Fix

  • Truncate hybrid_override_pattern to the single character matching the subblock being measured before instantiating the 1-layer model.
  • For each iteration/layer, deep-copy the model config and create a per-layer config with the fixed pattern.
  • The change activates only when hybrid_override_pattern is present; non-hybrid models (Llama, Qwen, etc.) are unaffected.
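The per-layer flow described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the Cfg stand-in class and the function body are reconstructed from the bullets, and only the names truncate_pattern_for_subblock, hybrid_override_pattern, and num_hidden_layers come from the PR itself.

```python
import copy

def truncate_pattern_for_subblock(lm_config, parent_layer_index=None):
    # Keep only the pattern character for the layer being measured, so the
    # 1-layer model builds that subblock type instead of always layer 0's.
    pattern = getattr(lm_config, "hybrid_override_pattern", None)
    if not pattern:
        return  # non-hybrid models (Llama, Qwen, ...) are left unchanged
    idx = parent_layer_index if parent_layer_index is not None else 0
    lm_config.hybrid_override_pattern = pattern[idx]

class Cfg:
    """Stand-in for a Hugging Face model config (illustrative only)."""
    def __init__(self, pattern, layers):
        self.hybrid_override_pattern = pattern
        self.num_hidden_layers = layers

model_config = Cfg("M-M*-M-", 8)
layer_config = copy.deepcopy(model_config)      # per-iteration deep copy
truncate_pattern_for_subblock(layer_config, 1)  # layer 1 is an FFN ("-")
layer_config.num_hidden_layers = 1              # now build the 1-layer model
print(layer_config.hybrid_override_pattern)     # "-" rather than "M"
```

Because the config is deep-copied per iteration, the original model_config keeps its full pattern and layer count for the next subblock.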

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced accuracy of parameter counting for hybrid FFN model configurations.
  • Tests

    • Added unit tests for hybrid pattern truncation functionality.
    • Added GPU validation tests for Nemotron-H model parameter calculations.

… calculate_subblock_params reporting identical params for all FFN sizes on hybrid models

Signed-off-by: jrausch <jrausch@nvidia.com>
@j-rausch j-rausch requested a review from a team as a code owner April 14, 2026 15:45
@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Added a static method truncate_pattern_for_subblock to truncate hybrid override patterns for single-layer configuration usage. Integrated this method into subblock statistics calculation to apply per-layer configuration adjustments. Added unit and GPU validation tests covering the new functionality.

Changes

  • Core Implementation (modelopt/torch/puzzletron/anymodel/model_descriptor/base.py, modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py): Added a truncate_pattern_for_subblock() static method that mutates hybrid_override_pattern based on the parent layer index. Integrated into subblock stats calculation: deep-copy the model config per subblock, truncate the pattern via the descriptor, and pass the truncated config to the parameter/memory/active-param calculations.
  • Documentation (modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py): Updated the calculate_subblock_params docstring to clarify that callers must pre-adjust per-layer configuration fields using ModelDescriptor.truncate_pattern_for_subblock before passing the config.
  • Unit Tests (tests/unit/torch/puzzletron/test_hybrid_pattern_truncation.py): Comprehensive unit test suite exercising truncate_pattern_for_subblock(), covering index-to-pattern mapping, pipe-separator stripping, missing/degenerate inputs, and out-of-range index handling.
  • GPU Validation Test (tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py): GPU test validating Nemotron-H hybrid FFN subblock parameter counting across three intermediate-size variants using the new truncation method and calculate_subblock_params.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

  • Security Anti-Patterns (❌ Error): The test file hardcodes trust_remote_code=True when calling load_model_config(), violating the security anti-pattern that requires using the descriptor-driven source of truth. Resolution: update the test fixture to use nemotron_descriptor.requires_trust_remote_code() instead of the hardcoded True value.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 73.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately and specifically describes the main fix: correcting hybrid model subblock parameter counting where all FFN sizes incorrectly reported identical parameters.

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: jrausch <jrausch@nvidia.com>
@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1258/

Built to branch gh-pages at 2026-04-14 15:52 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/anymodel/model_descriptor/base.py`:
- Around line 193-199: The code currently strips pipe separators with pattern =
pattern.replace("|", "") then indexes pattern[0], which raises IndexError if the
result is empty (e.g., "|||"); update the logic in the block handling
pattern/parent_layer_index to guard for an empty pattern after normalization:
after computing pattern = pattern.replace("|", ""), check if pattern is empty
and in that case set lm_config.hybrid_override_pattern to an appropriate safe
value (e.g., "" or None) and return (or leave unchanged), otherwise continue
with the existing parent_layer_index conditional and assignment to
lm_config.hybrid_override_pattern; reference the variables pattern,
parent_layer_index, and lm_config.hybrid_override_pattern when making the
change.

In `@tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py`:
- Around line 40-43: The nemotron_config fixture currently hardcodes
trust_remote_code=True; change it to depend on the nemotron_descriptor fixture
and pass its requires_trust_remote_code() value into load_model_config so the
descriptor drives the trust decision. Specifically, update the nemotron_config
fixture signature to accept nemotron_descriptor and call
load_model_config(MODEL_ID,
trust_remote_code=nemotron_descriptor.requires_trust_remote_code()), keeping
MODEL_ID and load_model_config as-is.
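The suggested fixture change might look like the sketch below. FakeDescriptor, the load_model_config stub, and the MODEL_ID value are invented here for a self-contained example; in the real test these come from the repo's own fixtures and loaders.

```python
import pytest

MODEL_ID = "nvidia/nemotron-h-example"  # placeholder; the real test defines MODEL_ID

class FakeDescriptor:
    """Stand-in for the object returned by the nemotron_descriptor fixture."""
    def requires_trust_remote_code(self) -> bool:
        return True

def load_model_config(model_id, trust_remote_code=False):
    # Stub loader; the real one loads the HF config for model_id.
    return {"model_id": model_id, "trust_remote_code": trust_remote_code}

@pytest.fixture
def nemotron_descriptor():
    return FakeDescriptor()

@pytest.fixture
def nemotron_config(nemotron_descriptor):
    # The descriptor drives the trust decision instead of a hardcoded True.
    return load_model_config(
        MODEL_ID,
        trust_remote_code=nemotron_descriptor.requires_trust_remote_code(),
    )
```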

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9c6c7a62-1477-4e19-a077-f220983bcdb4

📥 Commits

Reviewing files that changed from the base of the PR and between 3f41819 and b279014.

📒 Files selected for processing (5)
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py
  • tests/unit/torch/puzzletron/test_hybrid_pattern_truncation.py

@codecov

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 78.57143% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.25%. Comparing base (38d9522) to head (b279014).
⚠️ Report is 11 commits behind head on feature/puzzletron.

Files with missing lines:
  • ...h/puzzletron/subblock_stats/calc_subblock_stats.py: 25.00% patch coverage, 3 lines missing ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1258      +/-   ##
======================================================
- Coverage               76.33%   76.25%   -0.09%     
======================================================
  Files                     454      454              
  Lines                   48025    48103      +78     
======================================================
+ Hits                    36660    36681      +21     
- Misses                  11365    11422      +57     
Flags:
  • unit: 51.86% <78.57%> (+0.07% ⬆️)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Collaborator

@kevalmorabia97 kevalmorabia97 left a comment


Minor comments. Otherwise LGTM

return ModelDescriptorFactory.get("nemotron_h_v2")


@pytest.mark.gpu
Collaborator


pytest mark not needed, as all tests under tests/gpu are assumed to require a GPU

Suggested change
@pytest.mark.gpu

parent_layer_indices = subblock_config_indexed["parent_layer_indices"]

layer_model_config = copy.deepcopy(model_config)
descriptor.truncate_pattern_for_subblock(
Collaborator


Suggested change
descriptor.truncate_pattern_for_subblock(
ModelDescriptor.truncate_pattern_for_subblock(

return config

@staticmethod
def truncate_pattern_for_subblock(lm_config, parent_layer_index=None):
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add type annotations for both arguments?
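A possible annotated signature, responding to the request above. The config type is assumed here; in the real code lm_config is presumably a transformers PretrainedConfig and could be typed more precisely. The body is elided.

```python
from typing import Optional

class ModelDescriptor:
    @staticmethod
    def truncate_pattern_for_subblock(
        lm_config: object, parent_layer_index: Optional[int] = None
    ) -> None:
        """Truncate hybrid_override_pattern for a 1-layer config (body elided)."""
```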
