fix hybrid model subblock param counting: all FFN sizes reported identical params #1258
j-rausch wants to merge 3 commits into feature/puzzletron from
Conversation
… calculate_subblock_params reporting identical params for all FFN sizes on hybrid models
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
📝 Walkthrough: Added a static method `truncate_pattern_for_subblock`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Important: Pre-merge checks failed. Please resolve all errors before merging; addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (2 passed)
Signed-off-by: jrausch <jrausch@nvidia.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `modelopt/torch/puzzletron/anymodel/model_descriptor/base.py`:
- Around lines 193-199: The code strips pipe separators with `pattern = pattern.replace("|", "")` and then indexes `pattern[0]`, which raises `IndexError` if the result is empty (e.g., `"|||"`). Guard against an empty pattern after normalization: after computing `pattern = pattern.replace("|", "")`, check whether `pattern` is empty; if so, set `lm_config.hybrid_override_pattern` to an appropriate safe value (e.g., `""` or `None`) and return, or leave it unchanged. Otherwise continue with the existing `parent_layer_index` conditional and the assignment to `lm_config.hybrid_override_pattern`.
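The guard described above could look like the following sketch. It mirrors the static method under review, but the config object, the attribute-access style, and the choice to leave the config unchanged on an empty pattern are assumptions, not the repository's actual implementation.

```python
from types import SimpleNamespace


def truncate_pattern_for_subblock(lm_config, parent_layer_index=None):
    """Sketch of the guarded truncation logic from the review comment.

    Assumption: `lm_config` exposes `hybrid_override_pattern` as an attribute.
    """
    pattern = getattr(lm_config, "hybrid_override_pattern", None)
    if pattern is None:
        return  # non-hybrid models: nothing to truncate
    pattern = pattern.replace("|", "")  # strip visual pipe separators
    if not pattern:
        # Input like "|||" normalizes to "", so pattern[0] would raise
        # IndexError; leave the config unchanged instead.
        return
    if parent_layer_index is not None:
        lm_config.hybrid_override_pattern = pattern[parent_layer_index]
    else:
        lm_config.hybrid_override_pattern = pattern[0]


cfg = SimpleNamespace(hybrid_override_pattern="|||")
truncate_pattern_for_subblock(cfg)  # no IndexError; config left unchanged
```

Returning early keeps the non-hybrid path untouched, which matches the PR's stated goal of leaving models like Llama and Qwen unaffected.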
In `tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py`:
- Around lines 40-43: The `nemotron_config` fixture currently hardcodes `trust_remote_code=True`. Make it depend on the `nemotron_descriptor` fixture and pass that descriptor's `requires_trust_remote_code()` value into `load_model_config`, so the descriptor drives the trust decision. Specifically, update the `nemotron_config` fixture signature to accept `nemotron_descriptor` and call `load_model_config(MODEL_ID, trust_remote_code=nemotron_descriptor.requires_trust_remote_code())`, keeping `MODEL_ID` and `load_model_config` as-is.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9c6c7a62-1477-4e19-a077-f220983bcdb4
📒 Files selected for processing (5)
- modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- tests/gpu/puzzletron/test_nemotron_h_gpu_validation.py
- tests/unit/torch/puzzletron/test_hybrid_pattern_truncation.py
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## feature/puzzletron #1258 +/- ##
======================================================
- Coverage 76.33% 76.25% -0.09%
======================================================
Files 454 454
Lines 48025 48103 +78
======================================================
+ Hits 36660 36681 +21
- Misses 11365 11422 +57
Flags with carried forward coverage won't be shown.
kevalmorabia97 left a comment:
Minor comments. Otherwise LGTM
| return ModelDescriptorFactory.get("nemotron_h_v2") |
| @pytest.mark.gpu |

pytest mark not needed, as all tests in tests/gpu are assumed to require a GPU.
| parent_layer_indices = subblock_config_indexed["parent_layer_indices"] |
| layer_model_config = copy.deepcopy(model_config) |
| descriptor.truncate_pattern_for_subblock( |

Suggested change:
- descriptor.truncate_pattern_for_subblock(
+ ModelDescriptor.truncate_pattern_for_subblock(
| return config |
| @staticmethod |
| def truncate_pattern_for_subblock(lm_config, parent_layer_index=None): |

Can you add type annotations for both arguments?
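One way to add the requested annotations, sketched below. The concrete type of `lm_config` isn't visible in the diff, so `Any` is used as a placeholder here; the real signature may name a specific config class (e.g. a `transformers` config type).

```python
from typing import Any, Optional


class ModelDescriptor:  # minimal stand-in for the real class
    @staticmethod
    def truncate_pattern_for_subblock(
        lm_config: Any, parent_layer_index: Optional[int] = None
    ) -> None:
        """Annotated signature sketch only; body omitted."""
```

`Optional[int]` documents that callers may omit the index, and the `-> None` return annotation makes explicit that the method mutates `lm_config` in place rather than returning a new config.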
Summary
calculate_subblock_params builds a 1-layer model to count per-layer params by setting num_hidden_layers=1, but it left hybrid_override_pattern at full length, so the 1-layer model always built layer 0 (pattern[0] = Mamba). Every FFN variant reported the same Mamba param count regardless of intermediate_size.

Fix
Truncate hybrid_override_pattern to the single character matching the subblock being measured before instantiating the 1-layer model. The truncation only applies when hybrid_override_pattern is present; non-hybrid models (Llama, Qwen, etc.) are unaffected.

Summary by CodeRabbit
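The bug and fix in the summary can be illustrated with a small sketch. The function and attribute names below are illustrative stand-ins, not the repository's actual API; only the mechanism (deep-copy the config, set one layer, shrink the pattern to the measured subblock's character) comes from the PR description.

```python
import copy
from types import SimpleNamespace


def make_one_layer_config(model_config, parent_layer_index):
    """Build the 1-layer config used for per-layer param counting."""
    layer_config = copy.deepcopy(model_config)
    layer_config.num_hidden_layers = 1
    pattern = getattr(layer_config, "hybrid_override_pattern", None)
    if pattern:  # non-hybrid models (Llama, Qwen, ...) skip this branch
        pattern = pattern.replace("|", "")
        if pattern:
            # The fix: keep only the character for the measured subblock.
            # Before, the full pattern was kept, so the 1-layer model always
            # built pattern[0] (Mamba) no matter which FFN was measured.
            layer_config.hybrid_override_pattern = pattern[parent_layer_index]
    return layer_config


cfg = SimpleNamespace(num_hidden_layers=8, hybrid_override_pattern="M*-M*-M-")
one_layer = make_one_layer_config(cfg, parent_layer_index=2)
```

With the full pattern left in place, every 1-layer instantiation measured the same layer-0 Mamba block; after truncation, the instantiated layer actually matches the subblock (and its intermediate_size) being measured.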
Bug Fixes
Tests