
Merge puzzletron compression algorithm#1121

Merged
kevalmorabia97 merged 106 commits into main from feature/puzzletron
Apr 15, 2026

Conversation

@danielkorzekwa
Contributor

@danielkorzekwa danielkorzekwa commented Mar 25, 2026

What does this PR do?

Implement the puzzletron compression algorithm based on the Puzzle paper (https://arxiv.org/abs/2411.19146).

The list of reviewed and merged MRs that resulted in the feature/puzzletron branch:

Merging dkorzekwa/any_model to feature/puzzletron

Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer - merged

Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer - merged

Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer - merged

Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer - merged

Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/any model other models by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer - merged

PR to 1007: #1039 - merged

Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer - merged

Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer - merged

Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer - merged

MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer - merged

code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer - merged

Merging into main:

Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer - merged

Usage

Puzzletron tutorial:
https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron

Testing

The main e2e test for compressing 9 models with Puzzletron:
https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py

2-gpu nightly tests:

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅

Summary by CodeRabbit

  • New Features

    • Added Puzzletron: end-to-end heterogeneous pruning & NAS workflow with AnyModel support, example pipelines, deployment and evaluation utilities, and tools for converting/pruning and exporting compressed checkpoints.
  • Documentation

    • Comprehensive Puzzletron tutorials, model-specific guides, evaluator instructions, example configs, and changelog entry.
  • Chores

    • CI/workflow updates (extras installation, longer GPU test timeout), pre-commit hook exclusion updated, and CODEOWNERS entries added.

kevalmorabia97 and others added 30 commits October 27, 2025 17:42
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…iv.org/abs/2411.19146 (#464)

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add decilm modelling code

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Compress tutorial (PoC) + compress cli app.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
#545)

## What does this PR do?

Add llama converter (no dependency on internal Nvidia code) - part 1/2
- change top-level dependencies in convert_llama3_to_decilm.py from
puzzle_tools.... to modelopt.....
- added modelopt.torch._compress.tools module
- remove tokenization_mistral.py - not used

scope of 2/2 part (will come once part 1/2 is merged):
- change all deeper dependencies from from puzzle_tools.... to
modelopt....
- test_convert_llama3_config_to_decilm_config.py should run without any
internal nvidia dependencies

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ia code) (#552)

## What does this PR do?
llama converter is self-contained now (no dependency on internal nvidia
code)

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
Make score pruning activations self-contained (no dependency on internal
Nvidia code) - 1/6
- add integration test for attention pruning (pre-step to make adding
activation scoring safer)

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
- Add score_pruning_activations.py

Notes:
- validate_model.py still depends on Nvidia internal code (will be
changed in the subsequent MR)
- sharded_checkpoint_utils.py - for now it needs to use DeciLM from
internal Nvidia code, to be changed in the next MR

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add activation hooks used for pruning

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ng is self-contained now (#584)

## What does this PR do?

Add sewing kit and utilities used for pruning scoring - pruning scoring
is self-contained now - no dependency on internal Nvidia code.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
- Add L2NormHook and use it in megatron.py
- Using L2NormHook removes code duplication between
_DynamicSelfAttention and _DynamicMLP

This is the first step towards reusing activation scores logic across
Minitron and Puzzle. Next steps:
- complete redesign of megatron.py - move other activation hooks logic
to hooks.py
- then combine those hooks.py with the similar hooks.py functionality in
puzzle
(modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)

Questions:
- why, in the code both before and after this redesign, do we store temp variables
in two ways (_register_temp_attribute and self.hook_handle)?

```
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: confusion: why hook_handle is removed manually in export() and not using _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)
```

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add pruning checkpoints for the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add build replacement library to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add subblock stats to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add 1-block scoring to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…utionHook (#610)

## What does this PR do?

Add checkpoint save/load to ForwardHook + add
IterativeChannelContributionHook.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add MIP step to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

**Type of change:** Improvement

- Fix tests for 2-gpu: Some places hard-coded cpu device for distributed
communications which was causing this issue
- Remove unused constrain_search_space.py
- Remove `is_multi_layer_puzzle: False` case
- Remove `use_greedy_search: False` case
- Remove knapsack mip case
- Remove unused `num_solutions` and `minimal_diversity` flags

## Testing

- GH CICD test passing
- Tested on 2-gpu setup locally as well



## Summary by CodeRabbit

# Release Notes

* **Refactor**
  * Optimized solver implementation with improved library integration.
  * Simplified model compression configuration by removing deprecated search options.
  * Consolidated optimization paths for streamlined processing.

* **Chores**
  * Updated dependencies for improved compatibility.

* **Documentation**
  * Clarified Model-Optimizer installation instructions in examples.


---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…hooks analysis (#670)

## What does this PR do?

Fix a bug in IterativeChannelContributionHook + tools for activation
hooks analysis

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…unctions (#667)

## What does this PR do?

- Remove `runtime.py` and directly use
`modelopt.torch.utils.distributed` functions
- Read model_dtype and autocast_dtype from validate_model_defaults.yaml
instead of runtime object
- Remove more unused functions
- Remove unnecessary parse_args for intermediate steps, and improve
docstrings

## Testing

- No change in functionality
- Existing tests passing

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

♻️ Duplicate comments (5)
modelopt/torch/puzzletron/mip/sweep.py (1)

255-268: ⚠️ Potential issue | 🟠 Major

Isolate each sweep iteration from shared Hydra state.

This loop still mutates the shared hydra_cfg in place before handing it to launch_mip_and_realize_model(). Any downstream mutation leaks into later compression rates, so one sweep point can contaminate the next. Clone the config inside the loop before patching target_memory.

🐛 Proposed fix
-        hydra_cfg.mip.human_constraints.target_memory = target_memory_mib
-
-        # Run MIP and realize models (reuse existing distributed logic!)
-        solution_paths = mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
+        iter_cfg = OmegaConf.create(OmegaConf.to_container(hydra_cfg, resolve=False))
+        iter_cfg.mip.human_constraints.target_memory = target_memory_mib
+
+        # Run MIP and realize models (reuse existing distributed logic!)
+        solution_paths = mip_and_realize_models.launch_mip_and_realize_model(iter_cfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/sweep.py` around lines 255 - 268, The loop is
mutating the shared hydra_cfg in-place (setting
hydra_cfg.mip.human_constraints.target_memory) before calling
mip_and_realize_models.launch_mip_and_realize_model(), which lets downstream
runs see previous mutations; fix it by making a clone/copy of hydra_cfg at the
top of each iteration (e.g., deep copy) and apply the target_memory change to
that cloned config (not the original) so that
launch_mip_and_realize_model(hydra_cfg_clone) receives an isolated config per
compression_rate; update references to
hydra_cfg.mip.human_constraints.target_memory to use the cloned config instead.
modelopt/torch/puzzletron/mip/run_puzzle.py (1)

411-424: ⚠️ Potential issue | 🟡 Minor

These validation guards still miss “present but None” arguments.

parse_args() always creates subblock_stats_args, mip_constraints, and human_constraints with None defaults, so hasattr() is always true here. Missing config skips the friendly CLI error and fails later in _load_all_* instead.

🐛 Proposed fix
-    if not hasattr(args, "subblock_stats_args") and "subblock_stats_args" not in puzzle_profile:
+    if getattr(args, "subblock_stats_args", None) is None and "subblock_stats_args" not in puzzle_profile:
         mprint(
             "error: Must specify `subblock_stats_args` in either puzzle_profile or as a commandline arg."
         )
         sys.exit(1)
@@
-        not hasattr(args, "mip_constraints")
-        and not hasattr(args, "human_constraints")
+        getattr(args, "mip_constraints", None) is None
+        and getattr(args, "human_constraints", None) is None
         and "mip_constraints" not in puzzle_profile
         and "human_constraints" not in puzzle_profile
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 411 - 424, The
guards use hasattr(args, "...") which is always true because parse_args() sets
defaults, so change the checks for subblock_stats_args, mip_constraints, and
human_constraints to treat None as missing: when verifying before calling the
loaders (the block around args and puzzle_profile checks in run_puzzle.py), test
both that the key is present in puzzle_profile OR that getattr(args, "<name>")
is not None (e.g., getattr(args, "subblock_stats_args") is not None), and
similarly for mip_constraints and human_constraints, so the CLI-friendly
sys.exit path runs when values are missing or explicitly None before reaching
_load_all_*.
modelopt/torch/puzzletron/pruning/pruning_utils.py (1)

197-209: ⚠️ Potential issue | 🟠 Major

Key the activation-log cache by directory.

Lines 197-209 still keep a single process-global ACTIVATIONS_LOG. Beyond the earlier race concern, the first activations_log_dir wins for the whole process, so a second model or test run silently reuses the wrong ranking data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/pruning/pruning_utils.py` around lines 197 - 209,
The global ACTIVATIONS_LOG is currently a single flat dict leading to cross-run
pollution; change the cache to be keyed by activations_log_dir inside
_cache_activations_log so each directory stores its own sub-dict (e.g.
ACTIVATIONS_LOG[activations_log_dir] = {...}) and update the load-check to only
populate when that key is missing, using activations_log_dir (from
mlp_init_config) as the unique cache key; update any code that reads
ACTIVATIONS_LOG to first index by directory (or provide a getter that returns
ACTIVATIONS_LOG[activations_log_dir]) so multiple models/tests do not share the
same activation entries.
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py (1)

55-58: ⚠️ Potential issue | 🟠 Major

The Nemotron fallback still imports the whole shared remote-code cache.

Lines 55-58 walk transformers_modules.__path__ wholesale when the class isn't already loaded. That reintroduces the earlier issue: one Nemotron lookup can execute top-level code from unrelated cached trust_remote_code repos instead of only the selected checkpoint package.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 55 - 58, The fallback currently walks the entire
transformers_modules.__path__ via pkgutil.walk_packages and
importlib.import_module, which can execute top-level code from unrelated
trust_remote_code caches; change the logic to only enumerate/import submodules
within the specific checkpoint package instead of the whole transformers_modules
namespace: use the selected checkpoint package's path (or its prefix) when
calling pkgutil.iter_modules/walk_packages and importlib.import_module (e.g.,
derive a single package path for the checkpoint and pass that to pkgutil or
importlib) and then call inspect.getmembers on those module objects; avoid
iterating over transformers_modules.__path__ wholesale to prevent importing
unrelated cached repos.
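The package-scoped enumeration suggested above can be sketched in general form. This is a hedged illustration, not the descriptor's actual code: the stdlib `email` package stands in for a single checkpoint package under transformers_modules.

```python
import importlib
import pkgutil


def iter_package_submodules(package_name: str):
    """Import and yield only the submodules of one named package."""
    pkg = importlib.import_module(package_name)
    # Restricting pkgutil to pkg.__path__ means only this package's modules
    # execute; unrelated cached trust_remote_code repos are never imported.
    for info in pkgutil.iter_modules(pkg.__path__, prefix=package_name + "."):
        yield importlib.import_module(info.name)
```

The key design point is passing a single package's `__path__` to pkgutil rather than the whole namespace's `__path__`, which is what made the original fallback execute top-level code from unrelated repos.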
modelopt/torch/puzzletron/tools/checkpoint_utils.py (1)

126-132: ⚠️ Potential issue | 🟡 Minor

init_empty_module() still races on the global default dtype.

Lines 126-132 fixed exception safety, but concurrent calls can still interleave torch.set_default_dtype() and instantiate modules under the wrong dtype. This still needs the lock requested in the earlier review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 126 - 132,
The code still races on the global torch default dtype; add a module-level
threading.Lock (e.g., _default_dtype_lock = threading.Lock()) and acquire it
around the dtype change and module instantiation to serialize access: inside
init_empty_module, move torch.get_default_dtype(),
torch.set_default_dtype(dtype) and the call to skip_init(module_cls, ...) into a
with _default_dtype_lock: block, restore the original dtype in a finally inside
that locked region so no other thread can interleave torch.set_default_dtype()
and skip_init; keep using skip_init and preserve the existing exception-safety
behavior.
🟡 Minor comments (6)
modelopt/torch/puzzletron/puzzletron_nas_plugin.py-108-108 (1)

108-108: ⚠️ Potential issue | 🟡 Minor

Docstring typo in API reference.

Line 108 says mnt.search(); should be mtn.search().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py` at line 108, Fix the
docstring typo in puzzletron_nas_plugin.py by replacing the incorrect reference
"mnt.search()" with the correct API call "mtn.search()" in the
PuzzletronNASPlugin (or the module-level docstring) so the documentation
correctly points to the mtn.search() method used for NAS searches; update any
other occurrences in the same file to ensure consistency.
modelopt/torch/puzzletron/puzzletron_nas_plugin.py-25-27 (1)

25-27: ⚠️ Potential issue | 🟡 Minor

Move hydra import inside functions that use it to avoid import-time failures when puzzletron extras are not installed.

The bare import hydra at line 25 will cause ModuleNotFoundError when the puzzletron subpackage is imported without the [puzzletron] optional extra installed. Since hydra is only used inside the conversion and search execution paths (via hydra.utils.instantiate() at lines 126 and 227), move the import statement into those functions using local imports. This follows the codebase's established pattern for optional dependencies and preserves the optionality of the puzzletron feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py` around lines 25 - 27,
Remove the top-level "import hydra" and instead add a local "import hydra"
inside each function that calls hydra.utils.instantiate (the conversion path
function that calls hydra.utils.instantiate and the search execution function
that calls hydra.utils.instantiate); this follows the optional-dependency
pattern so importing the module won't raise ModuleNotFoundError when puzzletron
extras aren't installed—locate the hydra.utils.instantiate calls and insert a
local import hydra at the start of those functions and delete the module-level
import.
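The lazy-import pattern recommended above can be shown in a generic form. This is a sketch of the technique only: `lazy_call("math", "sqrt", ...)` stands in for resolving `hydra.utils.instantiate` at call time, so importing the host module never fails when the optional extra is absent.

```python
import importlib


def lazy_call(module_name: str, attr_path: str, *args, **kwargs):
    """Resolve a dotted attribute of a module at call time, then invoke it."""
    mod = importlib.import_module(module_name)  # imported only when called
    obj = mod
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj(*args, **kwargs)
```

If the dependency is missing, the ModuleNotFoundError surfaces only when the code path that needs it actually runs, which is exactly the behavior the optional-extra pattern is after.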
modelopt/torch/puzzletron/sewing_kit/utils.py-164-167 (1)

164-167: ⚠️ Potential issue | 🟡 Minor

Respect reversed=True in get_active().

When reversed=True, the active entry is inserted at index 0, but Line 166 always returns the last element. That reports the wrong context for reversed stacks. Return index 0 in that mode.

Suggested fix
     def get_active(self) -> Optional[T]:
         if self.is_active():
-            return self.activity_stack[-1]
+            return self.activity_stack[0] if self.reversed else self.activity_stack[-1]
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 164 - 167, The
get_active() method always returns activity_stack[-1], which is wrong when the
stack was created with reversed=True (entries are pushed at index 0); update
get_active() to check the stack's reversed mode and return activity_stack[0]
when reversed is True and activity_stack[-1] otherwise (still returning None
when is_active() is False); reference the get_active method and the
activity_stack and reversed flag to locate and adjust the logic accordingly.
modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py-54-56 (1)

54-56: ⚠️ Potential issue | 🟡 Minor

Validate input_dir early to fail fast with a clear error.

If input_dir is wrong, failure will happen deeper in conversion with less actionable errors.

Proposed fix
     input_dir = Path(input_dir)
     output_dir = Path(output_dir)
+    if not input_dir.is_dir():
+        raise FileNotFoundError(f"Input checkpoint directory does not exist: {input_dir}")
     output_dir.mkdir(parents=True, exist_ok=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py` around
lines 54 - 56, Validate the provided input_dir immediately after converting it
to a Path: after input_dir = Path(input_dir) check input_dir.exists() and
input_dir.is_dir() and raise a clear exception (e.g., FileNotFoundError or
ValueError) with a helpful message including the provided path so the caller
fails fast; update the code in convert_any_model.py (look for the input_dir
variable in that module) to perform this early validation before creating or
using output_dir or proceeding with conversion.
modelopt/torch/puzzletron/block_config.py-276-282 (1)

276-282: ⚠️ Potential issue | 🟡 Minor

Handle empty parallel_blocks before indexing it.

If a serialized config contains parallel_blocks: [], Line 278 raises IndexError during deserialization. An empty list should just round-trip as “no parallel blocks”.

🩹 Proposed fix
     def __post_init__(self):
         super().__post_init__()
-        if (self.parallel_blocks is not None) and isinstance(self.parallel_blocks[0], dict):
+        if self.parallel_blocks and isinstance(self.parallel_blocks[0], dict):
             initialized_block_configs = [
                 BlockConfig(**block_config) for block_config in self.parallel_blocks
             ]
             self._force_setattr("parallel_blocks", initialized_block_configs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/block_config.py` around lines 276 - 282, The
__post_init__ in BlockConfig currently indexes parallel_blocks[0] and will raise
IndexError for an empty list; update the condition to first ensure
parallel_blocks is truthy (non-empty) before checking element types (e.g., use
"if self.parallel_blocks and isinstance(self.parallel_blocks[0], dict)") so
empty lists are left as-is and only lists with dict items are converted to
BlockConfig instances via BlockConfig(...) and set via
_force_setattr("parallel_blocks", ...); reference __post_init__,
parallel_blocks, BlockConfig, and _force_setattr when making the change.
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py-321-374 (1)

321-374: ⚠️ Potential issue | 🟡 Minor

Deduplicate scenarios created during the current run as well.

Lines 321-374 only guard against "args" blocks that already exist on disk, and Lines 330-332 always append the teacher hidden size. If the config already contains that size—or any duplicate batch/hidden-size combo—the same scenario is emitted twice in one invocation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
321 - 374, The loop only checks existing on-disk args via subblock_stats_args
but can still emit duplicates within the same run (especially after appending
teacher hidden size to model_hidden_sizes); fix by tracking seen scenarios for
this invocation: create a runtime set (e.g., curr_run_args) of immutabledict
args and initialize it from subblock_stats_args, then for each generated
curr_subblock_stats compute key = immutabledict(curr_subblock_stats["args"]) and
skip/raise if key in curr_run_args; after appending curr_subblock_stats also add
key to curr_run_args (and/or update subblock_stats_args) to prevent duplicates
during the same run; also ensure model_hidden_sizes is deduplicated before
iteration (e.g., build a unique list or set from model_hidden_sizes +
[lm_config.hidden_size]) so the teacher hidden size doesn’t cause duplicate
iterations.
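The in-run deduplication described above can be sketched with illustrative names (the real calc_subblock_stats structures differ):

```python
def build_unique_scenarios(model_hidden_sizes, teacher_hidden_size, batch_sizes):
    """Emit each (hidden_size, batch_size) scenario at most once per run."""
    # dict.fromkeys preserves order while dropping duplicates, so appending
    # the teacher size is a no-op when the config already lists it.
    hidden_sizes = list(dict.fromkeys([*model_hidden_sizes, teacher_hidden_size]))
    seen: set[tuple] = set()
    scenarios = []
    for hs in hidden_sizes:
        for bs in batch_sizes:
            key = (hs, bs)
            if key in seen:  # already emitted during this invocation
                continue
            seen.add(key)
            scenarios.append({"args": {"hidden_size": hs, "batch_size": bs}})
    return scenarios
```

Seeding `seen` from the on-disk "args" blocks before the loop would combine this with the existing disk-level guard.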
🧹 Nitpick comments (6)
modelopt/torch/puzzletron/plugins/mbridge/qwen3.py (1)

26-26: Export PuzzletronQwen3AnyModelBridge in __all__ for consistency.

Registration works today, but an empty __all__ makes this module’s public surface inconsistent with llama.py and less predictable for star-import consumers.

♻️ Proposed change
-__all__ = []
+__all__ = ["PuzzletronQwen3AnyModelBridge"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/plugins/mbridge/qwen3.py` at line 26, The module
currently exports an empty __all__, which hides PuzzletronQwen3AnyModelBridge
from star-imports; update the module-level __all__ to include the class name
"PuzzletronQwen3AnyModelBridge" so the public API is consistent with other
bridge modules (e.g., llama.py) and predictable for consumers using from module
import *; locate the class definition PuzzletronQwen3AnyModelBridge and add its
exact identifier to the __all__ list.
modelopt/torch/puzzletron/puzzletron_nas_plugin.py (1)

133-133: Hardcoded output subdir should be config-driven.

Line 133 hardcodes "ckpts/teacher" (and already has TODO). Promote it to PuzzletronConfig to avoid path coupling across workflows.

I can draft the config field and wiring patch if you want.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py` at line 133, Replace the
hardcoded hf_ckpt_teacher_dir = "ckpts/teacher" by adding a new config field on
PuzzletronConfig (e.g., hf_ckpt_teacher_subdir or teacher_checkpoint_subdir) and
use that field where hf_ckpt_teacher_dir is referenced in
puzzletron_nas_plugin.py; update the plugin to read the value from the
PuzzletronConfig instance (falling back to the previous default "ckpts/teacher"
if unset) so all paths are configurable and not hardcoded.
modelopt/torch/puzzletron/tools/post_init_sparse.py (1)

29-30: Make the base strategy fail fast instead of deferring broken defaults to runtime.

do_sparsity() calls calculate_masks(...) and filter_function(name, modules), but the base class currently provides a docstring-only calculate_masks and a zero-arg filter_function. That leaves SparsityMethod instantiable but unusable.

♻️ Minimal fail-fast change
 class SparsityMethod:
     def calculate_masks(self, state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
-        """Gets a model state_dict, returns a state_dict-like mask_dict with masks"""
+        raise NotImplementedError
 
-    def filter_function(self):
-        pass
+    def filter_function(self, name, modules_to_sparsify_in_block):
+        raise NotImplementedError

Also applies to: 54-55, 64-78

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/post_init_sparse.py` around lines 29 - 30,
The base class SparsityMethod is currently instantiable but unusable; update
SparsityMethod so calculate_masks(self, state_dict: dict[str, torch.Tensor])
raises NotImplementedError with a clear message instead of a docstring, and
change filter_function to accept the expected signature filter_function(self,
name: str, modules: Sequence[nn.Module]) (or similar used by do_sparsity) and
raise NotImplementedError as well; ensure do_sparsity still calls
calculate_masks(...) and filter_function(name, modules) but will now fail fast
on the base class rather than deferring broken defaults to runtime.
modelopt/torch/puzzletron/tools/logger.py (1)

21-21: Remove the unused launcher side-effect import.

The logger already reads LOCAL_RANK, RANK, and WORLD_SIZE directly, so Line 21 only adds a deprecated Torch dependency to this module's import path. If a supported runtime drops torch.distributed.launch, importing the logger will fail before any logging code runs.

Suggested fix
-import torch.distributed.launch  # noqa: F401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` at line 21, Remove the unused
side-effect import "import torch.distributed.launch" from logger.py because the
logger already reads environment variables (LOCAL_RANK, RANK, WORLD_SIZE)
directly; locate the import statement in logger.py and delete it so the module
no longer depends on the deprecated torch.distributed.launch runtime, keeping
any existing references to LOCAL_RANK/RANK/WORLD_SIZE intact.
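For reference, reading the rank variables needs only the standard library; a hedged sketch (the helper name `get_dist_env` is illustrative, not from the actual logger):

```python
import os

def get_dist_env() -> tuple[int, int, int]:
    """Return (local_rank, rank, world_size) from the environment.

    Matches what the logger already does directly; no torch.distributed
    import is needed. Defaults cover single-process runs where no
    launcher set these variables.
    """
    return (
        int(os.environ.get("LOCAL_RANK", 0)),
        int(os.environ.get("RANK", 0)),
        int(os.environ.get("WORLD_SIZE", 1)),
    )
```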
examples/puzzletron/requirements.txt (1)

2-3: Pin the new example deps to tested, approved versions.

math-verify and ray are open-ended here. That makes the example environment drift over time, and ray has had critical advisories on some released versions. Please pin the versions/ranges you actually test, and verify whether math-verify needs the extra codeowner approval required for non-permissive licenses.

As per coding guidelines, "Any addition of new PIP dependencies in pyproject.toml or requirements.txt that are not permissive licenses ... must be reviewed and approved by @NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/requirements.txt` around lines 2 - 3, The
requirements.txt entries for the puzzletron example use open-ended package specs
("math-verify" and "ray"); update examples/puzzletron/requirements.txt to pin
both packages to the specific versions or tight version ranges you actually test
(e.g., ray==X.Y.Z or ray>=A.B,<C.D) to prevent drift, and confirm whether
math-verify's license is non-permissive—if so, add the required approval from
`@NVIDIA/modelopt-setup-codeowners` and an explicit justification in the PR
description before merging.
modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py (1)

15-15: Avoid file-wide mypy suppression.

Line 15 disables all type checking for this module. Please prefer targeted ignores on specific lines/issues instead of blanket suppression.

As per coding guidelines: "Use mypy for static type checking Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py` at line
15, Remove the file-wide "# mypy: ignore-errors" at the top of
convert_any_model.py and replace it with targeted type fixes or inline ignores
for the specific symbols that fail type checking; identify the offending
functions/classes (e.g., any public functions or classes in convert_any_model.py
such as convert_any_model, AnyModelConverter, or helper functions) by running
mypy on this module, fix obvious typing issues (add annotations, correct return
types) and where necessary use precise inline ignores like "# type:
ignore[<error-code>]" on the specific lines rather than suppressing checks for
the whole file.
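The shape of a targeted suppression, on a hypothetical helper (not from the actual module), looks like this:

```python
import json

# Scoped alternative to a module-wide "# mypy: ignore-errors" header:
# annotate the function fully, then silence only the one known-imprecise
# call with an error-code-specific ignore.

def load_config(path: str) -> dict:
    """Illustrative helper: json.load() returns Any, which some mypy
    configurations flag; the ignore is pinned to that one error code."""
    with open(path) as f:
        return json.load(f)  # type: ignore[no-any-return]
```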

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: be30b5b3-7077-4f8b-b84f-9646c31871be

📥 Commits

Reviewing files that changed from the base of the PR and between 75651cc and ed5fd68.

📒 Files selected for processing (110)
  • .github/CODEOWNERS
  • .github/workflows/_example_tests_runner.yml
  • .github/workflows/example_tests.yml
  • .github/workflows/gpu_tests.yml
  • .pre-commit-config.yaml
  • CHANGELOG.rst
  • docs/source/conf.py
  • examples/llm_eval/README.md
  • examples/llm_eval/lm_eval_hf.py
  • examples/puzzletron/README.md
  • examples/puzzletron/evaluation/hf_deployable_anymodel.py
  • examples/puzzletron/main.py
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • examples/puzzletron/requirements.txt
  • modelopt/torch/prune/importance_hooks/base_hooks.py
  • modelopt/torch/prune/importance_hooks/compare_module_outputs.py
  • modelopt/torch/prune/importance_hooks/expert_removal_hooks.py
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
  • modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py
  • modelopt/torch/puzzletron/anymodel/README.md
  • modelopt/torch/puzzletron/anymodel/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/base.py
  • modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py
  • modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py
  • modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/__init__.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/no_op.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/patcher.py
  • modelopt/torch/puzzletron/block_config.py
  • modelopt/torch/puzzletron/build_library_and_stats.py
  • modelopt/torch/puzzletron/dataset/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py
  • modelopt/torch/puzzletron/entrypoint.py
  • modelopt/torch/puzzletron/mip/__init__.py
  • modelopt/torch/puzzletron/mip/mip_and_realize_models.py
  • modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/sweep.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/plugins/__init__.py
  • modelopt/torch/puzzletron/plugins/mbridge/__init__.py
  • modelopt/torch/puzzletron/plugins/mbridge/base.py
  • modelopt/torch/puzzletron/plugins/mbridge/llama.py
  • modelopt/torch/puzzletron/plugins/mbridge/qwen3.py
  • modelopt/torch/puzzletron/pruning/__init__.py
  • modelopt/torch/puzzletron/pruning/expert_removal_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/ffn_intermediate_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_ckpts.py
  • modelopt/torch/puzzletron/pruning/pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/puzzletron_nas_plugin.py
  • modelopt/torch/puzzletron/replacement_library/__init__.py
  • modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_utils.py
  • modelopt/torch/puzzletron/scoring/__init__.py
  • modelopt/torch/puzzletron/scoring/score.py
  • modelopt/torch/puzzletron/sewing_kit/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/passage/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/passage/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/__init__.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/__init__.py
  • modelopt/torch/puzzletron/tools/bypassed_training/__init__.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/common.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/tools/kd_model.py
  • modelopt/torch/puzzletron/tools/logger.py
  • modelopt/torch/puzzletron/tools/post_init_sparse.py
✅ Files skipped from review due to trivial changes (27)
  • examples/llm_eval/README.md
  • .github/CODEOWNERS
  • CHANGELOG.rst
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
  • modelopt/torch/puzzletron/mip/__init__.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/passage/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
  • modelopt/torch/puzzletron/scoring/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
  • examples/puzzletron/mbridge_distillation/README.md
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/tools/kd_model.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/anymodel/README.md
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/sewing_kit/passage/core.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
🚧 Files skipped from review as they are similar to previous changes (27)
  • .pre-commit-config.yaml
  • modelopt/torch/puzzletron/dataset/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py
  • modelopt/torch/puzzletron/tools/common.py
  • modelopt/torch/prune/importance_hooks/compare_module_outputs.py
  • modelopt/torch/puzzletron/build_library_and_stats.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/no_op.py
  • .github/workflows/example_tests.yml
  • examples/puzzletron/main.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
  • modelopt/torch/puzzletron/anymodel/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
  • modelopt/torch/puzzletron/tools/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
  • modelopt/torch/puzzletron/mip/mip_and_realize_models.py
  • modelopt/torch/puzzletron/pruning/pruning_mixin.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/sewing_kit/core.py

Comment thread examples/puzzletron/evaluation/hf_deployable_anymodel.py
Comment on lines +341 to +346
def get_triton_output(self):
return (
Tensor(name="sentences", shape=(-1,), dtype=bytes),
Tensor(name="logits", shape=(-1,), dtype=np.single),
Tensor(name="scores", shape=(-1,), dtype=np.single),
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

For PyTriton Tensor(shape=...), does the declared shape exclude only the batch dimension, and must every declared output be returned with matching rank on each request?

💡 Result:

Yes, for PyTriton Tensor(shape=...), the declared shape excludes the batch dimension when max_batch_size > 0 (which is the default), and every declared output must be returned with a matching rank (and shape, when strict=True) on each request.


🏁 Script executed:

cd examples/puzzletron/evaluation && wc -l hf_deployable_anymodel.py

Repository: NVIDIA/Model-Optimizer

Length of output: 95


🏁 Script executed:

cd examples/puzzletron/evaluation && sed -n '330,350p' hf_deployable_anymodel.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1035


🏁 Script executed:

cd examples/puzzletron/evaluation && sed -n '395,435p' hf_deployable_anymodel.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1721


Fix Triton output signature to match handler returns.

Lines 341-346 declare logits and scores as rank-1 outputs (shape=(-1,)), but the handler returns rank-3 tensors after transpose(1, 0, 2) at lines 417 and 423. Additionally, the error path (line 431) omits these outputs entirely, creating a contract violation where declared outputs are not always present. PyTriton requires all declared outputs to be returned with matching rank and presence on every request.

Also applies to: 401-429

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/evaluation/hf_deployable_anymodel.py` around lines 341 -
346, The Triton output signature declared in get_triton_output must match the
actual tensors returned by the request handler: change the shapes/types for
"logits" and "scores" to the correct rank-3 shapes (matching the tensors
produced after transpose(1, 0, 2)) and ensure the handler always returns all
declared outputs even on error; update get_triton_output (and any duplicate
signatures in the 401-429 region) to declare "logits" and "scores" with the
proper three-dimensional shape and modify the error branch in the handler so it
returns placeholder tensors for "logits" and "scores" (e.g., zeros or
appropriate empty arrays with the correct dtype and rank) alongside "sentences"
to satisfy PyTriton’s requirement that all declared outputs are present and
rank-compatible.
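A shape-only sketch of the mismatch, in pure Python (dimension names are illustrative; per the query result above, with max_batch_size > 0 PyTriton's declared shape excludes the batch dimension, so a matching declaration would be rank-2, e.g. shape=(-1, -1)):

```python
def transposed_shape(shape: tuple, axes: tuple) -> tuple:
    """Shape of array.transpose(axes) for an array of the given shape."""
    return tuple(shape[a] for a in axes)

batch, seq, vocab = 2, 4, 8
# What the handler produces after transpose(1, 0, 2):
handler_out = transposed_shape((seq, batch, vocab), (1, 0, 2))
assert handler_out == (batch, seq, vocab)  # rank-3 tensor
assert len(handler_out) != 1  # cannot satisfy a shape=(-1,) (rank-1) declaration
```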

Comment thread modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
Comment thread modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py Outdated
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/scoring.py`:
- Line 18: Remove the module-wide "# mypy: ignore-errors" directive at the top
of scoring.py and instead fix the underlying type issues or apply scoped
ignores: run mypy, correct function/type annotations on the scoring entrypoints
(e.g., any top-level functions or classes that implement scoring or exports in
scoring.py), add precise per-line or per-symbol ignores (e.g., "# type:
ignore[arg-type]" or annotate with typing like Any, Optional, Callable) only
where necessary, and re-run mypy to ensure no module-wide suppression remains.
- Around line 84-86: Wrap the call to launch_scoring(cfg) in a try/finally so
that dist.cleanup() is always executed even if launch_scoring raises;
specifically, after calling dist.setup(timeout=cfg.nccl_timeout_minutes) enter a
try block, call launch_scoring(cfg) inside it, and call dist.cleanup() in the
finally block (re-raising any exception if needed) to ensure the process group
is torn down reliably.
- Around line 61-70: get_solutions_to_validate currently calls
pd.read_json(cfg.scoring.solutions_path) unconditionally which will fail if
solutions_path is a directory; update get_solutions_to_validate to mirror
load_puzzle_solutions behavior: convert cfg.scoring.solutions_path to a Path, if
is_file() use pd.read_json to produce single_block_replacement_solutions,
otherwise iterate over solution_*.json files in the directory, read each JSON
and concatenate into a DataFrame/list; then continue using
find_missing_solutions or np.arange(...) as before (refer to
get_solutions_to_validate, pd.read_json, load_puzzle_solutions,
find_missing_solutions and cfg.scoring.solutions_path to locate the relevant
code).

In `@modelopt/torch/puzzletron/sewing_kit/passage.py`:
- Around line 181-190: The __init__ uses mutable default arguments (e.g.,
outputs_cache: dict[str, Any] = {}, inputs_to_capture: Iterable[str] = [],
outputs_to_capture: Iterable[str] = [], input_overrides: Mapping[...] = {},
output_overrides: Mapping[...] = {}), causing shared state between instances;
change these defaults to None and inside Passage.__init__ initialize them to
fresh empty containers (e.g., if outputs_cache is None: outputs_cache = {}),
doing the same for
inputs_to_capture/outputs_to_capture/input_overrides/output_overrides so each
Passage instance gets its own local collections and no cross-instance leakage
occurs (also apply the same fix to the similar parameter list around the other
constructor occurrence noted).
- Around line 356-418: The cache fast-path currently skips modules even when
inputs are requested because can_be_skipped ignores passage.inputs_to_capture;
update can_be_skipped (used by PassageModuleWrapper.can_be_skipped called from
run_passage) to also check passage.inputs_to_capture and return False if any
entry in inputs_to_capture matches the module (using is_submodule_or_same for
direct captures and/or is_submodule_of for descendant captures), so that when
run_passage would return a cached_output it does not bypass required input
capture logic.
- Around line 289-300: The context-manager leaves stale state and builds names
that don't match forward(): in Passage.__exit__ call the active context
manager's __exit__ and then set self.active_context_manager = None so future
enters don't assert; in Passage.__enter__ stop using self.named_modules() to
build module_to_name_mapping (which yields "module."-prefixed names) and instead
derive names that match the override/capture keys used by forward() — e.g.,
iterate self.modules() (or use named_modules() but strip the "module." prefix)
and assign module_to_name_mapping entries keyed by id(module) -> the exact
unprefixed name expected by override/capture so lookups align.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47033822-a1f6-49a1-a096-6c7e29002d82

📥 Commits

Reviewing files that changed from the base of the PR and between ed5fd68 and 547e76d.

📒 Files selected for processing (5)
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py
  • modelopt/torch/puzzletron/scoring.py
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py

Comment thread modelopt/torch/puzzletron/scoring.py
Comment on lines +61 to +70
def get_solutions_to_validate(cfg: DictConfig):
_solutions_to_validate = cfg.scoring.solutions_to_validate
if _solutions_to_validate is None:
single_block_replacement_solutions = pd.read_json(cfg.scoring.solutions_path)
if cfg.scoring.skip_existing_solutions:
_solutions_to_validate = find_missing_solutions(
single_block_replacement_solutions, cfg.scoring.output_dir
)
else:
_solutions_to_validate = np.arange(single_block_replacement_solutions.shape[0]).tolist()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and read the file
find . -name "scoring.py" -path "*/puzzletron/*" | head -5

Repository: NVIDIA/Model-Optimizer

Length of output: 104


🏁 Script executed:

# Get the file and look at the relevant lines
cat -n modelopt/torch/puzzletron/scoring.py | sed -n '50,80p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1535


🏁 Script executed:

# Search for the validator contract/schema definition
rg -n "solutions_path|solutions_to_validate" modelopt/torch/puzzletron/ -A 2 -B 2 | head -50

Repository: NVIDIA/Model-Optimizer

Length of output: 5099


🏁 Script executed:

# Look for find_missing_solutions function
rg -n "def find_missing_solutions" modelopt/torch/puzzletron/ -A 10

Repository: NVIDIA/Model-Optimizer

Length of output: 1014


🏁 Script executed:

# Look for extract_solution_id function mentioned in suggested fix
rg -n "extract_solution_id" modelopt/torch/puzzletron/ -A 5

Repository: NVIDIA/Model-Optimizer

Length of output: 1006


🏁 Script executed:

# Look at load_puzzle_solutions function to see how it handles both file and directory
rg -n "def load_puzzle_solutions" modelopt/torch/puzzletron/ -A 20

Repository: NVIDIA/Model-Optimizer

Length of output: 2440


🏁 Script executed:

# Check imports to see what's available in scoring.py
head -30 modelopt/torch/puzzletron/scoring.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1099


🏁 Script executed:

# Check if load_puzzle_solutions is used in scoring.py
rg -n "load_puzzle_solutions" modelopt/torch/puzzletron/scoring.py

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Check if load_puzzle_solutions is imported/available in scoring.py module
rg -n "load_puzzle_solutions|from.*tools" modelopt/torch/puzzletron/scoring.py

Repository: NVIDIA/Model-Optimizer

Length of output: 248


🏁 Script executed:

# Let's look at the complete extract_solution_id function to confirm its behavior
cat -n modelopt/torch/puzzletron/scoring.py | sed -n '37,45p'

Repository: NVIDIA/Model-Optimizer

Length of output: 406


🏁 Script executed:

# Check if there's usage documentation or test cases that show solutions_path as directory
rg -n "solutions_path" modelopt/torch/puzzletron/ -B 2 -A 2 | grep -i "dir\|glob" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 981


Support directory-valued solutions_path before calling pd.read_json.

The validator contract allows solutions_path to be either a single JSON file or a directory of solution_<id>.json files, but pd.read_json() on line 64 only handles the file case and will fail when given a directory path. This mirrors the issue handled by load_puzzle_solutions() in validate_puzzle_with_multi_replacements.py, which correctly checks is_file() vs directory and processes accordingly.

Suggested fix
 def get_solutions_to_validate(cfg: DictConfig):
     _solutions_to_validate = cfg.scoring.solutions_to_validate
     if _solutions_to_validate is None:
-        single_block_replacement_solutions = pd.read_json(cfg.scoring.solutions_path)
+        solutions_path = cfg.scoring.solutions_path
+        if os.path.isdir(solutions_path):
+            files = glob(os.path.join(solutions_path, "solution_*.json"))
+            all_solution_ids = sorted(
+                s for s in (extract_solution_id(os.path.basename(f)) for f in files) if s is not None
+            )
+        else:
+            single_block_replacement_solutions = pd.read_json(solutions_path)
+            all_solution_ids = np.arange(single_block_replacement_solutions.shape[0]).tolist()
+
         if cfg.scoring.skip_existing_solutions:
-            _solutions_to_validate = find_missing_solutions(
-                single_block_replacement_solutions, cfg.scoring.output_dir
-            )
+            _solutions_to_validate = np.setdiff1d(
+                all_solution_ids,
+                [
+                    s
+                    for s in (
+                        extract_solution_id(os.path.basename(p))
+                        for p in glob(os.path.join(cfg.scoring.output_dir, "solution_*.json"))
+                    )
+                    if s is not None
+                ],
+            ).tolist()
         else:
-            _solutions_to_validate = np.arange(single_block_replacement_solutions.shape[0]).tolist()
+            _solutions_to_validate = all_solution_ids
     return _solutions_to_validate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/scoring.py` around lines 61 - 70,
get_solutions_to_validate currently calls
pd.read_json(cfg.scoring.solutions_path) unconditionally which will fail if
solutions_path is a directory; update get_solutions_to_validate to mirror
load_puzzle_solutions behavior: convert cfg.scoring.solutions_path to a Path, if
is_file() use pd.read_json to produce single_block_replacement_solutions,
otherwise iterate over solution_*.json files in the directory, read each JSON
and concatenate into a DataFrame/list; then continue using
find_missing_solutions or np.arange(...) as before (refer to
get_solutions_to_validate, pd.read_json, load_puzzle_solutions,
find_missing_solutions and cfg.scoring.solutions_path to locate the relevant
code).
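A standalone sketch of the file-or-directory branching (the helper name `list_solution_ids` and the positional-id convention are assumptions mirroring the description of load_puzzle_solutions, not the actual code):

```python
import json
import re
from pathlib import Path

def list_solution_ids(solutions_path: str) -> list[int]:
    """Accept either a single JSON file (ids are row positions) or a
    directory of solution_<id>.json files (ids parsed from filenames)."""
    path = Path(solutions_path)
    if path.is_file():
        # One JSON array of solutions: ids are the row indices.
        return list(range(len(json.loads(path.read_text()))))
    ids = []
    for f in path.glob("solution_*.json"):
        m = re.fullmatch(r"solution_(\d+)", f.stem)
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)
```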

Comment thread modelopt/torch/puzzletron/scoring.py Outdated
Comment on lines +181 to +190
def __init__(
self,
module: nn.Module,
*,
inputs_to_capture: Iterable[str] = [],
outputs_to_capture: Iterable[str] = [],
input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
outputs_cache: dict[str, Any] = {},
capture_fake_outputs_predicate: Predicate = always_false_predicate,

⚠️ Potential issue | 🟠 Major

Make outputs_cache instance-local.

The default {} here is shared across calls. A fake output cached by one Passage can leak into the next one and make can_be_skipped() reuse stale values on an unrelated run.

Suggested fix
     def __init__(
         self,
         module: nn.Module,
         *,
-        inputs_to_capture: Iterable[str] = [],
-        outputs_to_capture: Iterable[str] = [],
-        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
-        output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
-        outputs_cache: dict[str, Any] = {},
+        inputs_to_capture: Iterable[str] = (),
+        outputs_to_capture: Iterable[str] = (),
+        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+        output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+        outputs_cache: Mapping[str, Any] | None = None,
         capture_fake_outputs_predicate: Predicate = always_false_predicate,
         capture_cache_outputs_predicate: Predicate = always_false_predicate,
         early_exit: bool = False,
         name: Optional[str] = None,
     ):
@@
-        self.input_overrides = input_overrides
-        self.output_overrides = output_overrides
-        self.outputs_cache = outputs_cache
+        self.input_overrides = input_overrides or {}
+        self.output_overrides = output_overrides or {}
+        self.outputs_cache = dict(outputs_cache or {})
@@
     def create(
         cls,
         module: nn.Module,
         *,
-        inputs_to_capture: Iterable[str] = [],
-        outputs_to_capture: Iterable[str] = [],
-        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
-        output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
-        outputs_cache: dict[str, Any] = {},
+        inputs_to_capture: Iterable[str] = (),
+        outputs_to_capture: Iterable[str] = (),
+        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+        output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+        outputs_cache: Mapping[str, Any] | None = None,
         capture_fake_outputs_predicate: Predicate = always_false_predicate,
         capture_cache_outputs_predicate: Predicate = always_false_predicate,
         early_exit: bool = False,
         name: Optional[str] = None,
     ) -> Passage:

Also applies to: 250-258

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 181 - 190, The
__init__ uses mutable default arguments (e.g., outputs_cache: dict[str, Any] =
{}, inputs_to_capture: Iterable[str] = [], outputs_to_capture: Iterable[str] =
[], input_overrides: Mapping[...] = {}, output_overrides: Mapping[...] = {}),
causing shared state between instances; change these defaults to None and inside
Passage.__init__ initialize them to fresh empty containers (e.g., if
outputs_cache is None: outputs_cache = {}), doing the same for
inputs_to_capture/outputs_to_capture/input_overrides/output_overrides so each
Passage instance gets its own local collections and no cross-instance leakage
occurs (also apply the same fix to the similar parameter list around the other
constructor occurrence noted).

Comment on lines +289 to +300
def __enter__(self):
assert self.active_context_manager is None
self.active_context_manager = Passage.active_passages_context(self)
self.active_context_manager.__enter__()
self.module_to_name_mapping = {id(v): k for k, v in self.named_modules()}

return self

def __exit__(self, exc_type, exc_val, exc_tb):
assert self.active_context_manager is not None
self.active_context_manager.__exit__(exc_type, exc_val, exc_tb)


⚠️ Potential issue | 🟠 Major

The context-manager path leaves stale state behind and remaps names differently from forward().

After the first with passage: ... block exits, active_context_manager is never cleared, so is_active() stays true and the next __enter__ trips the assert. Rebuilding module_to_name_mapping from self.named_modules() also prefixes every name with "module.", which no longer matches the override/capture keys used elsewhere.

Suggested fix
     def __enter__(self):
         assert self.active_context_manager is None
         self.active_context_manager = Passage.active_passages_context(self)
         self.active_context_manager.__enter__()
-        self.module_to_name_mapping = {id(v): k for k, v in self.named_modules()}
+        self.module_to_name_mapping = {id(v): k for k, v in self.module.named_modules()}
 
         return self
 
     def __exit__(self, exc_type, exc_val, exc_tb):
         assert self.active_context_manager is not None
-        self.active_context_manager.__exit__(exc_type, exc_val, exc_tb)
+        try:
+            self.active_context_manager.__exit__(exc_type, exc_val, exc_tb)
+        finally:
+            self.active_context_manager = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 289 - 300, The
context-manager leaves stale state and builds names that don't match forward():
in Passage.__exit__ call the active context manager's __exit__ and then set
self.active_context_manager = None so future enters don't assert; in
Passage.__enter__ stop using self.named_modules() to build
module_to_name_mapping (which yields "module."-prefixed names) and instead
derive names that match the override/capture keys used by forward() — e.g.,
iterate self.modules() (or use named_modules() but strip the "module." prefix)
and assign module_to_name_mapping entries keyed by id(module) -> the exact
unprefixed name expected by override/capture so lookups align.
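The re-enter problem generalizes beyond Passage; a minimal sketch of a context manager kept reusable by clearing its saved state in a finally block (hypothetical names, assuming the pattern in the suggested fix):

```python
# Sketch of a re-enterable context manager: the saved inner context manager is
# cleared even if its __exit__ raises, so the next `with` block can re-enter.
# Reentrant and _make_inner are illustrative names, not the Passage API.
from contextlib import contextmanager

class Reentrant:
    def __init__(self):
        self._inner = None

    def __enter__(self):
        assert self._inner is None, "already active"
        self._inner = self._make_inner()
        self._inner.__enter__()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        assert self._inner is not None
        try:
            return self._inner.__exit__(exc_type, exc_val, exc_tb)
        finally:
            self._inner = None  # without this, the next __enter__ hits the assert

    @contextmanager
    def _make_inner(self):
        yield

r = Reentrant()
with r:
    pass
with r:  # succeeds only because __exit__ resets _inner
    pass
```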

Comment on lines +356 to +418
def can_be_skipped(_self: PassageModuleWrapper, depth: int) -> bool:
passages_beyond_depth = Passage.active_passages_context[depth:]
module_name = Passage.module_name_relative_to_active_passage(_self)

results = [
(
module_name in passage.outputs_cache
and not any(
is_submodule_or_same(k, module_name) for k in passage.outputs_to_capture
)
and not any(
is_submodule_of(k, module_name)
for k, v in passage.input_overrides.items()
if v is not None
)
and not any(
is_submodule_of(k, module_name)
for k, v in passage.output_overrides.items()
if v is not None
)
)
for passage in passages_beyond_depth
]

result = all(results)

return result

# Defined as a static method to avoid potential collision with original class methods
@staticmethod
@dynamo_skip
def run_passage(_self: PassageModuleWrapper, depth: int, args, kwargs):
if depth + 1 > len(Passage.active_passages_context):
output = super(PassageModuleWrapper, _self).__call__(*args, **kwargs)
return output

module_name = Passage.module_name_relative_to_active_passage(_self)
passage = Passage.active_passages_context[depth]

has_output_override = module_name in passage.output_overrides
output_override = passage.output_overrides.get(module_name)

if has_output_override and not isinstance(output_override, PassageOutputAdapter):
output = output_override
else:
input_override = passage.input_overrides.get(module_name)
if input_override is not None:
original_input_args = InputArgs(*args, **kwargs)

if isinstance(input_override, PassageInputAdapter):
new_input_args = input_override(original_input_args, module_name, module)
else:
new_input_args = input_override

args, kwargs = new_input_args.args, new_input_args.kwargs

if (
output_override is None
and PassageModuleWrapper.can_be_skipped(_self, depth)
and (has_fake_tensor(args) or has_fake_tensor(kwargs))
):
cached_output = passage.outputs_cache[module_name]
return cached_output

⚠️ Potential issue | 🟠 Major

Don't let the cache fast-path skip requested input captures.

can_be_skipped() ignores inputs_to_capture. When a cached fake output exists, Line 418 returns before Lines 440-446 run, so input capture requests for that module or its descendants are silently dropped.

Suggested fix
                 (
                     module_name in passage.outputs_cache
+                    and not any(
+                        is_submodule_or_same(k, module_name) for k in passage.inputs_to_capture
+                    )
                     and not any(
                         is_submodule_or_same(k, module_name) for k in passage.outputs_to_capture
                     )
                     and not any(
                         is_submodule_of(k, module_name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 356 - 418, The
cache fast-path currently skips modules even when inputs are requested because
can_be_skipped ignores passage.inputs_to_capture; update can_be_skipped (used by
PassageModuleWrapper.can_be_skipped called from run_passage) to also check
passage.inputs_to_capture and return False if any entry in inputs_to_capture
matches the module (using is_submodule_or_same for direct captures and/or
is_submodule_of for descendant captures), so that when run_passage would return
a cached_output it does not bypass required input capture logic.
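A standalone sketch of the corrected predicate; is_submodule_or_same here is a simplified dotted-prefix stand-in for the helper in passage.py, and can_be_skipped takes the relevant passage fields as plain arguments:

```python
# Sketch of the corrected skip predicate: a pending input capture on the
# module (or a descendant) must block the cached fast-path, just like a
# pending output capture does.

def is_submodule_or_same(key: str, module_name: str) -> bool:
    # Simplified stand-in: dotted-name prefix match.
    return key == module_name or key.startswith(module_name + ".")

def can_be_skipped(module_name, outputs_cache, inputs_to_capture, outputs_to_capture):
    return (
        module_name in outputs_cache
        # The fix: requested input captures also disqualify the fast-path.
        and not any(is_submodule_or_same(k, module_name) for k in inputs_to_capture)
        and not any(is_submodule_or_same(k, module_name) for k in outputs_to_capture)
    )

print(can_be_skipped("block.0", {"block.0": object()}, [], []))               # True
print(can_be_skipped("block.0", {"block.0": object()}, ["block.0.mlp"], []))  # False
```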

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (6)
modelopt/torch/puzzletron/scoring.py (2)

61-70: ⚠️ Potential issue | 🟠 Major

get_solutions_to_validate fails for directory-valued solutions_path.

Line 64 unconditionally calls pd.read_json(...), but Puzzletron’s loader supports both a JSON file and a directory of *solution*.json. This will break the directory path flow and diverge from validation behavior.

Suggested fix
+from pathlib import Path
 import numpy as np
-import pandas as pd
@@
-from .tools.validate_puzzle_with_multi_replacements import validate_puzzle_solutions
+from .tools.validate_puzzle_with_multi_replacements import (
+    load_puzzle_solutions,
+    validate_puzzle_solutions,
+)
@@
-def find_missing_solutions(solutions_df, validation_dir):
-    all_solutions = np.arange(solutions_df.shape[0])
+def find_missing_solutions(num_solutions: int, validation_dir):
+    all_solutions = np.arange(num_solutions)
@@
 def get_solutions_to_validate(cfg: DictConfig):
     _solutions_to_validate = cfg.scoring.solutions_to_validate
     if _solutions_to_validate is None:
-        single_block_replacement_solutions = pd.read_json(cfg.scoring.solutions_path)
+        puzzle_solutions = load_puzzle_solutions(
+            Path(cfg.scoring.solutions_path),
+            cfg.scoring.sort_solutions_by,
+            cfg.scoring.bigger_is_better,
+        )
+        num_solutions = len(puzzle_solutions)
         if cfg.scoring.skip_existing_solutions:
             _solutions_to_validate = find_missing_solutions(
-                single_block_replacement_solutions, cfg.scoring.output_dir
+                num_solutions, cfg.scoring.output_dir
             )
         else:
-            _solutions_to_validate = np.arange(single_block_replacement_solutions.shape[0]).tolist()
+            _solutions_to_validate = np.arange(num_solutions).tolist()
     return _solutions_to_validate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/scoring.py` around lines 61 - 70,
get_solutions_to_validate currently always calls pd.read_json on
cfg.scoring.solutions_path which fails when that path is a directory of solution
files; update the function to detect whether cfg.scoring.solutions_path is a
file or a directory, and if it is a directory, glob for files matching a pattern
like "*solution*.json", read each file into a DataFrame/record list and
concatenate them into single_block_replacement_solutions before proceeding; keep
the existing branch that calls find_missing_solutions(...) (referencing
single_block_replacement_solutions, cfg.scoring.output_dir,
cfg.scoring.skip_existing_solutions) when skip_existing_solutions is True, and
otherwise set _solutions_to_validate to the range of rows as currently done.
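A hedged sketch of the file-or-directory branch this fix calls for; load_solutions and the directory layout are illustrative, not the actual Puzzletron loader:

```python
# Sketch: accept either a single JSON file of solutions or a directory of
# per-solution files matching "*solution*.json", as the comment describes.
import json
import tempfile
from pathlib import Path

def load_solutions(solutions_path: Path) -> list[dict]:
    if solutions_path.is_dir():
        records: list[dict] = []
        for f in sorted(solutions_path.glob("*solution*.json")):
            records.extend(json.loads(f.read_text()))
        return records
    return json.loads(solutions_path.read_text())

# Usage: two per-solution files in a directory load as one record list.
with tempfile.TemporaryDirectory() as d:
    p = Path(d)
    (p / "a_solution.json").write_text(json.dumps([{"id": 0}]))
    (p / "b_solution.json").write_text(json.dumps([{"id": 1}]))
    n = len(load_solutions(p))
print(n)  # 2
```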

18-18: ⚠️ Potential issue | 🟠 Major

Remove module-wide mypy suppression.

Line 18 disables type checking for the whole module and can hide regressions in the scoring entrypoint. Use scoped ignores only where needed.

As per coding guidelines, "Use mypy for static type checking Python code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/scoring.py` at line 18, Remove the module-wide "#
mypy: ignore-errors" suppression and instead apply targeted ignores only where
necessary: delete the line and run mypy on this module, then add "# type:
ignore" or "# type: ignore[code]" comments on the specific functions/expressions
that fail (e.g., the scoring entrypoint function(s) and any helper like
score_batch or compute_similarity) so that only those lines are suppressed while
the rest of the module remains type-checked.
modelopt/torch/puzzletron/sewing_kit/passage.py (3)

361-379: ⚠️ Potential issue | 🟠 Major

Don’t bypass requested input captures.

This skip predicate still ignores passage.inputs_to_capture. When Line 418 returns a cached output, the capture block at Lines 441-446 never runs, so explicitly requested input captures are silently dropped.

Suggested fix
                 (
                     module_name in passage.outputs_cache
+                    and not any(
+                        is_submodule_or_same(k, module_name) for k in passage.inputs_to_capture
+                    )
                     and not any(
                         is_submodule_or_same(k, module_name) for k in passage.outputs_to_capture
                     )
                     and not any(
                         is_submodule_of(k, module_name)

Also applies to: 413-419

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 361 - 379, The
skip predicate currently allows returning cached outputs even when inputs were
explicitly requested to be captured; update the boolean condition used where
module_name is checked against passage.outputs_cache to also verify that no
requested input captures target this module (i.e. add a check that not
any(is_submodule_or_same(k, module_name) for k in passage.inputs_to_capture)).
Apply this same additional check in both places where the predicate appears (the
block using passage.outputs_cache and the earlier similar predicate around lines
413-419) so that passages with inputs_to_capture are not bypassed.

289-294: ⚠️ Potential issue | 🟠 Major

Keep context-manager module names aligned with forward().

Line 293 rebuilds module_to_name_mapping from self.named_modules(), which prefixes every entry with "module.". That no longer matches the names produced from module.named_modules() in __init__/create(), so with passage: can miss capture/override keys that work in the normal forward() path.

Suggested fix
     def __enter__(self):
         assert self.active_context_manager is None
         self.active_context_manager = Passage.active_passages_context(self)
         self.active_context_manager.__enter__()
-        self.module_to_name_mapping = {id(v): k for k, v in self.named_modules()}
+        self.module_to_name_mapping = {id(v): k for k, v in self.module.named_modules()}

         return self
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 289 - 294, The
__enter__ method rebuilds module_to_name_mapping from self.named_modules(),
which introduces a "module." prefix and breaks alignment with the names used in
forward()/the mapping created in __init__/create(); fix by making __enter__
reuse the same naming logic as __init__/create() (either reuse the existing
self.module_to_name_mapping set during construction instead of rebuilding, or
rebuild by applying the same normalization used in create/__init__ — e.g., strip
the "module." prefix or use the same helper that produced names there) so module
names in __enter__ match forward() and capture/override keys.

181-190: ⚠️ Potential issue | 🟠 Major

Make outputs_cache instance-local.

The {} default here is still shared across Passage instances, and Line 209 stores it by reference. A cached fake output from one passage can leak into the next and make can_be_skipped() reuse stale values.

Suggested fix
     def __init__(
         self,
         module: nn.Module,
         *,
-        inputs_to_capture: Iterable[str] = [],
-        outputs_to_capture: Iterable[str] = [],
-        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
-        output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
-        outputs_cache: dict[str, Any] = {},
+        inputs_to_capture: Iterable[str] = (),
+        outputs_to_capture: Iterable[str] = (),
+        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+        output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+        outputs_cache: Mapping[str, Any] | None = None,
         capture_fake_outputs_predicate: Predicate = always_false_predicate,
         capture_cache_outputs_predicate: Predicate = always_false_predicate,
         early_exit: bool = False,
         name: Optional[str] = None,
     ):
@@
-        self.input_overrides = input_overrides
-        self.output_overrides = output_overrides
-        self.outputs_cache = outputs_cache
+        self.input_overrides = input_overrides or {}
+        self.output_overrides = output_overrides or {}
+        self.outputs_cache = dict(outputs_cache or {})
@@
     def create(
         cls,
         module: nn.Module,
         *,
-        inputs_to_capture: Iterable[str] = [],
-        outputs_to_capture: Iterable[str] = [],
-        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
-        output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
-        outputs_cache: dict[str, Any] = {},
+        inputs_to_capture: Iterable[str] = (),
+        outputs_to_capture: Iterable[str] = (),
+        input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+        output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+        outputs_cache: Mapping[str, Any] | None = None,
         capture_fake_outputs_predicate: Predicate = always_false_predicate,
         capture_cache_outputs_predicate: Predicate = always_false_predicate,
         early_exit: bool = False,
         name: Optional[str] = None,
     ) -> Passage:

Also applies to: 209-209, 250-258

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 181 - 190, The
outputs_cache default argument in Passage.__init__ is a mutable {} shared across
instances causing cached fake outputs to leak between Passage objects; change
the signature to accept outputs_cache: Optional[dict[str, Any]] = None and in
the constructor (where self.outputs_cache is set, referenced around the
assignment that stores it) initialize self.outputs_cache = {} if outputs_cache
is None, ensuring each Passage instance gets its own dict; apply the same
pattern to any other mutable default arguments in the same class/methods
referenced around lines 250-258 (replace {} defaults with None and initialize
per-instance).
modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)

237-250: ⚠️ Potential issue | 🟠 Major

Publish the safetensors index after shard writes succeed.

Line 241 publishes model.safetensors.index.json before any shard file has been committed. If save_subblocks() fails, the directory is left with an index that references missing shards. Save the shards first and write the index last.

Suggested change
-    # Write index (now without tied embedding)
-    index = {"metadata": {"format": "pt"}, "weight_map": weight_map}
-    index_path = checkpoint_dir / SAFE_WEIGHTS_INDEX_NAME
-    index_json = json_dumps(index)
-    _write_file_process_safe(index_json, index_path)
-
     # Phase 3: Save subblocks
     save_subblocks(
         state_dict,
         checkpoint_dir,
         weight_map=weight_map,
         multi_threaded=True,
         max_workers=max_workers,
     )
+
+    # Publish index only after all shard writes have succeeded.
+    index = {"metadata": {"format": "pt"}, "weight_map": weight_map}
+    index_path = checkpoint_dir / SAFE_WEIGHTS_INDEX_NAME
+    index_json = json_dumps(index)
+    _write_file_process_safe(index_json, index_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 237 -
250, The code writes the safetensors index (index variable, index_path using
SAFE_WEIGHTS_INDEX_NAME via _write_file_process_safe) before saving shard files;
move the index write to after save_subblocks completes successfully so the index
only references committed shards. Specifically, call save_subblocks(state_dict,
checkpoint_dir, weight_map=weight_map, multi_threaded=True,
max_workers=max_workers) first and then build and atomically write the index
JSON with _write_file_process_safe(index_json, index_path) (ensure any existing
atomic/write-temp logic is retained) so failures in save_subblocks do not leave
a dangling/missing-shard index.
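The shard-first, index-last ordering can be sketched as follows; save_checkpoint and the shard filenames are illustrative stand-ins for the real helpers:

```python
# Sketch of the write ordering the fix requires: commit every shard first,
# publish the index last, so a failed shard write never leaves behind an
# index that references missing files.
import json
import tempfile
from pathlib import Path

def save_checkpoint(state: dict, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    weight_map = {key: f"shard-{i:05d}.bin" for i, key in enumerate(state)}
    # Phase 1: commit every shard; an exception here leaves no dangling index.
    for key, filename in weight_map.items():
        (out_dir / filename).write_bytes(state[key])
    # Phase 2: publish the index only after all referenced shards exist.
    index = {"metadata": {"format": "pt"}, "weight_map": weight_map}
    (out_dir / "model.safetensors.index.json").write_text(json.dumps(index))

with tempfile.TemporaryDirectory() as d:
    out = Path(d)
    save_checkpoint({"w1": b"\x00", "w2": b"\x01"}, out)
    files = sorted(f.name for f in out.iterdir())
print(files)  # ['model.safetensors.index.json', 'shard-00000.bin', 'shard-00001.bin']
```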
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/puzzletron/replacement_library/build_replacement_library.py`:
- Around line 441-447: The code currently converts the deduplicated checkpoint
list to a set (deduped_checkpoint_dirs), which discards the sorted order;
instead, return an ordered sequence so downstream _build_subblocks_df (which
uses keep="first") sees the same checkpoint ordering. Replace the set(...)
wrapper with a list (or keep the .tolist() result) when creating
deduped_checkpoint_dirs from the DataFrame built from valid_checkpoint_dirs and
experiment_dirs, ensuring the DataFrame.sort_values(...).drop_duplicates(...)[
"checkpoint_dir"].tolist() is returned as an ordered list.
- Around line 83-99: Change build_replacement_library to accept an explicit
trust_remote_code: bool = False parameter and stop calling
descriptor.requires_trust_remote_code() to auto-enable remote code; instead call
requires_trust_remote_code() and if it returns True while the new
trust_remote_code parameter is False raise a clear error asking the caller to
opt in, then pass the explicit trust_remote_code parameter into
_build_subblocks_df, _build_block_library_from_subblocks,
_build_layer_replacements, and _build_single_sequence_replacement_solutions so
downstream functions use the caller-provided flag (references:
build_replacement_library, descriptor.requires_trust_remote_code,
_build_subblocks_df, _build_block_library_from_subblocks,
_build_layer_replacements, _build_single_sequence_replacement_solutions).

In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 164-166: get_active() always returns self.activity_stack[-1] even
when this stack is used in reversed mode (where pushes/pops happen at index 0);
update get_active in the class to check the reversed flag (e.g., self.reversed
or whatever boolean the class uses for reversed mode) and return
self.activity_stack[0] when reversed is true, otherwise return
self.activity_stack[-1], keeping the existing is_active() guard so it still
returns Optional[T].

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py`:
- Around line 307-340: The function currently builds safe_save_kwargs and may
set max_workers to 0 which causes ThreadPoolExecutor to raise when
safe_save_kwargs is empty; before creating the thread pool (and before
computing/using max_workers for multi_threaded saving), check if
safe_save_kwargs is empty and return early (or skip saving) to avoid launching
ThreadPoolExecutor with zero workers; update the logic around the multi_threaded
branch (references: safe_save_kwargs, max_workers, multi_threaded,
optimized_safe_save, ThreadPoolExecutor) so the empty-shard case is handled
gracefully and the function exits or proceeds without attempting to create the
executor.
- Around line 444-450: save_model_config currently mutates
model_config.block_configs in-place by converting dataclass entries to dicts;
instead, create a copy of the config (or at least a shallow copy of the
block_configs list), convert dataclass entries to dicts into that new list (use
dataclasses.is_dataclass and dataclasses.asdict when needed), assign the
converted list to the copied config, and call save_pretrained on the copy so the
live model_config remains unchanged.

---

Duplicate comments:
In `@modelopt/torch/puzzletron/scoring.py`:
- Around line 61-70: get_solutions_to_validate currently always calls
pd.read_json on cfg.scoring.solutions_path which fails when that path is a
directory of solution files; update the function to detect whether
cfg.scoring.solutions_path is a file or a directory, and if it is a directory,
glob for files matching a pattern like "*solution*.json", read each file into a
DataFrame/record list and concatenate them into
single_block_replacement_solutions before proceeding; keep the existing branch
that calls find_missing_solutions(...) (referencing
single_block_replacement_solutions, cfg.scoring.output_dir,
cfg.scoring.skip_existing_solutions) when skip_existing_solutions is True, and
otherwise set _solutions_to_validate to the range of rows as currently done.
- Line 18: Remove the module-wide "# mypy: ignore-errors" suppression and
instead apply targeted ignores only where necessary: delete the line and run
mypy on this module, then add "# type: ignore" or "# type: ignore[code]"
comments on the specific functions/expressions that fail (e.g., the scoring
entrypoint function(s) and any helper like score_batch or compute_similarity) so
that only those lines are suppressed while the rest of the module remains
type-checked.

In `@modelopt/torch/puzzletron/sewing_kit/passage.py`:
- Around line 361-379: The skip predicate currently allows returning cached
outputs even when inputs were explicitly requested to be captured; update the
boolean condition used where module_name is checked against
passage.outputs_cache to also verify that no requested input captures target
this module (i.e. add a check that not any(is_submodule_or_same(k, module_name)
for k in passage.inputs_to_capture)). Apply this same additional check in both
places where the predicate appears (the block using passage.outputs_cache and
the earlier similar predicate around lines 413-419) so that passages with
inputs_to_capture are not bypassed.
- Around line 289-294: The __enter__ method rebuilds module_to_name_mapping from
self.named_modules(), which introduces a "module." prefix and breaks alignment
with the names used in forward()/the mapping created in __init__/create(); fix
by making __enter__ reuse the same naming logic as __init__/create() (either
reuse the existing self.module_to_name_mapping set during construction instead
of rebuilding, or rebuild by applying the same normalization used in
create/__init__ — e.g., strip the "module." prefix or use the same helper that
produced names there) so module names in __enter__ match forward() and
capture/override keys.
- Around line 181-190: The outputs_cache default argument in Passage.__init__ is
a mutable {} shared across instances causing cached fake outputs to leak between
Passage objects; change the signature to accept outputs_cache:
Optional[dict[str, Any]] = None and in the constructor (where self.outputs_cache
is set, referenced around the assignment that stores it) initialize
self.outputs_cache = {} if outputs_cache is None, ensuring each Passage instance
gets its own dict; apply the same pattern to any other mutable default arguments
in the same class/methods referenced around lines 250-258 (replace {} defaults
with None and initialize per-instance).

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py`:
- Around line 237-250: The code writes the safetensors index (index variable,
index_path using SAFE_WEIGHTS_INDEX_NAME via _write_file_process_safe) before
saving shard files; move the index write to after save_subblocks completes
successfully so the index only references committed shards. Specifically, call
save_subblocks(state_dict, checkpoint_dir, weight_map=weight_map,
multi_threaded=True, max_workers=max_workers) first and then build and
atomically write the index JSON with _write_file_process_safe(index_json,
index_path) (ensure any existing atomic/write-temp logic is retained) so
failures in save_subblocks do not leave a dangling/missing-shard index.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: da42c004-2d60-493e-b24f-628967934064

📥 Commits

Reviewing files that changed from the base of the PR and between 547e76d and 38d9522.

📒 Files selected for processing (8)
  • examples/puzzletron/evaluation/hf_deployable_anymodel.py
  • modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
  • modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
  • modelopt/torch/puzzletron/scoring.py
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py

Comment on lines +441 to +447
deduped_checkpoint_dirs = set(
pd.DataFrame({"checkpoint_dir": valid_checkpoint_dirs, "experiment_dir": experiment_dirs})
.sort_values("checkpoint_dir")
.drop_duplicates(subset="experiment_dir", keep="last")["checkpoint_dir"]
.tolist()
)
return deduped_checkpoint_dirs

⚠️ Potential issue | 🟠 Major

Preserve checkpoint order when deduplicating experiments.

The DataFrame is sorted, then immediately converted to a set, which discards that order. Later _build_subblocks_df() resolves duplicate subblocks with keep="first", so identical student configs can end up pointing at different source checkpoints across runs. Return an ordered sequence here instead of a set.

Suggested change
-def _get_last_checkpoint_from_each_experiment(
-    master_puzzle_dir: Path | str, trust_remote_code: bool = False
-) -> set[Path]:
+def _get_last_checkpoint_from_each_experiment(
+    master_puzzle_dir: Path | str, trust_remote_code: bool = False
+) -> list[Path]:
@@
-    deduped_checkpoint_dirs = set(
-        pd.DataFrame({"checkpoint_dir": valid_checkpoint_dirs, "experiment_dir": experiment_dirs})
-        .sort_values("checkpoint_dir")
-        .drop_duplicates(subset="experiment_dir", keep="last")["checkpoint_dir"]
-        .tolist()
-    )
+    deduped_checkpoint_dirs = (
+        pd.DataFrame({"checkpoint_dir": valid_checkpoint_dirs, "experiment_dir": experiment_dirs})
+        .sort_values("checkpoint_dir")
+        .drop_duplicates(subset="experiment_dir", keep="last")["checkpoint_dir"]
+        .tolist()
+    )
     return deduped_checkpoint_dirs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/replacement_library/build_replacement_library.py`
around lines 441 - 447, The code currently converts the deduplicated checkpoint
list to a set (deduped_checkpoint_dirs), which discards the sorted order;
instead, return an ordered sequence so downstream _build_subblocks_df (which
uses keep="first") sees the same checkpoint ordering. Replace the set(...)
wrapper with a list (or keep the .tolist() result) when creating
deduped_checkpoint_dirs from the DataFrame built from valid_checkpoint_dirs and
experiment_dirs, ensuring the DataFrame.sort_values(...).drop_duplicates(...)[
"checkpoint_dir"].tolist() is returned as an ordered list.

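A pure-Python sketch of the same order-preserving dedup (illustrative names; the real code uses the pandas pipeline minus the set() wrapper):

```python
# Sketch: keep the last checkpoint per experiment in sorted checkpoint order,
# and return a list, never a set, so downstream keep="first" logic sees a
# deterministic ordering. Names are illustrative stand-ins.

def last_checkpoint_per_experiment(checkpoint_dirs, experiment_dirs) -> list:
    last: dict = {}
    # Iterate pairs in sorted checkpoint order; later checkpoints overwrite
    # earlier ones, so each experiment maps to its last checkpoint.
    for ckpt, exp in sorted(zip(checkpoint_dirs, experiment_dirs)):
        last[exp] = ckpt
    return list(last.values())  # ordered and deterministic; a set would not be

print(last_checkpoint_per_experiment(
    ["exp1/ckpt2", "exp1/ckpt1", "exp2/ckpt1"],
    ["exp1", "exp1", "exp2"],
))  # ['exp1/ckpt2', 'exp2/ckpt1']
```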
Comment on lines +164 to +166
def get_active(self) -> Optional[T]:
if self.is_active():
return self.activity_stack[-1]

⚠️ Potential issue | 🟠 Major

Return the head element when reversed=True.

get_active() still always reads self.activity_stack[-1], but reversed=True pushes/pops at index 0. Passage.active_passages_context in modelopt/torch/puzzletron/sewing_kit/passage.py Line 179 uses that mode, so nested passages make module_name_relative_to_active_passage() at Lines 242-247 resolve against the outer passage instead of the current one.

Suggested fix
     def get_active(self) -> Optional[T]:
         if self.is_active():
-            return self.activity_stack[-1]
+            return self.activity_stack[0 if self.reversed else -1]
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 164 - 166,
get_active() always returns self.activity_stack[-1] even when this stack is used
in reversed mode (where pushes/pops happen at index 0); update get_active in the
class to check the reversed flag (e.g., self.reversed or whatever boolean the
class uses for reversed mode) and return self.activity_stack[0] when reversed is
true, otherwise return self.activity_stack[-1], keeping the existing is_active()
guard so it still returns Optional[T].
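A minimal sketch of the reversed-stack behavior; ActivityStack here is a hypothetical stand-in mirroring the described push-at-head mode:

```python
# Sketch: in reversed mode, pushes happen at index 0, so the most recent item
# sits at the head and get_active() must read index 0, not -1.

class ActivityStack:
    def __init__(self, reversed: bool = False):  # mirrors the described flag
        self.reversed = reversed
        self.activity_stack: list = []

    def push(self, item):
        if self.reversed:
            self.activity_stack.insert(0, item)  # reversed mode grows at the head
        else:
            self.activity_stack.append(item)

    def is_active(self) -> bool:
        return bool(self.activity_stack)

    def get_active(self):
        if self.is_active():
            # The fix: pick the end that matches the push direction.
            return self.activity_stack[0 if self.reversed else -1]
        return None

s = ActivityStack(reversed=True)
s.push("outer")
s.push("inner")
print(s.get_active())  # inner, not outer
```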

Comment on lines +307 to +340
safe_save_kwargs = [
{"tensors": partial_state_dict, "filename": filename, "metadata": {"format": "pt"}}
for filename, partial_state_dict in filename_to_partial_state_dict.items()
]

# Auto-calculate optimal I/O workers: min(cpu_count, num_files)
if max_workers is None:
cpu_count = os.cpu_count() or 1
num_files = len(safe_save_kwargs)
max_workers = min(cpu_count, num_files)
mprint(
f" Auto-calculated I/O workers: min({cpu_count} CPUs, {num_files} files) = {max_workers}"
)
else:
mprint(f" Using specified I/O workers: {max_workers}")

prepare_time = time.time() - prepare_start_time
mprint(f" Step 4 - Prepare save args: {prepare_time:.2f}s ({len(safe_save_kwargs)} files)")

# Step 5: Save files with optimal worker count
save_start_time = time.time()
if multi_threaded:
mprint(f" Using multi-threaded saving with {max_workers} workers...")

def optimized_safe_save(kwargs):
try:
safe_save_file(**kwargs)
return True
except Exception as e:
mprint(f" Error saving {kwargs['filename']}: {e}")
return False

with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
results = list(executor.map(optimized_safe_save, safe_save_kwargs))

⚠️ Potential issue | 🟡 Minor

Handle the empty-shard case before building the thread pool.

If safe_save_kwargs is empty, Line 316 computes max_workers = 0, and Line 339 then raises from ThreadPoolExecutor. This helper can legitimately see an empty state_dict/weight_map, so it should return early instead of crashing.

Suggested change
     safe_save_kwargs = [
         {"tensors": partial_state_dict, "filename": filename, "metadata": {"format": "pt"}}
         for filename, partial_state_dict in filename_to_partial_state_dict.items()
     ]
+
+    if not safe_save_kwargs:
+        mprint("  No shard files to save")
+        return
 
     # Auto-calculate optimal I/O workers: min(cpu_count, num_files)
     if max_workers is None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 307 -
340, The function currently builds safe_save_kwargs and may set max_workers to 0
which causes ThreadPoolExecutor to raise when safe_save_kwargs is empty; before
creating the thread pool (and before computing/using max_workers for
multi_threaded saving), check if safe_save_kwargs is empty and return early (or
skip saving) to avoid launching ThreadPoolExecutor with zero workers; update the
logic around the multi_threaded branch (references: safe_save_kwargs,
max_workers, multi_threaded, optimized_safe_save, ThreadPoolExecutor) so the
empty-shard case is handled gracefully and the function exits or proceeds
without attempting to create the executor.

Comment on lines +444 to +450
def save_model_config(model_config: PretrainedConfig, checkpoint_dir: Path | str) -> None:
if hasattr(model_config, "block_configs"):
model_config.block_configs = [
dataclasses.asdict(conf) if dataclasses.is_dataclass(conf) else conf
for conf in model_config.block_configs
]
model_config.save_pretrained(checkpoint_dir)
Contributor


⚠️ Potential issue | 🟠 Major

Avoid mutating model.config during save.

save_model_config() rewrites model_config.block_configs in place from dataclasses to dicts. After save_checkpoint(model, ...), the live model.config no longer carries BlockConfig-like objects, which can break any later in-process use of the model. Serialize a copy instead.

Suggested change
+import copy
+
 def save_model_config(model_config: PretrainedConfig, checkpoint_dir: Path | str) -> None:
-    if hasattr(model_config, "block_configs"):
-        model_config.block_configs = [
+    config_to_save = copy.deepcopy(model_config)
+    if hasattr(config_to_save, "block_configs"):
+        config_to_save.block_configs = [
             dataclasses.asdict(conf) if dataclasses.is_dataclass(conf) else conf
-            for conf in model_config.block_configs
+            for conf in config_to_save.block_configs
         ]
-    model_config.save_pretrained(checkpoint_dir)
+    config_to_save.save_pretrained(checkpoint_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 444 -
450, save_model_config currently mutates model_config.block_configs in-place by
converting dataclass entries to dicts; instead, create a copy of the config (or
at least a shallow copy of the block_configs list), convert dataclass entries to
dicts into that new list (use dataclasses.is_dataclass and dataclasses.asdict
when needed), assign the converted list to the copied config, and call
save_pretrained on the copy so the live model_config remains unchanged.

## Summary

Pruned FFN checkpoints created by `init_child_from_parent` were missing
custom `.py` files (e.g. `modeling_nemotron_h.py`,
`configuration_nemotron_h.py`). Without them,
`AutoConfig.from_pretrained(..., trust_remote_code=True)` fails silently
and the checkpoint is excluded from the replacement library, reducing
MIP candidate diversity.

This PR adds `copy_remote_code_files(source_dir, output_dir)` to
`checkpoint_utils_hf.py`, which copies `*.py` files from the source
checkpoint root. It is called from `init_child_from_parent` after
`copy_tokenizer` when `descriptor.requires_trust_remote_code()` is true.



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added a new utility to automatically copy Python model files from
checkpoints during training initialization.

* **Tests**
* Added unit tests covering selective Python file copying, preservation
of existing files, and proper handling of missing source directories.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner April 13, 2026 18:38
@kevalmorabia97 kevalmorabia97 requested a review from meenchen April 13, 2026 18:38
…#1220)

## Summary
- Unified `examples/puzzletron/mbridge_distillation/distill_hf.py`
(AnyModel-specific) into `examples/megatron_bridge/distill.py` (general)
- The single script now handles both standard HF and Puzzletron AnyModel
checkpoints.
- Added `--hf_export_path` / `--student_hf_model` args for inline HF
export after distillation.
- Merged the AnyModel integration test into
`tests/examples/megatron_bridge/test_distill.py`
  - Test models use `vocab_size=128` (instead of the default 102) so the
    vocabulary is divisible by all tested TP sizes, including 8.
- Moved MMLU distillation results into `megatron_bridge/README.md`
  - The puzzletron README now redirects to the consolidated docs.


Limitation discovered during consolidation:
HF export via `--hf_export_path` currently does not appear to work for
Puzzletron AnyModel (heterogeneous) checkpoints. Megatron-Bridge's
`export_ckpt` cannot reload heterogeneous model configs from saved
checkpoints (`heterogeneous_layers_config_encoded_json` is `None` during
`__post_init__` in `heterogeneous_config.py`). This affects both the inline
`--hf_export_path` flag and the separate `convert_checkpoints.py export`
script.

The original `distill_hf.py` README documented this as supported, but it
may have been broken there too (on the Megatron-Bridge side). The
consolidated README now documents this as a known limitation. HF export
for standard models works fine via both methods.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added support for Puzzletron AnyModel checkpoints in distillation
pipeline.
* Introduced inline HuggingFace export capability during distillation
process.

* **Documentation**
* Updated distillation guide with clearer conversion workflows and
optional HuggingFace export instructions.
  * Added distillation benchmarks and performance recommendations.

* **Bug Fixes & Improvements**
  * Streamlined test infrastructure and workflow configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner April 13, 2026 19:16
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/megatron_bridge/distill.py`:
- Around line 56-57: The current use of contextlib.suppress(ImportError) around
the import modelopt.torch.puzzletron.plugins.mbridge masks real import errors
inside that module; replace this with the repository's lazy plugin loader by
calling import_plugin("modelopt.torch.puzzletron.plugins.mbridge") (or
equivalent import_plugin API) so that only the "module not installed" case is
handled by the plugin system and real exceptions during module initialization
surface; if import_plugin returns a boolean or raises a specific
NotInstalled/PluginNotFound exception, catch only that specific condition
instead of broadly suppressing ImportError, and remove
contextlib.suppress(ImportError) and the direct import to ensure proper
registration and visible failures.
- Around line 342-345: The config copy is using the wrong source; change the
AutoConfig.from_pretrained call (the block that calls
AutoConfig.from_pretrained(...).save_pretrained(args.hf_export_path)) to load
from args.student_hf_model instead of args.student_hf_path so the exported
config matches the templated export; keep trust_remote_code and the
save_pretrained(args.hf_export_path) call unchanged.

In `@examples/megatron_bridge/README.md`:
- Around line 161-176: Add a clear caveat under "Converting to Hugging Face
format (optional)" noting that the inline export flags --hf_export_path and
--student_hf_model in distill.py do not work for heterogeneous Puzzletron
"AnyModel" checkpoints; instruct users to use the "Separate conversion" workflow
(Megatron-Bridge conversion script) for AnyModel/Puzzletron outputs or to run
offline conversion after distillation instead of relying on the inline
--hf_export_path path.

In `@examples/puzzletron/README.md`:
- Around line 288-296: The README currently includes --trust-remote-code
unconditionally in the example vllm commands (vllm bench latency and vllm bench
throughput); change the examples to omit --trust-remote-code by default, add a
short conditional note immediately after each command explaining that
--trust-remote-code should only be added when the model requires custom Python
execution and only for trusted checkpoints, and show an optional example
invocation that includes --trust-remote-code for trusted models only.
- Around line 40-42: Update the bare-metal install example in README.md to quote
the extras spec so the shell doesn't expand it: replace the unquoted pip install
-e .[hf,puzzletron] command with a quoted extras form (e.g., pip install -e
".[hf,puzzletron]" or pip install -e '.[hf,puzzletron]') used in the
examples/puzzletron README snippet to match the NeMo-container safe form and
prevent zsh globbing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d90f3313-257d-4f77-9a3e-d5740008a777

📥 Commits

Reviewing files that changed from the base of the PR and between 6395b1e and d88dfcb.

📒 Files selected for processing (7)
  • .github/workflows/example_tests.yml
  • examples/megatron_bridge/README.md
  • examples/megatron_bridge/distill.py
  • examples/megatron_bridge/results/puzzletron.md
  • examples/puzzletron/README.md
  • tests/_test_utils/torch/puzzletron/utils.py
  • tests/examples/megatron_bridge/test_distill.py
✅ Files skipped from review due to trivial changes (1)
  • examples/megatron_bridge/results/puzzletron.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/example_tests.yml

Comment thread examples/megatron_bridge/distill.py Outdated
Comment thread examples/megatron_bridge/distill.py
Comment thread examples/megatron_bridge/README.md
Comment thread examples/puzzletron/README.md Outdated
Comment thread examples/puzzletron/README.md
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
examples/megatron_bridge/distill.py (1)

342-345: ⚠️ Potential issue | 🟠 Major

Use student_hf_model instead of student_hf_path for config export.

The export bridge is created from args.student_hf_model (line 332-333), but the config is copied from args.student_hf_path (line 343-344). For Puzzletron/AnyModel runs, student_hf_path may point to a heterogeneous source checkpoint, causing a mismatched config.json in the exported HF checkpoint.

Suggested fix
             # Copy config.json from student_hf_path (handles both local paths and HF model IDs)
             AutoConfig.from_pretrained(
-                args.student_hf_path, trust_remote_code=args.trust_remote_code
+                args.student_hf_model, trust_remote_code=args.trust_remote_code
             ).save_pretrained(args.hf_export_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/megatron_bridge/distill.py` around lines 342 - 345, The config
export is using args.student_hf_path but the export bridge and model selection
use args.student_hf_model, which can cause mismatched config.json for
heterogeneous checkpoints; update the AutoConfig.from_pretrained call (the one
invoking AutoConfig.from_pretrained(...).save_pretrained(args.hf_export_path))
to load from args.student_hf_model instead of args.student_hf_path so the
exported HF checkpoint uses the same model identifier used to create the export
bridge.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/megatron_bridge/distill.py`:
- Around line 342-345: The config export is using args.student_hf_path but the
export bridge and model selection use args.student_hf_model, which can cause
mismatched config.json for heterogeneous checkpoints; update the
AutoConfig.from_pretrained call (the one invoking
AutoConfig.from_pretrained(...).save_pretrained(args.hf_export_path)) to load
from args.student_hf_model instead of args.student_hf_path so the exported HF
checkpoint uses the same model identifier used to create the export bridge.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1f6aecf3-5fc1-4fb8-84b8-8eb1eb22a9fa

📥 Commits

Reviewing files that changed from the base of the PR and between d88dfcb and 3f41819.

📒 Files selected for processing (5)
  • .github/CODEOWNERS
  • examples/llm_eval/lm_eval_hf.py
  • examples/megatron_bridge/README.md
  • examples/megatron_bridge/distill.py
  • examples/puzzletron/README.md
✅ Files skipped from review due to trivial changes (2)
  • .github/CODEOWNERS
  • examples/megatron_bridge/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/llm_eval/lm_eval_hf.py

Collaborator

@cjluo-nv cjluo-nv left a comment


This is a massive 24K+ line PR adding the Puzzletron heterogeneous pruning algorithm. The code quality is generally solid with good documentation, comprehensive test coverage (9 model configs tested e2e), and a clean plugin architecture. However, at ~24K lines, this is extremely large and there are several issues worth addressing.

Key Concerns:

  1. PR Size (24K+ lines): This is ~24x the recommended limit. While the PR description shows it was already reviewed in ~18 smaller PRs on the feature branch, the final merge into main is still very large. The config YAML files alone represent massive duplication.

  2. Ruff lint suppression: The entire modelopt/torch/puzzletron/* directory has all major lint categories disabled in pyproject.toml. This is a significant code quality concession that should have a timeline for remediation.

  3. Potential bug in _patched_decoder_layer_init: The patcher uses decoder_layer_classes.index(self.__class__) to find the right original __init__, but if a subclass appears that isn't in the list, this will raise ValueError instead of a descriptive error.

  4. copy_remote_code → copy_hf_ckpt_remote_code rename is a breaking change: The function was renamed and its implementation changed from hf_hub_download to snapshot_download. The old test file was deleted (test_hf_checkpoint_utils.py from tests/gpu_megatron/) and a new one added in tests/unit/. Any external code using copy_remote_code will break.

  5. snapshot_download behavior change: The new copy_hf_ckpt_remote_code uses snapshot_download with allow_patterns=["*.py"] which downloads all .py files including those in subdirectories, whereas the old implementation only copied top-level .py files. This could pull unexpected files.

  6. Massive config duplication: The YAML config files across model directories (llama, qwen2, qwen3, mistral, nemotron, etc.) are nearly identical with only model-specific fields differing. These should use shared base configs more aggressively.

  7. sewing_kit module is substantial (~1800 lines) with its own distributed communication layer. It's unclear if this duplicates functionality already available in ModelOpt's distributed utils.

Comment thread modelopt/torch/export/plugins/hf_checkpoint_utils.py
if _block_configs is None:
return orig_inits[decoder_layer_classes.index(self.__class__)](
self, config, *args, **kwargs
)
Collaborator


Potential bug: decoder_layer_classes.index(self.__class__) will raise ValueError if self.__class__ is a dynamically-generated subclass that isn't in the list. Consider using a dict lookup or adding a try/except with a descriptive error message:

try:
    idx = decoder_layer_classes.index(self.__class__)
except ValueError:
    raise TypeError(
        f"Decoder layer class {self.__class__} not in patched classes: {decoder_layer_classes}"
    )
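The dict-lookup alternative mentioned above could look like this self-contained sketch; the layer classes and stand-in inits are hypothetical, not the patcher's real symbols:

```python
class DecoderLayerA:
    pass


class DecoderLayerB:
    pass


def orig_init_a(layer):
    return "A"  # stand-in for the saved original __init__


def orig_init_b(layer):
    return "B"


# Built once at patch time; lookup is O(1) and failures are descriptive.
ORIG_INIT_BY_CLASS = {DecoderLayerA: orig_init_a, DecoderLayerB: orig_init_b}


def dispatch_orig_init(layer):
    try:
        return ORIG_INIT_BY_CLASS[type(layer)](layer)
    except KeyError:
        raise TypeError(
            f"Decoder layer class {type(layer)} not in patched classes: "
            f"{list(ORIG_INIT_BY_CLASS)}"
        ) from None
```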

Collaborator


not important to address

Comment thread modelopt/torch/puzzletron/anymodel/puzzformer/patcher.py
Comment thread pyproject.toml
Comment thread modelopt/torch/puzzletron/entrypoint.py
Comment thread modelopt/torch/prune/importance_hooks/base_hooks.py
Comment thread modelopt/torch/prune/importance_hooks/base_hooks.py
Comment thread examples/megatron_bridge/distill.py
attn_keys = [key for key in attn_keys if key in new_state_dict.keys()]
if len(attn_keys) > 0 and all(key in keys for key in attn_keys):
for key in attn_keys:
keys_to_remove[key] = keys[key]
Collaborator


In-place mutation of keys_to_remove: keys_to_remove is passed as a dict parameter that gets mutated in place. While this works because callers always pass their own dict, the pattern is error-prone. Consider returning the removals as part of the function's return value instead of relying on mutation.
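That refactor could be sketched as a pure function (hypothetical name, mirroring the snippet's variables) that returns the removals rather than mutating a caller-supplied dict:

```python
def collect_attn_keys_to_remove(attn_keys, new_state_dict, keys):
    """Return the attention keys (with their values from `keys`) to remove,
    instead of mutating a caller-supplied keys_to_remove dict in place."""
    present = [key for key in attn_keys if key in new_state_dict]
    if present and all(key in keys for key in present):
        return {key: keys[key] for key in present}
    return {}
```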

Collaborator


not important to address

Comment thread modelopt/torch/puzzletron/sewing_kit/passage.py
Collaborator

@cjluo-nv cjluo-nv left a comment


Approved to unblock based on the feedback that

None of these are blockers.
1 - more extensive review done in feature branch PRs so fine to add 24k LOC to main
2 - We will continuously keep improving code quality (removing suppressed mypy / lint exceptions added for now)
3 - not blocker
4,5 - not important
6 - follow-up to unify yamls
7 - fine

j-rausch and others added 5 commits April 15, 2026 09:29
## Summary

- lm-eval 0.4.10 has API changes that break `lm_eval_hf.py`, which
monkey-patches `HFLM.create_from_arg_obj`
- Pin `examples/puzzletron/requirements.txt` from `lm-eval==0.4.10` to
`lm-eval==0.4.8` to match `examples/llm_eval/requirements.txt`
- Add version compatibility warning to `lm_eval_hf.py` for early
detection if versions drift again



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated the lm-eval dependency version to 0.4.8 in the Puzzletron
example
* Added compatibility version checking to the LM Eval example with
appropriate notifications

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…tical params (#1258)

### Summary
- on hybrid models (e.g. Nemotron-H), `calculate_subblock_params` builds
a 1-layer model to count per-layer params by setting
`num_hidden_layers=1`
- it left `hybrid_override_pattern` at full length, so the 1-layer model
always built layer 0 (`pattern[0]` = Mamba). every FFN variant reported
the same Mamba param count regardless of `intermediate_size`
- this made MIP unable to differentiate FFN sizes

### Fix
- truncate `hybrid_override_pattern` to the single character matching
the subblock being measured before instantiating the 1-layer model
- per iteration/layer, we deep copy the model config and create a
per-layer model config with fixed pattern
- activates only when `hybrid_override_pattern` is present; non-hybrid
models (Llama, Qwen, etc.) are unaffected
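The fix can be sketched as follows, assuming the config exposes `hybrid_override_pattern` as a per-layer string of layer-type characters (the helper name is hypothetical):

```python
import copy


def make_one_layer_config(model_config, layer_idx):
    """Deep-copy the config and shrink it to one layer whose type matches the
    subblock being measured, so param counting builds the right layer kind."""
    cfg = copy.deepcopy(model_config)
    cfg.num_hidden_layers = 1
    pattern = getattr(cfg, "hybrid_override_pattern", None)
    if pattern is not None:
        # Previously the full-length pattern meant the 1-layer model always
        # built pattern[0] (Mamba); keep only the measured layer's character.
        cfg.hybrid_override_pattern = pattern[layer_idx]
    return cfg
```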

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Enhanced accuracy of parameter counting for hybrid FFN model
configurations.

* **Tests**
  * Added unit tests for hybrid pattern truncation functionality.
* Added GPU validation tests for Nemotron-H model parameter
calculations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 merged commit 361f7e3 into main Apr 15, 2026
45 checks passed
@kevalmorabia97 kevalmorabia97 deleted the feature/puzzletron branch April 15, 2026 19:18