Merge puzzletron compression algorithm #1121
Conversation
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…iv.org/abs/2411.19146 (#464) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add decilm modelling code --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Compress tutorial (PoC) + compress cli app. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
#545) ## What does this PR do? Add llama converter (no dependency on internal Nvidia code) - part 1/2 - change top-level dependencies in convert_llama3_to_decilm.py from puzzle_tools.... to modelopt..... - added modelopt.torch._compress.tools module - remove tokenization_mistral.py - not used scope of 2/2 part (will come once part 1/2 is merged): - change all deeper dependencies from from puzzle_tools.... to modelopt.... - test_convert_llama3_config_to_decilm_config.py should run without any internal nvidia dependencies --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ia code) (#552) ## What does this PR do? llama converter is self-contained now (no dependency on internal nvidia code) --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Make score pruning activations self-contained (no dependency on internal Nvidia code) - 1/6 - add integration test for attention pruning (pre-step to make adding activation scoring safer) --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? - Add score_pruning_activations.py Notes: - validate_model.py still depends on Nvidia internal code (will be changed in the subsequent MR) - sharded_checkpoint_utils.py - for now it needs to use DeciLM from internal Nvidia code, to be changed in the next MR --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add activation hooks used for pruning --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ng is self-contained now (#584) ## What does this PR do? Add sewing kit and utilities used for pruning scoring - pruning scoring is self-contained now - no dependency on internal Nvidia code. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
- Add L2NormHook and use it in megatron.py
- Using L2NormHook removes code duplication between
_DynamicSelfAttention and _DynamicMLP
This is the first step towards reusing activation scores logic across
Minitron and Puzzle. Next steps:
- complete redesign of megatron.py - move other activation hooks logic
to hooks.py
- then combine that hooks.py with the similar hooks.py functionality in
puzzle
(modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)
Questions:
- why does the code, both before and after this redesign, store temp
variables in two ways (`_register_temp_attribute` and `self.hook_handle`)?
```python
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: confusion: why hook_handle is removed manually in export() and not using _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)
```
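The de-duplication idea behind `L2NormHook` can be illustrated with a torch-free sketch. The class name and accumulation scheme below are assumptions for illustration, not the ModelOpt implementation: a single hook object accumulates per-channel squared activation magnitudes across forward passes, so both attention and MLP modules can share it instead of duplicating the logic.

```python
import math

class L2NormHook:
    """Hypothetical sketch of an activation-norm hook (not the ModelOpt API).

    Accumulates per-channel squared activation magnitudes across forward
    calls; scores() returns the running L2 norm per channel.
    """

    def __init__(self, num_channels: int):
        self._sq_sums = [0.0] * num_channels

    def __call__(self, activations: list) -> None:
        # activations: batch of per-channel values from one forward pass
        for row in activations:
            for c, value in enumerate(row):
                self._sq_sums[c] += value * value

    def scores(self) -> list:
        return [math.sqrt(s) for s in self._sq_sums]

hook = L2NormHook(num_channels=2)
hook([[3.0, 0.0], [0.0, 4.0]])  # one "forward pass"
hook([[0.0, 3.0]])              # another
print(hook.scores())            # per-channel L2 norms -> [3.0, 5.0]
```

In the real module the hook would be registered via `register_forward_hook` on each layer; the sharing is what removes the `_DynamicSelfAttention` / `_DynamicMLP` duplication.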
---------
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add pruning checkpoints for the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add build replacement library to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add subblock stats to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add 1-block scoring to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…utionHook (#610) ## What does this PR do? Add checkpoint save/load to ForwardHook + add IterativeChannelContributionHook. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add MIP step to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? **Type of change:** Improvement <!-- Use one of the following: Bug fix, new feature, new example, new tests, documentation. --> - Fix tests for 2-gpu: Some places hard-coded cpu device for distributed communications which was causing this issue - Remove unused constrain_search_space.py - Remove `is_multi_layer_puzzle: False` case - Remove `use_greedy_search: False` case - Remove knapsack mip case - Remove unused `num_solutions` and `minimal_diversity` flags ## Testing <!-- Mention how have you tested your change if applicable. --> - GH CICD test passing - Tested on 2-gpu setup locally as well <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit # Release Notes * **Refactor** * Optimized solver implementation with improved library integration. * Simplified model compression configuration by removing deprecated search options. * Consolidated optimization paths for streamlined processing. * **Chores** * Updated dependencies for improved compatibility. * **Documentation** * Clarified Model-Optimizer installation instructions in examples. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…hooks analysis (#670) ## What does this PR do? Fix a bug in IterativeChannelContributionHook + tools for activation hooks analysis --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…unctions (#667) ## What does this PR do? - Remove `runtime.py` and directly use `modelopt.torch.utils.distributed` functions - Read model_dtype and autocast_dtype from validate_model_defaults.yaml instead of runtime object - Remove more unused functions - Remove unnecessary parse_args for intermediate steps, and improve docstrings ## Testing <!-- Mention how have you tested your change if applicable. --> - No change in functionality - Existing tests passing --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
♻️ Duplicate comments (5)
modelopt/torch/puzzletron/mip/sweep.py (1)
255-268: ⚠️ Potential issue | 🟠 Major
Isolate each sweep iteration from shared Hydra state.
This loop still mutates the shared `hydra_cfg` in place before handing it to `launch_mip_and_realize_model()`. Any downstream mutation leaks into later compression rates, so one sweep point can contaminate the next. Clone the config inside the loop before patching `target_memory`.
🐛 Proposed fix
```diff
-    hydra_cfg.mip.human_constraints.target_memory = target_memory_mib
-
-    # Run MIP and realize models (reuse existing distributed logic!)
-    solution_paths = mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
+    iter_cfg = OmegaConf.create(OmegaConf.to_container(hydra_cfg, resolve=False))
+    iter_cfg.mip.human_constraints.target_memory = target_memory_mib
+
+    # Run MIP and realize models (reuse existing distributed logic!)
+    solution_paths = mip_and_realize_models.launch_mip_and_realize_model(iter_cfg)
```
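The isolation principle is framework-agnostic. A minimal sketch with a plain nested dict standing in for the Hydra/OmegaConf config (names here are hypothetical): a `deepcopy` per iteration keeps each sweep point's mutations local.

```python
from copy import deepcopy

# Hypothetical stand-in for a nested Hydra/OmegaConf config.
base_cfg = {"mip": {"human_constraints": {"target_memory": None}}}

def run_sweep_point(cfg):
    # Downstream code may mutate cfg further; cloning keeps that local.
    return cfg["mip"]["human_constraints"]["target_memory"]

results = []
for target_memory_mib in (1024, 2048):
    iter_cfg = deepcopy(base_cfg)  # clone per iteration, never patch the original
    iter_cfg["mip"]["human_constraints"]["target_memory"] = target_memory_mib
    results.append(run_sweep_point(iter_cfg))

# The shared config is untouched after the sweep.
print(results, base_cfg["mip"]["human_constraints"]["target_memory"])  # [1024, 2048] None
```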
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/mip/sweep.py` around lines 255 - 268, The loop is mutating the shared hydra_cfg in-place (setting hydra_cfg.mip.human_constraints.target_memory) before calling mip_and_realize_models.launch_mip_and_realize_model(), which lets downstream runs see previous mutations; fix it by making a clone/copy of hydra_cfg at the top of each iteration (e.g., deep copy) and apply the target_memory change to that cloned config (not the original) so that launch_mip_and_realize_model(hydra_cfg_clone) receives an isolated config per compression_rate; update references to hydra_cfg.mip.human_constraints.target_memory to use the cloned config instead.modelopt/torch/puzzletron/mip/run_puzzle.py (1)
411-424: ⚠️ Potential issue | 🟡 Minor
These validation guards still miss "present but `None`" arguments.
`parse_args()` always creates `subblock_stats_args`, `mip_constraints`, and `human_constraints` with `None` defaults, so `hasattr()` is always true here. Missing config skips the friendly CLI error and fails later in `_load_all_*` instead.
🐛 Proposed fix
```diff
-    if not hasattr(args, "subblock_stats_args") and "subblock_stats_args" not in puzzle_profile:
+    if getattr(args, "subblock_stats_args", None) is None and "subblock_stats_args" not in puzzle_profile:
         mprint(
             "error: Must specify `subblock_stats_args` in either puzzle_profile or as a commandline arg."
         )
         sys.exit(1)
@@
-        not hasattr(args, "mip_constraints")
-        and not hasattr(args, "human_constraints")
+        getattr(args, "mip_constraints", None) is None
+        and getattr(args, "human_constraints", None) is None
         and "mip_constraints" not in puzzle_profile
         and "human_constraints" not in puzzle_profile
     ):
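The root cause is easy to reproduce with stock `argparse` (the flag name below is borrowed from the finding purely for illustration): any argument registered with a default always exists on the namespace, so `hasattr()` can never detect a missing flag.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--subblock-stats-args", dest="subblock_stats_args", default=None)
args = parser.parse_args([])  # user never supplied the flag

# hasattr() is true even though the flag was never passed:
print(hasattr(args, "subblock_stats_args"))                 # True
# Detecting "present but None" requires an explicit None check:
print(getattr(args, "subblock_stats_args", None) is None)   # True
```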
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 411 - 424, The guards use hasattr(args, "...") which is always true because parse_args() sets defaults, so change the checks for subblock_stats_args, mip_constraints, and human_constraints to treat None as missing: when verifying before calling the loaders (the block around args and puzzle_profile checks in run_puzzle.py), test both that the key is present in puzzle_profile OR that getattr(args, "<name>") is not None (e.g., getattr(args, "subblock_stats_args") is not None), and similarly for mip_constraints and human_constraints, so the CLI-friendly sys.exit path runs when values are missing or explicitly None before reaching _load_all_*.modelopt/torch/puzzletron/pruning/pruning_utils.py (1)
197-209: ⚠️ Potential issue | 🟠 Major
Key the activation-log cache by directory.
Lines 197-209 still keep a single process-global `ACTIVATIONS_LOG`. Beyond the earlier race concern, the first `activations_log_dir` wins for the whole process, so a second model or test run silently reuses the wrong ranking data.
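The requested shape is a cache keyed by directory rather than a flat global. A minimal sketch, with the loader body replaced by a stand-in (the function name and payload are assumptions, not the pruning_utils API):

```python
# Hypothetical sketch: a per-directory cache instead of one process-global
# dict, so two models/test runs never share ranking entries.
ACTIVATIONS_LOG: dict = {}

def get_activations_log(activations_log_dir: str) -> dict:
    if activations_log_dir not in ACTIVATIONS_LOG:
        # Stand-in for reading the real ranking files from disk.
        ACTIVATIONS_LOG[activations_log_dir] = {"source": activations_log_dir}
    return ACTIVATIONS_LOG[activations_log_dir]

a = get_activations_log("/run_a/activations")
b = get_activations_log("/run_b/activations")
print(a is b)  # False: each directory owns its own cache entry
```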
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/pruning/pruning_utils.py` around lines 197 - 209, The global ACTIVATIONS_LOG is currently a single flat dict leading to cross-run pollution; change the cache to be keyed by activations_log_dir inside _cache_activations_log so each directory stores its own sub-dict (e.g. ACTIVATIONS_LOG[activations_log_dir] = {...}) and update the load-check to only populate when that key is missing, using activations_log_dir (from mlp_init_config) as the unique cache key; update any code that reads ACTIVATIONS_LOG to first index by directory (or provide a getter that returns ACTIVATIONS_LOG[activations_log_dir]) so multiple models/tests do not share the same activation entries.modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py (1)
55-58: ⚠️ Potential issue | 🟠 Major
The Nemotron fallback still imports the whole shared remote-code cache.
Lines 55-58 walk `transformers_modules.__path__` wholesale when the class isn't already loaded. That reintroduces the earlier issue: one Nemotron lookup can execute top-level code from unrelated cached `trust_remote_code` repos instead of only the selected checkpoint package.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py` around lines 55 - 58, The fallback currently walks the entire transformers_modules.__path__ via pkgutil.walk_packages and importlib.import_module, which can execute top-level code from unrelated trust_remote_code caches; change the logic to only enumerate/import submodules within the specific checkpoint package instead of the whole transformers_modules namespace: use the selected checkpoint package's path (or its prefix) when calling pkgutil.iter_modules/walk_packages and importlib.import_module (e.g., derive a single package path for the checkpoint and pass that to pkgutil or importlib) and then call inspect.getmembers on those module objects; avoid iterating over transformers_modules.__path__ wholesale to prevent importing unrelated cached repos.modelopt/torch/puzzletron/tools/checkpoint_utils.py (1)
126-132: ⚠️ Potential issue | 🟡 Minor
`init_empty_module()` still races on the global default dtype.
Lines 126-132 fixed exception safety, but concurrent calls can still interleave `torch.set_default_dtype()` and instantiate modules under the wrong dtype. This still needs the lock requested in the earlier review.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 126 - 132, The code still races on the global torch default dtype; add a module-level threading.Lock (e.g., _default_dtype_lock = threading.Lock()) and acquire it around the dtype change and module instantiation to serialize access: inside init_empty_module, move torch.get_default_dtype(), torch.set_default_dtype(dtype) and the call to skip_init(module_cls, ...) into a with _default_dtype_lock: block, restore the original dtype in a finally inside that locked region so no other thread can interleave torch.set_default_dtype() and skip_init; keep using skip_init and preserve the existing exception-safety behavior.
🟡 Minor comments (6)
modelopt/torch/puzzletron/puzzletron_nas_plugin.py-108-108 (1)
108-108: ⚠️ Potential issue | 🟡 Minor
Docstring typo in API reference.
Line 108 says `mnt.search()`; should be `mtn.search()`.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py` at line 108, Fix the docstring typo in puzzletron_nas_plugin.py by replacing the incorrect reference "mnt.search()" with the correct API call "mtn.search()" in the PuzzletronNASPlugin (or the module-level docstring) so the documentation correctly points to the mtn.search() method used for NAS searches; update any other occurrences in the same file to ensure consistency.modelopt/torch/puzzletron/puzzletron_nas_plugin.py-25-27 (1)
25-27: ⚠️ Potential issue | 🟡 Minor
Move the `hydra` import inside the functions that use it to avoid import-time failures when puzzletron extras are not installed.
The bare `import hydra` at line 25 will cause `ModuleNotFoundError` when the puzzletron subpackage is imported without the `[puzzletron]` optional extra installed. Since `hydra` is only used inside the conversion and search execution paths (via `hydra.utils.instantiate()` at lines 126 and 227), move the import statement into those functions using local imports. This follows the codebase's established pattern for optional dependencies and preserves the optionality of the puzzletron feature.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py` around lines 25 - 27, Remove the top-level "import hydra" and instead add a local "import hydra" inside each function that calls hydra.utils.instantiate (the conversion path function that calls hydra.utils.instantiate and the search execution function that calls hydra.utils.instantiate); this follows the optional-dependency pattern so importing the module won't raise ModuleNotFoundError when puzzletron extras aren't installed—locate the hydra.utils.instantiate calls and insert a local import hydra at the start of those functions and delete the module-level import.modelopt/torch/puzzletron/sewing_kit/utils.py-164-167 (1)
164-167: ⚠️ Potential issue | 🟡 Minor
Respect `reversed=True` in `get_active()`.
When `reversed=True`, the active entry is inserted at index `0`, but Line 166 always returns the last element. That reports the wrong context for reversed stacks. Return index `0` in that mode.
Suggested fix
```diff
 def get_active(self) -> Optional[T]:
     if self.is_active():
-        return self.activity_stack[-1]
+        return self.activity_stack[0] if self.reversed else self.activity_stack[-1]
     return None
```
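A reduced, self-contained version of the stack shows why the index must follow the insertion mode (this is a hypothetical reconstruction of the sewing-kit class, not its actual code): with `reversed=True` the newest entry lives at index `0`, so `[-1]` would report the oldest entry instead of the active one.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

class ActivityStack(Generic[T]):
    """Hypothetical reduction of the sewing-kit activity stack."""

    def __init__(self, reversed: bool = False):
        self.reversed = reversed
        self.activity_stack: list = []

    def push(self, item: T) -> None:
        if self.reversed:
            self.activity_stack.insert(0, item)  # newest entry at index 0
        else:
            self.activity_stack.append(item)     # newest entry at the end

    def get_active(self) -> Optional[T]:
        if not self.activity_stack:
            return None
        # Pick the index that matches the insertion mode.
        return self.activity_stack[0] if self.reversed else self.activity_stack[-1]

s = ActivityStack(reversed=True)
s.push("outer")
s.push("inner")
print(s.get_active())  # inner (index -1 would wrongly report "outer")
```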
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 164 - 167, The get_active() method always returns activity_stack[-1], which is wrong when the stack was created with reversed=True (entries are pushed at index 0); update get_active() to check the stack's reversed mode and return activity_stack[0] when reversed is True and activity_stack[-1] otherwise (still returning None when is_active() is False); reference the get_active method and the activity_stack and reversed flag to locate and adjust the logic accordingly.modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py-54-56 (1)
54-56: ⚠️ Potential issue | 🟡 Minor
Validate `input_dir` early to fail fast with a clear error.
If `input_dir` is wrong, failure will happen deeper in conversion with less actionable errors.
Proposed fix
```diff
 input_dir = Path(input_dir)
 output_dir = Path(output_dir)
+if not input_dir.is_dir():
+    raise FileNotFoundError(f"Input checkpoint directory does not exist: {input_dir}")
 output_dir.mkdir(parents=True, exist_ok=True)
```
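The same fail-fast check, extracted into a standalone helper for illustration (`check_input_dir` is a hypothetical name, not part of the converter):

```python
import tempfile
from pathlib import Path

def check_input_dir(input_dir) -> Path:
    # Hypothetical helper: validate before any conversion work starts.
    input_dir = Path(input_dir)
    if not input_dir.is_dir():
        raise FileNotFoundError(f"Input checkpoint directory does not exist: {input_dir}")
    return input_dir

with tempfile.TemporaryDirectory() as tmp:
    print(check_input_dir(tmp) == Path(tmp))  # True

try:
    check_input_dir("/definitely/missing/checkpoint")
except FileNotFoundError as e:
    print("caught:", "does not exist" in str(e))  # caught: True
```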
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py` around lines 54 - 56, Validate the provided input_dir immediately after converting it to a Path: after input_dir = Path(input_dir) check input_dir.exists() and input_dir.is_dir() and raise a clear exception (e.g., FileNotFoundError or ValueError) with a helpful message including the provided path so the caller fails fast; update the code in convert_any_model.py (look for the input_dir variable in that module) to perform this early validation before creating or using output_dir or proceeding with conversion.modelopt/torch/puzzletron/block_config.py-276-282 (1)
276-282: ⚠️ Potential issue | 🟡 Minor
Handle empty `parallel_blocks` before indexing it.
If a serialized config contains `parallel_blocks: []`, Line 278 raises `IndexError` during deserialization. An empty list should just round-trip as "no parallel blocks".
🩹 Proposed fix
```diff
 def __post_init__(self):
     super().__post_init__()
-    if (self.parallel_blocks is not None) and isinstance(self.parallel_blocks[0], dict):
+    if self.parallel_blocks and isinstance(self.parallel_blocks[0], dict):
         initialized_block_configs = [
             BlockConfig(**block_config) for block_config in self.parallel_blocks
         ]
         self._force_setattr("parallel_blocks", initialized_block_configs)
```
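A reduced dataclass demonstrates the difference between the two guards (the class names here are hypothetical stand-ins for the real `BlockConfig`): truthiness covers both `None` and `[]`, while an `is not None` check alone still indexes into an empty list.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str

@dataclass
class Config:
    parallel_blocks: list = field(default_factory=list)

    def __post_init__(self):
        # Truthiness skips both None and []; "is not None" alone would
        # raise IndexError on [] at parallel_blocks[0].
        if self.parallel_blocks and isinstance(self.parallel_blocks[0], dict):
            self.parallel_blocks = [Block(**b) for b in self.parallel_blocks]

print(Config(parallel_blocks=[]).parallel_blocks)                  # []
print(Config(parallel_blocks=[{"name": "attn"}]).parallel_blocks)  # [Block(name='attn')]
```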
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/block_config.py` around lines 276 - 282, The __post_init__ in BlockConfig currently indexes parallel_blocks[0] and will raise IndexError for an empty list; update the condition to first ensure parallel_blocks is truthy (non-empty) before checking element types (e.g., use "if self.parallel_blocks and isinstance(self.parallel_blocks[0], dict)") so empty lists are left as-is and only lists with dict items are converted to BlockConfig instances via BlockConfig(...) and set via _force_setattr("parallel_blocks", ...); reference __post_init__, parallel_blocks, BlockConfig, and _force_setattr when making the change.modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py-321-374 (1)
321-374: ⚠️ Potential issue | 🟡 Minor
Deduplicate scenarios created during the current run as well.
Lines 321-374 only guard against `"args"` blocks that already exist on disk, and Lines 330-332 always append the teacher hidden size. If the config already contains that size, or any duplicate batch/hidden-size combo, the same scenario is emitted twice in one invocation.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines 321 - 374, The loop only checks existing on-disk args via subblock_stats_args but can still emit duplicates within the same run (especially after appending teacher hidden size to model_hidden_sizes); fix by tracking seen scenarios for this invocation: create a runtime set (e.g., curr_run_args) of immutabledict args and initialize it from subblock_stats_args, then for each generated curr_subblock_stats compute key = immutabledict(curr_subblock_stats["args"]) and skip/raise if key in curr_run_args; after appending curr_subblock_stats also add key to curr_run_args (and/or update subblock_stats_args) to prevent duplicates during the same run; also ensure model_hidden_sizes is deduplicated before iteration (e.g., build a unique list or set from model_hidden_sizes + [lm_config.hidden_size]) so the teacher hidden size doesn’t cause duplicate iterations.
🧹 Nitpick comments (6)
modelopt/torch/puzzletron/plugins/mbridge/qwen3.py (1)
26-26: Export `PuzzletronQwen3AnyModelBridge` in `__all__` for consistency.
Registration works today, but an empty `__all__` makes this module’s public surface inconsistent with `llama.py` and less predictable for star-import consumers.
♻️ Proposed change
```diff
-__all__ = []
+__all__ = ["PuzzletronQwen3AnyModelBridge"]
```
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/plugins/mbridge/qwen3.py` at line 26, The module currently exports an empty __all__, which hides PuzzletronQwen3AnyModelBridge from star-imports; update the module-level __all__ to include the class name "PuzzletronQwen3AnyModelBridge" so the public API is consistent with other bridge modules (e.g., llama.py) and predictable for consumers using from module import *; locate the class definition PuzzletronQwen3AnyModelBridge and add its exact identifier to the __all__ list.modelopt/torch/puzzletron/puzzletron_nas_plugin.py (1)
133-133: Hardcoded output subdir should be config-driven.
Line 133 hardcodes `"ckpts/teacher"` (and already has a TODO). Promote it to `PuzzletronConfig` to avoid path coupling across workflows.
I can draft the config field and wiring patch if you want.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/puzzletron_nas_plugin.py` at line 133, Replace the hardcoded hf_ckpt_teacher_dir = "ckpts/teacher" by adding a new config field on PuzzletronConfig (e.g., hf_ckpt_teacher_subdir or teacher_checkpoint_subdir) and use that field where hf_ckpt_teacher_dir is referenced in puzzletron_nas_plugin.py; update the plugin to read the value from the PuzzletronConfig instance (falling back to the previous default "ckpts/teacher" if unset) so all paths are configurable and not hardcoded.modelopt/torch/puzzletron/tools/post_init_sparse.py (1)
29-30: Make the base strategy fail fast instead of deferring broken defaults to runtime.
`do_sparsity()` calls `calculate_masks(...)` and `filter_function(name, modules)`, but the base class currently provides a docstring-only `calculate_masks` and a zero-arg `filter_function`. That leaves `SparsityMethod` instantiable but unusable.
♻️ Minimal fail-fast change
```diff
 class SparsityMethod:
     def calculate_masks(self, state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
-        """Gets a model state_dict, returns a state_dict-like mask_dict with masks"""
+        raise NotImplementedError

-    def filter_function(self):
-        pass
+    def filter_function(self, name, modules_to_sparsify_in_block):
+        raise NotImplementedError
```
class SparsityMethod: def calculate_masks(self, state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]: - """Gets a model state_dict, returns a state_dict-like mask_dict with masks""" + raise NotImplementedError - def filter_function(self): - pass + def filter_function(self, name, modules_to_sparsify_in_block): + raise NotImplementedErrorAlso applies to: 54-55, 64-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/post_init_sparse.py` around lines 29 - 30, The base class SparsityMethod is currently instantiable but unusable; update SparsityMethod so calculate_masks(self, state_dict: dict[str, torch.Tensor]) raises NotImplementedError with a clear message instead of a docstring, and change filter_function to accept the expected signature filter_function(self, name: str, modules: Sequence[nn.Module]) (or similar used by do_sparsity) and raise NotImplementedError as well; ensure do_sparsity still calls calculate_masks(...) and filter_function(name, modules) but will now fail fast on the base class rather than deferring broken defaults to runtime.modelopt/torch/puzzletron/tools/logger.py (1)
21-21: Remove the unused launcher side-effect import.
The logger already reads `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` directly, so Line 21 only adds a deprecated Torch dependency to this module's import path. If a supported runtime drops `torch.distributed.launch`, importing the logger will fail before any logging code runs.
Suggested fix
```diff
-import torch.distributed.launch  # noqa: F401
```
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/logger.py` at line 21, Remove the unused side-effect import "import torch.distributed.launch" from logger.py because the logger already reads environment variables (LOCAL_RANK, RANK, WORLD_SIZE) directly; locate the import statement in logger.py and delete it so the module no longer depends on the deprecated torch.distributed.launch runtime, keeping any existing references to LOCAL_RANK/RANK/WORLD_SIZE intact.examples/puzzletron/requirements.txt (1)
2-3: Pin the new example deps to tested, approved versions.
`math-verify` and `ray` are open-ended here. That makes the example environment drift over time, and `ray` has had critical advisories on some released versions. Please pin the versions/ranges you actually test, and verify whether `math-verify` needs the extra codeowner approval required for non-permissive licenses.
As per coding guidelines, "Any addition of new PIP dependencies in pyproject.toml or requirements.txt that are not permissive licenses ... must be reviewed and approved by `@NVIDIA/modelopt-setup-codeowners` with an explicit justification in the PR description."
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/requirements.txt` around lines 2 - 3, The requirements.txt entries for the puzzletron example use open-ended package specs ("math-verify" and "ray"); update examples/puzzletron/requirements.txt to pin both packages to the specific versions or tight version ranges you actually test (e.g., ray==X.Y.Z or ray>=A.B,<C.D) to prevent drift, and confirm whether math-verify's license is non-permissive—if so, add the required approval from `@NVIDIA/modelopt-setup-codeowners` and an explicit justification in the PR description before merging.modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py (1)
15-15: Avoid file-wide mypy suppression.
Line 15 disables all type checking for this module. Please prefer targeted ignores on specific lines/issues instead of blanket suppression.
As per coding guidelines: "Use mypy for static type checking Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py` at line 15, Remove the file-wide "# mypy: ignore-errors" at the top of convert_any_model.py and replace it with targeted type fixes or inline ignores for the specific symbols that fail type checking; identify the offending functions/classes (e.g., any public functions or classes in convert_any_model.py such as convert_any_model, AnyModelConverter, or helper functions) by running mypy on this module, fix obvious typing issues (add annotations, correct return types) and where necessary use precise inline ignores like "# type: ignore[<error-code>]" on the specific lines rather than suppressing checks for the whole file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: be30b5b3-7077-4f8b-b84f-9646c31871be
📒 Files selected for processing (110)
.github/CODEOWNERS
.github/workflows/_example_tests_runner.yml
.github/workflows/example_tests.yml
.github/workflows/gpu_tests.yml
.pre-commit-config.yaml
CHANGELOG.rst
docs/source/conf.py
examples/llm_eval/README.md
examples/llm_eval/lm_eval_hf.py
examples/puzzletron/README.md
examples/puzzletron/evaluation/hf_deployable_anymodel.py
examples/puzzletron/main.py
examples/puzzletron/mbridge_distillation/README.md
examples/puzzletron/mbridge_distillation/distill_hf.py
examples/puzzletron/requirements.txt
modelopt/torch/prune/importance_hooks/base_hooks.py
modelopt/torch/prune/importance_hooks/compare_module_outputs.py
modelopt/torch/prune/importance_hooks/expert_removal_hooks.py
modelopt/torch/puzzletron/__init__.py
modelopt/torch/puzzletron/activation_scoring/__init__.py
modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py
modelopt/torch/puzzletron/anymodel/README.md
modelopt/torch/puzzletron/anymodel/__init__.py
modelopt/torch/puzzletron/anymodel/converter/__init__.py
modelopt/torch/puzzletron/anymodel/converter/base.py
modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py
modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
modelopt/torch/puzzletron/anymodel/models/__init__.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py
modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
modelopt/torch/puzzletron/anymodel/models/llama/llama_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_converter.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
modelopt/torch/puzzletron/anymodel/puzzformer/__init__.py
modelopt/torch/puzzletron/anymodel/puzzformer/no_op.py
modelopt/torch/puzzletron/anymodel/puzzformer/patcher.py
modelopt/torch/puzzletron/block_config.py
modelopt/torch/puzzletron/build_library_and_stats.py
modelopt/torch/puzzletron/dataset/__init__.py
modelopt/torch/puzzletron/dataset/prepare_dataset.py
modelopt/torch/puzzletron/entrypoint.py
modelopt/torch/puzzletron/mip/__init__.py
modelopt/torch/puzzletron/mip/mip_and_realize_models.py
modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
modelopt/torch/puzzletron/mip/run_puzzle.py
modelopt/torch/puzzletron/mip/sweep.py
modelopt/torc
h/puzzletron/mip/utils.pymodelopt/torch/puzzletron/plugins/__init__.pymodelopt/torch/puzzletron/plugins/mbridge/__init__.pymodelopt/torch/puzzletron/plugins/mbridge/base.pymodelopt/torch/puzzletron/plugins/mbridge/llama.pymodelopt/torch/puzzletron/plugins/mbridge/qwen3.pymodelopt/torch/puzzletron/pruning/__init__.pymodelopt/torch/puzzletron/pruning/expert_removal_pruning_mixin.pymodelopt/torch/puzzletron/pruning/ffn_intermediate_pruning_mixin.pymodelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.pymodelopt/torch/puzzletron/pruning/pruning_ckpts.pymodelopt/torch/puzzletron/pruning/pruning_mixin.pymodelopt/torch/puzzletron/pruning/pruning_utils.pymodelopt/torch/puzzletron/puzzletron_nas_plugin.pymodelopt/torch/puzzletron/replacement_library/__init__.pymodelopt/torch/puzzletron/replacement_library/build_replacement_library.pymodelopt/torch/puzzletron/replacement_library/library.pymodelopt/torch/puzzletron/replacement_library/replacement_utils.pymodelopt/torch/puzzletron/scoring/__init__.pymodelopt/torch/puzzletron/scoring/score.pymodelopt/torch/puzzletron/sewing_kit/__init__.pymodelopt/torch/puzzletron/sewing_kit/core.pymodelopt/torch/puzzletron/sewing_kit/passage/__init__.pymodelopt/torch/puzzletron/sewing_kit/passage/core.pymodelopt/torch/puzzletron/sewing_kit/utils.pymodelopt/torch/puzzletron/subblock_stats/__init__.pymodelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.pymodelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.pymodelopt/torch/puzzletron/tools/__init__.pymodelopt/torch/puzzletron/tools/bypassed_training/__init__.pymodelopt/torch/puzzletron/tools/bypassed_training/child_init.pymodelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.pymodelopt/torch/puzzletron/tools/checkpoint_utils.pymodelopt/torch/puzzletron/tools/checkpoint_utils_hf.pymodelopt/torch/puzzletron/tools/common.pymodelopt/torch/puzzletron/tools/hydra_utils.pymodelopt/torch/puzzletron/tools/kd_model.pymodelopt/torch/puzzletron/tools/logger.p
ymodelopt/torch/puzzletron/tools/post_init_sparse.py
✅ Files skipped from review due to trivial changes (27)
- examples/llm_eval/README.md
- .github/CODEOWNERS
- CHANGELOG.rst
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
- modelopt/torch/puzzletron/activation_scoring/__init__.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
- modelopt/torch/puzzletron/mip/__init__.py
- modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
- modelopt/torch/puzzletron/sewing_kit/passage/__init__.py
- modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
- modelopt/torch/puzzletron/scoring/__init__.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
- modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
- examples/puzzletron/mbridge_distillation/README.md
- modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
- modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/tools/kd_model.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
- modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
- modelopt/torch/puzzletron/tools/hydra_utils.py
- modelopt/torch/puzzletron/anymodel/README.md
- modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
- modelopt/torch/puzzletron/sewing_kit/passage/core.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
🚧 Files skipped from review as they are similar to previous changes (27)
- .pre-commit-config.yaml
- modelopt/torch/puzzletron/dataset/__init__.py
- modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
- modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
- modelopt/torch/puzzletron/sewing_kit/__init__.py
- modelopt/torch/puzzletron/dataset/prepare_dataset.py
- modelopt/torch/puzzletron/tools/common.py
- modelopt/torch/prune/importance_hooks/compare_module_outputs.py
- modelopt/torch/puzzletron/build_library_and_stats.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
- modelopt/torch/puzzletron/__init__.py
- modelopt/torch/puzzletron/anymodel/models/__init__.py
- modelopt/torch/puzzletron/anymodel/puzzformer/no_op.py
- .github/workflows/example_tests.yml
- examples/puzzletron/main.py
- modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
- modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
- modelopt/torch/puzzletron/anymodel/__init__.py
- modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
- modelopt/torch/puzzletron/tools/__init__.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
- modelopt/torch/puzzletron/mip/mip_and_realize_models.py
- modelopt/torch/puzzletron/pruning/pruning_mixin.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- modelopt/torch/puzzletron/anymodel/puzzformer/__init__.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/sewing_kit/core.py
```python
def get_triton_output(self):
    return (
        Tensor(name="sentences", shape=(-1,), dtype=bytes),
        Tensor(name="logits", shape=(-1,), dtype=np.single),
        Tensor(name="scores", shape=(-1,), dtype=np.single),
    )
```
🧩 Analysis chain
🌐 Web query:
For PyTriton Tensor(shape=...), does the declared shape exclude only the batch dimension, and must every declared output be returned with matching rank on each request?
💡 Result:
Yes, for PyTriton Tensor(shape=...), the declared shape excludes the batch dimension when max_batch_size > 0 (which is the default), and every declared output must be returned with a matching rank (and shape, when strict=True) on each request.
Citations:
- 1: https://triton-inference-server.github.io/pytriton/0.2.4/binding_configuration/
- 2: https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1140/user-guide/docs/model_configuration.html
- 3: https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_2310/user-guide/docs/user_guide/model_configuration.html
- 4: https://triton-inference-server.github.io/pytriton/latest/reference/model_config/
- 5: https://triton-inference-server.github.io/pytriton/0.3.1/binding_models/
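The batch-dimension rule above can be checked without a running server. The helper below is an illustrative sketch (not part of PyTriton): with `max_batch_size > 0`, an output declared as `Tensor(shape=(-1,))` must arrive as a rank-2 array `(batch, n)`, so a rank-3 tensor can only satisfy a rank-2 declaration.

```python
import numpy as np

def matches_declared_shape(array: np.ndarray, declared_shape: tuple) -> bool:
    """Rank check mirroring PyTriton's contract: actual rank = declared rank + 1 (batch)."""
    if array.ndim != len(declared_shape) + 1:
        return False
    # -1 means "any size"; fixed sizes must match exactly.
    return all(d == -1 or d == s for d, s in zip(declared_shape, array.shape[1:]))

batch = 4
# Rank-3 tensor, as produced after a transpose(1, 0, 2) in the handler.
logits = np.zeros((batch, 7, 32), dtype=np.float32)
print(matches_declared_shape(logits, (-1,)))     # rank-1 declaration: mismatch
print(matches_declared_shape(logits, (-1, -1)))  # rank-2 declaration: compatible
```

This is why declaring `logits` as `shape=(-1,)` while returning a transposed rank-3 array violates the binding.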
🏁 Script executed:
```shell
cd examples/puzzletron/evaluation && wc -l hf_deployable_anymodel.py
```
Repository: NVIDIA/Model-Optimizer
Length of output: 95
🏁 Script executed:
```shell
cd examples/puzzletron/evaluation && sed -n '330,350p' hf_deployable_anymodel.py
```
Length of output: 1035
🏁 Script executed:
```shell
cd examples/puzzletron/evaluation && sed -n '395,435p' hf_deployable_anymodel.py
```
Length of output: 1721
Fix Triton output signature to match handler returns.
Lines 341-346 declare logits and scores as rank-1 outputs (shape=(-1,)), but the handler returns rank-3 tensors after transpose(1, 0, 2) at lines 417 and 423. Additionally, the error path (line 431) omits these outputs entirely, creating a contract violation where declared outputs are not always present. PyTriton requires all declared outputs to be returned with matching rank and presence on every request.
Also applies to: 401-429
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/puzzletron/evaluation/hf_deployable_anymodel.py` around lines 341 -
346, The Triton output signature declared in get_triton_output must match the
actual tensors returned by the request handler: change the shapes/types for
"logits" and "scores" to the correct rank-3 shapes (matching the tensors
produced after transpose(1, 0, 2)) and ensure the handler always returns all
declared outputs even on error; update get_triton_output (and any duplicate
signatures in the 401-429 region) to declare "logits" and "scores" with the
proper three-dimensional shape and modify the error branch in the handler so it
returns placeholder tensors for "logits" and "scores" (e.g., zeros or
appropriate empty arrays with the correct dtype and rank) alongside "sentences"
to satisfy PyTriton’s requirement that all declared outputs are present and
rank-compatible.
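The "all declared outputs on every request" requirement can be satisfied with placeholder tensors on the error path. The handler below is a standalone sketch of that pattern (names and shapes are illustrative, not the repo's actual handler):

```python
import numpy as np

def infer(sentences: np.ndarray) -> dict[str, np.ndarray]:
    """Always return every declared output, even when inference fails."""
    batch = sentences.shape[0]
    try:
        # ... the real model call would go here ...
        logits = np.zeros((batch, 5, 10), dtype=np.float32)
        scores = np.zeros((batch, 5, 1), dtype=np.float32)
    except Exception:
        # Zero-sized placeholders keep the declared outputs present
        # and rank-compatible instead of omitting them.
        logits = np.zeros((batch, 0, 0), dtype=np.float32)
        scores = np.zeros((batch, 0, 0), dtype=np.float32)
    return {"sentences": sentences, "logits": logits, "scores": scores}

out = infer(np.array([b"hello", b"world"]))
print(sorted(out))  # ['logits', 'scores', 'sentences']
```

The key property is that the set of returned keys and their ranks never depend on which branch executed.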
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/scoring.py`:
- Line 18: Remove the module-wide "# mypy: ignore-errors" directive at the top
of scoring.py and instead fix the underlying type issues or apply scoped
ignores: run mypy, correct function/type annotations on the scoring entrypoints
(e.g., any top-level functions or classes that implement scoring or exports in
scoring.py), add precise per-line or per-symbol ignores (e.g., "# type:
ignore[arg-type]" or annotate with typing like Any, Optional, Callable) only
where necessary, and re-run mypy to ensure no module-wide suppression remains.
- Around line 84-86: Wrap the call to launch_scoring(cfg) in a try/finally so
that dist.cleanup() is always executed even if launch_scoring raises;
specifically, after calling dist.setup(timeout=cfg.nccl_timeout_minutes) enter a
try block, call launch_scoring(cfg) inside it, and call dist.cleanup() in the
finally block (re-raising any exception if needed) to ensure the process group
is torn down reliably.
- Around line 61-70: get_solutions_to_validate currently calls
pd.read_json(cfg.scoring.solutions_path) unconditionally which will fail if
solutions_path is a directory; update get_solutions_to_validate to mirror
load_puzzle_solutions behavior: convert cfg.scoring.solutions_path to a Path, if
is_file() use pd.read_json to produce single_block_replacement_solutions,
otherwise iterate over solution_*.json files in the directory, read each JSON
and concatenate into a DataFrame/list; then continue using
find_missing_solutions or np.arange(...) as before (refer to
get_solutions_to_validate, pd.read_json, load_puzzle_solutions,
find_missing_solutions and cfg.scoring.solutions_path to locate the relevant
code).
In `@modelopt/torch/puzzletron/sewing_kit/passage.py`:
- Around line 181-190: The __init__ uses mutable default arguments (e.g.,
outputs_cache: dict[str, Any] = {}, inputs_to_capture: Iterable[str] = [],
outputs_to_capture: Iterable[str] = [], input_overrides: Mapping[...] = {},
output_overrides: Mapping[...] = {}), causing shared state between instances;
change these defaults to None and inside Passage.__init__ initialize them to
fresh empty containers (e.g., if outputs_cache is None: outputs_cache = {}),
doing the same for
inputs_to_capture/outputs_to_capture/input_overrides/output_overrides so each
Passage instance gets its own local collections and no cross-instance leakage
occurs (also apply the same fix to the similar parameter list around the other
constructor occurrence noted).
- Around line 356-418: The cache fast-path currently skips modules even when
inputs are requested because can_be_skipped ignores passage.inputs_to_capture;
update can_be_skipped (used by PassageModuleWrapper.can_be_skipped called from
run_passage) to also check passage.inputs_to_capture and return False if any
entry in inputs_to_capture matches the module (using is_submodule_or_same for
direct captures and/or is_submodule_of for descendant captures), so that when
run_passage would return a cached_output it does not bypass required input
capture logic.
- Around line 289-300: The context-manager leaves stale state and builds names
that don't match forward(): in Passage.__exit__ call the active context
manager's __exit__ and then set self.active_context_manager = None so future
enters don't assert; in Passage.__enter__ stop using self.named_modules() to
build module_to_name_mapping (which yields "module."-prefixed names) and instead
derive names that match the override/capture keys used by forward() — e.g.,
iterate self.modules() (or use named_modules() but strip the "module." prefix)
and assign module_to_name_mapping entries keyed by id(module) -> the exact
unprefixed name expected by override/capture so lookups align.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 47033822-a1f6-49a1-a096-6c7e29002d82
📒 Files selected for processing (5)
- modelopt/torch/puzzletron/__init__.py
- modelopt/torch/puzzletron/dataset/prepare_dataset.py
- modelopt/torch/puzzletron/scoring.py
- modelopt/torch/puzzletron/sewing_kit/passage.py
- modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/puzzletron/__init__.py
- modelopt/torch/puzzletron/dataset/prepare_dataset.py
```python
def get_solutions_to_validate(cfg: DictConfig):
    _solutions_to_validate = cfg.scoring.solutions_to_validate
    if _solutions_to_validate is None:
        single_block_replacement_solutions = pd.read_json(cfg.scoring.solutions_path)
        if cfg.scoring.skip_existing_solutions:
            _solutions_to_validate = find_missing_solutions(
                single_block_replacement_solutions, cfg.scoring.output_dir
            )
        else:
            _solutions_to_validate = np.arange(single_block_replacement_solutions.shape[0]).tolist()
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's locate and read the file
find . -name "scoring.py" -path "*/puzzletron/*" | head -5
```
Length of output: 104
🏁 Script executed:
```shell
# Get the file and look at the relevant lines
cat -n modelopt/torch/puzzletron/scoring.py | sed -n '50,80p'
```
Length of output: 1535
🏁 Script executed:
```shell
# Search for the validator contract/schema definition
rg -n "solutions_path|solutions_to_validate" modelopt/torch/puzzletron/ -A 2 -B 2 | head -50
```
Length of output: 5099
🏁 Script executed:
```shell
# Look for find_missing_solutions function
rg -n "def find_missing_solutions" modelopt/torch/puzzletron/ -A 10
```
Length of output: 1014
🏁 Script executed:
```shell
# Look for extract_solution_id function mentioned in suggested fix
rg -n "extract_solution_id" modelopt/torch/puzzletron/ -A 5
```
Length of output: 1006
🏁 Script executed:
```shell
# Look at load_puzzle_solutions function to see how it handles both file and directory
rg -n "def load_puzzle_solutions" modelopt/torch/puzzletron/ -A 20
```
Length of output: 2440
🏁 Script executed:
```shell
# Check imports to see what's available in scoring.py
head -30 modelopt/torch/puzzletron/scoring.py
```
Length of output: 1099
🏁 Script executed:
```shell
# Check if load_puzzle_solutions is used in scoring.py
rg -n "load_puzzle_solutions" modelopt/torch/puzzletron/scoring.py
```
Length of output: 48
🏁 Script executed:
```shell
# Check if load_puzzle_solutions is imported/available in scoring.py module
rg -n "load_puzzle_solutions|from.*tools" modelopt/torch/puzzletron/scoring.py
```
Length of output: 248
🏁 Script executed:
```shell
# Let's look at the complete extract_solution_id function to confirm its behavior
cat -n modelopt/torch/puzzletron/scoring.py | sed -n '37,45p'
```
Length of output: 406
🏁 Script executed:
```shell
# Check if there's usage documentation or test cases that show solutions_path as directory
rg -n "solutions_path" modelopt/torch/puzzletron/ -B 2 -A 2 | grep -i "dir\|glob" | head -20
```
Length of output: 981
Support directory-valued solutions_path before calling pd.read_json.
The validator contract allows solutions_path to be either a single JSON file or a directory of solution_<id>.json files, but pd.read_json() on line 64 only handles the file case and will fail when given a directory path. This mirrors the issue handled by load_puzzle_solutions() in validate_puzzle_with_multi_replacements.py, which correctly checks is_file() vs directory and processes accordingly.
Suggested fix
def get_solutions_to_validate(cfg: DictConfig):
_solutions_to_validate = cfg.scoring.solutions_to_validate
if _solutions_to_validate is None:
- single_block_replacement_solutions = pd.read_json(cfg.scoring.solutions_path)
+ solutions_path = cfg.scoring.solutions_path
+ if os.path.isdir(solutions_path):
+ files = glob(os.path.join(solutions_path, "solution_*.json"))
+ all_solution_ids = sorted(
+ s for s in (extract_solution_id(os.path.basename(f)) for f in files) if s is not None
+ )
+ else:
+ single_block_replacement_solutions = pd.read_json(solutions_path)
+ all_solution_ids = np.arange(single_block_replacement_solutions.shape[0]).tolist()
+
if cfg.scoring.skip_existing_solutions:
- _solutions_to_validate = find_missing_solutions(
- single_block_replacement_solutions, cfg.scoring.output_dir
- )
+ _solutions_to_validate = np.setdiff1d(
+ all_solution_ids,
+ [
+ s
+ for s in (
+ extract_solution_id(os.path.basename(p))
+ for p in glob(os.path.join(cfg.scoring.output_dir, "solution_*.json"))
+ )
+ if s is not None
+ ],
+ ).tolist()
else:
- _solutions_to_validate = np.arange(single_block_replacement_solutions.shape[0]).tolist()
+ _solutions_to_validate = all_solution_ids
    return _solutions_to_validate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/scoring.py` around lines 61 - 70,
get_solutions_to_validate currently calls
pd.read_json(cfg.scoring.solutions_path) unconditionally which will fail if
solutions_path is a directory; update get_solutions_to_validate to mirror
load_puzzle_solutions behavior: convert cfg.scoring.solutions_path to a Path, if
is_file() use pd.read_json to produce single_block_replacement_solutions,
otherwise iterate over solution_*.json files in the directory, read each JSON
and concatenate into a DataFrame/list; then continue using
find_missing_solutions or np.arange(...) as before (refer to
get_solutions_to_validate, pd.read_json, load_puzzle_solutions,
find_missing_solutions and cfg.scoring.solutions_path to locate the relevant
code).
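The file-or-directory dispatch the prompt describes can be sketched in isolation. The `load_solutions` helper below is illustrative (not the repo's `load_puzzle_solutions`): a single JSON file is read directly, while a directory is scanned for `solution_*.json` files.

```python
import json
import tempfile
from pathlib import Path

def load_solutions(path: Path) -> list[dict]:
    """Accept either a single JSON file or a directory of solution_*.json files."""
    if path.is_file():
        return json.loads(path.read_text())
    # Directory case: read each per-solution file and collect the records.
    return [json.loads(p.read_text()) for p in sorted(path.glob("solution_*.json"))]

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for i in range(3):
        (root / f"solution_{i}.json").write_text(json.dumps({"id": i}))
    n = len(load_solutions(root))
    print(n)  # 3
```

The `is_file()` branch preserves the existing `pd.read_json`-style behavior, while the directory branch covers the path flow that currently fails.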
```python
def __init__(
    self,
    module: nn.Module,
    *,
    inputs_to_capture: Iterable[str] = [],
    outputs_to_capture: Iterable[str] = [],
    input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
    output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
    outputs_cache: dict[str, Any] = {},
    capture_fake_outputs_predicate: Predicate = always_false_predicate,
```
Make outputs_cache instance-local.
The default {} here is shared across calls. A fake output cached by one Passage can leak into the next one and make can_be_skipped() reuse stale values on an unrelated run.
Suggested fix
def __init__(
self,
module: nn.Module,
*,
- inputs_to_capture: Iterable[str] = [],
- outputs_to_capture: Iterable[str] = [],
- input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
- output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
- outputs_cache: dict[str, Any] = {},
+ inputs_to_capture: Iterable[str] = (),
+ outputs_to_capture: Iterable[str] = (),
+ input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+ output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+ outputs_cache: Mapping[str, Any] | None = None,
capture_fake_outputs_predicate: Predicate = always_false_predicate,
capture_cache_outputs_predicate: Predicate = always_false_predicate,
early_exit: bool = False,
name: Optional[str] = None,
):
@@
- self.input_overrides = input_overrides
- self.output_overrides = output_overrides
- self.outputs_cache = outputs_cache
+ self.input_overrides = input_overrides or {}
+ self.output_overrides = output_overrides or {}
+ self.outputs_cache = dict(outputs_cache or {})
@@
def create(
cls,
module: nn.Module,
*,
- inputs_to_capture: Iterable[str] = [],
- outputs_to_capture: Iterable[str] = [],
- input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
- output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
- outputs_cache: dict[str, Any] = {},
+ inputs_to_capture: Iterable[str] = (),
+ outputs_to_capture: Iterable[str] = (),
+ input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+ output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+ outputs_cache: Mapping[str, Any] | None = None,
capture_fake_outputs_predicate: Predicate = always_false_predicate,
capture_cache_outputs_predicate: Predicate = always_false_predicate,
early_exit: bool = False,
name: Optional[str] = None,
    ) -> Passage:
Also applies to: 250-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 181 - 190, The
__init__ uses mutable default arguments (e.g., outputs_cache: dict[str, Any] =
{}, inputs_to_capture: Iterable[str] = [], outputs_to_capture: Iterable[str] =
[], input_overrides: Mapping[...] = {}, output_overrides: Mapping[...] = {}),
causing shared state between instances; change these defaults to None and inside
Passage.__init__ initialize them to fresh empty containers (e.g., if
outputs_cache is None: outputs_cache = {}), doing the same for
inputs_to_capture/outputs_to_capture/input_overrides/output_overrides so each
Passage instance gets its own local collections and no cross-instance leakage
occurs (also apply the same fix to the similar parameter list around the other
constructor occurrence noted).
```python
def __enter__(self):
    assert self.active_context_manager is None
    self.active_context_manager = Passage.active_passages_context(self)
    self.active_context_manager.__enter__()
    self.module_to_name_mapping = {id(v): k for k, v in self.named_modules()}

    return self

def __exit__(self, exc_type, exc_val, exc_tb):
    assert self.active_context_manager is not None
    self.active_context_manager.__exit__(exc_type, exc_val, exc_tb)
```
The context-manager path leaves stale state behind and remaps names differently from forward().
After the first with passage: ..., active_context_manager is never cleared, so is_active() stays true and the next enter hits the assert. Also, rebuilding module_to_name_mapping from self.named_modules() prefixes everything with module., which stops matching the override/capture keys used elsewhere.
Suggested fix
def __enter__(self):
assert self.active_context_manager is None
self.active_context_manager = Passage.active_passages_context(self)
self.active_context_manager.__enter__()
- self.module_to_name_mapping = {id(v): k for k, v in self.named_modules()}
+ self.module_to_name_mapping = {id(v): k for k, v in self.module.named_modules()}
return self
def __exit__(self, exc_type, exc_val, exc_tb):
assert self.active_context_manager is not None
- self.active_context_manager.__exit__(exc_type, exc_val, exc_tb)
+ try:
+ self.active_context_manager.__exit__(exc_type, exc_val, exc_tb)
+ finally:
+ self.active_context_manager = None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 289 - 300, The
context-manager leaves stale state and builds names that don't match forward():
in Passage.__exit__ call the active context manager's __exit__ and then set
self.active_context_manager = None so future enters don't assert; in
Passage.__enter__ stop using self.named_modules() to build
module_to_name_mapping (which yields "module."-prefixed names) and instead
derive names that match the override/capture keys used by forward() — e.g.,
iterate self.modules() (or use named_modules() but strip the "module." prefix)
and assign module_to_name_mapping entries keyed by id(module) -> the exact
unprefixed name expected by override/capture so lookups align.
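The stale-state half of this fix can be shown with a small re-enterable wrapper. The class below is a standalone illustration (not the repo's `Passage`): clearing the saved context manager in a `finally` inside `__exit__` is what lets the same object survive a second `with` block.

```python
from contextlib import contextmanager

class Reenterable:
    def __init__(self):
        self.active_cm = None
        self.entered = 0

    @contextmanager
    def _ctx(self):
        self.entered += 1
        yield

    def __enter__(self):
        assert self.active_cm is None  # would fail on re-entry without the reset below
        self.active_cm = self._ctx()
        self.active_cm.__enter__()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        assert self.active_cm is not None
        try:
            self.active_cm.__exit__(exc_type, exc_val, exc_tb)
        finally:
            self.active_cm = None  # allow the next `with` to succeed

p = Reenterable()
with p:
    pass
with p:  # hits the __enter__ assert if __exit__ never cleared the state
    pass
print(p.entered)  # 2
```

The `finally` matters because an exception raised by the inner `__exit__` must not leave `active_cm` set either.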
```python
def can_be_skipped(_self: PassageModuleWrapper, depth: int) -> bool:
    passages_beyond_depth = Passage.active_passages_context[depth:]
    module_name = Passage.module_name_relative_to_active_passage(_self)

    results = [
        (
            module_name in passage.outputs_cache
            and not any(
                is_submodule_or_same(k, module_name) for k in passage.outputs_to_capture
            )
            and not any(
                is_submodule_of(k, module_name)
                for k, v in passage.input_overrides.items()
                if v is not None
            )
            and not any(
                is_submodule_of(k, module_name)
                for k, v in passage.output_overrides.items()
                if v is not None
            )
        )
        for passage in passages_beyond_depth
    ]

    result = all(results)

    return result

# Defined as a static method to avoid potential collision with original class methods
@staticmethod
@dynamo_skip
def run_passage(_self: PassageModuleWrapper, depth: int, args, kwargs):
    if depth + 1 > len(Passage.active_passages_context):
        output = super(PassageModuleWrapper, _self).__call__(*args, **kwargs)
        return output

    module_name = Passage.module_name_relative_to_active_passage(_self)
    passage = Passage.active_passages_context[depth]

    has_output_override = module_name in passage.output_overrides
    output_override = passage.output_overrides.get(module_name)

    if has_output_override and not isinstance(output_override, PassageOutputAdapter):
        output = output_override
    else:
        input_override = passage.input_overrides.get(module_name)
        if input_override is not None:
            original_input_args = InputArgs(*args, **kwargs)

            if isinstance(input_override, PassageInputAdapter):
                new_input_args = input_override(original_input_args, module_name, module)
            else:
                new_input_args = input_override

            args, kwargs = new_input_args.args, new_input_args.kwargs

        if (
            output_override is None
            and PassageModuleWrapper.can_be_skipped(_self, depth)
            and (has_fake_tensor(args) or has_fake_tensor(kwargs))
        ):
            cached_output = passage.outputs_cache[module_name]
            return cached_output
```
Don't let the cache fast-path skip requested input captures.
can_be_skipped() ignores inputs_to_capture. When a cached fake output exists, Line 418 returns before Lines 440-446 run, so input capture requests for that module or its descendants are silently dropped.
Suggested fix
(
module_name in passage.outputs_cache
+ and not any(
+ is_submodule_or_same(k, module_name) for k in passage.inputs_to_capture
+ )
and not any(
is_submodule_or_same(k, module_name) for k in passage.outputs_to_capture
)
and not any(
is_submodule_of(k, module_name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 356 - 418, The
cache fast-path currently skips modules even when inputs are requested because
can_be_skipped ignores passage.inputs_to_capture; update can_be_skipped (used by
PassageModuleWrapper.can_be_skipped called from run_passage) to also check
passage.inputs_to_capture and return False if any entry in inputs_to_capture
matches the module (using is_submodule_or_same for direct captures and/or
is_submodule_of for descendant captures), so that when run_passage would return
a cached_output it does not bypass required input capture logic.
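The extended predicate can be exercised in isolation. The functions below mirror the names in the review but are a standalone illustration, not the repo's implementation: a cached output may only be reused when no capture targets the module or its descendants.

```python
def is_submodule_or_same(name: str, module_name: str) -> bool:
    """True if `name` is `module_name` itself or a dotted descendant of it."""
    return name == module_name or name.startswith(module_name + ".")

def can_be_skipped(module_name, outputs_cache, inputs_to_capture, outputs_to_capture):
    return (
        module_name in outputs_cache
        # The missing check: requested input captures must disable the fast path.
        and not any(is_submodule_or_same(k, module_name) for k in inputs_to_capture)
        and not any(is_submodule_or_same(k, module_name) for k in outputs_to_capture)
    )

cache = {"decoder.layer1": "cached"}
# No captures requested: reusing the cached output is safe.
print(can_be_skipped("decoder.layer1", cache, [], []))                      # True
# An input capture on a child module must force the module to actually run.
print(can_be_skipped("decoder.layer1", cache, ["decoder.layer1.mlp"], []))  # False
```

Without the `inputs_to_capture` clause, the second call would return True and the requested capture would be silently dropped, which is the bug described above.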
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 5
♻️ Duplicate comments (6)
modelopt/torch/puzzletron/scoring.py (2)
61-70: ⚠️ Potential issue | 🟠 Major
`get_solutions_to_validate` fails for directory-valued `solutions_path`. Line 64 unconditionally calls `pd.read_json(...)`, but Puzzletron's loader supports both a JSON file and a directory of `*solution*.json`. This will break the directory path flow and diverge from validation behavior.
Suggested fix
```diff
+from pathlib import Path
 import numpy as np
-import pandas as pd
@@
-from .tools.validate_puzzle_with_multi_replacements import validate_puzzle_solutions
+from .tools.validate_puzzle_with_multi_replacements import (
+    load_puzzle_solutions,
+    validate_puzzle_solutions,
+)
@@
-def find_missing_solutions(solutions_df, validation_dir):
-    all_solutions = np.arange(solutions_df.shape[0])
+def find_missing_solutions(num_solutions: int, validation_dir):
+    all_solutions = np.arange(num_solutions)
@@
 def get_solutions_to_validate(cfg: DictConfig):
     _solutions_to_validate = cfg.scoring.solutions_to_validate
     if _solutions_to_validate is None:
-        single_block_replacement_solutions = pd.read_json(cfg.scoring.solutions_path)
+        puzzle_solutions = load_puzzle_solutions(
+            Path(cfg.scoring.solutions_path),
+            cfg.scoring.sort_solutions_by,
+            cfg.scoring.bigger_is_better,
+        )
+        num_solutions = len(puzzle_solutions)
         if cfg.scoring.skip_existing_solutions:
             _solutions_to_validate = find_missing_solutions(
-                single_block_replacement_solutions, cfg.scoring.output_dir
+                num_solutions, cfg.scoring.output_dir
             )
         else:
-            _solutions_to_validate = np.arange(single_block_replacement_solutions.shape[0]).tolist()
+            _solutions_to_validate = np.arange(num_solutions).tolist()
     return _solutions_to_validate
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/scoring.py` around lines 61 - 70, get_solutions_to_validate currently always calls pd.read_json on cfg.scoring.solutions_path which fails when that path is a directory of solution files; update the function to detect whether cfg.scoring.solutions_path is a file or a directory, and if it is a directory, glob for files matching a pattern like "*solution*.json", read each file into a DataFrame/record list and concatenate them into single_block_replacement_solutions before proceeding; keep the existing branch that calls find_missing_solutions(...) (referencing single_block_replacement_solutions, cfg.scoring.output_dir, cfg.scoring.skip_existing_solutions) when skip_existing_solutions is True, and otherwise set _solutions_to_validate to the range of rows as currently done.
18-18: ⚠️ Potential issue | 🟠 Major — Remove module-wide mypy suppression.
Line 18 disables type checking for the whole module and can hide regressions in the scoring entrypoint. Use scoped ignores only where needed.
As per coding guidelines, "Use mypy for static type checking Python code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/scoring.py` at line 18, remove the module-wide `# mypy: ignore-errors` suppression and instead apply targeted ignores only where necessary: delete the line and run mypy on this module, then add `# type: ignore` or `# type: ignore[code]` comments on the specific functions/expressions that fail (e.g., the scoring entrypoint function(s) and any helper like score_batch or compute_similarity) so that only those lines are suppressed while the rest of the module remains type-checked.

modelopt/torch/puzzletron/sewing_kit/passage.py (3)
361-379: ⚠️ Potential issue | 🟠 Major — Don't bypass requested input captures.

This skip predicate still ignores `passage.inputs_to_capture`. When Line 418 returns a cached output, the capture block at Lines 441-446 never runs, so explicitly requested input captures are silently dropped.

Suggested fix

```diff
     (
         module_name in passage.outputs_cache
+        and not any(
+            is_submodule_or_same(k, module_name) for k in passage.inputs_to_capture
+        )
         and not any(
             is_submodule_or_same(k, module_name) for k in passage.outputs_to_capture
         )
         and not any(
             is_submodule_of(k, module_name)
```

Also applies to: 413-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 361 - 379, The skip predicate currently allows returning cached outputs even when inputs were explicitly requested to be captured; update the boolean condition used where module_name is checked against passage.outputs_cache to also verify that no requested input captures target this module (i.e. add a check that not any(is_submodule_or_same(k, module_name) for k in passage.inputs_to_capture)). Apply this same additional check in both places where the predicate appears (the block using passage.outputs_cache and the earlier similar predicate around lines 413-419) so that passages with inputs_to_capture are not bypassed.
289-294: ⚠️ Potential issue | 🟠 Major — Keep context-manager module names aligned with `forward()`.

Line 293 rebuilds `module_to_name_mapping` from `self.named_modules()`, which prefixes entries with `module.`. That no longer matches the names produced from `module.named_modules()` in `__init__`/`create()`, so `with passage:` can miss capture/override keys that work in the normal `forward()` path.

Suggested fix

```diff
     def __enter__(self):
         assert self.active_context_manager is None
         self.active_context_manager = Passage.active_passages_context(self)
         self.active_context_manager.__enter__()
-        self.module_to_name_mapping = {id(v): k for k, v in self.named_modules()}
+        self.module_to_name_mapping = {id(v): k for k, v in self.module.named_modules()}
         return self
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 289 - 294, The __enter__ method rebuilds module_to_name_mapping from self.named_modules(), which introduces a "module." prefix and breaks alignment with the names used in forward()/the mapping created in __init__/create(); fix by making __enter__ reuse the same naming logic as __init__/create__ (either reuse the existing self.module_to_name_mapping set during construction instead of rebuilding, or rebuild by applying the same normalization used in create/__init__ — e.g., strip the "module." prefix or use the same helper that produced names there) so module names in __enter__ match forward() and capture/override keys.
181-190: ⚠️ Potential issue | 🟠 Major — Make `outputs_cache` instance-local.

The `{}` default here is still shared across `Passage` instances, and Line 209 stores it by reference. A cached fake output from one passage can leak into the next and make `can_be_skipped()` reuse stale values.

Suggested fix

```diff
 def __init__(
     self,
     module: nn.Module,
     *,
-    inputs_to_capture: Iterable[str] = [],
-    outputs_to_capture: Iterable[str] = [],
-    input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
-    output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
-    outputs_cache: dict[str, Any] = {},
+    inputs_to_capture: Iterable[str] = (),
+    outputs_to_capture: Iterable[str] = (),
+    input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+    output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+    outputs_cache: Mapping[str, Any] | None = None,
     capture_fake_outputs_predicate: Predicate = always_false_predicate,
     capture_cache_outputs_predicate: Predicate = always_false_predicate,
     early_exit: bool = False,
     name: Optional[str] = None,
 ):
@@
-    self.input_overrides = input_overrides
-    self.output_overrides = output_overrides
-    self.outputs_cache = outputs_cache
+    self.input_overrides = input_overrides or {}
+    self.output_overrides = output_overrides or {}
+    self.outputs_cache = dict(outputs_cache or {})
@@
 def create(
     cls,
     module: nn.Module,
     *,
-    inputs_to_capture: Iterable[str] = [],
-    outputs_to_capture: Iterable[str] = [],
-    input_overrides: Mapping[str, PassageInputAdapter | InputArgs] = {},
-    output_overrides: Mapping[str, PassageOutputAdapter | Any] = {},
-    outputs_cache: dict[str, Any] = {},
+    inputs_to_capture: Iterable[str] = (),
+    outputs_to_capture: Iterable[str] = (),
+    input_overrides: Mapping[str, PassageInputAdapter | InputArgs] | None = None,
+    output_overrides: Mapping[str, PassageOutputAdapter | Any] | None = None,
+    outputs_cache: Mapping[str, Any] | None = None,
     capture_fake_outputs_predicate: Predicate = always_false_predicate,
     capture_cache_outputs_predicate: Predicate = always_false_predicate,
     early_exit: bool = False,
     name: Optional[str] = None,
 ) -> Passage:
```

Also applies to: 209-209, 250-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/sewing_kit/passage.py` around lines 181 - 190, The outputs_cache default argument in Passage.__init__ is a mutable {} shared across instances causing cached fake outputs to leak between Passage objects; change the signature to accept outputs_cache: Optional[dict[str, Any]] = None and in the constructor (where self.outputs_cache is set, referenced around the assignment that stores it) initialize self.outputs_cache = {} if outputs_cache is None, ensuring each Passage instance gets its own dict; apply the same pattern to any other mutable default arguments in the same class/methods referenced around lines 250-258 (replace {} defaults with None and initialize per-instance).

modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py (1)
237-250: ⚠️ Potential issue | 🟠 Major — Publish the safetensors index after shard writes succeed.

Line 241 publishes `model.safetensors.index.json` before any shard file has been committed. If `save_subblocks()` fails, the directory is left with an index that references missing shards. Save the shards first and write the index last.

Suggested change

```diff
-    # Write index (now without tied embedding)
-    index = {"metadata": {"format": "pt"}, "weight_map": weight_map}
-    index_path = checkpoint_dir / SAFE_WEIGHTS_INDEX_NAME
-    index_json = json_dumps(index)
-    _write_file_process_safe(index_json, index_path)
-
     # Phase 3: Save subblocks
     save_subblocks(
         state_dict,
         checkpoint_dir,
         weight_map=weight_map,
         multi_threaded=True,
         max_workers=max_workers,
     )
+
+    # Publish index only after all shard writes have succeeded.
+    index = {"metadata": {"format": "pt"}, "weight_map": weight_map}
+    index_path = checkpoint_dir / SAFE_WEIGHTS_INDEX_NAME
+    index_json = json_dumps(index)
+    _write_file_process_safe(index_json, index_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 237 - 250, The code writes the safetensors index (index variable, index_path using SAFE_WEIGHTS_INDEX_NAME via _write_file_process_safe) before saving shard files; move the index write to after save_subblocks completes successfully so the index only references committed shards. Specifically, call save_subblocks(state_dict, checkpoint_dir, weight_map=weight_map, multi_threaded=True, max_workers=max_workers) first and then build and atomically write the index JSON with _write_file_process_safe(index_json, index_path) (ensure any existing atomic/write-temp logic is retained) so failures in save_subblocks do not leave a dangling/missing-shard index.
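The "shards first, index last" ordering can be sketched generically. This is a simplified stand-in, not the project's implementation: `save_checkpoint_atomically` and the raw-bytes shards are hypothetical, and the tmp-file-plus-rename step assumes POSIX `os.replace` atomicity within one filesystem.

```python
# Hedged sketch of the write ordering the review asks for: commit every
# shard before publishing the index, and publish the index atomically.
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint_atomically(shards: dict[str, bytes], out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    weight_map: dict[str, str] = {}
    # Phase 1: write every shard first; a failure here leaves no index behind,
    # so readers never see an index pointing at missing files.
    for name, blob in shards.items():
        (out_dir / name).write_bytes(blob)
        weight_map[name] = name
    # Phase 2: publish the index last, via tmp file + atomic rename, so a
    # crash mid-write cannot leave a truncated index either.
    index = {"metadata": {"format": "pt"}, "weight_map": weight_map}
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(index, f)
    os.replace(tmp, out_dir / "model.safetensors.index.json")
```

The index acts as a commit record: its presence implies every shard it names exists, which is exactly the invariant the suggested change restores.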
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/replacement_library/build_replacement_library.py`:
- Around line 441-447: The code currently converts the deduplicated checkpoint
list to a set (deduped_checkpoint_dirs), which discards the sorted order;
instead, return an ordered sequence so downstream _build_subblocks_df (which
uses keep="first") sees the same checkpoint ordering. Replace the set(...)
wrapper with a list (or keep the .tolist() result) when creating
deduped_checkpoint_dirs from the DataFrame built from valid_checkpoint_dirs and
experiment_dirs, ensuring the DataFrame.sort_values(...).drop_duplicates(...)[
"checkpoint_dir"].tolist() is returned as an ordered list.
- Around line 83-99: Change build_replacement_library to accept an explicit
trust_remote_code: bool = False parameter and stop calling
descriptor.requires_trust_remote_code() to auto-enable remote code; instead call
requires_trust_remote_code() and if it returns True while the new
trust_remote_code parameter is False raise a clear error asking the caller to
opt in, then pass the explicit trust_remote_code parameter into
_build_subblocks_df, _build_block_library_from_subblocks,
_build_layer_replacements, and _build_single_sequence_replacement_solutions so
downstream functions use the caller-provided flag (references:
build_replacement_library, descriptor.requires_trust_remote_code,
_build_subblocks_df, _build_block_library_from_subblocks,
_build_layer_replacements, _build_single_sequence_replacement_solutions).
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 164-166: get_active() always returns self.activity_stack[-1] even
when this stack is used in reversed mode (where pushes/pops happen at index 0);
update get_active in the class to check the reversed flag (e.g., self.reversed
or whatever boolean the class uses for reversed mode) and return
self.activity_stack[0] when reversed is true, otherwise return
self.activity_stack[-1], keeping the existing is_active() guard so it still
returns Optional[T].
In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py`:
- Around line 307-340: The function currently builds safe_save_kwargs and may
set max_workers to 0 which causes ThreadPoolExecutor to raise when
safe_save_kwargs is empty; before creating the thread pool (and before
computing/using max_workers for multi_threaded saving), check if
safe_save_kwargs is empty and return early (or skip saving) to avoid launching
ThreadPoolExecutor with zero workers; update the logic around the multi_threaded
branch (references: safe_save_kwargs, max_workers, multi_threaded,
optimized_safe_save, ThreadPoolExecutor) so the empty-shard case is handled
gracefully and the function exits or proceeds without attempting to create the
executor.
- Around line 444-450: save_model_config currently mutates
model_config.block_configs in-place by converting dataclass entries to dicts;
instead, create a copy of the config (or at least a shallow copy of the
block_configs list), convert dataclass entries to dicts into that new list (use
dataclasses.is_dataclass and dataclasses.asdict when needed), assign the
converted list to the copied config, and call save_pretrained on the copy so the
live model_config remains unchanged.
---
Duplicate comments:
In `@modelopt/torch/puzzletron/scoring.py`:
- Around line 61-70: get_solutions_to_validate currently always calls
pd.read_json on cfg.scoring.solutions_path which fails when that path is a
directory of solution files; update the function to detect whether
cfg.scoring.solutions_path is a file or a directory, and if it is a directory,
glob for files matching a pattern like "*solution*.json", read each file into a
DataFrame/record list and concatenate them into
single_block_replacement_solutions before proceeding; keep the existing branch
that calls find_missing_solutions(...) (referencing
single_block_replacement_solutions, cfg.scoring.output_dir,
cfg.scoring.skip_existing_solutions) when skip_existing_solutions is True, and
otherwise set _solutions_to_validate to the range of rows as currently done.
- Line 18: Remove the module-wide "# mypy: ignore-errors" suppression and
instead apply targeted ignores only where necessary: delete the line and run
mypy on this module, then add "# type: ignore" or "# type: ignore[code]"
comments on the specific functions/expressions that fail (e.g., the scoring
entrypoint function(s) and any helper like score_batch or compute_similarity) so
that only those lines are suppressed while the rest of the module remains
type-checked.
In `@modelopt/torch/puzzletron/sewing_kit/passage.py`:
- Around line 361-379: The skip predicate currently allows returning cached
outputs even when inputs were explicitly requested to be captured; update the
boolean condition used where module_name is checked against
passage.outputs_cache to also verify that no requested input captures target
this module (i.e. add a check that not any(is_submodule_or_same(k, module_name)
for k in passage.inputs_to_capture)). Apply this same additional check in both
places where the predicate appears (the block using passage.outputs_cache and
the earlier similar predicate around lines 413-419) so that passages with
inputs_to_capture are not bypassed.
- Around line 289-294: The __enter__ method rebuilds module_to_name_mapping from
self.named_modules(), which introduces a "module." prefix and breaks alignment
with the names used in forward()/the mapping created in __init__/create(); fix
by making __enter__ reuse the same naming logic as __init__/create__ (either
reuse the existing self.module_to_name_mapping set during construction instead
of rebuilding, or rebuild by applying the same normalization used in
create/__init__ — e.g., strip the "module." prefix or use the same helper that
produced names there) so module names in __enter__ match forward() and
capture/override keys.
- Around line 181-190: The outputs_cache default argument in Passage.__init__ is
a mutable {} shared across instances causing cached fake outputs to leak between
Passage objects; change the signature to accept outputs_cache:
Optional[dict[str, Any]] = None and in the constructor (where self.outputs_cache
is set, referenced around the assignment that stores it) initialize
self.outputs_cache = {} if outputs_cache is None, ensuring each Passage instance
gets its own dict; apply the same pattern to any other mutable default arguments
in the same class/methods referenced around lines 250-258 (replace {} defaults
with None and initialize per-instance).
In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py`:
- Around line 237-250: The code writes the safetensors index (index variable,
index_path using SAFE_WEIGHTS_INDEX_NAME via _write_file_process_safe) before
saving shard files; move the index write to after save_subblocks completes
successfully so the index only references committed shards. Specifically, call
save_subblocks(state_dict, checkpoint_dir, weight_map=weight_map,
multi_threaded=True, max_workers=max_workers) first and then build and
atomically write the index JSON with _write_file_process_safe(index_json,
index_path) (ensure any existing atomic/write-temp logic is retained) so
failures in save_subblocks do not leave a dangling/missing-shard index.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: da42c004-2d60-493e-b24f-628967934064
📒 Files selected for processing (8)
- examples/puzzletron/evaluation/hf_deployable_anymodel.py
- modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
- modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
- modelopt/torch/puzzletron/scoring.py
- modelopt/torch/puzzletron/sewing_kit/passage.py
- modelopt/torch/puzzletron/sewing_kit/utils.py
- modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
- tests/gpu/torch/puzzletron/test_puzzletron.py
```python
    deduped_checkpoint_dirs = set(
        pd.DataFrame({"checkpoint_dir": valid_checkpoint_dirs, "experiment_dir": experiment_dirs})
        .sort_values("checkpoint_dir")
        .drop_duplicates(subset="experiment_dir", keep="last")["checkpoint_dir"]
        .tolist()
    )
    return deduped_checkpoint_dirs
```
Preserve checkpoint order when deduplicating experiments.
The DataFrame is sorted, then immediately converted to a set, which discards that order. Later _build_subblocks_df() resolves duplicate subblocks with keep="first", so identical student configs can end up pointing at different source checkpoints across runs. Return an ordered sequence here instead of a set.
Suggested change

```diff
-def _get_last_checkpoint_from_each_experiment(
-    master_puzzle_dir: Path | str, trust_remote_code: bool = False
-) -> set[Path]:
+def _get_last_checkpoint_from_each_experiment(
+    master_puzzle_dir: Path | str, trust_remote_code: bool = False
+) -> list[Path]:
@@
-    deduped_checkpoint_dirs = set(
-        pd.DataFrame({"checkpoint_dir": valid_checkpoint_dirs, "experiment_dir": experiment_dirs})
-        .sort_values("checkpoint_dir")
-        .drop_duplicates(subset="experiment_dir", keep="last")["checkpoint_dir"]
-        .tolist()
-    )
+    deduped_checkpoint_dirs = (
+        pd.DataFrame({"checkpoint_dir": valid_checkpoint_dirs, "experiment_dir": experiment_dirs})
+        .sort_values("checkpoint_dir")
+        .drop_duplicates(subset="experiment_dir", keep="last")["checkpoint_dir"]
+        .tolist()
+    )
     return deduped_checkpoint_dirs
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/replacement_library/build_replacement_library.py`
around lines 441 - 447, The code currently converts the deduplicated checkpoint
list to a set (deduped_checkpoint_dirs), which discards the sorted order;
instead, return an ordered sequence so downstream _build_subblocks_df (which
uses keep="first") sees the same checkpoint ordering. Replace the set(...)
wrapper with a list (or keep the .tolist() result) when creating
deduped_checkpoint_dirs from the DataFrame built from valid_checkpoint_dirs and
experiment_dirs, ensuring the DataFrame.sort_values(...).drop_duplicates(...)[
"checkpoint_dir"].tolist() is returned as an ordered list.
```python
    def get_active(self) -> Optional[T]:
        if self.is_active():
            return self.activity_stack[-1]
```
Return the head element when reversed=True.
get_active() still always reads self.activity_stack[-1], but reversed=True pushes/pops at index 0. Passage.active_passages_context in modelopt/torch/puzzletron/sewing_kit/passage.py Line 179 uses that mode, so nested passages make module_name_relative_to_active_passage() at Lines 242-247 resolve against the outer passage instead of the current one.
Suggested fix

```diff
     def get_active(self) -> Optional[T]:
         if self.is_active():
-            return self.activity_stack[-1]
+            return self.activity_stack[0 if self.reversed else -1]
         return None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 164 - 166,
get_active() always returns self.activity_stack[-1] even when this stack is used
in reversed mode (where pushes/pops happen at index 0); update get_active in the
class to check the reversed flag (e.g., self.reversed or whatever boolean the
class uses for reversed mode) and return self.activity_stack[0] when reversed is
true, otherwise return self.activity_stack[-1], keeping the existing is_active()
guard so it still returns Optional[T].
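The head-vs-tail mismatch is easier to see with a toy stack. A self-contained sketch (class and attribute names are illustrative stand-ins for the sewing-kit utility, not its actual code):

```python
# Toy reversed-mode stack: pushes land at index 0, so the active element
# is the head, not the tail. get_active() picks the end push() uses.
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class ActivityStack(Generic[T]):
    def __init__(self, reversed_mode: bool = False):
        self.reversed = reversed_mode
        self.activity_stack: list[T] = []

    def push(self, item: T) -> None:
        if self.reversed:
            self.activity_stack.insert(0, item)  # newest at the head
        else:
            self.activity_stack.append(item)  # newest at the tail

    def is_active(self) -> bool:
        return bool(self.activity_stack)

    def get_active(self) -> Optional[T]:
        if self.is_active():
            # The suggested fix: read from the same end push() writes to.
            return self.activity_stack[0 if self.reversed else -1]
        return None


s = ActivityStack[str](reversed_mode=True)
s.push("outer")
s.push("inner")
print(s.activity_stack)  # ['inner', 'outer'] - head holds the newest
print(s.get_active())  # inner
```

With the unpatched `[-1]` read, the reversed stack above would return `"outer"`, which is exactly the nested-passage misresolution the comment describes.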
```python
    safe_save_kwargs = [
        {"tensors": partial_state_dict, "filename": filename, "metadata": {"format": "pt"}}
        for filename, partial_state_dict in filename_to_partial_state_dict.items()
    ]

    # Auto-calculate optimal I/O workers: min(cpu_count, num_files)
    if max_workers is None:
        cpu_count = os.cpu_count() or 1
        num_files = len(safe_save_kwargs)
        max_workers = min(cpu_count, num_files)
        mprint(
            f"  Auto-calculated I/O workers: min({cpu_count} CPUs, {num_files} files) = {max_workers}"
        )
    else:
        mprint(f"  Using specified I/O workers: {max_workers}")

    prepare_time = time.time() - prepare_start_time
    mprint(f"  Step 4 - Prepare save args: {prepare_time:.2f}s ({len(safe_save_kwargs)} files)")

    # Step 5: Save files with optimal worker count
    save_start_time = time.time()
    if multi_threaded:
        mprint(f"  Using multi-threaded saving with {max_workers} workers...")

        def optimized_safe_save(kwargs):
            try:
                safe_save_file(**kwargs)
                return True
            except Exception as e:
                mprint(f"  Error saving {kwargs['filename']}: {e}")
                return False

        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            results = list(executor.map(optimized_safe_save, safe_save_kwargs))
```
Handle the empty-shard case before building the thread pool.
If safe_save_kwargs is empty, Line 316 computes max_workers = 0, and Line 339 then raises from ThreadPoolExecutor. This helper can legitimately see an empty state_dict/weight_map, so it should return early instead of crashing.
Suggested change

```diff
     safe_save_kwargs = [
         {"tensors": partial_state_dict, "filename": filename, "metadata": {"format": "pt"}}
         for filename, partial_state_dict in filename_to_partial_state_dict.items()
     ]
+
+    if not safe_save_kwargs:
+        mprint("  No shard files to save")
+        return
     # Auto-calculate optimal I/O workers: min(cpu_count, num_files)
     if max_workers is None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 307 -
340, The function currently builds safe_save_kwargs and may set max_workers to 0
which causes ThreadPoolExecutor to raise when safe_save_kwargs is empty; before
creating the thread pool (and before computing/using max_workers for
multi_threaded saving), check if safe_save_kwargs is empty and return early (or
skip saving) to avoid launching ThreadPoolExecutor with zero workers; update the
logic around the multi_threaded branch (references: safe_save_kwargs,
max_workers, multi_threaded, optimized_safe_save, ThreadPoolExecutor) so the
empty-shard case is handled gracefully and the function exits or proceeds
without attempting to create the executor.
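The failure mode is easy to verify in isolation: `ThreadPoolExecutor` rejects `max_workers=0` with a `ValueError`, so the guard must run before the pool is built. A minimal sketch with a hypothetical `save_all` helper (the no-op worker stands in for the real save callable):

```python
# Demonstrates why the empty-shard case must be handled before computing
# min(cpu_count, num_files): an empty list yields max_workers=0, which
# ThreadPoolExecutor refuses to construct.
import concurrent.futures
import os


def save_all(work_items: list[dict]) -> list[bool]:
    if not work_items:  # guard the empty case up front
        return []
    max_workers = min(os.cpu_count() or 1, len(work_items))
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        # Worker is a stand-in; the real code would call safe_save_file(**kw).
        return list(ex.map(lambda kw: True, work_items))


print(save_all([]))  # [] - no crash
print(save_all([{"filename": "a"}]))  # [True]

try:
    concurrent.futures.ThreadPoolExecutor(max_workers=0)
except ValueError as e:
    print(f"ValueError: {e}")
```

Returning early also skips the pointless log lines and timing steps for a shard-less save, which keeps the output readable.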
```python
def save_model_config(model_config: PretrainedConfig, checkpoint_dir: Path | str) -> None:
    if hasattr(model_config, "block_configs"):
        model_config.block_configs = [
            dataclasses.asdict(conf) if dataclasses.is_dataclass(conf) else conf
            for conf in model_config.block_configs
        ]
    model_config.save_pretrained(checkpoint_dir)
```
Avoid mutating model.config during save.
save_model_config() rewrites model_config.block_configs in place from dataclasses to dicts. After save_checkpoint(model, ...), the live model.config no longer carries BlockConfig-like objects, which can break any later in-process use of the model. Serialize a copy instead.
Suggested change

```diff
+import copy
+
 def save_model_config(model_config: PretrainedConfig, checkpoint_dir: Path | str) -> None:
-    if hasattr(model_config, "block_configs"):
-        model_config.block_configs = [
+    config_to_save = copy.deepcopy(model_config)
+    if hasattr(config_to_save, "block_configs"):
+        config_to_save.block_configs = [
             dataclasses.asdict(conf) if dataclasses.is_dataclass(conf) else conf
-            for conf in model_config.block_configs
+            for conf in config_to_save.block_configs
         ]
-    model_config.save_pretrained(checkpoint_dir)
+    config_to_save.save_pretrained(checkpoint_dir)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 444 -
450, save_model_config currently mutates model_config.block_configs in-place by
converting dataclass entries to dicts; instead, create a copy of the config (or
at least a shallow copy of the block_configs list), convert dataclass entries to
dicts into that new list (use dataclasses.is_dataclass and dataclasses.asdict
when needed), assign the converted list to the copied config, and call
save_pretrained on the copy so the live model_config remains unchanged.
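The copy-before-serialize pattern can be shown without Hugging Face at all. A toy sketch where `TinyConfig` and `BlockConfig` are hypothetical stand-ins for `PretrainedConfig` and the project's block-config dataclasses:

```python
# Serialize a deep copy so converting dataclasses to dicts for saving
# never mutates the live in-memory config.
import copy
import dataclasses


@dataclasses.dataclass
class BlockConfig:
    hidden: int


class TinyConfig:
    def __init__(self, block_configs):
        self.block_configs = block_configs


def to_serializable(config: TinyConfig) -> TinyConfig:
    config_to_save = copy.deepcopy(config)
    config_to_save.block_configs = [
        dataclasses.asdict(c) if dataclasses.is_dataclass(c) else c
        for c in config_to_save.block_configs
    ]
    return config_to_save


live = TinyConfig([BlockConfig(hidden=64)])
saved = to_serializable(live)
print(saved.block_configs)  # [{'hidden': 64}]
print(type(live.block_configs[0]).__name__)  # BlockConfig - untouched
```

A deep copy is the safe default here; a shallow copy of just the `block_configs` list would also work for this case, but only because the conversion builds a new list rather than mutating the dataclass instances themselves.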
## Summary

Pruned FFN checkpoints created by `init_child_from_parent` were missing custom `.py` files (e.g. `modeling_nemotron_h.py`, `configuration_nemotron_h.py`). Without them, `AutoConfig.from_pretrained(..., trust_remote_code=True)` fails silently and the checkpoint is excluded from the replacement library, reducing MIP candidate diversity.

Adds `copy_remote_code_files(source_dir, output_dir)` to `checkpoint_utils_hf.py` that copies `*.py` files from the source checkpoint root. Called from `init_child_from_parent` after `copy_tokenizer` when `descriptor.requires_trust_remote_code()` is true.

## Summary by CodeRabbit

- **New Features**
  - Added a new utility to automatically copy Python model files from checkpoints during training initialization.
- **Tests**
  - Added unit tests covering selective Python file copying, preservation of existing files, and proper handling of missing source directories.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…#1220)

## Summary

- Unified `examples/puzzletron/mbridge_distillation/distill_hf.py` (AnyModel-specific) into `examples/megatron_bridge/distill.py` (general)
  - The single script now handles both standard HF and Puzzletron AnyModel checkpoints.
  - Added `--hf_export_path` / `--student_hf_model` args for inline HF export after distillation.
- Merged AnyModel integration test into `tests/examples/megatron_bridge/test_distill.py`
  - Test models use `vocab_size=128` (instead of the default 102) for TP divisibility, including 8.
- Moved MMLU distillation results into `megatron_bridge/README.md`
  - The puzzletron README now redirects to the consolidated docs.

Limitation discovered during consolidation: HF export via `--hf_export_path` currently does not appear to work for Puzzletron AnyModel (heterogeneous) checkpoints. Megatron-Bridge's `export_ckpt` cannot reload heterogeneous model configs from saved checkpoints (`heterogeneous_layers_config_encoded_json` is `None` during `__post_init__` in `heterogeneous_config.py`). This affects both inline `--hf_export_path` and the separate `convert_checkpoints.py export` script. The original `distill_hf.py` README documented this as supported, but it may have been broken there too (on the Megatron-Bridge side). The consolidated README now documents this as a known limitation. HF export for standard models works fine via both methods.

## Release Notes

- **New Features**
  - Added support for Puzzletron AnyModel checkpoints in the distillation pipeline.
  - Introduced inline HuggingFace export capability during distillation.
- **Documentation**
  - Updated distillation guide with clearer conversion workflows and optional HuggingFace export instructions.
  - Added distillation benchmarks and performance recommendations.
- **Bug Fixes & Improvements**
  - Streamlined test infrastructure and workflow configuration.

---------

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: root <root@pool0-00848.cm.cluster>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/megatron_bridge/distill.py`:
- Around line 56-57: The current use of contextlib.suppress(ImportError) around
the import modelopt.torch.puzzletron.plugins.mbridge masks real import errors
inside that module; replace this with the repository's lazy plugin loader by
calling import_plugin("modelopt.torch.puzzletron.plugins.mbridge") (or
equivalent import_plugin API) so that only the "module not installed" case is
handled by the plugin system and real exceptions during module initialization
surface; if import_plugin returns a boolean or raises a specific
NotInstalled/PluginNotFound exception, catch only that specific condition
instead of broadly suppressing ImportError, and remove
contextlib.suppress(ImportError) and the direct import to ensure proper
registration and visible failures.
- Around line 342-345: The config copy is using the wrong source; change the
AutoConfig.from_pretrained call (the block that calls
AutoConfig.from_pretrained(...).save_pretrained(args.hf_export_path)) to load
from args.student_hf_model instead of args.student_hf_path so the exported
config matches the templated export; keep trust_remote_code and the
save_pretrained(args.hf_export_path) call unchanged.
In `@examples/megatron_bridge/README.md`:
- Around line 161-176: Add a clear caveat under "Converting to Hugging Face
format (optional)" noting that the inline export flags --hf_export_path and
--student_hf_model in distill.py do not work for heterogeneous Puzzletron
"AnyModel" checkpoints; instruct users to use the "Separate conversion" workflow
(Megatron-Bridge conversion script) for AnyModel/Puzzletron outputs or to run
offline conversion after distillation instead of relying on the inline
--hf_export_path path.
In `@examples/puzzletron/README.md`:
- Around line 288-296: The README currently includes --trust-remote-code
unconditionally in the example vllm commands (vllm bench latency and vllm bench
throughput); change the examples to omit --trust-remote-code by default, add a
short conditional note immediately after each command explaining that
--trust-remote-code should only be added when the model requires custom Python
execution and only for trusted checkpoints, and show an optional example
invocation that includes --trust-remote-code for trusted models only.
- Around line 40-42: Update the bare-metal install example in README.md to quote
the extras spec so the shell doesn't expand it: replace the unquoted pip install
-e .[hf,puzzletron] command with a quoted extras form (e.g., pip install -e
".[hf,puzzletron]" or pip install -e '.[hf,puzzletron]') used in the
examples/puzzletron README snippet to match the NeMo-container safe form and
prevent zsh globbing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: d90f3313-257d-4f77-9a3e-d5740008a777
📒 Files selected for processing (7)
- .github/workflows/example_tests.yml
- examples/megatron_bridge/README.md
- examples/megatron_bridge/distill.py
- examples/megatron_bridge/results/puzzletron.md
- examples/puzzletron/README.md
- tests/_test_utils/torch/puzzletron/utils.py
- tests/examples/megatron_bridge/test_distill.py
✅ Files skipped from review due to trivial changes (1)
- examples/megatron_bridge/results/puzzletron.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/example_tests.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
♻️ Duplicate comments (1)
examples/megatron_bridge/distill.py (1)
342-345: ⚠️ Potential issue | 🟠 Major

Use `student_hf_model` instead of `student_hf_path` for config export.

The export bridge is created from `args.student_hf_model` (lines 332-333), but the config is copied from `args.student_hf_path` (lines 343-344). For Puzzletron/AnyModel runs, `student_hf_path` may point to a heterogeneous source checkpoint, causing a mismatched `config.json` in the exported HF checkpoint.

Suggested fix:

```diff
  # Copy config.json from student_hf_path (handles both local paths and HF model IDs)
  AutoConfig.from_pretrained(
-     args.student_hf_path, trust_remote_code=args.trust_remote_code
+     args.student_hf_model, trust_remote_code=args.trust_remote_code
  ).save_pretrained(args.hf_export_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/megatron_bridge/distill.py` around lines 342 - 345, The config export is using args.student_hf_path but the export bridge and model selection use args.student_hf_model, which can cause mismatched config.json for heterogeneous checkpoints; update the AutoConfig.from_pretrained call (the one invoking AutoConfig.from_pretrained(...).save_pretrained(args.hf_export_path)) to load from args.student_hf_model instead of args.student_hf_path so the exported HF checkpoint uses the same model identifier used to create the export bridge.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/megatron_bridge/distill.py`:
- Around line 342-345: The config export is using args.student_hf_path but the
export bridge and model selection use args.student_hf_model, which can cause
mismatched config.json for heterogeneous checkpoints; update the
AutoConfig.from_pretrained call (the one invoking
AutoConfig.from_pretrained(...).save_pretrained(args.hf_export_path)) to load
from args.student_hf_model instead of args.student_hf_path so the exported HF
checkpoint uses the same model identifier used to create the export bridge.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1f6aecf3-5fc1-4fb8-84b8-8eb1eb22a9fa
📒 Files selected for processing (5)
- .github/CODEOWNERS
- examples/llm_eval/lm_eval_hf.py
- examples/megatron_bridge/README.md
- examples/megatron_bridge/distill.py
- examples/puzzletron/README.md
✅ Files skipped from review due to trivial changes (2)
- .github/CODEOWNERS
- examples/megatron_bridge/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/llm_eval/lm_eval_hf.py
cjluo-nv
left a comment
This is a massive 24K+ line PR adding the Puzzletron heterogeneous pruning algorithm. The code quality is generally solid with good documentation, comprehensive test coverage (9 model configs tested e2e), and a clean plugin architecture. However, at ~24K lines, this is extremely large and there are several issues worth addressing.
Key Concerns:
1. **PR size (24K+ lines):** This is ~24x the recommended limit. While the PR description shows it was already reviewed in ~18 smaller PRs on the feature branch, the final merge into main is still very large. The config YAML files alone represent massive duplication.
2. **Ruff lint suppression:** The entire `modelopt/torch/puzzletron/*` directory has all major lint categories disabled in `pyproject.toml`. This is a significant code-quality concession that should have a timeline for remediation.
3. **Potential bug in `_patched_decoder_layer_init`:** The patcher uses `decoder_layer_classes.index(self.__class__)` to find the right original `__init__`, but if a subclass appears that isn't in the list, this will raise `ValueError` instead of a descriptive error.
4. **`copy_remote_code` → `copy_hf_ckpt_remote_code` rename is a breaking change:** The function was renamed and its implementation changed from `hf_hub_download` to `snapshot_download`. The old test file was deleted (`test_hf_checkpoint_utils.py` from `tests/gpu_megatron/`) and a new one added in `tests/unit/`. Any external code using `copy_remote_code` will break.
5. **`snapshot_download` behavior change:** The new `copy_hf_ckpt_remote_code` uses `snapshot_download` with `allow_patterns=["*.py"]`, which downloads all `.py` files including those in subdirectories, whereas the old implementation only copied top-level `.py` files. This could pull unexpected files.
6. **Massive config duplication:** The YAML config files across model directories (llama, qwen2, qwen3, mistral, nemotron, etc.) are nearly identical, with only model-specific fields differing. These should use shared base configs more aggressively.
7. **`sewing_kit` module is substantial (~1800 lines)** with its own distributed communication layer. It is unclear whether it duplicates functionality already available in ModelOpt's distributed utils.
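Concern 5 above comes down to glob semantics: `huggingface_hub` matches `allow_patterns` with `fnmatch`-style rules, where `*` also crosses `/`, so `"*.py"` matches nested files as well. A minimal sketch with plain `fnmatch` (file names are illustrative, not the actual checkpoint contents):

```python
from fnmatch import fnmatch

# Illustrative repo file listing (hypothetical names).
repo_files = ["modeling.py", "config.json", "utils/helpers.py"]

# fnmatch's "*" also matches "/", so "*.py" picks up nested .py files too.
matched = [f for f in repo_files if fnmatch(f, "*.py")]

# Restricting to top-level .py files (the old behavior) needs an extra filter.
top_level = [f for f in matched if "/" not in f]
```

Here `matched` includes `utils/helpers.py` while `top_level` does not, which is the behavioral gap the reviewer is pointing at.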
```python
if _block_configs is None:
    return orig_inits[decoder_layer_classes.index(self.__class__)](
        self, config, *args, **kwargs
    )
```
Potential bug: `decoder_layer_classes.index(self.__class__)` will raise `ValueError` if `self.__class__` is a dynamically generated subclass that isn't in the list. Consider using a dict lookup or adding a try/except with a descriptive error message:

```python
try:
    idx = decoder_layer_classes.index(self.__class__)
except ValueError:
    raise TypeError(
        f"Decoder layer class {self.__class__} not in patched classes: {decoder_layer_classes}"
    )
```
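The dict-lookup alternative could look like the following sketch (class names, the `orig_inits` mapping, and `dispatch_init` are hypothetical stand-ins, not the actual patcher code):

```python
class LayerA:
    pass

class LayerB:
    pass

# Map each patched class directly to its original __init__ (stand-ins here).
orig_inits = {
    LayerA: lambda self: "init_a",
    LayerB: lambda self: "init_b",
}

def dispatch_init(self):
    # O(1) lookup; an unknown subclass gets a descriptive TypeError
    # instead of a bare ValueError from list.index().
    init = orig_inits.get(type(self))
    if init is None:
        raise TypeError(
            f"Decoder layer class {type(self)} not in patched classes: {list(orig_inits)}"
        )
    return init(self)
```

A dict keyed by class also avoids the linear scan that `list.index` performs on every layer construction.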
not important to address
```python
attn_keys = [key for key in attn_keys if key in new_state_dict.keys()]
if len(attn_keys) > 0 and all(key in keys for key in attn_keys):
    for key in attn_keys:
        keys_to_remove[key] = keys[key]
```
Mutated `keys_to_remove` parameter: `keys_to_remove` is passed as a dict parameter that gets mutated in place. While this works because callers always pass their own dict, the pattern is error-prone. Consider returning it as part of the function's return value instead of relying on mutation.
not important to address
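A return-value variant of that pattern might look like this sketch (the function name and signature are illustrative, not the actual code):

```python
def collect_keys_to_remove(attn_keys, keys, new_state_dict):
    """Return the removed-key mapping instead of mutating a caller's dict."""
    attn_keys = [key for key in attn_keys if key in new_state_dict]
    keys_to_remove = {}
    if attn_keys and all(key in keys for key in attn_keys):
        for key in attn_keys:
            keys_to_remove[key] = keys[key]
    return keys_to_remove
```

The caller then merges the result explicitly (`keys_to_remove.update(collect_keys_to_remove(...))`), which keeps the data flow visible at the call site.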
cjluo-nv
left a comment
Approved to unblock, based on the feedback below. None of these are blockers:
1 - More extensive review was done in the feature-branch PRs, so it is fine to add 24K LOC to main.
2 - We will continue to improve code quality (removing the suppressed mypy/lint exceptions added for now).
3 - Not a blocker.
4, 5 - Not important.
6 - Follow-up to unify the YAMLs.
7 - Fine.
## Summary
- lm-eval 0.4.10 has API changes that break `lm_eval_hf.py`, which monkey-patches `HFLM.create_from_arg_obj`
- Pin `examples/puzzletron/requirements.txt` from `lm-eval==0.4.10` to `lm-eval==0.4.8` to match `examples/llm_eval/requirements.txt`
- Add a version-compatibility warning to `lm_eval_hf.py` for early detection if versions drift again

## Summary by CodeRabbit
* **Chores**
  * Updated the lm-eval dependency version to 0.4.8 in the Puzzletron example
  * Added compatibility version checking to the LM Eval example with appropriate notifications

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
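The version-compatibility warning described in that PR might be as simple as the following sketch (the actual check in `lm_eval_hf.py` may differ; the function name here is hypothetical):

```python
import warnings

# Version pinned in examples/puzzletron/requirements.txt per the PR above.
TESTED_LM_EVAL_VERSION = "0.4.8"

def warn_on_version_drift(installed_version: str) -> bool:
    """Warn (and return True) if the installed lm-eval differs from the tested pin."""
    if installed_version != TESTED_LM_EVAL_VERSION:
        warnings.warn(
            f"lm-eval=={installed_version} detected; this script was tested with "
            f"lm-eval=={TESTED_LM_EVAL_VERSION} and may break (e.g. the "
            "HFLM.create_from_arg_obj monkey-patch)."
        )
        return True
    return False
```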
…tical params (#1258)

### Summary
- On hybrid models (e.g. Nemotron-H), `calculate_subblock_params` builds a 1-layer model to count per-layer params by setting `num_hidden_layers=1`
- It left `hybrid_override_pattern` at full length, so the 1-layer model always built layer 0 (`pattern[0]` = Mamba); every FFN variant reported the same Mamba param count regardless of `intermediate_size`
- This made MIP unable to differentiate FFN sizes

### Fix
- Truncate `hybrid_override_pattern` to the single character matching the subblock being measured before instantiating the 1-layer model
- Per iteration/layer, deep copy the model config and create a per-layer model config with the fixed pattern
- Activates only when `hybrid_override_pattern` is present; non-hybrid models (Llama, Qwen, etc.) are unaffected

## Summary by CodeRabbit
* **Bug Fixes**
  * Enhanced accuracy of parameter counting for hybrid FFN model configurations.
* **Tests**
  * Added unit tests for hybrid pattern truncation functionality.
  * Added GPU validation tests for Nemotron-H model parameter calculations.

Signed-off-by: jrausch <jrausch@nvidia.com>
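The truncation fix in that PR can be sketched as follows (a plain dict stands in for the real HF config object, and the helper name is illustrative; field names follow the description above):

```python
import copy

def one_layer_config(model_config: dict, layer_idx: int) -> dict:
    """Build a 1-layer config for measuring one subblock's parameter count."""
    cfg = copy.deepcopy(model_config)  # never mutate the caller's config
    cfg["num_hidden_layers"] = 1
    pattern = cfg.get("hybrid_override_pattern")
    if pattern is not None:
        # Truncate to the single character for the measured layer so the
        # 1-layer model instantiates the right layer type, not always layer 0.
        cfg["hybrid_override_pattern"] = pattern[layer_idx]
    return cfg
```

Non-hybrid configs pass through unchanged except for `num_hidden_layers`, matching the "activates only when `hybrid_override_pattern` is present" behavior.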
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
Implement the puzzletron compression algorithm based on the Puzzle paper (https://arxiv.org/abs/2411.19146).
The list of reviewed and merged MRs that resulted in the feature/puzzletron branch:
Merging dkorzekwa/any_model to feature/puzzletron
Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer - merged
Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer - merged
Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer - merged
Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer - merged
Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/any model other modeqls by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer - merged
PR to 1007: #1039 - merged
Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer - merged
Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer - merged
Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer - merged
MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer - merged
code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer - merged
Merging into main:
Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer - merged
Usage
Puzzletron tutorial:
https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron
Testing
The main e2e test for compressing 9 models with Puzzletron:
https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py
2-gpu nightly tests:
Before your PR is "Ready for review"
CONTRIBUTING.md: ✅

Summary by CodeRabbit
New Features
Documentation
Chores