Model-Optimizer/examples/puzzletron/GPTOSS.md at ed5fd6824c4f7f3322b2b48fb0f64ef08c8aef9c · NVIDIA/Model-Optimizer

GptOss

In this release, the Puzzle algorithm supports only expert removal for Gpt-Oss.

This model ships as a quantized checkpoint: the MoE expert matrices are stored in MXFP4 format. The pruning steps run on a decompressed (BF16) copy of the model to compute statistics and scores, so the conversion to Puzzle format decompresses the model and stores it in BF16. Once pruning is done, i.e. the experts to remove have been identified and the process has finished, you may want the checkpoint back in MXFP4 format. An additional script does this: it takes the original checkpoint and the pruned checkpoint and outputs the pruned checkpoint in MXFP4 format.
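To make the "decompress" step concrete, here is a minimal sketch of how a single MXFP4 block maps back to ordinary floating-point values. It assumes the standard microscaling layout (32 FP4 E2M1 elements sharing one power-of-two E8M0 scale). The function name `dequantize_block` is hypothetical, and the sketch ignores how the gpt-oss checkpoint packs codes into bytes or names its tensors; it is not the code Puzzle runs.

```python
# Toy illustration of MXFP4 dequantization (not Puzzle's implementation).
# Assumption: standard MX layout, 32 E2M1 elements per shared E8M0 scale.

# FP4 E2M1 magnitudes indexed by the low 3 bits of a code; bit 3 is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32
E8M0_BIAS = 127


def dequantize_block(codes, scale_exponent):
    """Convert one block of 4-bit codes to Python floats.

    codes: list of 32 ints in [0, 15].
    scale_exponent: biased E8M0 exponent in [0, 254] (255 encodes NaN).
    """
    if len(codes) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} codes, got {len(codes)}")
    if not 0 <= scale_exponent <= 254:
        raise ValueError("scale_exponent must be in [0, 254]")
    scale = 2.0 ** (scale_exponent - E8M0_BIAS)
    values = []
    for code in codes:
        sign = -1.0 if code & 0x8 else 1.0
        values.append(sign * E2M1_MAGNITUDES[code & 0x7] * scale)
    return values
```

Every value representable in a block is one of sixteen codes times a shared scale, so it converts to BF16 exactly, provided the scaled value falls within BF16's range.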

python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 \
    --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ \
    --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ \
    --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ \
    --num-layers 24
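If you run the conversion from a Python driver script, a small wrapper can assemble the same invocation. The sketch below is a hypothetical helper (`build_mxfp4_command` is not part of Model-Optimizer). It reuses only the module path and the four flags shown in the command above, and the paths in the usage section are placeholders.

```python
import shlex
import sys

# Module path and flag names are copied from the command above.
CONVERTER_MODULE = (
    "modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4"
)


def build_mxfp4_command(student_path, original_path, output_path, num_layers):
    """Return the argv list for the MXFP4 conversion script (hypothetical helper)."""
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    return [
        sys.executable, "-m", CONVERTER_MODULE,
        "--student-path", str(student_path),
        "--original-path", str(original_path),
        "--output-path", str(output_path),
        "--num-layers", str(num_layers),
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute your own checkpoint locations.
    cmd = build_mxfp4_command(
        "/path/to/pruned_solution/",
        "/path/to/original_gpt_oss/",
        "/path/to/mxfp4_output/",
        num_layers=24,  # value used for gpt-oss-20b in the command above
    )
    # Print the command for inspection; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the arguments as a list avoids shell quoting problems when a path contains spaces.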
