Skip to content

Latest commit

 

History

History
14 lines (10 loc) · 1.13 KB

File metadata and controls

14 lines (10 loc) · 1.13 KB

GptOss

With this release Puzzle algorithm supports only experts removal for Gpt-Oss.

This model comes as a quantized checkpoint i.e. MoE experts matrices are quantized with MXFP4 format. In the pruning steps puzzle utilizes decompressed model (back to BF16) for statistics and scores computation. This means, during the conversion to puzzle format we decompress the model and store it as a BF16. Once the pruning is done i.e. experts to be removed are identified and the process is finished, user may want to get back the MXFP4 format of the checkpoint. To do so, there is an additional script, that takes the original and the pruned checkpoint and outputs pruned checkpoint in MXFP4 format.

python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/  --num-layers 24