Commit c11b8bf
Fix PP inference correctness in megatron_generate and megatron_importer
Two bugs caused pipeline-parallel inference to produce garbage output:
1. megatron_generate/megatron_prefill used get_forward_backward_func() (the
training pipeline scheduler), which is not designed for inference. Rewrote
both functions to use explicit P2P communication via
recv_from_prev_pipeline_rank_ / send_to_next_pipeline_rank, matching the
pattern from run_mcore_inference.
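The P2P pattern in fix 1 can be illustrated with a toy, single-process simulation (everything here is a hypothetical stand-in: the `run_pipeline` helper, the `channel` variable, and the `+ 1` "forward pass" are illustrative only; the real code uses Megatron-Core's `recv_from_prev_pipeline_rank_` / `send_to_next_pipeline_rank` over actual P2P links):

```python
# Toy simulation of explicit P2P pipeline inference: each stage receives
# activations from the previous rank, runs its share of the forward pass,
# and sends the result to the next rank. The first stage embeds the input;
# the last stage returns the output instead of sending.

def run_pipeline(tokens, num_stages=4):
    """Run `tokens` through `num_stages` sequential pipeline stages."""
    channel = None  # stands in for the P2P link between adjacent ranks

    for rank in range(num_stages):
        if rank == 0:
            hidden = list(tokens)          # first stage: take the input
        else:
            hidden = channel               # cf. recv_from_prev_pipeline_rank_

        hidden = [h + 1 for h in hidden]   # this stage's transformer layers

        if rank < num_stages - 1:
            channel = hidden               # cf. send_to_next_pipeline_rank
        else:
            return hidden                  # last stage: produce the output

print(run_pipeline([0, 10, 20]))  # each of 4 stages adds 1 -> [4, 14, 24]
```

The point of the rewrite is that inference needs exactly this one-directional relay per forward step, rather than the interleaved send/recv schedule that `get_forward_backward_func()` builds for training.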
2. import_mcore_gpt_from_hf loads HF weights into stage 0's embedding but
never updates the output_layer on the last PP stage when
share_embeddings_and_output_weights=True. After import, call
model.setup_embeddings_and_output_layer() to re-run the all-reduce that
syncs the output layer from stage 0 to the last stage.
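The sync in fix 2 can be sketched the same way (a plain-Python stand-in: the `Stage` class, its `weight` field, and the example values are hypothetical; only the function name mirrors Megatron-Core's `setup_embeddings_and_output_layer`, and the real implementation all-reduces over the embedding process group rather than a Python list):

```python
# Toy model of share_embeddings_and_output_weights across PP stages:
# after HF import, only stage 0 holds the real embedding weight, while the
# last stage's tied output_layer is still zero. An all-reduce (sum) makes
# every tied copy equal to stage 0's weight, since the others contribute 0.

class Stage:
    def __init__(self, rank, num_stages):
        self.is_first = rank == 0
        self.is_last = rank == num_stages - 1
        # Stage 0 gets the imported HF embedding; the last stage's tied
        # output_layer is zero-initialized until the sync runs.
        self.weight = [1.5, -2.0, 0.25] if self.is_first else [0.0, 0.0, 0.0]

def setup_embeddings_and_output_layer(stages):
    """All-reduce (sum) the shared weight so tied copies match stage 0."""
    summed = [sum(vals) for vals in zip(*(s.weight for s in stages))]
    for s in stages:
        s.weight = list(summed)

stages = [Stage(r, 4) for r in range(4)]
setup_embeddings_and_output_layer(stages)
assert stages[-1].weight == stages[0].weight == [1.5, -2.0, 0.25]
```

Skipping this step leaves the last stage computing logits against a zero (or stale) output projection, which is exactly the garbage-output symptom described above.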
Also parametrize the megatron_generate test to cover both TP and PP.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Parent: 303e429
File tree (5 files changed, +211 −176 lines):
- modelopt/torch
  - export/plugins
  - utils/plugins
- tests/gpu_megatron/torch/utils/plugins