feat: adding PP and CP for nemotron v3 models #2316
Merged
25 commits
db3b407 (adil-a) feat(nemotron_v3): pipeline parallel + MTP support, plus THD collator…
cf8f3de (adil-a) chore(datasets): drop pre_rendered_chat_dataset.py from branch
285ea0c (adil-a) fix(thd_utils): compute max_seqlen from final cu_seqlens to honor TE …
54915d2 (adil-a) chore(thd_utils): drop dead cu_seqlens_padded fallback in emit
96cc3a1 (adil-a) docs(thd_utils): update docstrings to match post-fix cu_seqlens seman…
8946f90 (adil-a) chore(thd_utils): trim verbose inline comments
aa9238e (adil-a) comments
ff34ee0 (adil-a) fix(seq_idx): use searchsorted(right=True) to classify boundary tokens
239fe64 (adil-a) chore: remove debug env-var hooks from THD packing investigation
a39736f (adil-a) test(mtp): pin MTPModule cumulative left-rolling of input_ids and pos…
868ba6d (adil-a) chore(mtp): trim verbose comments in calculate_mtp_loss
eaa1aaa (adil-a) docs(nemotron_v3): correct stale 'PP cannot chunk' comments in model.…
64ec935 (adil-a) chore(nemotron_v3): trim verbose comments in model.py
bb238e1 (adil-a) fix(nemotron_v3): seq_idx tail builder uses searchsorted(right=True)
61e13a9 (adil-a) docs(nemotron_v3): restore Args/Returns docstring on NemotronHForCaus…
0da9e5e (adil-a) docs(nemotron_v3): fix inaccuracies in NemotronHForCausalLM.forward d…
c05ea54 (adil-a) chore(moe): trim verbose comment in apply_cp loop
aa4b4e4 (adil-a) fix(nemotron_v3): correct seq_idx and MTP loss shape for non-pp THD p…
c9098b8 (adil-a) feat(moe): avoid load-time OOM via in-place views in MoE to_hf split
6ac2135 (adil-a) fix(nemotron_v3): correct mamba seq_idx + MTP masking for mbs>1 THD p…
b8dcb08 (adil-a) fix(moe): backward-safe, always-on shared-expert overlap
f1d06d4 (adil-a) refactor(train_ft): simplify THD/MTP gating and MTP cu_seqlens plumbing
cb7e9ab (adil-a) fix(nemotron_v3): gate cu_seqlens-from-mask on B==1 + fix stale tests
6c49fd5 (adil-a) Merge branch 'main' into adil/adil-test
14c0e7a (adil-a) fix(nemotron_v3): fix context-parallel regressions (mamba seq_idx + C…