Weijiac/dsv4 bridge#4770
Draft
weijiac0619 wants to merge 436 commits into
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
Signed-off-by: Jianbing Dong <jianbingd@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Youngeun <kyeg9404@gmail.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: ykarnati <ykarnati@nvidia.com> Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com> Signed-off-by: GitHub Actions <github-actions[bot]@users.noreply.github.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by: Xiaowei Ren <xren@nvidia.com> Signed-off-by: Xin Yao <xiny@nvidia.com> Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com> Signed-off-by: Pablo Garay <pagaray@nvidia.com> Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Li Tao <lit@nvidia.com> Signed-off-by: lit <lit@nvidia.com> Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com> Signed-off-by: Robin Zhang <robinz@nvidia.com> Signed-off-by: kunlunl <kunlunl@nvidia.com> Co-authored-by: Jianbin Chang <shjwudp@gmail.com> Co-authored-by: Deyu Fu <Deyu.Foo@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com> Co-authored-by: Jared Casper <155158+jaredcasper@users.noreply.github.com> Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com> Co-authored-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Ananth Subramaniam <ansubramania@nvidia.com> Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com> Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com> Co-authored-by: Mcore Bot <mcore-bot@nvidia.com> Co-authored-by: Dmytro Pykhtar 
<37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Lawrence McAfee <lmcafee@nvidia.com> Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com> Co-authored-by: Deepak Narayanan <2724038+deepakn94@users.noreply.github.com> Co-authored-by: helen ngo <helenn@nvidia.com> Co-authored-by: GitHub Actions <github-actions[bot]@users.noreply.github.com> Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Robert Kirby <rkirby@nvidia.com> Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com> Co-authored-by: yeyu-nvidia <yeyu@nvidia.com> Co-authored-by: Abhinav Khattar <akhattar@nvidia.com> Co-authored-by: Roger Waleffe <rwaleffe@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Tong Liu <liutongt1998@gmail.com> Co-authored-by: Zhongbo Zhu <42691305+zhongbozhu@users.noreply.github.com> Co-authored-by: Xiaowei Ren <xren@nvidia.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com> Co-authored-by: Zijie Yan <zijiey@nvidia.com> Co-authored-by: root <root@pool0-01101.cm.cluster> Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com> Co-authored-by: Kan Zhu <kanz@nvidia.com> Co-authored-by: Robert Kirby <rkirby@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com> Co-authored-by: Jon Barker <19699370+jon-barker@users.noreply.github.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Tong Liu <tongliu@nvidia.com> Co-authored-by: Michael Wojcikiewicz <mwojcikiewic@nvidia.com> Co-authored-by: Li Tao <lit@nvidia.com> Co-authored-by: Santosh Bhavani 
<santosh.bhavani@live.com> Co-authored-by: Li Ruixiao <cgruixiao@outlook.com> Co-authored-by: Robin Zhang <robinz@nvidia.com> Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
…A2A overlap (NVIDIA#2201) Signed-off-by: Hongbin Liu <hongbinl@nvidia.com> Signed-off-by: Pingtian Li <pingtianl@nvidia.com> Co-authored-by: root <root@eos0318.eos.clusters.nvidia.com> Co-authored-by: Zijie Yan <zijiey@nvidia.com> Co-authored-by: Pingtian Li <pingtianl@nvidia.com>
Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: Youngeun <kyeg9404@gmail.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: ykarnati <ykarnati@nvidia.com> Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com> Signed-off-by: GitHub Actions <github-actions[bot]@users.noreply.github.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by: Xiaowei Ren <xren@nvidia.com> Signed-off-by: Xin Yao <xiny@nvidia.com> Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com> Signed-off-by: Pablo Garay <pagaray@nvidia.com> Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com> Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Cory Ye <cye@nvidia.com> Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Deyu Fu <deyuf@nvidia.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com> Co-authored-by: Jared Casper <155158+jaredcasper@users.noreply.github.com> Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com> Co-authored-by: Jianbin Chang <shjwudp@gmail.com> Co-authored-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Ananth Subramaniam <ansubramania@nvidia.com> Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com> Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com> Co-authored-by: Mcore Bot <mcore-bot@nvidia.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com> Co-authored-by: Maanu Grover 
<109391026+maanug-nv@users.noreply.github.com> Co-authored-by: Lawrence McAfee <lmcafee@nvidia.com> Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com> Co-authored-by: Deepak Narayanan <2724038+deepakn94@users.noreply.github.com> Co-authored-by: helen ngo <helenn@nvidia.com> Co-authored-by: GitHub Actions <github-actions[bot]@users.noreply.github.com> Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Robert Kirby <rkirby@nvidia.com> Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com> Co-authored-by: yeyu-nvidia <yeyu@nvidia.com> Co-authored-by: Abhinav Khattar <akhattar@nvidia.com> Co-authored-by: Roger Waleffe <rwaleffe@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Tong Liu <liutongt1998@gmail.com> Co-authored-by: Zhongbo Zhu <42691305+zhongbozhu@users.noreply.github.com> Co-authored-by: Xiaowei Ren <xren@nvidia.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com> Co-authored-by: Zijie Yan <zijiey@nvidia.com> Co-authored-by: root <root@pool0-01101.cm.cluster> Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com> Co-authored-by: Pablo Garay <pagaray@nvidia.com> Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com> Co-authored-by: Kan Zhu <kanz@nvidia.com> Co-authored-by: Robert Kirby <rkirby@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com> Co-authored-by: Jon Barker <19699370+jon-barker@users.noreply.github.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: Tong Liu <tongliu@nvidia.com> Co-authored-by: Dennis(Zhenhuan) Liu <denliu@nvidia.com> Co-authored-by: yobi byte <yobibyte@users.noreply.github.com> Co-authored-by: Jon Barker <jbarker@cw-dfw-cs-001-vscode-01.cm.cluster> Co-authored-by: Michael Wojcikiewicz <mwojcikiewic@nvidia.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: 
Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-login-02.cm.cluster> Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com> Co-authored-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
…A#2649) Signed-off-by: Pablo Garay <pagaray@nvidia.com>
… utility (NVIDIA#2651) Signed-off-by: Maanu Grover <maanug@nvidia.com> Co-authored-by: Eric Harper <eharper@nvidia.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
…2086) Signed-off-by: kunlunl <kunlunl@nvidia.com> Co-authored-by: jianbinc <shjwudp@gmail.com>
Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Dennis Liu <denliu@nvidia.com>
…#2121) Co-authored-by: Li Tao <lit@nvidia.com>
…NVIDIA#2121)" (NVIDIA#2747) Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Hongbin Liu <lhb8125@users.noreply.github.com> Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Shifang Xu <shifangx@nvidia.com> Co-authored-by: Xin Yao <xiny@nvidia.com>
…sing SMs (NVIDIA#4401) Co-authored-by: Gao Deng <gdeng@login-lyris02.lyris.clusters.nvidia.com>
- Restore dev's pyproject.toml, uv.lock, and Dockerfile.ci.dev
- Update nvidia-resiliency-ext to main's revision (required for get_write_results_queue)
- Fix hybrid_model.py: init_chunk_handler() missing pp_rank, delta_offload_bytes_across_pp_ranks, activation_offload_fraction params
- Fix hybrid_model.py: mark_not_offloadable() -> mark_not_offload()
- Run black + isort on all changed Python files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore dev's nvidia-resiliency-ext revision to keep pyproject.toml and uv.lock consistent. The mismatch caused uv sync --locked to fail in CI linting. The get_write_results_queue import in torch.py is a lazy runtime import that won't be hit during linting or unit tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
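To illustrate why a lazy runtime import is not hit during linting or unit tests, here is a minimal sketch. The module name `hypothetical_resiliency_ext` and the wrapper `fetch_results_queue` are placeholders invented for this example, not names from the PR; only `get_write_results_queue` is taken from the commit message, and its real signature is not shown here.

```python
def fetch_results_queue():
    """Resolve the optional dependency only when this function is called."""
    # The import sits inside the function body, so importing the enclosing
    # module never touches the dependency. The module name is hypothetical.
    from hypothetical_resiliency_ext import get_write_results_queue

    return get_write_results_queue()


def lazy_import_is_deferred():
    """Return True if the missing dependency only fails at call time."""
    try:
        fetch_results_queue()
    except ImportError:
        # Defining fetch_results_queue() succeeded; only calling it failed.
        return True
    return False
```

A linter or a test that merely imports the module defining `fetch_results_queue` would succeed even when the dependency is absent or pinned to an older revision; the failure surfaces only on the code path that calls it.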
Previous formatting used wrong tool versions (black 24.10.0, isort 8.0.1). Re-ran with CI-pinned versions: black==24.4.2, isort==5.13.2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The merge removed the import of ArgumentGroupFactory from argument_utils but it is still used extensively in the file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…VIDIA#4430) Co-authored-by: Gao Deng <gdeng@login-lyris01.lyris.clusters.nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: Xin Yao <xiny@nvidia.com> Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: Cory Ye <cye@nvidia.com> Signed-off-by: dimapihtar <dpykhtar@nvidia.com> Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com> Signed-off-by: Ankur Srivastava <your_verified_email@domain.com> Signed-off-by: meg miranda <mmiranda@nvidia.com> Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: Dennis(Zhenhuan) Liu <denliu@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com> Co-authored-by: Xin Yao <xiny@nvidia.com> Co-authored-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: Li Tao <lit@nvidia.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: janEbert <janpabloe@nvidia.com> Co-authored-by: Shifang Xu <shifangx@nvidia.com> Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com> Co-authored-by: melon <49278241+ezioliao@users.noreply.github.com> Co-authored-by: liaoyang <yliao@siflow.cn> Co-authored-by: Eric Harper <eharper@nvidia.com> Co-authored-by: Deepak Narayanan <deepakn94@gmail.com> Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com> Co-authored-by: Yuzhong Wang <yuzhongw@nvidia.com> Co-authored-by: Dhinesh Ponnarasan <160256912+DhineshPonnarasan@users.noreply.github.com> Co-authored-by: Seonmyeong Bak <sbak@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com> 
Co-authored-by: Kunlun Li <kunlunl@cw-dfw-cs-001-login-02.cm.cluster> Co-authored-by: wdykas <73254672+wdykas@users.noreply.github.com> Co-authored-by: William Dykas <wdykas@oci-hsg-cs-001-vscode-03.cm.cluster> Co-authored-by: root <root@nvl72065-T16.cm.cluster> Co-authored-by: root <root@nvl72163-T17.cm.cluster> Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com> Co-authored-by: Haoran Zhang <github@snowchord.com> Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com> Co-authored-by: Ankur Srivastava <101727556+awsankur@users.noreply.github.com> Co-authored-by: Ankur Srivastava <your_verified_email@domain.com> Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com> Co-authored-by: Jon Barker <jbarker@nvidia.com> Co-authored-by: megnvidia <mmiranda@nvidia.com> Co-authored-by: Mikail Khona (NVIDIA) <mkhona@nvidia.com> Co-authored-by: Jingyue Wu <wujingyue@gmail.com> Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com> Co-authored-by: Santosh Bhavani <santosh.bhavani@live.com> Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
…#4473) Co-authored-by: Yu Huang <yuhuang@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- PyTorch fallback for fast_hadamard_transform (unavailable on aarch64/B200)
- Cast mask dtype instead of assert in fused_qk_topk_naive (bf16/fp32 mismatch)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
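As a rough illustration of what a fallback for a fused Hadamard kernel has to compute, the following is a plain-Python sketch of the fast Walsh-Hadamard transform. It is not the PR's implementation: the function name `hadamard_transform_reference` is hypothetical, it works on Python lists rather than tensors, and the `scale` argument is an assumption about how the caller normalizes the result.

```python
def hadamard_transform_reference(values, scale=1.0):
    """Fast Walsh-Hadamard transform (Sylvester ordering), multiplied by `scale`.

    Hypothetical reference routine for illustration; the input length must be
    a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v * scale for v in out]
```

With `scale = 1 / sqrt(n)` the transform is its own inverse, which gives a cheap sanity check for any fallback: applying it twice should return the input. A tensor-based fallback would typically run the same butterfly steps over the last dimension and be selected only when importing the fused kernel fails.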
Signed-off-by: weijiac <weijiac@nvidia.com>
What does this PR do?
Issue tracking
For PRs from open-source community contributors:
Linked issue:
Contribution process
Pre-checks
Code review
Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS.
Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.
For PRs outside megatron/core, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion.
MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.