
Weijiac/dsv4 bridge #4770

Draft
weijiac0619 wants to merge 436 commits into NVIDIA:main from weijiac0619:weijiac/dsv4-bridge

Conversation

@weijiac0619
Contributor

What does this PR do?

⚠️ For major changes (either in lines of code or in impact), please first share a design doc with the team. If you're unsure of the best way to do so, contact the @mcore-oncall.

Issue tracking

For PRs from open-source community contributors:

  • New features: a linked issue is required. Please open a feature request and reference it here before submitting the PR.
  • Small updates (bug fixes, minor improvements): a linked issue is recommended and will accelerate the PR review process.

Linked issue:

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see the Typing guidelines)
  • I have added relevant documentation
  • I have run autoformatter.sh on my PR

Code review

Feel free to message the @mcore-oncall, or tag them in a comment, to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge conflicts are resolved and CI is passing.
Final Review may be declined if these requirements are not met.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval from either eharper@nvidia.com or zijiey@nvidia.com.

pablo-garay and others added 30 commits December 4, 2025 15:57
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun <kyeg9404@gmail.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
Signed-off-by: GitHub Actions <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Signed-off-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Li Tao <lit@nvidia.com>
Signed-off-by: lit <lit@nvidia.com>
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
Signed-off-by: kunlunl <kunlunl@nvidia.com>
Co-authored-by: Jianbin Chang <shjwudp@gmail.com>
Co-authored-by: Deyu Fu <Deyu.Foo@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: Jared Casper <155158+jaredcasper@users.noreply.github.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Ananth Subramaniam <ansubramania@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Lawrence McAfee <lmcafee@nvidia.com>
Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com>
Co-authored-by: Deepak Narayanan <2724038+deepakn94@users.noreply.github.com>
Co-authored-by: helen ngo <helenn@nvidia.com>
Co-authored-by: GitHub Actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Robert Kirby <rkirby@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com>
Co-authored-by: yeyu-nvidia <yeyu@nvidia.com>
Co-authored-by: Abhinav Khattar <akhattar@nvidia.com>
Co-authored-by: Roger Waleffe <rwaleffe@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Tong Liu <liutongt1998@gmail.com>
Co-authored-by: Zhongbo Zhu <42691305+zhongbozhu@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Zijie Yan <zijiey@nvidia.com>
Co-authored-by: root <root@pool0-01101.cm.cluster>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Kan Zhu <kanz@nvidia.com>
Co-authored-by: Robert Kirby <rkirby@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Co-authored-by: Jon Barker <19699370+jon-barker@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Tong Liu <tongliu@nvidia.com>
Co-authored-by: Michael Wojcikiewicz <mwojcikiewic@nvidia.com>
Co-authored-by: Li Tao <lit@nvidia.com>
Co-authored-by: Santosh Bhavani <santosh.bhavani@live.com>
Co-authored-by: Li Ruixiao <cgruixiao@outlook.com>
Co-authored-by: Robin Zhang <robinz@nvidia.com>
Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
…A2A overlap (NVIDIA#2201)

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Pingtian Li <pingtianl@nvidia.com>
Co-authored-by: root <root@eos0318.eos.clusters.nvidia.com>
Co-authored-by: Zijie Yan <zijiey@nvidia.com>
Co-authored-by: Pingtian Li <pingtianl@nvidia.com>
Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun <kyeg9404@gmail.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
Signed-off-by: GitHub Actions <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Signed-off-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: Jared Casper <155158+jaredcasper@users.noreply.github.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Jianbin Chang <shjwudp@gmail.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Ananth Subramaniam <ansubramania@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Lawrence McAfee <lmcafee@nvidia.com>
Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com>
Co-authored-by: Deepak Narayanan <2724038+deepakn94@users.noreply.github.com>
Co-authored-by: helen ngo <helenn@nvidia.com>
Co-authored-by: GitHub Actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Robert Kirby <rkirby@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com>
Co-authored-by: yeyu-nvidia <yeyu@nvidia.com>
Co-authored-by: Abhinav Khattar <akhattar@nvidia.com>
Co-authored-by: Roger Waleffe <rwaleffe@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Tong Liu <liutongt1998@gmail.com>
Co-authored-by: Zhongbo Zhu <42691305+zhongbozhu@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Zijie Yan <zijiey@nvidia.com>
Co-authored-by: root <root@pool0-01101.cm.cluster>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Pablo Garay <pagaray@nvidia.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Kan Zhu <kanz@nvidia.com>
Co-authored-by: Robert Kirby <rkirby@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Co-authored-by: Jon Barker <19699370+jon-barker@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Tong Liu <tongliu@nvidia.com>
Co-authored-by: Dennis(Zhenhuan) Liu <denliu@nvidia.com>
Co-authored-by: yobi byte <yobibyte@users.noreply.github.com>
Co-authored-by: Jon Barker <jbarker@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Michael Wojcikiewicz <mwojcikiewic@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@cw-dfw-cs-001-login-01.cm.cluster>
Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com>
Co-authored-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
… utility (NVIDIA#2651)

Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Eric Harper <eharper@nvidia.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
…2086)

Signed-off-by: kunlunl <kunlunl@nvidia.com>
Co-authored-by: jianbinc <shjwudp@gmail.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Dennis Liu <denliu@nvidia.com>
rapatel and others added 29 commits April 17, 2026 10:59
Co-authored-by: Hongbin Liu <lhb8125@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
)

Co-authored-by: Siddhartha Raman S <sraman@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Shifang Xu <shifangx@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
…sing SMs (NVIDIA#4401)

Co-authored-by: Gao Deng <gdeng@login-lyris02.lyris.clusters.nvidia.com>
- Restore dev's pyproject.toml, uv.lock, and Dockerfile.ci.dev
- Update nvidia-resiliency-ext to main's revision (required for get_write_results_queue)
- Fix hybrid_model.py: pass the missing pp_rank, delta_offload_bytes_across_pp_ranks, and activation_offload_fraction params to init_chunk_handler()
- Fix hybrid_model.py: rename the mark_not_offloadable() call to mark_not_offload()
- Run black + isort on all changed Python files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore dev's nvidia-resiliency-ext revision to keep pyproject.toml
and uv.lock consistent. The mismatch caused uv sync --locked to fail
in CI linting. The get_write_results_queue import in torch.py is a
lazy runtime import that won't be hit during linting or unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
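
The commit message above relies on a lazy runtime import: the dependency is resolved only when the code path runs, so linting and unit tests that merely import the file never need it. The following is a minimal sketch of that pattern, not the code in torch.py. The module name `hypothetical_resiliency_ext` and both helper function names are assumptions made up for illustration; only the symbol name `get_write_results_queue` comes from the commit message.

```python
import importlib


def load_write_results_queue_getter(module_name: str = "hypothetical_resiliency_ext"):
    """Resolve get_write_results_queue only when this function is called.

    The default module name is a hypothetical placeholder. Because the import
    happens inside the function body, importing the enclosing file never
    touches the optional dependency.
    """
    module = importlib.import_module(module_name)  # deferred until call time
    return getattr(module, "get_write_results_queue")


def lazy_import_fails_only_when_called() -> bool:
    """Return True if the missing dependency surfaces at call time."""
    try:
        load_write_results_queue_getter()
    except ImportError:
        # Reached only when the lazy path is exercised and the package
        # providing the symbol is not installed.
        return True
    return False
```

Defining these functions succeeds even when the package is absent. The trade-off is that a version mismatch stays hidden until the lazy path is actually executed.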
Previous formatting used wrong tool versions (black 24.10.0, isort 8.0.1).
Re-ran with CI-pinned versions: black==24.4.2, isort==5.13.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The merge removed the import of ArgumentGroupFactory from
argument_utils but it is still used extensively in the file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…VIDIA#4430)

Co-authored-by: Gao Deng <gdeng@login-lyris01.lyris.clusters.nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Ankur Srivastava <your_verified_email@domain.com>
Signed-off-by: meg miranda <mmiranda@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Dennis(Zhenhuan) Liu <denliu@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Li Tao <lit@nvidia.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: janEbert <janpabloe@nvidia.com>
Co-authored-by: Shifang Xu <shifangx@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: melon <49278241+ezioliao@users.noreply.github.com>
Co-authored-by: liaoyang <yliao@siflow.cn>
Co-authored-by: Eric Harper <eharper@nvidia.com>
Co-authored-by: Deepak Narayanan <deepakn94@gmail.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
Co-authored-by: Yuzhong Wang <yuzhongw@nvidia.com>
Co-authored-by: Dhinesh Ponnarasan <160256912+DhineshPonnarasan@users.noreply.github.com>
Co-authored-by: Seonmyeong Bak <sbak@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Co-authored-by: Kunlun Li <kunlunl@cw-dfw-cs-001-login-02.cm.cluster>
Co-authored-by: wdykas <73254672+wdykas@users.noreply.github.com>
Co-authored-by: William Dykas <wdykas@oci-hsg-cs-001-vscode-03.cm.cluster>
Co-authored-by: root <root@nvl72065-T16.cm.cluster>
Co-authored-by: root <root@nvl72163-T17.cm.cluster>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Co-authored-by: Haoran Zhang <github@snowchord.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Ankur Srivastava <101727556+awsankur@users.noreply.github.com>
Co-authored-by: Ankur Srivastava <your_verified_email@domain.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Jon Barker <jbarker@nvidia.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>
Co-authored-by: Mikail Khona (NVIDIA) <mkhona@nvidia.com>
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Co-authored-by: Santosh Bhavani <santosh.bhavani@live.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
…#4473)

Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- PyTorch fallback for fast_hadamard_transform (unavailable on aarch64/B200)
- Cast mask dtype instead of assert in fused_qk_topk_naive (bf16/fp32 mismatch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: weijiac <weijiac@nvidia.com>
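
The commit above mentions a PyTorch fallback for fast_hadamard_transform, but the fallback itself is not shown on this page. As a rough illustration of the math such a fallback would have to reproduce, here is a dependency-free Python sketch of the fast Walsh-Hadamard transform in Sylvester (natural) ordering. The function name `fwht` and its `scale` parameter are assumptions for illustration; an actual fallback would presumably operate on tensors along the last dimension rather than on Python lists.

```python
def fwht(values, scale=1.0):
    """Fast Walsh-Hadamard transform of a power-of-two-length sequence.

    Uses the standard O(n log n) butterfly: at each stage, elements h apart
    are replaced by their sum and difference. The result is multiplied by
    `scale` (use 1 / n on the second pass to invert the transform).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v * scale for v in out]
```

Applying the transform twice and scaling by `1 / n` returns the original input, which is a convenient sanity check when comparing a fallback against the fused kernel on hardware where both are available.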

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
