cp: FSDP2 w weight prefetching and async TP optimization (#1711) #1779

Merged
akoumpa merged 1 commit into r0.4.0 from cherry-pick-1711-r0.4.0
Apr 11, 2026


@ZhiyuLi-Nvidia ZhiyuLi-Nvidia commented Apr 10, 2026

Manually cherry-pick #1711 to r0.4.0

Conflicts broke the automatic cherry-pick, caused by missing …

* feat: FSDP2 w weight prefetching and async TP optimization

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* remove deferred rs feature

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* add datapoints

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* lint

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* fix unit tests

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* address claude review

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* remove invalid tests and better readability

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* skip unused fsdp flag

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* Apply suggestions from code review

Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* refactor: use nn.Module.compile() and consolidate compile paths in infrastructure

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* refactor: remove fsdp_layer_group_size flag

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* derive pp_enabled

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* lint

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* update cp and fix

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* lint

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* update perf

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* update

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* update perf

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

* fix test

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>

---------

Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
copy-pr-bot Bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ZhiyuLi-Nvidia
Contributor Author

/ok to test 80fe774

akoumpa changed the title from "feat: FSDP2 w weight prefetching and async TP optimization (#1711)" to "cp: FSDP2 w weight prefetching and async TP optimization (#1711)" on Apr 10, 2026
akoumpa merged commit 60172f3 into r0.4.0 on Apr 11, 2026
57 checks passed
akoumpa deleted the cherry-pick-1711-r0.4.0 branch on April 11, 2026 00:17