[PyTorch] Add pad_between_seqs support for non-CP and CP (A2A and P2P) with FA3 + THD (varlen)
#2596
Merged
sudhakarsingh27 merged 32 commits into NVIDIA:main from sudhakarsingh27:flash_attn_pad_bw_seqs on May 23, 2026
Commits
10e4cfc  [PyTorch] Add pad_between_seqs support for FlashAttention 3 with CP (sudhakarsingh27)
2a49dee  [PyTorch] Add pad_between_seqs tests for CP and non-CP FlashAttention (sudhakarsingh27)
34e3d62  [QA] Add CP deterministic tests to L3 and support TE_PATH in FA test (sudhakarsingh27)
4745f98  [PyTorch] Fix FA3 deterministic gate to match upstream backward const… (sudhakarsingh27)
4be004f  [PyTorch] Disable FlashAttention 4 for pad_between_seqs with THD (sudhakarsingh27)
c476f15  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
a2b0f1b  [QA] Fix cutlass-dsl utils shadow in FA versions test (sudhakarsingh27)
b94e175  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
fc9182f  skip tests which OOM in deterministic+backward+hopper+large_configs a… (sudhakarsingh27)
636666f  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
7928bc9  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
1585ebb  Merge branch 'flash_attn_pad_bw_seqs' of github.com:sudhakarsingh27/T… (sudhakarsingh27)
2464f43  make cp det and nondet tests run in parallel whenever possible (sudhakarsingh27)
789ccf0  Merge branch 'main' into flash_attn_pad_bw_seqs (sudhakarsingh27)
0a32185  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
c33cf2d  Merge branch 'flash_attn_pad_bw_seqs' of github.com:sudhakarsingh27/T… (sudhakarsingh27)
13ba004  [QA] L3: gate CP tests per-arch to avoid CI timeout (sudhakarsingh27)
e41bb96  [QA] L3: skip pre-installed FA3 build, per-FA junit XMLs (sudhakarsingh27)
7b8ca1e  b200 shouldnt run FA3 even if present (sudhakarsingh27)
e02b658  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
9389309  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
77941e0  Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash… (sudhakarsingh27)
d8e8ba4  Merge branch 'main' of https://github.com/NVIDIA/TransformerEngine in… (sudhakarsingh27)
8794aa8  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
908ca2b  Merge branch 'main' into flash_attn_pad_bw_seqs (sudhakarsingh27)
c4b6e07  L3: drop stale RUN_L3_TESTS=1 note; use flash_attn_3 for FA3 check (sudhakarsingh27)
d3bd4e4  Address review nits: bHSS-gated OOM skip; drop Dockerfile.base specifics (sudhakarsingh27)
0638d58  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
3b1e4ce  Merge branch 'main' into flash_attn_pad_bw_seqs (sudhakarsingh27)
1563b10  Name the OOM-skip threshold and explain the 128*bHSS workspace observ… (sudhakarsingh27)
a27e301  Reword OOM-skip comment as observations, not cuDNN-internal claims (sudhakarsingh27)
2b05809  Merge branch 'main' into flash_attn_pad_bw_seqs (sudhakarsingh27)