Commit 18c802a
fix flash_attn_supported override for large head_dim configs

The override in test_dot_product_attention unconditionally forced
flash_attn_supported=True for pad_between_seqs=False configs, including
base_5_*/base_6_* (head_dim 512/1024) where both FA2 and FA3 reject
head_dim > 256. This caused 960 "no backend available" failures across
A100, H100, and L40 in pipeline 48086204.

Restrict the override to pad_between_seqs=True only, which is the case
where FA3 supports the feature via seqused_q/seqused_k but the backend
checker doesn't know about it yet. For pad_between_seqs=False, trust
get_available_attention_backends() as-is.

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

1 parent 5fceae2, commit 18c802a
1 file changed
Lines changed: 11 additions & 17 deletions
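The restriction described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `checker_flash_supported` and `resolve_flash_attn_supported` are hypothetical names, the checker is reduced to the single head_dim rule the message mentions, and the real test may apply further conditions that are not visible here.

```python
# Hypothetical sketch of the override logic described in the commit message.
# None of these helpers are taken from the actual test file.

# Per the commit message, both FA2 and FA3 reject head_dim > 256.
MAX_FLASH_HEAD_DIM = 256


def checker_flash_supported(head_dim: int) -> bool:
    """Stand-in for the flash-attention verdict of the backend checker."""
    return head_dim <= MAX_FLASH_HEAD_DIM


def resolve_flash_attn_supported(checker_result: bool, pad_between_seqs: bool) -> bool:
    """Override the checker only when pad_between_seqs is True."""
    if pad_between_seqs:
        # Case the checker does not know about yet: force support on.
        return True
    # Otherwise trust the checker as-is.
    return checker_result


if __name__ == "__main__":
    for head_dim in (128, 512, 1024):
        supported = resolve_flash_attn_supported(
            checker_flash_supported(head_dim), pad_between_seqs=False
        )
        print(f"head_dim={head_dim} pad_between_seqs=False -> {supported}")
```

With this shape, a head_dim 512 or 1024 config with pad_between_seqs=False keeps the checker's negative verdict, so the test can treat flash attention as unavailable for that config instead of requesting a backend that rejects it.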
Diff (line content not preserved): old lines 200-205 and 207-217 removed; new lines 201-211 added; old lines 197-199, 206, and 218-220 kept as unchanged context.