Commit ff207ea
Add split-K decode SDPA dispatch to Qwen3.5 MoE attention
Dual-method export (decode T=1, prefill T>=2) lets the model use a
simple if/else on T instead of torch.cond, eliminating the GPU-to-CPU
sync overhead that torch.cond's predicate evaluation requires.
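A minimal sketch of the if/else dispatch described above, in plain Python so it runs without a GPU. The names dispatch_attention, decode_fn and prefill_fn are hypothetical stand-ins, not identifiers from this commit. The sketch assumes T is a concrete Python int inside each exported method, so the branch is an ordinary Python comparison and no device-side predicate has to be read back.

```python
from typing import Callable, Sequence


def dispatch_attention(q_rows: Sequence, decode_fn: Callable, prefill_fn: Callable):
    """Pick the attention path from the query length T.

    Hypothetical helper: q_rows stands in for the query tensor, and its
    length plays the role of T.
    """
    T = len(q_rows)  # assumed to be a concrete int in each exported method
    if T == 1:
        # Decode: a single new query token attends over the KV cache.
        return decode_fn(q_rows)
    # Prefill: T >= 2 query tokens are processed together.
    return prefill_fn(q_rows)


# Usage with placeholder kernels that only report which path ran.
decode_path = dispatch_attention([0.1], lambda q: "decode", lambda q: "prefill")
prefill_path = dispatch_attention([0.1, 0.2, 0.3], lambda q: "decode", lambda q: "prefill")
```

Because each exported method fixes which side of the branch is taken, only one kernel call needs to appear in each traced graph under this assumption.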
Decode calls sdpa_decode_splitk (split-K flash-decoding for high KV
occupancy); prefill calls the tiled sdpa. The L_q == 1 validation in
sdpa_decode_splitk is guarded behind isinstance(L_q, int) so that AOTI
tracing with symbolic shapes doesn't trip the check.
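The guard can be illustrated roughly as follows. This is a sketch under assumptions, not the code from the commit: check_decode_query_len and SymbolicDim are hypothetical names, and SymbolicDim merely imitates a symbolic size object that is not a Python int.

```python
class SymbolicDim:
    """Hypothetical stand-in for a symbolic size seen during tracing."""


def check_decode_query_len(L_q) -> None:
    """Validate L_q only when it is a concrete Python int.

    Symbolic sizes are not instances of int, so they skip the check
    instead of forcing a comparison during tracing.
    """
    if isinstance(L_q, int) and L_q != 1:
        raise ValueError(f"decode path expects L_q == 1, got L_q == {L_q}")


check_decode_query_len(1)              # concrete and valid: passes
check_decode_query_len(SymbolicDim())  # symbolic: check is skipped

try:
    check_decode_query_len(4)          # concrete and invalid: rejected
    rejected = False
except ValueError:
    rejected = True
```

The trade-off in this sketch is that a symbolic L_q is never validated, so correctness on that path relies on the caller only routing T=1 inputs to the decode method.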
Align sdpa_decode_splitk signature with sdpa (dropout_p, is_causal,
enable_gqa) for consistent API; unsupported args fail with clear
messages.
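One way such fail-fast argument handling might look is sketched below. The commit message does not say which arguments are unsupported, so the choices here (rejecting nonzero dropout_p and is_causal=True, accepting enable_gqa) are assumptions made purely for illustration, and check_decode_args is a hypothetical helper name.

```python
def check_decode_args(
    dropout_p: float = 0.0,
    is_causal: bool = False,
    enable_gqa: bool = False,
) -> None:
    """Accept the same keyword names as sdpa, rejecting unsupported values.

    Assumption for illustration only: dropout and causal masking are
    treated as unsupported on the decode path, while enable_gqa is
    accepted as-is.
    """
    if dropout_p != 0.0:
        raise NotImplementedError(
            f"decode path does not support dropout (got dropout_p={dropout_p})"
        )
    if is_causal:
        raise NotImplementedError(
            "decode path does not support is_causal=True"
        )


check_decode_args()                 # defaults pass
check_decode_args(enable_gqa=True)  # accepted in this sketch

try:
    check_decode_args(dropout_p=0.1)
    message = ""
except NotImplementedError as exc:
    message = str(exc)
```

Keeping the keyword names identical across both functions lets a caller switch between the decode and prefill paths without reshaping its argument list.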
This PR was authored with the assistance of Claude.
1 parent f2bcffd, commit ff207ea
2 files changed
Lines changed: 61 additions & 19 deletions
File 1 of 2 (path and diff content not captured): 47 additions, 16 deletions
@@ -1390,39 +1390,67 @@
@@ -1434,7 +1462,10 @@
File 2 of 2 (path and diff content not captured): 14 additions, 3 deletions
@@ -20,6 +20,7 @@
@@ -285,9 +286,19 @@