Commit 72dd6a6
Add split-K decode SDPA dispatch to Qwen3.5 MoE attention
Dual-method export (decode T=1, prefill T>=2) lets the model use a
simple if/else on T instead of torch.cond, eliminating the GPU-to-CPU
sync overhead that torch.cond's predicate evaluation requires.
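
A rough sketch of the dispatch described above, not the commit's actual code: `decode_attention`, `prefill_attention`, and `attention` are hypothetical stand-ins, and plain lists stand in for tensors. It assumes each exported method sees T as an ordinary Python int, so the branch resolves in Python and needs no device-side predicate.

```python
from typing import List

Row = List[float]


def decode_attention(q: List[Row]) -> str:
    # Hypothetical stand-in for the decode kernel (exactly one query token).
    return "decode"


def prefill_attention(q: List[Row]) -> str:
    # Hypothetical stand-in for the tiled prefill kernel (two or more tokens).
    return "prefill"


def attention(q: List[Row]) -> str:
    T = len(q)  # sequence length as a plain Python int, not a device value
    if T == 1:
        return decode_attention(q)
    return prefill_attention(q)


print(attention([[0.1, 0.2]]))              # decode
print(attention([[0.1, 0.2], [0.3, 0.4]]))  # prefill
```
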
Decode calls sdpa_decode_splitk (split-K flash-decoding for high KV
occupancy), prefill calls tiled sdpa. Guard sdpa_decode_splitk
validation behind isinstance(L_q, int) so AOTI tracing with symbolic
shapes doesn't trip the L_q==1 check.
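
For readers unfamiliar with the split-K flash-decoding idea named above, here is a minimal pure-Python sketch for a single query vector. It is an illustration of the general technique under simplifying assumptions (one head, no mask, chunks processed sequentially rather than in parallel on a GPU), not the kernel in this commit; `partial_attention` and `decode_splitk` are hypothetical names.

```python
import math
from typing import List, Tuple

Vec = List[float]


def partial_attention(q: Vec, keys: List[Vec], values: List[Vec],
                      scale: float) -> Tuple[float, float, Vec]:
    """Attend over one KV chunk.

    Returns (max score, softmax denominator, unnormalized output), all
    computed relative to the chunk-local max for numerical stability.
    """
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def decode_splitk(q: Vec, keys: List[Vec], values: List[Vec],
                  num_splits: int) -> Vec:
    """Single-query attention computed over KV chunks, then merged."""
    if num_splits < 1 or not keys:
        raise ValueError("need num_splits >= 1 and a non-empty KV cache")
    scale = 1.0 / math.sqrt(len(q))
    chunk = math.ceil(len(keys) / num_splits)
    partials = [
        partial_attention(q, keys[i:i + chunk], values[i:i + chunk], scale)
        for i in range(0, len(keys), chunk)
    ]
    # Rescale every chunk to the global max before summing the pieces.
    m_all = max(m for m, _, _ in partials)
    denom = sum(d * math.exp(m - m_all) for m, d, _ in partials)
    dim = len(values[0])
    return [
        sum(acc[j] * math.exp(m - m_all) for m, _, acc in partials) / denom
        for j in range(dim)
    ]
```

Because each chunk carries its own max and denominator, the merged result matches attention over the whole cache regardless of `num_splits`. That property is what lets the chunks be processed independently when the KV cache is long.
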
Align sdpa_decode_splitk signature with sdpa (dropout_p, is_causal,
enable_gqa) for consistent API; unsupported args fail with clear
messages.

1 parent 151692c commit 72dd6a6
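
The validation behavior in the commit message (the `isinstance` guard on `L_q` and the rejection of unsupported arguments) might look roughly like the sketch below. The function name `validate_decode_args`, the `SymbolicDim` class, and the error types are hypothetical. The message does not say which arguments are unsupported, so the sketch assumes nonzero `dropout_p` and `is_causal=True` are the rejected cases.

```python
class SymbolicDim:
    """Hypothetical stand-in for a symbolic size seen during tracing.

    It is deliberately not an int subclass, so isinstance(x, int) is False.
    """

    def __init__(self, name: str) -> None:
        self.name = name


def validate_decode_args(L_q, dropout_p: float = 0.0,
                         is_causal: bool = False,
                         enable_gqa: bool = False) -> None:
    # Check the query length only when it is a concrete Python int;
    # a symbolic size is skipped so tracing does not trip the check.
    if isinstance(L_q, int) and L_q != 1:
        raise ValueError(f"decode path requires L_q == 1, got {L_q}")
    # Assumption: these two arguments are the unsupported ones.
    if dropout_p != 0.0:
        raise NotImplementedError("dropout_p != 0.0 is not supported in decode")
    if is_causal:
        raise NotImplementedError("is_causal=True is not supported in decode")
    # enable_gqa is accepted as-is in this sketch.


validate_decode_args(1)                  # concrete and valid
validate_decode_args(SymbolicDim("s0"))  # symbolic, length check skipped
```
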
2 files changed
Lines changed: 60 additions & 17 deletions