Commit e1d3a18
fix(qwen3.5_fp8_b300): use --mm-attention-backend triton_attn
Same workaround as PR #1422: bypass the sm_103 assertion in the broken
flash-attn cute kernel used by the Qwen-3.5-VL vision encoder by switching
only the multi-modal attention path to triton_attn. The text decoder still
uses --attention-backend trtllm_mha.
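A minimal sketch of how the two backend flags combine on a single SGLang launch, assuming the standard `python -m sglang.launch_server` entry point. The checkpoint path is a hypothetical placeholder and is not taken from this commit's changed files; only the two attention flags come from the message above.

```shell
# Hypothetical checkpoint path: substitute the FP8 Qwen-3.5-VL model used on B300.
MODEL_PATH="/path/to/qwen3.5-vl-fp8"

# The text decoder keeps TRT-LLM MHA; only the vision-encoder (multi-modal)
# attention is routed to Triton, avoiding the flash-attn cute sm_103 assertion.
python -m sglang.launch_server \
  --model-path "$MODEL_PATH" \
  --attention-backend trtllm_mha \
  --mm-attention-backend triton_attn
```

Keeping the two flags separate limits the workaround to the vision encoder, so decoder performance on trtllm_mha is unaffected while the upstream fix is pending.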
See sgl-project/sglang#25564 + Dao-AILab/flash-attention#2572 for the
upstream root cause and the in-flight fix.

1 parent 2315338
2 files changed
Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,7 @@  (file 1 of 2: +1 line, inserted at new line 43)
@@ -40,6 +40,7 @@  (file 2 of 2: +1 line, inserted at new line 43)