Commit e0930b3
fix(qwen3.5_fp4_b300): use --mm-attention-backend triton_attn
Same workaround as #1422 (bf16) and #1451 (fp8) — bypass the broken
flash-attn cute kernel sm_103 assertion in the Qwen-3.5-VL vision
encoder by switching only the multi-modal attention path to triton_attn.
Text decoder still uses --attention-backend trtllm_mha.
See sgl-project/sglang#25564 (root cause: cutedsl Arch enum aliasing on
non-cu13 path collapses sm_100..sm_110f range to exclude sm_103) and
Dao-AILab/flash-attention#2572 for the upstream fix in flight.
1 parent f6a1048 commit e0930b3
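
The backend split described in the commit message can be sketched as a launch command. This is a minimal illustration, not the contents of the changed files: the two backend flags and their values come from the message, while the `python3 -m sglang.launch_server` entry point, the `MODEL_PATH` placeholder, and the variable names are assumptions made for the example.

```shell
# Hypothetical placeholder; the real model path is in the changed files,
# which are not reproduced here.
MODEL_PATH="${MODEL_PATH:-/path/to/model}"

# Text decoder attention backend (left unchanged by this commit).
ATTENTION_BACKEND="trtllm_mha"

# Multi-modal (vision encoder) attention backend: the workaround that
# avoids the flash-attn cute kernel assertion on sm_103.
MM_ATTENTION_BACKEND="triton_attn"

# Assemble the command as a string so it can be reviewed before running.
# The entry point is an assumption, not taken from the diff.
LAUNCH_CMD="python3 -m sglang.launch_server --model-path $MODEL_PATH --attention-backend $ATTENTION_BACKEND --mm-attention-backend $MM_ATTENTION_BACKEND"

# Print instead of executing, so the sketch is safe to run anywhere.
echo "$LAUNCH_CMD"
```

Keeping the two backends in separate variables mirrors the intent of the fix: only the multi-modal path changes, so reverting once the upstream fix lands is a one-line edit.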
2 files changed
Lines changed: 2 additions & 2 deletions
File 1 of 2 (path and diff content not preserved): line 76 replaced (1 deletion, 1 addition); lines 73-75 and 77-79 shown as unchanged context.
File 2 of 2 (path and diff content not preserved): line 76 replaced (1 deletion, 1 addition); lines 73-75 and 77-79 shown as unchanged context.