docs/source/en/optimization
@@ -141,10 +141,12 @@ Refer to the table below for a complete list of available attention backends and
 | `flash` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-2 |
 | `flash_hub` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-2 from kernels |
 | `flash_varlen` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | Variable length FlashAttention |
+| `flash_varlen_hub` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | Variable length FlashAttention from kernels |
 | `aiter` | [AI Tensor Engine for ROCm](https://github.com/ROCm/aiter) | FlashAttention for AMD ROCm |
 | `_flash_3` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-3 |
 | `_flash_varlen_3` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | Variable length FlashAttention-3 |
 | `_flash_3_hub` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-3 from kernels |
+| `_flash_3_varlen_hub` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | Variable length FlashAttention-3 from kernels |
 | `sage` | [SageAttention](https://github.com/thu-ml/SageAttention) | Quantized attention (INT8 QK) |
 | `sage_hub` | [SageAttention](https://github.com/thu-ml/SageAttention) | Quantized attention (INT8 QK) from kernels |
 | `sage_varlen` | [SageAttention](https://github.com/thu-ml/SageAttention) | Variable length SageAttention |