This repository was archived by the owner on Mar 21, 2026. It is now read-only.
System Info
TGI version: 3.3.4
Model: Gemma 3 27B
GPU: H100
Information
Tasks
Reproduction
I think this issue is related to FlashAttention v2: Dao-AILab/flash-attention#1311. FlashAttention 3 has also been released (in beta) for H100 GPUs. I don't think it is good practice for a large project like TGI to depend on one pinned version: the flash-attention version was updated to 2.6.1 eight months ago, and 2.8.3 is available now. I checked your Dockerfile, and there is no easy way to update flash-attention in the project.
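For anyone who wants to confirm which flash-attention build their environment actually ships, here is a minimal sketch. It assumes the package is installed under the PyPI distribution name `flash-attn`; the helper names (`parse_version`, `installed_version`, `is_older`) are made up for illustration, and 2.8.3 is simply the release mentioned above.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '2.6.1', '2.6.1.post1' or '2.8.3+cu12' into a tuple of leading ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. 'post1': stop at the first non-numeric component
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '3+cu12': keep the 3, ignore the local suffix
    return tuple(parts)


def installed_version(dist_name="flash-attn"):
    """Return the installed version string, or None if the package is missing.

    'flash-attn' is assumed to be the distribution name used in the environment.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_older(installed, target):
    """True if `installed` is strictly older than `target` (numeric parts only)."""
    return parse_version(installed) < parse_version(target)


if __name__ == "__main__":
    target = "2.8.3"  # release mentioned in this issue
    found = installed_version()
    if found is None:
        print("flash-attn is not installed in this environment")
    elif is_older(found, target):
        print(f"flash-attn {found} is older than {target}")
    else:
        print(f"flash-attn {found} is at least {target}")
```

Running this inside the container should show whether the image still carries the older pinned build.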
Expected behavior
Update flash-attention to a newer release.