This repository was archived by the owner on Mar 21, 2026. It is now read-only.

Gemma3: CUDA error: an illegal memory access was encountered. #3321

@Behnamhb

Description

System Info

TGI version: 3.3.4
Model: Gemma3 27B
GPU: H100

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I think this issue is related to flash attention v2: Dao-AILab/flash-attention#1311. A newer flash attention version 3 has been released (beta) for H100 GPUs. I don't think it is good practice for a big project like TGI to depend on a single pinned version: 8 months ago you updated flash attention to 2.6.1, and now 2.8.3 is available. I checked your Dockerfile, and there is no easy way to update flash attention in the project.
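For anyone trying to confirm which flash attention build their container actually ships, here is a minimal sketch that can be run inside the image. It assumes the package is installed under the distribution name `flash-attn` (the PyPI name; a custom build may register a different one), and the helper names `installed_version`, `release_tuple`, and `is_older` are made up for this example. The comparison only looks at the leading numeric part of the version string, so it is a rough check, not a full version parser.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release part, e.g. '2.6.1+cu123' -> (2, 6, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_older(installed: str, target: str) -> bool:
    """True if `installed` has a lower numeric release than `target`."""
    return release_tuple(installed) < release_tuple(target)


if __name__ == "__main__":
    # "flash-attn" is an assumed distribution name; adjust for custom builds.
    for name in ("flash-attn", "torch"):
        print(name, installed_version(name) or "not installed")

    found = installed_version("flash-attn")
    if found is not None and is_older(found, "2.8.3"):
        print(f"flash-attn {found} is older than 2.8.3")
```

Running this in the TGI container and pasting the output would make it easier to tell whether the crash happens on the pinned 2.6.1 build or on something else.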

Expected behavior

Update flash attention to a newer version.
