Trouble installing FlashAttention on HPC cluster (GCC/GLIBC version on HPC, --no-build-isolation issue) #43

@wonjininfo

Hi all,

Note: This is not a fully reproducible issue report. It is more a record of my experience installing flash-attention, shared in case others run into similar issues or someone knows an easy fix.

Issue:

For the past few weeks, installation has failed whenever I tried to build flash-attention with uv.
I tried several approaches, including building some libraries from scratch and installing GCC 9 locally (some HPC nodes now provide GCC 9). One configuration worked (see the "What I have tried" section), and I have been using it, although it is only a partial solution.

I think there are several issues involved in this problem.

One major factor is that we don’t have sudo access on the HPC, which limits what we can do. In addition, the newer versions of flash-attention appear to primarily target Ubuntu 22.04: Dao-AILab/flash-attention#1708

@dogeeelin Currently, the released wheel is built upon ubuntu 22.04, which has higher version of glibc. Before compiling env solved, you can build the wheels from sources or try to use another released wheel. If used torch>=2.7, we have to compile it by ourselves.
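
To see whether a node is affected by the glibc mismatch described in that comment, a quick check along these lines may help. This is only a sketch: `REQUIRED_GLIBC` and `version_ge` are names made up for illustration, and the default of 2.35 is simply the glibc version Ubuntu 22.04 ships, not a confirmed requirement of any particular wheel. The real minimum is whatever the `GLIBC_x.y not found` import error names.

```shell
# Sketch: compare this node's glibc with the version a prebuilt wheel expects.
# REQUIRED_GLIBC is a placeholder (2.35 is what Ubuntu 22.04 ships); the real
# minimum is the one named in the "GLIBC_x.y not found" import error.
REQUIRED_GLIBC="${REQUIRED_GLIBC:-2.35}"

# Succeeds when version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# getconf prints e.g. "glibc 2.28"; keep only the number.
node_glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')" || node_glibc=""

if [ -z "$node_glibc" ]; then
    echo "could not determine the glibc version"
elif version_ge "$node_glibc" "$REQUIRED_GLIBC"; then
    echo "glibc $node_glibc: the prebuilt wheel may load"
else
    echo "glibc $node_glibc < $REQUIRED_GLIBC: build from source or try another released wheel"
fi
```

`ldd --version` reports the same version if `getconf` is not available on the node.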

Another complication is that uv seems to have issues with --no-build-isolation:

I tried a few suggestions from the comments but couldn't get them to work. As a workaround, I activate the uv environment and install the needed libraries from there. The uv community seems to be working on this problem, so I hope it will be resolved soon.

Disclaimer

My understanding of Python dependencies, uv, virtual environments, and flash-attention is quite limited.

What I have tried

I was able to install a legacy version of flash-attention that is compatible with the GLIBC on our HPC nodes, but this setup does not work for some of the newer LLMs: with lengthy inputs, those models could not use flash attention and took excessive time.
For reference, these versions work on our CentOS HPC:

Python 3.10.14
torch '2.5.1+cu124'
transformers '4.51.3'
flash_attn '2.2.5'

Current:

I am trying to compile flash-attention 2.8.3 from source.

Update:

I successfully compiled FlashAttention 2.8.3 from source.
(FlashAttention is required for certain use cases that involve processing long input sequences.)

To compile on E3, follow these steps:

  1. Request a node with more than 64 CPU cores and a GPU.
  2. Activate the uv environment.
  3. Set the GCC version to 9 or higher:
     source /opt/rh/gcc-toolset-9/enable
  4. Also check the nvcc version (12.6 worked for me).
  5. Clone the flash-attention repo and check out the tag for the intended version.
  6. From the flash-attention folder, run: python setup.py install
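
The steps above can be sketched as a single shell function to run on the allocated node. Several details are assumptions rather than part of my original notes: the uv environment is taken to live in `./.venv`, the release tag is taken to be named `v2.8.3`, and `MAX_JOBS=8` is an arbitrary cap on parallel compile jobs (the flash-attention README describes `MAX_JOBS` for limiting memory use during the build). `gcc_major_at_least` and `build_flash_attn` are helper names invented for this sketch.

```shell
# Sketch of the build steps above. Assumptions (not from the original notes):
# the uv environment is in ./.venv, the tag is named v2.8.3, MAX_JOBS=8.

# Succeeds when the GCC version string $1 has a major version of at least $2.
gcc_major_at_least() {
    major="${1%%.*}"
    [ "$major" -ge "$2" ]
}

build_flash_attn() {
    tag="${1:-v2.8.3}"                       # assumed tag name
    source .venv/bin/activate                # step 2 (assumed venv path)
    source /opt/rh/gcc-toolset-9/enable      # step 3
    gcc_major_at_least "$(gcc -dumpversion)" 9 \
        || { echo "GCC 9 or higher is required" >&2; return 1; }
    nvcc --version                           # step 4: 12.6 worked
    git clone https://github.com/Dao-AILab/flash-attention.git   # step 5
    cd flash-attention || return 1
    git checkout "$tag"
    MAX_JOBS=8 python setup.py install       # step 6
}
```

Calling `build_flash_attn` (optionally with a different tag as its argument) runs the whole sequence. Afterwards, `python -c "import flash_attn; print(flash_attn.__version__)"` should print the version that was built.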
