Now that the limited libtorch stable ABI (https://docs.pytorch.org/docs/stable/notes/libtorch_stable_abi.html) has matured some more, the time may be ripe to use it to benefit the hf kernels community! What do I mean? By migrating a kernel to the libtorch ABI-stable APIs, you can now build one wheel that runs across multiple PyTorch versions. Instead of building 3 * N wheels to cover torch versions 2.9, 2.10, and 2.11, you can build N wheels instead, cutting the total build time to maintain a kernel down to a third.
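To make the 3 * N → N claim concrete, here's a tiny sketch of the build-matrix math. The variant names are purely illustrative placeholders, not the actual kernels-community build matrix:

```python
# Illustrative build-matrix math: how many wheels a kernel maintainer builds
# with and without the libtorch stable ABI. Variant names are hypothetical.
torch_versions = ["2.9", "2.10", "2.11"]
variants = ["cu126", "cu128", "rocm", "cpu"]  # N hypothetical build variants

# Without the stable ABI: one wheel per (torch version, variant) pair.
wheels_without_stable_abi = len(torch_versions) * len(variants)

# With the stable ABI: one wheel per variant runs on every torch >= 2.9.
wheels_with_stable_abi = len(variants)

print(wheels_without_stable_abi, wheels_with_stable_abi)  # → 12 4
```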
For a list of available APIs + which torch version they've been available since, see https://docs.pytorch.org/cppdocs/stable.html. From my understanding, kernels-community supports the latest 3 torch versions, meaning 2.9+ for now. Some kernels may not be able to target 2.9 and may have to wait until 2.12 is released to target 2.10, but I know for a fact that FA3 can be built against the stable ABI from 2.9 onward! Meaning from torch 2.9+, you'd never need to recompile for a new torch version, making maintaining https://huggingface.co/kernels-community/flash-attn3/tree/main/build much chiller.
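Since a stable-ABI wheel only promises compatibility from its target version onward, a kernel package might want a small runtime guard. This is a hypothetical helper I'm sketching, not part of any library; the minimum version is an assumption matching the 2.9 floor discussed above:

```python
# Hedged sketch: a guard a kernel package could ship to check that the
# installed torch is new enough for a wheel built against the 2.9 stable ABI.
# MIN_TORCH is an assumed floor, not an official constant.
MIN_TORCH = (2, 9)

def torch_is_supported(version_string: str) -> bool:
    """Parse a torch version like '2.10.0+cu128' and compare to MIN_TORCH."""
    # Drop any local version suffix ('+cu128'), keep major.minor as ints
    # so that '2.10' correctly compares greater than '2.9'.
    major, minor = version_string.split("+")[0].split(".")[:2]
    return (int(major), int(minor)) >= MIN_TORCH

print(torch_is_supported("2.8.1"))         # → False
print(torch_is_supported("2.11.0+cu128"))  # → True
```

Note the integer comparison: naive string comparison would wrongly rank "2.10" below "2.9".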
This issue is to gauge interest and see what we can do already. I propose the following:
A. Start with low-hanging fruit like FA3.
B. Identify other low-hanging-fruit kernels (with Claude/LLM help) and which kernels simply have to wait for a release.
C. Identify anything our APIs are missing + flag it to PyTorch/me (also roughly Claudable).
D. Go one by one and simplify the build story + save build resources!
Thoughts? Can we do it?
cc @sayakpaul @mikaylagawarecki @albanD