PR #359: Ring attention integration and other optimizations
Imported from GitHub PR #359
In this PR, we integrated the tokamax ring attention kernels for WAN models. The main changes are:
1. Added a ring attention kernel and a splash attention kernel under . The modifications we made are documented here: [Ring Attention Kernel Precision Issue](https://docs.google.com/document/d/11FPxDoT0PfdnEAGPko-6V5oblzWmwyCJUKMMqCq04e4). Modified to support
2. JITted VAE and sharded VAE: added a new config (default: 1) that lets users decide how to shard the VAE.
3. Xprof: modified the profiler code to actually use (for example ) instead of profiling the entire generation.
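The ring attention change above relies on blockwise accumulation: each device holds one KV shard, the shards rotate around the device ring, and every step folds a partial attention result into a running online-softmax accumulator. A minimal single-host NumPy sketch of that accumulation (this is an illustration of the technique, not the tokamax kernel; all names are ours):

```python
import numpy as np

def ring_attention_sim(q, k, v, num_devices):
    """Simulate one ring pass: KV shards arrive in turn, partials accumulate."""
    seq, d = q.shape
    k_shards = np.split(k, num_devices)
    v_shards = np.split(v, num_devices)
    m = np.full(seq, -np.inf)      # running per-row max of the logits
    num = np.zeros_like(q)         # unnormalized output accumulator
    den = np.zeros(seq)            # softmax denominator accumulator
    for step in range(num_devices):  # one KV shard per ring rotation
        kj, vj = k_shards[step], v_shards[step]
        s = q @ kj.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        corr = np.exp(m - m_new)   # rescale old partials to the new max
        p = np.exp(s - m_new[:, None])
        num = num * corr[:, None] + p @ vj
        den = den * corr + p.sum(axis=-1)
        m = m_new
    return num / den[:, None]
```

Because the running max is corrected at every step, the result is numerically identical to full softmax attention over the unsplit K and V; the precision doc linked above discusses where the real kernel deviates from this ideal.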
Also fixed the BUILD file by adding the missing `:kernels` target dependency, which caused a `ModuleNotFoundError` at runtime.
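The shape of that fix looks roughly like the following BUILD fragment (target and file names here are illustrative, not the actual ones in the repo): Bazel's `py_library` sandboxes each target, so a Python import of the kernels module fails with `ModuleNotFoundError` unless the consuming target declares the dep explicitly.

```starlark
py_library(
    name = "kernels",
    srcs = glob(["kernels/*.py"]),
)

py_library(
    name = "pipeline",
    srcs = ["pipeline.py"],
    # Without ":kernels" here, "import kernels" in pipeline.py
    # raises ModuleNotFoundError even though the sources coexist.
    deps = [":kernels"],
)
```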
Copybara import of the project:
--
616bf63 by Elisa Tsai <elisatsai@google.com>:
Feat: Ring attention kernel and VAE optimization
Merging this change closes #359
PiperOrigin-RevId: 902809224