flash-attn2: prefer using add_op_namespace_prefix #877
Note: this PR has been updated to remove the unused ops folder, which contained non-flash-attn code. It also cleans up the exposed functions by removing the unused low-level functions in `__init__`. The kernel now only exposes the core top-level functions (listed in `__all__`) and passes the
sayakpaul
left a comment
Any reason why the windows build would fail?
Not 100% sure at the moment, but it seems to be related to the XPU path on Windows. In general, the Windows build workflow may need some tweaks, since it has custom logic that diverges from the standard kernel-builder Nix path. I'm going to take a look and see if a small change resolves it; otherwise, fixing the workflow may be best tackled in another PR.
Works for me.
Added a small PR (#885) to skip the Windows XPU backend for flash-attn2, since there seems to be a bug related to the CUTLASS fork. It's possible that merging that PR and rebasing this one on top will avoid the XPU Windows build and let the Windows CUDA build succeed.
def fwd(
This looks like an API break; we need to bump the version if we remove these. Is the removal necessary?
This PR fixes flash-attn2 to correctly register fake ops
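For readers unfamiliar with the helper in the PR title, here is a rough sketch of what a namespace-prefix helper of this kind might look like. The namespace string below is hypothetical, and the body of the helper is an assumption rather than the code in this PR; the point is only that fake ops get registered under the same fully qualified name as the compiled op.

```python
# Hypothetical namespace; a real build would generate its own unique name.
_OPS_NAMESPACE = "_flash_attn2_example"


def add_op_namespace_prefix(op_name: str) -> str:
    """Return the fully qualified op name, in the form 'namespace::op'."""
    return f"{_OPS_NAMESPACE}::{op_name}"


# With PyTorch installed, the qualified name would typically be passed to
# torch.library.register_fake so the fake (meta) implementation is attached
# to the right op, for example:
#
#   @torch.library.register_fake(add_op_namespace_prefix("fwd"))
#   def fwd_fake(...):
#       ...
qualified = add_op_namespace_prefix("fwd")
print(qualified)
```

Building the name through one helper, rather than hard-coding the namespace at each registration site, keeps the fake registrations in step with whatever namespace the build produces.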