
feat/rfc/poc: Agnostic GPU#45235

Draft
kallewoof wants to merge 4 commits into huggingface:main from kallewoof:202604-agnostic-gpu

Conversation

@kallewoof
Contributor

@kallewoof kallewoof commented Apr 4, 2026

What does this PR do?

This PR adds a tiny "agnostic.gpu" utility intended to make it easy to replace unnecessarily hard-coded, vendor-specific code.

The code does not use torch.accelerator, as it is still considered experimental, but feedback on that opinion is welcome.

The example provided patches the flash-linear-attention fused RMSNorm gate code so that it no longer requires CUDA. The upstream library claims to be platform agnostic, so in theory this should work on other platforms as well. This is a POC/RFC only; I haven't tested whether it works in practice, but that seems slightly irrelevant to what this PR is meant to provide (if FusedRMSNormGated is not, in fact, supported on e.g. XPU devices, that is an upstream issue, and does not affect the validity of this patch).

If something that does this already exists and I simply haven't found it, I'd love to know about it. If there are other reasons I'm not seeing for why we need to insist on hard-coded CUDA checks for a seemingly platform-agnostic feature, I'd love to hear those as well.
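The PR description does not reproduce the utility's actual API, so here is a minimal sketch of what such a vendor-agnostic helper might look like, using only public PyTorch APIs. The function names (agnostic_device_type, agnostic_is_available, agnostic_synchronize) are illustrative, not the identifiers used in this PR.

```python
# Hypothetical sketch of an "agnostic GPU" helper. All names are
# illustrative; the actual PR may expose a different surface.
import torch


def agnostic_device_type() -> str:
    """Return the best available accelerator type, falling back to 'cpu'."""
    if torch.cuda.is_available():
        return "cuda"
    # hasattr guards keep this working on torch builds without XPU/MPS support
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def agnostic_is_available() -> bool:
    """True if any supported accelerator backend is available."""
    return agnostic_device_type() != "cpu"


def agnostic_synchronize() -> None:
    """Dispatch synchronize() to whichever backend is active; no-op on CPU."""
    dev = agnostic_device_type()
    if dev == "cuda":
        torch.cuda.synchronize()
    elif dev == "xpu":
        torch.xpu.synchronize()
    elif dev == "mps":
        torch.mps.synchronize()
```

The point of centralizing the dispatch like this is that call sites can write `if agnostic_is_available():` instead of repeating the cuda/xpu/mps checks everywhere.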

Code Agent Policy

The code was 90% human-written.* The idea was solidified and sanity-checked through chats with DeepSeek.

(* The exception is the "is available" part, which was derived from a snippet DeepSeek threw at me while chatting:

    return (torch.cuda.is_available() or
            (hasattr(torch, 'xpu') and torch.xpu.is_available()) or
            (hasattr(torch.backends, 'mps') and torch.backends.mps.is_available()))

)

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

These seem relevant:

@kallewoof force-pushed the 202604-agnostic-gpu branch 2 times, most recently from 5c63621 to 391209f on April 4, 2026 at 08:16
This adds a small wrapper around common GPU calls that aims to break out of redundant vendor specific calls.
@kallewoof force-pushed the 202604-agnostic-gpu branch from fe59e1c to 1f4860e on April 4, 2026 at 09:35
@kallewoof changed the title from "Agnostic GPU" to "feat/rfc/poc: Agnostic GPU" on April 4, 2026
@kallewoof force-pushed the 202604-agnostic-gpu branch from 2a2c7ce to 123b776 on April 4, 2026 at 11:05
@kallewoof force-pushed the 202604-agnostic-gpu branch from 3192fa2 to 0ff77ab on April 4, 2026 at 13:15
@github-actions
Contributor

github-actions bot commented Apr 4, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45235&sha=0ff77a

@github-actions
Contributor

github-actions bot commented Apr 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: olmo_hybrid, qwen3_5, qwen3_5_moe, qwen3_next
