hydra: add bounded-residency decode attention by newjordan · Pull Request #873 · huggingface/kernels-community

newjordan · 2026-05-18T20:27:57Z

Summary

This PR adds hydra, an experimental bounded-residency decode attention kernel
for long-context inference.

Hydra keeps a fixed resident attention set during decode: sink tokens, recent
tokens, and selected older KV pages. The goal is narrow: improve fit/usability
for specific long-context decode workloads while keeping clear evidence
boundaries and avoiding universal speedup or production-readiness claims.
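
As an illustration of the resident-set idea described above, here is a minimal pure-Python sketch. It is not Hydra's implementation: the function names, the parameters (`num_sink`, `num_recent`, `page_size`, `selected_pages`), and the assumption that pages are fixed-size and indexed from the start of the sequence are all hypothetical, and how Hydra actually selects older pages is not shown in this PR.

```python
def resident_indices(seq_len, num_sink, num_recent, page_size, selected_pages):
    """Return the sorted token positions kept resident for one decode step.

    Hypothetical sketch: sink tokens are the first `num_sink` positions,
    recent tokens are the last `num_recent`, and each selected page covers
    `page_size` consecutive positions starting at page * page_size.
    """
    sink = set(range(min(num_sink, seq_len)))
    recent = set(range(max(0, seq_len - num_recent), seq_len))
    paged = set()
    for page in selected_pages:
        start = page * page_size
        # Clamp the page to the current sequence length.
        paged.update(range(start, min(start + page_size, seq_len)))
    # The union removes overlap between the three groups.
    return sorted(sink | recent | paged)


def resident_budget(num_sink, num_recent, page_size, max_pages):
    """Upper bound on resident tokens, independent of sequence length."""
    return num_sink + num_recent + page_size * max_pages


indices = resident_indices(8192, num_sink=4, num_recent=256,
                           page_size=64, selected_pages=[10, 57])
print(len(indices))                    # 388
print(resident_budget(4, 256, 64, 2))  # 388
```

The point of the sketch is that the resident count is capped by the budget rather than by the context length: at 8192 tokens only 388 positions are attended to in this example.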

Included

hydra/build.toml, flake.nix, README.md, and CARD.md
Triton/Python source under hydra/torch-ext/hydra/
import, CSR, and CUDA decode/parity tests under hydra/tests/
isolated decode benchmark under hydra/benchmarks/benchmark_hydra_decode.py
readme_example.py for source-packet validation before publication and Hub loading after publication

Validation

Final source package tarball used for validation:

sha256: bff743b66ad67bd4c7bdd8ae190dc7672335b3a6c422af307a77a47c0942a57e
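
A reader who has the tarball could check it against the digest above with a short script along these lines. This is a sketch using only the Python standard library; the helper names are made up for illustration and the tarball's filename is not given in this PR, so the path is left to the caller.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Digest quoted in the PR description.
EXPECTED = "bff743b66ad67bd4c7bdd8ae190dc7672335b3a6c422af307a77a47c0942a57e"
# Usage (path is hypothetical): matches_expected("path/to/package.tar.gz", EXPECTED)
```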

Builder gate on Vast RTX A6000, driver 570.133.20, CUDA 12.8 path:

RUN_BUILDER=1 BUILDER_VARIANT=torch210-cxx11-cu128-x86_64-linux BENCH_ITERS=20 BENCH_WARMUP=5 BENCH_TOKENS=8192 scripts/validate_hf_setup_package.sh

Result:

local package pytest: 6 passed
isolated decode smoke: 0.2166 ms/iter
kernel-builder pytest: 4 passed, 2 skipped
exit status: 0
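
For context on how a ms/iter figure like the one above is typically produced, here is a generic warmup-then-measure harness. It is an assumption about the general shape of such a benchmark, not the code in benchmark_hydra_decode.py: the `ms_per_iter` helper is hypothetical, and the defaults simply mirror the BENCH_ITERS=20 and BENCH_WARMUP=5 values from the command. A GPU benchmark would also need to synchronize the device before reading the clock, which this CPU-only sketch omits.

```python
import time


def ms_per_iter(fn, iters=20, warmup=5):
    """Call fn `warmup` times untimed, then time `iters` calls.

    Returns the mean wall-clock time per timed call in milliseconds.
    """
    for _ in range(warmup):
        fn()  # Warmup runs absorb one-time costs such as compilation or caching.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters


# Example with a stand-in workload; the value printed depends on the machine.
print(ms_per_iter(lambda: sum(range(10_000))))
```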

An additional package-smoke/HF benchmark matrix was run across RTX 3060, RTX 3070,
RTX 3080, RTX 3090, RTX 4070 Ti, RTX 4090, A100 SXM4, RTX A6000, and RTX PRO
6000 Blackwell variants. These rows are used as hardware/runtime evidence only,
not as universal performance claims.

Non-claims

This PR does not claim universal speedups, production readiness, broad
model-quality preservation, or generic support across every model/GPU.
Exact-model Qwen FP8 rows are treated as proof-of-concept evidence only.

github-actions · 2026-05-18T20:28:13Z

Hi @newjordan, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/huggingface/kernels-community/blob/main/CONTRIBUTING.md for more details.

newjordan · 2026-05-18T20:40:58Z

Closing this upstream PR per maintainer guidance to publish Hydra as a community kernel under our own namespace. Public source: https://github.com/newjordan/hydra/tree/main/hf-kernels/hydra ; Hub repo: https://huggingface.co/Frosty40/hydra

danieldk · 2026-05-19T08:34:37Z

> Closing this upstream PR per maintainer guidance to publish Hydra as a community kernel under our own namespace. Public source: https://github.com/newjordan/hydra/tree/main/hf-kernels/hydra ; Hub repo: https://huggingface.co/Frosty40/hydra

Nice! You may want to apply for permission to create kernel-type repositories (since that's required for kernels 0.14 and later). You can do so through Settings (of the user or org) -> Account, where there is a section for this.

hydra: add bounded-residency decode attention

ecadda2

newjordan requested review from danieldk and drbh as code owners May 18, 2026 20:27

github-actions Bot closed this May 18, 2026

newjordan wants to merge 1 commit into huggingface:main from newjordan:hydra/add-bounded-residency-decode-attention