feat: get_loaded_kernels() #428

Merged
danieldk merged 13 commits into huggingface:main from cbensimon:feat/get-loaded-kernels
Apr 14, 2026

Conversation

@cbensimon
Contributor

@cbensimon cbensimon commented Apr 7, 2026

New get_loaded_kernels() API

Enables:

  • get_kernel (internal) caching based on repo info (done in this PR)
  • kernels packaging (PyTorch ahead-of-time compilation context)

Example "kernels-included package" -> https://huggingface.co/cbensimon/FLUX.2-klein-4B-sm90-cu128-glibc235-r52/tree/main/package

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@cbensimon cbensimon marked this pull request as draft April 7, 2026 17:16
@cbensimon cbensimon marked this pull request as ready for review April 8, 2026 10:18
Comment thread kernels/src/kernels/utils.py Outdated
@cbensimon cbensimon changed the title from "feat: get_laoded_kernels()" to "feat: get_loaded_kernels()" Apr 10, 2026
@cbensimon cbensimon requested a review from danieldk April 10, 2026 16:02
Comment thread kernels/src/kernels/utils.py Outdated
        backend=backend,
    )
    if not reload:
        for loaded_kernel in get_loaded_kernels():
Member

@danieldk danieldk Apr 13, 2026

Doing the lookup here makes cached loading work only in get_kernel, not in the other paths that import a kernel. Maybe it makes more sense to do this in _import_from_path for now? We could always add a second cache keyed on the repository information in a future PR if we'd like to skip network access as well.
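A minimal sketch of the path-keyed cache suggested in this comment. It is not the code from the PR: the function name `import_from_path_cached` and the `_LOADED_BY_PATH` dictionary are hypothetical, and the real `_import_from_path` may differ in signature and behavior. The sketch only shows that a cache placed at the import layer is hit by every caller, whichever entry point it came from.

```python
import importlib.util
from pathlib import Path
from types import ModuleType

# Hypothetical module-level cache, keyed on the resolved on-disk path.
_LOADED_BY_PATH: dict[Path, ModuleType] = {}


def import_from_path_cached(module_name: str, path: Path, reload: bool = False) -> ModuleType:
    """Illustrative stand-in for a cached `_import_from_path` (an assumption, not the real one)."""
    key = path.resolve()
    if not reload and key in _LOADED_BY_PATH:
        # Cache hit: return the module object that was loaded earlier.
        return _LOADED_BY_PATH[key]

    spec = importlib.util.spec_from_file_location(module_name, key)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name} from {key}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    _LOADED_BY_PATH[key] = module
    return module
```

A second cache keyed on repository information (repo id, revision, backend) could sit in front of this one to skip the network round trip, as the comment notes; that part is left out here.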

Contributor Author

Ok, as you prefer. When profiling get_kernel I noticed that 95% of the time is spent in install_kernel, which is why I thought that was the thing we wanted to cache. I only tried with flash-attn-3 though. Do some kernels take a long time to (re-)load (the actual loading part in _import_from_path when called a second time)?

Member

I haven't done any benchmarking, so I'm not sure.

Comment thread kernels/src/kernels/utils.py Outdated
Comment thread kernels/src/kernels/utils.py Outdated
Comment on lines +123 to +125
for so_path in file_path.parent.iterdir():
    if so_path.is_file() and so_path.name.endswith('.so'):
        op_namespace = so_path.name.split('.')[0]
Member

@danieldk danieldk Apr 13, 2026

This is fragile. E.g. it doesn't work for noarch kernels, and IIRC we also have a different dylib extension name on macOS/Windows.

If we want a kernel to reveal its ops name, I think we should have a standardized API for it in kernels themselves. I think externally probing these things without standardization will break with various kernels (like it does here with noarch). The interface of kernels themselves is how downstream code should interact with kernels. kernels is only a client to fetch + load kernels, not to expose/probe kernel internals (if we can avoid it).
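To make the fragility concrete, here is a small self-contained reproduction of the heuristic from the diff above. The function name `probe_op_namespaces` and the file names used with it are made up for illustration; only the `.so` suffix check and the `split('.')[0]` step come from the reviewed lines.

```python
from pathlib import Path


def probe_op_namespaces(variant_dir: Path) -> list[str]:
    """Mirror the reviewed heuristic: first dotted component of each `.so` file name."""
    return sorted(
        p.name.split(".")[0]
        for p in variant_dir.iterdir()
        if p.is_file() and p.name.endswith(".so")
    )
```

On a directory containing a hypothetical `_my_ops_abc123.abi3.so`, this returns `["_my_ops_abc123"]`. On a directory that holds only Python files (the noarch case), or one whose compiled extension uses another suffix such as `.pyd`, it returns an empty list, so the caller silently sees no ops namespace.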

Contributor Author

Ok I see. How would you envision such a standardized API for the kernels to expose their op namespace? (metadata.json could be a good candidate given that it's auto-generated, which I'm not sure about)
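One possible shape for the metadata.json idea raised here, as a sketch only. The thread does not define any schema, so the `op_namespace` field name and the `read_op_namespace` helper are purely hypothetical; the point is that the kernel would declare its namespace and the client would read it rather than infer it from file names.

```python
import json
from pathlib import Path
from typing import Optional


def read_op_namespace(variant_dir: Path) -> Optional[str]:
    """Return the hypothetical `op_namespace` field from metadata.json, if present."""
    metadata_path = variant_dir / "metadata.json"
    if not metadata_path.is_file():
        # No metadata file: the kernel does not declare a namespace.
        return None
    metadata = json.loads(metadata_path.read_text())
    # `op_namespace` is an assumed key, not an existing kernels convention.
    return metadata.get("op_namespace")
```

Returning `None` when the file or field is missing keeps the lookup explicit: a noarch kernel can simply omit the field instead of being misdetected.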

Comment thread kernels/src/kernels/utils.py Outdated
Comment thread kernels/src/kernels/utils.py Outdated
Member

@danieldk danieldk left a comment

Looks great!

I think we could add this new function to the docs in a follow-up PR, but we could do that after we possibly add the metadata as well.

@danieldk danieldk merged commit 0d6cb13 into huggingface:main Apr 14, 2026
34 of 35 checks passed
@sayakpaul sayakpaul added this to the 0.15.0 milestone Apr 14, 2026

4 participants