Commit ab2bd13

Merge branch 'main' into xpu-skill
2 parents bf0a397 + 559c412 commit ab2bd13

95 files changed

Lines changed: 7620 additions & 559 deletions

.github/pull_request_template.md

Lines changed: 9 additions & 2 deletions
@@ -5,8 +5,11 @@
55
> avoids wasted effort on changes we may not be able to merge. Please only create
66
> a PR once one of the project maintainers agrees on your outlined approach.
77
>
8-
> PRs of contributors who are not vouched for are automatically closed. You can
9-
> become a vouched contributor by discussing a change first as outlined above.
8+
> PRs of contributors who are not vouched for are automatically closed. Regular
9+
> contributors are added to the vouch list.
10+
>
11+
> For LLM-generated changes, we prefer that you write your prompt in an issue
12+
> over a PR with LLM-generated changes.
1013
1114
## Related issue
1215

@@ -39,3 +42,7 @@ Closes #
3942
- [ ] This PR is linked to an issue that was discussed and approved
4043
- [ ] I have tested these changes locally
4144
- [ ] New/changed functionality has test coverage
45+
- LLM disclosure:
46+
- [ ] I did not use an LLM to create this PR.
47+
- [ ] I used an LLM for assistance while creating this PR.
48+
- [ ] This PR was mostly or completely generated by an LLM.

.github/workflows/build_kernel.yaml

Lines changed: 8 additions & 3 deletions
@@ -50,6 +50,7 @@ jobs:
5050
name: built-kernels-${{ matrix.arch }}
5151
path: |
5252
activation-kernel
53+
cpp20-symbols-kernel
5354
cutlass-gemm-kernel
5455
cutlass-gemm-tvm-ffi-kernel
5556
extra-data
@@ -60,8 +61,11 @@ jobs:
6061
silu-and-mul-kernel
6162
6263
test:
63-
name: Test kernels
64+
name: Test kernels (UBI ${{ matrix.ubi_version }})
6465
needs: build
66+
strategy:
67+
matrix:
68+
ubi_version: [8, 9]
6569
runs-on:
6670
group: aws-g6-12xlarge-plus
6771
steps:
@@ -76,9 +80,10 @@ jobs:
7680
- name: Build Docker image
7781
run: |
7882
docker build \
79-
-t kernel-builder:latest \
83+
-t kernel-builder:ubi${{ matrix.ubi_version }} \
84+
--build-arg UBI_VERSION=${{ matrix.ubi_version }} \
8085
-f nix-builder/tests/Dockerfile.test-kernel .
8186
8287
- name: Run Tests
8388
run: |
84-
docker run --gpus all kernel-builder:latest
89+
docker run --gpus all kernel-builder:ubi${{ matrix.ubi_version }}
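
Reviewer note: the matrix change above means each UBI version gets its own image tag and build argument. A minimal Python sketch of that expansion follows. The function name `docker_commands` is hypothetical; the tag format, build argument, and Dockerfile path are taken from the workflow diff.

```python
from typing import Iterable, Iterator, List, Tuple


def docker_commands(ubi_versions: Iterable[int]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (build, run) command pairs, one pair per UBI version in the matrix."""
    for version in ubi_versions:
        # One tag per matrix entry, so the UBI 8 and UBI 9 images do not collide.
        tag = f"kernel-builder:ubi{version}"
        build = [
            "docker", "build",
            "-t", tag,
            "--build-arg", f"UBI_VERSION={version}",
            "-f", "nix-builder/tests/Dockerfile.test-kernel",
            ".",
        ]
        run = ["docker", "run", "--gpus", "all", tag]
        yield build, run


if __name__ == "__main__":
    for build, run in docker_commands([8, 9]):
        print(" ".join(build))
        print(" ".join(run))
```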

.github/workflows/security-audit.yml

Lines changed: 22 additions & 2 deletions
@@ -138,12 +138,32 @@ jobs:
138138
COMMIT_URL: ${{ github.event.head_commit.url }}
139139
COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
140140
COMMIT_AUTHOR: ${{ github.event.head_commit.author.username || github.event.head_commit.author.name }}
141+
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
142+
REPO: ${{ github.repository }}
143+
shell: bash
141144
run: |
142145
FINDINGS=$(cat /tmp/audit_result.txt)
143146
COMMIT_TITLE=$(printf '%s\n' "$COMMIT_MESSAGE" | head -n1)
144147
145-
printf -v HEADER '*Security Audit Finding*\n*Commit:* <%s|%s>\n*Author:* %s\n\n---\n\n' \
146-
"$COMMIT_URL" "$COMMIT_TITLE" "$COMMIT_AUTHOR"
148+
# GitHub username -> Slack member ID. Entries here are only tagged
149+
# when the GitHub API confirms the user currently has the admin or
150+
# maintain role on this repo, so stale entries are inert.
151+
declare -A SLACK_IDS=(
152+
["danieldk"]="U072206PXLK"
153+
["drbh"]="U06C9TW7RDY"
154+
["sayakpaul"]="U03AU4E7DJB"
155+
)
156+
157+
MENTION=""
158+
if [ -n "${SLACK_IDS[$COMMIT_AUTHOR]:-}" ]; then
159+
ROLE=$(gh api "repos/${REPO}/collaborators/${COMMIT_AUTHOR}/permission" --jq '.role_name' 2>/dev/null || true)
160+
if [ "$ROLE" = "admin" ] || [ "$ROLE" = "maintain" ]; then
161+
MENTION="<@${SLACK_IDS[$COMMIT_AUTHOR]}> "
162+
fi
163+
fi
164+
165+
printf -v HEADER '%s*Security Audit Finding*\n*Commit:* <%s|%s>\n*Author:* %s\n\n---\n\n' \
166+
"$MENTION" "$COMMIT_URL" "$COMMIT_TITLE" "$COMMIT_AUTHOR"
147167
148168
jq -n \
149169
--arg text "${HEADER}${FINDINGS}" \
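
Reviewer note: the mention logic added in this hunk can be summarized as "tag the author only if they are in the map and the API confirms an admin or maintain role". The Python sketch below restates that rule for illustration only. The names `slack_mention`, `build_header`, and `lookup_role` are hypothetical, and the Slack ID in the usage example is a placeholder, not a real member ID.

```python
from typing import Callable, Mapping, Optional

# Roles that qualify for a mention, as stated in the workflow comment.
TRUSTED_ROLES = {"admin", "maintain"}


def slack_mention(
    author: str,
    slack_ids: Mapping[str, str],
    lookup_role: Callable[[str], Optional[str]],
) -> str:
    """Return a Slack mention prefix, or "" when the author must not be tagged."""
    slack_id = slack_ids.get(author)
    if slack_id is None:
        # Unknown author: skip the role lookup entirely.
        return ""
    try:
        role = lookup_role(author)
    except Exception:
        # Mirrors `|| true` in the shell version: a failed lookup means no mention.
        role = None
    return f"<@{slack_id}> " if role in TRUSTED_ROLES else ""


def build_header(mention: str, commit_url: str, commit_title: str, author: str) -> str:
    """Format the message header in the same layout as the workflow's printf."""
    return (
        f"{mention}*Security Audit Finding*\n"
        f"*Commit:* <{commit_url}|{commit_title}>\n"
        f"*Author:* {author}\n\n---\n\n"
    )


if __name__ == "__main__":
    ids = {"alice": "U000EXAMPLE"}  # placeholder mapping
    mention = slack_mention("alice", ids, lambda user: "maintain")
    print(build_header(mention, "https://example.invalid/c/1", "Fix bug", "alice"))
```

Because the role is checked at run time, a stale entry in the map produces an empty prefix rather than a mention.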

.github/workflows/test_e2e.yaml

Lines changed: 7 additions & 0 deletions
@@ -89,6 +89,13 @@ jobs:
8989
cd /tmp/kernels-upload-test
9090
sed -i 's|github:huggingface/kernels|path:'"$GITHUB_WORKSPACE"'|' flake.nix
9191
92+
- name: Make flake a Git repo
93+
run: |
94+
cd /tmp/kernels-upload-test
95+
git config --global user.email "bottie@mcbotface.hf.co"
96+
git config --global user.name "Botty McBotface"
97+
git init && git add . && git commit -m "e2e test"
98+
9299
- name: Determine latest variant
93100
id: variant
94101
run: |

.pre-commit-config.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
repos:
2+
- repo: https://github.com/pre-commit/pre-commit-hooks
3+
rev: f23336e5dc4bf11588d7db19f675418cf570971b
4+
hooks:
5+
- id: no-commit-to-branch
6+
args: ["--branch", "main"]
7+
8+
- repo: https://github.com/astral-sh/ruff-pre-commit
9+
rev: b831c3dc5d27d9da294ae4e915773b99aa24a7c5
10+
hooks:
11+
- id: ruff-check
12+
args: [--fix]
13+
- id: ruff-format
14+
- repo: https://github.com/NixOS/nixfmt
15+
rev: ccb94535e519e94b77d6bda76ac1de06e9f11284
16+
hooks:
17+
- id: nixfmt-nix

docs/source/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -66,3 +66,7 @@
6666
- local: builder-cli
6767
title: Builder CLI Reference
6868
title: CLI Reference
69+
- sections:
70+
- local: builder/design-nix-builder
71+
title: Nix Builder
72+
title: Design

docs/source/basic-usage.md

Lines changed: 1 addition & 2 deletions
@@ -28,8 +28,7 @@ get the latest kernel build from the `v1` branch.
2828
Kernels within a version branch must never break the API or remove builds
2929
for older PyTorch versions. This ensures that your code will continue to work.
3030

31-
Some kernels have not yet been updated to use versioning yet. In these cases,
32-
you can use `get_kernel` without the `version` argument.
31+
Hub kernels must be loaded with either a `version` or an explicit `revision`.
3332

3433
## Checking Kernel Availability
3534

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
# Nix Builder design
2+
3+
## Introduction
4+
5+
kernel-builder uses a Nix-based builder that orchestrates the build. The Nix
6+
builder provides:
7+
8+
- Reproducible evaluation. The same Nix builder version will always produce
9+
the same derivations (build recipes).
10+
- Largely reproducible builds by using a build sandbox that only has the
11+
dependencies specified in a derivation.
12+
- Seamless creation of different build environments (e.g. different Torch
13+
and CUDA combinations).
14+
15+
## Kernel build steps
16+
17+
A kernel derivation builds a kernel in the following steps:
18+
19+
1. Generate CMake files for the kernel using
20+
`kernel-builder create-pyproject`.
21+
2. Generate Ninja build files using CMake.
22+
3. Build the kernel using Ninja.
23+
4. Perform various checks on the compiled kernel, such as:
24+
- Verify that the kernel only uses ABI3/`manylinux_2_28` symbols.
25+
- Verify that the kernel can be loaded by the `kernels` Python package.
26+
5. Strip runpaths (ELF-embedded library directories) from kernel binaries
27+
to make the kernel distribution-independent.
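
Reviewer note: as an illustration of the kind of symbol check listed in step 4, the Python sketch below flags glibc symbol versions newer than 2.28. This is not the builder's actual check. The function name `too_new_glibc_symbols` is hypothetical, and the sketch assumes the versioned symbol names (such as `GLIBC_2.34`) have already been extracted from the binary by some other tool. It covers only glibc, not libstdc++ or ABI3.

```python
import re
from typing import Iterable, List, Tuple

# Matches glibc symbol versions such as GLIBC_2.28 or GLIBC_2.2.5.
_GLIBC_VERSION = re.compile(r"^GLIBC_(\d+)\.(\d+)(?:\.\d+)?$")


def too_new_glibc_symbols(
    symbol_versions: Iterable[str],
    max_version: Tuple[int, int] = (2, 28),
) -> List[str]:
    """Return the glibc symbol versions that exceed max_version (major, minor)."""
    offending = []
    for symbol_version in symbol_versions:
        match = _GLIBC_VERSION.match(symbol_version)
        if match is None:
            # Not a glibc version tag (e.g. GLIBCXX_*); out of scope for this sketch.
            continue
        major, minor = int(match.group(1)), int(match.group(2))
        if (major, minor) > max_version:
            offending.append(symbol_version)
    return offending


if __name__ == "__main__":
    found = ["GLIBC_2.2.5", "GLIBC_2.28", "GLIBC_2.34", "GLIBCXX_3.4.25"]
    print(too_new_glibc_symbols(found))
```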
28+
29+
## manylinux_2_28 compatibility
30+
31+
To achieve `manylinux_2_28` compatibility, kernels are built using a
32+
toolchain similar to the `manylinux_2_28` Docker images. This toolchain
33+
is based on the gcc toolsets from AlmaLinux 8. `manylinux_2_28` [uses
34+
AlmaLinux 8 as its base](https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based),
35+
so we have to compile against the same glibc/libstdc++ versions to
36+
ensure compatibility.
37+
38+
We repackage the AlmaLinux 8 toolsets and libstdc++ as Nix derivations (see
39+
the `nix-builder/pkgs/manylinux_2_28` source directory). Then we merge
40+
various toolset packages into an unwrapped gcc that resembles the unwrapped gcc in
41+
nixpkgs. Finally, we wrap binutils and gcc to combine them into a stdenv.
42+
43+
The stdenv does not reuse glibc from AlmaLinux, since its dynamic loader has
44+
hardcoded FHS paths (`/lib64` etc.) that are not valid in Nix. Using this
45+
dynamic loader results in linking errors, since the paths in the dynamic
46+
loader are used as a last resort (to link glibc libraries). So, instead we
47+
build our own glibc 2.28 package
48+
(see `nix-builder/pkgs/manylinux_2_28/stdenv.nix`) and use that.
49+
50+
## The package set pattern
51+
52+
We repackage various existing package sets as Nix derivations. For instance,
53+
this is done for ROCm, XPU, and manylinux_2_28 packages. We do this because
54+
we want these libraries to be as close as possible to what the user would install. This
55+
avoids compatibility issues between the kernels and the official vendor
56+
packages. For instance, suppose that we built a ROCm library as a shared
57+
library and ROCm provides the same library as a static library, then compiled
58+
kernels could use symbols that cannot be resolved when installing the official
59+
ROCm packages. Similarly, using the official packages allows us to test
60+
against the official upstream packages.
61+
62+
These package sets all follow the same pattern:
63+
64+
```nix
65+
{
66+
lib,
67+
callPackage,
68+
newScope,
69+
pkgs,
70+
}:
71+
72+
{
73+
packageMetadata,
74+
}:
75+
76+
let
77+
inherit (lib.fixedPoints) extends composeManyExtensions;
78+
79+
fixedPoint = final: {
80+
inherit lib;
81+
};
82+
composed = lib.composeManyExtensions [
83+
# Base package set.
84+
(import ./components.nix { inherit packageMetadata; })
85+
86+
# Package-specific overrides.
87+
(import ./overrides.nix)
88+
89+
# Additional overlays that extend the package set.
90+
(import ./some-overlay.nix)
91+
];
92+
in
93+
lib.makeScope newScope (lib.extends composed fixedPoint)
94+
```
95+
96+
We use a fixed point to build up the package set as a list of
97+
[overlays](https://nixos.org/manual/nixpkgs/stable/#sec-overlays-definition).
98+
This has various benefits. For instance, it allows us to refine the
99+
package set incrementally and we can refer to the final versions of
100+
packages in intermediate overlays.
101+
102+
The package sets all use a similar list of overlays:
103+
104+
- An initial overlay (`components.nix`) that applies a generic builder
105+
to the package set metadata. The metadata typically comes from a Yum/DNF
106+
repository that contains RPM packages. The generic builder will extract the
107+
RPMs and move binaries, libraries, and headers to the right location. This
108+
results in a set of Nix derivations that may or may not build.
109+
- The next overlay (`overrides.nix`) fixes up derivations generated by the
110+
generic builder in the previous overlay that do not build. Fixing the
111+
derivations typically consists of adding missing dependencies and changing
112+
embedded FHS paths to Nix store paths.
113+
- Additional overlays with derivations that combine outputs from previous
114+
overlays. Typical examples are derivations that construct a full compiler
115+
toolchain (e.g. `nix-builder/pkgs/manylinux_2_28/gcc-unwrapped.nix`).
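
Reviewer note: for readers less familiar with Nix fixed points, the overlay pattern described in this document can be approximated in Python. The sketch below is an analogy under stated assumptions, not the builder's code. Zero-argument functions stand in for Nix's lazy attributes, `fix`, `extends`, and `compose_many_extensions` are hypothetical re-implementations modeled on the nixpkgs functions of similar names, and the package names and versions are invented. The combining overlay is deliberately placed before the overrides to show that an intermediate overlay can see the final package versions.

```python
from typing import Callable, Dict, List

Thunk = Callable[[], str]          # lazily evaluated "package"
PackageSet = Dict[str, Thunk]
Overlay = Callable[[PackageSet, PackageSet], PackageSet]


def fix(f: Callable[[PackageSet], PackageSet]) -> PackageSet:
    """Tie the knot: f receives the set it is building, read only inside thunks."""
    final: PackageSet = {}
    final.update(f(final))
    return final


def extends(overlay: Overlay, base: Callable[[PackageSet], PackageSet]):
    """Apply an overlay on top of a base function of `final`."""
    def extended(final: PackageSet) -> PackageSet:
        prev = base(final)
        return {**prev, **overlay(final, prev)}
    return extended


def compose_many_extensions(overlays: List[Overlay]) -> Overlay:
    """Fold overlays left to right; each one sees the results of those before it."""
    def composed(final: PackageSet, prev: PackageSet) -> PackageSet:
        added: PackageSet = {}
        for overlay in overlays:
            added.update(overlay(final, {**prev, **added}))
        return added
    return composed


def components(metadata: Dict[str, str]) -> Overlay:
    # Base overlay: one generic package per metadata entry.
    def overlay(final: PackageSet, prev: PackageSet) -> PackageSet:
        return {n: (lambda n=n, v=v: f"{n}-{v}") for n, v in metadata.items()}
    return overlay


def toolchain(final: PackageSet, prev: PackageSet) -> PackageSet:
    # Combines packages through `final`, so it picks up later overrides.
    return {"toolchain": lambda: f"toolchain({final['gcc']()}, {final['binutils']()})"}


def overrides(final: PackageSet, prev: PackageSet) -> PackageSet:
    # Fixes up a package produced by an earlier overlay.
    return {"gcc": lambda: prev["gcc"]() + "+patched"}


package_set = fix(
    extends(
        compose_many_extensions(
            [components({"gcc": "8.5", "binutils": "2.30"}), toolchain, overrides]
        ),
        lambda final: {},
    )
)

if __name__ == "__main__":
    print(package_set["toolchain"]())
```

Here `toolchain` is resolved against the patched `gcc` even though the override is applied after it, which is the property the fixed point provides.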

docs/source/layers.md

Lines changed: 3 additions & 3 deletions
@@ -177,9 +177,8 @@ will get the latest kernel build from the `v1` branch. Kernel layers
177177
within a version branch must never break the API or remove builds for
178178
older PyTorch versions. This ensures that your code will continue to
179179
work.
180-
181-
Some kernels have not yet been updated to use versioning yet. In these cases,
182-
you can use `LayerRepository` without the `version` argument.
180+
Hub-backed `LayerRepository` and `FuncRepository` entries must specify
181+
either a `version` or an explicit `revision`.
183182

184183
You can register a mapping, like the one above, using `register_kernel_mapping`:
185184

@@ -210,6 +209,7 @@ kernel_layer_mapping = {
210209
"cuda": FuncRepository(
211210
repo_id="kernels-community/activation",
212211
func_name="silu_and_mul",
212+
version=1,
213213
),
214214
}
215215
}
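
Reviewer note: the documentation change above requires a `version` or an explicit `revision` for Hub-backed entries. The Python sketch below illustrates that rule in isolation. It is not the `kernels` implementation: `resolve_ref` is a hypothetical helper, the mapping of version `1` to the `v1` branch follows the wording in the docs, and rejecting the case where both are given is an assumption.

```python
from typing import Optional


def resolve_ref(version: Optional[int] = None, revision: Optional[str] = None) -> str:
    """Pick the Git ref to load from, requiring a version or a revision."""
    if version is not None and revision is not None:
        # Assumption: the two options are treated as mutually exclusive.
        raise ValueError("Specify either `version` or `revision`, not both.")
    if version is not None:
        # Docs: version 1 loads the latest build from the `v1` branch.
        return f"v{version}"
    if revision is not None:
        return revision
    raise ValueError("Hub kernels require a `version` or an explicit `revision`.")


if __name__ == "__main__":
    print(resolve_ref(version=1))
    print(resolve_ref(revision="abc123"))
```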

docs/source/migration.md

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@
66

77
Before `kernels` 0.12, kernels could be pulled from a repository
88
without specifying a version. This is deprecated in kernels 0.12
9-
and will become an error in kernels 0.14. Instead, use of a kernel
10-
should always specify a version (except for local kernels).
9+
and is an error in kernels 0.15. Instead, use of a kernel should
10+
always specify a version or revision (except for local kernels).
1111

1212
Kernels only use a major version. The kernel maintainer is responsible
1313
for never breaking a kernel within a major version and should bump up
