Commit f4c7bab

feat: add init and benchmarks commands to docs (#276)
* feat: add init and benchmarks commands to docs
* fix: improve init naming
* fix: avoid lowercasing kernel name
* feat: add cli ref for each command
* fix: adjust init test after rebase
* fix: improve docs and add missing commands
* fix: update toc
1 parent 93e3c8e commit f4c7bab

15 files changed

Lines changed: 701 additions & 45 deletions

docs/source/_toctree.yml

Lines changed: 22 additions & 0 deletions
@@ -46,3 +46,25 @@
4646
- local: cli
4747
title: Kernels CLI
4848
title: API Reference
49+
- sections:
50+
- local: cli-init
51+
title: kernels init
52+
- local: cli-upload
53+
title: kernels upload
54+
- local: cli-benchmark
55+
title: kernels benchmark
56+
- local: cli-check
57+
title: kernels check
58+
- local: cli-versions
59+
title: kernels versions
60+
- local: cli-generate-readme
61+
title: kernels generate-readme
62+
- local: cli-lock
63+
title: kernels lock
64+
- local: cli-download
65+
title: kernels download
66+
- local: cli-skills
67+
title: kernels skills
68+
- local: cli-create-and-upload-card
69+
title: kernels create-and-upload-card
70+
title: CLI Reference

docs/source/cli-benchmark.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
# kernels benchmark
2+
3+
Use `kernels benchmark` to run benchmark scripts shipped with a kernel repository.
4+
5+
The command:
6+
7+
- Downloads the kernel repo at a specific **branch** or **version**
8+
- Runs all `benchmarks/benchmark*.py` scripts
9+
- Times each `benchmark_*` workload and prints a results table
10+
- Optionally saves results as JSON
11+
12+
## Installation
13+
14+
`kernels benchmark` requires extra dependencies:
15+
16+
```bash
17+
uv pip install 'kernels[benchmark]' # or pip install 'kernels[benchmark]'
18+
```
19+
20+
## Example
21+
22+
```bash
23+
kernels benchmark kernels-community/activation --version 1
24+
```
25+
26+
Example output:
27+
28+
```text
29+
Downloading kernels-community/activation@v1...
30+
Running benchmark.py...
31+
32+
GPU Apple M3 Max (30 cores)
33+
CPU Apple M3 Max
34+
OS Darwin 25.2.0
35+
PyTorch 2.10.0
36+
37+
Running SiluWorkloads on mps
38+
39+
┌───────────────┬────────────┬─────┬───────────┬────────────┬───────────┬───────────┬───────────┬───────────┬────────────┬───────────┬─────────┐
40+
│ Benchmark │ Workload │ N │ Speedup │ Mean(ms) │ Std(ms) │ Min(ms) │ Max(ms) │ IQR(ms) │ Outliers │ Ref(ms) │ Match │
41+
├───────────────┼────────────┼─────┼───────────┼────────────┼───────────┼───────────┼───────────┼───────────┼────────────┼───────────┼─────────┤
42+
│ SiluWorkloads │ large │ 100 │ 1.72x │ 6.5153 │ 0.4343 │ 6.2883 │ 8.4699 │ 0.1701 │ 8 │ 11.2048 │ ✓ │
43+
│ SiluWorkloads │ medium │ 100 │ 2.48x │ 1.1813 │ 0.3976 │ 1.04 │ 4.2146 │ 0.0698 │ 5 │ 2.9332 │ ✓ │
44+
│ SiluWorkloads │ small │ 100 │ 1.96x │ 0.4909 │ 0.2175 │ 0.4407 │ 2.6438 │ 0.0085 │ 16 │ 0.9622 │ ✓ │
45+
└───────────────┴────────────┴─────┴───────────┴────────────┴───────────┴───────────┴───────────┴───────────┴────────────┴───────────┴─────────┘
46+
47+
large: 1.72x faster (95% CI: 6.4302-6.6004ms vs ref 11.2048ms) ✓ significant
48+
medium: 2.48x faster (95% CI: 1.1034-1.2592ms vs ref 2.9332ms) ✓ significant
49+
small: 1.96x faster (95% CI: 0.4483-0.5335ms vs ref 0.9622ms) ✓ significant
50+
51+
Kernel: 2385e44 Benchmark: 5b53516
52+
```
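The summary lines in the sample output can be reproduced from the table columns. The sketch below assumes a normal-approximation interval, mean ± 1.96 · std / √N, and a speedup of Ref / Mean. That formula is inferred from the printed numbers, not taken from the `kernels` source, and `summarize` is a hypothetical helper.

```python
import math

# (mean_ms, std_ms, n, ref_ms) copied from the sample table above.
ROWS = {
    "large": (6.5153, 0.4343, 100, 11.2048),
    "medium": (1.1813, 0.3976, 100, 2.9332),
    "small": (0.4909, 0.2175, 100, 0.9622),
}


def summarize(mean_ms, std_ms, n, ref_ms, z=1.96):
    """Speedup and an assumed normal-approximation 95% interval for the mean."""
    half_width = z * std_ms / math.sqrt(n)
    return {
        "speedup": ref_ms / mean_ms,
        "ci_low": mean_ms - half_width,
        "ci_high": mean_ms + half_width,
    }


for name, row in ROWS.items():
    s = summarize(*row)
    print(
        f"{name}: {s['speedup']:.2f}x faster "
        f"(95% CI: {s['ci_low']:.4f}-{s['ci_high']:.4f}ms vs ref {row[3]}ms)"
    )
```

With the table values, this gives the same speedups and interval bounds as the sample output for all three workloads.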
53+
54+
## Usage
55+
56+
When benchmarking a repository on the Hub, you must specify which revision to benchmark, either via flags or with an `@...` suffix on the repo id:
57+
58+
```bash
59+
kernels benchmark <repo_id> --version <N>
60+
kernels benchmark <repo_id> --branch <name>
61+
kernels benchmark <repo_id>@v<N>
62+
kernels benchmark <repo_id>@<branch>
63+
```
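One way to read the `@...` shorthand is sketched below. `split_repo_and_revision` and `RevisionSpec` are hypothetical names, and the rule that a suffix of the form `v<N>` means a version while anything else means a branch is an assumption based on the usage lines above, not the CLI's actual parser.

```python
import re
from typing import NamedTuple, Optional


class RevisionSpec(NamedTuple):
    repo_id: str
    version: Optional[int]
    branch: Optional[str]


def split_repo_and_revision(spec: str) -> RevisionSpec:
    """Split '<repo_id>@v<N>' or '<repo_id>@<branch>' into its parts."""
    repo_id, sep, rev = spec.partition("@")
    if not sep:
        # No '@' suffix: the revision would have to come from a flag.
        return RevisionSpec(repo_id, None, None)
    match = re.fullmatch(r"v(\d+)", rev)
    if match:
        # Assumed: 'v' followed by digits selects a version.
        return RevisionSpec(repo_id, int(match.group(1)), None)
    # Assumed: any other suffix names a branch.
    return RevisionSpec(repo_id, None, rev)


print(split_repo_and_revision("kernels-community/activation@v1"))
print(split_repo_and_revision("kernels-community/activation@main"))
```

Under this reading, a branch literally named `v1` would be indistinguishable from version 1 in the shorthand form, so the explicit `--branch` flag would be the safer choice in that case.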
64+
65+
## Examples
66+
67+
Benchmark a tagged kernel version:
68+
69+
```bash
70+
kernels benchmark kernels-community/activation --version 1
71+
```
72+
73+
Equivalent shorthand:
74+
75+
```bash
76+
kernels benchmark kernels-community/activation@v1
77+
```
78+
79+
Benchmark a branch:
80+
81+
```bash
82+
kernels benchmark kernels-community/activation --branch main
83+
```
84+
85+
Tune warmup and iteration count:
86+
87+
```bash
88+
kernels benchmark kernels-community/activation@v1 --warmup 20 --iterations 200
89+
```
90+
91+
Save results to a file (JSON):
92+
93+
```bash
94+
kernels benchmark kernels-community/activation@v1 --output results.json
95+
```
96+
97+
Benchmark a local kernel checkout (must contain `benchmarks/`):
98+
99+
```bash
100+
kernels benchmark ./my_kernel
101+
```
102+
103+
## Output
104+
105+
- By default, a table is printed (timings in ms).
106+
- `--output <file>.json` writes a JSON payload to disk.
107+
108+
## Writing Benchmark Scripts
109+
110+
Benchmark scripts must live under `benchmarks/` in the kernel repository and match `benchmark*.py`.
111+
Each script should define one or more subclasses of `kernels.benchmark.Benchmark`.
112+
113+
Minimal example (`benchmarks/benchmark_activation.py`):
114+
115+
```python
import torch

from kernels.benchmark import Benchmark


class ActivationBenchmark(Benchmark):
    seed = 0

    def setup(self):
        self.x = torch.randn(128, 1024, device=self.device, dtype=torch.float16)
        self.out = torch.empty(128, 512, device=self.device, dtype=torch.float16)

    def benchmark_silu_and_mul(self):
        self.kernel.silu_and_mul(self.out, self.x)

    def verify_silu_and_mul(self):
        # Return reference tensor; runner compares with self.out
        return torch.nn.functional.silu(self.x[..., :512]) * self.x[..., 512:]
```
135+
136+
The runner will:
137+
138+
- Call `setup()` once per workload (or `setup_<workload>()` if present)
139+
- Warm up (`--warmup`)
140+
- Time `benchmark_<workload>()` for `--iterations`
141+
- If `verify_<workload>()` exists, check that outputs match (`torch.allclose(..., atol=1e-2)`) and show a speedup vs the reference computation
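The steps above can be sketched in plain Python. This is a simplified illustration, not the `kernels` runner: `ToyBenchmark` and `run_workloads` are hypothetical, lists stand in for tensors, and `math.isclose` stands in for `torch.allclose`. A real GPU runner would also need to synchronize the device around each timed call.

```python
import math
import statistics
import time


class ToyBenchmark:
    """Stand-in for a Benchmark subclass, using lists instead of tensors."""

    def setup(self):
        self.x = [float(i) for i in range(1000)]
        self.out = None

    def benchmark_double(self):
        self.out = [2.0 * v for v in self.x]

    def verify_double(self):
        # Reference result that the runner compares against self.out.
        return [v + v for v in self.x]


def run_workloads(bench, warmup=2, iterations=5, atol=1e-2):
    results = {}
    prefix = "benchmark_"
    names = [n[len(prefix):] for n in dir(bench) if n.startswith(prefix)]
    for name in names:
        # Prefer setup_<workload>() when it exists, else fall back to setup().
        getattr(bench, f"setup_{name}", bench.setup)()
        fn = getattr(bench, f"benchmark_{name}")
        for _ in range(warmup):
            fn()
        timings_ms = []
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            timings_ms.append((time.perf_counter() - start) * 1e3)
        # Verification is optional: match stays None without verify_<workload>().
        verify = getattr(bench, f"verify_{name}", None)
        match = None
        if verify is not None:
            ref = verify()
            match = len(ref) == len(bench.out) and all(
                math.isclose(a, b, abs_tol=atol) for a, b in zip(bench.out, ref)
            )
        results[name] = {
            "n": iterations,
            "mean_ms": statistics.mean(timings_ms),
            "match": match,
        }
    return results


print(run_workloads(ToyBenchmark()))
```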
142+
143+
## Troubleshooting
144+
145+
- If the repo does not contain a `benchmarks/` directory (or no `benchmark*.py` files), the command exits with an error.
146+
- If a benchmark script defines no `Benchmark` subclasses, the command exits with an error.
147+
- If `verify_<workload>()` exists and the outputs do not match, the command exits with an error.
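The first of these conditions can be checked locally before publishing a kernel. The sketch below is a hypothetical pre-flight helper, not part of the `kernels` CLI. It only mirrors the layout rule described above: a `benchmarks/` directory containing files that match `benchmark*.py`.

```python
from pathlib import Path


def find_benchmark_scripts(repo_dir):
    """Return the benchmark scripts in repo_dir, or raise if none are found."""
    bench_dir = Path(repo_dir) / "benchmarks"
    if not bench_dir.is_dir():
        raise FileNotFoundError(f"{bench_dir} does not exist")
    scripts = sorted(bench_dir.glob("benchmark*.py"))
    if not scripts:
        raise FileNotFoundError(f"no benchmark*.py files in {bench_dir}")
    return scripts


if __name__ == "__main__":
    try:
        for script in find_benchmark_scripts("."):
            print(script)
    except FileNotFoundError as err:
        print(f"error: {err}")
```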

docs/source/cli-check.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
# kernels check
2+
3+
Use `kernels check` to verify that a kernel on the Hub meets compliance requirements.
4+
5+
## What It Checks
6+
7+
- Python ABI compatibility (default: 3.9)
8+
- Operating system compatibility (macOS 15.0+, manylinux_2_28)
9+
10+
## Usage
11+
12+
```bash
13+
kernels check <repo_id> [--revision <rev>] [--macos <version>] [--manylinux <version>] [--python-abi <version>]
14+
```
15+
16+
## Installation
17+
18+
`kernels check` requires an additional dependency:
19+
20+
```bash
21+
uv pip install kernel-abi-check # or pip install kernel-abi-check
22+
```
23+
24+
## Examples
25+
26+
Check a kernel on the Hub:
27+
28+
```bash
29+
kernels check kernels-community/flash-attn3
30+
```
31+
32+
Check a specific revision:
33+
34+
```bash
35+
kernels check kernels-community/flash-attn3 --revision v2
36+
```
37+
38+
Check with custom compatibility requirements:
39+
40+
```bash
41+
kernels check kernels-community/flash-attn3 --python-abi 3.10 --manylinux manylinux_2_31
42+
```
43+
44+
## Example Output
45+
46+
```text
47+
Checking variant: torch210-metal-aarch64-darwin
48+
Dynamic library _example_kernel_metal_2juixjwdznbhy.abi3.so:
49+
🐍 Python ABI 3.9 compatible
50+
🍏 compatible with macOS 15.0
51+
Checking variant: torch29-metal-aarch64-darwin
52+
Dynamic library _example_kernel_metal_vtlnpevkb6uum.abi3.so:
53+
🐍 Python ABI 3.9 compatible
54+
🍏 compatible with macOS 15.0
55+
```
56+
57+
## Options
58+
59+
| Option | Default | Description |
60+
| -------------- | ---------------- | ----------------------------------- |
61+
| `--revision` | `main` | Branch, tag, or commit SHA to check |
62+
| `--macos` | `15.0` | Minimum macOS version to require |
63+
| `--manylinux` | `manylinux_2_28` | Manylinux version to require |
64+
| `--python-abi` | `3.9` | Python ABI version to require |
65+
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
### kernels create-and-upload-card
2+
3+
Use `kernels create-and-upload-card <kernel_source_dir> --card-path README.md` to generate a basic homepage
4+
for the kernel. Find an example [here](https://hf.co/kernels-community/kernel-card-template). You can
5+
optionally push it to the Hub by specifying a `--repo-id`.

docs/source/cli-download.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
# kernels download
2+
3+
Use `kernels download` to download kernels that have been locked in a project's `kernels.lock` file.
4+
5+
## Usage
6+
7+
```bash
8+
kernels download <project_dir> [--all-variants]
9+
```
10+
11+
## What It Does
12+
13+
- Reads the `kernels.lock` file from the specified project directory
14+
- Downloads each locked kernel at its pinned revision (SHA)
15+
- Installs the appropriate variant for your platform (or all variants with `--all-variants`)
16+
17+
## Examples
18+
19+
Download kernels for the current project:
20+
21+
```bash
22+
kernels download .
23+
```
24+
25+
Download all build variants (useful for CI or multi-platform builds):
26+
27+
```bash
28+
kernels download . --all-variants
29+
```
30+
31+
Download kernels for a specific project:
32+
33+
```bash
34+
kernels download /path/to/my-project
35+
```
36+
37+
## Options
38+
39+
| Option | Description |
40+
| ---------------- | ----------------------------------------------------------------------------------------- |
41+
| `--all-variants` | Download all build variants of each kernel instead of just the current platform's variant |
42+
43+
## Prerequisites
44+
45+
Your project directory must contain a `kernels.lock` file. Generate one using [`kernels lock`](cli-lock.md).
46+
47+
## See Also
48+
49+
- [kernels lock](cli-lock.md) - Generate the lock file
50+
- [kernels versions](cli-versions.md) - View available kernel versions

docs/source/cli-generate-readme.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# kernels generate-readme
2+
3+
Use `kernels generate-readme` to automatically generate documentation snippets for a kernel's public functions.
4+
5+
## Usage
6+
7+
```bash
8+
kernels generate-readme <repo_id> [--revision <rev>]
9+
```
10+
11+
## What It Does
12+
13+
- Downloads the specified kernel from the Hub
14+
- Inspects the kernel's public API
15+
- Generates markdown documentation snippets showing function signatures and usage
16+
17+
## Examples
18+
19+
Generate README snippets for a kernel:
20+
21+
```bash
22+
kernels generate-readme kernels-community/activation > README.md
23+
```
24+
25+
## Example Output
26+
27+
README.md snippet for `kernels-community/activation`:
28+
```md
29+
---
30+
tags:
31+
- kernels
32+
---
33+
34+
## Functions
35+
36+
### Function `fatrelu_and_mul`
37+
38+
`(out: torch.Tensor, x: torch.Tensor, threshold: float = 0.0) -> None`
39+
40+
No documentation available.
41+
42+
### Function `gelu`
43+
44+
`(out: torch.Tensor, x: torch.Tensor) -> None`
45+
46+
No documentation available.
47+
48+
### Function `gelu_and_mul`
49+
50+
`(out: torch.Tensor, x: torch.Tensor) -> None`
51+
52+
No documentation available.
53+
54+
### Function `gelu_fast`
55+
56+
`(out: torch.Tensor, x: torch.Tensor) -> None`
57+
58+
No documentation available.
59+
60+
...
61+
```
