diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
Lines changed: 22 additions & 0 deletions
diff --git a/docs/source/cli-benchmark.md b/docs/source/cli-benchmark.md
Lines changed: 147 additions & 0 deletions
diff --git a/docs/source/cli-check.md b/docs/source/cli-check.md
Lines changed: 65 additions & 0 deletions
diff --git a/docs/source/cli-create-and-upload-card.md b/docs/source/cli-create-and-upload-card.md
Lines changed: 5 additions & 0 deletions
diff --git a/docs/source/cli-download.md b/docs/source/cli-download.md
Lines changed: 50 additions & 0 deletions
diff --git a/docs/source/cli-generate-readme.md b/docs/source/cli-generate-readme.md
Lines changed: 61 additions & 0 deletions
@@ -46,3 +46,25 @@
     - local: cli
       title: Kernels CLI
   title: API Reference
+- sections:
+    - local: cli-init
+      title: kernels init
+    - local: cli-upload
+      title: kernels upload
+    - local: cli-benchmark
+      title: kernels benchmark
+    - local: cli-check
+      title: kernels check
+    - local: cli-versions
+      title: kernels versions
+    - local: cli-generate-readme
+      title: kernels generate-readme
+    - local: cli-lock
+      title: kernels lock
+    - local: cli-download
+      title: kernels download
+    - local: cli-skills
+      title: kernels skills
+    - local: cli-create-and-upload-card
+      title: kernels create-and-upload-card
+  title: CLI Reference
@@ -0,0 +1,147 @@
+# kernels benchmark
+
+Use `kernels benchmark` to run benchmark scripts shipped with a kernel repository.
+
+The command:
+
+- Downloads the kernel repository at a specific **branch** or **version** (or uses a local checkout)
+- Runs all `benchmarks/benchmark*.py` scripts
+- Times each `benchmark_*` workload and prints a results table
+- Optionally saves results as JSON
+
+## Installation
+
+`kernels benchmark` requires extra dependencies:
+
+```bash
+uv pip install 'kernels[benchmark]' # or pip install 'kernels[benchmark]'
+```
+
+## Example
+
+```bash
+kernels benchmark kernels-community/activation --version 1
+```
+
+Example output:
+
+```text
+Downloading kernels-community/activation@v1...
+Running benchmark.py...
+
+  GPU      Apple M3 Max (30 cores)
+  CPU      Apple M3 Max
+  OS       Darwin 25.2.0
+  PyTorch  2.10.0
+
+  Running SiluWorkloads on mps
+
+┌───────────────┬────────────┬─────┬───────────┬────────────┬───────────┬───────────┬───────────┬───────────┬────────────┬───────────┬─────────┐
+│ Benchmark     │ Workload   │   N │ Speedup   │   Mean(ms) │   Std(ms) │   Min(ms) │   Max(ms) │   IQR(ms) │   Outliers │   Ref(ms) │ Match   │
+├───────────────┼────────────┼─────┼───────────┼────────────┼───────────┼───────────┼───────────┼───────────┼────────────┼───────────┼─────────┤
+│ SiluWorkloads │ large      │ 100 │ 1.72x     │     6.5153 │    0.4343 │    6.2883 │    8.4699 │    0.1701 │          8 │   11.2048 │ ✓       │
+│ SiluWorkloads │ medium     │ 100 │ 2.48x     │     1.1813 │    0.3976 │    1.04   │    4.2146 │    0.0698 │          5 │    2.9332 │ ✓       │
+│ SiluWorkloads │ small      │ 100 │ 1.96x     │     0.4909 │    0.2175 │    0.4407 │    2.6438 │    0.0085 │         16 │    0.9622 │ ✓       │
+└───────────────┴────────────┴─────┴───────────┴────────────┴───────────┴───────────┴───────────┴───────────┴────────────┴───────────┴─────────┘
+
+  large: 1.72x faster (95% CI: 6.4302-6.6004ms vs ref 11.2048ms) ✓ significant
+  medium: 2.48x faster (95% CI: 1.1034-1.2592ms vs ref 2.9332ms) ✓ significant
+  small: 1.96x faster (95% CI: 0.4483-0.5335ms vs ref 0.9622ms) ✓ significant
+
+Kernel: 2385e44  Benchmark: 5b53516
+```
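+
+The summary lines can be recomputed from the table. The printed intervals match a normal-approximation 95% confidence interval, `Mean ± 1.96 × Std / √N`, and the speedup matches `Ref / Mean`. The sketch below assumes those formulas and assumes that "significant" means the whole interval lies below the reference time; neither is confirmed as the exact computation `kernels` performs, and `summarize` is a hypothetical helper.
+
+```python
+import math
+
+
+def summarize(mean_ms, std_ms, n, ref_ms, z=1.96):
+    """Recompute speedup, 95% CI and significance from one table row.
+
+    Assumption: normal-approximation CI; "significant" means the interval
+    lies entirely below the reference mean. Not taken from the kernels source.
+    """
+    half_width = z * std_ms / math.sqrt(n)
+    ci_low, ci_high = mean_ms - half_width, mean_ms + half_width
+    speedup = ref_ms / mean_ms
+    significant = ci_high < ref_ms
+    return speedup, ci_low, ci_high, significant
+
+
+# Row "large" from the example table: N=100, Mean=6.5153, Std=0.4343, Ref=11.2048
+speedup, ci_low, ci_high, significant = summarize(6.5153, 0.4343, 100, 11.2048)
+print(f"large: {speedup:.2f}x faster (95% CI: {ci_low:.4f}-{ci_high:.4f}ms vs ref 11.2048ms)")
+# prints: large: 1.72x faster (95% CI: 6.4302-6.6004ms vs ref 11.2048ms)
+```
+
+Calling `summarize` with the `medium` and `small` rows reproduces the other two summary lines in the same way.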
+
+## Usage
+
+When benchmarking a Hub repository, you must specify which revision to benchmark, either with `--version`/`--branch` or with an `@...` suffix on the repo id:
+
+```bash
+kernels benchmark <repo_id> --version <N>
+kernels benchmark <repo_id> --branch <name>
+kernels benchmark <repo_id>@v<N>
+kernels benchmark <repo_id>@<branch>
+```
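+
+The `@` shorthand maps onto the two flags. A minimal sketch of that mapping, assuming that `@v<N>` with a numeric `N` selects a version and any other suffix selects a branch (the exact parsing rules in `kernels` may differ); `parse_repo_spec` is a hypothetical helper, not part of the `kernels` API:
+
+```python
+import re
+
+
+def parse_repo_spec(spec):
+    """Split "<repo_id>@<rev>" into (repo_id, version, branch).
+
+    Hypothetical: "@v<N>" is treated like --version N, any other suffix like
+    --branch <suffix>. Without "@", version and branch are both None.
+    """
+    repo_id, sep, rev = spec.partition("@")
+    if not sep:
+        return repo_id, None, None
+    match = re.fullmatch(r"v(\d+)", rev)
+    if match:
+        return repo_id, int(match.group(1)), None
+    return repo_id, None, rev
+
+
+print(parse_repo_spec("kernels-community/activation@v1"))
+# ('kernels-community/activation', 1, None)
+print(parse_repo_spec("kernels-community/activation@main"))
+# ('kernels-community/activation', None, 'main')
+```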
+
+## Examples
+
+Benchmark a tagged kernel version:
+
+```bash
+kernels benchmark kernels-community/activation --version 1
+```
+
+Equivalent shorthand:
+
+```bash
+kernels benchmark kernels-community/activation@v1
+```
+
+Benchmark a branch:
+
+```bash
+kernels benchmark kernels-community/activation --branch main
+```
+
+Tune warmup and iteration count:
+
+```bash
+kernels benchmark kernels-community/activation@v1 --warmup 20 --iterations 200
+```
+
+Save results to a file (JSON):
+
+```bash
+kernels benchmark kernels-community/activation@v1 --output results.json
+```
+
+Benchmark a local kernel checkout (must contain `benchmarks/`):
+
+```bash
+kernels benchmark ./my_kernel
+```
+
+## Output
+
+- By default, a table is printed (timings in ms).
+- `--output <file>.json` writes a JSON payload to disk.
+
+## Writing Benchmark Scripts
+
+Benchmark scripts must live under `benchmarks/` in the kernel repository and match `benchmark*.py`.
+Each script should define one or more subclasses of `kernels.benchmark.Benchmark`.
+
+Minimal example (`benchmarks/benchmark_activation.py`):
+
+```python
+import torch
+
+from kernels.benchmark import Benchmark
+
+
+class ActivationBenchmark(Benchmark):
+    seed = 0
+
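+    def setup(self):
+        # silu_and_mul splits the last dimension in half: 1024 inputs -> 512 outputs
+        self.x = torch.randn(128, 1024, device=self.device, dtype=torch.float16)
+        self.out = torch.empty(128, 512, device=self.device, dtype=torch.float16)
+
+    def benchmark_silu_and_mul(self):
+        # Timed call: writes silu(x[..., :512]) * x[..., 512:] into self.out
+        self.kernel.silu_and_mul(self.out, self.x)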
+
+    def verify_silu_and_mul(self):
+        # Return reference tensor; runner compares with self.out
+        return torch.nn.functional.silu(self.x[..., :512]) * self.x[..., 512:]
+```
+
+The runner will:
+
+- Call `setup()` once per workload (or `setup_<workload>()` if present)
+- Run `--warmup` untimed warmup iterations
+- Time `--iterations` calls of `benchmark_<workload>()`
+- If `verify_<workload>()` exists, check that the outputs match (`torch.allclose(..., atol=1e-2)`) and report the speedup over the reference computation
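+
+To make the order of these steps concrete, here is a CPU-only sketch of such a loop in plain Python. It is not the `kernels` runner: `ToyBenchmark` and `run` are hypothetical, outputs are compared with `==` instead of `torch.allclose`, and it omits the device synchronization that accurate GPU timing needs.
+
+```python
+import statistics
+import time
+
+
+class ToyBenchmark:
+    """Hypothetical stand-in for a Benchmark subclass; runs on the CPU."""
+
+    def setup(self):
+        self.data = list(range(10_000))
+
+    def benchmark_sum(self):
+        self.out = sum(self.data)
+
+    def verify_sum(self):
+        # Reference result, compared against self.out after timing
+        total = 0
+        for x in self.data:
+            total += x
+        return total
+
+
+def run(bench, warmup=10, iterations=100):
+    """Sketch of the per-workload loop; needs iterations >= 2 for stdev."""
+    results = {}
+    for name in sorted(dir(bench)):
+        if not name.startswith("benchmark_"):
+            continue
+        workload = name[len("benchmark_"):]
+        # Prefer setup_<workload>() when defined, otherwise the shared setup()
+        getattr(bench, f"setup_{workload}", bench.setup)()
+        fn = getattr(bench, name)
+        for _ in range(warmup):  # untimed warmup calls
+            fn()
+        times_ms = []
+        for _ in range(iterations):
+            start = time.perf_counter()
+            fn()
+            times_ms.append((time.perf_counter() - start) * 1000)
+        verify = getattr(bench, f"verify_{workload}", None)
+        match = None if verify is None else verify() == bench.out
+        results[workload] = {
+            "n": iterations,
+            "mean_ms": statistics.mean(times_ms),
+            "std_ms": statistics.stdev(times_ms),
+            "match": match,
+        }
+    return results
+
+
+print(run(ToyBenchmark(), warmup=2, iterations=20))
+```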
+
+## Troubleshooting
+
+- If the repo does not contain a `benchmarks/` directory (or no `benchmark*.py` files), the command exits with an error.
+- If a benchmark script defines no `Benchmark` subclasses, the command exits with an error.
+- If `verify_<workload>()` exists and the outputs do not match, the command exits with an error.
@@ -0,0 +1,65 @@
+# kernels check
+
+Use `kernels check` to verify that every build variant of a kernel on the Hub meets Python ABI and operating-system compatibility requirements.
+
+## What It Checks
+
+- Python ABI compatibility (default: 3.9)
+- Operating system compatibility (defaults: macOS 15.0+, manylinux_2_28)
+
+## Usage
+
+```bash
+kernels check <repo_id> [--revision <rev>] [--macos <version>] [--manylinux <version>] [--python-abi <version>]
+```
+
+## Installation
+
+`kernels check` requires an additional dependency:
+
+```bash
+uv pip install kernel-abi-check # or pip install kernel-abi-check
+```
+
+## Examples
+
+Check a kernel on the Hub:
+
+```bash
+kernels check kernels-community/flash-attn3
+```
+
+Check a specific revision:
+
+```bash
+kernels check kernels-community/flash-attn3 --revision v2
+```
+
+Check with custom compatibility requirements:
+
+```bash
+kernels check kernels-community/flash-attn3 --python-abi 3.10 --manylinux manylinux_2_31
+```
+
+## Example Output
+
+```text
+Checking variant: torch210-metal-aarch64-darwin
+  Dynamic library _example_kernel_metal_2juixjwdznbhy.abi3.so:
+    🐍 Python ABI 3.9 compatible
+    🍏 compatible with macOS 15.0
+Checking variant: torch29-metal-aarch64-darwin
+  Dynamic library _example_kernel_metal_vtlnpevkb6uum.abi3.so:
+    🐍 Python ABI 3.9 compatible
+    🍏 compatible with macOS 15.0
+```
+
+## Options
+
+| Option         | Default          | Description                         |
+| -------------- | ---------------- | ----------------------------------- |
+| `--revision`   | `main`           | Branch, tag, or commit SHA to check |
+| `--macos`      | `15.0`           | Minimum macOS version to require    |
+| `--manylinux`  | `manylinux_2_28` | Manylinux version to require        |
+| `--python-abi` | `3.9`            | Python ABI version to require       |
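+
+Manylinux tags follow PEP 600: `manylinux_<x>_<y>` stands for glibc `<x>.<y>`, and a binary that only needs an older glibc also runs on newer ones. The sketch below applies that rule to tags such as `manylinux_2_28` or `manylinux_2_31`. How `kernel-abi-check` interprets `--manylinux` internally is an assumption here, and `parse_manylinux` / `is_compatible` are hypothetical helpers.
+
+```python
+import re
+
+
+def parse_manylinux(tag):
+    """Return the glibc (major, minor) encoded in a PEP 600 tag like "manylinux_2_28"."""
+    match = re.fullmatch(r"manylinux_(\d+)_(\d+)", tag)
+    if match is None:
+        raise ValueError(f"not a PEP 600 manylinux tag: {tag!r}")
+    return int(match.group(1)), int(match.group(2))
+
+
+def is_compatible(required_glibc, tag):
+    """True if a binary whose newest glibc symbol is `required_glibc` fits `tag`."""
+    return required_glibc <= parse_manylinux(tag)
+
+
+print(is_compatible((2, 29), "manylinux_2_28"))  # False
+print(is_compatible((2, 29), "manylinux_2_31"))  # True
+```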
+
@@ -0,0 +1,5 @@
+# kernels create-and-upload-card
+
+Use `kernels create-and-upload-card <kernel_source_dir> --card-path README.md` to generate a basic kernel card
+(the README shown on the kernel's Hub page). For an example, see
+[kernels-community/kernel-card-template](https://hf.co/kernels-community/kernel-card-template). To also push the card
+to the Hub, pass `--repo-id <repo_id>`.
@@ -0,0 +1,50 @@
+# kernels download
+
+Use `kernels download` to download kernels that have been locked in a project's `kernels.lock` file.
+
+## Usage
+
+```bash
+kernels download <project_dir> [--all-variants]
+```
+
+## What It Does
+
+- Reads the `kernels.lock` file from the specified project directory
+- Downloads each locked kernel at its pinned revision (SHA)
+- Installs the appropriate variant for your platform (or all variants with `--all-variants`)
+
+## Examples
+
+Download kernels for the current project:
+
+```bash
+kernels download .
+```
+
+Download all build variants (useful for CI or multi-platform builds):
+
+```bash
+kernels download . --all-variants
+```
+
+Download kernels for a specific project:
+
+```bash
+kernels download /path/to/my-project
+```
+
+## Options
+
+| Option           | Description                                                                               |
+| ---------------- | ----------------------------------------------------------------------------------------- |
+| `--all-variants` | Download all build variants of each kernel instead of just the current platform's variant |
+
+## Prerequisites
+
+Your project directory must contain a `kernels.lock` file. Generate one using [`kernels lock`](cli-lock.md).
+
+## See Also
+
+- [kernels lock](cli-lock.md) - Generate the lock file
+- [kernels versions](cli-versions.md) - View available kernel versions
@@ -0,0 +1,61 @@
+# kernels generate-readme
+
+Use `kernels generate-readme` to automatically generate documentation snippets for a kernel's public functions.
+
+## Usage
+
+```bash
+kernels generate-readme <repo_id> [--revision <rev>]
+```
+
+## What It Does
+
+- Downloads the specified kernel from the Hub
+- Inspects the kernel's public API
+- Generates markdown documentation snippets showing function signatures and usage
+
+## Examples
+
+Generate README snippets for a kernel:
+
+```bash
+kernels generate-readme kernels-community/activation > README.md
+```
+
+## Example Output
+
+README.md snippet for `kernels-community/activation`:
+
+```md
+---
+tags:
+- kernels
+---
+
+## Functions
+
+### Function `fatrelu_and_mul`
+
+`(out: torch.Tensor, x: torch.Tensor, threshold: float = 0.0) -> None`
+
+No documentation available.
+
+### Function `gelu`
+
+`(out: torch.Tensor, x: torch.Tensor) -> None`
+
+No documentation available.
+
+### Function `gelu_and_mul`
+
+`(out: torch.Tensor, x: torch.Tensor) -> None`
+
+No documentation available.
+
+### Function `gelu_fast`
+
+`(out: torch.Tensor, x: torch.Tensor) -> None`
+
+No documentation available.
+
+...
+```
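+
+Each function entry above has the shape Python's `inspect` module produces: `str(inspect.signature(fn))` yields text like `(out: torch.Tensor, x: torch.Tensor) -> None`. Assuming the generator works along these lines (the source does not confirm it), a minimal sketch of rendering one entry follows. `render_function` is hypothetical, and the stand-in function uses `list` instead of `torch.Tensor` so it runs without PyTorch.
+
+```python
+import inspect
+
+
+def render_function(name, fn):
+    """Render one "### Function" entry in the format shown above (hypothetical)."""
+    doc = inspect.getdoc(fn) or "No documentation available."
+    return f"### Function `{name}`\n\n`{inspect.signature(fn)}`\n\n{doc}\n"
+
+
+# Stand-in with the same parameters as the kernel's fatrelu_and_mul, no docstring
+def fatrelu_and_mul(out: list, x: list, threshold: float = 0.0) -> None:
+    pass
+
+
+print(render_function("fatrelu_and_mul", fatrelu_and_mul))
+```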