Commit 678b800: rsc-0.15.0 release blogpost (#207)
Authored by Intron7 and Zethson (Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>)
1 parent 06fd387 · 2 files changed
Lines changed: 231 additions & 0 deletions

assets/main.scss (20 additions & 0 deletions)
```diff
@@ -957,6 +957,26 @@ body {
     Open Sans,
     sans-serif;
 }
+> table {
+  width: 100%;
+  border-collapse: collapse;
+  margin: 1.5rem 0;
+  font-family: "Inter", sans-serif;
+  font-size: 1rem;
+  th,
+  td {
+    padding: 0.6rem 0.9rem;
+    border: 1px solid $overline;
+    text-align: left;
+  }
+  th {
+    background-color: $tilebg;
+    font-weight: 600;
+  }
+  tbody tr:nth-child(even) {
+    background-color: $tilebg4;
+  }
+}
 @media (max-width: 50rem), (max-device-width: 40rem) {
   font-size: 1rem;
   line-height: 1.8rem;
```
Lines changed: 211 additions & 0 deletions
+++
title = "rapids-singlecell 0.15.0: Prebuilt CUDA Wheels and Compiled Kernels"
date = 2026-04-30T00:00:05+01:00
description = "Why we moved from CuPy RawKernels to nanobind C++ extensions and other release highlights."
author = "Severin Dicks, Lukas Heumos"
draft = false
+++

# Rapids-singlecell release 0.15.0

We are proud to announce rapids-singlecell 0.15.0, which brings many new features along with changes to the installation process.

## Why the packaging changes

In earlier versions of rapids-singlecell, all GPU kernels were written as CuPy RawKernels.
These were compiled the first time you called them — in your environment, on your machine.
That worked, but it came with friction:

- **First-call latency.**
  The initial invocation of a kernel-backed function could take several seconds while nvrtc compiled the CUDA source.
- **Silent dtype/layout mismatches.**
  A RawKernel receives raw pointers.
  If the input array had the wrong dtype or wasn't C-contiguous, the kernel might silently produce garbage rather than raising an error.
- **CUDA code trapped in Python strings.**
  RawKernels are defined as CUDA source inside Python string literals.
  That means no syntax highlighting, no autocomplete, and no compiler warnings in your editor — debugging C++ code buried in a Python string is nobody's idea of a good time.
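To illustrate the last two points, here is a minimal sketch of the old pattern (the `scale` kernel is a hypothetical example, not one of rapids-singlecell's actual kernels): the CUDA source lives in a Python string, and the launch receives raw pointers with no dtype or layout checking.

```python
# Hypothetical example of the RawKernel pattern: CUDA C++ source
# embedded in a Python string literal, invisible to editor tooling.
scale_src = r"""
extern "C" __global__
void scale(const float* x, float* y, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = alpha * x[i];
    }
}
"""

# With CuPy (and a GPU present), this string would be compiled by NVRTC
# on the first launch:
#
#   import cupy as cp
#   scale = cp.RawKernel(scale_src, "scale")
#   scale((grid,), (block,), (x, y, cp.float32(2.0), cp.int32(n)))
#
# The kernel sees only raw pointers: passing a float64 or non-contiguous
# array would not raise an error; the memory would simply be
# reinterpreted as float32.
```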
Starting with 0.15.0, these kernels are compiled once at build time and shipped as nanobind/CUDA C++ extension modules inside the wheel.
The result is a more conventional compiled-extension workflow: you `pip install` the package and every kernel is ready immediately.

### Packaging changes in detail

The GPU kernels that were previously CuPy RawKernels are now nanobind C++ extensions built with `scikit-build-core` and CMake.
This gives us:

- **No runtime compilation** for any migrated kernel — the compiled code is in the wheel.
- **Typed bindings at the Python/C++ boundary.**
  nanobind enforces dtype (e.g. float32 vs float64) and memory layout (C-contiguous vs F-contiguous) before the kernel launches, so mismatches raise a clear `TypeError` instead of producing wrong results.
- **A conventional C++/CUDA project structure** with headers, shared helpers, and room for larger fused or fully C++ GPU routines.
  Harmony2, shipping in this release, is the first example of a more complex function built on this foundation.
- **CUDA-versioned wheel packaging.**
  CI builds separate wheels for each CUDA major version — `rapids-singlecell-cu12` and `rapids-singlecell-cu13` — each with a `[rapids]` dependency extra that pulls in the matching RAPIDS and CuPy packages.
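The effect of the typed bindings can be sketched in plain Python (a hypothetical validator for illustration, not rapids-singlecell's actual code): a bad input fails fast at the boundary instead of being reinterpreted as raw memory.

```python
# Hypothetical sketch of the check a typed binding performs at the
# Python/C++ boundary: wrong dtype or layout raises TypeError before
# any kernel launches.
def validate_input(dtype: str, c_contiguous: bool) -> None:
    if dtype != "float32":
        raise TypeError(f"kernel expects float32, got {dtype}")
    if not c_contiguous:
        raise TypeError("kernel expects a C-contiguous array")

validate_input("float32", True)  # accepted

try:
    validate_input("float64", True)
except TypeError as err:
    print(err)  # kernel expects float32, got float64
```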
43+
44+
The Python API and import name are unchanged:
45+
46+
```python
47+
import rapids_singlecell as rsc
48+
```
49+
50+
Your existing analysis scripts should work without modification.
51+
52+
### CUDA-specific wheels
53+
54+
Because the kernels are now compiled binaries, we need to ship one wheel per CUDA major version.
55+
(Python wheel tags don't encode CUDA version, so we encode it in the package name — the same approach used by CuPy, PyTorch, and other CUDA-dependent packages.)
56+
57+
| Package | Build CUDA | Runtime CUDA | Blackwell (B200, GB200) |
58+
| :----------------------- | :--------: | :----------: | :---------------------- |
59+
| `rapids-singlecell-cu12` | 12.2 | 12.2 – 12.9+ | Supported via PTX JIT |
60+
| `rapids-singlecell-cu13` | 13.0 | 13.0+ | Native binaries |
61+
62+
Both wheels are available for **x86_64** and **aarch64** on Linux.
63+
64+
If you have a Blackwell GPU (B200, GB200) and want the best out-of-the-box performance, the CUDA 13 wheel includes native binaries for Blackwell architectures.
65+
The CUDA 12 wheel still supports Blackwell through PTX just-in-time compilation, so it will work, but the first kernel launch on Blackwell will be slightly slower while the driver JIT-compiles the PTX.
### How to install

#### Prebuilt wheel (recommended)

Pick the wheel that matches your CUDA version:

```bash
pip install rapids-singlecell-cu13  # CUDA 13
pip install rapids-singlecell-cu12  # CUDA 12
```

This installs rapids-singlecell with precompiled kernels, but does **not** pull in the RAPIDS stack (cupy, cuml, cudf, etc.).
If you manage those dependencies separately — for example, through conda — this is all you need.

#### Prebuilt wheel with RAPIDS dependencies

If you want pip to also install the matching RAPIDS and CuPy packages:

```bash
pip install 'rapids-singlecell-cu13[rapids]' --extra-index-url=https://pypi.nvidia.com
pip install 'rapids-singlecell-cu12[rapids]' --extra-index-url=https://pypi.nvidia.com
```

Note: on the prebuilt wheels, the dependency extra is always `[rapids]`.
The CUDA version is determined by which package name you install — `rapids-singlecell-cu12` or `rapids-singlecell-cu13`.
If you're building from source instead, the extras are `[rapids-cu12]` and `[rapids-cu13]`.

#### Conda / Mamba

Environment files are provided in the repository:

```bash
conda env create -f conda/rsc_rapids_26.04_cuda13.yml  # Python 3.14, CUDA 13
conda env create -f conda/rsc_rapids_26.04_cuda12.yml  # Python 3.14, CUDA 12
```

> **Note:** RAPIDS currently does not support `channel_priority: strict`. Use `channel_priority: flexible` instead.
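For example, assuming you set this in a user-level conda configuration file:

```yaml
# ~/.condarc — RAPIDS environments require flexible channel priority
channel_priority: flexible
```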
#### Docker / Apptainer

Pre-built containers are available for both CUDA versions:

```bash
docker pull ghcr.io/scverse/rapids-singlecell-cu13:latest
docker run --rm --gpus all ghcr.io/scverse/rapids-singlecell-cu13:latest
```

For HPC clusters using Apptainer/Singularity:

```bash
apptainer pull rsc.sif docker://ghcr.io/scverse/rapids-singlecell-cu13:latest
apptainer run --nv rsc.sif
```

### Migration from 0.14.x

For most users, upgrading is straightforward:

1. **Change your pip install command.**
   Replace `pip install rapids-singlecell` with `pip install rapids-singlecell-cu12` or `rapids-singlecell-cu13`, depending on your CUDA version.
2. **No code changes needed.**
   The `import rapids_singlecell as rsc` import and all public APIs remain the same.
3. **Check your CUDA version.**
   Run `nvidia-smi` or `nvcc --version` to confirm whether you're on CUDA 12.x or CUDA 13.x, and install the matching wheel.
   If you're using conda, make sure the CUDA runtime library version in your environment matches the wheel you install — e.g., `cuda-cudart` from the `nvidia` channel should be 12.x for the cu12 wheel or 13.x for the cu13 wheel.
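The decision in step 3 is simple enough to script. Here is a small sketch; the helper name `wheel_for_cuda` is ours for illustration, not part of the package:

```python
# Hypothetical helper mapping a CUDA version string (as reported by
# `nvidia-smi` or `nvcc --version`) to the matching wheel name.
def wheel_for_cuda(version: str) -> str:
    major = int(version.split(".")[0])
    if major >= 13:
        return "rapids-singlecell-cu13"
    if major == 12:
        return "rapids-singlecell-cu12"
    raise ValueError(f"no prebuilt wheel for CUDA {version}")

print(wheel_for_cuda("12.4"))  # rapids-singlecell-cu12
print(wheel_for_cuda("13.0"))  # rapids-singlecell-cu13
```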
### What about `pip install rapids-singlecell`?

The plain install — `pip install rapids-singlecell`, without the `-cu12` or `-cu13` suffix — still works.
It will compile the CUDA extensions from source during installation.
This is perfectly functional, but be aware of what it entails: you need a CUDA toolkit with nvcc, CMake ≥ 3.24, and a compatible C++ compiler already present in your environment, and the build will take longer than downloading a prebuilt wheel.

When building from source, you can install the matching RAPIDS dependencies with the `[rapids-cu12]` or `[rapids-cu13]` extra:

```bash
pip install 'rapids-singlecell[rapids-cu12]' --extra-index-url=https://pypi.nvidia.com
```

Or install the RAPIDS stack separately before or after the build.

For most users, we recommend the prebuilt CUDA wheels.
They're faster to install and don't require a local compiler toolchain.
For more details on source builds — including how to target custom GPU architectures — see the [installation docs](https://rapids-singlecell.readthedocs.io/en/latest/installation.html).

Source builds are the right choice if you are:

- **Contributing to rapids-singlecell** and need to iterate on C++ kernel code.
- **Debugging CUDA extensions** and want to compile with debug flags or sanitizers.
- **Targeting a custom GPU architecture** not covered by the prebuilt wheels (e.g. a future compute capability).
- **On a platform we don't publish wheels for** (though we cover x86_64 and aarch64 Linux).

If none of those apply to you, use the prebuilt wheel.
## Other highlights in 0.15.0

Beyond packaging, this release includes a substantial set of algorithmic and performance improvements built up across the 0.15.0 development cycle:

### Harmony2 and C++ harmony

Harmony was rewritten as a C++ nanobind kernel ([#578](https://github.com/scverse/rapids-singlecell/pull/578)), making it significantly faster and more memory-efficient.
On top of that, we implemented three algorithmic improvements from the Harmony2 paper (Patikas et al. 2026): a stabilized diversity penalty, dynamic per-cluster-per-batch ridge regularization, and automatic batch pruning to prevent overintegration in biologically heterogeneous datasets ([#625](https://github.com/scverse/rapids-singlecell/pull/625)).
This is also the first example of a more complex routine built on the new compiled-kernel infrastructure.

### Contrast-based energy distance

Perturbation experiments typically don't need a full k×k distance matrix between all groups — you want to compare each perturbation against one or two controls, possibly stratified by cell type.
The new `contrast_distances()` API ([#603](https://github.com/scverse/rapids-singlecell/pull/603)) lets you express exactly that.
You build a contrasts DataFrame — either with the `Distance.create_contrasts()` helper or by hand — where each row is a (target, reference) comparison, optionally stratified by `split_by` columns (e.g., cell type).
Under the hood, the kernel deduplicates shared distance pairs across contrasts, subsets the embedding to only the referenced cells before transferring to GPU, and launches a single kernel call for all unique pairs.
The result is a copy of your contrasts DataFrame with an `edistance` column appended.

```python
from rapids_singlecell.pertpy_gpu import Distance

dist = Distance("edistance")

# Compare each perturbation against two controls, stratified by cell type
contrasts = Distance.create_contrasts(
    adata,
    groupby="target_gene",
    selected_group=["Non_target", "Scramble"],
    split_by="cell_type",
)

result = dist.contrast_distances(adata, contrasts=contrasts)
```
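The pair-deduplication idea can be sketched in plain Python (illustrative only; the real implementation works on a contrasts DataFrame and runs the distance computation on the GPU):

```python
# Several contrast rows can request the same (target, reference)
# distance; compute each unique pair once, then map results back.
contrast_rows = [
    ("KLF1", "Non_target"),
    ("KLF1", "Scramble"),
    ("GATA1", "Non_target"),
    ("GATA1", "Scramble"),
    ("KLF1", "Non_target"),  # duplicate, e.g. from another split
]

unique_pairs = sorted(set(contrast_rows))  # 4 unique pairs, not 5

def pairwise_edistance(pair):
    # Stand-in for the single fused GPU kernel over all unique pairs.
    return float(len(pair[0]) + len(pair[1]))  # placeholder value

cache = {pair: pairwise_edistance(pair) for pair in unique_pairs}
edistances = [cache[pair] for pair in contrast_rows]  # one per row
```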
`onesided_distances()` also now accepts a sequence of control group names via `selected_group`, returning a DataFrame with one column per control ([#601](https://github.com/scverse/rapids-singlecell/pull/601)).
Both energy distance and co-occurrence kernels gained multi-GPU support ([#545](https://github.com/scverse/rapids-singlecell/pull/545), [#546](https://github.com/scverse/rapids-singlecell/pull/546)).

### More highlights

- **RAPIDS 26.04 and Python 3.14 support** across all CI and conda environments.
- **Dask support for `highly_variable_genes`** with the Seurat v3 flavor ([#616](https://github.com/scverse/rapids-singlecell/pull/616)).
- **CUDA kernel error surfacing** — launch errors are now raised instead of silently continuing ([#619](https://github.com/scverse/rapids-singlecell/pull/619)).
- **Additional tutorials**, such as a Pertpy-GPU tutorial ([#645](https://github.com/scverse/rapids-singlecell/pull/645)).

A big thank you to everyone who tested the pre-releases and helped surface issues before this release went out.

For questions and bug reports, visit the [GitHub issue tracker](https://github.com/scverse/rapids_singlecell/issues).

---

*rapids-singlecell is part of the [scverse](https://scverse.org) ecosystem.
If you use it in your research, please cite the project.*
