Skip to content

Commit 797f0e8

Browse files
Julie-Fabrealejoe91chrishalcrowsamuelgarcia
authored
Bombcell integration and template metrics refactoring (#4306)
Co-authored-by: Alessio Buccino <alejoe9187@gmail.com> Co-authored-by: Chris Halcrow <57948917+chrishalcrow@users.noreply.github.com> Co-authored-by: Samuel Garcia <sam.garcia.die@gmail.com>
1 parent b0f7867 commit 797f0e8

59 files changed

Lines changed: 3477 additions & 647 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -141,6 +141,7 @@ examples/modules_gallery/**/*.zarr
141141

142142
# Files and folders generated during tests
143143
test_folder/
144+
*trained_pipeline/
144145

145146
# Mac OS
146147
.DS_Store

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,7 @@ Please [Star](https://github.com/SpikeInterface/spikeinterface/stargazers) the p
5050

5151
SpikeInterface is a Python package designed to unify preexisting spike sorting technologies into a single code base. If you use SpikeInterface, you are also using code and ideas from many other projects. Our codebase would be tiny without the amazing algorithms and formats that we interface with. See them all, and how to cite them, on our [references page](https://spikeinterface.readthedocs.io/en/latest/references.html). In the past year, we have added support for the following tools:
5252

53+
- Bombcell [Bombcell: automated curation and cell classification of spike-sorted electrophysiology data](https://doi.org/10.5281/zenodo.8172822>) ([docs](https://spikeinterface.readthedocs.io/en/stable/how_to/auto_label_units.html#bombcell))
5354
- SLAy. [SLAy-ing oversplitting errors in high-density electrophysiology spike sorting](https://www.biorxiv.org/content/10.1101/2025.06.20.660590v2) ([docs](https://spikeinterface.readthedocs.io/en/latest/modules/curation.html#auto-merging-units))
5455
- Lupin, Spykingcicus2 and Tridesclous2. [Opening the black box: a modular approach to spike sorting](https://www.biorxiv.org/content/10.64898/2026.01.23.701239v1) ([docs](https://spikeinterface.readthedocs.io/en/stable/modules/sorters.html#supported-spike-sorters))
5556
- rtsort. [RT-Sort: An action potential propagation-based algorithm for real time spike detection and sorting with millisecond latencies](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0312438) ([docs](https://spikeinterface.readthedocs.io/en/stable/modules/sorters.html#supported-spike-sorters))

doc/api.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -375,6 +375,8 @@ spikeinterface.curation
375375
.. autofunction:: remove_duplicated_spikes
376376
.. autofunction:: remove_excess_spikes
377377
.. autofunction:: threshold_metrics_label_units
378+
.. autofunction:: bombcell_label_units
379+
.. autofunction:: bombcell_get_default_thresholds
378380
.. autofunction:: model_based_label_units
379381
.. autofunction:: load_model
380382
.. autofunction:: train_model

doc/how_to/auto_label_units.rst

Lines changed: 331 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,331 @@
1+
Automatic labeling units after spike sorting
2+
============================================
3+
4+
This example shows how to automatically label units after spike sorting,
5+
using three different approaches:
6+
7+
1. Simple filter based on quality metrics
8+
2. Bombcell: heuristic approach to label units based on quality and
9+
template metrics [Fabre]\_
10+
3. UnitRefine: pre-trained classifiers to label units as noise or
11+
SUA/MUA [Jain]\_
12+
13+
.. code:: ipython3
14+
15+
import numpy as np
16+
17+
import spikeinterface as si
18+
import spikeinterface.curation as sc
19+
import spikeinterface.widgets as sw
20+
21+
from pprint import pprint
22+
23+
.. code:: ipython3
24+
25+
%matplotlib inline
26+
27+
.. code:: ipython3
28+
29+
analyzer_path = "/ssd980/working/analyzer_np2_single_shank.zarr"
30+
31+
.. code:: ipython3
32+
33+
sorting_analyzer = si.load(analyzer_path)
34+
35+
36+
.. code:: ipython3
37+
38+
sorting_analyzer
39+
40+
41+
42+
43+
.. parsed-literal::
44+
45+
SortingAnalyzer: 96 channels - 142 units - 1 segments - zarr - sparse - has recording
46+
Loaded 14 extensions: amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, quality_metrics, random_spikes, spike_amplitudes, spike_locations, template_metrics, template_similarity, templates, unit_locations, waveforms
47+
48+
49+
50+
The ``SortingAnalyzer`` includes several metrics that we can use for
51+
curation:
52+
53+
.. code:: ipython3
54+
55+
sorting_analyzer.get_metrics_extension_data().columns
56+
57+
58+
59+
60+
.. parsed-literal::
61+
62+
Index(['amplitude_cutoff', 'amplitude_cv_median', 'amplitude_cv_range',
63+
'amplitude_median', 'd_prime', 'drift_mad', 'drift_ptp', 'drift_std',
64+
'firing_range', 'firing_rate', 'isi_violations_count',
65+
'isi_violations_ratio', 'isolation_distance', 'l_ratio', 'nn_hit_rate',
66+
'nn_miss_rate', 'noise_cutoff', 'noise_ratio', 'num_spikes',
67+
'presence_ratio', 'rp_contamination', 'rp_violations', 'sd_ratio',
68+
'silhouette', 'sliding_rp_violation', 'snr', 'sync_spike_2',
69+
'sync_spike_4', 'sync_spike_8', 'exp_decay', 'half_width',
70+
'main_peak_to_trough_ratio', 'main_to_next_extremum_duration',
71+
'num_negative_peaks', 'num_positive_peaks',
72+
'peak_after_to_trough_ratio', 'peak_after_width',
73+
'peak_before_to_peak_after_ratio', 'peak_before_to_trough_ratio',
74+
'peak_before_width', 'peak_to_trough_duration', 'recovery_slope',
75+
'repolarization_slope', 'spread', 'trough_width', 'velocity_above',
76+
'velocity_below', 'waveform_baseline_flatness'],
77+
dtype='object')
78+
79+
80+
81+
1. Quality-metrics based curation
82+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
83+
84+
A simple solution is to use a filter based on quality metrics. To do so,
85+
we can use the ``spikeinterface.curation.qualitymetrics_label_units``
86+
function and provide a set of thresholds.
87+
88+
.. code:: ipython3
89+
90+
qm_thresholds = {
91+
"snr": {"min": 5},
92+
"firing_rate": {"min": 0.1, "max": 200},
93+
"rp_contamination": {"max": 0.5}
94+
}
95+
96+
.. code:: ipython3
97+
98+
all_metrics = sorting_analyzer.get_metrics_extension_data()
99+
qm_labels = sc.threshold_metrics_label_units(all_metrics, thresholds=qm_thresholds, column_name="simple_threshold")
100+
101+
.. code:: ipython3
102+
103+
qm_labels["simple_threshold"].value_counts()
104+
105+
106+
107+
108+
.. parsed-literal::
109+
110+
simple_threshold
111+
noise 115
112+
good 27
113+
Name: count, dtype: int64
114+
115+
116+
117+
.. code:: ipython3
118+
119+
w = sw.plot_unit_labels(sorting_analyzer, qm_labels["simple_threshold"], ylims=(-300, 100))
120+
w.figure.suptitle("Quality-metrics labeling")
121+
122+
123+
124+
125+
126+
.. image:: auto_label_units_files/auto_label_units_12_1.png
127+
128+
129+
Only 27 units are labeled as *good*, and we can see from the plots that
130+
some “noisy” waveforms are not properly flagged and some visually good
131+
waveforms are labeled as noise. Let’s take a look at more powerful
132+
methods.
133+
134+
We can also check the distribution of the metrics and the thresholds
135+
across all units:
136+
137+
.. code:: ipython3
138+
139+
_ = sw.plot_metric_histograms(sorting_analyzer, qm_thresholds, figsize=(12, 7))
140+
141+
142+
143+
.. image:: auto_label_units_files/auto_label_units_14_0.png
144+
145+
146+
1. Bombcell
147+
-----------
148+
149+
**Bombcell** ([Fabre]_) is another threshold-based method that also uses
150+
quality metrics and template metrics, but in a much more refined way! It
151+
can label units as ``noise``, ``mua``, and ``good`` and further detect
152+
``non-soma`` units. It comes with some default thresholds, but
153+
user-defined thresholds can be provided from a dictionary or a JSON
154+
file.
155+
156+
.. code:: ipython3
157+
158+
bombcell_default_thresholds = sc.bombcell_get_default_thresholds()
159+
pprint(bombcell_default_thresholds)
160+
161+
162+
.. parsed-literal::
163+
164+
{'mua': {'amplitude_cutoff': {'max': 0.2, 'min': None},
165+
'amplitude_median': {'max': None, 'min': 40},
166+
'drift_ptp': {'max': 100, 'min': None},
167+
'num_spikes': {'max': None, 'min': 300},
168+
'presence_ratio': {'max': None, 'min': 0.7},
169+
'rp_contamination': {'max': 0.1, 'min': None},
170+
'snr': {'max': None, 'min': 5}},
171+
'noise': {'exp_decay': {'max': 0.1, 'min': 0.01},
172+
'num_negative_peaks': {'max': 1, 'min': None},
173+
'num_positive_peaks': {'max': 2, 'min': None},
174+
'peak_after_to_trough_ratio': {'max': 0.8, 'min': None},
175+
'peak_to_trough_duration': {'max': 0.00115, 'min': 0.0001},
176+
'waveform_baseline_flatness': {'max': 0.5, 'min': None}},
177+
'non-somatic': {'main_peak_to_trough_ratio': {'max': 0.8, 'min': None},
178+
'peak_before_to_peak_after_ratio': {'max': 3, 'min': None},
179+
'peak_before_to_trough_ratio': {'max': 3, 'min': None},
180+
'peak_before_width': {'max': None, 'min': 0.00015},
181+
'trough_width': {'max': None, 'min': 0.0002}}}
182+
183+
184+
.. code:: ipython3
185+
186+
bombcell_labels = sc.bombcell_label_units(sorting_analyzer, thresholds=bombcell_default_thresholds, label_non_somatic=True, split_non_somatic_good_mua=True)
187+
188+
.. code:: ipython3
189+
190+
bombcell_labels["bombcell_label"].value_counts()
191+
192+
193+
194+
195+
.. parsed-literal::
196+
197+
bombcell_label
198+
mua 70
199+
noise 50
200+
good 21
201+
non_soma_mua 1
202+
Name: count, dtype: int64
203+
204+
205+
206+
.. code:: ipython3
207+
208+
w = sw.plot_unit_labels(sorting_analyzer, bombcell_labels["bombcell_label"], ylims=(-300, 100))
209+
w.figure.suptitle("Bombcell labeling")
210+
211+
212+
213+
214+
215+
.. image:: auto_label_units_files/auto_label_units_19_1.png
216+
217+
218+
Bombcell uses many more metrics!
219+
220+
.. code:: ipython3
221+
222+
_ = sw.plot_metric_histograms(sorting_analyzer, bombcell_default_thresholds, figsize=(15, 10))
223+
224+
225+
226+
.. image:: auto_label_units_files/auto_label_units_21_0.png
227+
228+
229+
Bombcell also provides a specific widget to inspect the failure mode of
230+
each labeling step. The *upset* plot shows the combination of metrics
231+
that cause a failure (e.g. “noise” labeling). The top panel shows how
232+
many units failed for that combination. For example, in the following
233+
plot, we see that 9 units were labeled as “noise” because they didn’t
234+
pass the ``num_positive_peaks`` and ``num_negative_peaks`` thresholds.
235+
19 units were labeled as “mua” for poor SNR and high refractory period
236+
contamination (``rp_contamination``).
237+
238+
.. code:: ipython3
239+
240+
_ = sw.plot_bombcell_labels_upset(sorting_analyzer, unit_labels=bombcell_labels["bombcell_label"], thresholds=bombcell_default_thresholds, unit_labels_to_plot=["noise", "mua"])
241+
242+
243+
244+
.. image:: auto_label_units_files/auto_label_units_23_0.png
245+
246+
247+
248+
.. image:: auto_label_units_files/auto_label_units_23_1.png
249+
250+
251+
UnitRefine
252+
----------
253+
254+
**UnitRefine** ([Jain]_) also uses quality and template metrics, but in
255+
a different way. It uses pre-trained classifiers to trained on
256+
hand-curated data. By default, the classification is performed in two
257+
steps: first a *noise*/*neural* classifier is applied, followed by a
258+
*sua*/*mua* classifier. Several models are available on the
259+
`SpikeInterface HuggingFace
260+
page <https://huggingface.co/SpikeInterface>`__.
261+
262+
.. code:: ipython3
263+
264+
unitrefine_labels = sc.unitrefine_label_units(
265+
sorting_analyzer,
266+
noise_neural_classifier="SpikeInterface/UnitRefine_noise_neural_classifier",
267+
sua_mua_classifier="SpikeInterface/UnitRefine_sua_mua_classifier",
268+
)
269+
270+
.. code:: ipython3
271+
272+
unitrefine_labels["unitrefine_label"].value_counts()
273+
274+
275+
276+
277+
.. parsed-literal::
278+
279+
unitrefine_label
280+
sua 62
281+
noise 47
282+
mua 33
283+
Name: count, dtype: int64
284+
285+
286+
287+
.. code:: ipython3
288+
289+
w = sw.plot_unit_labels(sorting_analyzer, unitrefine_labels["unitrefine_label"], ylims=(-300, 100))
290+
w.figure.suptitle("UnitRefine labeling")
291+
292+
293+
294+
295+
.. image:: auto_label_units_files/auto_label_units_27_1.png
296+
297+
298+
**NOTE:** If you want to train your own models, see the `UnitRefine
299+
repo <%60https://github.com/anoushkajain/UnitRefine%60>`__ for
300+
instructions!
301+
302+
This “How To” demonstrated how to automatically label units after spike
303+
sorting with different strategies. We recommend running **Bombcell** and
304+
**UnitRefine** as part of your pipeline. These methods will facilitate
305+
further curation and make downstream analysis cleaner.
306+
307+
To remove units from your ``SortingAnalyzer``, you can simply use the
308+
``select_units`` function:
309+
310+
Remove units from ``SortingAnalyzer``
311+
-------------------------------------
312+
313+
After auto-labeling, we can easily remove the “noise” units for
314+
downstream analysis:
315+
316+
.. code:: ipython3
317+
318+
non_noisy_units = bombcell_labels["bombcell_label"] != "noise"
319+
sorting_analyzer_clean = sorting_analyzer.select_units(sorting_analyzer.unit_ids[non_noisy_units])
320+
321+
.. code:: ipython3
322+
323+
sorting_analyzer_clean
324+
325+
326+
327+
328+
.. parsed-literal::
329+
330+
SortingAnalyzer: 96 channels - 92 units - 1 segments - memory - sparse - has recording
331+
Loaded 14 extensions: random_spikes, waveforms, templates, amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, spike_amplitudes, spike_locations, quality_metrics, template_metrics, template_similarity, unit_locations
73.7 KB
Loading
37.8 KB
Loading
67.4 KB
Loading
66.6 KB
Loading
117 KB
Loading
117 KB
Loading

0 commit comments

Comments
 (0)