|
| 1 | +Automatic labeling units after spike sorting |
| 2 | +============================================ |
| 3 | + |
| 4 | +This example shows how to automatically label units after spike sorting, |
| 5 | +using three different approaches: |
| 6 | + |
| 7 | +1. Simple filter based on quality metrics |
| 8 | +2. Bombcell: heuristic approach to label units based on quality and |
| 9 | + template metrics [Fabre]\_ |
| 10 | +3. UnitRefine: pre-trained classifiers to label units as noise or |
| 11 | + SUA/MUA [Jain]\_ |
| 12 | + |
| 13 | +.. code:: ipython3 |
| 14 | +
|
| 15 | + import numpy as np |
| 16 | +
|
| 17 | + import spikeinterface as si |
| 18 | + import spikeinterface.curation as sc |
| 19 | + import spikeinterface.widgets as sw |
| 20 | +
|
| 21 | + from pprint import pprint |
| 22 | +
|
| 23 | +.. code:: ipython3 |
| 24 | +
|
| 25 | + %matplotlib inline |
| 26 | +
|
| 27 | +.. code:: ipython3 |
| 28 | +
|
| 29 | + analyzer_path = "/ssd980/working/analyzer_np2_single_shank.zarr" |
| 30 | +
|
| 31 | +.. code:: ipython3 |
| 32 | +
|
| 33 | + sorting_analyzer = si.load(analyzer_path) |
| 34 | +
|
| 35 | +
|
| 36 | +.. code:: ipython3 |
| 37 | +
|
| 38 | + sorting_analyzer |
| 39 | +
|
| 40 | +
|
| 41 | +
|
| 42 | +
|
| 43 | +.. parsed-literal:: |
| 44 | +
|
| 45 | + SortingAnalyzer: 96 channels - 142 units - 1 segments - zarr - sparse - has recording |
| 46 | + Loaded 14 extensions: amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, quality_metrics, random_spikes, spike_amplitudes, spike_locations, template_metrics, template_similarity, templates, unit_locations, waveforms |
| 47 | +
|
| 48 | +
|
| 49 | +
|
| 50 | +The ``SortingAnalyzer`` includes several metrics that we can use for |
| 51 | +curation: |
| 52 | + |
| 53 | +.. code:: ipython3 |
| 54 | +
|
| 55 | + sorting_analyzer.get_metrics_extension_data().columns |
| 56 | +
|
| 57 | +
|
| 58 | +
|
| 59 | +
|
| 60 | +.. parsed-literal:: |
| 61 | +
|
| 62 | + Index(['amplitude_cutoff', 'amplitude_cv_median', 'amplitude_cv_range', |
| 63 | + 'amplitude_median', 'd_prime', 'drift_mad', 'drift_ptp', 'drift_std', |
| 64 | + 'firing_range', 'firing_rate', 'isi_violations_count', |
| 65 | + 'isi_violations_ratio', 'isolation_distance', 'l_ratio', 'nn_hit_rate', |
| 66 | + 'nn_miss_rate', 'noise_cutoff', 'noise_ratio', 'num_spikes', |
| 67 | + 'presence_ratio', 'rp_contamination', 'rp_violations', 'sd_ratio', |
| 68 | + 'silhouette', 'sliding_rp_violation', 'snr', 'sync_spike_2', |
| 69 | + 'sync_spike_4', 'sync_spike_8', 'exp_decay', 'half_width', |
| 70 | + 'main_peak_to_trough_ratio', 'main_to_next_extremum_duration', |
| 71 | + 'num_negative_peaks', 'num_positive_peaks', |
| 72 | + 'peak_after_to_trough_ratio', 'peak_after_width', |
| 73 | + 'peak_before_to_peak_after_ratio', 'peak_before_to_trough_ratio', |
| 74 | + 'peak_before_width', 'peak_to_trough_duration', 'recovery_slope', |
| 75 | + 'repolarization_slope', 'spread', 'trough_width', 'velocity_above', |
| 76 | + 'velocity_below', 'waveform_baseline_flatness'], |
| 77 | + dtype='object') |
| 78 | +
|
| 79 | +
|
| 80 | +
|
| 81 | +1. Quality-metrics based curation |
| 82 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 83 | + |
| 84 | +A simple solution is to use a filter based on quality metrics. To do so, |
| 85 | +we can use the ``spikeinterface.curation.qualitymetrics_label_units`` |
| 86 | +function and provide a set of thresholds. |
| 87 | + |
| 88 | +.. code:: ipython3 |
| 89 | +
|
| 90 | + qm_thresholds = { |
| 91 | + "snr": {"min": 5}, |
| 92 | + "firing_rate": {"min": 0.1, "max": 200}, |
| 93 | + "rp_contamination": {"max": 0.5} |
| 94 | + } |
| 95 | +
|
| 96 | +.. code:: ipython3 |
| 97 | +
|
| 98 | + all_metrics = sorting_analyzer.get_metrics_extension_data() |
| 99 | + qm_labels = sc.threshold_metrics_label_units(all_metrics, thresholds=qm_thresholds, column_name="simple_threshold") |
| 100 | +
|
| 101 | +.. code:: ipython3 |
| 102 | +
|
| 103 | + qm_labels["simple_threshold"].value_counts() |
| 104 | +
|
| 105 | +
|
| 106 | +
|
| 107 | +
|
| 108 | +.. parsed-literal:: |
| 109 | +
|
| 110 | + simple_threshold |
| 111 | + noise 115 |
| 112 | + good 27 |
| 113 | + Name: count, dtype: int64 |
| 114 | +
|
| 115 | +
|
| 116 | +
|
| 117 | +.. code:: ipython3 |
| 118 | +
|
| 119 | + w = sw.plot_unit_labels(sorting_analyzer, qm_labels["simple_threshold"], ylims=(-300, 100)) |
| 120 | + w.figure.suptitle("Quality-metrics labeling") |
| 121 | +
|
| 122 | +
|
| 123 | +
|
| 124 | +
|
| 125 | +
|
| 126 | +.. image:: auto_label_units_files/auto_label_units_12_1.png |
| 127 | + |
| 128 | + |
| 129 | +Only 27 units are labeled as *good*, and we can see from the plots that |
| 130 | +some “noisy” waveforms are not properly flagged and some visually good |
| 131 | +waveforms are labeled as noise. Let’s take a look at more powerful |
| 132 | +methods. |
| 133 | + |
| 134 | +We can also check the distribution of the metrics and the thresholds |
| 135 | +across all units: |
| 136 | + |
| 137 | +.. code:: ipython3 |
| 138 | +
|
| 139 | + _ = sw.plot_metric_histograms(sorting_analyzer, qm_thresholds, figsize=(12, 7)) |
| 140 | +
|
| 141 | +
|
| 142 | +
|
| 143 | +.. image:: auto_label_units_files/auto_label_units_14_0.png |
| 144 | + |
| 145 | + |
| 146 | +1. Bombcell |
| 147 | +----------- |
| 148 | + |
| 149 | +**Bombcell** ([Fabre]_) is another threshold-based method that also uses |
| 150 | +quality metrics and template metrics, but in a much more refined way! It |
| 151 | +can label units as ``noise``, ``mua``, and ``good`` and further detect |
| 152 | +``non-soma`` units. It comes with some default thresholds, but |
| 153 | +user-defined thresholds can be provided from a dictionary or a JSON |
| 154 | +file. |
| 155 | + |
| 156 | +.. code:: ipython3 |
| 157 | +
|
| 158 | + bombcell_default_thresholds = sc.bombcell_get_default_thresholds() |
| 159 | + pprint(bombcell_default_thresholds) |
| 160 | +
|
| 161 | +
|
| 162 | +.. parsed-literal:: |
| 163 | +
|
| 164 | + {'mua': {'amplitude_cutoff': {'max': 0.2, 'min': None}, |
| 165 | + 'amplitude_median': {'max': None, 'min': 40}, |
| 166 | + 'drift_ptp': {'max': 100, 'min': None}, |
| 167 | + 'num_spikes': {'max': None, 'min': 300}, |
| 168 | + 'presence_ratio': {'max': None, 'min': 0.7}, |
| 169 | + 'rp_contamination': {'max': 0.1, 'min': None}, |
| 170 | + 'snr': {'max': None, 'min': 5}}, |
| 171 | + 'noise': {'exp_decay': {'max': 0.1, 'min': 0.01}, |
| 172 | + 'num_negative_peaks': {'max': 1, 'min': None}, |
| 173 | + 'num_positive_peaks': {'max': 2, 'min': None}, |
| 174 | + 'peak_after_to_trough_ratio': {'max': 0.8, 'min': None}, |
| 175 | + 'peak_to_trough_duration': {'max': 0.00115, 'min': 0.0001}, |
| 176 | + 'waveform_baseline_flatness': {'max': 0.5, 'min': None}}, |
| 177 | + 'non-somatic': {'main_peak_to_trough_ratio': {'max': 0.8, 'min': None}, |
| 178 | + 'peak_before_to_peak_after_ratio': {'max': 3, 'min': None}, |
| 179 | + 'peak_before_to_trough_ratio': {'max': 3, 'min': None}, |
| 180 | + 'peak_before_width': {'max': None, 'min': 0.00015}, |
| 181 | + 'trough_width': {'max': None, 'min': 0.0002}}} |
| 182 | +
|
| 183 | +
|
| 184 | +.. code:: ipython3 |
| 185 | +
|
| 186 | + bombcell_labels = sc.bombcell_label_units(sorting_analyzer, thresholds=bombcell_default_thresholds, label_non_somatic=True, split_non_somatic_good_mua=True) |
| 187 | +
|
| 188 | +.. code:: ipython3 |
| 189 | +
|
| 190 | + bombcell_labels["bombcell_label"].value_counts() |
| 191 | +
|
| 192 | +
|
| 193 | +
|
| 194 | +
|
| 195 | +.. parsed-literal:: |
| 196 | +
|
| 197 | + bombcell_label |
| 198 | + mua 70 |
| 199 | + noise 50 |
| 200 | + good 21 |
| 201 | + non_soma_mua 1 |
| 202 | + Name: count, dtype: int64 |
| 203 | +
|
| 204 | +
|
| 205 | +
|
| 206 | +.. code:: ipython3 |
| 207 | +
|
| 208 | + w = sw.plot_unit_labels(sorting_analyzer, bombcell_labels["bombcell_label"], ylims=(-300, 100)) |
| 209 | + w.figure.suptitle("Bombcell labeling") |
| 210 | +
|
| 211 | +
|
| 212 | +
|
| 213 | +
|
| 214 | +
|
| 215 | +.. image:: auto_label_units_files/auto_label_units_19_1.png |
| 216 | + |
| 217 | + |
| 218 | +Bombcell uses many more metrics! |
| 219 | + |
| 220 | +.. code:: ipython3 |
| 221 | +
|
| 222 | + _ = sw.plot_metric_histograms(sorting_analyzer, bombcell_default_thresholds, figsize=(15, 10)) |
| 223 | +
|
| 224 | +
|
| 225 | +
|
| 226 | +.. image:: auto_label_units_files/auto_label_units_21_0.png |
| 227 | + |
| 228 | + |
| 229 | +Bombcell also provides a specific widget to inspect the failure mode of |
| 230 | +each labeling step. The *upset* plot shows the combination of metrics |
| 231 | +that cause a failure (e.g. “noise” labeling). The top panel shows how |
| 232 | +many units failed for that combination. For example, in the following |
| 233 | +plot, we see that 9 units were labeled as “noise” because they didn’t |
| 234 | +pass the ``num_positive_peaks`` and ``num_negative_peaks`` thresholds. |
| 235 | +19 units were labeled as “mua” for poor SNR and high refractory period |
| 236 | +contamination (``rp_contamination``). |
| 237 | + |
| 238 | +.. code:: ipython3 |
| 239 | +
|
| 240 | + _ = sw.plot_bombcell_labels_upset(sorting_analyzer, unit_labels=bombcell_labels["bombcell_label"], thresholds=bombcell_default_thresholds, unit_labels_to_plot=["noise", "mua"]) |
| 241 | +
|
| 242 | +
|
| 243 | +
|
| 244 | +.. image:: auto_label_units_files/auto_label_units_23_0.png |
| 245 | + |
| 246 | + |
| 247 | + |
| 248 | +.. image:: auto_label_units_files/auto_label_units_23_1.png |
| 249 | + |
| 250 | + |
| 251 | +UnitRefine |
| 252 | +---------- |
| 253 | + |
| 254 | +**UnitRefine** ([Jain]_) also uses quality and template metrics, but in |
| 255 | +a different way. It uses pre-trained classifiers to trained on |
| 256 | +hand-curated data. By default, the classification is performed in two |
| 257 | +steps: first a *noise*/*neural* classifier is applied, followed by a |
| 258 | +*sua*/*mua* classifier. Several models are available on the |
| 259 | +`SpikeInterface HuggingFace |
| 260 | +page <https://huggingface.co/SpikeInterface>`__. |
| 261 | + |
| 262 | +.. code:: ipython3 |
| 263 | +
|
| 264 | + unitrefine_labels = sc.unitrefine_label_units( |
| 265 | + sorting_analyzer, |
| 266 | + noise_neural_classifier="SpikeInterface/UnitRefine_noise_neural_classifier", |
| 267 | + sua_mua_classifier="SpikeInterface/UnitRefine_sua_mua_classifier", |
| 268 | + ) |
| 269 | +
|
| 270 | +.. code:: ipython3 |
| 271 | +
|
| 272 | + unitrefine_labels["unitrefine_label"].value_counts() |
| 273 | +
|
| 274 | +
|
| 275 | +
|
| 276 | +
|
| 277 | +.. parsed-literal:: |
| 278 | +
|
| 279 | + unitrefine_label |
| 280 | + sua 62 |
| 281 | + noise 47 |
| 282 | + mua 33 |
| 283 | + Name: count, dtype: int64 |
| 284 | +
|
| 285 | +
|
| 286 | +
|
| 287 | +.. code:: ipython3 |
| 288 | +
|
| 289 | + w = sw.plot_unit_labels(sorting_analyzer, unitrefine_labels["unitrefine_label"], ylims=(-300, 100)) |
| 290 | + w.figure.suptitle("UnitRefine labeling") |
| 291 | +
|
| 292 | +
|
| 293 | +
|
| 294 | +
|
| 295 | +.. image:: auto_label_units_files/auto_label_units_27_1.png |
| 296 | + |
| 297 | + |
| 298 | + **NOTE:** If you want to train your own models, see the `UnitRefine |
| 299 | + repo <%60https://github.com/anoushkajain/UnitRefine%60>`__ for |
| 300 | + instructions! |
| 301 | + |
| 302 | +This “How To” demonstrated how to automatically label units after spike |
| 303 | +sorting with different strategies. We recommend running **Bombcell** and |
| 304 | +**UnitRefine** as part of your pipeline. These methods will facilitate |
| 305 | +further curation and make downstream analysis cleaner. |
| 306 | + |
| 307 | +To remove units from your ``SortingAnalyzer``, you can simply use the |
| 308 | +``select_units`` function: |
| 309 | + |
| 310 | +Remove units from ``SortingAnalyzer`` |
| 311 | +------------------------------------- |
| 312 | + |
| 313 | +After auto-labeling, we can easily remove the “noise” units for |
| 314 | +downstream analysis: |
| 315 | + |
| 316 | +.. code:: ipython3 |
| 317 | +
|
| 318 | + non_noisy_units = bombcell_labels["bombcell_label"] != "noise" |
| 319 | + sorting_analyzer_clean = sorting_analyzer.select_units(sorting_analyzer.unit_ids[non_noisy_units]) |
| 320 | +
|
| 321 | +.. code:: ipython3 |
| 322 | +
|
| 323 | + sorting_analyzer_clean |
| 324 | +
|
| 325 | +
|
| 326 | +
|
| 327 | +
|
| 328 | +.. parsed-literal:: |
| 329 | +
|
| 330 | + SortingAnalyzer: 96 channels - 92 units - 1 segments - memory - sparse - has recording |
| 331 | + Loaded 14 extensions: random_spikes, waveforms, templates, amplitude_scalings, correlograms, isi_histograms, noise_levels, principal_components, spike_amplitudes, spike_locations, quality_metrics, template_metrics, template_similarity, unit_locations |
0 commit comments