
Returned svd #3847

Merged

samuelgarcia merged 26 commits into SpikeInterface:main from yger:returned_svd on Apr 16, 2025
Conversation

@yger (Collaborator) commented Apr 8, 2025

This adds an example, with SC2, of how to avoid template estimation from raw data by using the SVD components together with graph-based clustering (or circus clustering). Note that there is also a slight refactoring of SC2, done while writing the paper, to make the code more component-friendly. With this version there should be no memory problem even with a very large number of templates and/or electrodes. This solves #3722.
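
As a quick illustration, here is a minimal sketch of running SC2 after this change (the synthetic recording is a stand-in for real data; everything beyond the sorter name uses the defaults):

```python
# Minimal sketch (not the PR's test code): run SC2, which now builds its
# templates from the SVD components computed during clustering instead of
# re-estimating them from the raw traces.
import spikeinterface.full as si

# Synthetic stand-in for a real recording
recording, gt_sorting = si.generate_ground_truth_recording(
    durations=[60.0], num_channels=32, seed=42
)

sorting = si.run_sorter("spykingcircus2", recording, folder="sc2_svd_example", verbose=True)
print(sorting)
```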

@yger added the enhancement (New feature or request) and sortingcomponents (Related to sortingcomponents module) labels on Apr 8, 2025
Comment thread on src/spikeinterface/sortingcomponents/clustering/circus.py and src/spikeinterface/sortingcomponents/clustering/graph_clustering.py
# Convert the waveform window from milliseconds to samples
nbefore = int(ms_before * fs / 1000.0)
nafter = int(ms_after * fs / 1000.0)

# Allocate one entry per validated peak in the spike vector
sorting = np.zeros(valid_peaks.size, dtype=minimum_spike_dtype)
@yger (Collaborator, Author) replied: done

SpikeInterface deleted a comment from samuelgarcia on Apr 9, 2025
@yger (Collaborator, Author) commented Apr 9, 2025

This looks good to me @samuelgarcia, and now, with this PR, we can even use KS clustering internally in SC2. Keeping everything else constant, the performance is really good! Clustering is still the remaining component that should be optimized (graph_clustering is an attempt).

@b-grimaud (Contributor) commented:

I'm getting very different results compared to the older version, with default params on the same data.

Previous one:

Found 24348 spikes
Kept 1220 units after final merging
spykingcircus2 run time 31510.87s

New one:

Found 8092 spikes                         
Kept 26 units after final merging 
spykingcircus2 run time 648.18s

I also get some warnings:

Build distance graph over channels:  99%|██████████████████████████████████████████████████████▌| 4062/4095 [00:13<00:00, 305.77it/s]
/home/user/.conda/envs/spike/lib/python3.12/site-packages/sklearn/decomposition/_truncated_svd.py:275: RuntimeWarning: invalid value encountered in divide
  self.explained_variance_ratio_ = exp_var / full_var

However, if I lower the detection threshold this warning disappears, so I'm guessing it's due to too few spikes?
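
For what it's worth, that division by zero is easy to reproduce when the matrix handed to TruncatedSVD has no variance, which is what happens when almost no spikes are collected (a minimal sketch, assuming this mirrors the cause):

```python
# Minimal reproduction of the warning above: with a (near-)empty waveform
# matrix the total variance is 0, so explained_variance_ratio_ computes 0/0
# and numpy emits "invalid value encountered in divide".
import numpy as np
from sklearn.decomposition import TruncatedSVD

X = np.zeros((3, 10))  # stands in for a waveform matrix built from too few spikes
TruncatedSVD(n_components=2).fit(X)
```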

I also get this one at the end, which I didn't before, presumably when saving the analyzer?

/home/user/Documents/GitHub/spikeinterface/src/spikeinterface/core/sortinganalyzer.py:431: UserWarning: The Recording is not serializable! The recording link will be lost for future load
  warnings.warn("The Recording is not serializable! The recording link will be lost for future load")

@yger (Collaborator, Author) commented Apr 9, 2025

Can you pull again and retry? I put back the circus clustering as the default (so you should not see graph clustering and face those warnings), but with SVD estimation of the templates. This is faster, and in my case it finds only hundreds of templates that do make sense, hopefully without memory saturation. You should not get only 26 units; or are we talking about the same data? Otherwise I'll check tomorrow, but this is clearly not the number I am getting, so something must be strange.

@b-grimaud (Contributor) commented:

Yes, that was my bad; I pulled the most recent commits and it is indeed working better.

With this PR, I get:

Kept 138396 peaks for clustering
Found 3508 raw clusters, starting to clean with matching

Then an OOM error as it hits remove_duplicates_via_matching.

I then merged with #3856 and get:

Kept 304231 peaks for clustering
Found 4079 raw clusters, starting to clean with matching

Then OOM again.

Looking at the commits, it doesn't look like remove_duplicates_via_matching is behind any condition, so I guess it's still running?

@yger (Collaborator, Author) commented Apr 10, 2025

Ah yes, sorry. I'll add an option to make this step optional and then you should be good to go. Comment out the line for now; it will be exposed at a higher level. At least templates are now computed from the SVD in memory, so no more errors because of estimate_templates().

@yger (Collaborator, Author) commented Apr 10, 2025

#3856 is a different issue, not for your use case. It is a way, for low-density probes, to avoid template matching once templates are found, and instead to assign labels to all peaks from their SVD projections, since the clustering is performed only on a subset of the peaks. This is only interesting for low-density probes, where template matching can sometimes do more harm than good. In your case (dense, lots of channels), you should not activate such a mode.
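
In other words, something like the following sketch (not SC2's actual code; KMeans stands in for the real clustering method):

```python
# Sketch of the #3856 idea: cluster only a subset of peaks, then label every
# peak via its nearest centroid in SVD space, skipping template matching.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
svd_all = rng.normal(size=(10_000, 5))            # SVD projections of all peaks
subset = rng.choice(len(svd_all), 1_000, replace=False)

labels_subset = KMeans(n_clusters=8, n_init="auto").fit_predict(svd_all[subset])

# Centroids estimated from the clustered subset
centroids = np.stack(
    [svd_all[subset][labels_subset == k].mean(axis=0) for k in range(8)]
)

# Assign all peaks to their nearest centroid instead of template matching
dists = np.linalg.norm(svd_all[:, None, :] - centroids[None, :, :], axis=2)
labels_all = dists.argmin(axis=1)
```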

@yger (Collaborator, Author) commented Apr 10, 2025

Now this is good to go for you: remove_mixtures is disabled by default, so this should work on your side. Let me know.
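
For anyone following along, you can check how the flag is exposed in your installed version (the exact nesting of remove_mixtures inside the params dict may differ):

```python
# Sketch: inspect SC2's default parameters to locate the new cleaning flag;
# the exact key layout may vary across spikeinterface versions.
import spikeinterface.full as si

params = si.get_default_sorter_params("spykingcircus2")
print(params)  # look for the remove_mixtures / cleaning options
```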

@samuelgarcia (Member) commented:

Well done!

@samuelgarcia merged commit f4b908c into SpikeInterface:main on Apr 16, 2025
15 checks passed
@h-mayorquin (Collaborator) commented:

I think this broke some tests.

@b-grimaud (Contributor) commented:

Apologies for the delay; I was away and hadn't been able to try it out more thoroughly.

I can process a 10-minute-long recording in about 18 hours with torch and ~30 jobs. The slowest step seems to be the final merging, which is still pretty memory intensive.

I had to set up a less-than-elegant solution, a 500 GB swap file on an SSD, for some other computing tasks, which might explain why things are slow. I'm currently trying again with a lower n_jobs to see if I can avoid swapping and actually run things faster, but at some point there may be diminishing returns.

Results look good with default parameters, but the quality metrics suggest some unit contamination. I'll play around with the clustering params to see if it can be improved.
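
In case it's useful, this is roughly how the contamination can be checked (a sketch; `recording` and `sorting` below are synthetic placeholders for the actual run):

```python
# Sketch: contamination-related quality metrics on a sorting result.
import spikeinterface.full as si

# Placeholders for the real recording and SC2 output discussed above
recording, sorting = si.generate_ground_truth_recording(durations=[10.0], seed=0)

analyzer = si.create_sorting_analyzer(sorting, recording)
analyzer.compute(["random_spikes", "waveforms", "templates", "noise_levels"])
analyzer.compute("quality_metrics", metric_names=["snr", "isi_violation"])
print(analyzer.get_extension("quality_metrics").get_data())
```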

@yger (Collaborator, Author) commented Apr 30, 2025

This has been merged to main, but it is strange that you need such a large amount of swap. Nothing is written to disk, so I'm rather curious... Note also that torch is not really faster in the case of wobble and circus-omp-svd, so maybe this also explains the latencies. I'll investigate why the final merging is so long, because this should not be the case. How many units are found?

@b-grimaud (Contributor) commented:

The size of the swap space is due to something unrelated; I think SC2 used at most 25-30 GB of it.
I'll try without torch and see if it makes a difference. I also just noticed a warning at the beginning: the full recording is around 90 GB, so it cannot be fully preloaded in RAM. I'll try adjusting the preprocessing memory limit just enough.
The final unit count is a little under 5000.

@yger (Collaborator, Author) commented Apr 30, 2025

OK, then I'll run some tests with such a high number of units. If you update your branch from main, the memory consumption during matching should now be reduced (if using circus-omp-svd), thanks to #3889, which I've just merged.

@b-grimaud (Contributor) commented:

Tested a bit more; it doesn't make much of a difference with or without torch.
I couldn't monitor the memory usage for the entire duration of the process, but the fix seems to help quite a bit.

I also sometimes get the following warning, especially, but not exclusively, if I reduce radius_um:

UserWarning: Units (5731, 5789, 5790) can not be merged with the current sparsity_overlap. Merging is skipped
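
That warning comes from soft merging of sparse units whose channel masks overlap less than the sparsity_overlap threshold; a hypothetical sketch of forcing such a merge by lowering it (`analyzer` is a placeholder for an existing SortingAnalyzer, and the threshold value is illustrative):

```python
# Hypothetical sketch: the merge is skipped because the units' sparse channel
# masks overlap less than `sparsity_overlap`; lowering the threshold lets the
# soft merge proceed. `analyzer` is a placeholder for a SortingAnalyzer.
merged = analyzer.merge_units(
    merge_unit_groups=[[5731, 5789, 5790]],
    sparsity_overlap=0.25,  # illustrative; the default is stricter
)
```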
