Returned svd #3847
for more information, see https://pre-commit.ci
```python
nbefore = int(ms_before * fs / 1000.0)
nafter = int(ms_after * fs / 1000.0)

sorting = np.zeros(valid_peaks.size, dtype=minimum_spike_dtype)
```
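The lines above convert a waveform window given in milliseconds into sample counts, and preallocate a spike array. A minimal self-contained sketch of the conversion (the `fs`, `ms_before`, and `ms_after` values here are made up for illustration, not taken from SC2):

```python
# Hedged sketch: converting a window in milliseconds into sample counts.
# fs is the sampling rate in Hz; values below are illustrative only.
fs = 30000.0
ms_before, ms_after = 2.0, 3.0

nbefore = int(ms_before * fs / 1000.0)  # samples kept before each peak
nafter = int(ms_after * fs / 1000.0)    # samples kept after each peak

print(nbefore, nafter)  # 60 90
```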
This looks good to me @samuelgarcia, and now, with such a PR, we can even use KS clustering internally in SC2. Keeping everything else constant, the performance is really good! Clustering is still the remaining component that should be optimized (graph_clustering is an attempt).
I'm getting very different results than with the older version, with default params, on the same data. Previous one : New one : I also get some warnings : However, if I lower the detection threshold this warning disappears, so I'm guessing it's due to too few spikes? I also get this one at the end, which I didn't get before, presumably when saving the analyzer?
Can you pull again and retry? I put back the circus clustering as a default (so you should not see graph clustering and face these warnings), but with SVD estimation of the templates. This is faster and, in my case, finds only hundreds of templates that do make sense, hopefully without memory saturation. You should not get only 26, or are we talking about the same data? Otherwise I'll check tomorrow, but this is clearly not the number I am getting, so something must be strange.
Yes, that was my bad; I got the most recent commits and it is indeed working better. With this PR, I get : Then an OOM error as it hits I then merged with #3856 and get : Then OOM again. Looking at the commits, it doesn't look like
Argh, yes, sorry. I'll make this optional and you should be good to go. Comment out the line for now; this will be exposed at a higher level. At least templates are computed from SVD in memory, and thus no more errors because of estimate_templates().
#3856 is a different issue, not for your own purpose. For low-density probes, it is a way to avoid template matching once templates are found, and instead to assign labels to all peaks given their SVD projections, because the clustering is performed only on a subset of the peaks. But this is only interesting for low-density probes, where template matching can sometimes do more harm than good. In your case (dense, lots of channels), you should not activate such a mode.
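As a rough illustration of that label-assignment idea (all names and shapes below are made up for this toy example, not SC2's actual code): once clustering on a subset has produced centroids in SVD space, every detected peak can be labeled by its nearest centroid instead of going through template matching:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy SVD projections: n_peaks peaks, each with n_features SVD components.
n_peaks, n_features, n_clusters = 200, 5, 3
projections = rng.normal(size=(n_peaks, n_features))

# Suppose clustering ran on a subset of peaks and produced these centroids.
centroids = rng.normal(size=(n_clusters, n_features))

# Assign every peak to the nearest centroid in SVD space.
distances = np.linalg.norm(projections[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)

print(labels.shape)  # (200,)
```

This is cheap and memory-light, but, as noted above, it skips the residual-based resolution of overlapping spikes that template matching provides, which is why it only makes sense on low-density probes.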
Now this should be good to go for you: remove_mixtures is disabled by default, so this should work on your side. Let me know.
Well played!
I think this broke some tests. |
Apologies for the delay; I was away and hadn't been able to try it out more thoroughly. I can process a 10-minute-long recording in about 18 hours with I had to set up a less-than-elegant solution of having a 500 GB swap file on an SSD for some other computing tasks, which might explain why things are slow. I'm currently trying again with lower Results look good with default parameters, but the quality metrics suggest some unit contamination. I'll play around with the clustering params to see if it can be improved.
This has been merged to main, but it's strange that you need such a large amount of swap. Nothing is written to disk, so I'm rather curious... Note also that torch is not really faster in the case of wobble and circus-omp-svd, so maybe this could also be the reason for the latencies. I'll investigate why the final merging is so long, because this should not be the case. How many units are found?
The size of the swap space is because of something unrelated; I think SC2 used at most 25-30 GB of it.
OK, then I'll run some tests with such a high number of units. If you update your branch from main, the memory consumption during matching should be reduced (if using circus-omp-svd), because of #3889, which I've just merged.
Tested a bit more; it doesn't make much of a difference with or without torch. I also sometimes get the following warning, especially, but not exclusively, if I reduce
Make an example, with SC2, of how one can avoid template estimation from raw data by using SVD components with graph-based clustering (or circus clustering). Note that there is also a slight refactoring of SC2 to make the code more component-friendly while writing the paper. With this version, there should be no memory problems with a very large number of templates and/or electrodes. This solves #3722
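The core trick the PR description refers to, estimating templates from SVD components rather than from raw traces, can be sketched as follows. Everything here (shapes, variable names, the shared temporal basis) is a simplified toy, not SC2's implementation: spikes are kept only as projections onto a small temporal SVD basis, cluster averages are computed in that compressed space, and templates are projected back to the time domain, so raw waveforms never need to be held in memory:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 500 spikes, 60-sample waveforms, 4 channels,
# compressed to 5 SVD components per channel.
n_spikes, n_samples, n_channels, n_components = 500, 60, 4, 5

# A shared orthonormal temporal basis, shape (n_components, n_samples).
basis, _ = np.linalg.qr(rng.normal(size=(n_samples, n_components)))
basis = basis.T

# Compressed spikes and their cluster labels (3 toy clusters).
projections = rng.normal(size=(n_spikes, n_components, n_channels))
labels = rng.integers(0, 3, size=n_spikes)

# Average in SVD space per cluster, then project back to the time domain.
templates = np.zeros((3, n_samples, n_channels))
for unit in range(3):
    mean_proj = projections[labels == unit].mean(axis=0)  # (n_components, n_channels)
    templates[unit] = basis.T @ mean_proj                 # (n_samples, n_channels)

print(templates.shape)  # (3, 60, 4)
```

Because the per-cluster averaging happens on `n_components` values per channel instead of `n_samples`, memory scales with the compressed size even for very large numbers of templates or electrodes, which is the point made above.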