Add PreprocessingPipeline by chrishalcrow · Pull Request #3438 · SpikeInterface/spikeinterface

chrishalcrow · 2024-09-25T10:13:57Z

Add a PreprocessingPipeline class, which contains ordered preprocessing steps and their kwargs in a dictionary.

Docs:
https://spikeinterface--3438.org.readthedocs.build/en/3438/modules/preprocessing.html#the-preprocessing-pipeline
https://spikeinterface--3438.org.readthedocs.build/en/3438/how_to/build_pipeline_with_dicts.html

You can use apply_preprocessing_pipeline on a recording to make a preprocessed recording:

preprocessor_dict = {'bandpass_filter': {'freq_max': 3000}, 'common_reference': {}}

from spikeinterface.preprocessing import apply_preprocessing_pipeline
preprocessed_recording = apply_preprocessing_pipeline(recording, preprocessor_dict)

Under the hood, this uses the _apply method of the new PreprocessingPipeline.
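As a rough illustration of the idea (not the PR's actual implementation), a dict-driven pipeline can be sketched in plain Python: walk the dict in insertion order and call the function registered under each key with its kwargs. The step functions, the TOY_STEPS registry and apply_pipeline_sketch are all made up for this example and operate on plain lists rather than recordings.

```python
# Toy sketch only: these names are hypothetical and are not part of spikeinterface.

def add_offset(samples, offset=0.0):
    """Add a constant to every sample."""
    return [s + offset for s in samples]


def scale(samples, factor=1.0):
    """Multiply every sample by a constant."""
    return [s * factor for s in samples]


# Registry mapping step names to the functions that implement them.
TOY_STEPS = {"add_offset": add_offset, "scale": scale}


def apply_pipeline_sketch(samples, pipeline_dict):
    """Apply each step in the order the keys were inserted into the dict."""
    for step_name, step_kwargs in pipeline_dict.items():
        if step_name not in TOY_STEPS:
            raise ValueError(f"Unknown step: {step_name}")
        samples = TOY_STEPS[step_name](samples, **step_kwargs)
    return samples


offset_then_scale = apply_pipeline_sketch(
    [1.0, 2.0, 3.0], {"add_offset": {"offset": 1.0}, "scale": {"factor": 2.0}}
)
scale_then_offset = apply_pipeline_sketch(
    [1.0, 2.0, 3.0], {"scale": {"factor": 2.0}, "add_offset": {"offset": 1.0}}
)
print(offset_then_scale)  # [4.0, 6.0, 8.0]
print(scale_then_offset)  # [3.0, 5.0, 7.0]
```

The two calls use the same steps and kwargs but give different results, which is why the order of the keys matters.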

Also adds a function which takes in a provenance.json, provenance.pkl, recording.json or recording.pkl provenance file and makes a preprocessor_dict. So it's easy to extract preprocessing steps from a saved recording.

from spikeinterface.preprocessing import get_preprocessing_dict_from_provenance
my_dict = get_preprocessing_dict_from_provenance('/path/to/provenance.json')

You can also get it from an analyzer, using the get_preprocessing_dict_from_analyzer function.

After you load this, you can either apply the precomputable_kwargs or ignore them and compute on application:

import spikeinterface.full as si  # assumes the usual `si` alias

# this will apply the precomputed stuff, like the `M` and `W` matrices from whitening:
pp_rec = si.apply_preprocessing_pipeline(rec, my_dict, apply_precomputed_kwargs=True)
# this will ignore the precomputed stuff, and recompute the kwargs on application:
pp_rec = si.apply_preprocessing_pipeline(rec, my_dict, apply_precomputed_kwargs=False)

This PR allows for some cool things:

Users can pass a single dictionary to construct a preprocessed recording (as above). This completes the “dictionary workflow”, since you can already use dicts for sorting (in run_sorter) and for postprocessing (in compute).
Users can easily visualise their preprocessing pipeline using the repr, including an HTML repr in Jupyter notebooks.
Increases portability between labs, and should make giving advice to users easier (from us, and from spike sorting developers), since we can just say "Oh, for KS4 NP2.0 we use this dict for preprocessing".
Increases the usefulness of our provenance system, since you can reconstruct human-readable preprocessing steps from the provenance.json file without the original recording (and worrying about paths).

The repr currently looks like this:

zm711

I guess I'm out of the loop on this. So I'm not sure how helpful my review of code would be. I would prefer to hear what Alessio and Sam think to feel safe. I think my only concern is that dictionaries since 3.7 maintain order of insertion, but since preprocessing steps are ordered (doing filtering then whitening is not the same as whitening then filtering) I would feel more comfortable if we had some way to guarantee the order rather than relying on dictionary implementation details since they've changed once before. Right?

zm711 · 2025-06-13T15:08:02Z

+            provenance_dict = pickle.load(f)
+
+    pipeline_dict_from_provenance = {}
+    _load_pp_from_dict(provenance_dict, pipeline_dict_from_provenance)


shouldn't this return something?

This _load_pp_from_dict is a gross recursive function since our provenance json/pkl are gross recursive dicts. Due to this structure, it returns the kwargs of the recursive step it's at. So it does return something, but the important bit of the function is that it modifies pipeline_dict_from_provenance. It does return something, so maybe it's best practice to put that returned value somewhere (which I've done). I've also significantly updated the _load_pp_from_dict docstring and variable names, which should hopefully make the purpose of the function clearer.
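To make the pattern described above concrete, here is a minimal sketch of a recursive walk that fills an accumulator dict as a side effect and returns the kwargs of the node it is currently at. It is not the PR's _load_pp_from_dict: the function name collect_steps, the "name" key and the toy_provenance structure are assumptions made up for illustration, and real provenance files contain more fields.

```python
# Toy sketch only: collect_steps and toy_provenance are hypothetical.

def collect_steps(node, pipeline_dict):
    """Walk a nested recording dict, recording each step in pipeline_dict.

    Returns the kwargs of the current node (minus its parent recording);
    the useful result is the in-place modification of pipeline_dict.
    """
    kwargs = dict(node.get("kwargs", {}))
    parent = kwargs.pop("recording", None)
    if isinstance(parent, dict):
        # Visit the parent first so the earliest step is inserted first.
        collect_steps(parent, pipeline_dict)
        # Only nodes that wrap a parent recording count as steps here.
        pipeline_dict[node["name"]] = kwargs
    return kwargs


toy_provenance = {
    "name": "common_reference",
    "kwargs": {
        "operator": "median",
        "recording": {
            "name": "bandpass_filter",
            "kwargs": {
                "freq_min": 300,
                "freq_max": 3000,
                "recording": {
                    "name": "raw_extractor",
                    "kwargs": {"file_path": "toy.bin"},
                },
            },
        },
    },
}

pipeline = {}
returned_kwargs = collect_steps(toy_provenance, pipeline)
print(list(pipeline))    # ['bandpass_filter', 'common_reference']
print(returned_kwargs)   # {'operator': 'median'}
```

The return value is only the outermost node's kwargs, which is why calling such a function for its return value alone would be misleading.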

chrishalcrow · 2025-06-16T07:15:32Z

I guess I'm out of the loop on this. So I'm not sure how helpful my review of code would be. I would prefer to hear what Alessio and Sam think to feel safe. I think my only concern is that dictionaries since 3.7 maintain order of insertion, but since preprocessing steps are ordered (doing filtering then whitening is not the same as whitening then filtering) I would feel more comfortable if we had some way to guarantee the order rather than relying on dictionary implementation details since they've changed once before. Right?

It was incredibly helpful - thanks!!

They made the change in 3.6, and from 3.7 it was added to the language spec ("Changed in version 3.7: Dictionary order is guaranteed to be insertion order." from the bottom of https://docs.python.org/3/library/stdtypes.html#dict), as ruled by Guido (https://mail.python.org/pipermail/python-dev/2017-December/151283.html). So I think it's pretty safe.

Not against implementing OrderedDicts under the hood, but I'd prefer to allow the user to pass a plain dict, to avoid them having to from collections import OrderedDict. If we used OrderedDicts in the codebase, it would be pretty easy to update if Python broke dictionary ordering. But I'm personally in favour of keeping it the way it is. The tests will (sometimes) fail if ordering is removed.
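For reference, a small sketch of the behaviour under discussion, using only the standard library. Plain dicts keep insertion order, but dict equality ignores order, so any check that two pipelines match has to compare the ordered items (or use OrderedDict) if order is meant to count. The variable names are made up for the example.

```python
from collections import OrderedDict

filter_then_whiten = {"bandpass_filter": {"freq_max": 3000}, "whiten": {}}
whiten_then_filter = {"whiten": {}, "bandpass_filter": {"freq_max": 3000}}

# Iteration follows insertion order (guaranteed by the language since Python 3.7).
keys_in_order = list(filter_then_whiten)

# Plain dict equality does not look at order...
equal_as_dicts = filter_then_whiten == whiten_then_filter

# ...whereas OrderedDict equality and item-list comparison both do.
equal_as_ordered = OrderedDict(filter_then_whiten) == OrderedDict(whiten_then_filter)
equal_item_lists = list(filter_then_whiten.items()) == list(whiten_then_filter.items())

print(keys_in_order)      # ['bandpass_filter', 'whiten']
print(equal_as_dicts)     # True
print(equal_as_ordered)   # False
print(equal_item_lists)   # False
```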

alejoe91

Thanks for this amazing work Chris! My main point is to make the functions more generally available to recordings, not just provenance files.

alejoe91 · 2025-06-16T14:35:24Z

+    preprocessing_pipeline = PreprocessingPipeline(recording, preprocessing_dict)
+    # to view the pipeline:
+    preprocessing_pipeline
+


You could paste html here, right?

alejoe91 · 2025-06-16T14:38:54Z

+    return preprocessed_recording
+
+
+def get_preprocessing_dict_from_provenance(recording_provenance_path):


I don't think this needs to be provenance, since you could pass it any json/pkl recording file. What about get_preprocessing_pipeline_from_file?

Yup, sounds good.

Note: I prefer to return the dict rather than a PreprocessingPipeline itself, since this allows the user to then use apply_pipeline(recording, pipeline_dict) and to share it easily. But I could be convinced otherwise!

alejoe91 · 2025-06-16T14:39:18Z

+    return pipeline_dict
+
+
+def _load_pp_from_dict(prov_dict, kwargs_dict):


I would make this public. It could be helpful

I'm against this, mostly because this function's main purpose is to modify kwargs_dict, so it doesn't return what you would naively expect. It's like this to make the recursive loading work. In this last round of refactoring, I've added a _make_pipeline_dict_from_recording_dict - maybe that could be made public? Do you really think it's helpful?

chrishalcrow · 2025-06-17T13:12:48Z

Tests failing due to #3990

alejoe91 · 2025-06-17T17:24:40Z

LGTM! Thanks Chris this is going to be supeeeer helpful!

zm711 · 2025-06-17T17:28:34Z

Not required I guess, but we could fix both of them once tests pass so that we don't forget to fix in the future.

chrishalcrow · 2025-06-18T08:38:58Z

Just fixed a couple of lil doc things - happy once tests pass :)

zm711

I didn't reread the code, but at least from the documentation side this looks good to me too

zm711 · 2025-06-18T13:47:18Z

+
+
+pp_names_to_functions = {preprocessor.__name__: preprocessor for preprocessor in preprocessor_dict.values()}
+pp_names_to_classes = {pp_function.__name__: pp_class for pp_class, pp_function in _all_preprocesser_dict.items()}


As an aside if we are actually going to use this private dict then I probably shouldn't have left the typo as an homage to history :). We should consider fixing that in another PR....
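The two comprehensions in the diff above build lookup tables keyed by function name. A self-contained sketch of the same pattern, with toy classes and functions standing in for the real preprocessors (every name below is hypothetical and none of it comes from spikeinterface):

```python
# Toy sketch only: stand-ins for preprocessor classes and their wrapper functions.

class ToyFilterRecording:
    def __init__(self, parent, **kwargs):
        self.parent = parent
        self.kwargs = kwargs


class ToyReferenceRecording:
    def __init__(self, parent, **kwargs):
        self.parent = parent
        self.kwargs = kwargs


def toy_filter(parent, **kwargs):
    return ToyFilterRecording(parent, **kwargs)


def toy_reference(parent, **kwargs):
    return ToyReferenceRecording(parent, **kwargs)


# Class -> function mapping, analogous in shape to the private dict in the diff.
toy_class_to_function = {
    ToyFilterRecording: toy_filter,
    ToyReferenceRecording: toy_reference,
}

# Function name -> function, and function name -> class.
toy_names_to_functions = {f.__name__: f for f in toy_class_to_function.values()}
toy_names_to_classes = {f.__name__: c for c, f in toy_class_to_function.items()}

# Look a step up by its string name, as a dict-based pipeline would need to.
step = toy_names_to_functions["toy_filter"]("raw", freq_max=3000)
print(type(step).__name__)  # ToyFilterRecording
```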

alejoe91 · 2025-06-18T15:20:13Z

Ok to merge for me. @samuelgarcia ok?

chrishalcrow added 2 commits September 25, 2024 11:06

add PreprocessingPipeline

d7bb297

Merge branch 'main' into preprocessing-pipeline

d0e74f7

chrishalcrow added enhancement New feature or request preprocessing Related to preprocessing module labels Sep 25, 2024

alejoe91 modified the milestone: 0.101.2 Oct 1, 2024

chrishalcrow and others added 4 commits December 6, 2024 15:57

add motion correct and nice html repr

8252f8f

add preprocessing names_to_funcitons dict

8436eb2

delete pp_name_to_function

5c19765

Merge branch 'main' into preprocessing-pipeline

5e202c3

chrishalcrow mentioned this pull request Feb 18, 2025

Add parents in HTML representation and always print class name #3700

Merged

chrishalcrow and others added 7 commits February 25, 2025 09:26

Unifty rerp with Extractors

36fa35a

Merge branch 'main' into preprocessing-pipeline

26eac1f

refactor

95de48f

add future

a3070fd

add first test

3b13d57

add tests and docs

5bcca60

test and doc improvements

2cc22a4

chrishalcrow mentioned this pull request Jun 3, 2025

Remove classes from extractor and preprocessing __init__ #3898

Merged

alejoe91 mentioned this pull request Jun 6, 2025

Save the preprocessing pipelines for simple reuse in curation GUI #1103

Open

alejoe91 added this to the 0.103.0 milestone Jun 11, 2025

chrishalcrow mentioned this pull request Jun 12, 2025

Add DetectAndRemoveBadChannelsRecording and DetectAndInterpolateBadChannelsRecording classes #3685

Merged

chrishalcrow and others added 8 commits June 12, 2025 10:01

add PreprocessingPipeline

37accac

add motion correct and nice html repr

88b88e9

add preprocessing names_to_funcitons dict

288b886

delete pp_name_to_function

4dc0df4

Unifty rerp with Extractors

4576153

refactor

7d87bc3

add future

5fef8c8

add first test

e5946c8

zm711 reviewed Jun 13, 2025

respond to review

373acc4

alejoe91 requested changes Jun 16, 2025

chrishalcrow and others added 6 commits June 17, 2025 14:37

respond to review

3c98db3

Update src/spikeinterface/preprocessing/pipeline.py

cee6485

Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>

Update src/spikeinterface/preprocessing/pipeline.py

7a64053

Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>

remove provenance name

13d5c09

Merge branch 'preprocessing-pipeline' of https://github.com/chrishalc…

e4df0f7

…row/spikeinterface into preprocessing-pipeline

Merge branch 'main' into preprocessing-pipeline

a22274a

alejoe91 approved these changes Jun 17, 2025

Merge branch 'main' into preprocessing-pipeline

abdb810

zm711 reviewed Jun 17, 2025

Comment thread examples/how_to/build_pipeline_with_dicts.py Outdated

zm711 reviewed Jun 17, 2025

Comment thread doc/how_to/build_pipeline_with_dicts.rst Outdated

alejoe91 and others added 4 commits June 18, 2025 09:29

Apply suggestions from code review

e78d677

Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>

Fix link in tutorials custom index

35792a7

Merge branch 'preprocessing-pipeline' of https://github.com/chrishalc…

361756c

…row/spikeinterface into preprocessing-pipeline

fix raw html

7452278

zm711 approved these changes Jun 18, 2025

chrishalcrow and others added 3 commits June 19, 2025 10:28

apply_preprocessing_pipeline and BaseRecording

e8a9cf4

oups

bd12490

Merge branch 'main' into preprocessing-pipeline

a9fcb94

alejoe91 approved these changes Jun 19, 2025

alejoe91 merged commit 172ba18 into SpikeInterface:main Jun 19, 2025
15 checks passed

chrishalcrow deleted the preprocessing-pipeline branch July 30, 2025 10:35
