Add PreprocessingPipeline#3438
Conversation
zm711
left a comment
I guess I'm out of the loop on this. So I'm not sure how helpful my review of code would be. I would prefer to hear what Alessio and Sam think to feel safe. I think my only concern is that dictionaries since 3.7 maintain order of insertion, but since preprocessing steps are ordered (doing filtering then whitening is not the same as whitening then filtering) I would feel more comfortable if we had some way to guarantee the order rather than relying on dictionary implementation details since they've changed once before. Right?
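For illustration only (this sketch is not from the PR), the ordering guarantee under discussion: since Python 3.7 the language spec guarantees that plain dicts preserve insertion order, which is what an ordered pipeline dict would rely on.

```python
# Hypothetical pipeline dict: step names map to their kwargs.
# Order matters: bandpass-filter-then-whiten is not whiten-then-bandpass-filter.
pipeline_steps = {}
pipeline_steps["bandpass_filter"] = {"freq_min": 300, "freq_max": 6000}
pipeline_steps["whiten"] = {"mode": "global"}

# Since Python 3.7, dict iteration order is guaranteed to be insertion order.
step_order = list(pipeline_steps)
print(step_order)  # ['bandpass_filter', 'whiten']
```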
```python
provenance_dict = pickle.load(f)

pipeline_dict_from_provenance = {}
_load_pp_from_dict(provenance_dict, pipeline_dict_from_provenance)
```
This `_load_pp_from_dict` is a gross recursive function since our provenance json/pkl files are gross recursive dicts. Due to this structure, it returns the kwargs of the recursive step it's at, but the important bit is that it modifies `pipeline_dict_from_provenance` in place. Since it does return something, it's probably best practice to put that returned value somewhere (which I've done). I've also significantly updated the `_load_pp_from_dict` docstring and variable names, which should hopefully make the purpose of the function clearer.
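As a rough sketch of the pattern described above (the nesting layout and key names here are simplified assumptions for illustration, not the exact SpikeInterface provenance format): the function recurses into the parent recording first so steps land in application order, mutates the output dict, and its return value is incidental.

```python
# Hypothetical nested provenance dict: each step's kwargs contain the
# parent recording's dict under a "recording" key (an assumed layout).
def _load_pp_from_dict(prov_dict, pipeline_dict):
    kwargs = dict(prov_dict["kwargs"])
    parent = kwargs.pop("recording", None)
    if parent is not None:
        # Recurse first so earlier steps are inserted before later ones.
        _load_pp_from_dict(parent, pipeline_dict)
    pipeline_dict[prov_dict["class"]] = kwargs
    # The return value is incidental; the important effect is the
    # in-place update of pipeline_dict.
    return kwargs

nested = {
    "class": "whiten",
    "kwargs": {
        "mode": "global",
        "recording": {
            "class": "bandpass_filter",
            "kwargs": {"freq_min": 300, "freq_max": 6000},
        },
    },
}
pipeline = {}
_load_pp_from_dict(nested, pipeline)
print(list(pipeline))  # ['bandpass_filter', 'whiten']
```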
It was incredibly helpful - thanks!! They made the change in 3.6, and from 3.7 it was added to the language spec ("Changed in version 3.7: Dictionary order is guaranteed to be insertion order." from the bottom of https://docs.python.org/3/library/stdtypes.html#dict), as ruled by Guido (https://mail.python.org/pipermail/python-dev/2017-December/151283.html). So I think it's pretty safe. Not against implementing
alejoe91
left a comment
Thanks for this amazing work Chris! My main point is to make the functions more generally available to recordings, not just provenance files.
```python
preprocessing_pipeline = PreprocessingPipeline(recording, preprocessing_dict)
# to view the pipeline:
preprocessing_pipeline
```
You could paste html here, right?
```python
    return preprocessed_recording


def get_preprocessing_dict_from_provenance(recording_provenance_path):
```
I don't think this needs to be provenance, since you could pass it any json/pkl recording file. What about `get_preprocessing_pipeline_from_file`?
Yup, sounds good.
Note: I prefer to return the dict rather than a `PreprocessingPipeline` itself, since this allows the user to then use `apply_pipeline(recording, pipeline_dict)` and to share it easily. But I could be convinced otherwise!
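A toy sketch of why a plain dict is easy to share (the `apply_pipeline` and step functions here are hypothetical stand-ins, not the real SpikeInterface API): the dict survives a JSON round-trip, which a class instance would not.

```python
import json

# Toy stand-ins for preprocessing functions (hypothetical, for illustration).
def bandpass_filter(recording, freq_min, freq_max):
    return recording + [f"bandpass_filter({freq_min}-{freq_max})"]

def whiten(recording, mode):
    return recording + [f"whiten({mode})"]

_PP_FUNCTIONS = {"bandpass_filter": bandpass_filter, "whiten": whiten}

def apply_pipeline(recording, pipeline_dict):
    # Apply each step in insertion order.
    for name, kwargs in pipeline_dict.items():
        recording = _PP_FUNCTIONS[name](recording, **kwargs)
    return recording

pipeline_dict = {
    "bandpass_filter": {"freq_min": 300, "freq_max": 6000},
    "whiten": {"mode": "global"},
}

# A plain dict is trivially shareable, e.g. via a JSON round-trip.
shared = json.loads(json.dumps(pipeline_dict))
result = apply_pipeline([], shared)
print(result)  # ['bandpass_filter(300-6000)', 'whiten(global)']
```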
```python
    return pipeline_dict


def _load_pp_from_dict(prov_dict, kwargs_dict):
```
I would make this public. It could be helpful
I'm against this, mostly because this function's main purpose is to modify `kwargs_dict`, so it doesn't return what you would naively expect. It's like this to make the recursive loading work. In this last round of refactoring, I've added a `_make_pipeline_dict_from_recording_dict` - maybe that could be made public?? Do you really think it's helpful?
Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>
…row/spikeinterface into preprocessing-pipeline
Tests failing due to #3990
LGTM! Thanks Chris, this is going to be supeeeer helpful!
Not required I guess, but we could fix both of them once tests pass so that we don't forget to fix in the future.
Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>
…row/spikeinterface into preprocessing-pipeline
Just fixed a couple of lil doc things - happy once tests pass :)
zm711
left a comment
I didn't reread the code, but at least from the documentation side this looks good to me too
```python
pp_names_to_functions = {preprocessor.__name__: preprocessor for preprocessor in preprocessor_dict.values()}
pp_names_to_classes = {pp_function.__name__: pp_class for pp_class, pp_function in _all_preprocesser_dict.items()}
```
As an aside, if we are actually going to use this private dict then I probably shouldn't have left the typo as an homage to history :). We should consider fixing that in another PR....
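In isolation, the quoted lookup pattern does this (toy functions here, not the real preprocessor registry): build a name-keyed dict from the callables' `__name__` attributes so steps can be looked up by name.

```python
# Toy preprocessing functions standing in for the real registry entries.
def bandpass_filter(recording):
    return recording

def whiten(recording):
    return recording

# Map function names to the functions themselves, as in the quoted snippet.
preprocessor_dict = {"step_0": bandpass_filter, "step_1": whiten}
pp_names_to_functions = {f.__name__: f for f in preprocessor_dict.values()}
print(sorted(pp_names_to_functions))  # ['bandpass_filter', 'whiten']
```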
Ok to merge for me. @samuelgarcia ok?
Add a `PreprocessingPipeline` class, which contains ordered preprocessing steps and their kwargs in a dictionary.

Docs:

- https://spikeinterface--3438.org.readthedocs.build/en/3438/modules/preprocessing.html#the-preprocessing-pipeline
- https://spikeinterface--3438.org.readthedocs.build/en/3438/how_to/build_pipeline_with_dicts.html

You can `apply_preprocessing_pipeline` to a recording to make a preprocessed recording. Under the hood, this uses the `_apply` method of the new `PreprocessingPipeline`.

Also adds a function which takes in a `provenance.json`, `provenance.pkl`, `recording.json` or `recording.pkl` provenance file and makes a `preprocessor_dict`, so it's easy to extract preprocessing steps from a saved recording. Can also get it from an analyzer, using the `get_preprocessing_dict_from_analyzer` function.

After you load this, you can either apply the `precomputable_kwargs` or ignore them and compute on application.

The PR allows for some cool things:

- … `run_sorter`, and postprocessing in `compute`.
- … a `provenance.json` file without the original recording (and worrying about paths).

The repr currently looks like this:

![PP_repr_screenshot](https://private-user-images.githubusercontent.com)