Commit 24b116c

Docs: cover Gwilliams/Armeni/Schoffelen + preprocessing pipeline
Refresh user-facing docs for the post-refactor package: per-dataset pages for Gwilliams2022, Armeni2022, Schoffelen2019; new pages for the preprocessing pipeline (Pipeline, steps, config cascade) and the tasks module (TaskProtocol, per-dataset task classes); updated overview, quickstart, install, README, and TOC. API reference now autosums the new datasets, tasks, preprocessing module, and download/H5/standardisation mixins. Drops the stale constants_utils stub.
1 parent 0af5659 commit 24b116c

46 files changed

Lines changed: 2113 additions & 64 deletions

README.md

Lines changed: 41 additions & 4 deletions
@@ -2,12 +2,16 @@
 
 > The current primary use of the PNPL library is for the LibriBrain competition. [Click here](https://neural-processing-lab.github.io/2025-libribrain-competition/) to learn more and get started!
 
-Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. The package ships the LibriBrain 2025 dataset family plus shared preprocessing and task utilities.
+Welcome to PNPL — a Python toolkit for loading and processing brain
+datasets for deep learning. The package now ships four MEG dataset
+loaders (LibriBrain, MEG-MASC, Armeni 2022, MOUS) plus a composable
+preprocessing pipeline and shared task abstractions.
 
 ## Features
 - Friendly dataset APIs backed by real MEG recordings
-- Batteries‑included standardization, clipping, and windowing
-- LibriBrain 2025 dataset support with optional on‑demand download
+- Composable preprocessing pipeline (`bads+headpos+sss+notch+bp+ds`, etc.)
+- On-demand download from Hugging Face (LibriBrain), OSF (MEG-MASC), and Radboud WebDAV (Armeni, MOUS)
+- Task-based API: pick a task object, get `(x, y)` (or `(x, y, info)`) windows
 - Works with PyTorch `DataLoader` out of the box
 - Clean namespace and lazy imports to keep startup fast
 
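The `+`-delimited preprocessing strings added to the Features list (for example `bads+headpos+sss+notch+bp+ds`) read as an ordered chain of step names. As a minimal sketch of that reading, and not PNPL's actual parser, the hypothetical helper `parse_pipeline_spec` below splits such a string into its steps. The function name and its validation rule are assumptions made purely for illustration.

```python
def parse_pipeline_spec(spec: str) -> list[str]:
    """Split a '+'-delimited spec into an ordered list of step names.

    Hypothetical helper for illustration only; not part of the pnpl API.
    """
    steps = [step.strip() for step in spec.split("+")]
    # Reject specs such as "notch++ds" or "" that contain an empty step name.
    if any(not step for step in steps):
        raise ValueError(f"empty step name in pipeline spec: {spec!r}")
    return steps


# The two spec strings that appear in the README diff.
print(parse_pipeline_spec("bads+headpos+sss+notch+bp+ds"))
print(parse_pipeline_spec("notch+bp+ds"))
```

Keeping the steps as an ordered list preserves the left-to-right order in which the spec names them.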
@@ -44,11 +48,36 @@ speech_ds = LibriBrainSpeech(data_path="./data/LibriBrain", partition="train")
 phoneme_ds = LibriBrainPhoneme(data_path="./data/LibriBrain", partition="train")
 ```
 
+The same task-based pattern works for the other corpora:
+
+```python
+from pnpl.datasets import Gwilliams2022, Armeni2022, Schoffelen2019
+from pnpl.tasks.gwilliams2022 import PhonemeClassification
+
+meg_masc = Gwilliams2022(
+    data_path="./data/meg_masc",
+    task=PhonemeClassification(tmin=-0.2, tmax=0.6),
+    include_subjects=["01"], include_sessions=["0"], include_tasks=["0"],
+    preprocessing="notch+bp+ds",
+)
+```
+
 ## Included Datasets
-- `pnpl` includes the `libribrain2025` dataset family together with shared preprocessing and task utilities.
+
+| Class | Source | Auth |
+| --- | --- | --- |
+| `LibriBrain` (+ `LibriBrainSpeech`/`Phoneme`/`Word`/`Sentence`) | Hugging Face `pnpl/LibriBrain` | none |
+| `Gwilliams2022` (MEG-MASC) | OSF `ag3kj` | none |
+| `Armeni2022` | Radboud `DSC_3011085.05_995_v1` | Radboud credentials |
+| `Schoffelen2019` (MOUS) | Radboud `DSC_3011020.09_236_v1` | Radboud credentials |
+
+For the Radboud-hosted datasets, set `RADBOUD_USERNAME` and
+`RADBOUD_PASSWORD` (an approved data-sharing agreement is required
+before access is granted).
 
 ## Support
 In case of any questions or problems, please get in touch through [our Discord server](https://discord.gg/Fqr8gJnvSh).
+
 ## Quickstart
 
 Load a single run of the LibriBrain Speech dataset and iterate samples:
@@ -79,6 +108,14 @@ We publish documentation with Jupyter Book and GitHub Pages.
 - Local preview: `pip install -r docs/requirements.txt && jupyter-book build docs/` then open `docs/_build/html/index.html`.
 - GitHub Pages: when made public, enable Pages via repo settings to publish automatically from the existing workflow.
 
+The docs cover:
+
+- Per-dataset pages (`docs/libribrain.md`, `docs/gwilliams2022.md`,
+  `docs/armeni2022.md`, `docs/schoffelen2019.md`)
+- The preprocessing pipeline (`docs/preprocessing.md`) and tasks
+  (`docs/tasks.md`)
+- Tutorials for the LibriBrain competition tracks
+
 ## Contributing
 We welcome contributions from the community!
 
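The README changes above say the Radboud-hosted datasets expect `RADBOUD_USERNAME` and `RADBOUD_PASSWORD` to be set. A pre-flight check along the following lines could surface a missing variable before any download is attempted. This is only a sketch: `radboud_credentials` is a hypothetical helper, not a function the package is known to provide, and how PNPL itself reads these variables is not shown in this diff.

```python
import os
from typing import Mapping, Optional


def radboud_credentials(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (username, password) taken from the two Radboud variables.

    Hypothetical pre-flight check for illustration; not part of the pnpl API.
    """
    env = os.environ if env is None else env
    required = ("RADBOUD_USERNAME", "RADBOUD_PASSWORD")
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return env["RADBOUD_USERNAME"], env["RADBOUD_PASSWORD"]


# Example with an explicit mapping, so nothing is read from the real environment.
user, _password = radboud_credentials(
    {"RADBOUD_USERNAME": "alice", "RADBOUD_PASSWORD": "secret"}
)
print(user)
```

Passing the mapping explicitly keeps the check easy to test; calling it with no argument falls back to `os.environ`.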
docs/_toc.yml

Lines changed: 9 additions & 0 deletions
@@ -6,7 +6,16 @@ parts:
   - file: install
   - file: quickstart
   - file: datasets
+- caption: Datasets
+  chapters:
   - file: libribrain
+  - file: gwilliams2022
+  - file: armeni2022
+  - file: schoffelen2019
+- caption: Pipelines
+  chapters:
+  - file: preprocessing
+  - file: tasks
 - caption: Tutorials
   chapters:
   - file: LibriBrain_Competition_Speech_Detection.ipynb
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+pnpl.datasets.armeni2022.dataset.Armeni2022
+===========================================
+
+.. currentmodule:: pnpl.datasets.armeni2022.dataset
+
+.. autoclass:: Armeni2022
+
+
+   .. automethod:: __init__
+
+
+   .. rubric:: Methods
+
+   .. autosummary::
+
+      ~Armeni2022.__init__
+      ~Armeni2022.calculate_standardization_params
+      ~Armeni2022.clip_sample
+      ~Armeni2022.close_h5_files
+      ~Armeni2022.ensure_directory
+      ~Armeni2022.ensure_file
+      ~Armeni2022.get_bids_raw_path
+      ~Armeni2022.get_calibration_files
+      ~Armeni2022.get_derivatives_path
+      ~Armeni2022.get_events_path
+      ~Armeni2022.get_h5_dataset
+      ~Armeni2022.get_h5_path
+      ~Armeni2022.get_headpos_path
+      ~Armeni2022.get_meg_dir
+      ~Armeni2022.get_preprocessed_path
+      ~Armeni2022.get_sfreq_from_h5
+      ~Armeni2022.init_continuous_h5
+      ~Armeni2022.load_continuous_window
+      ~Armeni2022.load_continuous_window_from_sample
+      ~Armeni2022.load_head_positions
+      ~Armeni2022.load_preprocessed_bids
+      ~Armeni2022.load_raw_bids
+      ~Armeni2022.prefetch_files
+      ~Armeni2022.raw_bids_exists
+      ~Armeni2022.resolve_remote_file
+      ~Armeni2022.setup_standardization
+      ~Armeni2022.standardize
+
+
+
+
+
+   .. rubric:: Attributes
+
+   .. autosummary::
+
+      ~Armeni2022.RADBOUD_DATASET_URL
+      ~Armeni2022.RADBOUD_PASSWORD_ENV
+      ~Armeni2022.RADBOUD_USERNAME_ENV
+      ~Armeni2022.broadcasted_means
+      ~Armeni2022.broadcasted_stds
+      ~Armeni2022.channel_means
+      ~Armeni2022.channel_stds
+      ~Armeni2022.label_info
+      ~Armeni2022.n_channels
+      ~Armeni2022.n_times
+
+
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+pnpl.datasets.gwilliams2022.dataset.Gwilliams2022
+=================================================
+
+.. currentmodule:: pnpl.datasets.gwilliams2022.dataset
+
+.. autoclass:: Gwilliams2022
+
+
+   .. automethod:: __init__
+
+
+   .. rubric:: Methods
+
+   .. autosummary::
+
+      ~Gwilliams2022.__init__
+      ~Gwilliams2022.calculate_standardization_params
+      ~Gwilliams2022.clip_sample
+      ~Gwilliams2022.close_h5_files
+      ~Gwilliams2022.ensure_file
+      ~Gwilliams2022.get_bids_raw_path
+      ~Gwilliams2022.get_calibration_files
+      ~Gwilliams2022.get_dataset_manifest
+      ~Gwilliams2022.get_derivatives_path
+      ~Gwilliams2022.get_elp_path
+      ~Gwilliams2022.get_events_path
+      ~Gwilliams2022.get_h5_dataset
+      ~Gwilliams2022.get_h5_path
+      ~Gwilliams2022.get_headpos_path
+      ~Gwilliams2022.get_hsp_path
+      ~Gwilliams2022.get_markers_path
+      ~Gwilliams2022.get_meg_dir
+      ~Gwilliams2022.get_preprocessed_path
+      ~Gwilliams2022.get_sfreq_from_h5
+      ~Gwilliams2022.init_continuous_h5
+      ~Gwilliams2022.list_remote_files
+      ~Gwilliams2022.load_continuous_window
+      ~Gwilliams2022.load_continuous_window_from_sample
+      ~Gwilliams2022.load_head_positions
+      ~Gwilliams2022.load_preprocessed_bids
+      ~Gwilliams2022.load_raw_bids
+      ~Gwilliams2022.prefetch_files
+      ~Gwilliams2022.raw_bids_exists
+      ~Gwilliams2022.resolve_remote_file
+      ~Gwilliams2022.setup_standardization
+      ~Gwilliams2022.standardize
+
+
+
+
+
+   .. rubric:: Attributes
+
+   .. autosummary::
+
+      ~Gwilliams2022.OSF_API_BASE
+      ~Gwilliams2022.OSF_FILES_BASE
+      ~Gwilliams2022.OSF_PROJECT_FALLBACKS
+      ~Gwilliams2022.OSF_PROJECT_ID
+      ~Gwilliams2022.OSF_TOKEN_ENV
+      ~Gwilliams2022.broadcasted_means
+      ~Gwilliams2022.broadcasted_stds
+      ~Gwilliams2022.channel_means
+      ~Gwilliams2022.channel_stds
+      ~Gwilliams2022.label_info
+      ~Gwilliams2022.n_channels
+      ~Gwilliams2022.n_times
+
+
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+pnpl.datasets.libribrain2025.compat.LibriBrainPhoneme
+=====================================================
+
+.. currentmodule:: pnpl.datasets.libribrain2025.compat
+
+.. autoclass:: LibriBrainPhoneme
+
+
+   .. automethod:: __init__
+
+
+   .. rubric:: Methods
+
+   .. autosummary::
+
+      ~LibriBrainPhoneme.__init__
+      ~LibriBrainPhoneme.calculate_standardization_params
+      ~LibriBrainPhoneme.clip_sample
+      ~LibriBrainPhoneme.close_h5_files
+      ~LibriBrainPhoneme.ensure_file
+      ~LibriBrainPhoneme.ensure_file_download
+      ~LibriBrainPhoneme.get_bids_raw_path
+      ~LibriBrainPhoneme.get_calibration_files
+      ~LibriBrainPhoneme.get_derivatives_path
+      ~LibriBrainPhoneme.get_events_path
+      ~LibriBrainPhoneme.get_h5_dataset
+      ~LibriBrainPhoneme.get_h5_path
+      ~LibriBrainPhoneme.get_headpos_path
+      ~LibriBrainPhoneme.get_preprocessed_path
+      ~LibriBrainPhoneme.get_sfreq_from_h5
+      ~LibriBrainPhoneme.init_continuous_h5
+      ~LibriBrainPhoneme.load_continuous_window
+      ~LibriBrainPhoneme.load_continuous_window_from_sample
+      ~LibriBrainPhoneme.load_head_positions
+      ~LibriBrainPhoneme.load_preprocessed_bids
+      ~LibriBrainPhoneme.load_raw_bids
+      ~LibriBrainPhoneme.prefetch_files
+      ~LibriBrainPhoneme.raw_bids_exists
+      ~LibriBrainPhoneme.setup_standardization
+      ~LibriBrainPhoneme.standardize
+
+
+
+
+
+   .. rubric:: Attributes
+
+   .. autosummary::
+
+      ~LibriBrainPhoneme.HUGGINGFACE_FALLBACK_REPOS
+      ~LibriBrainPhoneme.HUGGINGFACE_REPO
+      ~LibriBrainPhoneme.broadcasted_means
+      ~LibriBrainPhoneme.broadcasted_stds
+      ~LibriBrainPhoneme.channel_means
+      ~LibriBrainPhoneme.channel_stds
+      ~LibriBrainPhoneme.label_info
+      ~LibriBrainPhoneme.n_channels
+      ~LibriBrainPhoneme.n_times
+
+
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+pnpl.datasets.libribrain2025.compat.LibriBrainSpeech
+====================================================
+
+.. currentmodule:: pnpl.datasets.libribrain2025.compat
+
+.. autoclass:: LibriBrainSpeech
+
+
+   .. automethod:: __init__
+
+
+   .. rubric:: Methods
+
+   .. autosummary::
+
+      ~LibriBrainSpeech.__init__
+      ~LibriBrainSpeech.calculate_standardization_params
+      ~LibriBrainSpeech.clip_sample
+      ~LibriBrainSpeech.close_h5_files
+      ~LibriBrainSpeech.ensure_file
+      ~LibriBrainSpeech.ensure_file_download
+      ~LibriBrainSpeech.get_bids_raw_path
+      ~LibriBrainSpeech.get_calibration_files
+      ~LibriBrainSpeech.get_derivatives_path
+      ~LibriBrainSpeech.get_events_path
+      ~LibriBrainSpeech.get_h5_dataset
+      ~LibriBrainSpeech.get_h5_path
+      ~LibriBrainSpeech.get_headpos_path
+      ~LibriBrainSpeech.get_preprocessed_path
+      ~LibriBrainSpeech.get_sfreq_from_h5
+      ~LibriBrainSpeech.init_continuous_h5
+      ~LibriBrainSpeech.load_continuous_window
+      ~LibriBrainSpeech.load_continuous_window_from_sample
+      ~LibriBrainSpeech.load_head_positions
+      ~LibriBrainSpeech.load_preprocessed_bids
+      ~LibriBrainSpeech.load_raw_bids
+      ~LibriBrainSpeech.prefetch_files
+      ~LibriBrainSpeech.raw_bids_exists
+      ~LibriBrainSpeech.setup_standardization
+      ~LibriBrainSpeech.standardize
+
+
+
+
+
+   .. rubric:: Attributes
+
+   .. autosummary::
+
+      ~LibriBrainSpeech.HUGGINGFACE_FALLBACK_REPOS
+      ~LibriBrainSpeech.HUGGINGFACE_REPO
+      ~LibriBrainSpeech.broadcasted_means
+      ~LibriBrainSpeech.broadcasted_stds
+      ~LibriBrainSpeech.channel_means
+      ~LibriBrainSpeech.channel_stds
+      ~LibriBrainSpeech.label_info
+      ~LibriBrainSpeech.n_channels
+      ~LibriBrainSpeech.n_times
+
+
