The current primary use of the PNPL library is for the LibriBrain competition. Click here to learn more and get started!
Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. The package now ships four MEG dataset loaders (LibriBrain, MEG-MASC, Armeni 2022, MOUS) plus a composable preprocessing pipeline and shared task abstractions.
- Friendly dataset APIs backed by real MEG recordings
- Composable preprocessing pipeline (`bads+headpos+sss+notch+bp+ds`, etc.)
- On-demand download from Hugging Face (LibriBrain), OSF (MEG-MASC), Radboud WebDAV (Armeni, MOUS), and OpenNeuro (LittlePrince)
- Task-based API: pick a task object, get `(x, y)` (or `(x, y, info)`) windows
- Works with PyTorch `DataLoader` out of the box
- Clean namespace and lazy imports to keep startup fast
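To make the `(x, y)` window idea concrete, here is a minimal sketch in plain Python that does not depend on PNPL. It assumes a hypothetical 250 Hz sampling rate and a synthetic two-channel recording. The class name `WindowedRecording` and its parameters are illustrative and are not part of the PNPL API. The sketch only shows how a `tmin`/`tmax` pair in seconds can translate into fixed-length windows exposed through `__len__` and `__getitem__`, the map-style protocol that PyTorch's `DataLoader` consumes.

```python
class WindowedRecording:
    """Illustrative map-style dataset; NOT the PNPL implementation."""

    def __init__(self, signal, labels, onsets, sfreq, tmin, tmax):
        self.signal = signal  # list of channels, each a list of samples
        self.labels = labels  # one label per time sample
        self.onsets = onsets  # window anchor points, in samples
        # Convert the window bounds from seconds to sample offsets.
        self.start = round(tmin * sfreq)
        self.stop = round(tmax * sfreq)

    def __len__(self):
        return len(self.onsets)

    def __getitem__(self, idx):
        lo = self.onsets[idx] + self.start
        hi = self.onsets[idx] + self.stop
        x = [channel[lo:hi] for channel in self.signal]  # (channels, time)
        y = self.labels[lo:hi]                           # (time,)
        return x, y


SFREQ = 250.0  # assumed sampling rate in Hz, for illustration only
n_samples = 1000
signal = [[float(t) for t in range(n_samples)] for _ in range(2)]  # 2 channels
labels = [1 if 200 <= t < 600 else 0 for t in range(n_samples)]    # toy speech mask

ds = WindowedRecording(signal, labels, onsets=[0, 250, 500],
                       sfreq=SFREQ, tmin=0.0, tmax=0.5)
x, y = ds[1]
print(len(ds), len(x), len(x[0]), len(y))
```

With these assumed values, a 0.5 s window at 250 Hz covers 125 samples per channel.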
```bash
pip install pnpl
```
This installs the package and its core dependencies.
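A quick way to confirm the install worked is to ask Python for the installed version. The sketch below uses only the standard library. The helper name `installed_version` is made up for this example and is not provided by PNPL.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pnpl")
print("pnpl", version if version else "is not installed in this environment")
```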
A common entry point uses a task object:
```python
from pnpl.datasets import LibriBrain
from pnpl.tasks import SpeechDetection

dataset = LibriBrain(
    data_path="./data/LibriBrain",
    task=SpeechDetection(tmin=0.0, tmax=0.5),
    partition="train",
)
sample_data, label = dataset[0]
print(sample_data.shape, label.shape)
```

Dataset-specific wrapper classes are also available:
```python
from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme

speech_ds = LibriBrainSpeech(data_path="./data/LibriBrain", partition="train")
phoneme_ds = LibriBrainPhoneme(data_path="./data/LibriBrain", partition="train")
```

The same task-based pattern works for the other corpora:
```python
from pnpl.datasets import Gwilliams2022, Armeni2022, Schoffelen2019
from pnpl.tasks.gwilliams2022 import PhonemeClassification

meg_masc = Gwilliams2022(
    data_path="./data/meg_masc",
    task=PhonemeClassification(tmin=-0.2, tmax=0.6),
    include_subjects=["01"], include_sessions=["0"], include_tasks=["0"],
    preprocessing="notch+bp+ds",
)
```

For the full LibriBrain release (deep sub-0 across 9 Sherlock books + TIMIT + MOCHA-TIMIT + 30 Moth podcasts, plus 32 broad subjects on Sherlock1 ses-11/ses-12), use `LibriBrain100`:
```python
from pnpl.datasets import LibriBrain100
from pnpl.tasks import SpeechDetection

ds = LibriBrain100(
    data_path="./data/LibriBrain100",
    task=SpeechDetection(tmin=0.0, tmax=0.5),
    partition="train",
    subjects="deep",    # or "broad", "all", 0, [1, 2, 3], range(1, 33)
    corpus="sherlock",  # or "timit", "mocha", "podcasts", "all"
)
```

| Class | Source | Auth |
|---|---|---|
| `LibriBrain` (+ `LibriBrainSpeech`/`Phoneme`/`Word`/`Sentence`) | Hugging Face `pnpl/LibriBrain` | none |
| `LibriBrain100` (+ `LibriBrain100Speech`/`Phoneme`/`Word`) | HF `pnpl/LibriBrain` ∪ `pnpl/LibriBrain2` (deep + broad release) | none |
| `Gwilliams2022` (MEG-MASC) | OSF `ag3kj` | none |
| `Armeni2022` | Radboud `DSC_3011085.05_995_v1` | Radboud credentials |
| `Schoffelen2019` (MOUS) | Radboud `DSC_3011020.09_236_v1` | Radboud credentials |
| `Pallier2025` (LittlePrince Listen) | OpenNeuro `ds007523` | none |

For the Radboud-hosted datasets, set RADBOUD_USERNAME and
RADBOUD_PASSWORD (an approved data-sharing agreement is required
before access is granted).
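
Before constructing `Armeni2022` or `Schoffelen2019`, it can help to verify that both variables are actually set. The following is a small standard-library sketch. The helper `missing_radboud_vars` is hypothetical and not part of PNPL, and how PNPL reads these variables internally is not shown here.

```python
import os

REQUIRED_VARS = ("RADBOUD_USERNAME", "RADBOUD_PASSWORD")


def missing_radboud_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_radboud_vars()
if missing:
    print("Set these before loading Armeni2022 or Schoffelen2019:", ", ".join(missing))
else:
    print("Radboud credentials found in the environment.")
```

Passing a plain dict as `env` lets the check be exercised without touching the real environment.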
In case of any questions or problems, please get in touch through our Discord server.
Load a single run of the LibriBrain Speech dataset and iterate samples:
```python
from pnpl.datasets.libribrain2025 import constants
from pnpl.datasets import LibriBrainSpeech

ds = LibriBrainSpeech(
    data_path="./data/LibriBrain",
    preprocessing_str="bads+headpos+sss+notch+bp+ds",
    include_run_keys=[constants.RUN_KEYS[0]],  # pick a single run
    tmin=0.0,
    tmax=0.2,
    standardize=True,
    include_info=True,
)
print(len(ds), "samples")
x, y, info = ds[0]
print(x.shape, y.shape, info["dataset"])  # (channels, time), (time,), "libribrain2025"
```

We publish documentation with Jupyter Book and GitHub Pages.
- Local preview: `pip install -r docs/requirements.txt && jupyter-book build docs/`, then open `docs/_build/html/index.html`.
- GitHub Pages: once the repository is public, enable Pages in the repository settings to publish automatically from the existing workflow.
The docs cover:
- Per-dataset pages (`docs/libribrain.md`, `docs/gwilliams2022.md`, `docs/armeni2022.md`, `docs/schoffelen2019.md`)
- The preprocessing pipeline (`docs/preprocessing.md`) and tasks (`docs/tasks.md`)
- Tutorials for the LibriBrain competition tracks
We welcome contributions from the community!
- Read the Contributor Guide in `docs/contributing.md` for setup, coding style, and PR workflow.
- Open issues for bugs and enhancements with clear, minimal repros when possible.
- Tests: add/update `pytest` tests for any feature or fix.
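
As a rough illustration of the expected test style, the sketch below shows two pytest-style test functions. The helper `split_pipeline` is hypothetical and is defined inline only so the example is self-contained. It is not a PNPL function, and the library's actual handling of preprocessing strings may differ.

```python
def split_pipeline(spec: str):
    """Hypothetical helper: split a '+'-joined preprocessing string into step names."""
    steps = [step.strip() for step in spec.split("+")]
    if any(not step for step in steps):
        raise ValueError(f"empty step in preprocessing string: {spec!r}")
    return steps


def test_split_pipeline_preserves_order():
    assert split_pipeline("notch+bp+ds") == ["notch", "bp", "ds"]


def test_split_pipeline_rejects_empty_step():
    try:
        split_pipeline("notch++ds")
    except ValueError:
        pass  # expected: an empty step name is invalid
    else:
        raise AssertionError("expected ValueError for an empty step")
```

Placed in a file whose name starts with `test_`, both functions would be collected and run by `pytest -q`.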
Quick dev setup:
```bash
git clone https://github.com/neural-processing-lab/pnpl.git
cd pnpl
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install pytest
pytest -q
```

- Check the FAQ at `docs/faq.md`.
- If something is unclear in the docs, please open a documentation issue.
BSD‑3‑Clause. See LICENSE for details.