
Commit dfaf6e6

Author: Gereon Elvers
Parents: fa8f92c + 163edd0

Merge refactor into main

The refactor branch is authoritative; conflicts were resolved in favor of refactor. Brings in the public pnpl package refactor (mixin-based architecture, task-based dataset, compat wrappers) plus the preload_h5 option lifted onto the shared ContinuousH5Mixin (supersedes fa8f92c).

65 files changed: 5550 additions & 2248 deletions
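The commit message's `preload_h5` option trades memory for I/O: with preloading, the HDF5 contents are read into memory once instead of being touched on every access. A minimal sketch of that pattern, using a simplified stand-in for the file handle (the names `ContinuousH5Mixin` and `preload_h5` come from the commit message; everything else here is hypothetical, not pnpl's actual API):

```python
class LazyFile:
    """Stand-in for an h5py.File: counts reads to make laziness visible."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self.data[key]


class ContinuousH5Mixin:
    """Sketch of a dataset mixin with an optional preload step."""

    def __init__(self, fh, preload_h5=False):
        self._fh = fh
        self._cache = fh["meg"] if preload_h5 else None  # one read up front

    def segment(self, start, stop):
        if self._cache is not None:
            return self._cache[start:stop]  # served from memory
        return self._fh["meg"][start:stop]  # touches the file on every call
```

With `preload_h5=True` the underlying file is read exactly once, no matter how many segments are sliced afterwards.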


README.md

Lines changed: 21 additions & 29 deletions

@@ -2,7 +2,7 @@
 
 > The current primary use of the PNPL library is for the LibriBrain competition. [Click here](https://neural-processing-lab.github.io/2025-libribrain-competition/) to learn more and get started!
 
-Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. It provides ready‑to‑use dataset classes (PyTorch `Dataset`) and utilities with a simple, consistent API.
+Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. The package ships the LibriBrain 2025 dataset family plus shared preprocessing and task utilities.
 
 ## Features
 - Friendly dataset APIs backed by real MEG recordings
@@ -16,45 +16,37 @@ Welcome to PNPL — a Python toolkit for loading and processing brain datasets f
 pip install pnpl
 ```
 
-This will also take care of all requirements.
+This installs the package and its core dependencies.
 
 ## Usage
-The core functionality of the library is contained in the two Dataset classes `LibriBrainSpeech` and `LibriBrainPhoneme`.
-Check out the basic usage:
+A common entry point uses a task object:
 
-### LibriBrainSpeech
-This wraps the LibriBrain dataset for use in speech detection problems.
 ```python
-from pnpl.datasets import LibriBrainSpeech
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
 
-speech_example_data = LibriBrainSpeech(
-    data_path="./data/",
-    include_run_keys = [("0","1","Sherlock1","1")]
+dataset = LibriBrain(
+    data_path="./data/LibriBrain",
+    task=SpeechDetection(tmin=0.0, tmax=0.5),
+    partition="train",
 )
 
-sample_data, label = speech_example_data[0]
-
-# Print out some basic info about the sample
-print("Sample data shape:", sample_data.shape)
-print("Label shape:", label.shape)
+sample_data, label = dataset[0]
+print(sample_data.shape, label.shape)
 ```
 
-### LibriBrainSpeech
-This wraps the LibriBrain dataset for use in phoneme classification problems.
-```python
-from pnpl.datasets import LibriBrainPhoneme
+Dataset-specific wrapper classes are also available:
 
-phoneme_example_data = LibriBrainPhoneme(
-    data_path="./data/",
-    include_run_keys = [("0","1","Sherlock1","1")]
-)
-sample_data, label = phoneme_example_data[0]
+```python
+from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme
 
-# Print out some basic info about the sample
-print("Sample data shape:", sample_data.shape)
-print("Label shape:", label.shape)
+speech_ds = LibriBrainSpeech(data_path="./data/LibriBrain", partition="train")
+phoneme_ds = LibriBrainPhoneme(data_path="./data/LibriBrain", partition="train")
 ```
 
+## Included Datasets
+- `pnpl` includes the `libribrain2025` dataset family together with shared preprocessing and task utilities.
+
 ## Support
 In case of any questions or problems, please get in touch through [our Discord server](https://discord.gg/Fqr8gJnvSh).
 ## Quickstart
@@ -96,8 +88,8 @@ We welcome contributions from the community!
 
 Quick dev setup:
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 pip install pytest
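The datasets shown in this README diff are indexable PyTorch `Dataset` objects, so in practice they are batched with `torch.utils.data.DataLoader`. The batching a loader performs can be sketched without torch (a simplified illustration, not a replacement for `DataLoader`):

```python
def batches(dataset, batch_size):
    """Yield consecutive fixed-size lists of samples from an indexable dataset."""
    buf = []
    for i in range(len(dataset)):
        buf.append(dataset[i])
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf:  # final partial batch
        yield buf
```

Real training code should prefer `DataLoader`, which adds shuffling, collation into tensors, and worker processes on top of this pattern.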

docs/_build/html/_sources/install.md

Lines changed: 2 additions & 7 deletions

@@ -12,16 +12,11 @@ pip install pnpl
 
 Core scientific dependencies include `numpy`, `pandas`, `torch`, `h5py`, `mne`, `mne_bids`, and `huggingface_hub`.
 
-```{tip}
-To use private/internal datasets as part of the same `pnpl` namespace, also install the overlay package `pnpl-internal` from your private index (or editable checkout). The overlay depends on `pnpl` and contributes additional modules under `pnpl.*`.
-```
-
 ## Development install (editable)
 
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 ```
-

docs/_build/html/install.html

Lines changed: 3 additions & 7 deletions

@@ -370,14 +370,10 @@ <h1>Install<a class="headerlink" href="#install" title="Link to this heading">#<
 </pre></div>
 </div>
 <p>Core scientific dependencies include <code class="docutils literal notranslate"><span class="pre">numpy</span></code>, <code class="docutils literal notranslate"><span class="pre">pandas</span></code>, <code class="docutils literal notranslate"><span class="pre">torch</span></code>, <code class="docutils literal notranslate"><span class="pre">h5py</span></code>, <code class="docutils literal notranslate"><span class="pre">mne</span></code>, <code class="docutils literal notranslate"><span class="pre">mne_bids</span></code>, and <code class="docutils literal notranslate"><span class="pre">huggingface_hub</span></code>.</p>
-<div class="admonition tip">
-<p class="admonition-title">Tip</p>
-<p>To use private/internal datasets as part of the same <code class="docutils literal notranslate"><span class="pre">pnpl</span></code> namespace, also install the overlay package <code class="docutils literal notranslate"><span class="pre">pnpl-internal</span></code> from your private index (or editable checkout). The overlay depends on <code class="docutils literal notranslate"><span class="pre">pnpl</span></code> and contributes additional modules under <code class="docutils literal notranslate"><span class="pre">pnpl.*</span></code>.</p>
-</div>
 <section id="development-install-editable">
 <h2>Development install (editable)<a class="headerlink" href="#development-install-editable" title="Link to this heading">#</a></h2>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/neural-processing-lab/pnpl-public.git
-<span class="nb">cd</span><span class="w"> </span>pnpl-public
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/neural-processing-lab/pnpl.git
+<span class="nb">cd</span><span class="w"> </span>pnpl
 python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>.venv<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>.venv/bin/activate
 pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>.
 </pre></div>
@@ -504,4 +500,4 @@ <h2>Development install (editable)<a class="headerlink" href="#development-insta
 <footer class="bd-footer">
 </footer>
 </body>
-</html>
+</html>

docs/_config.yml

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ logo: pnpl-wordmark.png
 only_build_toc_files: false
 
 repository:
-  url: https://github.com/neural-processing-lab/pnpl-public
+  url: https://github.com/neural-processing-lab/pnpl
   path_to_book: docs
   branch: main

docs/contributing.md

Lines changed: 2 additions & 3 deletions

@@ -13,8 +13,8 @@ Thanks for your interest in improving PNPL! This guide helps you get set up, mak
 
 ## Development setup
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 pip install -r docs/requirements.txt  # if you build docs locally
@@ -47,4 +47,3 @@ open docs/_build/html/index.html
 - Label: bug/enhancement/docs where appropriate.
 
 Thanks for contributing!
-

docs/datasets.md

Lines changed: 24 additions & 9 deletions

@@ -4,16 +4,31 @@ title: Datasets
 
 # Public Datasets
 
-The `pnpl.datasets` package provides dataset classes designed for deep learning workflows (PyTorch `Dataset`).
+The public `pnpl.datasets` package provides the LibriBrain 2025 dataset family plus shared utilities for deep learning workflows.
 
-## GroupedDataset
+The main entry point is the task-based `LibriBrain` dataset:
 
-Utility dataset to group multiple datasets and expose a unified interface.
+```python
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection, PhonemeClassification, WordDetection
+```
+
+Additional wrapper datasets are also available:
 
 ```python
-from pnpl.datasets import GroupedDataset, LibriBrainSpeech, LibriBrainPhoneme
+from pnpl.datasets import (
+    GroupedDataset,
+    LibriBrainSpeech,
+    LibriBrainPhoneme,
+    LibriBrainWord,
+    LibriBrainSentence,
+)
 ```
 
+## GroupedDataset
+
+Utility dataset to group multiple datasets and expose a unified interface.
+
 ## HDF5Dataset (base)
 
 `pnpl.datasets.hdf5.HDF5Dataset` is a simple base for datasets backed by MEG signals serialized as HDF5, with standardization and slicing support.
@@ -25,8 +40,8 @@ Key features:
 
 ## LibriBrain 2025
 
-- `LibriBrainPhoneme`: phoneme classification from MEG segments.
-- `LibriBrainSpeech`: speech/silence time-series labels over a window.
-
-Both rely on a BIDS-like directory structure and can download needed files from Hugging Face.
-
+- `LibriBrain`: task-based dataset entry point
+- `LibriBrainSpeech`: speech/silence time-series labels over a window
+- `LibriBrainPhoneme`: phoneme classification from MEG segments
+- `LibriBrainWord`: word-detection wrapper
+- `LibriBrainSentence`: sentence-level dataset wrapper
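`GroupedDataset` in the diff above concatenates several datasets behind a single index space. The core idea can be sketched as follows (a hypothetical simplified re-implementation for illustration, not pnpl's actual class):

```python
class Grouped:
    """Expose several indexable datasets as one contiguous dataset."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        self._lengths = [len(d) for d in self.datasets]

    def __len__(self):
        return sum(self._lengths)

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)  # support negative indexing
        for ds, n in zip(self.datasets, self._lengths):
            if idx < n:
                return ds[idx]
            idx -= n  # fall through to the next dataset
        raise IndexError("index out of range")
```

The same walk-the-offsets trick underlies `torch.utils.data.ConcatDataset`; caching the lengths up front keeps `__getitem__` cheap.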

docs/index.md

Lines changed: 10 additions & 8 deletions

@@ -7,7 +7,7 @@ title: PNPL
 PNPL is a friendly Python toolkit for loading and processing brain datasets for deep learning. It ships with ready‑to‑use dataset classes (PyTorch `Dataset`) and simple utilities so you can focus on modeling, not file plumbing.
 
 - [PyPI: `pnpl`](https://pypi.org/project/pnpl/)
-- [GitHub: Neural-Processing-Lab/pnpl](https://github.com/neural-processing-lab/pnpl-public)
+- [GitHub: Neural-Processing-Lab/pnpl](https://github.com/neural-processing-lab/pnpl)
 
 
 ## Get Started
@@ -18,21 +18,23 @@ PNPL is a friendly Python toolkit for loading and processing brain datasets for
 pip install pnpl
 ```
 
-2) Load a single run of LibriBrain Speech and iterate samples
+2) Load LibriBrain with the task-based API
 
 ```python
-from pnpl.datasets.libribrain2025 import constants
-from pnpl.datasets import LibriBrainSpeech
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
 
-ds = LibriBrainSpeech(
+ds = LibriBrain(
     data_path="./data/LibriBrain",
-    preprocessing_str="bads+headpos+sss+notch+bp+ds",
-    include_run_keys=[constants.RUN_KEYS[0]]
-)
+    task=SpeechDetection(tmin=0.0, tmax=0.2),
+    partition="train",
+)
 x, y, info = ds[0]
 print(x.shape, y.shape)  # (channels,time), (time,)
 ```
 
+Wrapper classes such as `LibriBrainSpeech` and `LibriBrainPhoneme` are also available.
+
 ## Explore PNPL
 
 <div class="feature-grid">

docs/install.md

Lines changed: 2 additions & 7 deletions

@@ -12,16 +12,11 @@ pip install pnpl
 
 Core scientific dependencies include `numpy`, `pandas`, `torch`, `h5py`, `mne`, `mne_bids`, and `huggingface_hub`.
 
-```{tip}
-To use private/internal datasets as part of the same `pnpl` namespace, also install the overlay package `pnpl-internal` from your private index (or editable checkout). The overlay depends on `pnpl` and contributes additional modules under `pnpl.*`.
-```
-
 ## Development install (editable)
 
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 ```
-

docs/libribrain.md

Lines changed: 27 additions & 6 deletions

@@ -4,19 +4,38 @@ title: LibriBrain
 
 # LibriBrain
 
-The LibriBrain 2025 datasets provide MEG-based tasks with convenient download and caching from Hugging Face.
+The LibriBrain 2025 dataset family provides MEG-based speech and language tasks with download/caching support from Hugging Face.
 
 ## Common Arguments
 
 - `data_path`: local root where files are stored / downloaded
-- `preprocessing_str`: expected preprocessing string in filenames
-- `tmin`, `tmax`: window relative to event (seconds)
+- `preprocessing` / `preprocessing_str`: expected preprocessing string in filenames
 - `standardize`: z-score channels using per-run stats
 - `include_run_keys`: list of run keys to include (see constants.RUN_KEYS)
 - `include_info`: include an info dict in each sample
 - `download`: if True (default), fetch missing files via Hugging Face
 
-## Speech (binary time series)
+## Task-based entry point
+
+```python
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
+
+ds = LibriBrain(
+    data_path="./data/LibriBrain",
+    task=SpeechDetection(tmin=0.0, tmax=0.2),
+    partition="train",
+    include_info=True,
+)
+
+print(len(ds))
+```
+
+The task object controls sample collection and label semantics. Public task classes live in `pnpl.tasks`.
+
+## Wrapper datasets
+
+### Speech (binary time series)
 
 ```python
 from pnpl.datasets import LibriBrainSpeech
@@ -34,9 +53,9 @@ ds = LibriBrainSpeech(
 print(len(ds))
 ```
 
-Each item returns `(data: float32[channels,time], labels: int[time], info: dict)`.
+Each item returns `(data: float32[channels,time], labels: int[time], info: dict)` when `include_info=True`.
 
-## Phoneme (classification)
+### Phoneme (classification)
 
 ```python
 from pnpl.datasets import LibriBrainPhoneme
@@ -53,3 +72,5 @@ print(len(ds))
 ```
 
 Each item returns `(data: float32[channels,time], label_id: int64)`.
+
+`LibriBrainWord` and `LibriBrainSentence` are also available as dataset-specific wrappers.
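The `standardize` argument listed under Common Arguments z-scores each channel using per-run statistics. The transform itself is simple (a sketch of the math on plain lists; pnpl's actual implementation operates on tensors and may compute or cache the statistics differently):

```python
def zscore_channels(data):
    """Z-score each channel (row) of a channels x time array of floats."""
    out = []
    for channel in data:
        n = len(channel)
        mean = sum(channel) / n
        std = (sum((v - mean) ** 2 for v in channel) / n) ** 0.5
        std = std if std > 0 else 1.0  # guard against flat channels
        out.append([(v - mean) / std for v in channel])
    return out
```

After the transform, every non-flat channel has zero mean and unit variance, which keeps channel amplitudes comparable across runs.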

docs/quickstart.md

Lines changed: 20 additions & 18 deletions

@@ -4,23 +4,18 @@ title: Quickstart
 
 # Quickstart
 
-This page shows short examples loading datasets and iterating samples.
+This page shows the task-based LibriBrain entry point together with dataset-specific wrappers.
 
-## LibriBrain Speech (public)
+## Task-based API
 
 ```python
-from pnpl.datasets.libribrain2025 import constants
-from pnpl.datasets import LibriBrainSpeech
-
-# pick one run to keep it quick
-include_run_keys = [constants.RUN_KEYS[0]]  # e.g. ('0','1','Sherlock1','1')
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
 
-ds = LibriBrainSpeech(
+ds = LibriBrain(
     data_path="./data/LibriBrain",
-    preprocessing_str="bads+headpos+sss+notch+bp+ds",
-    include_run_keys=include_run_keys,
-    tmin=0.0,
-    tmax=0.2,
+    task=SpeechDetection(tmin=0.0, tmax=0.2),
+    partition="train",
     standardize=True,
     include_info=True,
 )
@@ -30,15 +25,22 @@ x, y, info = ds[0]
 print(x.shape, y.shape, info["dataset"])  # (channels,time), (time,), "libribrain2025"
 ```
 
-## LibriBrain Phoneme (public)
+## Wrapper datasets
 
 ```python
 from pnpl.datasets.libribrain2025 import constants
-from pnpl.datasets import LibriBrainPhoneme
+from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme
 
 include_run_keys = [constants.RUN_KEYS[0]]
 
-ds = LibriBrainPhoneme(
+speech_ds = LibriBrainSpeech(
+    data_path="./data/LibriBrain",
+    include_run_keys=include_run_keys,
+    tmin=0.0,
+    tmax=0.2,
+)
+
+phoneme_ds = LibriBrainPhoneme(
     data_path="./data/LibriBrain",
     preprocessing_str="bads+headpos+sss+notch+bp+ds",
     include_run_keys=include_run_keys,
@@ -47,12 +49,12 @@ ds = LibriBrainPhoneme(
     standardize=True,
 )
 
-print(len(ds), "samples")
-x, y = ds[0]
+print(len(speech_ds), "speech samples")
+print(len(phoneme_ds), "phoneme samples")
+x, y = phoneme_ds[0]
 print(x.shape, y.item())
 ```
 
 ```{note}
 The first time you instantiate a dataset with `download=True` (default), required files are downloaded from Hugging Face and cached under `data_path`.
 ```
-
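The `tmin`/`tmax` pair used throughout these quickstart examples defines a window in seconds relative to an event; at a given sampling rate it corresponds to a fixed number of time points. A sketch of that mapping (the 250 Hz rate below is an assumed example value, not necessarily LibriBrain's actual rate):

```python
def window_length(tmin, tmax, sfreq):
    """Number of samples in a [tmin, tmax) window at sfreq Hz."""
    return int(round((tmax - tmin) * sfreq))
```

For example, the `tmin=0.0, tmax=0.2` window above spans 0.2 s, so at an assumed 250 Hz it would yield 50 time points per sample.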
