
Commit dfaf6e6

Author: Gereon Elvers
Parents: fa8f92c + 163edd0

Merge refactor into main

The refactor branch is authoritative; conflicts were resolved in favor of refactor. Brings in the public pnpl package refactor (mixin-based architecture, task-based dataset, compat wrappers) plus the preload_h5 option lifted onto the shared ContinuousH5Mixin (supersedes fa8f92c).

65 files changed: 5550 additions & 2248 deletions
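The commit message's `preload_h5` option trades memory for I/O: with preloading, the HDF5 contents are read into memory once instead of being touched on every access. A minimal sketch of that pattern, using a simplified stand-in for the file handle (the names `ContinuousH5Mixin` and `preload_h5` come from the commit message; everything else here is hypothetical, not pnpl's actual API):

```python
class LazyFile:
    """Stand-in for an h5py.File: counts reads to make laziness visible."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self.data[key]


class ContinuousH5Mixin:
    """Sketch of a dataset mixin with an optional preload step."""

    def __init__(self, fh, preload_h5=False):
        self._fh = fh
        self._cache = fh["meg"] if preload_h5 else None  # one read up front

    def segment(self, start, stop):
        if self._cache is not None:
            return self._cache[start:stop]  # served from memory
        return self._fh["meg"][start:stop]  # touches the file on every call
```

With `preload_h5=True` the underlying file is read exactly once, no matter how many segments are sliced afterwards.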


README.md

Lines changed: 21 additions & 29 deletions

@@ -2,7 +2,7 @@
 
 > The current primary use of the PNPL library is for the LibriBrain competition. [Click here](https://neural-processing-lab.github.io/2025-libribrain-competition/) to learn more and get started!
 
-Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. It provides ready‑to‑use dataset classes (PyTorch `Dataset`) and utilities with a simple, consistent API.
+Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. The package ships the LibriBrain 2025 dataset family plus shared preprocessing and task utilities.
 
 ## Features
 - Friendly dataset APIs backed by real MEG recordings
@@ -16,45 +16,37 @@ Welcome to PNPL — a Python toolkit for loading and processing brain datasets f
 pip install pnpl
 ```
 
-This will also take care of all requirements.
+This installs the package and its core dependencies.
 
 ## Usage
-The core functionality of the library is contained in the two Dataset classes `LibriBrainSpeech` and `LibriBrainPhoneme`.
-Check out the basic usage:
+A common entry point uses a task object:
 
-### LibriBrainSpeech
-This wraps the LibriBrain dataset for use in speech detection problems.
 ```python
-from pnpl.datasets import LibriBrainSpeech
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
 
-speech_example_data = LibriBrainSpeech(
-    data_path="./data/",
-    include_run_keys = [("0","1","Sherlock1","1")]
+dataset = LibriBrain(
+    data_path="./data/LibriBrain",
+    task=SpeechDetection(tmin=0.0, tmax=0.5),
+    partition="train",
 )
 
-sample_data, label = speech_example_data[0]
-
-# Print out some basic info about the sample
-print("Sample data shape:", sample_data.shape)
-print("Label shape:", label.shape)
+sample_data, label = dataset[0]
+print(sample_data.shape, label.shape)
 ```
 
-### LibriBrainSpeech
-This wraps the LibriBrain dataset for use in phoneme classification problems.
-```python
-from pnpl.datasets import LibriBrainPhoneme
+Dataset-specific wrapper classes are also available:
 
-phoneme_example_data = LibriBrainPhoneme(
-    data_path="./data/",
-    include_run_keys = [("0","1","Sherlock1","1")]
-)
-sample_data, label = phoneme_example_data[0]
+```python
+from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme
 
-# Print out some basic info about the sample
-print("Sample data shape:", sample_data.shape)
-print("Label shape:", label.shape)
+speech_ds = LibriBrainSpeech(data_path="./data/LibriBrain", partition="train")
+phoneme_ds = LibriBrainPhoneme(data_path="./data/LibriBrain", partition="train")
 ```
 
+## Included Datasets
+- `pnpl` includes the `libribrain2025` dataset family together with shared preprocessing and task utilities.
+
 ## Support
 In case of any questions or problems, please get in touch through [our Discord server](https://discord.gg/Fqr8gJnvSh).
 ## Quickstart
@@ -96,8 +88,8 @@ We welcome contributions from the community!
 
 Quick dev setup:
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 pip install pytest
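The datasets shown in this README diff are indexable PyTorch `Dataset` objects, so in practice they are batched with `torch.utils.data.DataLoader`. The batching a loader performs can be sketched without torch (a simplified illustration, not a replacement for `DataLoader`):

```python
def batches(dataset, batch_size):
    """Yield consecutive fixed-size lists of samples from an indexable dataset."""
    buf = []
    for i in range(len(dataset)):
        buf.append(dataset[i])
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf:  # final partial batch
        yield buf
```

Real training code should prefer `DataLoader`, which adds shuffling, collation into tensors, and worker processes on top of this pattern.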

docs/_build/html/_sources/install.md

Lines changed: 2 additions & 7 deletions

@@ -12,16 +12,11 @@ pip install pnpl
 
 Core scientific dependencies include `numpy`, `pandas`, `torch`, `h5py`, `mne`, `mne_bids`, and `huggingface_hub`.
 
-```{tip}
-To use private/internal datasets as part of the same `pnpl` namespace, also install the overlay package `pnpl-internal` from your private index (or editable checkout). The overlay depends on `pnpl` and contributes additional modules under `pnpl.*`.
-```
-
 ## Development install (editable)
 
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 ```
-

docs/_build/html/install.html

Lines changed: 3 additions & 7 deletions

@@ -370,14 +370,10 @@ <h1>Install<a class="headerlink" href="#install" title="Link to this heading">#<
 </pre></div>
 </div>
 <p>Core scientific dependencies include <code class="docutils literal notranslate"><span class="pre">numpy</span></code>, <code class="docutils literal notranslate"><span class="pre">pandas</span></code>, <code class="docutils literal notranslate"><span class="pre">torch</span></code>, <code class="docutils literal notranslate"><span class="pre">h5py</span></code>, <code class="docutils literal notranslate"><span class="pre">mne</span></code>, <code class="docutils literal notranslate"><span class="pre">mne_bids</span></code>, and <code class="docutils literal notranslate"><span class="pre">huggingface_hub</span></code>.</p>
-<div class="admonition tip">
-<p class="admonition-title">Tip</p>
-<p>To use private/internal datasets as part of the same <code class="docutils literal notranslate"><span class="pre">pnpl</span></code> namespace, also install the overlay package <code class="docutils literal notranslate"><span class="pre">pnpl-internal</span></code> from your private index (or editable checkout). The overlay depends on <code class="docutils literal notranslate"><span class="pre">pnpl</span></code> and contributes additional modules under <code class="docutils literal notranslate"><span class="pre">pnpl.*</span></code>.</p>
-</div>
 <section id="development-install-editable">
 <h2>Development install (editable)<a class="headerlink" href="#development-install-editable" title="Link to this heading">#</a></h2>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/neural-processing-lab/pnpl-public.git
-<span class="nb">cd</span><span class="w"> </span>pnpl-public
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/neural-processing-lab/pnpl.git
+<span class="nb">cd</span><span class="w"> </span>pnpl
 python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span>.venv<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>.venv/bin/activate
 pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>.
 </pre></div>
@@ -504,4 +500,4 @@ <h2>Development install (editable)<a class="headerlink" href="#development-insta
 <footer class="bd-footer">
 </footer>
 </body>
-</html>
+</html>

docs/_config.yml

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ logo: pnpl-wordmark.png
 only_build_toc_files: false
 
 repository:
-  url: https://github.com/neural-processing-lab/pnpl-public
+  url: https://github.com/neural-processing-lab/pnpl
   path_to_book: docs
   branch: main

docs/contributing.md

Lines changed: 2 additions & 3 deletions

@@ -13,8 +13,8 @@ Thanks for your interest in improving PNPL! This guide helps you get set up, mak
 
 ## Development setup
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 pip install -r docs/requirements.txt  # if you build docs locally
@@ -47,4 +47,3 @@ open docs/_build/html/index.html
 - Label: bug/enhancement/docs where appropriate.
 
 Thanks for contributing!
-

docs/datasets.md

Lines changed: 24 additions & 9 deletions

@@ -4,16 +4,31 @@ title: Datasets
 
 # Public Datasets
 
-The `pnpl.datasets` package provides dataset classes designed for deep learning workflows (PyTorch `Dataset`).
+The public `pnpl.datasets` package provides the LibriBrain 2025 dataset family plus shared utilities for deep learning workflows.
 
-## GroupedDataset
+The main entry point is the task-based `LibriBrain` dataset:
 
-Utility dataset to group multiple datasets and expose a unified interface.
+```python
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection, PhonemeClassification, WordDetection
+```
+
+Additional wrapper datasets are also available:
 
 ```python
-from pnpl.datasets import GroupedDataset, LibriBrainSpeech, LibriBrainPhoneme
+from pnpl.datasets import (
+    GroupedDataset,
+    LibriBrainSpeech,
+    LibriBrainPhoneme,
+    LibriBrainWord,
+    LibriBrainSentence,
+)
 ```
 
+## GroupedDataset
+
+Utility dataset to group multiple datasets and expose a unified interface.
+
 ## HDF5Dataset (base)
 
 `pnpl.datasets.hdf5.HDF5Dataset` is a simple base for datasets backed by MEG signals serialized as HDF5, with standardization and slicing support.
@@ -25,8 +40,8 @@ Key features:
 
 ## LibriBrain 2025
 
-- `LibriBrainPhoneme`: phoneme classification from MEG segments.
-- `LibriBrainSpeech`: speech/silence time-series labels over a window.
-
-Both rely on a BIDS-like directory structure and can download needed files from Hugging Face.
-
+- `LibriBrain`: task-based dataset entry point
+- `LibriBrainSpeech`: speech/silence time-series labels over a window
+- `LibriBrainPhoneme`: phoneme classification from MEG segments
+- `LibriBrainWord`: word-detection wrapper
+- `LibriBrainSentence`: sentence-level dataset wrapper
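`GroupedDataset` in the diff above concatenates several datasets behind a single index space. The core idea can be sketched as follows (a hypothetical simplified re-implementation for illustration, not pnpl's actual class):

```python
class Grouped:
    """Expose several indexable datasets as one contiguous dataset."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        self._lengths = [len(d) for d in self.datasets]

    def __len__(self):
        return sum(self._lengths)

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)  # support negative indexing
        for ds, n in zip(self.datasets, self._lengths):
            if idx < n:
                return ds[idx]
            idx -= n  # fall through to the next dataset
        raise IndexError("index out of range")
```

The same walk-the-offsets trick underlies `torch.utils.data.ConcatDataset`; caching the lengths up front keeps `__getitem__` cheap.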

docs/index.md

Lines changed: 10 additions & 8 deletions

@@ -7,7 +7,7 @@ title: PNPL
 PNPL is a friendly Python toolkit for loading and processing brain datasets for deep learning. It ships with ready‑to‑use dataset classes (PyTorch `Dataset`) and simple utilities so you can focus on modeling, not file plumbing.
 
 - [PyPI: `pnpl`](https://pypi.org/project/pnpl/)
-- [GitHub: Neural-Processing-Lab/pnpl](https://github.com/neural-processing-lab/pnpl-public)
+- [GitHub: Neural-Processing-Lab/pnpl](https://github.com/neural-processing-lab/pnpl)
 
 
 ## Get Started
@@ -18,21 +18,23 @@ PNPL is a friendly Python toolkit for loading and processing brain datasets for
 pip install pnpl
 ```
 
-2) Load a single run of LibriBrain Speech and iterate samples
+2) Load LibriBrain with the task-based API
 
 ```python
-from pnpl.datasets.libribrain2025 import constants
-from pnpl.datasets import LibriBrainSpeech
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
 
-ds = LibriBrainSpeech(
+ds = LibriBrain(
     data_path="./data/LibriBrain",
-    preprocessing_str="bads+headpos+sss+notch+bp+ds",
-    include_run_keys=[constants.RUN_KEYS[0]]
-)
+    task=SpeechDetection(tmin=0.0, tmax=0.2),
+    partition="train",
+)
 x, y, info = ds[0]
 print(x.shape, y.shape)  # (channels,time), (time,)
 ```
 
+Wrapper classes such as `LibriBrainSpeech` and `LibriBrainPhoneme` are also available.
+
 ## Explore PNPL
 
 <div class="feature-grid">

docs/install.md

Lines changed: 2 additions & 7 deletions

@@ -12,16 +12,11 @@ pip install pnpl
 
 Core scientific dependencies include `numpy`, `pandas`, `torch`, `h5py`, `mne`, `mne_bids`, and `huggingface_hub`.
 
-```{tip}
-To use private/internal datasets as part of the same `pnpl` namespace, also install the overlay package `pnpl-internal` from your private index (or editable checkout). The overlay depends on `pnpl` and contributes additional modules under `pnpl.*`.
-```
-
 ## Development install (editable)
 
 ```bash
-git clone https://github.com/neural-processing-lab/pnpl-public.git
-cd pnpl-public
+git clone https://github.com/neural-processing-lab/pnpl.git
+cd pnpl
 python -m venv .venv && source .venv/bin/activate
 pip install -e .
 ```
-

docs/libribrain.md

Lines changed: 27 additions & 6 deletions

@@ -4,19 +4,38 @@ title: LibriBrain
 
 # LibriBrain
 
-The LibriBrain 2025 datasets provide MEG-based tasks with convenient download and caching from Hugging Face.
+The LibriBrain 2025 dataset family provides MEG-based speech and language tasks with download/caching support from Hugging Face.
 
 ## Common Arguments
 
 - `data_path`: local root where files are stored / downloaded
-- `preprocessing_str`: expected preprocessing string in filenames
-- `tmin`, `tmax`: window relative to event (seconds)
+- `preprocessing` / `preprocessing_str`: expected preprocessing string in filenames
 - `standardize`: z-score channels using per-run stats
 - `include_run_keys`: list of run keys to include (see constants.RUN_KEYS)
 - `include_info`: include an info dict in each sample
 - `download`: if True (default), fetch missing files via Hugging Face
 
-## Speech (binary time series)
+## Task-based entry point
+
+```python
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
+
+ds = LibriBrain(
+    data_path="./data/LibriBrain",
+    task=SpeechDetection(tmin=0.0, tmax=0.2),
+    partition="train",
+    include_info=True,
+)
+
+print(len(ds))
+```
+
+The task object controls sample collection and label semantics. Public task classes live in `pnpl.tasks`.
+
+## Wrapper datasets
+
+### Speech (binary time series)
 
 ```python
 from pnpl.datasets import LibriBrainSpeech
@@ -34,9 +53,9 @@ ds = LibriBrainSpeech(
 print(len(ds))
 ```
 
-Each item returns `(data: float32[channels,time], labels: int[time], info: dict)`.
+Each item returns `(data: float32[channels,time], labels: int[time], info: dict)` when `include_info=True`.
 
-## Phoneme (classification)
+### Phoneme (classification)
 
 ```python
 from pnpl.datasets import LibriBrainPhoneme
@@ -53,3 +72,5 @@ print(len(ds))
 ```
 
 Each item returns `(data: float32[channels,time], label_id: int64)`.
+
+`LibriBrainWord` and `LibriBrainSentence` are also available as dataset-specific wrappers.
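The `standardize` argument listed under Common Arguments z-scores each channel using per-run statistics. The transform itself is simple (a sketch of the math on plain lists; pnpl's actual implementation operates on tensors and may compute or cache the statistics differently):

```python
def zscore_channels(data):
    """Z-score each channel (row) of a channels x time array of floats."""
    out = []
    for channel in data:
        n = len(channel)
        mean = sum(channel) / n
        std = (sum((v - mean) ** 2 for v in channel) / n) ** 0.5
        std = std if std > 0 else 1.0  # guard against flat channels
        out.append([(v - mean) / std for v in channel])
    return out
```

After the transform, every non-flat channel has zero mean and unit variance, which keeps channel amplitudes comparable across runs.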

docs/quickstart.md

Lines changed: 20 additions & 18 deletions

@@ -4,23 +4,18 @@ title: Quickstart
 
 # Quickstart
 
-This page shows short examples loading datasets and iterating samples.
+This page shows the task-based LibriBrain entry point together with dataset-specific wrappers.
 
-## LibriBrain Speech (public)
+## Task-based API
 
 ```python
-from pnpl.datasets.libribrain2025 import constants
-from pnpl.datasets import LibriBrainSpeech
-
-# pick one run to keep it quick
-include_run_keys = [constants.RUN_KEYS[0]]  # e.g. ('0','1','Sherlock1','1')
+from pnpl.datasets import LibriBrain
+from pnpl.tasks import SpeechDetection
 
-ds = LibriBrainSpeech(
+ds = LibriBrain(
     data_path="./data/LibriBrain",
-    preprocessing_str="bads+headpos+sss+notch+bp+ds",
-    include_run_keys=include_run_keys,
-    tmin=0.0,
-    tmax=0.2,
+    task=SpeechDetection(tmin=0.0, tmax=0.2),
+    partition="train",
     standardize=True,
     include_info=True,
 )
@@ -30,15 +25,22 @@ x, y, info = ds[0]
 print(x.shape, y.shape, info["dataset"])  # (channels,time), (time,), "libribrain2025"
 ```
 
-## LibriBrain Phoneme (public)
+## Wrapper datasets
 
 ```python
 from pnpl.datasets.libribrain2025 import constants
-from pnpl.datasets import LibriBrainPhoneme
+from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme
 
 include_run_keys = [constants.RUN_KEYS[0]]
 
-ds = LibriBrainPhoneme(
+speech_ds = LibriBrainSpeech(
+    data_path="./data/LibriBrain",
+    include_run_keys=include_run_keys,
+    tmin=0.0,
+    tmax=0.2,
+)
+
+phoneme_ds = LibriBrainPhoneme(
     data_path="./data/LibriBrain",
     preprocessing_str="bads+headpos+sss+notch+bp+ds",
     include_run_keys=include_run_keys,
@@ -47,12 +49,12 @@ ds = LibriBrainPhoneme(
     standardize=True,
 )
 
-print(len(ds), "samples")
-x, y = ds[0]
+print(len(speech_ds), "speech samples")
+print(len(phoneme_ds), "phoneme samples")
+x, y = phoneme_ds[0]
 print(x.shape, y.item())
 ```
 
 ```{note}
 The first time you instantiate a dataset with `download=True` (default), required files are downloaded from Hugging Face and cached under `data_path`.
 ```
-
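The `tmin`/`tmax` pair used throughout these quickstart examples defines a window in seconds relative to an event; at a given sampling rate it corresponds to a fixed number of time points. A sketch of that mapping (the 250 Hz rate below is an assumed example value, not necessarily LibriBrain's actual rate):

```python
def window_length(tmin, tmax, sfreq):
    """Number of samples in a [tmin, tmax) window at sfreq Hz."""
    return int(round((tmax - tmin) * sfreq))
```

For example, the `tmin=0.0, tmax=0.2` window above spans 0.2 s, so at an assumed 250 Hz it would yield 50 time points per sample.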
