
Commit 219be79

Docs: add Jupyter Book in docs/ with Pages workflow; initial content (install, quickstart, datasets, LibriBrain)
1 parent 86381da commit 219be79

11 files changed

Lines changed: 335 additions & 1 deletion

File tree

.github/workflows/docs.yml

Lines changed: 53 additions & 0 deletions
```yaml
name: Docs

on:
  push:
    branches: [ main ]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install doc deps
        run: |
          python -m pip install --upgrade pip
          pip install -r docs/requirements.txt

      - name: Build Jupyter Book
        run: |
          jupyter-book build docs/

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/_build/html

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
```

README.md

Lines changed: 7 additions & 1 deletion
## Support

In case of any questions or problems, please get in touch through [our Discord server](https://discord.gg/Fqr8gJnvSh).

## Documentation

We publish documentation with Jupyter Book and GitHub Pages.

- Live site: enable “GitHub Pages → Source: GitHub Actions” in the repository settings (the first run deploys automatically).
- Build locally: `pip install -r docs/requirements.txt && jupyter-book build docs/`

docs/_config.yml

Lines changed: 34 additions & 0 deletions
```yaml
################################################################################
# Jupyter Book configuration
################################################################################

title: PNPL
author: Neural Processing Lab
logo: null
only_build_toc_files: true

repository:
  url: https://github.com/neural-processing-lab/pnpl-public
  path_to_book: docs
  branch: main

html:
  favicon: null
  use_repository_button: true
  use_issues_button: true
  use_edit_page_button: true
  extra_navbar: "PNPL public package"
  extra_footer: "© Neural Processing Lab"

execute:
  execute_notebooks: cache

parse:
  myst_enable_extensions:
    - colon_fence
    - deflist
    - dollarmath
    - html_image
    - linkify
    - substitution
```

docs/_toc.yml

Lines changed: 9 additions & 0 deletions
```yaml
format: jb-book
root: index
chapters:
  - file: install
  - file: quickstart
  - file: datasets
  - file: libribrain
  - file: development
```

docs/datasets.md

Lines changed: 32 additions & 0 deletions
---
title: Datasets
---

# Public Datasets

The `pnpl.datasets` package provides dataset classes designed for deep learning workflows (PyTorch `Dataset`).

## GroupedDataset

A utility dataset that groups multiple datasets and exposes a unified interface.

```python
from pnpl.datasets import GroupedDataset, LibriBrainSpeech, LibriBrainPhoneme
```
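Conceptually, grouping map-style datasets behaves like concatenation: indices are routed to the member dataset that owns them. A minimal sketch of that idea (hypothetical illustration, not the actual `GroupedDataset` implementation):

```python
class ConcatSketch:
    """Route a flat index into one of several map-style datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)

    def __len__(self):
        # total length is the sum of the member lengths
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        # walk the members, subtracting each length until idx falls inside one
        for d in self.datasets:
            if idx < len(d):
                return d[idx]
            idx -= len(d)
        raise IndexError(idx)


# plain lists stand in for datasets here
g = ConcatSketch([[0, 1, 2], [10, 11]])
print(len(g), g[3])  # 5 10
```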
## HDF5Dataset (base)

`pnpl.datasets.hdf5.HDF5Dataset` is a simple base class for datasets backed by MEG signals serialized as HDF5, with standardization and slicing support.

Key features:
- windowed access `(channels, time)`
- channel-wise standardization
- optional clipping
## LibriBrain 2025

- `LibriBrainPhoneme`: phoneme classification from MEG segments.
- `LibriBrainSpeech`: speech/silence time-series labels over a window.

Both rely on a BIDS-like directory structure and can download the needed files from Hugging Face.

docs/development.md

Lines changed: 24 additions & 0 deletions
---
title: Development
---

# Development

## Running Tests

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install pytest
pytest -q
```

## Building this documentation locally

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r docs/requirements.txt
jupyter-book build docs/
open docs/_build/html/index.html  # macOS; use xdg-open on Linux
```

docs/index.md

Lines changed: 26 additions & 0 deletions
---
title: PNPL
---

# PNPL

PNPL is a public Python package for loading and processing brain datasets for deep learning.

- PyPI package: `pnpl`
- Source: {repo:link}`neural-processing-lab/pnpl-public`

This site documents the public package and shows how to install and use the datasets.

```{note}
For internal/private datasets, install the overlay package `pnpl-internal` in addition to `pnpl`.
The overlay contributes extra modules under the same `pnpl.*` namespace and is documented privately.
```

## What’s inside

- A lightweight top-level namespace `pnpl` with lazily exposed symbols
- `pnpl.datasets` with public datasets and helpers
- LibriBrain 2025 datasets: phoneme- and speech-based tasks

Use the navigation to explore installation and examples.

docs/install.md

Lines changed: 27 additions & 0 deletions
---
title: Install
---

# Install

PNPL requires Python 3.10+ and installs from PyPI:

```bash
pip install pnpl
```

Core scientific dependencies include `numpy`, `pandas`, `torch`, `h5py`, `mne`, `mne_bids`, and `huggingface_hub`.

```{tip}
To use private/internal datasets as part of the same `pnpl` namespace, also install the overlay package `pnpl-internal` from your private index (or an editable checkout). The overlay depends on `pnpl` and contributes additional modules under `pnpl.*`.
```

## Development install (editable)

```bash
git clone https://github.com/neural-processing-lab/pnpl-public.git
cd pnpl-public
python -m venv .venv && source .venv/bin/activate
pip install -e .
```

docs/libribrain.md

Lines changed: 56 additions & 0 deletions
---
title: LibriBrain
---

# LibriBrain Datasets

The LibriBrain 2025 datasets provide MEG-based tasks with convenient download and caching from Hugging Face.

## Common Arguments

- `data_path`: local root where files are stored / downloaded
- `preprocessing_str`: expected preprocessing string in filenames
- `tmin`, `tmax`: window relative to the event (seconds)
- `standardize`: z-score channels using per-run stats
- `include_run_keys`: list of run keys to include (see `constants.RUN_KEYS`)
- `include_info`: include an info dict in each sample
- `download`: if `True` (the default), fetch missing files via Hugging Face
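For intuition, the `tmin`/`tmax` pair maps to a window length in samples via the sampling rate. A quick sketch (the 250 Hz rate here is a hypothetical value; the actual rate depends on the `ds` downsampling step in the preprocessing chain):

```python
def window_samples(tmin: float, tmax: float, sfreq: float) -> int:
    """Number of samples in a [tmin, tmax) window at sampling rate sfreq (Hz)."""
    return int(round((tmax - tmin) * sfreq))


# an 0.8 s window around a phoneme onset at a hypothetical 250 Hz
print(window_samples(-0.2, 0.6, 250.0))  # 200
```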
## Speech (binary time series)

```python
from pnpl.datasets import LibriBrainSpeech
from pnpl.datasets.libribrain2025 import constants

ds = LibriBrainSpeech(
    data_path="./data/LibriBrain",
    preprocessing_str="bads+headpos+sss+notch+bp+ds",
    include_run_keys=[constants.RUN_KEYS[0]],
    tmin=0.0,
    tmax=0.2,
    include_info=True,
)

print(len(ds))
```

Each item returns `(data: float32[channels, time], labels: int[time], info: dict)`.

## Phoneme (classification)

```python
from pnpl.datasets import LibriBrainPhoneme
from pnpl.datasets.libribrain2025 import constants

ds = LibriBrainPhoneme(
    data_path="./data/LibriBrain",
    preprocessing_str="bads+headpos+sss+notch+bp+ds",
    include_run_keys=[constants.RUN_KEYS[0]],
    tmin=-0.2,
    tmax=0.6,
)

print(len(ds))
```

Each item returns `(data: float32[channels, time], label_id: int64)`.

docs/quickstart.md

Lines changed: 58 additions & 0 deletions
---
title: Quickstart
---

# Quickstart

This page shows short examples that load the datasets and iterate over samples.

## LibriBrain Speech (public)

```python
from pnpl.datasets.libribrain2025 import constants
from pnpl.datasets import LibriBrainSpeech

# pick one run to keep it quick
include_run_keys = [constants.RUN_KEYS[0]]  # e.g. ('0', '1', 'Sherlock1', '1')

ds = LibriBrainSpeech(
    data_path="./data/LibriBrain",
    preprocessing_str="bads+headpos+sss+notch+bp+ds",
    include_run_keys=include_run_keys,
    tmin=0.0,
    tmax=0.2,
    standardize=True,
    include_info=True,
)

print(len(ds), "samples")
x, y, info = ds[0]
print(x.shape, y.shape, info["dataset"])  # (channels, time), (time,), "libribrain2025"
```
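The per-timestep speech/silence labels are easy to summarize, e.g. as the fraction of speech frames in a window. A minimal sketch with synthetic labels standing in for `y` (the real labels require the download above):

```python
import numpy as np

# synthetic per-timestep labels: 1 = speech, 0 = silence
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
speech_fraction = labels.mean()
print(speech_fraction)  # 0.5
```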
## LibriBrain Phoneme (public)

```python
from pnpl.datasets.libribrain2025 import constants
from pnpl.datasets import LibriBrainPhoneme

include_run_keys = [constants.RUN_KEYS[0]]

ds = LibriBrainPhoneme(
    data_path="./data/LibriBrain",
    preprocessing_str="bads+headpos+sss+notch+bp+ds",
    include_run_keys=include_run_keys,
    tmin=-0.2,
    tmax=0.6,
    standardize=True,
)

print(len(ds), "samples")
x, y = ds[0]
print(x.shape, y.item())
```

```{note}
The first time you instantiate a dataset with `download=True` (the default), the required files are downloaded from Hugging Face and cached under `data_path`.
```
