53 changes: 53 additions & 0 deletions .github/workflows/deploy-docs.yml
@@ -0,0 +1,53 @@
name: Deploy Docs to GitHub Pages

on:
  push:
    branches:
      - master

jobs:
  build:
    name: Build Docusaurus
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: docs/package-lock.json

      - name: Install dependencies
        run: npm ci

      - name: Build website
        run: npm run build

      - name: Upload build artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/build

  deploy:
    name: Deploy to GitHub Pages
    needs: build

    permissions:
      pages: write
      id-token: write

    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
30 changes: 30 additions & 0 deletions .github/workflows/test-deploy-docs.yml
@@ -0,0 +1,30 @@
name: Test Docs Build

on:
  pull_request:
    branches:
      - master

jobs:
  test-deploy:
    name: Test Docusaurus build
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: docs/package-lock.json

      - name: Install dependencies
        run: npm ci

      - name: Test build website
        run: npm run build
20 changes: 20 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
9 changes: 9 additions & 0 deletions docs/docs/algorithms/_category_.json
@@ -0,0 +1,9 @@
{
  "label": "Algorithms",
  "position": 2,
  "collapsed": false,
  "link": {
    "type": "generated-index",
    "description": "Control algorithms implemented in KoopmanRL, including the two KARL algorithms and baseline comparators."
  }
}
29 changes: 29 additions & 0 deletions docs/docs/algorithms/lqr.md
@@ -0,0 +1,29 @@
---
id: lqr
sidebar_position: 3
title: Linear Quadratic Regulator (LQR)
---

# Linear Quadratic Regulator (LQR)

The Linear Quadratic Regulator is a classical optimal control algorithm included in KoopmanRL as a baseline comparator. For linear dynamics with quadratic cost it admits an exact closed-form solution, so it serves as an upper-bound reference on environments where those assumptions hold.
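
The closed-form solution can be sketched in a few lines of NumPy. The snippet below is illustrative only, not the code in `koopmanrl/linear_quadratic_regulator.py`: it iterates the discrete-time algebraic Riccati equation to a fixed point and returns the optimal state-feedback gain.

```python
import numpy as np

# Illustrative discrete-time LQR sketch (not the KoopmanRL source):
# minimise sum_t x_t' Q x_t + u_t' R u_t subject to x_{t+1} = A x_t + B u_t.

def dlqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete algebraic Riccati equation, then return the
    optimal feedback gain K for the control law u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Toy double-integrator example: the closed loop A - B K should be stable,
# i.e. have spectral radius below 1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = dlqr_gain(A, B, np.eye(2), np.eye(1))
print(np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

Stability of $A - BK$ is easy to verify from the spectral radius printed at the end.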

## Running LQR

```bash
uv run -m koopmanrl.linear_quadratic_regulator
```

With a specific environment:

```bash
uv run -m koopmanrl.linear_quadratic_regulator --env_id FluidFlow-v0
```

## When to use LQR

LQR is most informative on the `LinearSystem-v0` environment where its optimality assumptions are exactly satisfied. On nonlinear environments (Lorenz, Fluid Flow, Double Well) it provides a linearised-dynamics baseline that the KARL algorithms aim to outperform.

## Source

`koopmanrl/linear_quadratic_regulator.py`
33 changes: 33 additions & 0 deletions docs/docs/algorithms/sac.md
@@ -0,0 +1,33 @@
---
id: sac
sidebar_position: 4
title: Soft Actor-Critic Baselines
---

# Soft Actor-Critic Baselines

KoopmanRL ships two CleanRL-style SAC baselines that provide model-free comparators for the KARL algorithms.

## Q-value SAC

Standard Soft Actor-Critic with a Q-function critic, adapted from CleanRL.

```bash
uv run -m koopmanrl.sac_continuous_action --env_id Lorenz-v0
```

**Source:** `koopmanrl/sac_continuous_action.py`

## Value-based SAC

A variant that uses a value function $V(s)$ rather than $Q(s,a)$, also from CleanRL.

```bash
uv run -m koopmanrl.value_based_sac_continuous_action --env_id DoubleWell-v0
```

**Source:** `koopmanrl/value_based_sac_continuous_action.py`
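
For reference, the two baselines differ in which side of the standard SAC identity their critic approximates. Following the SAC papers, the soft value and soft Q-functions are related by

$$
V(s) = \mathbb{E}_{a \sim \pi}\left[\, Q(s, a) - \alpha \log \pi(a \mid s) \,\right]
$$

where $\alpha$ is the entropy temperature.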

## Purpose

These baselines are the direct model-free counterparts to SAKC. Comparing SAKC against them on the same environment and seed budget quantifies the benefit of the Koopman critic.
62 changes: 62 additions & 0 deletions docs/docs/algorithms/sakc.md
@@ -0,0 +1,62 @@
---
id: sakc
sidebar_position: 2
title: Soft Actor Koopman Critic (SAKC)
---

# Soft Actor Koopman Critic (SAKC)

Soft Actor Koopman Critic (SAKC) is the second KARL algorithm. It extends the standard Soft Actor-Critic (SAC) framework by replacing the learned critic network with a critic derived from the Koopman tensor representation of the transition dynamics.

## Algorithm overview

SAKC follows the actor-critic paradigm:

1. **Koopman tensor construction** — as in SKVI, trajectories are collected and a Koopman tensor $\mathcal{K}$ is fitted to the environment's dynamics.
2. **Koopman critic** — instead of learning $Q(s, a)$ from scratch via a neural network, SAKC computes the critic analytically from the Koopman tensor, exploiting its linear structure.
3. **Actor update** — a standard stochastic policy gradient update is applied to the actor network using the Koopman-derived critic as the advantage signal.
4. **Entropy regularisation** — a soft maximum-entropy objective is retained, balancing exploration and exploitation.

The Koopman critic replaces thousands of gradient steps of critic regression with a single closed-form computation, reducing sample complexity while maintaining the expressiveness of the actor.
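
As an illustration of that closed-form computation, here is a hedged NumPy sketch. It is **not** the `koopmanrl` implementation; it assumes the critic takes the form $Q(x,u) = r(x,u) + \gamma\, w^\top \mathcal{K}(u)\, \phi(x)$, with $\mathcal{K}(u)$ obtained by contracting the third-order Koopman tensor against action observables $\psi(u)$.

```python
import numpy as np

# Hedged sketch, not the koopmanrl source. Assumed critic form:
#   Q(x, u) = r + gamma * w @ K(u) @ phi(x)
# where K(u)[i, k] = sum_j K_tensor[i, j, k] * psi(u)[j].

def action_matrix(K_tensor, psi_u):
    # contract the action index of the tensor with psi(u)
    return np.einsum("ijk,j->ik", K_tensor, psi_u)

def koopman_q(w, K_tensor, psi_u, phi_x, r, gamma=0.99):
    """Evaluate the Koopman critic in closed form: no critic gradient steps."""
    return r + gamma * w @ action_matrix(K_tensor, psi_u) @ phi_x

# Tiny worked example: K(u) = K0 + u * K1 with psi(u) = [1, u].
K_tensor = np.zeros((2, 2, 2))
K_tensor[:, 0, :] = np.array([[1.0, 0.0], [0.0, 0.9]])  # constant part K0
K_tensor[:, 1, :] = np.array([[0.0, 0.0], [0.1, 0.0]])  # action-linear part K1
w = np.array([0.0, -1.0])
q = koopman_q(w, K_tensor, psi_u=np.array([1.0, 2.0]),
              phi_x=np.array([1.0, 1.0]), r=-1.0)
print(q)  # -1 + 0.99 * (-1.1)
```

With these toy numbers, $\mathcal{K}(u) \phi(x) = [1,\, 1.1]$ and the critic evaluates to $-1 + 0.99 \cdot (-1.1)$ in one matrix expression.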

## Running SAKC

```bash
uv run -m koopmanrl.soft_actor_koopman_critic --env_id FluidFlow-v0
```

## Key hyperparameters

| Flag | Default | Description |
|------|---------|-------------|
| `--env_id` | `LinearSystem-v0` | Environment to train on |
| `--seed` | `1` | Random seed |
| `--total_timesteps` | `100000` | Training budget |
| `--num_trajectories` | `500` | Trajectories for Koopman tensor fitting |
| `--actor_lr` | `3e-4` | Actor learning rate |
| `--alpha` | `0.2` | Entropy regularisation coefficient |

Run `--help` to see the full list:

```bash
uv run -m koopmanrl.soft_actor_koopman_critic --help
```

## Using a pre-optimised config

```bash
uv run python -m koopmanrl.soft_actor_koopman_critic \
--config_file configurations/sakc_fluid_flow_hparams.json
```

Override individual flags even when using a config:

```bash
uv run python -m koopmanrl.soft_actor_koopman_critic \
--config_file configurations/sakc_double_well_hparams.json \
--seed 42
```

## Source

`koopmanrl/soft_actor_koopman_critic.py`
61 changes: 61 additions & 0 deletions docs/docs/algorithms/skvi.md
@@ -0,0 +1,61 @@
---
id: skvi
sidebar_position: 1
title: Soft Koopman Value Iteration (SKVI)
---

# Soft Koopman Value Iteration (SKVI)

Soft Koopman Value Iteration (SKVI) is the first of the two KARL algorithms. It replaces the standard Bellman backup in soft value iteration with one that exploits a learned Koopman tensor representation of the environment's transition dynamics.

## Algorithm overview

SKVI operates in discrete value-iteration fashion:

1. **Koopman tensor construction** — collect trajectories from the environment and fit a Koopman tensor $\mathcal{K}$ that maps observable functions of the current state-action pair to observables of the next state.
2. **Lifted value iteration** — define the soft Bellman backup over the lifted (observable) space rather than the raw state space, exploiting the linearity of $\mathcal{K}$.
3. **Policy extraction** — derive the policy from the soft value function in the observable space, then project back to action space.

The key benefit is that the linear structure of the Koopman operator allows the Bellman backup to be solved analytically, avoiding the regression step needed by model-free methods.
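
The lifted backup can be sketched concretely. The snippet below is a simplified illustration under stated assumptions, not the `koopmanrl` implementation: it assumes a finite candidate action set, a value function linear in the observables ($V(x) \approx w^\top \phi(x)$), and one precomputed lifted dynamics matrix per action.

```python
import numpy as np

# Simplified SKVI-style backup, not the koopmanrl source. Assumes a finite
# action set, V(x) ~= w @ phi(x), and E[phi(x') | x, u] = K[u] @ phi(x).

def soft_value_iteration(K, R, Phi, gamma=0.99, alpha=0.2, iters=500):
    """K: dict action -> (d, d) lifted dynamics; R: (N, n_actions) rewards
    at the sample states; Phi: (N, d) lifted sample states. Returns w."""
    actions = sorted(K)
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        # Q(x_i, u) = r(x_i, u) + gamma * w @ K[u] @ phi(x_i): linear in phi
        Q = np.stack([R[:, j] + gamma * Phi @ (K[u].T @ w)
                      for j, u in enumerate(actions)], axis=1)
        # soft Bellman backup: V(x) = alpha * logsumexp(Q(x, .) / alpha)
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + alpha * np.log(np.exp((Q - Qmax) / alpha)
                                   .sum(axis=1, keepdims=True))).ravel()
        # project V back onto the observable space (least squares)
        w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return w

# Toy system x' = a * x with actions a in {0.5, 0.9}, reward -x^2, and
# observables phi(x) = [1, x, x^2], so K[a] = diag(1, a, a^2) exactly.
xs = np.linspace(-2.0, 2.0, 41)
Phi = np.stack([np.ones_like(xs), xs, xs**2], axis=1)
K = {0.5: np.diag([1.0, 0.5, 0.25]), 0.9: np.diag([1.0, 0.9, 0.81])}
R = np.stack([-xs**2, -xs**2], axis=1)
w = soft_value_iteration(K, R, Phi)
print(w)  # w[2] < 0: value falls off quadratically away from the origin
```

The resulting weights prefer the stronger contraction ($a = 0.5$), which drives the state to the low-cost origin faster.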

## Running SKVI

```bash
uv run -m koopmanrl.soft_koopman_value_iteration --env_id LinearSystem-v0
```

All supported environment IDs:

| Environment | `--env_id` |
|-------------|-----------|
| Linear System | `LinearSystem-v0` |
| Fluid Flow | `FluidFlow-v0` |
| Lorenz | `Lorenz-v0` |
| Double Well | `DoubleWell-v0` |

## Key hyperparameters

| Flag | Default | Description |
|------|---------|-------------|
| `--env_id` | `LinearSystem-v0` | Environment to train on |
| `--seed` | `1` | Random seed |
| `--total_timesteps` | `100000` | Training budget |
| `--num_trajectories` | `500` | Trajectories for Koopman tensor fitting |
| `--observable_dim` | Env-specific | Dimension of the observable (lifted) space |

Run `--help` to see the full list:

```bash
uv run -m koopmanrl.soft_koopman_value_iteration --help
```

## Using a pre-optimised config

```bash
uv run python -m koopmanrl.soft_koopman_value_iteration \
--config_file configurations/skvi_lorenz_hparams.json
```

## Source

`koopmanrl/soft_koopman_value_iteration.py`
9 changes: 9 additions & 0 deletions docs/docs/api/_category_.json
@@ -0,0 +1,9 @@
{
  "label": "API Reference",
  "position": 5,
  "collapsed": false,
  "link": {
    "type": "generated-index",
    "description": "Reference documentation for the KoopmanRL Python API."
  }
}
79 changes: 79 additions & 0 deletions docs/docs/api/index.md
@@ -0,0 +1,79 @@
---
id: api
sidebar_position: 1
title: API Reference
---

# API Reference

KoopmanRL is organised into two top-level packages.

## `koopmanrl` — core algorithms and environments

| Module | Contents |
|--------|----------|
| `koopmanrl.environments` | Four benchmark Gym environments |
| `koopmanrl.soft_koopman_value_iteration` | SKVI training script |
| `koopmanrl.soft_actor_koopman_critic` | SAKC training script |
| `koopmanrl.linear_quadratic_regulator` | LQR baseline |
| `koopmanrl.sac_continuous_action` | SAC (Q-value) baseline |
| `koopmanrl.value_based_sac_continuous_action` | SAC (value-function) baseline |
| `koopmanrl.koopman_observables` | Observable (lifting) functions |
| `koopmanrl.koopman_tensor` | Koopman tensor construction and fitting |
| `koopmanrl.opt_wrappers` | Wrappers for Optuna/Ray Tune integration |
| `koopmanrl.utils` | Shared utilities (config loading, seeding) |
| `koopmanrl.sakc_optuna_opt` | SAKC hyperparameter optimization |
| `koopmanrl.skvi_optuna_opt` | SKVI hyperparameter optimization |

## `koopmanrl_utils` — post-processing and visualisation

| Module | Contents |
|--------|----------|
| `koopmanrl_utils.movies.generate_trajectories` | Roll out policies and save trajectory `.npy` files |
| `koopmanrl_utils.movies.generate_trajectory_figure` | Static PNG trajectory plots with optional vector field |
| `koopmanrl_utils.movies.generate_gifs` | Animated GIF generation from saved trajectories |
| `koopmanrl_utils.run_optimized_experiments` | Re-run best configs across seeds |
| `koopmanrl_utils.plot_csv_from_tensorboards` | Plot training curves from TensorBoard CSVs |

## Environments

All four environments follow the [OpenAI Gym](https://gymnasium.farama.org/) interface (`gym==0.23.1`). They are registered at import time and can be instantiated with:

```python
import gym
import koopmanrl.environments # registers all environments

env = gym.make("FluidFlow-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```

### Environment IDs

| ID | Class | Source |
|----|-------|--------|
| `LinearSystem-v0` | `LinearSystem` | `koopmanrl/environments/linear_system.py` |
| `FluidFlow-v0` | `FluidFlow` | `koopmanrl/environments/fluid_flow.py` |
| `Lorenz-v0` | `Lorenz` | `koopmanrl/environments/lorenz.py` |
| `DoubleWell-v0` | `DoubleWell` | `koopmanrl/environments/double_well.py` |

## Koopman tensor

The `koopmanrl.koopman_tensor` module provides the core Koopman tensor fitting routine used by both SKVI and SAKC. It accepts batches of transition tuples $(x_t, u_t, x_{t+1})$ and returns a tensor $\mathcal{K}$ such that

$$
\phi(x_{t+1}) \approx \mathcal{K}(u_t) \, \phi(x_t)
$$

where $\phi$ is the observable (lifting) function chosen via `koopmanrl.koopman_observables`.
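
A minimal least-squares fit of this relation can be sketched as follows. This is an illustrative, assumption-laden version (affine state and action observables, a plain Kronecker-product parameterisation), not the `koopmanrl.koopman_tensor` API:

```python
import numpy as np

# Illustrative sketch, not the koopmanrl.koopman_tensor API. Fits M in
#   phi(x_{t+1}) ~= M @ kron(psi(u_t), phi(x_t))
# by least squares, one common parameterisation of K(u) phi(x).

def phi(x):
    return np.concatenate([[1.0], x])   # affine state observables

def psi(u):
    return np.concatenate([[1.0], u])   # affine action observables

def fit_koopman(X, U, Xp):
    Z = np.stack([np.kron(psi(u), phi(x)) for x, u in zip(X, U)])
    PhiP = np.stack([phi(xp) for xp in Xp])
    M, *_ = np.linalg.lstsq(Z, PhiP, rcond=None)
    return M.T                          # maps lifted (u, x) -> phi(x')

def predict(M, x, u):
    return M @ np.kron(psi(u), phi(x))

# A linear system x' = A x + B u lies inside this model class, so the
# lifted one-step prediction error should be near machine precision.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
X = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
Xp = X @ A.T + U @ B.T
M = fit_koopman(X, U, Xp)
err = max(np.linalg.norm(predict(M, x, u) - phi(xp))
          for x, u, xp in zip(X, U, Xp))
print(err)
```

On nonlinear environments the same fit is approximate, and richer observable dictionaries (as provided by `koopmanrl.koopman_observables`) trade dimension for accuracy.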

## Config loading

`koopmanrl.utils.load_and_apply_config` provides layered configuration merging: a JSON file sets defaults, and any CLI flag explicitly provided takes precedence.

```python
from koopmanrl.utils import load_and_apply_config

args = MyArgs().parse_args()
args = load_and_apply_config(args, "configurations/sakc_fluid_flow_hparams.json")
```