53 changes: 53 additions & 0 deletions .github/workflows/deploy-docs.yml
@@ -0,0 +1,53 @@
name: Deploy Docs to GitHub Pages

on:
  push:
    branches:
      - master

jobs:
  build:
    name: Build Docusaurus
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: docs/package-lock.json

      - name: Install dependencies
        run: npm ci

      - name: Build website
        run: npm run build

      - name: Upload build artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/build

  deploy:
    name: Deploy to GitHub Pages
    needs: build

    permissions:
      pages: write
      id-token: write

    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
30 changes: 30 additions & 0 deletions .github/workflows/test-deploy-docs.yml
@@ -0,0 +1,30 @@
name: Test Docs Build

on:
  pull_request:
    branches:
      - master

jobs:
  test-deploy:
    name: Test Docusaurus build
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: docs/package-lock.json

      - name: Install dependencies
        run: npm ci

      - name: Test build website
        run: npm run build
20 changes: 20 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,20 @@
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
9 changes: 9 additions & 0 deletions docs/docs/algorithms/_category_.json
@@ -0,0 +1,9 @@
{
  "label": "Algorithms",
  "position": 2,
  "collapsed": false,
  "link": {
    "type": "generated-index",
    "description": "Control algorithms implemented in KoopmanRL, including the two KARL algorithms and baseline comparators."
  }
}
29 changes: 29 additions & 0 deletions docs/docs/algorithms/lqr.md
@@ -0,0 +1,29 @@
---
id: lqr
sidebar_position: 3
title: Linear Quadratic Regulator (LQR)
---

# Linear Quadratic Regulator (LQR)

The Linear Quadratic Regulator is a classical optimal control algorithm included in KoopmanRL as a baseline comparator. For linear dynamics with quadratic cost it admits an exact closed-form solution, so it serves as an upper-bound reference on environments where those assumptions hold.
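
The closed-form solution can be sketched in a few lines of NumPy. The snippet below is illustrative only, not the code in `koopmanrl/linear_quadratic_regulator.py`: it iterates the discrete-time algebraic Riccati equation to a fixed point and returns the optimal state-feedback gain.

```python
import numpy as np

# Illustrative discrete-time LQR sketch (not the KoopmanRL source):
# minimise sum_t x_t' Q x_t + u_t' R u_t subject to x_{t+1} = A x_t + B u_t.

def dlqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete algebraic Riccati equation, then return the
    optimal feedback gain K for the control law u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Toy double-integrator example: the closed loop A - B K should be stable,
# i.e. have spectral radius below 1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = dlqr_gain(A, B, np.eye(2), np.eye(1))
print(np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

Stability of $A - BK$ is easy to verify from the spectral radius printed at the end.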

## Running LQR

```bash
uv run -m koopmanrl.linear_quadratic_regulator
```

With a specific environment:

```bash
uv run -m koopmanrl.linear_quadratic_regulator --env_id FluidFlow-v0
```

## When to use LQR

LQR is most informative on the `LinearSystem-v0` environment where its optimality assumptions are exactly satisfied. On nonlinear environments (Lorenz, Fluid Flow, Double Well) it provides a linearised-dynamics baseline that the KARL algorithms aim to outperform.

## Source

`koopmanrl/linear_quadratic_regulator.py`
33 changes: 33 additions & 0 deletions docs/docs/algorithms/sac.md
@@ -0,0 +1,33 @@
---
id: sac
sidebar_position: 4
title: Soft Actor-Critic Baselines
---

# Soft Actor-Critic Baselines

KoopmanRL ships two CleanRL-style SAC baselines that provide model-free comparators for the KARL algorithms.

## Q-value SAC

Standard Soft Actor-Critic with a Q-function critic, adapted from CleanRL.

```bash
uv run -m koopmanrl.sac_continuous_action --env_id Lorenz-v0
```

**Source:** `koopmanrl/sac_continuous_action.py`

## Value-based SAC

A variant that uses a value function $V(s)$ rather than $Q(s,a)$, also from CleanRL.

```bash
uv run -m koopmanrl.value_based_sac_continuous_action --env_id DoubleWell-v0
```

**Source:** `koopmanrl/value_based_sac_continuous_action.py`
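
For reference, the two baselines differ in which side of the standard SAC identity their critic approximates. Following the SAC papers, the soft value and soft Q-functions are related by

$$
V(s) = \mathbb{E}_{a \sim \pi}\left[\, Q(s, a) - \alpha \log \pi(a \mid s) \,\right]
$$

where $\alpha$ is the entropy temperature.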

## Purpose

These baselines are the direct model-free counterparts to SAKC. Comparing SAKC against them on the same environment and seed budget quantifies the benefit of the Koopman critic.
62 changes: 62 additions & 0 deletions docs/docs/algorithms/sakc.md
@@ -0,0 +1,62 @@
---
id: sakc
sidebar_position: 2
title: Soft Actor Koopman Critic (SAKC)
---

# Soft Actor Koopman Critic (SAKC)

Soft Actor Koopman Critic (SAKC) is the second KARL algorithm. It extends the standard Soft Actor-Critic (SAC) framework by replacing the learned critic network with a critic derived from the Koopman tensor representation of the transition dynamics.

## Algorithm overview

SAKC follows the actor-critic paradigm:

1. **Koopman tensor construction** — as in SKVI, trajectories are collected and a Koopman tensor $\mathcal{K}$ is fitted to the environment's dynamics.
2. **Koopman critic** — instead of learning $Q(s, a)$ from scratch via a neural network, SAKC computes the critic analytically from the Koopman tensor, exploiting its linear structure.
3. **Actor update** — a standard stochastic policy gradient update is applied to the actor network using the Koopman-derived critic as the advantage signal.
4. **Entropy regularisation** — a soft maximum-entropy objective is retained, balancing exploration and exploitation.

The Koopman critic replaces thousands of gradient steps of critic regression with a single closed-form computation, reducing sample complexity while maintaining the expressiveness of the actor.
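
As an illustration of that closed-form computation, here is a hedged NumPy sketch. It is **not** the `koopmanrl` implementation; it assumes the critic takes the form $Q(x,u) = r(x,u) + \gamma\, w^\top \mathcal{K}(u)\, \phi(x)$, with $\mathcal{K}(u)$ obtained by contracting the third-order Koopman tensor against action observables $\psi(u)$.

```python
import numpy as np

# Hedged sketch, not the koopmanrl source. Assumed critic form:
#   Q(x, u) = r + gamma * w @ K(u) @ phi(x)
# where K(u)[i, k] = sum_j K_tensor[i, j, k] * psi(u)[j].

def action_matrix(K_tensor, psi_u):
    # contract the action index of the tensor with psi(u)
    return np.einsum("ijk,j->ik", K_tensor, psi_u)

def koopman_q(w, K_tensor, psi_u, phi_x, r, gamma=0.99):
    """Evaluate the Koopman critic in closed form: no critic gradient steps."""
    return r + gamma * w @ action_matrix(K_tensor, psi_u) @ phi_x

# Tiny worked example: K(u) = K0 + u * K1 with psi(u) = [1, u].
K_tensor = np.zeros((2, 2, 2))
K_tensor[:, 0, :] = np.array([[1.0, 0.0], [0.0, 0.9]])  # constant part K0
K_tensor[:, 1, :] = np.array([[0.0, 0.0], [0.1, 0.0]])  # action-linear part K1
w = np.array([0.0, -1.0])
q = koopman_q(w, K_tensor, psi_u=np.array([1.0, 2.0]),
              phi_x=np.array([1.0, 1.0]), r=-1.0)
print(q)  # -1 + 0.99 * (-1.1)
```

With these toy numbers, $\mathcal{K}(u) \phi(x) = [1,\, 1.1]$ and the critic evaluates to $-1 + 0.99 \cdot (-1.1)$ in one matrix expression.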

## Running SAKC

```bash
uv run -m koopmanrl.soft_actor_koopman_critic --env_id FluidFlow-v0
```

## Key hyperparameters

| Flag | Default | Description |
|------|---------|-------------|
| `--env_id` | `LinearSystem-v0` | Environment to train on |
| `--seed` | `1` | Random seed |
| `--total_timesteps` | `100000` | Training budget |
| `--num_trajectories` | `500` | Trajectories for Koopman tensor fitting |
| `--actor_lr` | `3e-4` | Actor learning rate |
| `--alpha` | `0.2` | Entropy regularisation coefficient |

Run `--help` to see the full list:

```bash
uv run -m koopmanrl.soft_actor_koopman_critic --help
```

## Using a pre-optimised config

```bash
uv run python -m koopmanrl.soft_actor_koopman_critic \
--config_file configurations/sakc_fluid_flow_hparams.json
```

Override individual flags even when using a config:

```bash
uv run python -m koopmanrl.soft_actor_koopman_critic \
--config_file configurations/sakc_double_well_hparams.json \
--seed 42
```

## Source

`koopmanrl/soft_actor_koopman_critic.py`
61 changes: 61 additions & 0 deletions docs/docs/algorithms/skvi.md
@@ -0,0 +1,61 @@
---
id: skvi
sidebar_position: 1
title: Soft Koopman Value Iteration (SKVI)
---

# Soft Koopman Value Iteration (SKVI)

Soft Koopman Value Iteration (SKVI) is the first of the two KARL algorithms. It replaces the standard Bellman backup in soft value iteration with one that exploits a learned Koopman tensor representation of the environment's transition dynamics.

## Algorithm overview

SKVI operates in discrete value-iteration fashion:

1. **Koopman tensor construction** — collect trajectories from the environment and fit a Koopman tensor $\mathcal{K}$ that maps observable functions of the current state-action pair to observables of the next state.
2. **Lifted value iteration** — define the soft Bellman backup over the lifted (observable) space rather than the raw state space, exploiting the linearity of $\mathcal{K}$.
3. **Policy extraction** — derive the policy from the soft value function in the observable space, then project back to action space.

The key benefit is that the linear structure of the Koopman operator allows the Bellman backup to be solved analytically, avoiding the regression step needed by model-free methods.
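
The lifted backup can be sketched concretely. The snippet below is a simplified illustration under stated assumptions, not the `koopmanrl` implementation: it assumes a finite candidate action set, a value function linear in the observables ($V(x) \approx w^\top \phi(x)$), and one precomputed lifted dynamics matrix per action.

```python
import numpy as np

# Simplified SKVI-style backup, not the koopmanrl source. Assumes a finite
# action set, V(x) ~= w @ phi(x), and E[phi(x') | x, u] = K[u] @ phi(x).

def soft_value_iteration(K, R, Phi, gamma=0.99, alpha=0.2, iters=500):
    """K: dict action -> (d, d) lifted dynamics; R: (N, n_actions) rewards
    at the sample states; Phi: (N, d) lifted sample states. Returns w."""
    actions = sorted(K)
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        # Q(x_i, u) = r(x_i, u) + gamma * w @ K[u] @ phi(x_i): linear in phi
        Q = np.stack([R[:, j] + gamma * Phi @ (K[u].T @ w)
                      for j, u in enumerate(actions)], axis=1)
        # soft Bellman backup: V(x) = alpha * logsumexp(Q(x, .) / alpha)
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + alpha * np.log(np.exp((Q - Qmax) / alpha)
                                   .sum(axis=1, keepdims=True))).ravel()
        # project V back onto the observable space (least squares)
        w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return w

# Toy system x' = a * x with actions a in {0.5, 0.9}, reward -x^2, and
# observables phi(x) = [1, x, x^2], so K[a] = diag(1, a, a^2) exactly.
xs = np.linspace(-2.0, 2.0, 41)
Phi = np.stack([np.ones_like(xs), xs, xs**2], axis=1)
K = {0.5: np.diag([1.0, 0.5, 0.25]), 0.9: np.diag([1.0, 0.9, 0.81])}
R = np.stack([-xs**2, -xs**2], axis=1)
w = soft_value_iteration(K, R, Phi)
print(w)  # w[2] < 0: value falls off quadratically away from the origin
```

The resulting weights prefer the stronger contraction ($a = 0.5$), which drives the state to the low-cost origin faster.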

## Running SKVI

```bash
uv run -m koopmanrl.soft_koopman_value_iteration --env_id LinearSystem-v0
```

All supported environment IDs:

| Environment | `--env_id` |
|-------------|-----------|
| Linear System | `LinearSystem-v0` |
| Fluid Flow | `FluidFlow-v0` |
| Lorenz | `Lorenz-v0` |
| Double Well | `DoubleWell-v0` |

## Key hyperparameters

| Flag | Default | Description |
|------|---------|-------------|
| `--env_id` | `LinearSystem-v0` | Environment to train on |
| `--seed` | `1` | Random seed |
| `--total_timesteps` | `100000` | Training budget |
| `--num_trajectories` | `500` | Trajectories for Koopman tensor fitting |
| `--observable_dim` | Env-specific | Dimension of the observable (lifted) space |

Run `--help` to see the full list:

```bash
uv run -m koopmanrl.soft_koopman_value_iteration --help
```

## Using a pre-optimised config

```bash
uv run python -m koopmanrl.soft_koopman_value_iteration \
--config_file configurations/skvi_lorenz_hparams.json
```

## Source

`koopmanrl/soft_koopman_value_iteration.py`
9 changes: 9 additions & 0 deletions docs/docs/api/_category_.json
@@ -0,0 +1,9 @@
{
  "label": "API Reference",
  "position": 5,
  "collapsed": false,
  "link": {
    "type": "generated-index",
    "description": "Reference documentation for the KoopmanRL Python API."
  }
}
79 changes: 79 additions & 0 deletions docs/docs/api/index.md
@@ -0,0 +1,79 @@
---
id: api
sidebar_position: 1
title: API Reference
---

# API Reference

KoopmanRL is organised into two top-level packages.

## `koopmanrl` — core algorithms and environments

| Module | Contents |
|--------|----------|
| `koopmanrl.environments` | Four benchmark Gym environments |
| `koopmanrl.soft_koopman_value_iteration` | SKVI training script |
| `koopmanrl.soft_actor_koopman_critic` | SAKC training script |
| `koopmanrl.linear_quadratic_regulator` | LQR baseline |
| `koopmanrl.sac_continuous_action` | SAC (Q-value) baseline |
| `koopmanrl.value_based_sac_continuous_action` | SAC (value-function) baseline |
| `koopmanrl.koopman_observables` | Observable (lifting) functions |
| `koopmanrl.koopman_tensor` | Koopman tensor construction and fitting |
| `koopmanrl.opt_wrappers` | Wrappers for Optuna/Ray Tune integration |
| `koopmanrl.utils` | Shared utilities (config loading, seeding) |
| `koopmanrl.sakc_optuna_opt` | SAKC hyperparameter optimization |
| `koopmanrl.skvi_optuna_opt` | SKVI hyperparameter optimization |

## `koopmanrl_utils` — post-processing and visualisation

| Module | Contents |
|--------|----------|
| `koopmanrl_utils.movies.generate_trajectories` | Roll out policies and save trajectory `.npy` files |
| `koopmanrl_utils.movies.generate_trajectory_figure` | Static PNG trajectory plots with optional vector field |
| `koopmanrl_utils.movies.generate_gifs` | Animated GIF generation from saved trajectories |
| `koopmanrl_utils.run_optimized_experiments` | Re-run best configs across seeds |
| `koopmanrl_utils.plot_csv_from_tensorboards` | Plot training curves from TensorBoard CSVs |

## Environments

All four environments follow the [OpenAI Gym](https://gymnasium.farama.org/) interface (`gym==0.23.1`). They are registered at import time and can be instantiated with:

```python
import gym
import koopmanrl.environments # registers all environments

env = gym.make("FluidFlow-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```

### Environment IDs

| ID | Class | Source |
|----|-------|--------|
| `LinearSystem-v0` | `LinearSystem` | `koopmanrl/environments/linear_system.py` |
| `FluidFlow-v0` | `FluidFlow` | `koopmanrl/environments/fluid_flow.py` |
| `Lorenz-v0` | `Lorenz` | `koopmanrl/environments/lorenz.py` |
| `DoubleWell-v0` | `DoubleWell` | `koopmanrl/environments/double_well.py` |

## Koopman tensor

The `koopmanrl.koopman_tensor` module provides the core Koopman tensor fitting routine used by both SKVI and SAKC. It accepts batches of transition tuples $(x_t, u_t, x_{t+1})$ and returns a tensor $\mathcal{K}$ such that

$$
\phi(x_{t+1}) \approx \mathcal{K}(u_t) \, \phi(x_t)
$$

where $\phi$ is the observable (lifting) function chosen via `koopmanrl.koopman_observables`.
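
A minimal least-squares fit of this relation can be sketched as follows. This is an illustrative, assumption-laden version (affine state and action observables, a plain Kronecker-product parameterisation), not the `koopmanrl.koopman_tensor` API:

```python
import numpy as np

# Illustrative sketch, not the koopmanrl.koopman_tensor API. Fits M in
#   phi(x_{t+1}) ~= M @ kron(psi(u_t), phi(x_t))
# by least squares, one common parameterisation of K(u) phi(x).

def phi(x):
    return np.concatenate([[1.0], x])   # affine state observables

def psi(u):
    return np.concatenate([[1.0], u])   # affine action observables

def fit_koopman(X, U, Xp):
    Z = np.stack([np.kron(psi(u), phi(x)) for x, u in zip(X, U)])
    PhiP = np.stack([phi(xp) for xp in Xp])
    M, *_ = np.linalg.lstsq(Z, PhiP, rcond=None)
    return M.T                          # maps lifted (u, x) -> phi(x')

def predict(M, x, u):
    return M @ np.kron(psi(u), phi(x))

# A linear system x' = A x + B u lies inside this model class, so the
# lifted one-step prediction error should be near machine precision.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
X = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
Xp = X @ A.T + U @ B.T
M = fit_koopman(X, U, Xp)
err = max(np.linalg.norm(predict(M, x, u) - phi(xp))
          for x, u, xp in zip(X, U, Xp))
print(err)
```

On nonlinear environments the same fit is approximate, and richer observable dictionaries (as provided by `koopmanrl.koopman_observables`) trade dimension for accuracy.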

## Config loading

`koopmanrl.utils.load_and_apply_config` provides layered configuration merging: a JSON file sets defaults, and any CLI flag explicitly provided takes precedence.

```python
from koopmanrl.utils import load_and_apply_config

args = MyArgs().parse_args()
args = load_and_apply_config(args, "configurations/sakc_fluid_flow_hparams.json")
```