Commit b870c35

Merge pull request #2 from dynamicslab/claude/docs
Add Docusaurus documentation site and GitHub Pages deployment
2 parents 5f6bb38 + 7ecda6d commit b870c35

44 files changed

Lines changed: 20409 additions & 89 deletions

.github/workflows/deploy-docs.yml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+name: Deploy Docs to GitHub Pages
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  build:
+    name: Build Docusaurus
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: docs
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: npm
+          cache-dependency-path: docs/package-lock.json
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Build website
+        run: npm run build
+
+      - name: Upload build artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: docs/build
+
+  deploy:
+    name: Deploy to GitHub Pages
+    needs: build
+
+    permissions:
+      pages: write
+      id-token: write
+
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: Test Docs Build
+
+on:
+  pull_request:
+    branches:
+      - master
+
+jobs:
+  test-deploy:
+    name: Test Docusaurus build
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: docs
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: npm
+          cache-dependency-path: docs/package-lock.json
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Test build website
+        run: npm run build

docs/.gitignore

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Dependencies
+/node_modules
+
+# Production
+/build
+
+# Generated files
+.docusaurus
+.cache-loader
+
+# Misc
+.DS_Store
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+  "label": "Algorithms",
+  "position": 2,
+  "collapsed": false,
+  "link": {
+    "type": "generated-index",
+    "description": "Control algorithms implemented in KoopmanRL, including the two KARL algorithms and baseline comparators."
+  }
+}

docs/docs/algorithms/lqr.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+id: lqr
+sidebar_position: 3
+title: Linear Quadratic Regulator (LQR)
+---
+
+# Linear Quadratic Regulator (LQR)
+
+The Linear Quadratic Regulator is a classical optimal control algorithm included in KoopmanRL as a baseline comparator. It provides an exact analytical solution for linear systems with quadratic cost, serving as an upper-bound reference on environments where linearity holds.
+
+## Running LQR
+
+```bash
+uv run -m koopmanrl.linear_quadratic_regulator
+```
+
+With a specific environment:
+
+```bash
+uv run -m koopmanrl.linear_quadratic_regulator --env_id FluidFlow-v0
+```
+
+## When to use LQR
+
+LQR is most informative on the `LinearSystem-v0` environment where its optimality assumptions are exactly satisfied. On nonlinear environments (Lorenz, Fluid Flow, Double Well) it provides a linearised-dynamics baseline that the KARL algorithms aim to outperform.
+
+## Source
+
+`koopmanrl/linear_quadratic_regulator.py`
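
The "exact analytical solution" that `lqr.md` refers to can be illustrated on the scalar discrete-time case. The sketch below is only an illustration and is not the code in `koopmanrl/linear_quadratic_regulator.py`: the function name `scalar_lqr_gain` and the parameters `a`, `b`, `q`, `r` are assumptions made for this example. It iterates the Riccati recursion for the system x' = a x + b u with stage cost q x^2 + r u^2 until the cost-to-go coefficient settles.

```python
def scalar_lqr_gain(a: float, b: float, q: float, r: float, iterations: int = 500):
    """Return (k, p) for the scalar system x' = a*x + b*u with cost q*x^2 + r*u^2.

    p is the fixed point of the discrete-time Riccati recursion and
    k is the optimal feedback gain, so the control law is u = -k * x.
    All names here are hypothetical and chosen for illustration only.
    """
    p = q  # start the recursion from the stage-cost weight
    for _ in range(iterations):
        # Riccati update: cost-to-go after one more optimal step.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p


if __name__ == "__main__":
    k, p = scalar_lqr_gain(a=1.2, b=1.0, q=1.0, r=1.0)
    # The open-loop system is unstable (|a| > 1); the closed loop a - b*k is not.
    print(f"gain={k:.4f} cost_to_go={p:.4f} closed_loop={1.2 - k:.4f}")
```

On the nonlinear environments, a baseline of this kind would be computed from a linearisation of the dynamics, which is why the page describes it as a reference that the KARL algorithms aim to outperform.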

docs/docs/algorithms/sac.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+id: sac
+sidebar_position: 4
+title: Soft Actor-Critic Baselines
+---
+
+# Soft Actor-Critic Baselines
+
+KoopmanRL ships two CleanRL-style SAC baselines that provide model-free comparators for the KARL algorithms.
+
+## Q-value SAC
+
+Standard Soft Actor-Critic with a Q-function critic, adapted from CleanRL.
+
+```bash
+uv run -m koopmanrl.sac_continuous_action --env_id Lorenz-v0
+```
+
+**Source:** `koopmanrl/sac_continuous_action.py`
+
+## Value-based SAC
+
+A variant that uses a value function $V(s)$ rather than $Q(s,a)$, also from CleanRL.
+
+```bash
+uv run -m koopmanrl.value_based_sac_continuous_action --env_id DoubleWell-v0
+```
+
+**Source:** `koopmanrl/value_based_sac_continuous_action.py`
+
+## Purpose
+
+These baselines are the direct model-free counterparts to SAKC. Comparing SAKC against them on the same environment and seed budget quantifies the benefit of the Koopman critic.
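
The difference between the two baselines comes down to which quantity the critic represents. In maximum-entropy RL the two are linked by the standard soft-value relation, sketched below for a finite action set. This is an assumption-laden illustration: the baseline scripts target continuous actions and would estimate the same quantity from sampled actions, and the names `soft_value` and `soft_policy` are hypothetical. The temperature `0.2` mirrors the `--alpha` default listed for SAKC.

```python
import math
from typing import List, Sequence


def soft_value(q_values: Sequence[float], alpha: float = 0.2) -> float:
    """Soft state value V = alpha * log(sum_a exp(Q(a) / alpha)).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    q_max = max(q_values)
    return q_max + alpha * math.log(
        sum(math.exp((q - q_max) / alpha) for q in q_values)
    )


def soft_policy(q_values: Sequence[float], alpha: float = 0.2) -> List[float]:
    """Maximum-entropy policy pi(a) = exp((Q(a) - V) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


if __name__ == "__main__":
    q = [1.0, 0.5, -2.0]  # toy Q-values for three actions
    print("V =", soft_value(q))
    print("pi =", soft_policy(q))
```

Under this policy, V equals the expectation of Q(a) - alpha * log pi(a), which is the identity a value-function critic relies on.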

docs/docs/algorithms/sakc.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+---
+id: sakc
+sidebar_position: 2
+title: Soft Actor Koopman Critic (SAKC)
+---
+
+# Soft Actor Koopman Critic (SAKC)
+
+Soft Actor Koopman Critic (SAKC) is the second KARL algorithm. It extends the standard Soft Actor-Critic (SAC) framework by replacing the learned critic network with a critic derived from the Koopman tensor representation of the transition dynamics.
+
+## Algorithm overview
+
+SAKC follows the actor-critic paradigm:
+
+1. **Koopman tensor construction** — as in SKVI, trajectories are collected and a Koopman tensor $\mathcal{K}$ is fitted to the environment's dynamics.
+2. **Koopman critic** — instead of learning $Q(s, a)$ from scratch via a neural network, SAKC computes the critic analytically from the Koopman tensor, exploiting its linear structure.
+3. **Actor update** — a standard stochastic policy gradient update is applied to the actor network using the Koopman-derived critic as the advantage signal.
+4. **Entropy regularisation** — a soft maximum-entropy objective is retained, balancing exploration and exploitation.
+
+The Koopman critic replaces thousands of gradient steps of critic regression with a single closed-form computation, reducing sample complexity while maintaining the expressiveness of the actor.
+
+## Running SAKC
+
+```bash
+uv run -m koopmanrl.soft_actor_koopman_critic --env_id FluidFlow-v0
+```
+
+## Key hyperparameters
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--env_id` | `LinearSystem-v0` | Environment to train on |
+| `--seed` | `1` | Random seed |
+| `--total_timesteps` | `100000` | Training budget |
+| `--num_trajectories` | `500` | Trajectories for Koopman tensor fitting |
+| `--actor_lr` | `3e-4` | Actor learning rate |
+| `--alpha` | `0.2` | Entropy regularisation coefficient |
+
+Run `--help` to see the full list:
+
+```bash
+uv run -m koopmanrl.soft_actor_koopman_critic --help
+```
+
+## Using a pre-optimised config
+
+```bash
+uv run python -m koopmanrl.soft_actor_koopman_critic \
+    --config_file configurations/sakc_fluid_flow_hparams.json
+```
+
+Override individual flags even when using a config:
+
+```bash
+uv run python -m koopmanrl.soft_actor_koopman_critic \
+    --config_file configurations/sakc_double_well_hparams.json \
+    --seed 42
+```
+
+## Source
+
+`koopmanrl/soft_actor_koopman_critic.py`
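
To make step 2 of the SAKC overview concrete, here is a minimal sketch of how a critic could be evaluated from a Koopman tensor. It is a toy illustration, not the implementation in `koopmanrl/soft_actor_koopman_critic.py`. Everything in it is an assumption introduced for the example: the monomial observables `phi` and `psi`, the linear value model V(x) = w^T phi(x), the helper names, and the hand-built `TOY_TENSOR`, which encodes the scalar dynamics x' = 0.5 x + u.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def phi(x: float) -> List[float]:
    """Hypothetical state observables: monomials [1, x, x^2]."""
    return [1.0, x, x * x]


def psi(u: float) -> List[float]:
    """Hypothetical action observables: monomials [1, u, u^2]."""
    return [1.0, u, u * u]


def koopman_matrix(tensor: Sequence[Matrix], u: float) -> Matrix:
    """Contract the tensor with psi(u) to get the action-dependent matrix K(u)."""
    weights = psi(u)
    n_rows, n_cols = len(tensor[0]), len(tensor[0][0])
    return [
        [sum(w * tensor[k][i][j] for k, w in enumerate(weights)) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def koopman_q(x: float, u: float, reward: float, w: Sequence[float],
              tensor: Sequence[Matrix], gamma: float = 0.99) -> float:
    """Q(x, u) = r + gamma * w^T K(u) phi(x), assuming V(x) = w^T phi(x)."""
    k_u = koopman_matrix(tensor, u)
    next_phi = [sum(a * b for a, b in zip(row, phi(x))) for row in k_u]
    return reward + gamma * sum(wi * p for wi, p in zip(w, next_phi))


# Toy tensor that reproduces x' = 0.5 * x + u exactly on these observables.
TOY_TENSOR = [
    [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.25]],  # slice multiplying psi_0 = 1
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # slice multiplying psi_1 = u
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # slice multiplying psi_2 = u^2
]

if __name__ == "__main__":
    # Value model V(x) = -x^2 and reward -(x^2 + u^2), both chosen arbitrarily.
    print(koopman_q(x=2.0, u=1.0, reward=-5.0, w=[0.0, 0.0, -1.0],
                    tensor=TOY_TENSOR, gamma=0.9))
```

The point of the sketch is that, once the tensor and the value weights are known, the expected next-state value is a matrix-vector product rather than the output of a separately trained network.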

docs/docs/algorithms/skvi.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+---
+id: skvi
+sidebar_position: 1
+title: Soft Koopman Value Iteration (SKVI)
+---
+
+# Soft Koopman Value Iteration (SKVI)
+
+Soft Koopman Value Iteration (SKVI) is the first of the two KARL algorithms. It replaces the standard Bellman backup in soft value iteration with one that exploits a learned Koopman tensor representation of the environment's transition dynamics.
+
+## Algorithm overview
+
+SKVI operates in discrete value-iteration fashion:
+
+1. **Koopman tensor construction** — collect trajectories from the environment and fit a Koopman tensor $\mathcal{K}$ that maps observable functions of the current state-action pair to observables of the next state.
+2. **Lifted value iteration** — define the soft Bellman backup over the lifted (observable) space rather than the raw state space, exploiting the linearity of $\mathcal{K}$.
+3. **Policy extraction** — derive the policy from the soft value function in the observable space, then project back to action space.
+
+The key benefit is that the linear structure of the Koopman operator allows the Bellman backup to be solved analytically, avoiding the regression step needed by model-free methods.
+
+## Running SKVI
+
+```bash
+uv run -m koopmanrl.soft_koopman_value_iteration --env_id LinearSystem-v0
+```
+
+All supported environment IDs:
+
+| Environment | `--env_id` |
+|-------------|-----------|
+| Linear System | `LinearSystem-v0` |
+| Fluid Flow | `FluidFlow-v0` |
+| Lorenz | `Lorenz-v0` |
+| Double Well | `DoubleWell-v0` |
+
+## Key hyperparameters
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--env_id` | `LinearSystem-v0` | Environment to train on |
+| `--seed` | `1` | Random seed |
+| `--total_timesteps` | `100000` | Training budget |
+| `--num_trajectories` | `500` | Trajectories for Koopman tensor fitting |
+| `--observable_dim` | Env-specific | Dimension of the observable (lifted) space |
+
+Run `--help` to see the full list:
+
+```bash
+uv run -m koopmanrl.soft_koopman_value_iteration --help
+```
+
+## Using a pre-optimised config
+
+```bash
+uv run python -m koopmanrl.soft_koopman_value_iteration \
+    --config_file configurations/skvi_lorenz_hparams.json
+```
+
+## Source
+
+`koopmanrl/soft_koopman_value_iteration.py`
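
The lifted backup described in step 2 of the SKVI overview can be written out as follows. This is a sketch under stated assumptions rather than the exact objective used in `koopmanrl/soft_koopman_value_iteration.py`: it assumes the value function is linear in the observables, $V(x) \approx w^\top \phi(x)$, and it introduces a weight vector $w$, a reward $r$, a discount $\gamma$, and a temperature $\alpha$ that the page itself does not define. With the Koopman relation $\phi(x_{t+1}) \approx \mathcal{K}(u_t)\,\phi(x_t)$, the expected next-state value becomes linear in $w$:

$$
\mathbb{E}\left[ V(x_{t+1}) \mid x_t, u_t \right] \approx w^\top \mathcal{K}(u_t)\, \phi(x_t)
$$

Substituting this into the standard soft (maximum-entropy) Bellman equation gives

$$
V(x) = \alpha \log \int \exp\!\left( \frac{r(x, u) + \gamma\, w^\top \mathcal{K}(u)\, \phi(x)}{\alpha} \right) \mathrm{d}u
$$

so the only unknown on the right-hand side is $w$, and no separate model of the next-state value has to be regressed.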

docs/docs/api/_category_.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+  "label": "API Reference",
+  "position": 5,
+  "collapsed": false,
+  "link": {
+    "type": "generated-index",
+    "description": "Reference documentation for the KoopmanRL Python API."
+  }
+}

docs/docs/api/index.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+---
+id: api
+sidebar_position: 1
+title: API Reference
+---
+
+# API Reference
+
+KoopmanRL is organised into two top-level packages.
+
+## `koopmanrl` — core algorithms and environments
+
+| Module | Contents |
+|--------|----------|
+| `koopmanrl.environments` | Four benchmark Gym environments |
+| `koopmanrl.soft_koopman_value_iteration` | SKVI training script |
+| `koopmanrl.soft_actor_koopman_critic` | SAKC training script |
+| `koopmanrl.linear_quadratic_regulator` | LQR baseline |
+| `koopmanrl.sac_continuous_action` | SAC (Q-value) baseline |
+| `koopmanrl.value_based_sac_continuous_action` | SAC (value-function) baseline |
+| `koopmanrl.koopman_observables` | Observable (lifting) functions |
+| `koopmanrl.koopman_tensor` | Koopman tensor construction and fitting |
+| `koopmanrl.opt_wrappers` | Wrappers for Optuna/Ray Tune integration |
+| `koopmanrl.utils` | Shared utilities (config loading, seeding) |
+| `koopmanrl.sakc_optuna_opt` | SAKC hyperparameter optimization |
+| `koopmanrl.skvi_optuna_opt` | SKVI hyperparameter optimization |
+
+## `koopmanrl_utils` — post-processing and visualisation
+
+| Module | Contents |
+|--------|----------|
+| `koopmanrl_utils.movies.generate_trajectories` | Roll out policies and save trajectory `.npy` files |
+| `koopmanrl_utils.movies.generate_trajectory_figure` | Static PNG trajectory plots with optional vector field |
+| `koopmanrl_utils.movies.generate_gifs` | Animated GIF generation from saved trajectories |
+| `koopmanrl_utils.run_optimized_experiments` | Re-run best configs across seeds |
+| `koopmanrl_utils.plot_csv_from_tensorboards` | Plot training curves from TensorBoard CSVs |
+
+## Environments
+
+All four environments follow the [OpenAI Gym](https://gymnasium.farama.org/) interface (`gym==0.23.1`). They are registered at import time and can be instantiated with:
+
+```python
+import gym
+import koopmanrl.environments  # registers all environments
+
+env = gym.make("FluidFlow-v0")
+obs = env.reset()
+obs, reward, done, info = env.step(env.action_space.sample())
+```
+
+### Environment IDs
+
+| ID | Class | Source |
+|----|-------|--------|
+| `LinearSystem-v0` | `LinearSystem` | `koopmanrl/environments/linear_system.py` |
+| `FluidFlow-v0` | `FluidFlow` | `koopmanrl/environments/fluid_flow.py` |
+| `Lorenz-v0` | `Lorenz` | `koopmanrl/environments/lorenz.py` |
+| `DoubleWell-v0` | `DoubleWell` | `koopmanrl/environments/double_well.py` |
+
+## Koopman tensor
+
+The `koopmanrl.koopman_tensor` module provides the core Koopman tensor fitting routine used by both SKVI and SAKC. It accepts batches of transition tuples $(x_t, u_t, x_{t+1})$ and returns a tensor $\mathcal{K}$ such that
+
+$$
+\phi(x_{t+1}) \approx \mathcal{K}(u_t) \, \phi(x_t)
+$$
+
+where $\phi$ is the observable (lifting) function chosen via `koopmanrl.koopman_observables`.
+
+## Config loading
+
+`koopmanrl.utils.load_and_apply_config` provides layered configuration merging: a JSON file sets defaults, and any CLI flag explicitly provided takes precedence.
+
+```python
+from koopmanrl.utils import load_and_apply_config
+
+args = MyArgs().parse_args()
+args = load_and_apply_config(args, "configurations/sakc_fluid_flow_hparams.json")
+```
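
The snippet above relies on `MyArgs`, which the page does not define, so it cannot be run as shown. The layering it describes (JSON values act as defaults, explicitly passed CLI flags win) can be reproduced with the standard library alone. The sketch below is one way to obtain that behaviour and makes no claim about how `load_and_apply_config` is actually implemented; the function name `parse_with_config` is hypothetical, and only three of the documented flags are declared to keep it short.

```python
import argparse
import json
from typing import List, Optional


def parse_with_config(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI flags, letting a JSON file supply defaults (hypothetical helper)."""
    # Stage 1: look only for --config_file and ignore every other flag.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config_file", default=None)
    known, _ = pre.parse_known_args(argv)

    # Stage 2: the full parser, with the built-in defaults from the docs tables.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--env_id", default="LinearSystem-v0")
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--alpha", type=float, default=0.2)

    # JSON values replace the built-in defaults, but not explicit CLI flags,
    # because argparse only falls back to a default when a flag is absent.
    if known.config_file:
        with open(known.config_file) as fh:
            parser.set_defaults(**json.load(fh))

    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_with_config())
```

With a config file containing `{"env_id": "FluidFlow-v0", "seed": 7}`, calling `parse_with_config(["--config_file", path, "--seed", "42"])` yields `env_id="FluidFlow-v0"` from the file, `seed=42` from the command line, and `alpha=0.2` from the built-in default.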
