| 1 | +# HighJax: Highway Driving environment for Reinforcement Learning research |
| 2 | + |
| 3 | +<p align="center"> |
| 4 | + <img src="misc/videos/demo.webp" alt="HighJax PPO training demo"><br/> |
| 5 | + <em>PPO agent learning to drive on a 4-lane highway</em> |
| 6 | +</p> |
| 7 | + |
| 8 | +HighJax is an autonomous driving environment for Reinforcement Learning research. It is a JAX implementation of [HighwayEnv](https://github.com/Farama-Foundation/HighwayEnv) that provides a fully JIT-compilable and vectorizable highway driving simulation. |
| 9 | + |
| 10 | +Besides being much faster than the original, it provides Octane, a Rust-based TUI for examining your experiment runs. Octane provides an interface for defining behaviors and then measuring how much each policy exhibits them. |
| 11 | + |
| 12 | +HighJax was produced as part of our research project on [BXRL: Behavior-Explainable Reinforcement Learning](https://arxiv.org/abs/XXXX.XXXXX). |
| 13 | + |
| 14 | +## Installation |
| 15 | + |
| 16 | +```bash |
| 17 | +pip install highjax # Minimal installation |
| 18 | +pip install "highjax[cuda12]" # Including GPU support |
| 19 | +pip install "highjax[trainer]" # Including PPO implementation |
| 20 | +pip install "highjax[cuda12,trainer]" # Including both |
| 21 | +``` |
| 22 | + |
| 23 | +## Quick Start |
| 24 | + |
| 25 | +```python |
| 26 | +import jax |
| 27 | +import highjax |
| 28 | + |
| 29 | +env, params = highjax.make('highjax-v0') |
| 30 | +reset_key, step_key = jax.random.split(jax.random.PRNGKey(0))  # separate keys: never reuse a PRNG key |
| 31 | +obs, state = env.reset(reset_key, params) |
| 32 | +obs, state, reward, done, info = env.step(step_key, state, 1, params) # action 1 = IDLE |
| 33 | +``` |
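
Because the environment is JIT-compilable and vectorizable, a whole batch of episodes can be rolled out with `jax.lax.scan` and `jax.vmap`. The sketch below shows that pattern for any environment exposing the `reset(key, params)` / `step(key, state, action, params)` signatures used in the Quick Start. To keep it self-contained, it runs against `ToyEnv`, a hypothetical stand-in that is not part of HighJax; the `rollout` helper and the choice of 5 discrete actions (as in HighwayEnv) are likewise assumptions for illustration.

```python
import jax
import jax.numpy as jnp


class ToyEnv:
    """Hypothetical stand-in with gymnax-style reset/step signatures."""

    def reset(self, key, params):
        state = jax.random.uniform(key, ())
        return state, state  # (obs, state)

    def step(self, key, state, action, params):
        noise = 0.01 * jax.random.normal(key, ())
        state = state + params * action + noise
        reward = -jnp.abs(state)
        done = jnp.abs(state) > 10.0
        return state, state, reward, done, {}


def rollout(env, params, key, n_steps, n_actions):
    """Run one episode with uniformly random actions; return per-step rewards.

    The `done` flag is ignored here for brevity.
    """
    key, reset_key = jax.random.split(key)
    _, state = env.reset(reset_key, params)

    def body(carry, _):
        key, state = carry
        # Fresh keys for action sampling and for the environment step.
        key, action_key, step_key = jax.random.split(key, 3)
        action = jax.random.randint(action_key, (), 0, n_actions)
        _, state, reward, _, _ = env.step(step_key, state, action, params)
        return (key, state), reward

    _, rewards = jax.lax.scan(body, (key, state), None, length=n_steps)
    return rewards


env, params = ToyEnv(), 0.1
keys = jax.random.split(jax.random.PRNGKey(0), 8)  # one key per parallel episode
batched_rollout = jax.jit(jax.vmap(lambda k: rollout(env, params, k, 40, 5)))
rewards = batched_rollout(keys)  # array of shape (8, 40)
```

Swapping `ToyEnv(), 0.1` for the `env, params` pair returned by `highjax.make` should be the only change needed, provided the environment follows the gymnax API as described in the next section.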
| 34 | + |
| 35 | +## Using with JAX RL Libraries |
| 36 | + |
| 37 | +HighJax follows the [gymnax](https://github.com/RobertTLange/gymnax) API, so it works with JAX RL frameworks that expect gymnax-style environments: |
| 38 | + |
| 39 | +- [PureJaxRL](https://github.com/luchris429/purejaxrl) — drop-in gymnax replacement (no PureJaxRL install needed), see [`examples/use_purejaxrl.py`](examples/use_purejaxrl.py) |
| 40 | +- [Stoix](https://github.com/EdanToledo/Stoix) — via `stoa` gymnax adapter, see [`examples/use_stoix.py`](examples/use_stoix.py) |
| 41 | +- [Rejax](https://github.com/keraJLi/rejax) — pass env object directly, see [`examples/use_rejax.py`](examples/use_rejax.py) |
| 42 | + |
| 43 | +## Training |
| 44 | + |
| 45 | +Train a PPO agent via the CLI: |
| 46 | + |
| 47 | +```bash |
| 48 | +highjax-trainer train |
| 49 | +``` |
| 50 | + |
| 51 | +Key options: |
| 52 | + |
| 53 | +| Flag | Default | Description | |
| 54 | +|---------------------|---------|--------------------------------------| |
| 55 | +| `--n-epochs` / `-e` | 300 | Training epochs | |
| 56 | +| `--n-es` | 400 | Parallel episodes per epoch | |
| 57 | +| `--n-ts` | 40 | Timesteps per episode | |
| 58 | +| `--seed` / `-s` | 0 | Random seed | |
| 59 | +| `--actor-lr` | 3e-4 | Actor learning rate | |
| 60 | +| `--critic-lr` | 3e-3 | Critic learning rate | |
| 61 | +| `--n-npcs` | 50 | NPC vehicles | |
| 62 | +| `--no-trek` | — | Disable trek recording | |
| 63 | +| `--n-sample-es` | 1 | Episodes to sample per epoch for trek | |
| 64 | +| `--trek-path` | auto | Custom trek directory path | |
| 65 | +| `--discount` | 0.95 | Discount factor (gamma) | |
| 66 | +| `--n-lanes` | 4 | Number of highway lanes | |
| 67 | + |
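
As a rough guide to what the defaults imply, the sketch below works out how many environment steps one epoch collects and how strongly the discount factor down-weights late rewards. It assumes every episode runs for the full `--n-ts` timesteps, so the step count is an upper bound if episodes can end early; the helper names are illustrative only.

```python
def transitions_per_epoch(n_es: int, n_ts: int) -> int:
    """Environment steps per epoch: parallel episodes times timesteps each."""
    return n_es * n_ts


def effective_horizon(discount: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma) for a discount factor gamma."""
    return 1.0 / (1.0 - discount)


default_steps = transitions_per_epoch(n_es=400, n_ts=40)  # 16000
horizon = effective_horizon(0.95)                         # about 20 steps
last_step_weight = 0.95 ** 40                             # about 0.13
```

With the defaults, a reward received at the last of the 40 timesteps therefore counts for roughly 13% of an immediate one.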
| 68 | +Training automatically records episode data to `~/.highjax/t/` for browsing with Octane (the TUI). Use `--no-trek` to disable. |
| 69 | + |
| 70 | +Here's a snazzy one-liner that will let you explore the results of the current experiment run using [VisiData](https://github.com/saulpw/visidata): |
| 71 | + |
| 72 | +```bash |
| 73 | +pip install visidata |
| 74 | +vd "$(ls -d ~/.highjax/t/2*/ | tail -1)"/epochia.pq |
| 75 | +``` |
| 76 | + |
| 77 | +To reproduce results similar to Figure 2 of the paper, run: |
| 78 | + |
| 79 | +```bash |
| 80 | +highjax-trainer train --n-es 128 --n-ts 400 --n-epochs 300 --target-kld 0.0005 |
| 81 | +``` |
| 82 | + |
| 83 | +## Octane (Episode Browser) |
| 84 | + |
| 85 | +This repo also includes Octane, which is a Rust-based TUI for browsing HighJax experiments. |
| 86 | + |
| 87 | +### Installation |
| 88 | + |
| 89 | +```bash |
| 90 | +sudo apt-get install build-essential # C toolchain (needed by Rust) |
| 91 | +sudo apt-get install ffmpeg # Needed for `octane animate` |
| 92 | +git clone https://github.com/HumanCompatibleAI/HighJax # Clone this repo |
| 93 | +cd HighJax |
| 94 | +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Install Rust |
| 95 | +source "$HOME/.cargo/env" |
| 96 | +cd octane && cargo build --release # Build Octane |
| 97 | +alias octane="$(readlink -f target/release/octane)" # We are inside octane/ after the build step |
| 98 | +``` |
| 99 | + |
| 100 | +The binary will be at `octane/target/release/octane`, relative to the repository root. |
| 101 | + |
| 102 | +### Usage |
| 103 | + |
| 104 | +After training, launch Octane to see all the experiments you ran with `highjax-trainer`: |
| 105 | + |
| 106 | +```bash |
| 107 | +octane |
| 108 | +``` |
| 109 | + |
| 110 | +### Figures |
| 111 | + |
| 112 | +Use Octane to make figures for your paper: |
| 113 | + |
| 114 | +```bash |
| 115 | +octane draw -t ~/.highjax/t/2026-03-15-20-02-25-101327 --epoch 300 -e 0 --timestep 19 --theme light \ |
| 116 | + --zoom 1.8 --png ~/figure.png |
| 117 | +``` |
| 118 | + |
| 119 | +<p align="center"> |
| 120 | + <img src="misc/images/figure.png" alt="Octane figure output" width="428"><br/> |
| 121 | +</p> |
| 122 | + |
| 123 | +### Behavior crafting |
| 124 | + |
| 125 | +Octane includes a behavior explorer for defining measurable policy properties. While watching an episode, press `b` to capture a scenario — mark which actions you want (positive weight) or don't want (negative weight) at that traffic state. Name it, and Octane saves the behavior to `~/.highjax/behaviors/`. The next time you run `highjax-trainer train`, all discovered behaviors are evaluated every epoch and their scores are recorded as `behavior.{name}` columns in `epochia.parquet`. |
| 126 | + |
| 127 | +<p align="center"> |
| 128 | + <img src="misc/images/behavior_tui.png" alt="Behavior crafting dialog in Octane" width="364"><br/> |
| 129 | + <em>Defining a behavior scenario in Octane</em> |
| 130 | +</p> |
| 131 | + |
| 132 | +Press `B` (Shift-B) to open the full Behavior Explorer tab. |
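
To make the idea of weighted actions concrete, here is one plausible way a single scenario could be scored: the policy's action probabilities at the captured state, weighted by the positive and negative marks. This is an illustrative assumption, not the formula Octane or `highjax-trainer` actually uses; `behavior_score` and the five-action layout (borrowed from HighwayEnv) are hypothetical.

```python
def behavior_score(action_probs, action_weights):
    """Weighted sum of a policy's action probabilities at one captured state."""
    return sum(p * w for p, w in zip(action_probs, action_weights))


# Assumed action order: LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER.
weights = [0.0, 0.0, 0.0, -1.0, 1.0]  # want SLOWER, do not want FASTER
probs = [0.1, 0.2, 0.1, 0.1, 0.5]     # policy output at the captured state
score = behavior_score(probs, weights)  # roughly 0.4
```

Under this reading, a score near +1 means the policy almost always picks a wanted action in that scenario, and a score near -1 means it almost always picks an unwanted one.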
| 133 | + |
| 134 | +See the [Octane docs](docs/Octane/Octane%20docs.md) for full details. |
| 135 | + |
| 136 | +## Documentation |
| 137 | + |
| 138 | +Full documentation is in the `docs/` folder: |
| 139 | + |
| 140 | +- [HighJax environment docs](docs/HighJax/HighJax%20docs.md) — state, observations, reward, NPCs, physics |
| 141 | +- [Octane TUI docs](docs/Octane/Octane%20docs.md) — episode browser, configuration, key bindings |
| 142 | +- [Coding conventions](docs/HighJax%20coding%20conventions.md) — naming, array indices, style |
| 143 | + |
| 144 | +## Examples |
| 145 | + |
| 146 | +- `examples/basic_usage.py` — Create env, reset, step, print observations |
| 147 | +- `examples/train_ppo.py` — Train a PPO agent and evaluate it |
| 148 | +- `examples/use_purejaxrl.py` — PureJaxRL integration (vectorized scan loop) |
| 149 | +- `examples/use_stoix.py` — Stoix integration (via stoa gymnax adapter) |
| 150 | +- `examples/use_rejax.py` — Rejax integration (JIT-compiled training, vmapped seeds) |
| 151 | + |
| 152 | +## Citation |
| 153 | + |
| 154 | +If you use HighJax in your research, please cite: |
| 155 | + |
| 156 | +```bibtex |
| 157 | +@article{rachum2025bxrl, |
| 158 | + title={BXRL: Behavior-Explainable Reinforcement Learning}, |
| 159 | + author={Rachum, Ram and Amitai, Yotam and Nakar, Yonatan and Mirsky, Reuth and Allen, Cameron}, |
| 160 | + year={2025} |
| 161 | +} |
| 162 | +``` |