HumanCompatibleAI
diff --git a/.gitignore b/.gitignore
Lines changed: 32 additions & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 9 additions & 0 deletions
diff --git a/AUTHORS b/AUTHORS
Lines changed: 2 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 1 addition & 0 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 162 additions & 0 deletions
diff --git a/docs/HighJax coding conventions.md b/docs/HighJax coding conventions.md
Lines changed: 104 additions & 0 deletions
diff --git a/docs/HighJax docs.md b/docs/HighJax docs.md
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,32 @@
+*.py[co]
+__pycache__/
+
+.tox/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+
+dist/
+build/
+*.egg-info/
+
+*.bak
+*.wpu
+
+.coverage
+htmlcov
+pytest_report*.html
+
+node_modules
+.ipynb_checkpoints
+
+*.zip
+
+checkpoint_*
+.env
+.claude/
+
+# Rust:
+*/target/
+Cargo.lock
+*.rs.bk
@@ -0,0 +1,9 @@
+Documentation TOCs (start with the one relevant to your task):
+- `README.md` shows basic usage
+- `docs/HighJax docs.md` — Top-level TOC
+- `docs/HighJax/HighJax docs.md` — HighJax environment (state, observations, reward, NPCs)
+- `docs/Octane/Octane docs.md` — Octane TUI explorer (navigation, key bindings, rendering)
+
+More things:
+- Critical for writing code: `docs/HighJax coding conventions.md`. Don't write code without consulting it and abiding by it.
+- Run tests like `JAX_PLATFORMS=cpu pytest -n 12 <...other pytest args if needed>`
@@ -0,0 +1,2 @@
+Ram Rachum
+University of California, Berkeley
@@ -0,0 +1 @@
+../AGENTS.md
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 The HighJax authors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,162 @@
+# HighJax: Highway Driving environment for Reinforcement Learning research
+
+<p align="center">
+    <img src="misc/videos/demo.webp" alt="HighJax PPO training demo"><br/>
+    <em>PPO agent learning to drive on a 4-lane highway</em>
+</p>
+
+HighJax is an autonomous driving environment for Reinforcement Learning research. It's a JAX implementation of [HighwayEnv](https://github.com/Farama-Foundation/HighwayEnv) that provides a fully JIT-compilable and vectorizable highway driving simulation.
+
+Besides being much faster than the original, it provides Octane, a Rust-based TUI for examining your experiment runs. Octane provides an interface for defining behaviors and then measuring how much each policy exhibits them.
+
+HighJax was produced as part of our research project on [BXRL: Behavior-Explainable Reinforcement Learning](https://arxiv.org/abs/XXXX.XXXXX).
+
+## Installation
+
+```bash
+pip install highjax # Minimal installation
+pip install "highjax[cuda12]" # Including GPU support
+pip install "highjax[trainer]" # Including PPO implementation
+pip install "highjax[cuda12,trainer]" # Including both
+```
+
+## Quick Start
+
+```python
+import jax
+import highjax
+
+env, params = highjax.make('highjax-v0')
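+key = jax.random.PRNGKey(0)
+reset_key, step_key = jax.random.split(key)  # Use a separate key for each call
+obs, state = env.reset(reset_key, params)
+obs, state, reward, done, info = env.step(step_key, state, 1, params)  # Action 1 is IDLE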
+```
+
+## Using with JAX RL Libraries
+
+HighJax follows the [gymnax](https://github.com/RobertTLange/gymnax) API, so it works with JAX RL frameworks that expect gymnax-style environments:
+
+- [PureJaxRL](https://github.com/luchris429/purejaxrl) — drop-in gymnax replacement (no PureJaxRL install needed), see [`examples/use_purejaxrl.py`](examples/use_purejaxrl.py)
+- [Stoix](https://github.com/EdanToledo/Stoix) — via `stoa` gymnax adapter, see [`examples/use_stoix.py`](examples/use_stoix.py)
+- [Rejax](https://github.com/keraJLi/rejax) — pass env object directly, see [`examples/use_rejax.py`](examples/use_rejax.py)
+
+## Training
+
+Train a PPO agent via the CLI:
+
+```bash
+highjax-trainer train
+```
+
+Key options:
+
+| Flag                | Default | Description                          |
+|---------------------|---------|--------------------------------------|
+| `--n-epochs` / `-e` | 300     | Training epochs                      |
+| `--n-es`            | 400     | Parallel episodes per epoch          |
+| `--n-ts`            | 40      | Timesteps per episode                |
+| `--seed` / `-s`     | 0       | Random seed                          |
+| `--actor-lr`        | 3e-4    | Actor learning rate                  |
+| `--critic-lr`       | 3e-3    | Critic learning rate                 |
+| `--n-npcs`          | 50      | NPC vehicles                         |
+| `--no-trek`         | —       | Disable trek recording               |
+| `--n-sample-es`     | 1       | Episodes to sample per epoch for trek|
+| `--trek-path`       | auto    | Custom trek directory path           |
+| `--discount`        | 0.95    | Discount factor (gamma)              |
+| `--n-lanes`         | 4       | Number of highway lanes              |
+
+Training automatically records episode data to `~/.highjax/t/` for browsing with Octane (the TUI). Use `--no-trek` to disable.
+
+Here's a snazzy one-liner that will let you explore the results of the current experiment run using [VisiData](https://github.com/saulpw/visidata):
+
+```bash
+pip install visidata
+vd "$(ls -d ~/.highjax/t/2*/ | tail -1)"/epochia.pq
+```
+
+Use the following command to produce results similar to those in Figure 2 of the paper:
+
+```bash
+highjax-trainer train --n-es 128 --n-ts 400 --n-epochs 300 --target-kld 0.0005
+```
+
+## Octane (Episode Browser)
+
+This repo also includes Octane, which is a Rust-based TUI for browsing HighJax experiments.
+
+### Installation
+
+```bash
+sudo apt-get install build-essential # C toolchain (needed by Rust)
+sudo apt-get install ffmpeg # Needed for `octane animate`
+git clone https://github.com/HumanCompatibleAI/HighJax # Clone this repo
+cd HighJax
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Install Rust
+source "$HOME/.cargo/env"
+cd octane && cargo build --release # Build Octane
+alias octane="$(readlink -f target/release/octane)" # Path is relative to octane/, where the previous line left us
+```
+
+The binary will be at `octane/target/release/octane`, relative to the repository root.
+
+### Usage
+
+After training, launch Octane to see all the experiments you ran with `highjax-trainer`:
+
+```bash
+octane
+```
+
+### Figures
+
+Use Octane to make figures for your paper:
+
+```bash
+octane draw -t ~/.highjax/t/2026-03-15-20-02-25-101327 --epoch 300 -e 0 --timestep 19 --theme light \
+  --zoom 1.8 --png ~/figure.png
+```
+
+<p align="center">
+    <img src="misc/images/figure.png" alt="Octane figure output" width="428"><br/>
+</p>
+
+### Behavior crafting
+
+Octane includes a behavior explorer for defining measurable policy properties. While watching an episode, press `b` to capture a scenario — mark which actions you want (positive weight) or don't want (negative weight) at that traffic state. Name it, and Octane saves the behavior to `~/.highjax/behaviors/`. The next time you run `highjax-trainer train`, all discovered behaviors are evaluated every epoch and their scores are recorded as `behavior.{name}` columns in `epochia.parquet`.
+
+<p align="center">
+    <img src="misc/images/behavior_tui.png" alt="Behavior crafting dialog in Octane" width="364"><br/>
+    <em>Defining a behavior scenario in Octane</em>
+</p>
+
+Press `B` (Shift-B) to open the full Behavior Explorer tab.
+
+See the [Octane docs](docs/Octane/Octane%20docs.md) for full details.
+
+## Documentation
+
+Full documentation is in the `docs/` folder:
+
+- [HighJax environment docs](docs/HighJax/HighJax%20docs.md) — state, observations, reward, NPCs, physics
+- [Octane TUI docs](docs/Octane/Octane%20docs.md) — episode browser, configuration, key bindings
+- [Coding conventions](docs/HighJax%20coding%20conventions.md) — naming, array indices, style
+
+## Examples
+
+- `examples/basic_usage.py` — Create env, reset, step, print observations
+- `examples/train_ppo.py` — Train a PPO agent and evaluate it
+- `examples/use_purejaxrl.py` — PureJaxRL integration (vectorized scan loop)
+- `examples/use_stoix.py` — Stoix integration (via stoa gymnax adapter)
+- `examples/use_rejax.py` — Rejax integration (JIT-compiled training, vmapped seeds)
+
+## Citation
+
+If you use HighJax in your research, please cite:
+
+```bibtex
+@article{rachum2025bxrl,
+  title={BXRL: Behavior-Explainable Reinforcement Learning},
+  author={Rachum, Ram and Amitai, Yotam and Nakar, Yonatan and Mirsky, Reuth and Allen, Cameron},
+  year={2025}
+}
+```
@@ -0,0 +1,104 @@
+# HighJax Coding Conventions
+
+Related: [[HighJax docs]]
+
+Naming conventions, style rules, and JAX-specific patterns used throughout the HighJax codebase.
+
+## Array naming: `foo_by_bar_by_baz`
+
+Almost all JAX arrays use the pattern `foo_by_bar_by_baz`. Each `by_something` is an axis (or sometimes multiple axes). Axes go right-to-left: `foo_by_bar_by_baz` is a 2D array where axis 0 is baz, axis 1 is bar, and the value is foo.
+
+Examples:
+
+- `reward_by_e_by_t` -- shape (n_ts, n_es), reward at each (timestep, episode)
+- `p_by_action_by_e` -- shape (n_es, *action_shape), action probabilities
+
+Multi-dimensional items like `position` or `action` occupy multiple axes. The number of axes that `action` occupies depends on the environment.
+
+**This applies to ALL arrays** -- including intermediate variables, temporaries, results, diffs, distances. Even a simple subtraction result should be named `diff_by_e_by_t`, not `diff`.
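
A minimal sketch of the naming pattern in JAX. The sizes and the random data are made up for illustration; only the `foo_by_bar_by_baz` convention and the `(n_ts, n_es)` shape of `reward_by_e_by_t` come from the text above.

```python
import jax
import jax.numpy as jnp

n_ts, n_es = 5, 3  # Illustrative sizes, not HighJax defaults

key = jax.random.PRNGKey(0)
# The rightmost name is axis 0: axis 0 is t, axis 1 is e, each value is a reward.
reward_by_e_by_t = jax.random.uniform(key, (n_ts, n_es))

# Reducing over axis 0 (t) drops the rightmost name.
return_by_e = reward_by_e_by_t.sum(axis=0)

# Intermediate results carry the full axis names as well.
mean_reward_by_e = return_by_e / n_ts
diff_by_e_by_t = reward_by_e_by_t - mean_reward_by_e[None, :]
```

Reading a name right-to-left gives the shape directly, so `diff_by_e_by_t.shape` is `(n_ts, n_es)` without looking at how it was computed.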
+
+## Dimension size naming: `n_foos_per_bar`
+
+When unpacking array shapes to get dimension sizes, use `n_foos_per_bar`:
+
+- `foos`: plural of what's being counted
+- `bar`: the containing unit
+
+Examples:
+
+```python
+n_es_per_epoch, n_cells_per_e, n_tokens_per_vocabulary = logit_by_token_by_cell_by_e.shape
+n_es_per_epoch, n_cells_per_e = token_by_cell_by_e.shape
+```
+
+## Index variables: `i_foo`
+
+When naming a variable that's an index number, use `i_foo`. However, these are exceptions that don't need the `i_` prefix: `epoch`, `t`, `e`.
+
+## Shorthands
+
+Only use shorthands that already exist in the codebase. Don't invent new ones. Established shorthands:
+
+- `p` for probability
+- `e` for episode
+- `t` for timestep
+- `ft` for flat timestep (full flattened pool, in minibatch code)
+- `mt` for minibatch timestep (within one minibatch slice)
+- `ts` for timesteps (plural, in CLI args like `--n-ts`)
+- `es` for episodes (plural, in CLI args like `--n-es`)
+- `v` for value estimate
+- `vf` for value function
+- `kld` for KL divergence (never bare `kl`)
+- `obs` for observation (matches gymnax API convention)
+- `nz` for normalized (prefix, e.g. `nz_speed`, `nz_return`, `nz_advantage`)
+- `theta` for model parameters (neural network param dicts)
+- `vital` for alive-mask arrays (not post-crash)
+- `tendency` for log-probability of chosen action
+- `epilogue` for post-final-step values (e.g. `epilogue_v_by_agent_by_e`)
+- `lunge` for action dimension in multi-discrete action spaces
+- `deed` for a choice within a lunge
+- `mb` for minibatch (in axis names like `_by_mt_by_mb`)
+- `sweep` for one pass over all minibatches
+
+Don't shorten `position` to `pos`, etc.
+
+## Code style
+
+- Python 3.12+ required
+- Single quotes everywhere, unless there's a quote-in-quote situation
+- Maximum line length: 100 characters
+- Type annotations using builtins (`list`, `tuple`, `dict`), not `List`, `Tuple`, `Dict`
+- `from __future__ import annotations` at the top of every file
+- Import order: `__future__` > stdlib > third-party > highjax
+- snake_case for variables/functions, PascalCase for classes
+
+## Docstrings and comments
+
+The `highjax` environment package has docstrings on public API functions (reset, step, etc.) since it serves as a library. The `highjax_trainer` package generally avoids function docstrings — the code should be self-explanatory. Add comments sparingly, only when the code is genuinely difficult to understand otherwise.
+
+## JAX JIT
+
+Many functions that process arrays need to be JIT-compiled. This means:
+
+- No Python control flow on array values (use `jnp.where` instead of `if`)
+- Be careful with loops (use `jax.lax.scan` or `jax.vmap` instead)
+- Use `jax.Array` and pytree-compatible data structures
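
The three rules above can be sketched as follows. The function names and the toy data are assumptions chosen for illustration and are not taken from the HighJax codebase.

```python
import jax
import jax.numpy as jnp


@jax.jit
def clip_negative(speed_by_e: jax.Array) -> jax.Array:
    # jnp.where replaces `if speed < 0: speed = 0`, which fails on traced values.
    return jnp.where(speed_by_e < 0.0, 0.0, speed_by_e)


@jax.jit
def discounted_return(reward_by_t: jax.Array, discount: float) -> jax.Array:
    # jax.lax.scan replaces a Python for-loop over timesteps (run back to front).
    def step(carry, reward):
        new_carry = reward + discount * carry
        return new_carry, new_carry

    initial_carry = jnp.zeros((), dtype=reward_by_t.dtype)
    _, return_by_t = jax.lax.scan(step, initial_carry, reward_by_t, reverse=True)
    return return_by_t


# jax.vmap replaces a Python loop over episodes (axis 1 of the input).
reward_by_e_by_t = jnp.ones((3, 2))  # shape (n_ts, n_es)
return_by_e_by_t = jax.vmap(discounted_return, in_axes=(1, None), out_axes=1)(
    reward_by_e_by_t, 0.5)
```

With all rewards equal to 1 and a discount of 0.5, each episode's returns are 1.75, 1.5 and 1.0 from the first timestep to the last.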
+
+## Flax dataclasses
+
+HighJax uses `@flax.struct.dataclass` for most data classes. These are JAX-compatible (pytree-registered) frozen dataclasses. Fields can be marked with `flax.struct.field(pytree_node=False)` to exclude them from JAX tracing (e.g., config objects).
+
+## Minibatch PPO pipeline
+
+The gradient computation pipeline has a specific data flow with its own naming:
+
+```
+Ascender -> SweepMaster (flatten to _by_ft) -> Sweeper (shuffle to _by_mt_by_minibatch) -> Minibatcher (_by_mt per minibatch)
+```
+
+The Minibatcher computes the composite actor objective (PPO clipped surrogate + entropy) and produces gradients via `jax.grad`. The critic is updated separately.
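
For orientation, here is a generic sketch of a PPO clipped surrogate with an entropy bonus, differentiated with `jax.grad`. It is not the HighJax Minibatcher: the function name, the toy linear policy, `clip_epsilon=0.2` and `entropy_coefficient=0.01` are all assumptions made for this example.

```python
import jax
import jax.numpy as jnp


def actor_objective(theta, obs_by_mt, action_by_mt, old_tendency_by_mt, advantage_by_mt,
                    clip_epsilon=0.2, entropy_coefficient=0.01):
    # Toy linear policy (logits = obs @ theta), a stand-in for a real network.
    logit_by_action_by_mt = obs_by_mt @ theta
    log_p_by_action_by_mt = jax.nn.log_softmax(logit_by_action_by_mt, axis=-1)
    # Log-probability of the action that was actually taken.
    tendency_by_mt = jnp.take_along_axis(
        log_p_by_action_by_mt, action_by_mt[:, None], axis=-1)[:, 0]

    # Clipped surrogate: take the more pessimistic of the raw and clipped terms.
    ratio_by_mt = jnp.exp(tendency_by_mt - old_tendency_by_mt)
    clipped_ratio_by_mt = jnp.clip(ratio_by_mt, 1.0 - clip_epsilon, 1.0 + clip_epsilon)
    surrogate_by_mt = jnp.minimum(ratio_by_mt * advantage_by_mt,
                                  clipped_ratio_by_mt * advantage_by_mt)

    entropy_by_mt = -jnp.sum(jnp.exp(log_p_by_action_by_mt) * log_p_by_action_by_mt, axis=-1)
    return jnp.mean(surrogate_by_mt + entropy_coefficient * entropy_by_mt)


# Gradient with respect to theta (argument 0 by default); ascend it to improve the policy.
actor_gradient_fn = jax.grad(actor_objective)

theta = jnp.zeros((4, 5))  # 4 observation features, 5 actions
obs_by_mt = jnp.ones((8, 4))
action_by_mt = jnp.zeros(8, dtype=jnp.int32)
old_tendency_by_mt = jnp.full(8, jnp.log(0.2))
advantage_by_mt = jnp.ones(8)
gradient = actor_gradient_fn(theta, obs_by_mt, action_by_mt, old_tendency_by_mt,
                             advantage_by_mt)
```

The gradient has the same shape as `theta`, so it can be passed to any optimizer that accepts the parameter pytree.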
+
+## Testing
+
+- **Golden tests** (`test_golden_runs/`): Deterministic training runs with exact expected values. When the training pipeline changes, these need regeneration. Each test defines its own `train()` function; run it, capture the new values, update `golden_data`.
+- **Unit tests**: Everything else -- estimators, objectives, masking, freezing, trainer integration, etc.
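
A toy sketch of the golden-test pattern described above. The `train()` function here is a stand-in (gradient descent on `x ** 2`), not a HighJax training run; only the idea of a per-test `train()` compared against a stored `golden_data` comes from the text.

```python
import jax


def loss(x):
    return x ** 2


def train(n_steps: int = 3, learning_rate: float = 0.25) -> list[float]:
    # Deterministic toy run: each step halves x, because the gradient is 2 * x.
    x = 4.0
    x_by_step = []
    for _ in range(n_steps):
        x = x - learning_rate * jax.grad(loss)(x)
        x_by_step.append(float(x))
    return x_by_step


golden_data = [2.0, 1.0, 0.5]  # Captured from an earlier run of train()


def test_golden_run():
    # Exact comparison: any change to the pipeline shows up as a failure here.
    assert train() == golden_data
```

If `train()` changes intentionally, rerun it, copy the new values into `golden_data`, and commit both together.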
@@ -0,0 +1,19 @@
+# HighJax Docs
+
+Top-level table of contents for HighJax documentation.
+
+## Environment
+
+[[HighJax environment]] — State, observations, reward, NPCs, physics.
+
+## Training
+
+[[Trainer docs]] — PPO pipeline, epochia.parquet, trek recording, behaviors.
+
+## TUI Explorer
+
+[[Octane docs]] — Octane TUI: navigation, key bindings, rendering.
+
+## Coding
+
+[[HighJax coding conventions]] — Naming conventions, style rules, JAX patterns.