Emerge-Lab · HC-Seaple · Jun 14, 2026 · Jun 15, 2026
diff --git a/.gitignore b/.gitignore
@@ -190,3 +190,13 @@ pufferlib/resources/drive/output*.gif
 emsdk/
 docs/book/*
 !docs/book/assets/
+
+# Local PufferDrive RL experiments
+/native_3d_renders/
+/training_visualizations/
+/visualizations/
+/outputs/
+/pufferdrive_headless_demo.mp4
+/tfrecord-*.json
+/waymo_viewer_server.*.log
+/pufferlib/resources/drive/waymo_*_json/
diff --git a/README.md b/README.md
@@ -1,218 +1,203 @@
-# PufferDrive
+# PufferDrive Minimal PPO on Windows and WSL
 
-[![Unit Tests](https://github.com/Emerge-Lab/PufferDrive/actions/workflows/utest.yml/badge.svg)](https://github.com/Emerge-Lab/PufferDrive/actions/workflows/utest.yml)
+This branch adds a small reinforcement learning workflow on top of the
+original [PufferDrive](https://github.com/Emerge-Lab/PufferDrive) project.
 
-<img align="left" style="width:260px" src="https://github.com/Emerge-Lab/PufferDrive/blob/main/pufferlib/resources/drive/pufferdrive_20fps_long.gif" width="288px">
+The goal is simple:
 
-**PufferDrive is a fast and friendly driving simulator to train and test RL-based models.**
+1. Export Waymo Open Motion Dataset (WOMD) scenarios as JSON.
+2. Convert the JSON files to PufferDrive map binaries.
+3. Train a small continuous-action PPO policy.
+4. Render the trained policy as an MP4 video.
 
-<br>
-<br>
-<br>
-<br>
-<br>
-<br>
-<br>
-<br>
-<br>
-<br>
+This is a working training example, not a finished autonomous driving model.
+The default short run is mainly useful for checking that the complete pipeline
+works.
 
----
+## Why WSL is used
 
-**Docs**: https://emerge-lab.github.io/PufferDrive
+The native PufferDrive renderer uses Linux libraries and Raylib. On Windows,
+the easiest setup is:
 
----
+- Keep the Git repository on the Windows drive.
+- Run compilation, training, and native 3D rendering inside WSL.
+- Copy checkpoints and videos back to the Windows repository.
 
-### See our 2.0 release video
+The setup script in step 2 (`scripts/wsl_native_3d_setup.sh`) creates a copy
+of the repository on the Linux filesystem at:
 
-<a href="https://www.youtube.com/watch?v=LfQ324R-cbE">
-  <img src="https://img.youtube.com/vi/LfQ324R-cbE/0.jpg" alt="PufferDrive 2.0" width="300">
-</a>
-
-## Installation
-
-Clone the repo
-```bash
-https://github.com/Emerge-Lab/PufferDrive.git
+```text
+~/PufferDrive-native
 ```
 
-Make a venv (`uv venv`), activate the venv
-```
-source .venv/bin/activate
-```
+Your original Windows files remain under:
 
-Inside the venv, install the dependencies
-```
-uv pip install -e .
+```text
+/mnt/c/Users/<username>/Desktop/PufferDrive
 ```
 
-Compile the C code
-```
-python setup.py build_ext --inplace --force
-```
-Run this while your virtual environment is active so the extension is built against the right interpreter.
+## 1. Clone this branch
 
-To test your setup, you can run
-```
-puffer train puffer_drive
-```
-See also the [puffer docs](https://puffer.ai/docs.html).
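+From PowerShell, in your Desktop folder (the WSL paths below assume the clone
+is at `C:\Users\<username>\Desktop\PufferDrive`):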
 
+```powershell
+git clone --branch codex/minimal-ppo-wsl https://github.com/HC-Seaple/PufferDrive.git
+cd PufferDrive
+```
 
-## Quick start
+If Ubuntu is not installed in WSL, open PowerShell as Administrator and run:
 
-Start a training run
+```powershell
+powershell -ExecutionPolicy Bypass -File scripts/install_wsl_admin.ps1
 ```
-puffer train puffer_drive
-```
-
-## Dataset
 
-<details>
-<summary>Downloading and using data</summary>
+Restart Windows if requested, then open Ubuntu or run `wsl`.
 
-### Data preparation
+## 2. Build the native environment
 
-To train with PufferDrive, you need to convert JSON files to map binaries. Run the following command with the path to your data folder:
+From WSL:
 
 ```bash
-python pufferlib/ocean/drive/drive.py
+cd /mnt/c/Users/<username>/Desktop/PufferDrive
+bash scripts/wsl_native_3d_setup.sh
 ```
 
-### Downloading Waymo Data
-
-You can download the WOMD data from Hugging Face in two versions:
-
-- **Mini dataset**: [GPUDrive_mini](https://huggingface.co/datasets/EMERGE-lab/GPUDrive_mini) contains 1,000 training files and 300 test/validation files
-- **Medium dataset**: [10,000 files from the training dataset](https://huggingface.co/datasets/daphne-cornelisse/pufferdrive_train)
-- **Large dataset**: [GPUDrive](https://huggingface.co/datasets/EMERGE-lab/GPUDrive) contains 100,000 unique scenes
-
-**Note**: Replace 'GPUDrive_mini' with 'GPUDrive' in your download commands if you want to use the full dataset.
+This installs the Linux dependencies, creates a Python environment, copies the
+repository to the Linux filesystem, and builds the native extension.
 
-### Additional Data Sources
+C compiler warnings about ignored return values are harmless and can be
+ignored. The setup has succeeded when it prints:
 
-For more training data compatible with PufferDrive, see [ScenarioMax](https://github.com/valeoai/ScenarioMax). The GPUDrive data format is fully compatible with PufferDrive.
-</details>
-
-
-## Visualizer
-
-<details>
-<summary>Dependencies and usage</summary>
+```text
+Done. Native build is ready
+```
 
-## Local rendering
+## 3. Prepare Waymo scenarios
 
-To launch an interactive renderer, first build:
-```
-bash scripts/build_ocean.sh drive local
-```
+Copy one or more exported Waymo scenario JSON files into the root of the
+Windows repository, then run from WSL (replace the file names with your own):
 
-then launch:
 ```bash
-./drive
+cd /mnt/c/Users/<username>/Desktop/PufferDrive
+bash scripts/prepare_waymo_maps_wsl.sh \
+  ./scenario_a.json \
+  ./scenario_b.json
 ```
-this will run `demo()` with an existing model checkpoint.
-
-## Headless server setup
-
-Run the Raylib visualizer on a headless server and export as .mp4. This will rollout the pre-trained policy in the env.
 
-### Install dependencies
+The binary maps are written to the Linux-native repository:
 
-```bash
-sudo apt update
-sudo apt install ffmpeg xvfb
+```text
+~/PufferDrive-native/resources/drive/binaries/training
 ```
 
-For HPC (There are no root privileges), so install into the conda environment
-```bash
-conda install -c conda-forge xorg-x11-server-xvfb-cos6-x86_64
-conda install -c conda-forge ffmpeg
+The files must be numbered contiguously from `map_000.bin`, with no gaps:
+
+```text
+map_000.bin
+map_001.bin
+map_002.bin
 ```
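+
+A quick way to check the numbering before training is a small Python helper
+like the one below. It is a sketch, not part of this branch: the name
+`count_contiguous_maps` is hypothetical, and it assumes the maps follow the
+`map_NNN.bin` pattern shown above.
+
+```python
+import re
+from pathlib import Path
+
+# Assumed naming scheme, taken from the example above: map_000.bin, map_001.bin, ...
+MAP_NAME = re.compile(r"^map_(\d{3,})\.bin$")
+
+
+def count_contiguous_maps(map_dir):
+    """Return N if the directory holds exactly map_000.bin .. map_{N-1}.bin."""
+    indices = sorted(
+        int(m.group(1))
+        for path in Path(map_dir).expanduser().iterdir()
+        if (m := MAP_NAME.match(path.name))
+    )
+    if indices != list(range(len(indices))):
+        # Report which indices are absent below the highest one found.
+        missing = sorted(set(range(indices[-1] + 1)) - set(indices))
+        raise ValueError(f"map files are not contiguous; missing: {missing}")
+    return len(indices)
+```
+
+Run it against `~/PufferDrive-native/resources/drive/binaries/training` and
+pass the returned count to `--num-maps` in step 4.
+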
 
-- `ffmpeg`: Video processing and conversion
-- `xvfb`: Virtual display for headless environments
+## 4. Run a small training test
 
-### Build and run
+From the Windows-mounted repository in WSL:
 
-1. Build the application:
 ```bash
-bash scripts/build_ocean.sh visualize local
+bash scripts/run_minimal_ppo_wsl.sh \
+  --map-dir resources/drive/binaries/training \
+  --num-maps 2 \
+  --num-envs 1 \
+  --total-timesteps 10000 \
+  --rollout-steps 128 \
+  --minibatch-size 128
 ```
 
-2. Run with virtual display:
-```bash
-xvfb-run -s "-screen 0 1280x720x24" ./visualize
-```
+Set `--num-maps` to the number of prepared map files (two in this example).
 
-The `-s` flag sets up a virtual screen at 1280x720 resolution with 24-bit color depth.
+Checkpoints are saved under:
 
----
+```text
+~/PufferDrive-native/checkpoints/minimal_ppo
+```
 
-> To force a rebuild, you can delete the cached compiled executable binary using `rm ./visualize`.
+Training is working if the step count keeps increasing, the losses stay
+finite (no `nan` or `inf` values), and `.pt` checkpoint files appear in that
+directory.
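+
+The three checks above can be written down as a small helper. This is a
+hypothetical sketch, not part of the branch: `training_looks_healthy` is an
+illustrative name, and `steps` and `losses` are values you read from the
+trainer's console output, whose exact format is not documented here.
+
+```python
+import math
+from pathlib import Path
+
+
+def training_looks_healthy(steps, losses, checkpoint_dir):
+    """Apply the three checks above to values read from a training run."""
+    # Step counts should strictly increase between log lines.
+    steps_increase = len(steps) >= 2 and all(a < b for a, b in zip(steps, steps[1:]))
+    # A nan or inf loss usually means training has diverged.
+    losses_finite = all(math.isfinite(x) for x in losses)
+    # At least one .pt checkpoint should exist somewhere under the directory.
+    has_checkpoint = any(Path(checkpoint_dir).expanduser().rglob("*.pt"))
+    return steps_increase and losses_finite and has_checkpoint
+```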
 
----
+## 5. Render the trained policy
+
+From WSL:
 
-</details>
+```bash
+bash scripts/visualize_minimal_ppo_wsl.sh \
+  --map-dir resources/drive/binaries/training \
+  --num-maps 2 \
+  --episode-length 91 \
+  --draw-traces
+```
 
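+`--episode-length 91` corresponds to the full 9.1 s (91 steps at 10 Hz) of a
+Waymo Open Motion Dataset scenario. The script renders with the native Raylib
+3D renderer and copies the MP4 and JSON metrics to the Windows folder: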
 
-## Benchmarks
+```text
+training_visualizations/
+```
 
-### Distributional realism
+## Visualize a Waymo JSON without training
 
-We provide a PufferDrive implementation of the [Waymo Open Sim Agents Challenge (WOSAC)](https://waymo.com/open/challenges/2025/sim-agents/) for fast, easy evaluation of how well your trained agent matches distributional properties of human behavior. See documentation [here](https://emerge-lab.github.io/PufferDrive/wosac/).
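+The following commands run from PowerShell and use a Windows Python virtual
+environment at `.venv` (separate from the Python environment created inside
+WSL).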
 
-WOSAC evaluation with random policy:
-```bash
-puffer eval puffer_drive --eval.wosac-realism-eval True
-```
+Create a top-down replay:
 
-WOSAC evaluation with your checkpoint (must be .pt file):
-```bash
-puffer eval puffer_drive --eval.wosac-realism-eval True --load-model-path <your-trained-policy>.pt
+```powershell
+.\.venv\Scripts\python.exe scripts\visualize_waymo_json.py scenario.json
 ```
 
-### Human-compatibility
+Create a simple 3D chase-camera replay for a selected Waymo track:
 
-You may be interested in how compatible your agent is with human partners. For this purpose, we support an eval where your policy only controls the self-driving car (SDC). The rest of the agents in the scene are stepped using the logs. While it is not a perfect eval since the human partners here are static, it will still give you a sense of how closely aligned your agent's behavior is to how people drive. You can run it like this:
-```bash
-puffer eval puffer_drive --eval.human-replay-eval True --load-model-path <your-trained-policy>.pt
+```powershell
+.\.venv\Scripts\python.exe scripts\render_waymo_follow_3d.py scenario.json `
+  --track-index 90 `
+  --start-frame 0 `
+  --end-frame 60
 ```
 
-## Development
+The lightweight chase-camera renderer draws vehicles as boxes and the map as
+lines. It is useful for quickly checking a recorded trajectory without
+compiling Raylib. The native checkpoint renderer from step 5, by contrast,
+uses the PufferDrive car models and full native rendering.
 
-<details><summary>Documentation and browser demo</summary>
+Outputs are written to:
 
-**Docs**
-
-A browsable documentation site now lives under `docs/` and is configured with mkbooks. To preview locally:
-```
-brew install mdbook
-mdbook serve --open docs
+```text
+visualizations/
 ```
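+
+The lightweight renderer described above draws each vehicle as a box. A
+minimal sketch of that geometry is shown below, assuming each track state
+provides a center, a heading in radians, a length, and a width (Waymo Open
+Motion Dataset object states include these fields). `box_corners` is a
+hypothetical name, not a function from `render_waymo_follow_3d.py`.
+
+```python
+import math
+
+
+def box_corners(cx, cy, heading, length, width):
+    """Return the four (x, y) corners of a vehicle footprint.
+
+    Assumes a box centered at (cx, cy), rotated by `heading` radians
+    counter-clockwise from the +x axis; `length` runs along the heading.
+    """
+    c, s = math.cos(heading), math.sin(heading)
+    hl, hw = length / 2.0, width / 2.0
+    corners = []
+    # Front-left, front-right, rear-right, rear-left in the vehicle frame.
+    for dx, dy in ((hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw)):
+        # Rotate the local offset by the heading, then translate to the center.
+        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
+    return corners
+```
+
+Connecting the corners in order gives the box outline; a 3D view can extrude
+it upward by the vehicle height.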
-Open the served URL to see a local version of the docs.
 
-**Interactive demo**
+## Main files
 
-To edit the browser demo, follow these steps:
-- Download [emscripten](https://github.com/emscripten-core/emscripten)
-- emscripten install latest
-- Activate: `source emsdk/emsdk_env.sh`
-- Run `bash scripts/build_ocean.sh drive web`
-- This generates a number of `game*` files, move them to `assets/` to include them on the webpage
+| File | Purpose |
+| --- | --- |
+| `scripts/minimal_ppo_train.py` | Small PPO actor-critic training loop |
+| `scripts/parallel_data_collect.py` | Original rollout pattern used by the trainer |
+| `scripts/prepare_waymo_maps.py` | Converts Waymo JSON to PufferDrive maps |
+| `scripts/run_minimal_ppo_wsl.sh` | Starts training in the Linux-native copy |
+| `scripts/visualize_minimal_ppo.py` | Runs a checkpoint and records native 3D video |
+| `scripts/visualize_waymo_json.py` | Creates a top-down JSON replay |
+| `scripts/render_waymo_follow_3d.py` | Creates a lightweight 3D chase replay |
+| `docs/src/minimal-ppo.md` | More detail about PPO and command options |
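+
+For orientation before reading `scripts/minimal_ppo_train.py`, the core of
+PPO is the clipped surrogate objective. The sketch below shows the standard
+textbook form in plain Python; the trainer may differ in details (for example
+value-loss and entropy terms), and `clip_eps=0.2` is only a common default,
+not necessarily the value this branch uses.
+
+```python
+def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
+    """Standard PPO clipped surrogate loss (negated, so lower is better).
+
+    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i) for each sampled action.
+    """
+    terms = []
+    for r, adv in zip(ratios, advantages):
+        unclipped = r * adv
+        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps) * adv
+        # The minimum is a pessimistic bound: the policy gains nothing from
+        # pushing the ratio outside [1 - eps, 1 + eps].
+        terms.append(min(unclipped, clipped))
+    return -sum(terms) / len(terms)
+```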
 
-</details>
+## Current limitation
 
+A 10,000-step run only verifies that the pipeline runs end to end; it is
+usually not enough to learn good driving. Early policies may drive in
+reverse, steer poorly, or fail to reach the goal.
 
-## Citation
+To get a usable model, train on more scenarios for many more steps, then
+evaluate on held-out scenarios that were not used for training. Metrics worth
+tracking include:
 
-If you use PufferDrive in your research, please cite:
-```bibtex
-@software{pufferdrive2025github,
-  author = {Daphne Cornelisse* and Spencer Cheng* and Pragnay Mandavilli and Julian Hunt and Kevin Joseph and Waël Doulazmi and Valentin Charraut and Aditya Gupta and Joseph Suarez and Eugene Vinitsky},
-  title = {{PufferDrive}: A Fast and Friendly Driving Simulator for Training and Evaluating {RL} Agents},
-  url = {https://github.com/Emerge-Lab/PufferDrive},
-  version = {2.0.0},
-  year = {2025},
-}
-```
+- goal completion rate
+- collision rate
+- off-road rate
+- reverse-motion frequency
+- average episode return
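+
+These metrics are simple averages over evaluation episodes. The sketch below
+shows one way to aggregate them; the `EpisodeResult` record and its fields
+are hypothetical and would need to be filled from whatever your evaluation
+run actually logs.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class EpisodeResult:
+    # Hypothetical per-episode record; adapt the fields to your own logs.
+    reached_goal: bool
+    collided: bool
+    went_off_road: bool
+    reverse_steps: int  # steps with negative longitudinal speed
+    total_steps: int
+    episode_return: float
+
+
+def summarize(episodes):
+    """Aggregate the metrics listed above over a set of held-out episodes."""
+    n = len(episodes)
+    if n == 0:
+        raise ValueError("no episodes to summarize")
+    total_steps = sum(e.total_steps for e in episodes)
+    return {
+        "goal_completion_rate": sum(e.reached_goal for e in episodes) / n,
+        "collision_rate": sum(e.collided for e in episodes) / n,
+        "off_road_rate": sum(e.went_off_road for e in episodes) / n,
+        # Fraction of all simulated steps spent moving backwards.
+        "reverse_motion_frequency": (
+            sum(e.reverse_steps for e in episodes) / max(total_steps, 1)
+        ),
+        "average_episode_return": sum(e.episode_return for e in episodes) / n,
+    }
+```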
+
+## Upstream project
+
+PufferDrive is developed by Emerge Lab. The original documentation is
+available at <https://emerge-lab.github.io/PufferDrive>.