Commit c0c5894

daphne-cornelisse, Daphne, and julianh65 authored
Fix goal resampling in Carla maps and make metrics suitable for resampling in longer episodes (#186)
* Make road lines and lanes visible in map.
* Simplify goal resample algorithm: Pick best road lane point in road graph.
* Delete redundant code.
* Make the target distance to the new goal configurable.
* Generalize metrics to work for longer episodes with resampling. Also delete a bunch of unused graph topology code.
* Minor
* Apply precommit.
* Fix in visualizer.
* fix metrics
* WIP
* Add goal behavior flag.
* Add fallback for goal resampling and cleanup.
* Make goal radius more visible.
* Minor
* Make grid appear in the background.
* Minor.
* Merge
* Fix bug in logging num goals reached and sampled.
* Add goal target
* Use classic dynamics model.
* Fix discrepancies between demo() and eval_gif().
* Small bug fix.
* Reward shaping
* Termination mode must be 0 for Carla maps.
* Add all args from ini to demo() env.
* Clean up visualization code.
* Clean up metrics and vis.
* Fix metrics.
* Add diversity to agent view.
* Add better fallback.
* Reserve red for cars that are in collision.
* Keep track of current goals.
* Carla testing simple/
* Use classic dynamics by default.
* Fix small bug in goal logging (respawn).
* Always draw agent obs when resampling goals.
* Increase render videos timeout (carla maps take longer).
* Minor vis changes.
* Minor vis changes.
* Rmv displacement error for now and add goal speed target.
* Add optional goal speed.
* Incorporate suggestions.
* Revert settings.
* Revert settings.
* Revert settings.
* Fixes
* Add docs
* Minor
* Make grid appear in background.
* Edits.
* Typo.
* Minor visual adaptations.

---------

Co-authored-by: Daphne <daphn3cor@gmail.com>
Co-authored-by: julianh65 <jhunt17159@gmail.com>
1 parent 657281c commit c0c5894

16 files changed

Lines changed: 380 additions & 616 deletions

docs/simulator.md

Lines changed: 25 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -3,8 +3,9 @@
33
Deep dive into how the Drive environment is wired, what it expects as inputs, and how observations/actions/configs are shaped. The environment entrypoint is `pufferlib/ocean/drive/drive.py`, which wraps the C core in `pufferlib/ocean/drive/drive.h` via `binding.c`.
44

55
## Runtime inputs and lifecycle
6+
67
- **Map binaries**: The environment scans `resources/drive/binaries` for `map_*.bin` files and requires at least one to load. Keep `num_maps` no larger than what is present on disk. During vectorized setup, `binding.shared` samples maps until it accumulates at least `num_agents` controllable entities, skipping maps with no valid agents (`set_active_agents` in `drive.h`).
7-
- **Episode length**: Default `scenario_length = 91` to match the Waymo logs (trajectory data is 91 steps), but you can set `env.scenario_length` (CLI or `.ini`) to any positive value. Metrics are logged and `c_reset` is called when `timestep == scenario_length`.
8+
- **Episode length**: Default `episode_length = 91` to match the Waymo logs (trajectory data is 91 steps), but you can set `env.episode_length` (CLI or `.ini`) to any positive value. Metrics are logged and `c_reset` is called when `timestep == episode_length`.
89
- **Resampling maps**: Python-side `Drive.step` reinitializes the vectorized environments every `resample_frequency` steps (default `910`, ~10 episodes) with fresh map IDs and seeds.
910
- **Initialization controls**:
1011
- `init_steps` starts agents from a later timestep in the logged trajectory.
@@ -15,6 +16,7 @@ Deep dive into how the Drive environment is wired, what it expects as inputs, an
1516
See [Data](data.md) for how to produce the `.bin` inputs, including the binary layout.
1617

1718
## Actions and dynamics
19+
1820
- **Action types** (`env.action_type`):
1921
- `discrete` (default): classic dynamics use a single `MultiDiscrete([7*13])` index decoded into acceleration (`ACCELERATION_VALUES`) and steering (`STEERING_VALUES`); jerk dynamics use `MultiDiscrete([4, 3])` over `JERK_LONG`/`JERK_LAT`.
2022
- `continuous`: a 2-D Box in `[-1, 1]`. Classic scales to the max accel/steer magnitudes used in the discrete table. Jerk scales asymmetrically: negative values reach up to `-15 m/s^3` braking, positives up to `4 m/s^3` acceleration, lateral jerk up to `±4 m/s^3`.
@@ -23,6 +25,7 @@ See [Data](data.md) for how to produce the `.bin` inputs, including the binary l
2325
- `jerk`: integrates longitudinal/lateral jerk into accel, then into velocity/pose with steering limited to `±0.55 rad`. Speeds are clipped to `[0, 20] m/s`.
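
As a rough illustration of the asymmetric jerk scaling described for `continuous` actions, the sketch below maps a `[-1, 1]` action pair to jerk values. The limits come from the text above; the function name `scale_jerk_action`, the clipping step, and the linear mapping are assumptions made for illustration, and the actual scaling in `drive.h` may differ in detail.

```python
# Limits taken from the description above; the linear mapping is an assumption.
MAX_BRAKE_JERK = 15.0  # m/s^3, reached at a_long = -1
MAX_ACCEL_JERK = 4.0   # m/s^3, reached at a_long = +1
MAX_LAT_JERK = 4.0     # m/s^3, reached at |a_lat| = 1


def scale_jerk_action(a_long: float, a_lat: float) -> tuple[float, float]:
    """Map a continuous action in [-1, 1]^2 to (longitudinal, lateral) jerk."""
    # Clip both components to the documented Box bounds.
    a_long = max(-1.0, min(1.0, a_long))
    a_lat = max(-1.0, min(1.0, a_lat))
    # Braking (negative) and accelerating (positive) use different magnitudes.
    long_scale = MAX_BRAKE_JERK if a_long < 0 else MAX_ACCEL_JERK
    return a_long * long_scale, a_lat * MAX_LAT_JERK
```

Under these assumptions, a full-brake command `(-1.0, 0.0)` yields a longitudinal jerk of `-15.0`, while a full-throttle command `(1.0, 0.0)` yields only `4.0`.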
2426

2527
## Observation space
28+
2629
Shape is `ego_features + 63 * 7 + 200 * 7` = `1848` for classic dynamics (`ego_features = 7`) or `1851` for jerk dynamics (`ego_features = 10`). Computed in `compute_observations` (`drive.h`):
2730

2831
- **Ego block** (classic):
@@ -37,16 +40,17 @@ Shape is `ego_features + 63 * 7 + 200 * 7` = `1848` for classic dynamics (`ego_f
3740
- Longitudinal acceleration normalized to `[-15, 4]`
3841
- Lateral acceleration normalized to `[-4, 4]`
3942
- Respawn flag (index 9)
40-
- **Partner blocks**: Up to 63 other agents (active first, then static experts) within 50 m. Each uses 7 values: relative (x, y) in ego frame scaled by `0.02`, width/length normalized as above, relative heading encoded as `(cos Δθ, sin Δθ)`, and speed / `MAX_SPEED`. Zero-padded when fewer neighbors are present or when agents are in respawn.
43+
- **Partner blocks**: Up to `MAX_AGENTS - 1` other agents (active first, then static experts) within 50 m. Each uses 7 values: relative (x, y) in ego frame scaled by `0.02`, width/length normalized as above, relative heading encoded as `(cos Δθ, sin Δθ)`, and speed / `MAX_SPEED`. Zero-padded when fewer neighbors are present or when agents are in respawn.
4144
- **Road blocks**: Up to 200 nearby road segments pulled from a precomputed grid (`vision_range = 21`). Each entry stores relative midpoint (x, y) scaled by `0.02`, segment length / `MAX_ROAD_SEGMENT_LENGTH` (100 m), width / `MAX_ROAD_SCALE` (100), `(cos, sin)` of the segment direction in ego frame, and a type ID (`ROAD_LANE`..`DRIVEWAY` stored as `0..6`). Remaining slots are zero-padded.
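
The shape arithmetic above can be checked with a short sketch. The block counts and feature widths are taken from the text; the helper names `observation_size` and `split_observation`, and the assumption that the flat vector is laid out as ego block, then partner rows, then road rows, are illustrative and not confirmed against `compute_observations`.

```python
PARTNER_SLOTS = 63      # up to 63 other agents
PARTNER_FEATURES = 7
ROAD_SLOTS = 200        # up to 200 road segments
ROAD_FEATURES = 7
EGO_FEATURES = {"classic": 7, "jerk": 10}


def observation_size(dynamics_model: str) -> int:
    """Total length of the flat observation vector for one agent."""
    return (
        EGO_FEATURES[dynamics_model]
        + PARTNER_SLOTS * PARTNER_FEATURES
        + ROAD_SLOTS * ROAD_FEATURES
    )


def split_observation(obs, dynamics_model: str):
    """Slice a flat observation into (ego, partner rows, road rows).

    Assumes the ego, partner, and road blocks are stored in that order.
    """
    ego_end = EGO_FEATURES[dynamics_model]
    partner_end = ego_end + PARTNER_SLOTS * PARTNER_FEATURES
    ego = obs[:ego_end]
    partners = [
        obs[i:i + PARTNER_FEATURES]
        for i in range(ego_end, partner_end, PARTNER_FEATURES)
    ]
    roads = [
        obs[i:i + ROAD_FEATURES]
        for i in range(partner_end, len(obs), ROAD_FEATURES)
    ]
    return ego, partners, roads
```

With these constants, `observation_size("classic")` is `7 + 441 + 1400 = 1848` and `observation_size("jerk")` is `10 + 441 + 1400 = 1851`, matching the figures quoted above.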
4245

4346
## Rewards, termination, and metrics
47+
4448
- **Per-step rewards** (`c_step`):
4549
- Collision with another actor: `reward_vehicle_collision` (default `-0.5`)
4650
- Off-road (road-edge intersection): `reward_offroad_collision` (default `-0.2`)
4751
- Goal reached: `reward_goal` (default `1.0`) or `reward_goal_post_respawn` after a respawn
48-
- Optional ADE shaping: `reward_ade * avg_displacement_error`, where ADE is accumulated in `compute_agent_metrics`
49-
- **Termination**: No early truncation; episodes roll to `scenario_length` steps. If `goal_behavior` is respawn, `respawn_agent` resets the pose and marks `respawn_timestep` so the respawn flag shows up in observations.
52+
53+
- **Termination**: No early truncation; episodes roll to `episode_length` steps. If `goal_behavior` is respawn, `respawn_agent` resets the pose and marks `respawn_timestep` so the respawn flag shows up in observations.
5054
- **Logged metrics** (`add_log` aggregates over all active agents across envs):
5155
- `score`: reached goal without collision/off-road
5256
- `collision_rate` / `offroad_rate`: fraction of agents with ≥1 event in the episode
@@ -57,23 +61,28 @@ Shape is `ego_features + 63 * 7 + 200 * 7` = `1848` for classic dynamics (`ego_f
5761
`collision_behavior`, `offroad_behavior`, `reward_vehicle_collision_post_respawn`, and `spawn_immunity_timer` are parsed from the INI but currently unused in the stepping logic.
5862

5963
## Configuration files (`.ini`)
64+
6065
`pufferlib/config/default.ini` supplies global defaults. Environment-specific overrides live in `pufferlib/config/ocean/drive.ini` and are loaded first when you run `puffer train puffer_drive`; CLI flags (e.g., `--env.num-maps 128`) override both.
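
The precedence described here (global defaults, then environment-specific overrides, then CLI flags) can be pictured as a layered dictionary merge. This is only a sketch of the ordering: `resolve_config` is a hypothetical helper, not part of PufferLib, and the values in the usage example are placeholders apart from the `--env.num-maps 128` flag mentioned above.

```python
def resolve_config(global_defaults: dict, env_overrides: dict, cli_flags: dict) -> dict:
    """Merge three config layers; later layers win on conflicting keys."""
    merged = dict(global_defaults)   # lowest priority: default.ini
    merged.update(env_overrides)     # middle priority: ocean/drive.ini
    merged.update(cli_flags)         # highest priority: command-line flags
    return merged


# Placeholder values, chosen only to show which layer wins.
resolved = resolve_config(
    global_defaults={"num_maps": 1, "render": False},
    env_overrides={"num_maps": 10000, "episode_length": 91},
    cli_flags={"num_maps": 128},
)
```

Here `resolved["num_maps"]` ends up as `128` because the CLI layer is applied last, while `episode_length` and `render` keep the values from the lower layers.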
6166

6267
Key sections in `pufferlib/config/ocean/drive.ini`:
63-
- **[env]**: Simulator knobs: `num_agents` (policy slots, C core cap 64), `num_maps`, `scenario_length`, `resample_frequency`, `action_type`, `dynamics_model`, rewards, `goal_radius`, `goal_behavior`, `init_steps`, `init_mode`, `control_mode`; rendering toggles `render`, `render_interval`, `obs_only`, `show_grid`, `show_lasers`, `show_human_logs`, `render_map`.
68+
69+
- **[env]**: Simulator knobs: `num_agents` (policy slots, C core cap 64), `num_maps`, `episode_length`, `resample_frequency`, `action_type`, `dynamics_model`, rewards, `goal_radius`, `goal_behavior`, `init_steps`, `init_mode`, `control_mode`; rendering toggles `render`, `render_interval`, `obs_only`, `show_grid`, `show_lasers`, `show_human_logs`, `render_map`.
6470
- **[vec]**: Vectorization sizing (`num_envs`, `num_workers`, `batch_size`; backend defaults to multiprocessing).
6571
- **[policy]/[rnn]**: Model widths for the Torch policy (`input_size`, `hidden_size`) and optional LSTM wrapper.
6672
- **[train]**: PPO-style hyperparameters (timesteps, learning rate, clipping, batch/minibatch, BPTT horizon, optimizer choice) merged with any unspecified defaults from `pufferlib/config/default.ini`.
6773
- **[eval]**: WOSAC/human-replay switches and sizing (`eval.wosac_*`, `eval.human_replay_*`) mapped directly to the `Drive` kwargs in evaluation subprocesses.
6874

6975
## Model overview
76+
7077
Defined in `pufferlib/ocean/torch.py:Drive`:
78+
7179
- Three MLP encoders (ego, partners, roads) with LayerNorm. Partner and road encodings are max-pooled across instances.
7280
- Concatenated embedding → GELU → linear to `hidden_size`, then split into actor/value heads.
7381
- Discrete actions are emitted as logits per dimension (`MultiDiscrete`), continuous actions as Gaussian parameters (`softplus` std). Value head is a single linear output.
7482
- `Recurrent = pufferlib.models.LSTMWrapper` can wrap the policy using the `rnn` config entries; otherwise the policy is feed-forward.
7583

7684
## Drive source files (what lives where)
85+
7786
- `pufferlib/ocean/drive/drive.py`: Python Gymnasium-style wrapper that sets up buffers, validates map availability, seeds the C core via `binding.env_init`, and handles map resampling.
7887
- `pufferlib/ocean/drive/drive.h`: Main C implementation of stepping, observations, rewards/metrics, grid map, lane graph, and collision checking.
7988
- `pufferlib/ocean/drive/binding.c`: Python C-extension glue that exposes `Drive` to Python, handles shared buffer setup, logging, and reading the `.ini` config.
@@ -89,24 +98,25 @@ Defined in `pufferlib/ocean/torch.py:Drive`:
8998

9099
Determines which agents are **created** in the environment.
91100

92-
| Option | Description |
93-
| --- | --- |
94-
| `create_all_valid` | Create all entities valid at initialization (`traj_valid[init_steps] == 1`). |
95-
| `create_only_controlled` | Create only those agents that are controlled by the policy. |
101+
| Option | Description |
102+
| -------------------------- | ------------------------------------------------------------------------------ |
103+
| `create_all_valid` | Create all entities valid at initialization (`traj_valid[init_steps] == 1`). |
104+
| `create_only_controlled` | Create only those agents that are controlled by the policy. |
96105

97106
#### `control_mode`
98107

99108
Determines which created agents are **controlled** by the policy.
100109

101-
| Option | Description |
102-
| --- | --- |
103-
| `control_vehicles` (default) | Control only valid **vehicles** (not experts, beyond `MIN_DISTANCE_TO_GOAL`, under `MAX_AGENTS`). |
104-
| `control_agents` | Control all valid **agent types** (vehicles, cyclists, pedestrians). |
105-
| `control_tracks_to_predict` *(WOMD only)* | Control agents listed in the `tracks_to_predict` metadata. |
110+
| Option | Description |
111+
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
112+
| `control_vehicles` (default) | Control only valid **vehicles** (not experts, beyond `MIN_DISTANCE_TO_GOAL`, under `MAX_AGENTS`). |
113+
| `control_agents` | Control all valid **agent types** (vehicles, cyclists, pedestrians). |
114+
| `control_tracks_to_predict` *(WOMD only)* | Control agents listed in the `tracks_to_predict` metadata. |
115+
| `control_sdc_only` *(WOMD only)* | Control just the self-driving car (SDC). |
106116

107117
### Termination conditions (`done`)
108118

109-
Episodes are never truncated before reaching `episode_len`. The `goal_behavior` argument controls agent behavior after reaching a goal early:
119+
The `goal_behavior` argument controls agent behavior after reaching a goal early:
110120

111121
- **`goal_behavior=0` (default):** Agents respawn at their initial position after reaching their goal (last valid log position).
112122
- **`goal_behavior=1`:** Agents receive new goals indefinitely after reaching each goal.

docs/train.md

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
1+
## Training
2+
3+
### Basic training
4+
5+
Launch a training run with Weights & Biases logging:
6+
```bash
7+
puffer train puffer_drive --wandb --wandb-project "pufferdrive"
8+
```
9+
10+
### Environment configurations
11+
12+
**Default configuration (Waymo maps)**
13+
14+
The default settings in `drive.ini` are optimized for:
15+
16+
- Training on thousands of Waymo maps
17+
- Short episodes (91 steps)
18+
19+
**Carla maps configuration**
20+
For training agents to drive indefinitely in larger Carla maps, we recommend modifying `drive.ini` as follows:
21+
```ini
22+
[env]
23+
goal_speed = 30.0 # Target speed in m/s at the goal. Lower values discourage excessive speeding
24+
goal_behavior = 1 # 0: respawn, 1: generate_new_goals, 2: stop
25+
goal_target_distance = 25.0 # Distance to new goal when using generate_new_goals
26+
27+
# Episode settings
28+
episode_length = 200 # Increase for longer episode horizon
29+
resample_frequency = 100000 # No resampling needed (there are only a few Carla maps)
30+
termination_mode = 0 # 0: terminate at episode_length, 1: terminate after all agents reset
31+
32+
# Map settings
33+
map_dir = "resources/drive/binaries/carla"
34+
num_maps = 1
35+
```
36+
37+
**Note:** The default training hyperparameters work well for both configurations and typically don't need adjustment.
38+
39+
40+
## Controlled experiments
41+
42+
Run parameter sweeps for architecture search or multi-seed experiments:
43+
```bash
44+
puffer controlled_exp puffer_drive --wandb --wandb-project "pufferdrive2.0_carla" --tag speed
45+
```
46+
47+
Define parameter sweeps in `drive.ini`:
48+
```ini
49+
[controlled_exp.env.goal_speed]
50+
values = [10, 20, 30]
51+
```
52+
53+
This will launch separate training runs for each value in the list, useful for:
54+
- Hyperparameter tuning
55+
- Architecture search
56+
- Running multiple random seeds
57+
- Ablation studies
58+
59+
You can specify multiple controlled experiment parameters, and the system will iterate through all combinations.
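
To make "all combinations" concrete, the sketch below expands several value lists into a run grid with `itertools.product`. This illustrates the Cartesian-product idea only; `expand_sweep` is a hypothetical helper, the second parameter and its values are made up for the example, and the actual `controlled_exp` implementation may build its runs differently.

```python
import itertools


def expand_sweep(sweep: dict) -> list[dict]:
    """Return one config dict per combination of the listed values."""
    keys = list(sweep)
    value_lists = [sweep[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


# goal_speed values mirror the example above; the second entry is hypothetical.
runs = expand_sweep({
    "env.goal_speed": [10, 20, 30],
    "env.goal_target_distance": [25.0, 50.0],
})
```

With three values for one parameter and two for the other, this yields `3 * 2 = 6` run configurations, so the number of launched runs grows multiplicatively with each added parameter.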

mkdocs.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ nav:
2727
- Get started: index.md
2828
- Docs:
2929
- Getting started: getting-started.md
30+
- Training agents: train.md
3031
- Simulator: simulator.md
3132
- Interactive scenario editor: scene-editor.md
3233
- Visualizer: visualizer.md

pufferlib/config/ocean/drive.ini

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -25,23 +25,27 @@ action_type = discrete
2525
; Options: classic, jerk
2626
dynamics_model = classic
2727
reward_vehicle_collision = -0.5
28-
reward_offroad_collision = -0.5 # Use -0.05 for carla maps
29-
reward_ade = 0.0
28+
reward_offroad_collision = -0.5
3029
dt = 0.1
3130
reward_goal = 1.0
3231
reward_goal_post_respawn = 0.25
3332
; Meters around goal to be considered "reached"
3433
goal_radius = 2.0
34+
; Max target speed in m/s for the agent to maintain towards the goal
35+
goal_speed = 100.0
3536
; What to do when the goal is reached. Options: 0:"respawn", 1:"generate_new_goals", 2:"stop"
3637
goal_behavior = 0
38+
; Determines the target distance to the new goal in the case of goal_behavior = generate_new_goals.
39+
; Larger values select a goal point farther from the agent's current position.
40+
goal_target_distance = 25.0
3741
; Options: 0 - Ignore, 1 - Stop, 2 - Remove
3842
collision_behavior = 0
3943
; Options: 0 - Ignore, 1 - Stop, 2 - Remove
4044
offroad_behavior = 0
41-
; Number of steps before reset
42-
scenario_length = 91
45+
; Number of steps before reset
46+
episode_length = 91
4347
resample_frequency = 910
44-
termination_mode = 1 # 0 - terminate at scenario_length, 1 - terminate after all agents have been reset
48+
termination_mode = 1 # 0 - terminate at episode_length, 1 - terminate after all agents have been reset
4549
map_dir = "resources/drive/binaries/training"
4650
num_maps = 10000
4751
; Determines which step of the trajectory to initialize the agents at upon reset
@@ -85,7 +89,7 @@ render_interval = 1000
8589
; If True, show exactly what the agent sees in agent observation
8690
obs_only = True
8791
; Show grid lines
88-
show_grid = False
92+
show_grid = True
8993
; Draws lines from ego agent observed ORUs and road elements to show detection range
9094
show_lasers = False
9195
; Display human xy logs in the background
@@ -136,7 +140,7 @@ scale = auto
136140
distribution = log_normal
137141
min = 0.001
138142
mean = 0.005
139-
max = 0.01
143+
max = 0.03
140144
scale = auto
141145

142146
[sweep.train.gamma]

pufferlib/ocean/benchmark/metrics.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -249,7 +249,7 @@ def compute_interaction_features(
249249
scenario_mask = torch.as_tensor(scenario_mask_np, dtype=torch.bool, device=x_t.device)
250250
scenario_x = x_t[scenario_mask]
251251
scenario_y = y_t[scenario_mask]
252-
scenario_length = length_broadcast[scenario_mask]
252+
episode_length = length_broadcast[scenario_mask]
253253
scenario_width = width_broadcast[scenario_mask]
254254
scenario_heading = heading_t[scenario_mask]
255255
scenario_valid = valid_t[scenario_mask]
@@ -260,7 +260,7 @@ def compute_interaction_features(
260260
distances_to_objects = interaction_features.compute_distance_to_nearest_object(
261261
center_x=scenario_x,
262262
center_y=scenario_y,
263-
length=scenario_length,
263+
length=episode_length,
264264
width=scenario_width,
265265
heading=scenario_heading,
266266
valid=scenario_valid,
@@ -273,7 +273,7 @@ def compute_interaction_features(
273273
times_to_collision = interaction_features.compute_time_to_collision(
274274
center_x=scenario_x,
275275
center_y=scenario_y,
276-
length=scenario_length,
276+
length=episode_length,
277277
width=scenario_width,
278278
heading=scenario_heading,
279279
valid=scenario_valid,

pufferlib/ocean/drive/binding.c

Lines changed: 14 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,7 @@ static PyObject *my_shared(PyObject *self, PyObject *args, PyObject *kwargs) {
7575
int control_mode = unpack(kwargs, "control_mode");
7676
int init_steps = unpack(kwargs, "init_steps");
7777
int goal_behavior = unpack(kwargs, "goal_behavior");
78+
float goal_target_distance = unpack(kwargs, "goal_target_distance");
7879
int use_all_maps = unpack(kwargs, "use_all_maps");
7980
clock_gettime(CLOCK_REALTIME, &ts);
8081
srand(ts.tv_nsec);
@@ -94,6 +95,7 @@ static PyObject *my_shared(PyObject *self, PyObject *args, PyObject *kwargs) {
9495
env->control_mode = control_mode;
9596
env->init_steps = init_steps;
9697
env->goal_behavior = goal_behavior;
98+
env->goal_target_distance = goal_target_distance;
9799
snprintf(map_file, sizeof(map_file), "%s/map_%03d.bin", map_dir, map_id);
98100
env->entities = load_map_binary(map_file, env);
99101
set_active_agents(env);
@@ -175,11 +177,11 @@ static int my_init(Env *env, PyObject *args, PyObject *kwargs) {
175177
if (ini_parse(env->ini_file, handler, &conf) < 0) {
176178
printf("Error while loading %s", env->ini_file);
177179
}
178-
if (kwargs && PyDict_GetItemString(kwargs, "scenario_length")) {
179-
conf.scenario_length = (int)unpack(kwargs, "scenario_length");
180+
if (kwargs && PyDict_GetItemString(kwargs, "episode_length")) {
181+
conf.episode_length = (int)unpack(kwargs, "episode_length");
180182
}
181-
if (conf.scenario_length <= 0) {
182-
PyErr_SetString(PyExc_ValueError, "scenario_length must be > 0 (set in INI or kwargs)");
183+
if (conf.episode_length <= 0) {
184+
PyErr_SetString(PyExc_ValueError, "episode_length must be > 0 (set in INI or kwargs)");
183185
return -1;
184186
}
185187
env->action_type = conf.action_type;
@@ -188,8 +190,7 @@ static int my_init(Env *env, PyObject *args, PyObject *kwargs) {
188190
env->reward_offroad_collision = conf.reward_offroad_collision;
189191
env->reward_goal = conf.reward_goal;
190192
env->reward_goal_post_respawn = conf.reward_goal_post_respawn;
191-
env->reward_ade = conf.reward_ade;
192-
env->scenario_length = conf.scenario_length;
193+
env->episode_length = conf.episode_length;
193194
env->termination_mode = conf.termination_mode;
194195
env->collision_behavior = conf.collision_behavior;
195196
env->offroad_behavior = conf.offroad_behavior;
@@ -198,7 +199,9 @@ static int my_init(Env *env, PyObject *args, PyObject *kwargs) {
198199
env->init_mode = (int)unpack(kwargs, "init_mode");
199200
env->control_mode = (int)unpack(kwargs, "control_mode");
200201
env->goal_behavior = (int)unpack(kwargs, "goal_behavior");
202+
env->goal_target_distance = (float)unpack(kwargs, "goal_target_distance");
201203
env->goal_radius = (float)unpack(kwargs, "goal_radius");
204+
env->goal_speed = (float)unpack(kwargs, "goal_speed");
202205
char *map_dir = unpack_str(kwargs, "map_dir");
203206
int map_id = unpack(kwargs, "map_id");
204207
int max_agents = unpack(kwargs, "max_agents");
@@ -223,8 +226,11 @@ static int my_log(PyObject *dict, Log *log) {
223226
assign_to_dict(dict, "dnf_rate", log->dnf_rate);
224227
assign_to_dict(dict, "completion_rate", log->completion_rate);
225228
assign_to_dict(dict, "lane_alignment_rate", log->lane_alignment_rate);
226-
assign_to_dict(dict, "avg_offroad_per_agent", log->avg_offroad_per_agent);
227-
assign_to_dict(dict, "avg_collisions_per_agent", log->avg_collisions_per_agent);
229+
assign_to_dict(dict, "offroad_per_agent", log->offroad_per_agent);
230+
assign_to_dict(dict, "collisions_per_agent", log->collisions_per_agent);
231+
assign_to_dict(dict, "goals_sampled_this_episode", log->goals_sampled_this_episode);
232+
assign_to_dict(dict, "goals_reached_this_episode", log->goals_reached_this_episode);
233+
assign_to_dict(dict, "speed_at_goal", log->speed_at_goal);
228234
// assign_to_dict(dict, "avg_displacement_error", log->avg_displacement_error);
229235
return 0;
230236
}
