Fix goal resampling in Carla maps and make metrics suitable for resampling in longer episodes (#186)
* Make road lines and lanes visible in map.
* Simplify goal resample algorithm: Pick best road lane point in road graph.
* Delete redundant code.
* Make the target distance to the new goal configurable.
* Generalize metrics to work for longer episodes with resampling. Also delete a bunch of unused graph topology code.
* Minor
* Apply precommit.
* Fix in visualizer.
* fix metrics
* WIP
* Add goal behavior flag.
* Add fallback for goal resampling and cleanup.
* Make goal radius more visible.
* Minor
* Make grid appear in the background.
* Minor.
* Merge
* Fix bug in logging num goals reached and sampled.
* Add goal target
* Use classic dynamics model.
* Fix discrepancies between demo() and eval_gif().
* Small bug fix.
* Reward shaping
* Termination mode must be 0 for Carla maps.
* Add all args from ini to demo() env.
* Clean up visualization code.
* Clean up metrics and vis.
* Fix metrics.
* Add diversity to agent view.
* Add better fallback.
* Reserve red for cars that are in collision.
* Keep track of current goals.
* Carla testing simple/
* Use classic dynamics by default.
* Fix small bug in goal logging (respawn).
* Always draw agent obs when resampling goals.
* Increase render videos timeout (carla maps take longer).
* Minor vis changes.
* Minor vis changes.
* Rmv displacement error for now and add goal speed target.
* Add optional goal speed.
* Incorporate suggestions.
* Revert settings.
* Revert settings.
* Revert settings.
* Fixes
* Add docs
* Minor
* Make grid appear in background.
* Edits.
* Typo.
* Minor visual adaptations.
---------
Co-authored-by: Daphne <daphn3cor@gmail.com>
Co-authored-by: julianh65 <jhunt17159@gmail.com>
docs/simulator.md (25 additions & 15 deletions)
@@ -3,8 +3,9 @@
3
3
Deep dive into how the Drive environment is wired, what it expects as inputs, and how observations/actions/configs are shaped. The environment entrypoint is `pufferlib/ocean/drive/drive.py`, which wraps the C core in `pufferlib/ocean/drive/drive.h` via `binding.c`.
4
4
5
5
## Runtime inputs and lifecycle
6
+
6
7
-**Map binaries**: The environment scans `resources/drive/binaries` for `map_*.bin` files and requires at least one to load. Keep `num_maps` no larger than what is present on disk. During vectorized setup, `binding.shared` samples maps until it accumulates at least `num_agents` controllable entities, skipping maps with no valid agents (`set_active_agents` in `drive.h`).
7
-
-**Episode length**: Default `scenario_length = 91` to match the Waymo logs (trajectory data is 91 steps), but you can set `env.scenario_length` (CLI or `.ini`) to any positive value. Metrics are logged and `c_reset` is called when `timestep == scenario_length`.
8
+
-**Episode length**: Default `episode_length = 91` to match the Waymo logs (trajectory data is 91 steps), but you can set `env.episode_length` (CLI or `.ini`) to any positive value. Metrics are logged and `c_reset` is called when `timestep == episode_length`.
8
9
-**Resampling maps**: Python-side `Drive.step` reinitializes the vectorized environments every `resample_frequency` steps (default `910`, ~10 episodes) with fresh map IDs and seeds.
9
10
-**Initialization controls**:
10
11
-`init_steps` starts agents from a later timestep in the logged trajectory.
@@ -15,6 +16,7 @@ Deep dive into how the Drive environment is wired, what it expects as inputs, an
15
16
See [Data](data.md) for how to produce the `.bin` inputs, including the binary layout.
16
17
17
18
## Actions and dynamics
19
+
18
20
-**Action types** (`env.action_type`):
19
21
-`discrete` (default): classic dynamics use a single `MultiDiscrete([7*13])` index decoded into acceleration (`ACCELERATION_VALUES`) and steering (`STEERING_VALUES`); jerk dynamics use `MultiDiscrete([4, 3])` over `JERK_LONG`/`JERK_LAT`.
20
22
-`continuous`: a 2-D Box in `[-1, 1]`. Classic scales to the max accel/steer magnitudes used in the discrete table. Jerk scales asymmetrically: negative values reach up to `-15 m/s^3` braking, positives up to `4 m/s^3` acceleration, lateral jerk up to `±4 m/s^3`.
@@ -23,6 +25,7 @@ See [Data](data.md) for how to produce the `.bin` inputs, including the binary l
23
25
-`jerk`: integrates longitudinal/lateral jerk into accel, then into velocity/pose with steering limited to `±0.55 rad`. Speeds are clipped to `[0, 20] m/s`.
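
The asymmetric scaling of continuous jerk actions described above can be sketched as follows. This is a minimal illustration, not the code in `drive.h`: the function name `scale_jerk_action`, the constant names, the linear mapping, and the input clipping are all assumptions. Only the limits (`-15`, `4`, and `±4 m/s^3`) come from the text.

```python
# Limits taken from the text above; the constant names are hypothetical.
JERK_LONG_BRAKE = 15.0  # m/s^3, magnitude reached by a longitudinal action of -1
JERK_LONG_ACCEL = 4.0   # m/s^3, reached by a longitudinal action of +1
JERK_LAT_MAX = 4.0      # m/s^3, lateral jerk magnitude reached at +/-1


def scale_jerk_action(a_long, a_lat):
    """Map a 2-D action in [-1, 1] to (longitudinal, lateral) jerk.

    Assumes a linear mapping on each side of zero and clips inputs to
    [-1, 1]; the real environment may handle both differently.
    """
    a_long = max(-1.0, min(1.0, a_long))
    a_lat = max(-1.0, min(1.0, a_lat))
    # Negative longitudinal actions brake harder than positives accelerate.
    if a_long < 0.0:
        jerk_long = a_long * JERK_LONG_BRAKE
    else:
        jerk_long = a_long * JERK_LONG_ACCEL
    jerk_lat = a_lat * JERK_LAT_MAX
    return jerk_long, jerk_lat
```

Under these assumptions, a full-brake action `(-1.0, 0.0)` maps to `-15 m/s^3` longitudinal jerk, while a full-throttle action `(1.0, 0.0)` maps to only `4 m/s^3`.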
24
26
25
27
## Observation space
28
+
26
29
Shape is `ego_features + 63 * 7 + 200 * 7` = `1848` for classic dynamics (`ego_features = 7`) or `1851` for jerk dynamics (`ego_features = 10`). Computed in `compute_observations` (`drive.h`):
- Longitudinal acceleration normalized to `[-15, 4]`
38
41
- Lateral acceleration normalized to `[-4, 4]`
39
42
- Respawn flag (index 9)
40
-
-**Partner blocks**: Up to 63 other agents (active first, then static experts) within 50 m. Each uses 7 values: relative (x, y) in ego frame scaled by `0.02`, width/length normalized as above, relative heading encoded as `(cos Δθ, sin Δθ)`, and speed / `MAX_SPEED`. Zero-padded when fewer neighbors are present or when agents are in respawn.
43
+
-**Partner blocks**: Up to `MAX_AGENTS-1` other agents (active first, then static experts) within 50 m. Each uses 7 values: relative (x, y) in ego frame scaled by `0.02`, width/length normalized as above, relative heading encoded as `(cos Δθ, sin Δθ)`, and speed / `MAX_SPEED`. Zero-padded when fewer neighbors are present or when agents are in respawn.
41
44
-**Road blocks**: Up to 200 nearby road segments pulled from a precomputed grid (`vision_range = 21`). Each entry stores relative midpoint (x, y) scaled by `0.02`, segment length / `MAX_ROAD_SEGMENT_LENGTH` (100 m), width / `MAX_ROAD_SCALE` (100), `(cos, sin)` of the segment direction in ego frame, and a type ID (`ROAD_LANE`..`DRIVEWAY` stored as `0..6`). Remaining slots are zero-padded.
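
The observation size arithmetic above can be checked with a short sketch. The helper names `observation_size` and `split_observation` are hypothetical, and the block order (ego, then partners, then roads) is an assumption inferred from the order of the terms in the shape formula. The count of 63 partner slots matches the formula, which corresponds to `MAX_AGENTS - 1` if `MAX_AGENTS` is 64.

```python
# Sizes taken from the shape formula above.
EGO_FEATURES = {"classic": 7, "jerk": 10}
MAX_PARTNERS = 63      # partner slots in the formula
PARTNER_FEATURES = 7
MAX_ROADS = 200
ROAD_FEATURES = 7


def observation_size(dynamics):
    """Return the flat observation length for the given dynamics model."""
    return (EGO_FEATURES[dynamics]
            + MAX_PARTNERS * PARTNER_FEATURES
            + MAX_ROADS * ROAD_FEATURES)


def split_observation(obs, dynamics):
    """Split a flat observation into ego, partner, and road blocks.

    Assumes the layout [ego | partners | roads]; this order is not
    confirmed by the text.
    """
    n_ego = EGO_FEATURES[dynamics]
    partner_end = n_ego + MAX_PARTNERS * PARTNER_FEATURES
    ego = obs[:n_ego]
    partners = [obs[i:i + PARTNER_FEATURES]
                for i in range(n_ego, partner_end, PARTNER_FEATURES)]
    roads = [obs[i:i + ROAD_FEATURES]
             for i in range(partner_end, len(obs), ROAD_FEATURES)]
    return ego, partners, roads


ego, partners, roads = split_observation([0.0] * observation_size("jerk"), "jerk")
print(observation_size("classic"), observation_size("jerk"))
print(len(ego), len(partners), len(roads))
```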
42
45
43
46
## Rewards, termination, and metrics
47
+
44
48
-**Per-step rewards** (`c_step`):
45
49
- Collision with another actor: `reward_vehicle_collision` (default `-0.5`)
- Goal reached: `reward_goal` (default `1.0`) or `reward_goal_post_respawn` after a respawn
48
-
- Optional ADE shaping: `reward_ade * avg_displacement_error`, where ADE is accumulated in `compute_agent_metrics`
49
-
-**Termination**: No early truncation; episodes roll to `scenario_length` steps. If `goal_behavior` is respawn, `respawn_agent` resets the pose and marks `respawn_timestep` so the respawn flag shows up in observations.
52
+
53
+
-**Termination**: No early truncation; episodes roll to `episode_length` steps. If `goal_behavior` is respawn, `respawn_agent` resets the pose and marks `respawn_timestep` so the respawn flag shows up in observations.
50
54
-**Logged metrics** (`add_log` aggregates over all active agents across envs):
51
55
-`score`: reached goal without collision/off-road
52
56
-`collision_rate` / `offroad_rate`: fraction of agents with ≥1 event in the episode
`collision_behavior`, `offroad_behavior`, `reward_vehicle_collision_post_respawn`, and `spawn_immunity_timer` are parsed from the INI but currently unused in the stepping logic.
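
The logged metrics described above (`score`, `collision_rate`, `offroad_rate`) can be sketched as per-agent flags averaged over all active agents. This is an illustration only: the `AgentEpisode` record and the `aggregate_metrics` function are hypothetical, and averaging over agents is an assumption about how `add_log` aggregates.

```python
from dataclasses import dataclass


@dataclass
class AgentEpisode:
    """Hypothetical per-agent episode summary."""
    reached_goal: bool
    collisions: int       # number of collision events in the episode
    offroad_events: int   # number of off-road events in the episode


def aggregate_metrics(agents):
    """Average per-agent outcomes into the three metrics named above."""
    n = len(agents)
    if n == 0:
        return {"score": 0.0, "collision_rate": 0.0, "offroad_rate": 0.0}
    # score: reached the goal with no collision and no off-road event.
    score = sum(a.reached_goal and a.collisions == 0 and a.offroad_events == 0
                for a in agents) / n
    # Rates: fraction of agents with at least one event in the episode.
    collision_rate = sum(a.collisions >= 1 for a in agents) / n
    offroad_rate = sum(a.offroad_events >= 1 for a in agents) / n
    return {"score": score,
            "collision_rate": collision_rate,
            "offroad_rate": offroad_rate}
```

Note that under this definition an agent that reaches its goal after a collision contributes to `collision_rate` but not to `score`.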
58
62
59
63
## Configuration files (`.ini`)
64
+
60
65
`pufferlib/config/default.ini` supplies global defaults. Environment-specific overrides live in `pufferlib/config/ocean/drive.ini` and are loaded first when you run `puffer train puffer_drive`; CLI flags (e.g., `--env.num-maps 128`) override both.
61
66
62
67
Key sections in `pufferlib/config/ocean/drive.ini`:
-**[vec]**: Vectorization sizing (`num_envs`, `num_workers`, `batch_size`; backend defaults to multiprocessing).
65
71
-**[policy]/[rnn]**: Model widths for the Torch policy (`input_size`, `hidden_size`) and optional LSTM wrapper.
66
72
-**[train]**: PPO-style hyperparameters (timesteps, learning rate, clipping, batch/minibatch, BPTT horizon, optimizer choice) merged with any unspecified defaults from `pufferlib/config/default.ini`.
67
73
-**[eval]**: WOSAC/human-replay switches and sizing (`eval.wosac_*`, `eval.human_replay_*`) mapped directly to the `Drive` kwargs in evaluation subprocesses.
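
The layering described above (global defaults, then environment-specific overrides, then CLI flags) can be sketched with the standard-library `configparser`. This is not PufferLib's actual loader: `load_config`, the inline INI strings, and every value except `episode_length = 91` and `num-maps 128` are hypothetical stand-ins for the real files.

```python
import configparser

# Hypothetical stand-ins for pufferlib/config/default.ini and
# pufferlib/config/ocean/drive.ini; the values are illustrative only.
DEFAULT_INI = """
[env]
num_maps = 1

[train]
learning_rate = 0.001
"""

DRIVE_INI = """
[env]
num_maps = 16
episode_length = 91
"""


def load_config(cli_overrides):
    """Merge defaults, env-specific overrides, then CLI flags (last wins)."""
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULT_INI)  # global defaults
    parser.read_string(DRIVE_INI)    # later reads override earlier keys
    merged = {section: dict(parser[section]) for section in parser.sections()}
    # CLI flags look like "env.num-maps"; map them to section/key names.
    for dotted, value in cli_overrides.items():
        section, key = dotted.split(".", 1)
        merged.setdefault(section, {})[key.replace("-", "_")] = str(value)
    return merged


config = load_config({"env.num-maps": 128})
print(config["env"]["num_maps"])  # CLI value wins over both INI layers
```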
68
74
69
75
## Model overview
76
+
70
77
Defined in `pufferlib/ocean/torch.py:Drive`:
78
+
71
79
- Three MLP encoders (ego, partners, roads) with LayerNorm. Partner and road encodings are max-pooled across instances.
72
80
- Concatenated embedding → GELU → linear to `hidden_size`, then split into actor/value heads.
73
81
- Discrete actions are emitted as logits per dimension (`MultiDiscrete`), continuous actions as Gaussian parameters (`softplus` std). Value head is a single linear output.
74
82
-`Recurrent = pufferlib.models.LSTMWrapper` can wrap the policy using the `rnn` config entries; otherwise the policy is feed-forward.
75
83
76
84
## Drive source files (what lives where)
85
+
77
86
-`pufferlib/ocean/drive/drive.py`: Python Gymnasium-style wrapper that sets up buffers, validates map availability, seeds the C core via `binding.env_init`, and handles map resampling.
78
87
-`pufferlib/ocean/drive/drive.h`: Main C implementation of stepping, observations, rewards/metrics, grid map, lane graph, and collision checking.
79
88
-`pufferlib/ocean/drive/binding.c`: Python C-extension glue that exposes `Drive` to Python, handles shared buffer setup, logging, and reading the `.ini` config.
@@ -89,24 +98,25 @@ Defined in `pufferlib/ocean/torch.py:Drive`:
89
98
90
99
Determines which agents are **created** in the environment.
91
100
92
-
| Option | Description |
93
-
| --- | --- |
94
-
|`create_all_valid`| Create all entities valid at initialization (`traj_valid[init_steps] == 1`). |
95
-
|`create_only_controlled`| Create only those agents that are controlled by the policy. |