Commit eab42d6

Update documents, pipelines, fix bugs: (#42)
1. The CTSD pipeline supports action and PyTorch distributed checkpoints (warning: incompatible with optimizer checkpoints from previous versions).
2. Update the LiDAR VQVAE and MaskGIT pipelines for temporal and auto-regressive generation.
3. Release the DFoT on CTSD 3.5 config and checkpoint.
4. Release the LiDAR VQVAE and MaskGIT configs and checkpoints that include KITTI-360.
5. Add scripts to make the blank code (LiDAR VQVAE), make Carla camera parameters (interactive generation), and Carla control from steering (interactive generation).
6. Fix the script for exporting as nuScenes data.
7. Other minor fixes to the models, datasets, and metrics.
1 parent 2989b3f commit eab42d6

38 files changed

Lines changed: 6601 additions & 412 deletions

README.md

Lines changed: 51 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -10,11 +10,13 @@ The driving world models generate multi-view images or videos of autonomous driv
1010

1111
The highlights are as follows:
1212

13-
1. **Significant improvement in the environmental diversity.** Through the use of multiple datasets, the model's generalization ability has been enhanced like never before. Take the example of a generation task controlled by layout conditions, such as a snowy city street or a lakeside highway with distant snow mountains, these scenarios are impossible tasks for generative models trained with a single dataset.
13+
1. **Transparent and reproducible training.** We provide complete training code and configurations, allowing everyone to reproduce experiments, fine-tune on their own data, and customize features as needed.
1414

15-
2. **Greatly improved generation quality.** Support for popular model architectures (SD 2.1, 3.5) enables more convenient utilization of the advanced pre-training generation capabilities within the community. Various training techniques, including multitasking and self-supervision, allow the model to utilize the information in autonomous driving video data more effectively.
15+
2. **Significant improvement in the environmental diversity.** Through the use of multiple datasets, the model's generalization ability has been enhanced like never before. Take the example of a generation task controlled by layout conditions, such as a snowy city street or a lakeside highway with distant snow mountains, these scenarios are impossible tasks for generative models trained with a single dataset.
1616

17-
3. **Convenient evaluation.** Evaluation follows the popular framework `torchmetrics`, which is easy to configure, develop, and integrate into the pipeline. Public configurations (such as FID, FVD on the nuScenes validation set) are provided to align other research works.
17+
3. **Greatly improved generation quality.** Support for popular model architectures (SD 2.1, 3.5) enables more convenient utilization of the advanced pre-training generation capabilities within the community. Various training techniques, including multitasking and self-supervision, allow the model to utilize the information in autonomous driving video data more effectively.
18+
19+
4. **Convenient evaluation.** Evaluation follows the popular framework `torchmetrics`, which is easy to configure, develop, and integrate into the pipeline. Public configurations (such as FID, FVD on the nuScenes validation set) are provided to align other research works.
1820

1921
Furthermore, our code modules are designed with high reusability in mind, for easy application in other projects.
2022

@@ -30,6 +32,7 @@ Currently, the project has implemented the following papers:
3032
3133
## News
3234

35+
* [2025/4/23] Update the [LiDAR VQVAE (including KITTI-360), LiDAR generation models](#lidar-models), and release the [DFoT on CTSD 3.5 model](#video-models).
3336
* [2025/3/17] Experimental release of the [Interactive Generation with Carla](docs/InteractiveGeneration.md)
3437
* [2025/3/7] Release the [LiDAR Generation](#lidar-models)
3538
* [2025/3/4] Release the [CTSD 3.5 with layout condition](#video-models)
@@ -72,16 +75,22 @@ Our cross-view temporal SD (CTSD) pipeline support loading the pretrained SD 2.1
7275
| [SD 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) | [Config](configs/ctsd/multi_datasets/ctsd_21_tirda_nwao.json), [Download](http://103.237.29.236:10030/ctsd_21_tirda_nwao_30k.pth) | [Config](configs/ctsd/multi_datasets/ctsd_21_tirda_bm_nwa.json), [Download](http://103.237.29.236:10030/ctsd_21_tirda_bm_nwa_30k.pth) |
7376
| [SD 3.0](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) | | [UniMLVG Config](configs/ctsd/unimlvg/ctsd_unimlvg_stage3_tirda_bm_nwa.json), [Download](http://103.237.29.236:10030/ctsd_unimlvg_tirda_bm_nwa_60k.pth) |
7477
| [SD 3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | [Config](configs/ctsd/multi_datasets/ctsd_35_tirda_nwao.json), [Download](http://103.237.29.236:10030/ctsd_35_tirda_nwao_20k.pth) | [Config](configs/ctsd/multi_datasets/ctsd_35_tirda_bm_nwao.json), [Download](http://103.237.29.236:10030/ctsd_35_tirda_bm_nwao_40k.pth) |
78+
| [DFoT](https://arxiv.org/abs/2502.06764) on [SD 3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | | [Config](configs/ctsd/multi_datasets/ctsd_35_df16_tirda_bm_nwao.json), [Download](http://103.237.29.236:10030/ctsd_35_df16_tirda_bm_nwao_40k.pth) |
79+
80+
The FVD evaluation results for all downloadable models can be found at the bottom of the corresponding configuration files.
7581

7682
### LiDAR Models
7783

7884
You can download our pre-trained tokenizer and generation models from the links below.
7985

80-
| Model Architecture | Configs | Checkpoint Download |
81-
| :-: | :-: | :-: |
82-
| VQVAE | [Config](configs/lidar/lidar_vqvae_nwa.json) | [checkpoint](http://103.237.29.236:10030/lidar_vqvae_nwa_60k.pth), [blank code ](http://103.237.29.236:10030/lidar_vqvae_nwa_60k_blank_code.pkl) |
83-
| MaskGIT | [Config](configs/lidar/lidar_maskgit_layout_ns.json)| [checkpoint](http://103.237.29.236:10030/lidar_maskgit_nusc_150k.pth) |
84-
| Temporal MaskGIT | | |
86+
| Model Architecture | Dataset | Configs | Checkpoint Download |
87+
| :-: | :-: | :-: | :-: |
88+
| VQVAE | nuScenes, Waymo, Argoverse | [Config](configs/lidar/lidar_vqvae_nwa.json) | [checkpoint](http://103.237.29.236:10030/lidar_vqvae_nwa_60k.pth), [blank code](http://103.237.29.236:10030/lidar_vqvae_nwa_60k_blank_code.pkl) |
89+
| | nuScenes, Waymo, Argoverse, KITTI-360 | [Config](configs/lidar/lidar_vqvae_nwak.json) | [checkpoint](http://103.237.29.236:10030/lidar_vqvae_nwak_80k.pth), [blank code](http://103.237.29.236:10030/lidar_vqvae_nwak_80k_blank_code.pkl) |
90+
| MaskGIT | nuScenes | [Config](configs/lidar/lidar_maskgit_layout_ns.json) | [ckpt_with_vqvae_nwa](http://103.237.29.236:10030/lidar_maskgit_nusc_150k.pth) <br> [ckpt_with_vqvae_nwak](http://103.237.29.236:10030/lidar_maskgit_vq80k_layout_ns_120k.pth) |
91+
| | KITTI-360 | [Config](configs/lidar/lidar_maskgit_vq80k_layout_kt.json) | [checkpoint](http://103.237.29.236:10030/lidar_maskgit_vq80k_layout_kt_120k.pth) |
92+
| Temporal MaskGIT | nuScenes | [Config](configs/lidar/lidar_maskgit_temporal_vq80k_layout_ns.json) | checkpoint (TODO) |
93+
| | KITTI-360 | [Config](configs/lidar/lidar_maskgit_temporal_vq80k_layout_kt.json) | checkpoint (TODO) |
8594
## Examples
8695

8796
### T2I, T2V generation with CTSD pipeline
@@ -106,13 +115,20 @@ PYTHONPATH=src python src/dwm/preview.py -c examples/ctsd_35_6views_video_genera
106115

107116
1. Download LiDAR VQVAE and LiDAR MaskGIT generation model checkpoint.
108117
2. Prepare the dataset ( [nuscenes_scene-0627_lidar_package.zip](http://103.237.29.236:10030/nuscenes_scene-0627_lidar_package.zip) ).
109-
3. Modify the values of `json_file`, `vq_point_cloud_ckpt_path`, `vq_blank_code_path` and `model_ckpt_path` to the paths of your dataset and checkpoints in the json file `examples/lidar_maskgit_preview.json` .
110-
4. Run the following command to visualize the LiDAR of the validation set and save the generated point cloud as `.bin` file.
118+
3. Modify the values of `json_file`, `vq_point_cloud_ckpt_path`, `vq_blank_code_path` and `model_ckpt_path` to the paths of your dataset and checkpoints in the json file `examples/lidar_maskgit_preview.json` or `examples/lidar_maskgit_temporal_preview.json` .
119+
4. For single-frame LiDAR generation, run the following command to visualize the LiDAR of the validation set and save the generated point cloud as a `.bin` file.
111120

112121
```bash
113-
PYTHONPATH=src python src/dwm/preview.py -c examples/lidar_maskgit_preview.json -o output/test
122+
PYTHONPATH=src python src/dwm/preview.py -c examples/lidar_maskgit_preview.json -o output/single_frame_maskgit
114123
```
115124

125+
5. For LiDAR sequence generation, the `enable_autoregressive_inference` flag is enabled in the config file to support autoregressive generation. To use ground truth data as reference frames, set `use_ground_truth_as_reference` to `true`; set it to `false` to generate from the layout condition only. After setting up the config file, run the following command:
126+
127+
```bash
128+
PYTHONPATH=src python3 -m torch.distributed.run --nnodes 1 --nproc-per-node 2 --node-rank 0 --master-addr 127.0.0.1 --master-port 29000 src/dwm/preview.py -c examples/lidar_maskgit_temporal_preview.json -o output/temporal_maskgit
129+
```
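
Steps 3 and 5 come down to editing a handful of keys in a JSON file. A minimal Python sketch of doing that programmatically follows. It assumes only that the keys named above appear somewhere in the config; the nested layout in `example`, the placeholder paths, and the helper name `override_keys` are hypothetical and not part of the repository.

```python
import json


def override_keys(node, overrides):
    """Return a copy of a parsed JSON value with matching keys replaced at any depth."""
    if isinstance(node, dict):
        return {
            key: overrides[key] if key in overrides else override_keys(value, overrides)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [override_keys(item, overrides) for item in node]
    return node


# Hypothetical layout, for illustration only; the real preview config may nest
# these keys differently.
example = {
    "pipeline": {
        "enable_autoregressive_inference": True,
        "use_ground_truth_as_reference": True,
    },
    "validation_dataset": {"json_file": "path/to/placeholder.json"},
}

patched = override_keys(
    example,
    {
        "use_ground_truth_as_reference": False,
        "json_file": "path/to/your/dataset.json",
    },
)
print(json.dumps(patched, indent=2))
```

In practice one would load the preview config with `json.load`, apply the overrides, write the result to a new file, and pass that file to `-c`.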
130+
131+
116132
## Train
117133

118134
Preparation:
@@ -165,3 +181,27 @@ Or distributed evaluation by `torch.distributed.run`, similar to the distributed
165181
* `tools` provides dataset and file processing scripts for faster initialization and reading.
166182

167183
Introduction about the [file system](src/dwm/fs/README.md), and [dataset](src/dwm/datasets/README.md).
184+
185+
## Citation
186+
If you find our OpenDWM useful in your research or refer to the provided baseline results, please star :star: this repository and consider citing our repo or papers :pencil::
187+
```
188+
@misc{opendwm,
189+
Year = {2025},
190+
Note = {https://github.com/SenseTime-FVG/OpenDWM},
191+
Title = {OpenDWM: Open Driving World Models}
192+
}
193+
194+
@article{chen2024unimlvg,
195+
title={UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving},
196+
author={Chen, Rui and Wu, Zehuan and Liu, Yichen and Guo, Yuxin and Ni, Jingcheng and Xia, Haifeng and Xia, Siyu},
197+
journal={arXiv preprint arXiv:2412.04842},
198+
year={2024}
199+
}
200+
201+
@article{ni2025maskgwm,
202+
title={MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction},
203+
author={Ni, Jingcheng and Guo, Yuxin and Liu, Yichen and Chen, Rui and Lu, Lewei and Wu, Zehuan},
204+
journal={arXiv preprint arXiv:2502.11663},
205+
year={2025}
206+
}
207+
```

README_intro_zh.md

Lines changed: 5 additions & 3 deletions
@@ -12,11 +12,13 @@ https://github.com/user-attachments/assets/649d3b81-3b1f-44f9-9f51-4d1ed7756476
1212

1313
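The highlights are as follows:
1414

15-
1. **Significant improvement in environmental diversity** By using multiple datasets, the model's generalization ability is improved to an unprecedented degree. Taking layout-conditioned generation as an example, scenes such as a snowy city street or a lakeside highway with distant snow mountains are impossible tasks for generative models trained on only a single dataset
15+
1. **Transparent and reproducible training** We provide complete training code and configurations, so that everyone can reproduce experiments, fine-tune on their own data, and customize features as needed
1616

17-
2. **Greatly improved generation quality** Support for popular model architectures (SD 2.1, 3.5) makes it more convenient to use the advanced pre-trained generation capabilities within the community. Various training techniques, including multitasking and self-supervision, allow the model to use the information in video data more effectively
17+
2. **Significant improvement in environmental diversity** By using multiple datasets, the model's generalization ability is improved to an unprecedented degree. Taking layout-conditioned generation as an example, scenes such as a snowy city street or a lakeside highway with distant snow mountains are impossible tasks for generative models trained on only a single dataset
1818

19-
3. **Convenient evaluation.** Evaluation follows the popular framework `torchmetrics`, which is easy to configure, develop, and integrate into existing pipelines. Some public configurations (such as FID and FVD on the nuScenes validation set) are provided for alignment with other research works.
19+
3. **Greatly improved generation quality.** Support for popular model architectures (SD 2.1, 3.5) makes it more convenient to use the advanced pre-trained generation capabilities within the community. Various training techniques, including multitasking and self-supervision, allow the model to use the information in video data more effectively.
20+
21+
4. **Convenient evaluation.** Evaluation follows the popular framework `torchmetrics`, which is easy to configure, develop, and integrate into existing pipelines. Some public configurations (such as FID and FVD on the nuScenes validation set) are provided for alignment with other research works.
2022

2123
Furthermore, our code modules are designed with a considerable degree of reusability in mind, for easy application in other projects.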
2224

configs/README.md

Lines changed: 73 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,80 @@
22

33
The configuration files are in the JSON format. They include settings for the models, datasets, pipelines, or any arguments for the program.
44

5+
## Introduction
6+
7+
In our code, we mainly use JSON objects in three ways:
8+
9+
1. As a dictionary
10+
2. As a function's parameter list
11+
3. As a constructor and parameter for objects
12+
13+
### As a dictionary
14+
15+
This is the most common use of the config, for example:
16+
17+
```JSON
18+
{
19+
"guidance_scale": 4,
20+
"inference_steps": 40,
21+
"preview_image_size": [
22+
448,
23+
252
24+
]
25+
}
26+
```
27+
28+
The pipeline looks up each value in the dictionary by its key, and the value determines the behavior at runtime.
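
As a rough illustration of this lookup pattern (not the pipeline's actual code), the sketch below parses the example config and reads values by key. The variable names and the optional `seed` key are assumptions made for the example.

```python
import json

# The same keys as in the example config above, inlined so the sketch runs as is.
config_text = """
{
    "guidance_scale": 4,
    "inference_steps": 40,
    "preview_image_size": [448, 252]
}
"""

config = json.loads(config_text)

# Required settings are read directly; a missing key raises KeyError early.
guidance_scale = config["guidance_scale"]
inference_steps = config["inference_steps"]
preview_image_size = tuple(config["preview_image_size"])

# Optional settings can fall back to a default ("seed" is a hypothetical key).
seed = config.get("seed", 0)

print(guidance_scale, inference_steps, preview_image_size, seed)
```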
29+
30+
### As a function's parameter list
31+
32+
The content of a JSON object is passed into a function, for example:
33+
34+
```JSON
35+
{
36+
"num_workers": 3,
37+
"prefetch_factor": 3,
38+
"persistent_workers": true
39+
}
40+
```
41+
42+
The PyTorch data loader accepts all of these arguments through keyword unpacking:
43+
44+
```Python
45+
data_loader = torch.utils.data.DataLoader(
46+
dataset, **deserialized_json_object)
47+
```
48+
49+
In this case, you can fill in the required parameters according to the reference documentation of the function (such as the [data loader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) here).
50+
51+
### As a constructor and parameter for objects
52+
53+
The JSON object declares the name of the object to be created, as well as the parameters, for example:
54+
55+
```JSON
56+
{
57+
"_class_name": "torch.optim.AdamW",
58+
"lr": 6e-5,
59+
"betas": [
60+
0.9,
61+
0.975
62+
]
63+
}
64+
```
65+
66+
The "_class_name" is in the format of `{name_space}.{class_or_function_name}`, and other key-value pairs are used as parameters for the class constructor (e.g. [AdamW](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW) here) or the function.
67+
68+
In the code, this type of object is parsed by the `dwm.common.create_instance_from_config()` function.
69+
70+
With this design, the configuration, framework, and components are **loosely coupled**. For example, a user can easily switch to a third-party optimizer such as "bitsandbytes.optim.Adam8bit" without editing the code. Developers can provide any component class (e.g. dataset, data transforms) without having to register it with a specific framework.
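
To make the mechanism concrete, here is a minimal sketch of how a `_class_name` entry could be resolved with `importlib`. It is an illustration under assumptions, not the implementation or signature of `dwm.common.create_instance_from_config()`; the helper name `create_instance` and its `extra_kwargs` parameter are hypothetical. A standard-library class is used so the example runs without PyTorch.

```python
import importlib


def create_instance(config: dict, **extra_kwargs):
    """Sketch: build an object from a dict that carries a "_class_name" key."""
    # Everything except "_class_name" becomes a keyword argument.
    kwargs = {k: v for k, v in config.items() if k != "_class_name"}
    kwargs.update(extra_kwargs)

    # "{name_space}.{class_or_function_name}" -> module path and attribute name.
    module_name, _, attr_name = config["_class_name"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**kwargs)


delta = create_instance(
    {"_class_name": "datetime.timedelta", "hours": 1, "minutes": 30}
)
print(delta.total_seconds())
```

Arguments that cannot live in JSON, such as the model parameters an optimizer needs, would be supplied by the caller, for example `create_instance(optimizer_config, params=model.parameters())` in this sketch.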
71+
72+
## Development
73+
74+
### Name convention
75+
576
The configs in this folder mainly describe the pipelines and are consumed by `src/dwm/train.py`, so they are named in the format `{pipeline_name}_{model_config}_{condition_config}_{data_config}.json`.
677

778
* Pipeline name: the python script name in the `src/dwm/pipelines`.
8-
* Model config: the most discriminative model arguments, such as `image`, `lidar`, `joint` for the holodrive models, or `spatial`, `crossview`, `temporal` for the SD models.
79+
* Model config: the most discriminative model arguments, such as `spatial`, `crossview`, `temporal` for the SD models.
980
* Condition config: the additional input for the model, such as `ts` for the "text description per scene", `ti` for the "text description per image", `b` for the box condition, `m` for the map condition.
10-
* Data config: `mini` for the debug purpose. Or combination of `nuscenes`, `argoverse`, `waymo`, `opendv`, for the data components. For some dataset, use `k` for "key frames", `a` for "all frames".
81+
* Data config: `mini` for debugging, or a combination of `nuscenes`, `argoverse`, `waymo`, `opendv` (or their initial letters) for the data components.
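
The convention above can be read back from a file name. The sketch below is only an illustration: the helper `split_config_name` and the list of pipeline names passed to it are hypothetical, and it assumes the last underscore-separated token is the data config. The boundary between model and condition tokens cannot be determined from the name alone, so they are returned together.

```python
from pathlib import Path


def split_config_name(path: str, pipeline_names: list[str]) -> dict:
    """Split a config file name into pipeline, model/condition tokens, and data."""
    stem = Path(path).stem
    # Try longer pipeline names first so "lidar_maskgit" wins over "lidar".
    for name in sorted(pipeline_names, key=len, reverse=True):
        if stem == name or stem.startswith(name + "_"):
            rest = stem[len(name) + 1:]
            break
    else:
        raise ValueError(f"no known pipeline name prefixes {stem!r}")

    tokens = rest.split("_") if rest else []
    return {
        "pipeline": name,
        "model_and_condition": tokens[:-1],
        "data": tokens[-1] if tokens else None,
    }


# The pipeline names here are assumed for the example.
result = split_config_name(
    "configs/ctsd/multi_datasets/ctsd_35_tirda_bm_nwao.json",
    ["ctsd", "lidar_maskgit", "lidar_vqvae"],
)
print(result)
```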
