1. The CTSD pipeline supports action conditions and PyTorch distributed checkpoints (warning: optimizer checkpoints from the previous version are incompatible).
2. Update the LiDAR VQVAE and MaskGIT pipelines for temporal and autoregressive generation.
3. Release the DFoT on CTSD 3.5 config and checkpoint.
4. Release LiDAR VQVAE and MaskGIT configs and checkpoints that include KITTI-360.
5. Add scripts to make the blank code (LiDAR VQVAE), make Carla camera parameters (interactive generation), and control Carla from steering (interactive generation).
6. Fix the script that exports data in the nuScenes format.
7. Other minor fixes to the models, datasets, and metrics.
README.md: 51 additions, 11 deletions
@@ -10,11 +10,13 @@ The driving world models generate multi-view images or videos of autonomous driv
The highlights are as follows:

-1. **Significant improvement in environmental diversity.** By using multiple datasets, we have greatly enhanced the model's generalization ability. For example, in generation tasks controlled by layout conditions, scenes such as a snowy city street or a lakeside highway with distant snow mountains are out of reach for generative models trained on a single dataset.
+1. **Transparent and reproducible training.** We provide complete training code and configurations, so anyone can reproduce the experiments, fine-tune on their own data, and customize features as needed.

-2. **Greatly improved generation quality.** Support for popular model architectures (SD 2.1 and 3.5) makes it easier to leverage the advanced pre-trained generation capabilities available in the community. Various training techniques, including multi-task learning and self-supervision, allow the model to use the information in autonomous driving video data more effectively.
+2. **Significant improvement in environmental diversity.** By using multiple datasets, we have greatly enhanced the model's generalization ability. For example, in generation tasks controlled by layout conditions, scenes such as a snowy city street or a lakeside highway with distant snow mountains are out of reach for generative models trained on a single dataset.

-3. **Convenient evaluation.** Evaluation is built on the popular `torchmetrics` framework, which is easy to configure, extend, and integrate into the pipeline. Public configurations (such as FID and FVD on the nuScenes validation set) are provided for alignment with other research works.
+3. **Greatly improved generation quality.** Support for popular model architectures (SD 2.1 and 3.5) makes it easier to leverage the advanced pre-trained generation capabilities available in the community. Various training techniques, including multi-task learning and self-supervision, allow the model to use the information in autonomous driving video data more effectively.
+
+4. **Convenient evaluation.** Evaluation is built on the popular `torchmetrics` framework, which is easy to configure, extend, and integrate into the pipeline. Public configurations (such as FID and FVD on the nuScenes validation set) are provided for alignment with other research works.

Furthermore, our code modules are designed for reusability, so they can easily be applied in other projects.
@@ -30,6 +32,7 @@ Currently, the project has implemented the following papers:

## News

+* [2025/4/23] Update the [LiDAR VQVAE (including KITTI-360) and LiDAR generation models](#lidar-models), and release the [DFoT on CTSD 3.5 model](#video-models).
* [2025/3/17] Experimental release of the [Interactive Generation with Carla](docs/InteractiveGeneration.md)
* [2025/3/7] Release the [LiDAR Generation](#lidar-models)
* [2025/3/4] Release the [CTSD 3.5 with layout condition](#video-models)
@@ -72,16 +75,22 @@ Our cross-view temporal SD (CTSD) pipeline support loading the pretrained SD 2.1
|[DFoT](https://arxiv.org/abs/2502.06764) on [SD 3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)||[Config](configs/ctsd/multi_datasets/ctsd_35_df16_tirda_bm_nwao.json), [Download](http://103.237.29.236:10030/ctsd_35_df16_tirda_bm_nwao_40k.pth)|
+
+The FVD evaluation results for all downloadable models can be found at the bottom of the corresponding configuration files.

### LiDAR Models

You can download our pre-trained tokenizer and generation models from the following links.

-| Model Architecture | Configs | Checkpoint Download |
1. Download the LiDAR VQVAE and LiDAR MaskGIT generation model checkpoints.
2. Prepare the dataset ([nuscenes_scene-0627_lidar_package.zip](http://103.237.29.236:10030/nuscenes_scene-0627_lidar_package.zip)).
-3. Modify the values of `json_file`, `vq_point_cloud_ckpt_path`, `vq_blank_code_path` and `model_ckpt_path` to the paths of your dataset and checkpoints in the JSON file `examples/lidar_maskgit_preview.json`.
-4. Run the following command to visualize the LiDAR data of the validation set and save the generated point cloud as a `.bin` file.
+3. Modify the values of `json_file`, `vq_point_cloud_ckpt_path`, `vq_blank_code_path` and `model_ckpt_path` to the paths of your dataset and checkpoints in the JSON file `examples/lidar_maskgit_preview.json` or `examples/lidar_maskgit_temporal_preview.json`.
+4. For single-frame LiDAR generation, run the following command to visualize the LiDAR data of the validation set and save the generated point cloud as a `.bin` file.
5. For LiDAR sequence generation, the `enable_autoregressive_inference` flag is enabled in the config file to support autoregressive generation. To use ground-truth data as reference frames, set `use_ground_truth_as_reference` to `true`; set it to `false` to generate from the layout condition only. After setting up the config file, run the following command
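Steps 3 and 5 above come down to editing keys in `examples/lidar_maskgit_preview.json` or `examples/lidar_maskgit_temporal_preview.json`. Below is a minimal standard-library sketch that does this programmatically. It assumes only that the keys named in those steps appear somewhere in the JSON tree, since their exact nesting is not documented here. For that reason it walks the whole tree and overwrites existing keys without adding new ones. The helper names `set_config_values` and `patch_config_file` are hypothetical.

```python
import json
from typing import Any, Dict


def set_config_values(node: Any, updates: Dict[str, Any]) -> int:
    """Recursively overwrite existing keys named in `updates`.

    Only keys that already exist are changed; nothing new is added, so the
    structure of the original config is preserved. Returns the number of
    replacements, so a misspelled key (no replacement) is easy to spot.
    """
    replaced = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                # Reassigning an existing key does not resize the dict,
                # so it is safe while iterating over items().
                node[key] = updates[key]
                replaced += 1
            else:
                replaced += set_config_values(value, updates)
    elif isinstance(node, list):
        for item in node:
            replaced += set_config_values(item, updates)
    return replaced


def patch_config_file(src: str, dst: str, updates: Dict[str, Any]) -> int:
    """Load a JSON config, apply `updates`, and write the result to `dst`."""
    with open(src, "r", encoding="utf-8") as f:
        config = json.load(f)
    replaced = set_config_values(config, updates)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return replaced
```

For example, `patch_config_file("examples/lidar_maskgit_temporal_preview.json", "my_preview.json", {"model_ckpt_path": "/path/to/lidar_maskgit.pth", "use_ground_truth_as_reference": False})` writes a patched copy; the paths here are placeholders. Writing to a new file keeps the shipped example intact. If the returned count is 0, none of the requested keys exist in that config.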
@@ -165,3 +181,27 @@ Or distributed evaluation by `torch.distributed.run`, similar to the distributed
* `tools` provides dataset and file processing scripts for faster initialization and reading.

Introductions to the [file system](src/dwm/fs/README.md) and the [dataset](src/dwm/datasets/README.md).
+
+## Citation
+If you find our OpenDWM useful in your research or refer to the provided baseline results, please star :star: this repository and consider citing our repo or papers :pencil::
configs/README.md: 73 additions, 2 deletions
@@ -2,9 +2,80 @@
The configuration files are in JSON format. They include settings for the models, datasets, pipelines, and any other program arguments.

+## Introduction
+
+In our code, we mainly use JSON objects in three ways:
+
+1. As a dictionary
+2. As a function's parameter list
+3. As a constructor and parameter for objects
+
+### As a dictionary
+
+This is the most common usage, for example:
+
+```JSON
+{
+    "guidance_scale": 4,
+    "inference_steps": 40,
+    "preview_image_size": [
+        448,
+        252
+    ]
+}
+```
+
+The pipeline looks up each value in the dictionary by its key, and that value determines the behavior at runtime.
+
+### As a function's parameter list
+
+The content of a JSON object is passed to a function as keyword arguments, for example:
+
+```JSON
+{
+    "num_workers": 3,
+    "prefetch_factor": 3,
+    "persistent_workers": true
+}
+```
+
+The PyTorch data loader receives all of these arguments through keyword unpacking:
+
+```Python
+data_loader = torch.utils.data.DataLoader(
+    dataset, **deserialized_json_object)
+```
+
+In this case, you can fill in the required parameters according to the reference documentation of the function (such as the [data loader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) here).
+
+### As a constructor and parameter for objects
+
+The JSON object declares the class or function used to create the object, as well as its parameters, for example:
+
+```JSON
+{
+    "_class_name": "torch.optim.AdamW",
+    "lr": 6e-5,
+    "betas": [
+        0.9,
+        0.975
+    ]
+}
+```
+
+The `_class_name` value is in the format `{name_space}.{class_or_function_name}`, and the other key-value pairs are passed as parameters to the class constructor (e.g. [AdamW](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW) here) or to the function.
+
+In the code, this type of object is parsed by the `dwm.common.create_instance_from_config()` function.
+
+With this design, the configuration, framework, and components are **loosely coupled**. For example, users can switch to the third-party optimizer `bitsandbytes.optim.Adam8bit` without editing the code, and developers can provide any component class (e.g. datasets or data transforms) without registering it with a specific framework.
+
+## Development
+
+### Naming convention
+
The configs in this folder mainly describe pipelines and are consumed by `src/dwm/train.py`, so they are named in the format `{pipeline_name}_{model_config}_{condition_config}_{data_config}.json`.

* Pipeline name: the Python script name in `src/dwm/pipelines`.
-* Model config: the most discriminative model arguments, such as `image`, `lidar`, `joint` for the holodrive models, or `spatial`, `crossview`, `temporal` for the SD models.
+* Model config: the most discriminative model arguments, such as `spatial`, `crossview`, `temporal` for the SD models.
* Condition config: the additional input for the model, such as `ts` for "text description per scene", `ti` for "text description per image", `b` for the box condition, and `m` for the map condition.
-* Data config: `mini` for debugging purposes, or a combination of `nuscenes`, `argoverse`, `waymo`, `opendv` for the data components. For some datasets, use `k` for "key frames" and `a` for "all frames".
+* Data config: `mini` for debugging purposes, or a combination of `nuscenes`, `argoverse`, `waymo`, `opendv` (or their initial letters) for the data components.
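The `_class_name` mechanism and the data-config naming rule can be sketched with the Python standard library. This is a hypothetical re-implementation for illustration only. It is named `instantiate_config` so it is not mistaken for `dwm.common.create_instance_from_config()`, whose actual behavior may differ. The sketch makes the following assumptions:

- It resolves `{name_space}.{class_or_function_name}` with `importlib`.
- It converts nested values first.
- It accepts extra runtime keyword arguments, since a constructor such as AdamW also needs objects like model parameters that cannot be written in JSON.
- `decode_data_config` and its letter mapping assume the data config is `mini`, a full dataset name, or a string of initials such as the trailing `nwao` in `ctsd_35_df16_tirda_bm_nwao.json`.

```python
import importlib
from typing import Any, Dict, List


def instantiate_config(config: Any, **runtime_kwargs: Any) -> Any:
    """Hypothetical sketch of `_class_name` handling, not dwm's actual code.

    * A dict with "_class_name" becomes factory(**runtime_kwargs, **other_keys),
      where the factory is looked up from "{name_space}.{class_or_function_name}".
    * A dict without "_class_name" stays a plain dictionary.
    * Nested dicts and lists are converted first (an assumption of this sketch).
    * runtime_kwargs are only used when the top-level object has "_class_name".
    """
    if isinstance(config, list):
        return [instantiate_config(item) for item in config]
    if not isinstance(config, dict):
        return config  # numbers, strings, booleans and None pass through

    converted: Dict[str, Any] = {
        key: instantiate_config(value)
        for key, value in config.items()
        if key != "_class_name"
    }
    if "_class_name" not in config:
        return converted  # the "as a dictionary" usage

    # "torch.optim.AdamW" -> module "torch.optim", attribute "AdamW"
    module_name, _, attr_name = config["_class_name"].rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr_name)
    return factory(**runtime_kwargs, **converted)


# Assumed mapping for the "(or their initial letters)" data-config convention.
DATASET_INITIALS = {"n": "nuscenes", "a": "argoverse", "w": "waymo", "o": "opendv"}


def decode_data_config(token: str) -> List[str]:
    """Expand a data-config token such as "nwao" into dataset names."""
    if token == "mini" or token in DATASET_INITIALS.values():
        return [token]
    return [DATASET_INITIALS[letter] for letter in token]
```

For example, `instantiate_config({"_class_name": "datetime.timedelta", "days": 1})` returns `datetime.timedelta(days=1)`. With PyTorch installed, the AdamW object above would go through the same path, with the parameters supplied at runtime, e.g. `instantiate_config(optimizer_config, params=model.parameters())`.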