Merge branch 'wuzehuan/MinorFixOnDocAndConfig' into 'main'

wuzehuan · wuzehuan · commit cd4e3c71baa7 · 2025-01-15T09:45:04.000+08:00
Update README.

See merge request vc-research/driving-world-models!40
diff --git a/README.md b/README.md
@@ -2,6 +2,10 @@
 
 [[中文简介](README_intro_zh.md)]
 
+https://github.com/user-attachments/assets/e73d22b1-e856-4bee-8bc3-a1e9755e8c25
+
+[Video link](https://youtu.be/j9RRj-xzOA4)
+
 Welcome to the OpenDWM project! This is an open-source initiative, focusing on autonomous driving video generation. Our mission is to provide a high-quality, controllable tool for generating autonomous driving videos using the latest technology. We aim to build a codebase that is both user-friendly and highly reusable, and hope to continuously improve the project through the collective wisdom of the community.
 
 The driving world models generate multi-view images or videos of autonomous driving scenes based on text and road environment layout conditions. Whether it's the environment, weather conditions, vehicle type, or driving path, you can adjust them according to your needs.
@@ -16,10 +20,6 @@ The highlights are as follows:
 
 Furthermore, our code modules are designed with high reusability in mind, for easy application in other projects.
 
-https://github.com/user-attachments/assets/e73d22b1-e856-4bee-8bc3-a1e9755e8c25
-
-[Video link](https://youtu.be/j9RRj-xzOA4)
-
 Currently, the project has implemented the following papers:
 
 > [UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving](https://sensetime-fvg.github.io/UniMLVG)<br>
@@ -62,6 +62,7 @@ Our cross-view temporal SD (CTSD) pipeline support loading the pretrained SD 2.1
 | Base model | Text conditioned <br/> driving generation | Text and layout (box, map) <br/> conditioned driving generation |
 | :-: | :-: | :-: |
 | [SD 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) | [Config](configs/ctsd/multi_datasets/ctsd_21_tirda_nwao.json), [Download](http://103.237.29.236:10030/ctsd_21_tirda_nwao_30k.pth) | [Config](configs/ctsd/multi_datasets/ctsd_21_tirda_bm_nwa.json), [Download](http://103.237.29.236:10030/ctsd_21_tirda_bm_nwa_30k.pth) |
+| [SD 3.0](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) | | [UniMLVG Config](configs/ctsd/unimlvg/unimlvg_stage3_tirda_nwa.json), to be released by 2025-02-01 |
 | [SD 3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | [Config](configs/ctsd/multi_datasets/ctsd_35_tirda_nwao.json), [Download](http://103.237.29.236:10030/ctsd_35_tirda_nwao_20k.pth) | [Config](configs/ctsd/multi_datasets/ctsd_35_tirda_bm_nwa.json), to be released by 2025-02-01 |
 
 ## Examples
diff --git a/README_intro_zh.md b/README_intro_zh.md
@@ -2,6 +2,10 @@
 
 [[English README](README.md)]
 
+https://github.com/user-attachments/assets/e73d22b1-e856-4bee-8bc3-a1e9755e8c25
+
+[Video link](https://youtu.be/j9RRj-xzOA4)
+
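 Welcome to the OpenDWM project! This is an open-source project focused on autonomous driving video generation. Our mission is to provide a high-quality, controllable autonomous driving video generation tool built on the latest technology. We aim to build a codebase that is both user-friendly and highly reusable, and we hope to keep improving it by drawing on the collective wisdom of the community.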
 
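 The driving world models generate multi-view images or videos of autonomous driving scenes based on text and road environment layout conditions. Whether it is the environment, weather conditions, vehicle type, or driving path, you can adjust them to suit your needs.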
@@ -16,10 +20,6 @@
 
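 In addition, our code modules are designed with a considerable degree of reusability in mind, so that they can be easily applied in other projects.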
 
-https://github.com/user-attachments/assets/e73d22b1-e856-4bee-8bc3-a1e9755e8c25
-
-[Video link](https://youtu.be/j9RRj-xzOA4)
-
 To date, this project has implemented the techniques from the following papers:
 
 > [UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving](https://sensetime-fvg.github.io/UniMLVG)<br>