Commit bbae8d2

update
1 parent 14452e6 commit bbae8d2

3 files changed

Lines changed: 21 additions & 17 deletions

figure/pipeline.png

406 KB
figure/teaser.png

744 KB
readme.md

Lines changed: 21 additions & 17 deletions
@@ -1,12 +1,19 @@
-# <img src="figure/logo.png" alt="icon" width="50" height="50"> Code2World: A GUI World Model via Renderable Code Generation
-<p align="center">
-<a href="https://amap-ml.github.io/Code2World" target="_blank"><img src="https://img.shields.io/badge/Project-Page-brightgreen"></a>
-<a href="https://arxiv.org/abs/2602.09856" target="_blank"><img src="https://img.shields.io/badge/arXiv-2511.02778-red"></a>
-<a href="https://huggingface.co/GD-ML/Code2World" target="_blank"><img src="https://img.shields.io/badge/🤗%20Model-Code2World-ffd21e"></a>
-<a href="https://huggingface.co/datasets/GD-ML/AndroidCode" target="_blank"><img src="https://img.shields.io/badge/🤗%20Dataset-AndroidCode-ffd21e"></a>
-<!-- <a href="https://huggingface.co/papers/2602.09856" target="_blank"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-2602.09856-ffd21e"></a>/ -->
-</p>
-Official implementation for Code2World, a novel VLM-based GUI World Model that predicts dynamic transitions via renderable code generation.
+# <img src="figure/logo.png" alt="icon" width="24" height="24"> Code2World: A GUI World Model via Renderable Code Generation
+<div style='display:flex; gap: 0.25rem; '>
+<a href='LICENCE'><img src='https://img.shields.io/badge/License-Apache 2.0-g.svg'></a>
+<a href='https://arxiv.org/abs/2602.09856'><img src='https://img.shields.io/badge/Paper-PDF-red'></a>
+</div>
+This is the official repo for Code2World, a novel VLM-based GUI World Model that predicts dynamic transitions via renderable code generation.
+
+
+
+
+## 🎯 Overview
+Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, a GUI world model gives agents human-like foresight by enabling action-conditioned prediction. However, existing text- and pixel-based approaches struggle to achieve high visual fidelity and fine-grained structural controllability at the same time. To this end, we propose **Code2World**, a vision-language coder that simulates the next visual state via **renderable code generation**. Specifically, to address data scarcity, we construct **AndroidCode** by translating GUI trajectories into high-fidelity HTML and refining the synthesized code through a visual-feedback revision loop, resulting in **over 80K** high-quality screen-action pairs. To adapt existing VLMs to code prediction, we first perform SFT as a cold start for format layout following, then apply **Render-Aware Reinforcement Learning**, which uses the final rendered outcome as the reward signal, enforcing visual semantic fidelity and action consistency. Extensive experiments demonstrate that Code2World-8B achieves top-performing next-UI prediction, rivaling the competitive GPT-5 and Gemini-3-Pro-Image. Notably, _Code2World significantly enhances downstream navigation success rates in a flexible manner_, boosting Gemini-2.5-Flash by +9.5% on AndroidWorld navigation.
+<!-- ![pipeline](figure/pipeline.jpg) -->
+![teaser](figure/teaser.png)
+_Figure 1. Illustration of Code2World. Given a current GUI observation and an action, Code2World predicts the next screenshot via renderable code generation._
+
 
 ## 🕹️ Usage
 ### Environment Setup
@@ -53,14 +60,11 @@ _Figure 5. Apply product filters by tapping the "Apply Filter" button in the e-c
 ## 📑 Citation
 If you find our project useful, we hope you can star our repo and cite our paper as follows:
 ```
-@misc{code2world,
-title={Code2World: A GUI World Model via Renderable Code Generation},
-author={Yuhao Zheng and Li'an Zhong and Yi Wang and Rui Dai and Kaikui Liu and Xiangxiang Chu and Linyuan Lv and Philip Torr and Kevin Qinghong Lin},
-year={2026},
-eprint={2602.09856},
-archivePrefix={arXiv},
-primaryClass={cs.CV},
-url={https://arxiv.org/abs/2602.09856},
+@article{zheng2026code2world,
+title={Code2World: A GUI World Model via Renderable Code Generation},
+author={Zheng, Yuhao and Zhong, Li'an and Wang, Yi and Dai, Rui and Liu, Kaikui and Chu, Xiangxiang and Lv, Linyuan and Torr, Philip and Lin, Kevin Qinghong},
+journal={arXiv preprint arXiv:2602.09856},
+year={2026}
 }
 ```
 