Commit 0146573

committed
update
1 parent d5037d3 commit 0146573

4 files changed

Lines changed: 43 additions & 10 deletions

index.html

Lines changed: 43 additions & 10 deletions
@@ -12,7 +12,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1">

 <title>Code2World: A GUI World Model via Renderable Code Generation</title>
-<link rel="icon" type="image/x-icon" href="static/images/favicon.ico">
+<link rel="icon" type="image/x-icon" href="static/images/logo.ico">
 <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

 <link rel="stylesheet" href="static/css/bulma.min.css">
@@ -114,8 +114,8 @@ <h1 class="title is-spaced is-2" style="margin-bottom: 0.5rem;">A GUI World Mode

 <span class="link-block">
 <a href="https://huggingface.co/GD-ML/Code2World" target="_blank" class="external-link button is-normal is-rounded is-dark">
-<span class="icon">
-<img src="static/images/modelscope.svg" alt="" width="18" height="18" />
+<span class="icon">
+🤗
 </span>
 <span>Model</span>
 </a>
@@ -139,17 +139,45 @@ <h2 class="title is-3">Abstract</h2>
 </p>
 </div>
 <div class="container is-max-desktop">
-<img src="static/images/framework.png" alt="Code2World Framework" style="max-width: 100%; height: auto;"/>
+<img src="static/images/teaser.png" alt="Code2World Framework" style="max-width: 100%; height: auto;"/>
 </div>
 <div class="content has-text-centered">
-<p><em>Figure 1. Illustration of the "Propose, Simulate, Select" pipeline for Code2World enhanced GUI agent, exemplified by an AndroidWorld task. <strong>(1) Propose</strong>: The GUI agent generates K candidate actions, with <strong>red</strong> and <strong>green</strong> highlighting hallucinated/irrational reasoning and logically sound reasoning, respectively. <strong>(2) Simulate</strong>: Code2World predicts the execution result of each candidate via renderable code generation. <strong>(3) Select</strong>: By evaluating the rendered future states, the system identifies the potential failure in the original policy and rectifies the decision, ultimately selecting the optimal action that aligns with the user's intent.</em></p>
+<p><em>Figure 1. <strong>Illustration of Code2World.</strong> Given a current GUI observation and an action, Code2World predicts the next screenshot via renderable code generation.</em></p>
 </div>
 </div>
 </div>
 </div>
 </section>


+<section class="section hero is-small tight-section">
+<div class="container is-max-desktop">
+<div class="columns is-centered has-text-centered">
+<div class="column is-four-fifths">
+<div class="box my-2">
+<h2 class="title is-3">Methodology</h2>
+
+<div class="container is-max-desktop">
+<img src="static/images/pipeline.png" alt="Code2World pipeline" style="max-width: 100%; height: auto;"/>
+</div>
+<div class="content has-text-centered">
+<p><em>Figure 2. <strong>Left: Illustration of Data Synthesis.</strong> The high-fidelity <em>AndroidCode</em> dataset is curated via <em>constrained initial synthesis</em> and a <em>visual-feedback revision loop</em>, where synthesized HTML is iteratively refined based on rendered visual discrepancies to ensure strict alignment (SigLIP score &gt; 0.9). <strong>Right: Two-stage Model Optimization.</strong> The pipeline progresses from an SFT cold start to <em>Render-Aware Reinforcement Learning (RARL)</em>. Utilizing Group Relative Policy Optimization (GRPO), the model optimizes dual rewards—visual semantic (R<sub>sem</sub>) and action consistency (R<sub>act</sub>)—derived directly from <em>rendered outcomes</em> to enforce structural and logical fidelity.</em></p>
+</div>
+
+<div class="container is-max-desktop">
+<img src="static/images/framework.png" alt="Code2World Framework" style="max-width: 100%; height: auto;"/>
+</div>
+<div class="content has-text-centered">
+<p><em>Figure 3. Illustration of the "Propose, Simulate, Select" pipeline for Code2World enhanced GUI agent, exemplified by an AndroidWorld task. <strong>(1) Propose</strong>: The GUI agent generates K candidate actions, with <strong>red</strong> and <strong>green</strong> highlighting hallucinated/irrational reasoning and logically sound reasoning, respectively. <strong>(2) Simulate</strong>: Code2World predicts the execution result of each candidate via renderable code generation. <strong>(3) Select</strong>: By evaluating the rendered future states, the system identifies the potential failure in the original policy and rectifies the decision, ultimately selecting the optimal action that aligns with the user's intent.</em></p>
+</div>
+
+
+</div>
+</div>
+</div>
+</div>
+</section>
+
 <section class="section hero is-small tight-section">
 <div class="container is-max-desktop">
 <div class="columns is-centered has-text-centered">
@@ -179,16 +207,16 @@ <h2 class="title is-3">Examples</h2>
 </div>
 <div class="container is-max-desktop">
 <img src="static/images/case1.png" alt="Case 1" style="max-width: 100%; height: auto; margin-bottom: 1rem;"/>
-<p><em>Figure 2. Launch the email application from the home screen to access the inbox.</em></p>
+<p><em>Figure 4. Launch the email application from the home screen to access the inbox.</em></p>
 <br>
 <img src="static/images/case2.png" alt="Case 2" style="max-width: 100%; height: auto; margin-bottom: 1rem;"/>
-<p><em>Figure 3. Click on "All News" button in the Cerebra Research application to view news content.</em></p>
+<p><em>Figure 5. Click on "All News" button in the Cerebra Research application to view news content.</em></p>
 <br>
 <img src="static/images/case3.png" alt="Case 3" style="max-width: 100%; height: auto; margin-bottom: 1rem;"/>
-<p><em>Figure 4. Mark a reminder task as completed by tapping the "Complete" button in the Reminder app.</em></p>
+<p><em>Figure 6. Mark a reminder task as completed by tapping the "Complete" button in the Reminder app.</em></p>
 <br>
 <img src="static/images/case4.png" alt="Case 4" style="max-width: 100%; height: auto; margin-bottom: 1rem;"/>
-<p><em>Figure 5. Apply product filters by tapping the "Apply Filter" button in the e-commerce app to refresh the item list.</em></p>
+<p><em>Figure 7. Apply product filters by tapping the "Apply Filter" button in the e-commerce app to refresh the item list.</em></p>
 </div>
 </div>
 </div>
@@ -203,7 +231,12 @@ <h2 class="title">Citation</h2>
 <p>If you find our project useful, we hope you can star our repo and cite our paper as follows:</p>
 <pre class="bibtex-block">
 <code>
-11111
+@article{zheng2026code2world,
+title={Code2World: A GUI World Model via Renderable Code Generation},
+author={Zheng, Yuhao and Zhong, Li'an and Wang, Yi and Dai, Rui and Liu, Kaikui and Chu, Xiangxiang and Lv, Linyuan and Torr, Philip and Lin, Kevin Qinghong},
+journal={arXiv preprint arXiv:2602.09856},
+year={2026}
+}
 </code></pre>
 </div>
 </div>
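
Note on the Figure 3 caption in this diff: it describes a "Propose, Simulate, Select" loop in which the agent proposes K candidate actions, the world model predicts the result of each, and the best predicted outcome determines the action taken. The control flow could be sketched in Python roughly as follows. This is only an illustrative sketch, not the project's implementation: `Candidate`, `propose_simulate_select`, `simulate`, and `score` are hypothetical names, and the toy callables stand in for the real world model and evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """One proposed action plus the agent's stated reasoning (hypothetical type)."""
    action: str
    reasoning: str


def propose_simulate_select(
    observation: str,
    candidates: Sequence[Candidate],
    simulate: Callable[[str, Candidate], str],
    score: Callable[[str], float],
) -> Candidate:
    """Return the candidate whose simulated next state scores highest.

    simulate: stand-in for the world model, (observation, candidate) -> predicted state.
    score: stand-in for the evaluator that rates a predicted state against the user's intent.
    """
    if not candidates:
        raise ValueError("need at least one candidate action")
    # Simulate: predict the outcome of every proposed candidate before acting.
    predicted = [(c, simulate(observation, c)) for c in candidates]
    # Select: keep the candidate whose predicted state scores best (first one wins ties).
    best, _ = max(predicted, key=lambda pair: score(pair[1]))
    return best


# Toy stand-ins (assumptions for illustration only).
def toy_simulate(obs: str, cand: Candidate) -> str:
    return f"{obs} -> {cand.action}"


def toy_score(state: str) -> float:
    return 1.0 if "open_inbox" in state else 0.0


cands = [
    Candidate("tap_settings", "settings might contain mail"),
    Candidate("open_inbox", "the mail icon opens the inbox"),
]
chosen = propose_simulate_select("home_screen", cands, toy_simulate, toy_score)
print(chosen.action)
```

Passing the world model and the evaluator as callables keeps the selection logic independent of how states are rendered or judged, so either part can be swapped without touching the loop.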

static/images/logo.ico

444 KB
Binary file not shown.

static/images/pipeline.png

406 KB

static/images/teaser.png

744 KB
