pages/Incanation/index.html
Lines changed: 10 additions & 13 deletions (+10 -13)
Original file line number   Diff line number   Diff line change
@@ -8,14 +8,11 @@
   8    8   <meta property="og:title" content="Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models" />
   9    9   <meta property="og:description" content="Natural language beats closed action IDs as the action interface: 89% vs. 43% cross-entity transfer, 90% vs. 0% out-of-vocabulary prompt control, and 3-entity control from two-entity training." />
          - <img src="static/images/figs/teaser.png" alt="Incantation cross-entity transfer and multi-entity control teaser" />
       90 + <img src="static/images/figs/teaser.jpg" alt="Incantation cross-entity transfer and multi-entity control teaser" width="1659" height="713" decoding="async" fetchpriority="high" />
  94   91   </div>
  95   92   <p class="caption">
  96   93   <strong>Cross-entity action transfer and multi-entity control in Elden Ring.</strong>
@@ -199,7 +196,7 @@ <h3>Sliding cache without positional drift</h3>
 199  196   </article>
 200  197   </div>
 201  198   <div class="figure-shell dark-figure">
 202      - <img src="static/images/figs/workflow.png" alt="Incantation workflow with language-conditioned pretraining and Self-Forcing distillation" />
      199 + <img src="static/images/figs/workflow.jpg" alt="Incantation workflow with language-conditioned pretraining and Self-Forcing distillation" width="1314" height="806" loading="lazy" decoding="async" />
 203  200   <p class="caption">Training and streaming workflow: language-conditioned pretraining followed by ordinary-differential-equation-initialized Self-Forcing distillation.</p>
 204  201   </div>
 205  202   </div>
@@ -259,7 +256,7 @@ <h2>Attention that respects time</h2>
 259  256   <p class="section-sub">Each action prompt describes the current frame, so Incantation prevents that prompt from contaminating committed history frames.</p>
 260  257   <div class="asset-grid">
 261  258   <div class="figure-shell">
 262      - <img src="static/images/figs/masked-attention.png" alt="Decoupled text cross-attention restricted to the noisy target frame" />
      259 + <img src="static/images/figs/masked-attention.png" alt="Decoupled text cross-attention restricted to the noisy target frame" width="1175" height="419" loading="lazy" decoding="async" />
 263  260   <p class="caption">Text cross-attention is applied only to the noisy target frame; history frames keep bidirectional self-attention.</p>
 264  261   </div>
 265  262   <div class="text-panel">
@@ -394,7 +391,7 @@ <h3>Evidence: 3-entity control from 2-entity training</h3>
          - <img src="static/images/figs/baseline-comparison.png" alt="Qualitative comparison against Seedance, Kling, LongLive, and Incantation" />
      394 + <img src="static/images/figs/baseline-comparison.jpg" alt="Qualitative comparison against Seedance, Kling, LongLive, and Incantation" width="1373" height="800" loading="lazy" decoding="async" />
 398  395   <p class="caption">Qualitative comparison on Elden Ring. Strong video generators preserve visual fidelity, but Incantation is the method designed for per-frame player-boss action control.</p>
            <p class="section-sub">The same architecture and recipe are applied to Elden Ring and the visually unrelated King of Fighters world, changing only action-vocabulary slots. Real-time streaming is an enabling system property, not the core interface claim.</p>
 436  433   <div class="asset-grid">
 437  434   <div class="figure-shell">
 438      - <img src="static/images/figs/margit-rollout.png" alt="Elden Ring long-horizon generated rollout" />
            <p class="caption">Elden Ring rollout sampled from a continuous generated session.</p>
 440  437   </div>
 441  438   <div class="figure-shell">
 442      - <img src="static/images/figs/kof-rollout.png" alt="King of Fighters generated rollout under the same architecture" />
      439 + <img src="static/images/figs/kof-rollout.jpg" alt="King of Fighters generated rollout under the same architecture" width="788" height="723" loading="lazy" decoding="async" />
 443  440   <p class="caption">King of Fighters rollout under the same architecture and training recipe.</p>
 444  441   </div>
 445  442   </div>
@@ -536,7 +533,7 @@ <h3>0.25 s</h3>
 536  533
 537  534   <div class="asset-grid" style="margin-top:30px;">
 538  535   <div class="figure-shell">
 539      - <img src="static/images/figs/annotator.png" alt="Local annotation interface for Incantation" />
            <p class="caption">Blinded action-control accuracy interface. Annotators see the generated clip and per-entity target action, but not whether it came from natural language or the Action-ID baseline.</p>