Commit da58c1b

Update inference.html
1 parent 0b09fde commit da58c1b

1 file changed

Lines changed: 5 additions & 37 deletions

docs/TheMatrixDocs/inference.html

@@ -458,7 +458,7 @@ <h2>1.Inference with the_matrix.py<a class="headerlink" href="#inference-with-th
 <h2>2. Inference with run_interactive.sh<a class="headerlink" href="#inference-with-run-interactive-sh" title="Permalink to this heading">#</a></h2>
 <section id="summary">
 <h3>Summary<a class="headerlink" href="#summary" title="Permalink to this heading">#</a></h3>
-<p>run_interactive.sh launches a fully parallelized, low-latency pipeline that generates video at <strong>16 FPS</strong> end-to-end (i.e. real-time). This script leverages our 8-GPU DiT &amp; VAE parallel inference, stream consistency models, and fused data training to reduce a single-GPU baseline’s 32 s per 4 s video down to 4 s—a <strong>8× speedup</strong>—while maintaining infinite-horizon stability.</p>
+<p>run_interactive.sh launches a fully parallelized, low-latency pipeline that generates video at <strong>16 FPS</strong> end-to-end (i.e. real-time). This script leverages our 8-GPU DiT &amp; VAE parallel inference and stream consistency models to reduce a single-GPU baseline’s 32 s per 4 s video down to 4 s, an <strong>8× speedup</strong>, while maintaining infinite-horizon stability.</p>
 </section>
 <section id="highlights">
 <h3>Highlights<a class="headerlink" href="#highlights" title="Permalink to this heading">#</a></h3>
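The throughput figures quoted in the Summary paragraph of this hunk can be cross-checked with integer arithmetic. The sketch below is illustrative only: the variable names are invented for this example and do not come from run_interactive.sh, and the numbers are simply the ones stated in the summary (32 s baseline, 4 s parallel, a 4 s clip, 16 FPS).

```shell
#!/usr/bin/env bash
# Figures taken from the Summary paragraph; variable names are illustrative assumptions.
BASELINE_SECONDS=32   # single-GPU time to generate one clip
PARALLEL_SECONDS=4    # parallel-pipeline time to generate the same clip
CLIP_SECONDS=4        # duration of the generated clip
TARGET_FPS=16         # playback rate claimed as real time

# Speedup of the parallel pipeline over the single-GPU baseline.
speedup=$(( BASELINE_SECONDS / PARALLEL_SECONDS ))

# Number of frames in one clip at the playback rate.
frames=$(( CLIP_SECONDS * TARGET_FPS ))

# Effective generation rate of each setup, in frames per second.
baseline_fps=$(( frames / BASELINE_SECONDS ))
parallel_fps=$(( frames / PARALLEL_SECONDS ))

echo "speedup=${speedup}x frames=${frames} baseline_fps=${baseline_fps} parallel_fps=${parallel_fps}"
```

Under these numbers, generation keeps pace with playback only because the clip is produced in no more time than it takes to play, which is what the "real-time" claim amounts to.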
@@ -468,7 +468,7 @@ <h3>Highlights<a class="headerlink" href="#highlights" title="Permalink to this
 <li><p><strong>Stream Consistency Models</strong>
 Novel consistency losses yield <strong>7–10× higher throughput</strong> over naïve frame-by-frame generation.</p></li>
 <li><p><strong>Real-Time Feedback Loop</strong>
-Sustains a continuous <strong>16 FPS</strong> generation/playback cycle with <strong>&lt; 50 ms</strong> input-to-output latency.</p></li>
+Sustains a continuous <strong>16 FPS</strong> generation/playback cycle in real time.</p></li>
 </ul>
 </section>
 <section id="two-inference-modes">
@@ -484,45 +484,14 @@ <h3>Two Inference Modes<a class="headerlink" href="#two-inference-modes" title="
 - Ideal for continuous/live deployments or performance benchmarking.</p></li>
 </ol>
 </section>
-<section id="performance-comparison">
-<h3>Performance Comparison<a class="headerlink" href="#performance-comparison" title="Permalink to this heading">#</a></h3>
-<table class="table" id="id1">
-<caption><span class="caption-text">Inference throughput comparison for a 4 s video</span><a class="headerlink" href="#id1" title="Permalink to this table">#</a></caption>
-<colgroup>
-<col style="width: 25.0%" />
-<col style="width: 25.0%" />
-<col style="width: 25.0%" />
-<col style="width: 25.0%" />
-</colgroup>
-<thead>
-<tr class="row-odd"><th class="head"><p>Mode</p></th>
-<th class="head"><p>GPUs used</p></th>
-<th class="head"><p>FPS achieved</p></th>
-<th class="head"><p>Total latency</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p>Baseline API</p></td>
-<td><p>1</p></td>
-<td><p>~2</p></td>
-<td><p>~32 s</p></td>
-</tr>
-<tr class="row-odd"><td><p>Interactive</p></td>
-<td><p>8</p></td>
-<td><p>16</p></td>
-<td><p>~4 s</p></td>
-</tr>
-</tbody>
-</table>
-</section>
 <section id="configuration">
 <h3>Configuration<a class="headerlink" href="#configuration" title="Permalink to this heading">#</a></h3>
 <p>At the top of <cite>run_interactive.sh</cite>, set:</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># GPUs for DiT stage (must sum to 8)</span>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># GPUs for DiT stage</span>
 <span class="nv">NUM_GPUS_DIT</span><span class="o">=</span><span class="m">1</span>
 
-<span class="c1"># GPUs for VAE stage (NUM_GPUS_DIT + NUM_GPUS_VAE = 8)</span>
-<span class="nv">NUM_GPUS_VAE</span><span class="o">=</span><span class="m">7</span>
+<span class="c1"># GPUs for VAE stage</span>
+<span class="nv">NUM_GPUS_VAE</span><span class="o">=</span><span class="m">3</span>
 
 <span class="c1"># Path to stage4 model weights</span>
 <span class="nv">MODEL_PATH</span><span class="o">=</span><span class="s2">&quot;../models/stage4&quot;</span>
@@ -623,7 +592,6 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
 <li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#summary">Summary</a></li>
 <li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#highlights">Highlights</a></li>
 <li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#two-inference-modes">Two Inference Modes</a></li>
-<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#performance-comparison">Performance Comparison</a></li>
 <li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#configuration">Configuration</a></li>
 <li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#usage">Usage</a></li>
 <li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#sub-script-start-dit-sh">Sub-script: start_dit.sh</a></li>
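The Configuration hunk drops the "must sum to 8" comments and changes the split to NUM_GPUS_DIT=1 and NUM_GPUS_VAE=3. One way a user might sanity-check such a split before launching is sketched below. This is an assumption-laden illustration, not code from run_interactive.sh: the helper gpu_split_fits and the AVAILABLE_GPUS variable are hypothetical, and the diff does not show whether the real script performs any check of this kind.

```shell
#!/usr/bin/env bash
# Values from the post-commit side of the Configuration hunk.
NUM_GPUS_DIT=1
NUM_GPUS_VAE=3

# Hypothetical helper (not part of run_interactive.sh): succeeds when each
# stage gets at least one GPU and the combined request fits the available count.
gpu_split_fits() {
  local available=$1 dit=$2 vae=$3
  (( dit >= 1 && vae >= 1 && dit + vae <= available ))
}

# Assumption: the caller supplies the GPU count, for example from
# `nvidia-smi -L | wc -l`; it defaults to 8 here purely for illustration.
AVAILABLE_GPUS=${AVAILABLE_GPUS:-8}

requested=$(( NUM_GPUS_DIT + NUM_GPUS_VAE ))
if gpu_split_fits "$AVAILABLE_GPUS" "$NUM_GPUS_DIT" "$NUM_GPUS_VAE"; then
  echo "OK: ${requested} of ${AVAILABLE_GPUS} GPUs requested"
else
  echo "Requested ${requested} GPUs, but only ${AVAILABLE_GPUS} are available" >&2
fi
```

The check is written as an upper bound instead of an exact sum because the commit removes the fixed total of 8 from the comments.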
