
Commit 0b09fde

Update inference.html
1 parent 58f25fc commit 0b09fde

1 file changed

Lines changed: 78 additions & 26 deletions


docs/TheMatrixDocs/inference.html

@@ -456,45 +456,93 @@ <h2>1.Inference with the_matrix.py<a class="headerlink" href="#inference-with-th
456456
</section>
457457
<section id="inference-with-run-interactive-sh">
458458
<h2>2. Inference with run_interactive.sh<a class="headerlink" href="#inference-with-run-interactive-sh" title="Permalink to this heading">#</a></h2>
459-
<p>The <cite>run_interactive.sh</cite> script orchestrates a multi-stage pipeline using Ray, DIT and VAE processes. It performs the following steps:</p>
459+
<section id="summary">
460+
<h3>Summary<a class="headerlink" href="#summary" title="Permalink to this heading">#</a></h3>
461+
<p>run_interactive.sh launches a fully parallelized, low-latency pipeline that generates video at <strong>16 FPS</strong> end-to-end (i.e., in real time). The script combines 8-GPU DiT &amp; VAE parallel inference, stream consistency models, and fused data training to cut the single-GPU baseline of 32 s per 4 s video down to 4 s, an <strong>8× speedup</strong>, while maintaining infinite-horizon stability.</p>
462+
</section>
463+
<section id="highlights">
464+
<h3>Highlights<a class="headerlink" href="#highlights" title="Permalink to this heading">#</a></h3>
465+
<ul class="simple">
466+
<li><p><strong>8-GPU Parallel Inference</strong>
467+
DiT and VAE stages each slice work across 8 GPUs for a <strong>6–8× speedup</strong> vs. single-GPU.</p></li>
468+
<li><p><strong>Stream Consistency Models</strong>
469+
Novel consistency losses yield <strong>7–10× higher throughput</strong> over naïve frame-by-frame generation.</p></li>
470+
<li><p><strong>Real-Time Feedback Loop</strong>
471+
Sustains a continuous <strong>16 FPS</strong> generation/playback cycle with <strong>&lt; 50 ms</strong> input-to-output latency.</p></li>
472+
</ul>
473+
</section>
474+
<section id="two-inference-modes">
475+
<h3>Two Inference Modes<a class="headerlink" href="#two-inference-modes" title="Permalink to this heading">#</a></h3>
460476
<ol class="arabic simple">
461-
<li><p>Stop any existing Ray cluster</p></li>
462-
<li><p>Compute <cite>CUDA_VISIBLE_DEVICES</cite> based on configured GPU counts</p></li>
463-
<li><p>Start Ray head node</p></li>
464-
<li><p>Launch, in order (some in background):
465-
- <cite>create_ray_pipe.py</cite>
466-
- <cite>main.py</cite>
467-
- <cite>start_dit.sh</cite> (DIT inference)
468-
- <cite>start_decoding_daemon.py</cite> (VAE decoding daemon)</p></li>
477+
<li><p><strong>API-Driven (<cite>the_matrix.py</cite>)</strong>
478+
- Use when embedding generation inside your Python app.
479+
- Offers interactive control via <cite>the_matrix.generate(…)</cite> calls.
480+
- Suitable for few-shot or ad-hoc video snippets.</p></li>
481+
<li><p><strong>Scripted Pipeline (<cite>run_interactive.sh</cite>)</strong>
482+
- End-to-end shell script for bulk or real-time production.
483+
- Spins up a Ray cluster, runs all stages in parallel, and tears down automatically.
484+
- Ideal for continuous/live deployments or performance benchmarking.</p></li>
469485
</ol>
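<p>The launch sequence behind the scripted pipeline can be sketched as a dry run that only prints a plan. The script names come from the earlier step list for run_interactive.sh; the <cite>python</cite>/<cite>bash</cite> invocation forms, the absence of arguments, and the final <cite>ray stop</cite> teardown are assumptions, and <cite>plan_pipeline</cite> is a hypothetical helper, not part of the script.</p>

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints a launch plan instead of executing anything.
# Invocation forms and the absence of arguments are assumptions; the real
# script may also run some of these stages in the background.

plan_pipeline() {
  echo "ray stop"                         # stop any existing Ray cluster
  echo "ray start --head"                 # start the Ray head node
  echo "python create_ray_pipe.py"        # set up the Ray pipe
  echo "python main.py"                   # main entry point
  echo "bash start_dit.sh"                # DiT inference
  echo "python start_decoding_daemon.py"  # VAE decoding daemon
  echo "ray stop"                         # teardown (assumed form)
}

plan_pipeline
```

<p>Printing the plan first makes it easy to review the order before anything touches the GPUs.</p>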
486+
</section>
487+
<section id="performance-comparison">
488+
<h3>Performance Comparison<a class="headerlink" href="#performance-comparison" title="Permalink to this heading">#</a></h3>
489+
<table class="table" id="id1">
490+
<caption><span class="caption-text">Inference throughput comparison for a 4 s video</span><a class="headerlink" href="#id1" title="Permalink to this table">#</a></caption>
491+
<colgroup>
492+
<col style="width: 25.0%" />
493+
<col style="width: 25.0%" />
494+
<col style="width: 25.0%" />
495+
<col style="width: 25.0%" />
496+
</colgroup>
497+
<thead>
498+
<tr class="row-odd"><th class="head"><p>Mode</p></th>
499+
<th class="head"><p>GPUs used</p></th>
500+
<th class="head"><p>FPS achieved</p></th>
501+
<th class="head"><p>Total latency</p></th>
502+
</tr>
503+
</thead>
504+
<tbody>
505+
<tr class="row-even"><td><p>Baseline API</p></td>
506+
<td><p>1</p></td>
507+
<td><p>~2</p></td>
508+
<td><p>~32 s</p></td>
509+
</tr>
510+
<tr class="row-odd"><td><p>Interactive</p></td>
511+
<td><p>8</p></td>
512+
<td><p>16</p></td>
513+
<td><p>~4 s</p></td>
514+
</tr>
515+
</tbody>
516+
</table>
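<p>The table's figures are mutually consistent if a 4 s clip holds 16 frames per second of video. That frame count is an assumption inferred from the 16 FPS target, not stated explicitly. A quick arithmetic check, with illustrative variable names:</p>

```bash
#!/usr/bin/env bash
# Sanity check of the throughput table (integer arithmetic).
# Assumption: a 4 s clip contains 4 * 16 = 64 frames.
VIDEO_SECONDS=4
TARGET_FPS=16
BASELINE_SECONDS=32      # single-GPU wall-clock time from the table
INTERACTIVE_SECONDS=4    # 8-GPU wall-clock time from the table

FRAMES=$((VIDEO_SECONDS * TARGET_FPS))
BASELINE_FPS=$((FRAMES / BASELINE_SECONDS))
INTERACTIVE_FPS=$((FRAMES / INTERACTIVE_SECONDS))
SPEEDUP=$((BASELINE_SECONDS / INTERACTIVE_SECONDS))

echo "frames=$FRAMES baseline_fps=$BASELINE_FPS interactive_fps=$INTERACTIVE_FPS speedup=${SPEEDUP}x"
```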
517+
</section>
470518
<section id="configuration">
471519
<h3>Configuration<a class="headerlink" href="#configuration" title="Permalink to this heading">#</a></h3>
472-
<p>At the top of <cite>run_interactive.sh</cite>, set the following variables:</p>
473-
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Number of GPUs for DIT stage</span>
520+
<p>At the top of <cite>run_interactive.sh</cite>, set:</p>
521+
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># GPUs for DiT stage (NUM_GPUS_DIT + NUM_GPUS_VAE must equal 8)</span>
474522
<span class="nv">NUM_GPUS_DIT</span><span class="o">=</span><span class="m">1</span>
475523

476-
<span class="c1"># Number of GPUs for VAE stage</span>
477-
<span class="nv">NUM_GPUS_VAE</span><span class="o">=</span><span class="m">3</span>
524+
<span class="c1"># GPUs for VAE stage (NUM_GPUS_DIT + NUM_GPUS_VAE = 8)</span>
525+
<span class="nv">NUM_GPUS_VAE</span><span class="o">=</span><span class="m">7</span>
478526

479527
<span class="c1"># Path to stage4 model weights</span>
480528
<span class="nv">MODEL_PATH</span><span class="o">=</span><span class="s2">&quot;../models/stage4&quot;</span>
481529
</pre></div>
482530
</div>
483-
<p>The script will assemble:</p>
531+
<p>The script computes:</p>
484532
<ul class="simple">
485-
<li><p><strong>GPU_IDS</strong>: a comma-separated list <cite>NUM_GPUS_DIT,NUM_GPUS_DIT+1,…</cite></p></li>
486-
<li><p><strong>CUDA_VISIBLE_DEVICES</strong>: exported before Ray and Python processes</p></li>
533+
<li><p><strong>GPU_IDS</strong>: comma-separated list <cite>NUM_GPUS_DIT,…,NUM_GPUS_DIT+NUM_GPUS_VAE-1</cite></p></li>
534+
<li><p><strong>CUDA_VISIBLE_DEVICES</strong>: exported for Ray &amp; all Python processes</p></li>
487535
</ul>
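<p>A minimal sketch of how such a list could be assembled, assuming the IDs run from <cite>NUM_GPUS_DIT</cite> through <cite>NUM_GPUS_DIT+NUM_GPUS_VAE-1</cite> as described above. <cite>build_gpu_ids</cite> and <cite>check_gpu_budget</cite> are hypothetical helpers for illustration; the actual script may compute the list differently.</p>

```bash
#!/usr/bin/env bash
# Defaults mirror the configuration block; environment values take precedence.
NUM_GPUS_DIT="${NUM_GPUS_DIT:-1}"
NUM_GPUS_VAE="${NUM_GPUS_VAE:-7}"

# Hypothetical helper: print "first,first+1,...,first+count-1".
build_gpu_ids() {
  local first="$1" count="$2" ids="" i
  for ((i = first; i < first + count; i++)); do
    ids+="${ids:+,}${i}"   # prepend a comma only after the first ID
  done
  printf '%s\n' "$ids"
}

# Hypothetical helper: succeed only if the two stages use exactly 8 GPUs.
check_gpu_budget() {
  (( $1 + $2 == 8 ))
}

if check_gpu_budget "$NUM_GPUS_DIT" "$NUM_GPUS_VAE"; then
  GPU_IDS="$(build_gpu_ids "$NUM_GPUS_DIT" "$NUM_GPUS_VAE")"
  echo "GPU_IDS=$GPU_IDS"
else
  echo "NUM_GPUS_DIT + NUM_GPUS_VAE must equal 8" >&2
fi
```

<p>With the defaults shown (1 DiT GPU, 7 VAE GPUs) this prints <cite>GPU_IDS=1,2,3,4,5,6,7</cite>.</p>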
488536
</section>
489537
<section id="usage">
490538
<h3>Usage<a class="headerlink" href="#usage" title="Permalink to this heading">#</a></h3>
491-
<p>Run the entire pipeline with:</p>
539+
<p>Run the full pipeline:</p>
492540
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>bash<span class="w"> </span>run_interactive.sh
493541
</pre></div>
494542
</div>
495-
<p>Alternatively, export the three variables as environment variables:</p>
496-
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">NUM_GPUS_DIT</span><span class="o">=</span><span class="m">1</span>
497-
<span class="nb">export</span><span class="w"> </span><span class="nv">NUM_GPUS_VAE</span><span class="o">=</span><span class="m">3</span>
543+
<p>Or override via environment:</p>
544+
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">NUM_GPUS_DIT</span><span class="o">=</span><span class="m">2</span>
545+
<span class="nb">export</span><span class="w"> </span><span class="nv">NUM_GPUS_VAE</span><span class="o">=</span><span class="m">6</span>
498546
<span class="nb">export</span><span class="w"> </span><span class="nv">MODEL_PATH</span><span class="o">=</span><span class="s2">&quot;../models/stage4&quot;</span>
499547
bash<span class="w"> </span>run_interactive.sh
500548
</pre></div>
@@ -507,20 +555,20 @@ <h3>Sub-script: start_dit.sh<a class="headerlink" href="#sub-script-start-dit-sh
507555
</div>
508556
<dl class="field-list simple">
509557
<dt class="field-odd">NUM_GPUS_DIT<span class="colon">:</span></dt>
510-
<dd class="field-odd"><p>Number of GPUs to allocate for the DIT process.</p>
558+
<dd class="field-odd"><p>Number of GPUs allocated to DiT.</p>
511559
</dd>
512560
<dt class="field-even">MODEL_PATH<span class="colon">:</span></dt>
513-
<dd class="field-even"><p>Path to the directory or prefix of stage4 model checkpoint files.</p>
561+
<dd class="field-even"><p>Directory or prefix of stage4 checkpoint files.</p>
514562
</dd>
515563
</dl>
516564
</section>
517565
<section id="environment-variables">
518566
<h3>Environment Variables<a class="headerlink" href="#environment-variables" title="Permalink to this heading">#</a></h3>
519567
<ul class="simple">
520568
<li><p><strong>CUDA_VISIBLE_DEVICES</strong>
521-
Computed by the script as a comma-separated list to assign GPUs.</p></li>
569+
List of GPU indices assigned to Ray head, DiT, VAE, etc.</p></li>
522570
<li><p><strong>PYTORCH_CUDA_ALLOC_CONF</strong>
523-
Set to <cite>expandable_segments:True</cite> to configure PyTorch allocator.</p></li>
571+
Set to <cite>expandable_segments:True</cite> to optimize CUDA allocator behavior.</p></li>
524572
</ul>
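<p>For reference, both variables can be set by hand before launching a process. The device list below is purely illustrative (the script computes its own), while <cite>expandable_segments:True</cite> is the value named above.</p>

```bash
#!/usr/bin/env bash
# Illustrative device list; run_interactive.sh computes the real one.
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
# Allocator setting named in the documentation above.
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

# Split the comma-separated list to count the visible devices.
IFS=',' read -r -a DEVICES <<< "$CUDA_VISIBLE_DEVICES"
echo "visible GPUs: ${#DEVICES[@]}"
echo "allocator config: $PYTORCH_CUDA_ALLOC_CONF"
```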
525573
</section>
526574
</section>
@@ -572,6 +620,10 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
572620
<ul class="visible nav section-nav flex-column">
573621
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#inference-with-the-matrix-py">1. Inference with the_matrix.py</a></li>
574622
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#inference-with-run-interactive-sh">2. Inference with run_interactive.sh</a><ul class="nav section-nav flex-column">
623+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#summary">Summary</a></li>
624+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#highlights">Highlights</a></li>
625+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#two-inference-modes">Two Inference Modes</a></li>
626+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#performance-comparison">Performance Comparison</a></li>
575627
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#configuration">Configuration</a></li>
576628
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#usage">Usage</a></li>
577629
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#sub-script-start-dit-sh">Sub-script: start_dit.sh</a></li>
@@ -584,7 +636,7 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
584636
<div class="sidebar-secondary-item">
585637

586638
<div class="tocsection sourcelink">
587-
<a href="source/inference.rst.txt">
639+
<a href="_sources/inference.rst.txt">
588640
<i class="fa-solid fa-file-lines"></i> Show Source
589641
</a>
590642
</div>
@@ -646,4 +698,4 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
646698

647699
</footer>
648700
</body>
649-
</html>
701+
</html>
