@@ -456,45 +456,93 @@ <h2>1.Inference with the_matrix.py<a class="headerlink" href="#inference-with-th
456456</ section >
457457< section id ="inference-with-run-interactive-sh ">
458458< h2 > 2. Inference with run_interactive.sh< a class ="headerlink " href ="#inference-with-run-interactive-sh " title ="Permalink to this heading "> #</ a > </ h2 >
459- < p > The < cite > run_interactive.sh</ cite > script orchestrates a multi-stage pipeline using Ray, DIT and VAE processes. It performs the following steps:</ p >
459+ < section id ="summary ">
460+ < h3 > Summary< a class ="headerlink " href ="#summary " title ="Permalink to this heading "> #</ a > </ h3 >
461+ < p > run_interactive.sh launches a fully parallelized, low-latency pipeline that generates video at < strong > 16 FPS</ strong > end-to-end, i.e. in real time. The script combines 8-GPU DiT and VAE parallel inference, stream consistency models, and fused data training to cut the time to generate a 4 s video from 32 s on a single-GPU baseline to 4 s, an < strong > 8× speedup</ strong > , while maintaining infinite-horizon stability.</ p >
462+ </ section >
463+ < section id ="highlights ">
464+ < h3 > Highlights< a class ="headerlink " href ="#highlights " title ="Permalink to this heading "> #</ a > </ h3 >
465+ < ul class ="simple ">
466+ < li > < p > < strong > 8-GPU Parallel Inference</ strong >
467+ The DiT and VAE stages split their work across 8 GPUs in total for a < strong > 6–8× speedup</ strong > vs. single-GPU.</ p > </ li >
468+ < li > < p > < strong > Stream Consistency Models</ strong >
469+ Novel consistency losses yield < strong > 7–10× higher throughput</ strong > over naïve frame-by-frame generation.</ p > </ li >
470+ < li > < p > < strong > Real-Time Feedback Loop</ strong >
471+ Sustains a continuous < strong > 16 FPS</ strong > generation/playback cycle with under < strong > 50 ms</ strong > of input-to-output latency.</ p > </ li >
472+ </ ul >
473+ </ section >
474+ < section id ="two-inference-modes ">
475+ < h3 > Two Inference Modes< a class ="headerlink " href ="#two-inference-modes " title ="Permalink to this heading "> #</ a > </ h3 >
460476< ol class ="arabic simple ">
461- < li > < p > Stop any existing Ray cluster </ p > </ li >
462- < li > < p > Compute < cite > CUDA_VISIBLE_DEVICES </ cite > based on configured GPU counts </ p > </ li >
463- < li > < p > Start Ray head node </ p > </ li >
464- < li > < p > Launch, in order (some in background):
465- - < cite > create_ray_pipe.py </ cite >
466- - < cite > main.py </ cite >
467- - < cite > start_dit.sh </ cite > (DIT inference)
468- - < cite > start_decoding_daemon.py </ cite > (VAE decoding daemon) </ p > </ li >
477+ < li > < p > < strong > API-Driven (`the_matrix.py`) </ strong >
478+ - Use when embedding generation inside your Python app.
479+ - Offers interactive control via < cite > the_matrix.generate(…) </ cite > calls.
480+ - Suitable for few-shot or ad-hoc video snippets. </ p > </ li >
481+ < li > < p > < strong > Scripted Pipeline (`run_interactive.sh`) </ strong >
482+ - End-to-end shell script for bulk or real-time production.
483+ - Spins up a Ray cluster, runs all stages in parallel, and tears down automatically.
484+ - Ideal for continuous/live deployments or performance benchmarking. </ p > </ li >
469485</ ol >
486+ </ section >
487+ < section id ="performance-comparison ">
488+ < h3 > Performance Comparison< a class ="headerlink " href ="#performance-comparison " title ="Permalink to this heading "> #</ a > </ h3 >
489+ < table class ="table " id ="id1 ">
490+ < caption > < span class ="caption-text "> Inference throughput comparison for a 4 s video</ span > < a class ="headerlink " href ="#id1 " title ="Permalink to this table "> #</ a > </ caption >
491+ < colgroup >
492+ < col style ="width: 25.0% " />
493+ < col style ="width: 25.0% " />
494+ < col style ="width: 25.0% " />
495+ < col style ="width: 25.0% " />
496+ </ colgroup >
497+ < thead >
498+ < tr class ="row-odd "> < th class ="head "> < p > Mode</ p > </ th >
499+ < th class ="head "> < p > GPUs used</ p > </ th >
500+ < th class ="head "> < p > FPS achieved</ p > </ th >
501+ < th class ="head "> < p > Total generation time</ p > </ th >
502+ </ tr >
503+ </ thead >
504+ < tbody >
505+ < tr class ="row-even "> < td > < p > Baseline API</ p > </ td >
506+ < td > < p > 1</ p > </ td >
507+ < td > < p > ~2</ p > </ td >
508+ < td > < p > ~32 s</ p > </ td >
509+ </ tr >
510+ < tr class ="row-odd "> < td > < p > Interactive</ p > </ td >
511+ < td > < p > 8</ p > </ td >
512+ < td > < p > 16</ p > </ td >
513+ < td > < p > ~4 s</ p > </ td >
514+ </ tr >
515+ </ tbody >
516+ </ table >
517+ </ section >
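The figures in the table above are mutually consistent, which a few lines of shell arithmetic can confirm. This is only a sanity-check sketch: the variable names are illustrative, and it assumes the 4 s clip is played back at 16 FPS, so that it contains 64 frames.

```bash
#!/usr/bin/env bash
# Sanity check of the table's numbers (illustrative variable names).
CLIP_SECONDS=4          # length of the generated video
PLAYBACK_FPS=16         # assumption: the clip plays back at 16 FPS
BASELINE_SECONDS=32     # single-GPU generation time from the table
INTERACTIVE_SECONDS=4   # 8-GPU generation time from the table

# Frames in the clip, then generation throughput for each mode.
frames=$((CLIP_SECONDS * PLAYBACK_FPS))
baseline_fps=$((frames / BASELINE_SECONDS))
interactive_fps=$((frames / INTERACTIVE_SECONDS))
speedup=$((BASELINE_SECONDS / INTERACTIVE_SECONDS))

echo "frames=${frames} baseline_fps=${baseline_fps} interactive_fps=${interactive_fps} speedup=${speedup}x"
```

Under that assumption, the baseline works out to 2 FPS and the interactive mode to 16 FPS, matching the "~2" and "16" entries and the 8× ratio between the two generation times.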
470518< section id ="configuration ">
471519< h3 > Configuration< a class ="headerlink " href ="#configuration " title ="Permalink to this heading "> #</ a > </ h3 >
472- < p > At the top of < cite > run_interactive.sh</ cite > , set the following variables :</ p >
473- < div class ="highlight-bash notranslate "> < div class ="highlight "> < pre > < span > </ span > < span class ="c1 "> # Number of GPUs for DIT stage</ span >
520+ < p > At the top of < cite > run_interactive.sh</ cite > , set:</ p >
521+ < div class ="highlight-bash notranslate "> < div class ="highlight "> < pre > < span > </ span > < span class ="c1 "> # GPUs for DiT stage (NUM_GPUS_DIT + NUM_GPUS_VAE must equal 8)</ span >
474522< span class ="nv "> NUM_GPUS_DIT</ span > < span class ="o "> =</ span > < span class ="m "> 1</ span >
475523
476- < span class ="c1 "> # Number of GPUs for VAE stage</ span >
477- < span class ="nv "> NUM_GPUS_VAE</ span > < span class ="o "> =</ span > < span class ="m "> 3 </ span >
524+ < span class ="c1 "> # GPUs for VAE stage (NUM_GPUS_DIT + NUM_GPUS_VAE = 8) </ span >
525+ < span class ="nv "> NUM_GPUS_VAE</ span > < span class ="o "> =</ span > < span class ="m "> 7 </ span >
478526
479527< span class ="c1 "> # Path to stage4 model weights</ span >
480528< span class ="nv "> MODEL_PATH</ span > < span class ="o "> =</ span > < span class ="s2 "> "../models/stage4"</ span >
481529</ pre > </ div >
482530</ div >
483- < p > The script will assemble :</ p >
531+ < p > The script computes :</ p >
484532< ul class ="simple ">
485- < li > < p > < strong > GPU_IDS</ strong > : a comma-separated list < cite > NUM_GPUS_DIT,NUM_GPUS_DIT+1,… </ cite > </ p > </ li >
486- < li > < p > < strong > CUDA_VISIBLE_DEVICES</ strong > : exported before Ray and Python processes</ p > </ li >
533+ < li > < p > < strong > GPU_IDS</ strong > : comma-separated list < cite > NUM_GPUS_DIT,…, NUM_GPUS_DIT+NUM_GPUS_VAE-1 </ cite > </ p > </ li >
534+ < li > < p > < strong > CUDA_VISIBLE_DEVICES</ strong > : exported for Ray and all Python processes</ p > </ li >
487535</ ul >
488536</ section >
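One way the GPU_IDS list described above could be assembled is sketched below. This is not the actual contents of run_interactive.sh: the helper name build_gpu_ids is hypothetical, and the `${VAR:-default}` pattern is an assumption about how the script might let environment values take precedence over the defaults at the top of the file.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real run_interactive.sh may differ.

# Assumed defaults, taken from the configuration example; environment values win.
NUM_GPUS_DIT="${NUM_GPUS_DIT:-1}"
NUM_GPUS_VAE="${NUM_GPUS_VAE:-7}"

# build_gpu_ids START COUNT  (hypothetical helper)
# Prints COUNT consecutive GPU indices starting at START, comma-separated.
build_gpu_ids() {
  local start="$1" count="$2" ids="" i
  for ((i = start; i < start + count; i++)); do
    # Prefix a comma only when the list already has an entry.
    ids+="${ids:+,}${i}"
  done
  printf '%s\n' "$ids"
}

# NUM_GPUS_DIT, ..., NUM_GPUS_DIT + NUM_GPUS_VAE - 1
GPU_IDS="$(build_gpu_ids "$NUM_GPUS_DIT" "$NUM_GPUS_VAE")"
echo "GPU_IDS=${GPU_IDS}"
```

With NUM_GPUS_DIT=2 and NUM_GPUS_VAE=6, the helper yields 2,3,4,5,6,7, which matches the documented range from NUM_GPUS_DIT to NUM_GPUS_DIT+NUM_GPUS_VAE-1.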
489537< section id ="usage ">
490538< h3 > Usage< a class ="headerlink " href ="#usage " title ="Permalink to this heading "> #</ a > </ h3 >
491- < p > Run the entire pipeline with :</ p >
539+ < p > Run the full pipeline:</ p >
492540< div class ="highlight-bash notranslate "> < div class ="highlight "> < pre > < span > </ span > bash< span class ="w "> </ span > run_interactive.sh
493541</ pre > </ div >
494542</ div >
495- < p > Alternatively, export the three variables as environment variables :</ p >
496- < div class ="highlight-bash notranslate "> < div class ="highlight "> < pre > < span > </ span > < span class ="nb "> export</ span > < span class ="w "> </ span > < span class ="nv "> NUM_GPUS_DIT</ span > < span class ="o "> =</ span > < span class ="m "> 1 </ span >
497- < span class ="nb "> export</ span > < span class ="w "> </ span > < span class ="nv "> NUM_GPUS_VAE</ span > < span class ="o "> =</ span > < span class ="m "> 3 </ span >
543+ < p > Or override via environment:</ p >
544+ < div class ="highlight-bash notranslate "> < div class ="highlight "> < pre > < span > </ span > < span class ="nb "> export</ span > < span class ="w "> </ span > < span class ="nv "> NUM_GPUS_DIT</ span > < span class ="o "> =</ span > < span class ="m "> 2 </ span >
545+ < span class ="nb "> export</ span > < span class ="w "> </ span > < span class ="nv "> NUM_GPUS_VAE</ span > < span class ="o "> =</ span > < span class ="m "> 6 </ span >
498546< span class ="nb "> export</ span > < span class ="w "> </ span > < span class ="nv "> MODEL_PATH</ span > < span class ="o "> =</ span > < span class ="s2 "> "../models/stage4"</ span >
499547bash< span class ="w "> </ span > run_interactive.sh
500548</ pre > </ div >
@@ -507,20 +555,20 @@ <h3>Sub-script: start_dit.sh<a class="headerlink" href="#sub-script-start-dit-sh
507555</ div >
508556< dl class ="field-list simple ">
509557< dt class ="field-odd "> NUM_GPUS_DIT< span class ="colon "> :</ span > </ dt >
510- < dd class ="field-odd "> < p > Number of GPUs to allocate for the DIT process .</ p >
558+ < dd class ="field-odd "> < p > Number of GPUs allocated to DiT .</ p >
511559</ dd >
512560< dt class ="field-even "> MODEL_PATH< span class ="colon "> :</ span > </ dt >
513- < dd class ="field-even "> < p > Path to the directory or prefix of stage4 model checkpoint files.</ p >
561+ < dd class ="field-even "> < p > Directory or prefix of stage4 checkpoint files.</ p >
514562</ dd >
515563</ dl >
516564</ section >
517565< section id ="environment-variables ">
518566< h3 > Environment Variables< a class ="headerlink " href ="#environment-variables " title ="Permalink to this heading "> #</ a > </ h3 >
519567< ul class ="simple ">
520568< li > < p > < strong > CUDA_VISIBLE_DEVICES</ strong >
521- Computed by the script as a comma-separated list to assign GPUs .</ p > </ li >
569+ Comma-separated list of GPU indices, computed by the script, that are assigned to the Ray head, DiT, and VAE processes.</ p > </ li >
522570< li > < p > < strong > PYTORCH_CUDA_ALLOC_CONF</ strong >
523- Set to < cite > expandable_segments:True</ cite > to configure PyTorch allocator.</ p > </ li >
571+ Set to < cite > expandable_segments:True</ cite > so that PyTorch's CUDA caching allocator can grow existing memory segments, which reduces fragmentation.</ p > </ li >
524572</ ul >
525573</ section >
526574</ section >
@@ -572,6 +620,10 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
572620 < ul class ="visible nav section-nav flex-column ">
573621< li class ="toc-h2 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#inference-with-the-matrix-py "> 1.Inference with the_matrix.py</ a > </ li >
574622< li class ="toc-h2 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#inference-with-run-interactive-sh "> 2. Inference with run_interactive.sh</ a > < ul class ="nav section-nav flex-column ">
623+ < li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#summary "> Summary</ a > </ li >
624+ < li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#highlights "> Highlights</ a > </ li >
625+ < li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#two-inference-modes "> Two Inference Modes</ a > </ li >
626+ < li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#performance-comparison "> Performance Comparison</ a > </ li >
575627< li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#configuration "> Configuration</ a > </ li >
576628< li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#usage "> Usage</ a > </ li >
577629< li class ="toc-h3 nav-item toc-entry "> < a class ="reference internal nav-link " href ="#sub-script-start-dit-sh "> Sub-script: start_dit.sh</ a > </ li >
@@ -584,7 +636,7 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
584636 < div class ="sidebar-secondary-item ">
585637
586638 < div class ="tocsection sourcelink ">
587- < a href ="source /inference.rst.txt ">
639+ < a href ="_sources /inference.rst.txt ">
588640 < i class ="fa-solid fa-file-lines "> </ i > Show Source
589641 </ a >
590642 </ div >
@@ -646,4 +698,4 @@ <h3>Environment Variables<a class="headerlink" href="#environment-variables" tit
646698
647699 </ footer >
648700 </ body >
649- </ html >
701+ </ html >