Commit 54ca1f3

1 parent ae272aa commit 54ca1f3

1 file changed

Lines changed: 54 additions & 5 deletions

efficient-on-device-llm-inference/index.html

@@ -31,14 +31,15 @@ <h1>Efficient and Private On-device LLM Inference</h1>
 </p>
 <div class="hero-actions">
 <a class="button button-primary" href="#overview">Project Overview</a>
-<a class="button button-secondary" href="#publications">Publications</a>
+<a class="button button-secondary" href="#softwares">Project Outcomes</a>
 </div>
 </div>
 
 <aside class="hero-toc" aria-label="Table of contents">
 <p class="section-label">Table of Content</p>
 <nav class="toc-inner">
 <a href="#overview">Overview</a>
+<a href="#softwares">Softwares</a>
 <a href="#publications">Publications</a>
 <a href="#team">Team</a>
 </nav>
@@ -71,10 +72,59 @@ <h2>Project overview</h2>
 </div>
 </section>
 
-<section class="section alt" id="publications">
+<section class="section alt" id="softwares">
 <div class="container">
-<p class="section-label">Publications</p>
-<h2>Project outcomes</h2>
+<p class="section-label">Project outcomes</p>
+<h2>Softwares</h2>
+<p class="section-intro">
+Beyond papers, this project also produces software artifacts that make efficient low-bit inference practical in real systems.
+</p>
+
+<div class="card-grid">
+<article class="paper-card" id="software-rsr-core">
+<div class="paper-tag">Software</div>
+<h3>RSR-core</h3>
+<p class="paper-meta">
+A high-performance engine for low-bit matrix-vector multiplication across CPU and CUDA backends.
+</p>
+<div class="paper-feature">
+<figure class="paper-figure">
+<a href="https://drive.google.com/file/d/1ub-MITJUepmfBLkyUZFb50hbJsuhgwCH/view?usp=sharing" target="_blank" rel="noopener noreferrer">
+<img src="https://raw.githubusercontent.com/UIC-InDeXLab/RSR-core/main/assets/rsr_baseline_compare.webp" alt="RSR-core demo visual comparing RSR against a Hugging Face baseline">
+</a>
+</figure>
+
+<div class="paper-feature-copy">
+<p>
+<em>RSR-core</em> is the systems implementation of the Redundant Segment Reduction framework for efficient low-bit inference. The repository provides the core kernels, model integrations, and benchmarking pipeline needed to accelerate binary and ternary matrix-vector multiplication, which is a dominant operation in low-bit neural inference and LLM decoding.
+</p>
+<p>
+The engine supports both CPU and CUDA backends and exposes optimized low-level kernels together with Python wrappers for 1-bit and 1.58-bit multiplication. It is designed to bridge the gap between the algorithmic gains of RSR and practical deployment in real inference pipelines.
+</p>
+<p>
+The software also includes Hugging Face integration for preprocessing quantized models into RSR format and running accelerated inference from those preprocessed artifacts. In addition, the repository provides benchmarking scripts for kernel-level and end-to-end evaluation, making it easy to reproduce performance results on local hardware.
+</p>
+<p>
+For interactive use, <em>RSR-core</em> includes a web dashboard built with FastAPI and Vite/React. The interface supports model browsing, preprocessing, side-by-side backend comparison, inference, and benchmark visualization, turning the project into a production-oriented workflow rather than a standalone prototype.
+</p>
+<p>
+According to the repository benchmarks, the engine achieves substantial speedups over Hugging Face PyTorch baselines, including up to 62× higher throughput on CPU and up to 1.9× faster token generation on CUDA for popular ternary LLMs. The demo visual above links to the project’s video demonstration from the repository README.
+</p>
+<p class="paper-links">
+<a href="https://github.com/UIC-InDeXLab/RSR-core" target="_blank" rel="noopener noreferrer">Repository</a>
+<a href="https://drive.google.com/file/d/1ub-MITJUepmfBLkyUZFb50hbJsuhgwCH/view?usp=sharing" target="_blank" rel="noopener noreferrer">Demo Video</a>
+</p>
+</div>
+</div>
+</article>
+</div>
+</div>
+</section>
+
+<section class="section" id="publications">
+<div class="container">
+<p class="section-label">Project outcomes</p>
+<h2>Publications</h2>
 <p class="section-intro">
 This project currently includes three papers on algorithmic and systems foundations for efficient low-bit neural inference.
 </p>
@@ -162,7 +212,6 @@ <h3>RSR-core: A High-Performance Engine for Low-Bit Matrix-Vector Multiplication
 </p>
 <p class="paper-links">
 <a href="https://arxiv.org/pdf/2603.27462" target="_blank" rel="noopener noreferrer">Paper</a>
-<a href="https://github.com/UIC-InDeXLab/RSR-core" target="_blank" rel="noopener noreferrer">RSR-core Repository</a>
 </p>
 </div>
 </div>
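
Note on the operation described in the added RSR-core card: the page text refers to ternary (1.58-bit) matrix-vector multiplication. The following is a minimal Python sketch of what that operation computes. It is a naive reference for illustration only, not RSR-core's kernel, algorithm, or API; the function name `ternary_matvec` is hypothetical. It shows only that weights restricted to {-1, 0, +1} turn each dot product into additions and subtractions.

```python
def ternary_matvec(W, x):
    """Naive reference: multiply a ternary matrix W by a vector x.

    W is a list of rows whose entries are -1, 0, or +1 (hypothetical layout,
    chosen for clarity; real engines pack weights into bits).
    No multiplications are needed: +1 adds x[j], -1 subtracts it, 0 skips it.
    """
    out = []
    for row in W:
        if len(row) != len(x):
            raise ValueError("row length must match vector length")
        acc = 0.0
        for w, xj in zip(row, x):
            if w == 1:
                acc += xj
            elif w == -1:
                acc -= xj
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
        out.append(acc)
    return out


if __name__ == "__main__":
    W = [[1, 0, -1],
         [0, 1, 1]]
    x = [2.0, 3.0, 5.0]
    print(ternary_matvec(W, x))
```

A reference like this is mainly useful as a correctness check: an optimized kernel's output for the same `W` and `x` can be compared against it within floating-point tolerance.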
