Commit 1c51157

1 parent d0a9761 commit 1c51157

8 files changed

Lines changed: 617 additions & 0 deletions

js/indexstart.js

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ st+= ' <a class="dropdown-item" target="_blank" href="'+path+'nsf-iis-
 st+= ' <a class="dropdown-item" target="_blank" href="'+path+'efficient-llm-inference/index.html">Efficient LLM Inference</a>';
 st+= ' <a class="dropdown-item" target="_blank" href="'+path+'trustworthy-llms-and-llm-agents/index.html">Trustworthy LLMs and LLM Agents</a>';
 st+= ' <a class="dropdown-item" target="_blank" href="'+path+'genai4dm/index.html">Generative AI for Multimodal Data Management</a>';
+st+= ' <a class="dropdown-item" target="_blank" href="'+path+'vectordb/index.html">VectorDB | Theoretically-sound and Practically-efficient Vector Data Retrieval</a>';
 st+= ' </div>';
 st += ' </li> ';
 //st += ' <li class="nav-item active"> ';

sitemap.xml

Lines changed: 3 additions & 0 deletions
@@ -24,4 +24,7 @@
 <url>
 <loc>https://uic-indexlab.github.io/genai4dm/</loc>
 </url>
+<url>
+<loc>https://uic-indexlab.github.io/vectordb/</loc>
+</url>
 </urlset>

vectordb/.DS_Store

6 KB
Binary file not shown.

vectordb/imgs/HENN.jpg

125 KB

vectordb/imgs/randomaccess.jpg

185 KB

vectordb/index.html

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>Theoretically-sound and Practically-efficient Vector Data Retrieval</title>
+<meta name="description" content="Project webpage for Theoretically-sound and Practically-efficient Vector Data Retrieval (VectorDB)." />
+<meta name="robots" content="index,follow,max-image-preview:large" />
+<link rel="canonical" href="https://uic-indexlab.github.io/vectordb/" />
+<meta property="og:title" content="Theoretically-sound and Practically-efficient Vector Data Retrieval" />
+<meta property="og:description" content="Project webpage for Theoretically-sound and Practically-efficient Vector Data Retrieval (VectorDB)." />
+<meta property="og:type" content="website" />
+<meta property="og:url" content="https://uic-indexlab.github.io/vectordb/" />
+<link rel="preconnect" href="https://fonts.googleapis.com">
+<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800&display=swap" rel="stylesheet">
+<link rel="stylesheet" href="style.css" />
+</head>
<body>
20+
<header class="hero" id="top">
21+
<div class="container hero-inner">
22+
<div class="hero-layout">
23+
<div class="hero-main">
24+
<div class="eyebrow">Project Website</div>
25+
<h4>Theoretically-sound and Practically-efficient</h4>
26+
<h1>Vector Data Retrieval</h1>
27+
<p class="subtitle">
28+
bridging rigorous retrieval foundations with deployable vector database systems
29+
</p>
30+
<p class="hero-text">
31+
This project studies the algorithmic and systems foundations of vector retrieval, with emphasis on ranked retrieval, similarity search, and random-access operations that make modern vector databases and RAG pipelines both efficient and reliable.
32+
</p>
33+
<div class="hero-actions">
34+
<a class="button button-primary" href="#overview">Project Overview</a>
35+
<a class="button button-secondary" href="#publications">Publications</a>
36+
</div>
37+
</div>
38+
39+
<aside class="hero-toc" aria-label="Table of contents">
40+
<p class="section-label">Table of Content</p>
41+
<nav class="toc-inner">
42+
<a href="#overview">Overview</a>
43+
<a href="#directions">Research Directions</a>
44+
<a href="#publications">Publications</a>
45+
<a href="#team">Team</a>
46+
</nav>
47+
</aside>
48+
</div>
49+
</div>
50+
</header>
+
+<main>
+<section class="section" id="overview">
+<div class="container two-col">
+<div>
+<p class="section-label">Overview</p>
+<h2>Why this project matters</h2>
+</div>
+<div>
+<p>
+The rapid rise of large language models (LLMs) and Retrieval-Augmented Generation (RAG) for information retrieval, question answering, and knowledge-intensive reasoning, together with the growing prevalence of large-scale unstructured data, has driven the emergence of vector database management systems. These systems extend traditional database architectures by efficiently storing, indexing, and querying vector representations such as embeddings, which now sit at the core of many machine learning and AI applications.
+</p>
+<p>
+At the heart of this landscape are ranked retrieval and similarity search. These problems underpin vector database systems, nearest-neighbor search, and retrieval pipelines in LLM-based applications, where both efficiency and reliability are essential. Classical approximate nearest neighbor algorithms provide strong theoretical guarantees but often fail to scale in practice, while heuristic methods perform remarkably well empirically yet offer little in the way of provable guarantees.
+</p>
+<p>
+This project focuses on closing that gap. In particular, it studies how to design vector retrieval methods that preserve the rigor of theory while matching the performance expectations of real-world systems. A central challenge is that many ANN and ranked retrieval algorithms are optimized for top-<em>k</em> search when <em>k</em> is small, but modern interactive data systems also require random access to dynamically ordered items over high-dimensional feature spaces.
+</p>
+<p>
+By connecting data management theory with practical systems building, VectorDB aims to support a new generation of vector retrieval engines that are scalable, analyzable, and dependable enough for production use in search, recommendation, multimodal analytics, and LLM-powered reasoning.
+</p>
+</div>
+</div>
+</section>
+
+<section class="section alt" id="directions">
+<div class="container two-col">
+<div>
+<p class="section-label">Research Directions</p>
+<h2>What we do</h2>
+</div>
+<div>
+<div class="card">
+<p class="section-intro">
+Our research examines the foundations and systems questions needed to make vector retrieval both theoretically principled and practically effective. The project asks:
+</p>
+<ul class="paper-points">
+<li>How can we bridge the longstanding gap between theoretically grounded ANN algorithms and the heuristic methods that dominate practical vector search engines?</li>
+<li>How can ranked retrieval and similarity search be made reliable enough for deployment in vector databases, RAG systems, and other knowledge-intensive AI pipelines?</li>
+<li>How can we develop efficient algorithms for random-access ranked retrieval, where ordering is defined dynamically over high-dimensional vector spaces rather than by fixed relational attributes?</li>
+<li>How should indexing, pruning, and access methods be redesigned so that random access and exploration remain efficient even when top-<em>k</em> is not the only retrieval mode that matters?</li>
+<li>What theoretical guarantees are most useful in practice for vector data systems: approximation quality, stability, latency bounds, memory efficiency, or robustness under distribution shift?</li>
+<li>How can vector retrieval algorithms better support interactive exploration, hybrid search, and adaptive ranking in systems that combine structured data, embeddings, and learned query operators?</li>
+<li>How can these ideas inform end-to-end vector database architectures that are not only fast on benchmarks, but also dependable, maintainable, and extensible in real deployments?</li>
+</ul>
+</div>
+</div>
+</div>
+</section>
+
+<section class="section" id="publications">
+<div class="container">
+<p class="section-label">Project outcomes</p>
+<h2>Publications</h2>
+<p class="section-intro">
+This section will track publications on vector databases, ranked retrieval, similarity search, and random-access algorithms for high-dimensional data systems.
+</p>
+
+<div class="pub-layout">
+<aside class="pub-toc">
+<h3>Publications</h3>
+<ol>
+<li><a href="#henn-corr-2025">HENN</a></li>
+<li><a href="#random-access-2026">Random Access</a></li>
+</ol>
+<p class="toc-note">This list will be updated as project papers are added.</p>
+</aside>
+
+<div class="pub-content">
+<article class="paper-card" id="henn-corr-2025">
+<div class="paper-tag">CoRR 2025</div>
+<h3>HENN: A Hierarchical Epsilon Net Navigation Graph for Approximate Nearest Neighbor Search</h3>
+<p class="paper-meta">
+Mohsen Dehghankar and Abolfazl Asudeh · <em>CoRR</em> abs/2505.17368, 2025
+</p>
+<div class="paper-feature">
+<figure class="paper-figure">
+<img src="imgs/HENN.jpg" alt="Illustration for the HENN paper on hierarchical graph indexing for approximate nearest neighbor search">
+</figure>
+
+<div class="paper-feature-copy">
+<p>
+Hierarchical graph-based algorithms such as HNSW have become the dominant practical approach for Approximate Nearest Neighbor search, yet their success has long outpaced their theory. They achieve strong empirical performance, but their reliance on randomized heuristic graph construction leaves open critical questions about query-time guarantees and worst-case recall. At the same time, many theoretically grounded ANN structures remain difficult to implement and rarely match the scale or simplicity demanded by real systems.
+</p>
+<p>
+This paper introduces <em>HENN</em>, the Hierarchical Epsilon Net Navigation Graph, as a graph-based indexing structure that combines rigorous guarantees with practical efficiency. Built on the theory of <em>&epsilon;</em>-nets, HENN guarantees polylogarithmic worst-case query time while preserving high recall and keeping implementation overhead low enough to remain realistic for deployment.
+</p>
+<p>
+Beyond proposing a new structure, the paper also gives theoretical insight into the empirical success of HNSW by establishing a probabilistic polylogarithmic query-time bound for it. That comparison is important, because prior hierarchical methods may degrade to linear query time under adversarial inputs, whereas HENN maintains provable performance independent of the underlying data distribution.
+</p>
+<p>
+Extensive experiments show that HENN achieves faster query time while maintaining competitive recall across a range of data distributions, including adversarial settings. The overall result is a robust and scalable ANN solution that closes part of the longstanding gap between principled retrieval theory and high-performance vector search in practice.
+</p>
+<p class="paper-citation">
+Citation: Dehghankar, Mohsen, and Abolfazl Asudeh. 2025. <em>HENN: A Hierarchical Epsilon Net Navigation Graph for Approximate Nearest Neighbor Search</em>. Preprint, CoRR abs/2505.17368.
+</p>
+<p class="paper-links">
+<a href="https://arxiv.org/pdf/2505.17368" target="_blank" rel="noopener noreferrer">Paper</a>
+</p>
+</div>
+</div>
+</article>
+
+<article class="paper-card" id="random-access-2026">
+<div class="paper-tag">Under Review 2026</div>
+<h3>Random-Access Ranked Retrieval and Similarity Search</h3>
+<p class="paper-meta">
+Mohsen Dehghankar, Abolfazl Asudeh, Raghav Mittal, Suraj Shetiya, and Gautam Das · Under Review, 2026
+</p>
+<div class="paper-feature">
+<figure class="paper-figure">
+<img src="imgs/randomaccess.jpg" alt="Illustration for the Random Access paper on ranked retrieval and similarity search">
+</figure>
+
+<div class="paper-feature-copy">
+<p>
+Random access is a fundamental operation behind many efficient search and exploration algorithms, but modern interactive data systems increasingly rely on ranked retrieval and similarity search where orderings are defined dynamically over high-dimensional feature spaces. That shift creates a new challenge: how can systems directly access the tuple at a desired rank when the ranking itself is induced by geometric or embedding-based structure rather than static stored attributes?
+</p>
+<p>
+This paper formalizes the <em>Random-Access Ranked Retrieval</em> (RAR) problem and extends the framework to similarity search. On the theoretical side, it develops an efficient algorithm based on geometric arrangements that achieves logarithmic query time. However, because that approach incurs exponential space complexity in high dimensions, the work also introduces a second family of algorithms based on <em>&epsilon;</em>-sampling that use only linear space.
+</p>
+<p>
+Since exactly locating the tuple at a specific rank is closely tied to the range counting problem and therefore inherently difficult, the paper further defines a relaxed variant called <em>&kappa;</em>-Random-Access Ranked Retrieval. Instead of returning one exact tuple, this formulation returns a small subset of size <em>&kappa;</em> guaranteed to contain the target item. To support this efficiently, the work introduces an intermediate problem, <em>Stripe Range Retrieval</em> (SRR), together with a hierarchical sampling data structure specialized for narrow stripe queries.
+</p>
+<p>
+The resulting methods achieve practical scalability across both dataset size and dimensionality. The paper proves near-optimal bounds for the proposed algorithms and validates them experimentally on real and synthetic datasets, showing scalability to millions of tuples and hundreds of dimensions while supporting efficient ranked retrieval and similarity search.
+</p>
+<p class="paper-citation">
+Citation: Mohsen Dehghankar, Abolfazl Asudeh, Raghav Mittal, Suraj Shetiya, and Gautam Das. 2026. <em>Random-Access Ranked Retrieval and Similarity Search</em>. Under Review.
+</p>
+</div>
+</div>
+</article>
+</div>
+</div>
+</div>
+</section>
+
+<section class="section" id="team">
+<div class="container two-col">
+<div>
+<p class="section-label">Team</p>
+<h2>Project investigators</h2>
+</div>
+<div class="team-list">
+<div class="team-item">
+<h3><a href="https://www.cs.uic.edu/~asudeh/" target="_blank" rel="noopener noreferrer">A. Asudeh</a></h3>
+<p>Faculty</p>
+</div>
+<div class="team-item">
+<h3><a href="https://mohsendehghankar.github.io/" target="_blank" rel="noopener noreferrer">Mohsen Dehghankar</a></h3>
+<p>Lead PhD Student</p>
+</div>
+</div>
+</div>
+</section>
+</main>
+
+<footer class="site-footer">
+<div class="container footer-inner">
+<p>Theoretically-sound and Practically-efficient Vector Data Retrieval</p>
+<p>InDeX Lab · UIC Computer Science</p>
+</div>
+</footer>
+
+<script src="script.js"></script>
+</body>
+</html>
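
Note on the `random-access-2026` card in `vectordb/index.html`: the problem it describes, fetching the tuple at a desired rank under an ordering defined at query time, can be made concrete with a brute-force baseline. The sketch below is not the paper's algorithm; the arrangement-based and ε-sampling structures the card mentions are not reproduced here. It assumes a linear scoring function (a dot product with a hypothetical `weights` vector), ranks higher scores first, and picks a window centered on the target rank as one possible κ-sized answer. All function and variable names are illustrative and do not appear in the commit.

```javascript
// Brute-force baseline for illustration only (assumed linear scoring).
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Score every tuple and sort by descending score; ties break by input index.
function rankByScore(tuples, weights) {
  return tuples
    .map((vec, index) => ({ index, score: dot(vec, weights) }))
    .sort((a, b) => b.score - a.score || a.index - b.index);
}

// Exact random access: return the entry at the given 1-based rank.
function randomAccess(tuples, weights, rank) {
  const ranked = rankByScore(tuples, weights);
  if (rank < 1 || rank > ranked.length) {
    throw new RangeError(`rank must be in 1..${ranked.length}`);
  }
  return ranked[rank - 1];
}

// Relaxed variant: return up to kappa consecutive entries that include the
// entry at the target rank. The centered window is an assumption of this sketch.
function kappaRandomAccess(tuples, weights, rank, kappa) {
  const ranked = rankByScore(tuples, weights);
  if (rank < 1 || rank > ranked.length || kappa < 1) {
    throw new RangeError('rank or kappa out of range');
  }
  const centered = rank - 1 - Math.floor(kappa / 2);
  const start = Math.max(0, Math.min(centered, ranked.length - kappa));
  return ranked.slice(start, start + kappa);
}

// Toy data: four 2-dimensional tuples and one weight vector.
const demoTuples = [[1, 0], [0, 1], [1, 1], [2, 3]];
const demoWeights = [2, 1];

console.log(randomAccess(demoTuples, demoWeights, 3));
console.log(kappaRandomAccess(demoTuples, demoWeights, 3, 2));
```

The baseline re-sorts all tuples on every query, so its cost grows with the dataset size; the card's logarithmic query-time claim refers to the indexed structures in the paper, not to this sketch.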

vectordb/script.js

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+const tocLinks = Array.from(document.querySelectorAll('.toc-inner a'));
+const sections = tocLinks
+  .map(link => document.querySelector(link.getAttribute('href')))
+  .filter(Boolean);
+
+const setActiveLink = () => {
+  const y = window.scrollY + 120;
+  let activeId = sections[0]?.id;
+
+  sections.forEach(section => {
+    if (section.offsetTop <= y) activeId = section.id;
+  });
+
+  tocLinks.forEach(link => {
+    const isActive = link.getAttribute('href') === `#${activeId}`;
+    link.classList.toggle('active', isActive);
+  });
+};
+
+window.addEventListener('scroll', setActiveLink, { passive: true });
+window.addEventListener('load', setActiveLink);
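
Note on `vectordb/script.js`: the selection rule inside `setActiveLink` can be checked without a browser by pulling it into a pure function. The sketch below is a hypothetical refactor, not part of the commit; `pickActiveId` and `demoSections` are illustrative names, and the plain objects stand in for DOM elements by exposing only `id` and `offsetTop`. The 120-pixel offset is the constant used in the committed code.

```javascript
// Hypothetical helper: returns the id of the last section whose top edge is
// at or above scrollY + offset, falling back to the first section's id.
function pickActiveId(sections, scrollY, offset = 120) {
  const y = scrollY + offset;
  let activeId = sections[0]?.id;
  for (const section of sections) {
    if (section.offsetTop <= y) activeId = section.id;
  }
  return activeId;
}

// Stand-in data; the offsetTop values are made up for the example.
const demoSections = [
  { id: 'overview', offsetTop: 600 },
  { id: 'directions', offsetTop: 1400 },
  { id: 'publications', offsetTop: 2200 },
  { id: 'team', offsetTop: 4000 },
];

console.log(pickActiveId(demoSections, 0));    // overview (fallback, nothing reached yet)
console.log(pickActiveId(demoSections, 1300)); // directions (1300 + 120 >= 1400)
console.log(pickActiveId(demoSections, 5000)); // team
```

Like the committed code, this relies on the sections being listed in document order and on `offsetTop` being measured from a common offset parent. If a positioned ancestor were ever added around one of the sections, its `offsetTop` would no longer be comparable with `window.scrollY`.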
