Commit cb14919

update readme
1 parent 6858a64 commit cb14919

2 files changed

Lines changed: 59 additions & 20 deletions

README.md

Lines changed: 59 additions & 20 deletions
@@ -1,32 +1,53 @@
 <div align="center">
 
-# ConDB
+<img src="assets/banner.png" alt="ConDB Banner" />
 
-<p align="center"><b>Context Database for Hierarchical Document Trees</b></p>
+<br/>
+
+# ConDB: The KV-Cache Native Context Database
 
 <p align="center">
-Store, navigate, and query hierarchical document structures with LLM-powered reasoning retrieval.
+A new type of database optimized for reasoning-driven tree search — fast, context-aware retrieval at scale with up to 70% less token cost.
 </p>
 
 </div>
 
 ---
 
-## What is ConDB?
+## 🌲 What is ConDB?
+
+**ConDB** (Context Database) is a tree-structured context database that uses LLM-powered **reasoning-based retrieval** via tree search instead of vector similarity — no vector DB, no chunking. It accepts [PageIndex](https://github.com/VectifyAI/PageIndex)-compatible document trees, [ChatIndex](https://github.com/VectifyAI/ChatIndex) conversation trees, filesystem trees, and custom hierarchical JSON — with no runtime dependency on either. The LLM reasons over the tree, like a human expert using a table of contents, to locate relevant content.
+
+### Why not vector search?
+
+- **Similarity ≠ relevance** — vector search retrieves what looks similar, not what is truly relevant. Similar-looking chunks may differ in intent (low accuracy), while truly relevant information may be expressed in very different language and get missed entirely (low recall). True relevance requires reasoning
+- **Chunking breaks semantic continuity** — documents must be split into fixed-size segments to fit embedding models, causing context fragmentation that destroys their natural structure and cross-section relationships
+- **Retrieval is blind to context** — embedding models encode the query alone, ignoring conversational history, user intent, and other contextual signals
+
+ConDB replaces this with **reasoning-based tree search**: the LLM performs node-level relevance classification over a hierarchical index, incorporating full context — making retrieval adaptive, explainable, and traceable.
 
-**ConDB** stores hierarchical document trees in a SQLite database and provides LLM-powered **reasoning-based retrieval** to query them — no vector DB, no chunking. It accepts pageindex-compatible trees, chat trees, and custom hierarchical JSON without taking a runtime code dependency on PageIndex itself.
+### What makes ConDB different
 
-**Key capabilities:**
+- **Fast tree search at scale** — reasoning-driven tree search with block partitioning and parallel processing, supporting complex, context-aware retrieval over large hierarchical structures
+- **KV-cache native** — the first database designed around LLM KV-cache reuse. By caching intermediate results during tree search, ConDB reduces token usage by up to 70% with no loss in accuracy. The same efficiency gains extend to memory systems for long-context reasoning at scale
+- **Unified long-context infrastructure** — a single system for both static and dynamic long-context workloads
 
-- **Hierarchical storage** — store document trees, chat trees, and custom hierarchical JSON in SQLite
-- **Reasoning-based retrieval** — LLM navigates the tree to find relevant content, like a human expert
+### Static long context
+Structured, persistent knowledge — documents (via [PageIndex](https://github.com/VectifyAI/PageIndex)), file systems, and codebases. Scalable retrieval within large, organized hierarchies.
+
+### Dynamic long context
+Evolving, runtime context — agent memory, long conversations (via [ChatIndex](https://github.com/VectifyAI/ChatIndex)), and autoresearch. Systems can continuously update, retrieve, and reason over newly generated information.
+
+### Key capabilities
+
+- **Hierarchical storage** — document trees, chat trees, and custom hierarchical JSON in SQLite
 - **Multiple retrieval strategies** — beam search for small trees, block retrieval for large documents
-- **Multi-provider LLM support**works with Anthropic (Claude) and OpenAI (GPT) out of the box
+- **Multi-provider LLM support** — Anthropic (Claude) and OpenAI (GPT) out of the box
 - **Extensible** — plug in custom storage backends, LLM providers, or retrieval strategies
 
 ---
 
-## Quick Start
+## Quick Start
 
 ### Install
 
@@ -73,7 +94,7 @@ ct.close()
 
 ---
 
-## Configuration
+## ⚙️ Configuration
 
 Create a `.env` file:
 
@@ -93,14 +114,14 @@ llm = Config.get_llm_client()
 
 ---
 
-## Retrieval Strategies
+## 🔍 Retrieval Strategies
 
 ConDB automatically selects the best retrieval strategy based on tree size:
 
 | Strategy | Best for | How it works |
 |----------|----------|--------------|
 | **Beam** | Small trees (< 50 nodes) | LLM evaluates and selects promising branches at each depth level |
-| **Block** | Large documents (50+ nodes) | Splits tree into token-bounded blocks, LLM reasons over each block |
+| **Block** | Large documents (50+ nodes) | Splits tree into token-bounded blocks, LLM reasons over each block. KV-cache native — caches intermediate block results to cut token usage by up to 70% |
 
 You can also specify a strategy explicitly:
 
@@ -110,7 +131,7 @@ result = db.query(tree_id, "question", strategy="block", beam_size=3)
 
 ---
 
-## Benchmark Snapshot
+## 📈 Benchmark Snapshot
 
 Current filesystem benchmark summary lives in [bench/fs_block_beam_vertical.md](bench/fs_block_beam_vertical.md).
 
@@ -130,7 +151,7 @@ These numbers are benchmark snapshots, not hard guarantees; exact cost and laten
 
 ---
 
-## Architecture
+## 🧩 Architecture
 
 ```
 contextdb/
@@ -151,7 +172,7 @@ contextdb/
 
 ---
 
-## Extending
+## 🔌 Extending
 
 <details>
 <summary><b>Custom Storage Backend</b></summary>
@@ -184,21 +205,39 @@ ct = ContextTree("db.sqlite", llm=MyLLM())
 
 ---
 
-## Testing
+## 🧪 Testing
 
 ```bash
 ./run_tests.sh all
 ```
 
 ---
 
-## Related Projects
+## 🧭 Related Projects
 
-- [**PageIndex**](https://github.com/VectifyAI/PageIndex) — one possible external producer of pageindex-compatible document trees
+- [**PageIndex**](https://github.com/VectifyAI/PageIndex) — vectorless, reasoning-based RAG that builds hierarchical tree indexes from long documents
+- [**ChatIndex**](https://github.com/VectifyAI/ChatIndex) — tree indexing for long conversations, enabling reasoning-based retrieval over chat histories
 - [**AgentFS**](https://github.com/anthropics/agentfs) — filesystem for AI agents
 
 ---
 
-## License
+## 📄 License
 
 Apache-2.0
+
+---
+
+### Connect with Us
+
+<div align="center">
+
+[![Twitter](https://img.shields.io/badge/Twitter-000000?style=for-the-badge&logo=x&logoColor=white)](https://x.com/PageIndexAI)&ensp;
+[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/company/vectify-ai/)&ensp;
+[![Discord](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/VuXuf29EUj)&ensp;
+[![Contact Us](https://img.shields.io/badge/Contact_Us-3B82F6?style=for-the-badge&logo=envelope&logoColor=white)](https://ii2abc2jejf.typeform.com/to/tK3AXl8T)
+
+</div>
+
+---
+
+© 2026 [Vectify AI](https://vectify.ai)

assets/banner.png

830 KB
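
The Retrieval Strategies table in the README diff above says that beam search is used for trees under 50 nodes and block retrieval for larger ones, with the tree split into token-bounded blocks. The sketch below illustrates that idea only; it is not ConDB's implementation. The nested-dict tree shape (`title`, `text`, `children`), the function names, and the whitespace-based token estimate are all assumptions made for illustration. Only the 50-node threshold comes from the table.

```python
from __future__ import annotations

from typing import Iterator


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield every node of a nested-dict tree in pre-order (hypothetical shape)."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def choose_strategy(root: dict, threshold: int = 50) -> str:
    """Return "beam" for trees with fewer than `threshold` nodes, else "block"."""
    node_count = sum(1 for _ in iter_nodes(root))
    return "beam" if node_count < threshold else "block"


def estimate_tokens(node: dict) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len((node.get("title", "") + " " + node.get("text", "")).split())


def partition_blocks(root: dict, max_tokens: int) -> list[list[dict]]:
    """Greedily pack pre-order nodes into blocks of at most `max_tokens`.

    A single node that exceeds the budget still gets a block of its own.
    """
    blocks: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for node in iter_nodes(root):
        cost = estimate_tokens(node)
        if current and used + cost > max_tokens:
            # Adding this node would overflow the budget: close the block.
            blocks.append(current)
            current, used = [], 0
        current.append(node)
        used += cost
    if current:
        blocks.append(current)
    return blocks


# Small illustrative tree (made-up content).
tree = {
    "title": "Report",
    "text": "annual summary",
    "children": [
        {"title": "Revenue", "text": "grew in every region", "children": []},
        {"title": "Risks", "text": "currency and supply chain", "children": []},
    ],
}

strategy = choose_strategy(tree)
blocks = partition_blocks(tree, max_tokens=8)
```

With this three-node tree, `choose_strategy` picks `"beam"`, and an 8-token budget packs the root and its first child into one block and the last child into a second. A real system would estimate cost with the target model's tokenizer and would likely keep siblings together where possible; the greedy pre-order packing here is just the simplest way to respect a token bound.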