Commit e6c9e0f

Merge branch 'main' into v0.3
2 parents 979b494 + efb9c1f commit e6c9e0f

1 file changed

Lines changed: 58 additions & 36 deletions

README.md

@@ -1,32 +1,52 @@
11
<div align="center">
22

3-
# ConDB
3+
<img src="https://docs.pageindex.ai/images/condb.png" alt="ConDB Banner" />
44

5-
<p align="center"><b>Context Database for Hierarchical Document Trees</b></p>
5+
<br/>
66

7-
<p align="center">
8-
Store, navigate, and query hierarchical document structures with LLM-powered reasoning retrieval.
9-
</p>
7+
# ConDB: The KV-Cache Native Context Database
8+
9+
<p align="center"><i>A new context database for reasoning-driven retrieval via tree search.<br/>
10+
Fast, context-aware retrieval at scale with up to 70% less token cost.</i></p>
1011

1112
</div>
1213

1314
---
1415

15-
## What is ConDB?
16+
## 🌲 What is ConDB?
17+
18+
**ConDB** (Context Database) is a tree-structured context database that uses LLM-powered **reasoning-based retrieval** via tree search instead of vector similarity, with no vector DB and no chunking. It accepts [PageIndex](https://github.com/VectifyAI/PageIndex)-compatible document trees, [ChatIndex](https://github.com/VectifyAI/ChatIndex) conversation trees, filesystem trees, and custom hierarchical JSON, and has no runtime dependency on PageIndex or ChatIndex. The LLM reasons over the tree, like a human expert using a table of contents, to locate relevant content.
19+
20+
### Why not vector search?
21+
22+
- **Similarity ≠ relevance** — vector search retrieves what looks similar, not what is truly relevant. Similar-looking chunks may differ in intent (low precision), while truly relevant information may be expressed in very different language and get missed entirely (low recall). True relevance requires reasoning
23+
- **Chunking breaks semantic continuity** — documents must be split into fixed-size segments to fit embedding models, causing context fragmentation that destroys their natural structure and cross-section relationships
24+
- **Retrieval is blind to context** — embedding models encode the query alone, ignoring conversational history, user intent, and other contextual signals
25+
26+
ConDB replaces this with **reasoning-based tree search**: the LLM performs node-level relevance classification over a hierarchical index, incorporating full context — making retrieval adaptive, explainable, and traceable.
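
The node-level classification described above can be illustrated with a minimal sketch. It is not ConDB's implementation: the `Node` class, `tree_search`, and `keyword_judge` are hypothetical names, and the keyword check merely stands in for the LLM relevance call. The point is that the search only descends into children judged relevant and keeps the full path to every hit, which is what makes the result traceable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical tree node: a title, a short summary, and child nodes."""
    title: str
    summary: str = ""
    children: List["Node"] = field(default_factory=list)


def keyword_judge(query: str, node: Node) -> bool:
    """Stand-in for an LLM relevance call (assumption): plain keyword overlap."""
    text = f"{node.title} {node.summary}".lower()
    return any(word in text for word in query.lower().split())


def tree_search(root: Node, query: str,
                is_relevant: Callable[[str, Node], bool]) -> List[List[str]]:
    """Return the title path to every leaf reached through relevant nodes only."""
    hits: List[List[str]] = []
    stack = [(root, [root.title])]  # the root is always expanded
    while stack:
        node, path = stack.pop()
        if not node.children:
            hits.append(path)  # the path doubles as the retrieval trace
            continue
        for child in node.children:
            if is_relevant(query, child):
                stack.append((child, path + [child.title]))
    return hits


root = Node("Annual Report", children=[
    Node("Financials", "revenue and costs", [
        Node("Revenue by region"),
        Node("Operating costs"),
    ]),
    Node("Governance", "board and audit", [Node("Board members")]),
])
hits = tree_search(root, "revenue", keyword_judge)
```

Swapping `keyword_judge` for a function that prompts a model with the query, the conversation history, and the node summary would give the context-aware behavior the paragraph describes; the traversal itself would not need to change.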
1627

17-
**ConDB** stores hierarchical document trees in a SQLite database and provides LLM-powered **reasoning-based retrieval** to query them — no vector DB, no chunking. It accepts pageindex-compatible trees, chat trees, and custom hierarchical JSON without taking a runtime code dependency on PageIndex itself.
28+
### What makes ConDB different
1829

19-
**Key capabilities:**
30+
- **Fast tree search at scale** — reasoning-driven tree search with block partitioning and parallel processing, supporting complex, context-aware retrieval over large hierarchical structures
31+
- **KV-cache native** — the first database designed around LLM KV-cache reuse. By caching intermediate results during tree search, ConDB reduces token usage by up to 70% with no loss in accuracy. The same efficiency gains extend to memory systems for long-context reasoning at scale
32+
- **Unified long-context infrastructure** — a single system for both static and dynamic long-context workloads
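
KV-cache reuse in general depends on successive prompts sharing an identical prefix. The sketch below shows one way to lay out prompts so that happens, by placing the stable block content first and the per-query text last. Whether ConDB orders its prompts this way is an assumption; `build_prompt`, `shared_prefix_len`, and `INSTRUCTIONS` are hypothetical names, and the prefix is measured in characters rather than tokens.

```python
import os

# Fixed instructions that every prompt starts with (hypothetical wording).
INSTRUCTIONS = "Classify each section below as relevant or not relevant.\n\n"


def build_prompt(block_text: str, query: str) -> str:
    """Put stable content first and the per-query part last."""
    return f"{INSTRUCTIONS}{block_text}\n\nQuestion: {query}\n"


def shared_prefix_len(first: str, second: str) -> int:
    """Length in characters of the common prefix of two prompts."""
    return len(os.path.commonprefix([first, second]))


block = "1. Revenue by region\n2. Operating costs\n3. Board members"
prompt_a = build_prompt(block, "What was revenue in Europe?")
prompt_b = build_prompt(block, "Who sits on the board?")

# Everything up to the question text is identical across the two prompts,
# so a prefix cache could serve that part without recomputing it.
reusable = shared_prefix_len(prompt_a, prompt_b)
```

If the query were placed before the block instead, the two prompts would diverge almost immediately and the shared prefix would shrink to the instructions alone.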
2033

21-
- **Hierarchical storage** — store document trees, chat trees, and custom hierarchical JSON in SQLite
22-
- **Reasoning-based retrieval** — LLM navigates the tree to find relevant content, like a human expert
34+
### Static long context
35+
Structured, persistent knowledge — documents (via [PageIndex](https://github.com/VectifyAI/PageIndex)), file systems, and codebases. Scalable retrieval within large, organized hierarchies.
36+
37+
### Dynamic long context
38+
Evolving, runtime context — agent memory, long conversations (via [ChatIndex](https://github.com/VectifyAI/ChatIndex)), and autoresearch. Systems can continuously update, retrieve, and reason over newly generated information.
39+
40+
### Key capabilities
41+
42+
- **Hierarchical storage** — document trees, chat trees, and custom hierarchical JSON in SQLite
2343
- **Multiple retrieval strategies** — beam search for small trees, block retrieval for large documents
24-
- **Multi-provider LLM support** — works with Anthropic (Claude) and OpenAI (GPT) out of the box
44+
- **Multi-provider LLM support** — Anthropic (Claude) and OpenAI (GPT) out of the box
2545
- **Extensible** — plug in custom storage backends, LLM providers, or retrieval strategies
2646

2747
---
2848

29-
## Quick Start
49+
## 🚀 Getting Started
3050

3151
### Install
3252

@@ -71,9 +91,7 @@ tree_id = ct.index_markdown_file("doc.md", tree_builder=build_markdown_tree)
7191
ct.close()
7292
```
7393

74-
---
75-
76-
## Configuration
94+
### Configuration
7795

7896
Create a `.env` file with your API keys:
7997

@@ -104,14 +122,14 @@ LLM_MODEL=claude-opus-4-6 python your_script.py
104122

105123
---
106124

107-
## Retrieval Strategies
125+
## 🔍 Retrieval Strategies
108126

109127
ConDB automatically selects the best retrieval strategy based on tree size:
110128

111129
| Strategy | Best for | How it works |
112130
|----------|----------|--------------|
113-
| **Beam** | Small trees (< 50 nodes) | LLM evaluates and selects promising branches at each depth level |
114-
| **Block** | Large documents (50+ nodes) | Splits tree into token-bounded blocks, LLM reasons over each block |
131+
| **Beam** | Small trees <br/> (< 50 nodes) | LLM evaluates and selects promising branches at each depth level |
132+
| **Block** | Large documents <br/> (50+ nodes) | Splits tree into token-bounded blocks, LLM reasons over each block. KV-cache native — caches intermediate block results to cut token usage by up to 70% |
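
As a rough sketch of the table above, the selection rule and the token-bounded split could look like the following. Only the 50-node threshold comes from the table; `choose_strategy`, `partition_into_blocks`, and the word-count token estimate are assumptions for illustration, not ConDB's actual code.

```python
from typing import List, Tuple

SMALL_TREE_NODE_LIMIT = 50  # threshold taken from the table above


def choose_strategy(node_count: int) -> str:
    """Beam for small trees, block for trees with 50 or more nodes."""
    return "beam" if node_count < SMALL_TREE_NODE_LIMIT else "block"


def estimate_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def partition_into_blocks(nodes: List[Tuple[str, str]],
                          max_tokens: int) -> List[List[str]]:
    """Greedily group (node_id, text) pairs, in order, under a token budget."""
    blocks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for node_id, text in nodes:
        cost = estimate_tokens(text)
        # Start a new block when adding this node would exceed the budget.
        if current and used + cost > max_tokens:
            blocks.append(current)
            current, used = [], 0
        current.append(node_id)  # an oversized node still gets its own block
        used += cost
    if current:
        blocks.append(current)
    return blocks


nodes = [
    ("a", "one two three"),
    ("b", "four five"),
    ("c", "six seven eight nine"),
    ("d", "ten"),
]
blocks = partition_into_blocks(nodes, max_tokens=5)
```

Each resulting block can then be handed to the LLM independently, which is also what makes parallel processing of blocks possible.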
115133

116134
You can also specify a strategy explicitly:
117135

@@ -121,7 +139,7 @@ result = db.query(tree_id, "question", strategy="block", beam_size=3)
121139

122140
---
123141

124-
## Benchmark
142+
## 📈 Benchmark Snapshot
125143

126144
Two benchmarks live under `bench/`.
127145

@@ -171,7 +189,9 @@ any `--doc` and any `--config` to benchmark a different document.
171189

172190
---
173191

174-
## Architecture
192+
## 🧩 Learn More
193+
194+
### Architecture
175195

176196
```
177197
contextdb/
@@ -190,12 +210,9 @@ contextdb/
190210
└── prompts/ # Jinja2 prompt templates
191211
```
192212

193-
---
213+
### Extending
194214

195-
## Extending
196-
197-
<details>
198-
<summary><b>Custom Storage Backend</b></summary>
215+
**Custom Storage Backend**
199216

200217
```python
201218
from contextdb import StorageProtocol
@@ -207,10 +224,8 @@ class MyStorage:
207224

208225
ct = ContextTree(storage=MyStorage())
209226
```
210-
</details>
211227

212-
<details>
213-
<summary><b>Custom LLM Provider</b></summary>
228+
**Custom LLM Provider**
214229

215230
```python
216231
from contextdb import LLMProtocol
@@ -221,25 +236,32 @@ class MyLLM:
221236

222237
ct = ContextTree("db.sqlite", llm=MyLLM())
223238
```
224-
</details>
225239

226-
---
227-
228-
## Testing
240+
### Testing
229241

230242
```bash
231243
./run_tests.sh all
232244
```
233245

234246
---
235247

236-
## Related Projects
248+
## 💬 Community
249+
250+
### Related Projects
237251

238-
- [**PageIndex**](https://github.com/VectifyAI/PageIndex) — one possible external producer of pageindex-compatible document trees
252+
- [**PageIndex**](https://github.com/VectifyAI/PageIndex) — vectorless, reasoning-based RAG that builds hierarchical tree indexes from long documents
253+
- [**ChatIndex**](https://github.com/VectifyAI/ChatIndex) — tree indexing for long conversations, enabling reasoning-based retrieval over chat histories
239254
- [**AgentFS**](https://github.com/anthropics/agentfs) — filesystem for AI agents
240255

256+
### Connect with Us
257+
258+
[![Twitter](https://img.shields.io/badge/Twitter-000000?style=for-the-badge&logo=x&logoColor=white)](https://x.com/PageIndexAI)&ensp;
259+
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/company/vectify-ai/)&ensp;
260+
[![Discord](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/VuXuf29EUj)&ensp;
261+
[![Contact Us](https://img.shields.io/badge/Contact_Us-3B82F6?style=for-the-badge&logo=envelope&logoColor=white)](https://ii2abc2jejf.typeform.com/to/tK3AXl8T)
262+
241263
---
242264

243-
## License
265+
Licensed under [Apache 2.0](LICENSE).
244266

245-
Apache-2.0
267+
© 2026 [Vectify AI](https://vectify.ai)
