Commit ed2de03

Merge branch 'v0.3' of github.com:VectifyAI/ConDB into v0.3
2 parents dad82a8 + e6c9e0f commit ed2de03

1 file changed

Lines changed: 58 additions & 36 deletions

README.md

````diff
@@ -1,32 +1,52 @@
 <div align="center">
 
-# ConDB
+<img src="https://docs.pageindex.ai/images/condb.png" alt="ConDB Banner" />
 
-<p align="center"><b>Context Database for Hierarchical Document Trees</b></p>
+<br/>
 
-<p align="center">
-Store, navigate, and query hierarchical document structures with LLM-powered reasoning retrieval.
-</p>
+# ConDB: The KV-Cache Native Context Database
+
+<p align="center"><i>A new context database for reasoning-driven retrieval via tree search.<br/>
+Fast, context-aware retrieval at scale with up to 70% less token cost.</i></p>
 
 </div>
 
 ---
 
-## What is ConDB?
+## 🌲 What is ConDB?
+
+**ConDB** (Context Database) is a tree-structured context database that uses LLM-powered **reasoning-based retrieval** via tree search instead of vector similarity — no vector DB, no chunking. It accepts [PageIndex](https://github.com/VectifyAI/PageIndex)-compatible document trees, [ChatIndex](https://github.com/VectifyAI/ChatIndex) conversation trees, filesystem trees, and custom hierarchical JSON — with no runtime dependency on either. The LLM reasons over the tree, like a human expert using a table of contents, to locate relevant content.
+
+### Why not vector search?
+
+- **Similarity ≠ relevance** — vector search retrieves what looks similar, not what is truly relevant. Similar-looking chunks may differ in intent (low accuracy), while truly relevant information may be expressed in very different language and get missed entirely (low recall). True relevance requires reasoning
+- **Chunking breaks semantic continuity** — documents must be split into fixed-size segments to fit embedding models, causing context fragmentation that destroys their natural structure and cross-section relationships
+- **Retrieval is blind to context** — embedding models encode the query alone, ignoring conversational history, user intent, and other contextual signals
+
+ConDB replaces this with **reasoning-based tree search**: the LLM performs node-level relevance classification over a hierarchical index, incorporating full context — making retrieval adaptive, explainable, and traceable.
 
-**ConDB** stores hierarchical document trees in a SQLite database and provides LLM-powered **reasoning-based retrieval** to query them — no vector DB, no chunking. It accepts pageindex-compatible trees, chat trees, and custom hierarchical JSON without taking a runtime code dependency on PageIndex itself.
+### What makes ConDB different
 
-**Key capabilities:**
+- **Fast tree search at scale** — reasoning-driven tree search with block partitioning and parallel processing, supporting complex, context-aware retrieval over large hierarchical structures
+- **KV-cache native** — the first database designed around LLM KV-cache reuse. By caching intermediate results during tree search, ConDB reduces token usage by up to 70% with no loss in accuracy. The same efficiency gains extend to memory systems for long-context reasoning at scale
+- **Unified long-context infrastructure** — a single system for both static and dynamic long-context workloads
 
-- **Hierarchical storage** — store document trees, chat trees, and custom hierarchical JSON in SQLite
-- **Reasoning-based retrieval** — LLM navigates the tree to find relevant content, like a human expert
+### Static long context
+Structured, persistent knowledge — documents (via [PageIndex](https://github.com/VectifyAI/PageIndex)), file systems, and codebases. Scalable retrieval within large, organized hierarchies.
+
+### Dynamic long context
+Evolving, runtime context — agent memory, long conversations (via [ChatIndex](https://github.com/VectifyAI/ChatIndex)), and autoresearch. Systems can continuously update, retrieve, and reason over newly generated information.
+
+### Key capabilities
+
+- **Hierarchical storage** — document trees, chat trees, and custom hierarchical JSON in SQLite
 - **Multiple retrieval strategies** — beam search for small trees, block retrieval for large documents
-- **Multi-provider LLM support** — works with Anthropic (Claude) and OpenAI (GPT) out of the box
+- **Multi-provider LLM support** — Anthropic (Claude) and OpenAI (GPT) out of the box
 - **Extensible** — plug in custom storage backends, LLM providers, or retrieval strategies
 
 ---
 
-## Quick Start
+## 🚀 Getting Started
 
 ### Install
 
````
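This hunk describes retrieval as node-level relevance classification by an LLM and lists beam search as the strategy for small trees. Below is a minimal sketch of level-by-level beam search over a tree. It assumes a hypothetical `Node` class and a `score` callable that stands in for the LLM's relevance judgement. None of these names come from ConDB's API, and the keyword scorer in the usage lines is only a toy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical node shape for illustration: a title plus child nodes.
    title: str
    children: List["Node"] = field(default_factory=list)


def beam_search(root: Node, query: str,
                score: Callable[[str, Node], float],
                beam_size: int = 3) -> List[Node]:
    """Keep the beam_size best-scoring nodes at each depth; return leaves reached.

    `score` is a stand-in for an LLM relevance call (an assumption, not ConDB's API).
    """
    beam = [root]
    results: List[Node] = []
    while beam:
        # Leaves in the current beam are retrieved; inner nodes get expanded.
        results.extend(n for n in beam if not n.children)
        frontier = [c for n in beam for c in n.children]
        # Rank the next depth level and keep only the most promising branches.
        frontier.sort(key=lambda n: score(query, n), reverse=True)
        beam = frontier[:beam_size]
    return results


def keyword_score(query: str, node: Node) -> int:
    # Toy stand-in for an LLM: count query words that appear in the title.
    return sum(w in node.title.lower() for w in query.lower().split())


doc = Node("Annual report", [
    Node("Financials", [Node("Revenue by region"), Node("Operating costs")]),
    Node("Risk factors", [Node("Currency risk"), Node("Supply chain risk")]),
])
hits = beam_search(doc, "financials revenue", keyword_score, beam_size=1)
print([n.title for n in hits])  # ['Revenue by region']
```

At most `beam_size` nodes survive each level, so the number of scoring calls per level is bounded by `beam_size` times the branching factor.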

````diff
@@ -71,9 +91,7 @@ tree_id = ct.index_markdown_file("doc.md", tree_builder=build_markdown_tree)
 ct.close()
 ```
 
----
-
-## Configuration
+### Configuration
 
 Create a `.env` file with your API keys:
 
````

````diff
@@ -104,14 +122,14 @@ LLM_MODEL=claude-opus-4-6 python your_script.py
 
 ---
 
-## Retrieval Strategies
+## 🔍 Retrieval Strategies
 
 ConDB automatically selects the best retrieval strategy based on tree size:
 
 | Strategy | Best for | How it works |
 |----------|----------|--------------|
-| **Beam** | Small trees (< 50 nodes) | LLM evaluates and selects promising branches at each depth level |
-| **Block** | Large documents (50+ nodes) | Splits tree into token-bounded blocks, LLM reasons over each block |
+| **Beam** | Small trees <br/> (< 50 nodes) | LLM evaluates and selects promising branches at each depth level |
+| **Block** | Large documents <br/> (50+ nodes) | Splits tree into token-bounded blocks, LLM reasons over each block. KV-cache native — caches intermediate block results to cut token usage by up to 70% |
 
 You can also specify a strategy explicitly:
 
````
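The strategy table ties the choice of strategy to tree size and describes block retrieval as splitting the tree into token-bounded blocks. The sketch below illustrates both ideas under stated assumptions. The 50-node cutoff comes from the table. Nodes are hypothetical `{"id", "text"}` records in document order. Token counts use a rough 4-characters-per-token estimate rather than a real tokenizer. The sketch covers partitioning only and does not model the per-block LLM call, parallelism, or KV-cache reuse. `choose_strategy` and `partition_blocks` are illustrative names, not ConDB functions.

```python
from typing import Dict, List

# Hypothetical flat node record {"id": ..., "text": ...}; not ConDB's schema.
Node = Dict[str, str]

BEAM_MAX_NODES = 50  # cutoff taken from the README's strategy table


def choose_strategy(node_count: int) -> str:
    # Beam search below the cutoff, block retrieval at or above it.
    return "beam" if node_count < BEAM_MAX_NODES else "block"


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token (not a real tokenizer).
    return max(1, len(text) // 4)


def partition_blocks(nodes: List[Node], max_tokens: int) -> List[List[Node]]:
    """Greedily pack nodes, in document order, into blocks of at most max_tokens.

    A single node larger than the budget gets a block of its own instead of being split.
    """
    blocks: List[List[Node]] = []
    current: List[Node] = []
    used = 0
    for node in nodes:
        cost = estimate_tokens(node["text"])
        # Close the current block if adding this node would exceed the budget.
        if current and used + cost > max_tokens:
            blocks.append(current)
            current, used = [], 0
        current.append(node)
        used += cost
    if current:
        blocks.append(current)
    return blocks


nodes = [{"id": str(i), "text": "x" * 400} for i in range(60)]  # ~100 tokens each
print(choose_strategy(len(nodes)))  # block
blocks = partition_blocks(nodes, max_tokens=1000)
print([len(b) for b in blocks])  # [10, 10, 10, 10, 10, 10]
```

The sketch keeps each block to consecutive nodes in document order. This is one simple way to preserve locality within a block. The README does not specify how ConDB orders or sizes its blocks.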

````diff
@@ -121,7 +139,7 @@ result = db.query(tree_id, "question", strategy="block", beam_size=3)
 
 ---
 
-## Benchmark
+## 📈 Benchmark Snapshot
 
 Two benchmarks live under `bench/`.
 
````

````diff
@@ -183,7 +201,9 @@ any `--doc` and any `--config` to benchmark a different document.
 
 ---
 
-## Architecture
+## 🧩 Learn More
+
+### Architecture
 
 ```
 contextdb/
````

````diff
@@ -202,12 +222,9 @@ contextdb/
 └── prompts/ # Jinja2 prompt templates
 ```
 
----
+### Extending
 
-## Extending
-
-<details>
-<summary><b>Custom Storage Backend</b></summary>
+**Custom Storage Backend**
 
 ```python
 from contextdb import StorageProtocol
````

````diff
@@ -219,10 +236,8 @@ class MyStorage:
 
 ct = ContextTree(storage=MyStorage())
 ```
-</details>
 
-<details>
-<summary><b>Custom LLM Provider</b></summary>
+**Custom LLM Provider**
 
 ```python
 from contextdb import LLMProtocol
````

````diff
@@ -233,25 +248,32 @@ class MyLLM:
 
 ct = ContextTree("db.sqlite", llm=MyLLM())
 ```
-</details>
 
----
-
-## Testing
+### Testing
 
 ```bash
 ./run_tests.sh all
 ```
 
 ---
 
-## Related Projects
+## 💬 Community
+
+### Related Projects
 
-- [**PageIndex**](https://github.com/VectifyAI/PageIndex) — one possible external producer of pageindex-compatible document trees
+- [**PageIndex**](https://github.com/VectifyAI/PageIndex) — vectorless, reasoning-based RAG that builds hierarchical tree indexes from long documents
+- [**ChatIndex**](https://github.com/VectifyAI/ChatIndex) — tree indexing for long conversations, enabling reasoning-based retrieval over chat histories
 - [**AgentFS**](https://github.com/anthropics/agentfs) — filesystem for AI agents
 
+### Connect with Us
+
+[![Twitter](https://img.shields.io/badge/Twitter-000000?style=for-the-badge&logo=x&logoColor=white)](https://x.com/PageIndexAI)&ensp;
+[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/company/vectify-ai/)&ensp;
+[![Discord](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/VuXuf29EUj)&ensp;
+[![Contact Us](https://img.shields.io/badge/Contact_Us-3B82F6?style=for-the-badge&logo=envelope&logoColor=white)](https://ii2abc2jejf.typeform.com/to/tK3AXl8T)
+
 ---
 
-## License
+Licensed under [Apache 2.0](LICENSE).
 
-Apache-2.0
+© 2026 [Vectify AI](https://vectify.ai)
````
