Commit 10f5a76

lvca and bilgeyucel authored
Add ArcadeDB document store integration (#411)
* Add ArcadeDB document store integration

ArcadeDB is a multi-model database with native HNSW vector search, document storage, and SQL metadata filtering in a single backend.

* Update integrations/arcadedb.md

Co-authored-by: Bilge Yücel <bilge.yucel@deepset.ai>

---------

Co-authored-by: Bilge Yücel <bilge.yucel@deepset.ai>
1 parent 1eb3233 commit 10f5a76

2 files changed

Lines changed: 178 additions & 0 deletions

integrations/arcadedb.md

@@ -0,0 +1,178 @@
---
layout: integration
name: ArcadeDB
description: Use ArcadeDB as a document store with native HNSW vector search for Haystack
authors:
  - name: ArcadeData Ltd
    socials:
      github: ArcadeData
      twitter: arcade_db
pypi: https://pypi.org/project/arcadedb-haystack/
repo: https://github.com/ArcadeData/arcadedb-haystack
type: Document Store
report_issue: https://github.com/ArcadeData/arcadedb-haystack/issues
logo: /logos/arcadedb.png
version: Haystack 2.0
toc: true
---

### Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [License](#license)

## Overview

An integration of [ArcadeDB](https://arcadedb.com) with [Haystack](https://docs.haystack.deepset.ai/docs/intro) by [ArcadeData](https://arcadedata.com).

RAG setups often run separate backends for document storage, vector search, and metadata filtering. ArcadeDB covers all three in a single multi-model database:

- **Document storage**: vertex-based records with flexible MAP metadata
- **HNSW vector search**: native approximate nearest neighbor index via `vectorNeighbors()` (cosine, euclidean, dot product)
- **SQL filtering**: full SQL WHERE clauses on metadata fields
- **No special drivers**: pure HTTP/JSON API, no binary protocol or custom driver required

The library provides an `ArcadeDBDocumentStore` that implements the Haystack [DocumentStore protocol](https://docs.haystack.deepset.ai/docs/document-store#documentstore-protocol), plus a pipeline-ready retriever component:

- **ArcadeDBDocumentStore**: stores Documents as ArcadeDB vertices, with embeddings indexed by a dedicated HNSW Vector Index for dense retrieval.
- **ArcadeDBEmbeddingRetriever**: a [retriever component](https://docs.haystack.deepset.ai/docs/retrievers) that queries the vector index to find related Documents, with support for metadata filtering and runtime parameter overrides.

```text
                                   +-----------------------------+
                                   |      ArcadeDB Database      |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |    Document    |     |
                write_documents    |      +----------------+     |
          +------------------------+----->|   properties   |     |
          |                        |      |                |     |
+---------+----------+             |      |   embedding    |     |
|                    |             |      +--------+-------+     |
|  ArcadeDBDocument  |             |               |             |
|       Store        |             |               |index/query  |
+---------+----------+             |               |             |
          |                        |     +---------+---------+   |
          |                        |     | HNSW Vector Index |   |
          +----------------------->|     |                   |   |
             _embedding_retrieval  |     | (for embedding)   |   |
                                   |     +-------------------+   |
                                   |                             |
                                   +-----------------------------+
```

In the above diagram:

- `Document` is an ArcadeDB vertex type
- `properties` are Document [attributes](https://docs.haystack.deepset.ai/docs/data-classes#document) stored as vertex properties
- `embedding` is a vector property of type `LIST[FLOAT]`, indexed by ArcadeDB's native HNSW index
- `HNSW Vector Index` provides approximate nearest neighbor search via `vectorNeighbors()`

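To make the role of the index more concrete, here is a minimal plain-Python sketch of the three similarity functions listed in the Overview, together with an exhaustive (exact) top-k search. An HNSW index approximates this kind of ranking without scoring every vector. This is not ArcadeDB's implementation; the function names and toy vectors are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: compares direction only."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, vectors, k, metric="cosine"):
    """Brute-force search: score every stored vector and keep the best k ids."""
    if metric == "euclidean":
        ranked = sorted(vectors, key=lambda vid: euclidean(query, vectors[vid]))
    else:
        score = cosine if metric == "cosine" else dot
        ranked = sorted(vectors, key=lambda vid: score(query, vectors[vid]), reverse=True)
    return ranked[:k]


# Toy 2-dimensional "embeddings" (illustrative values only)
vectors = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 6.0]}
query = [1.0, 0.2]

print(exact_top_k(query, vectors, k=2, metric="cosine"))     # ['a', 'b']
print(exact_top_k(query, vectors, k=2, metric="dot"))        # ['c', 'a']
print(exact_top_k(query, vectors, k=2, metric="euclidean"))  # ['a', 'b']
```

The dot-product ranking differs because vector `c` has a large magnitude, which is one reason to choose `similarity_function` to match how your embedding model was trained.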
## Installation

`arcadedb-haystack` can be installed using pip:

```bash
pip install arcadedb-haystack
```

## Usage

Once installed, you can use `ArcadeDBDocumentStore` like any other document store that supports embeddings.

```python
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=384,
    similarity_function="cosine",
)
```

You will need a running ArcadeDB instance. The simplest way is with Docker:

```bash
docker run -d -p 2480:2480 \
  -e JAVA_OPTS="-Darcadedb.server.rootPassword=arcadedb" \
  arcadedata/arcadedb:latest
```

Set credentials via environment variables:

```bash
export ARCADEDB_USERNAME=root
export ARCADEDB_PASSWORD=arcadedb
```

111+
### Writing documents
112+
113+
```python
114+
from haystack import Document
115+
from haystack.document_stores.types import DuplicatePolicy
116+
117+
documents = [
118+
Document(
119+
content="ArcadeDB supports graphs, documents, and vectors.",
120+
meta={"source": "docs", "category": "database"},
121+
)
122+
]
123+
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
124+
```
125+
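The `policy` argument decides what happens when a document with the same ID already exists. As a rough illustration of the usual Haystack semantics (overwrite, skip, or fail), here is a small dictionary-backed sketch. The `Policy` enum and `write` function are stand-ins invented for this example, and whether `ArcadeDBDocumentStore` behaves identically in every edge case is an assumption.

```python
from enum import Enum


class Policy(Enum):
    """Stand-in for haystack's DuplicatePolicy (illustration only)."""

    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store, docs, policy):
    """Write (doc_id, content) pairs into a dict and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is Policy.FAIL:
                raise ValueError(f"Document {doc_id!r} already exists")
            if policy is Policy.SKIP:
                continue  # keep the stored version untouched
        store[doc_id] = content  # new document, or OVERWRITE of an existing one
        written += 1
    return written


store = {}
print(write(store, [("1", "first version")], Policy.OVERWRITE))  # 1
print(write(store, [("1", "second version")], Policy.SKIP))      # 0
print(write(store, [("1", "third version")], Policy.OVERWRITE))  # 1
print(store["1"])                                                # third version
```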
### Retrieving documents

`ArcadeDBEmbeddingRetriever` can be used in a pipeline to retrieve documents by querying the HNSW vector index with an embedded query, optionally combined with metadata filtering:

```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=384,  # all-MiniLM-L6-v2 produces 384-dimensional embeddings
)

# Index documents with embeddings
documents = [
    Document(content="My name is Morgan and I live in Paris.", meta={"release_date": "2018-12-09"})
]

document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
document_embedder.warm_up()  # load the model before calling run() outside a pipeline
documents_with_embeddings = document_embedder.run(documents)
document_store.write_documents(documents_with_embeddings["documents"])

# Build retrieval pipeline
pipeline = Pipeline()
pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
pipeline.add_component("retriever", ArcadeDBEmbeddingRetriever(document_store=document_store))
pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(
    data={
        "text_embedder": {"text": "What cities do people live in?"},
        "retriever": {
            "top_k": 5,
            "filters": {"field": "release_date", "operator": "==", "value": "2018-12-09"},
        },
    }
)

# Documents returned by the retriever
documents = result["retriever"]["documents"]
```

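The `filters` argument above uses Haystack's dictionary filter syntax, which also allows conditions to be combined with logical operators such as `AND` and `OR`. The sketch below shows a compound filter and a deliberately simplified evaluator that checks it against a metadata dictionary, so the semantics can be tried without a running database. The `matches` function is hypothetical, the field names follow the example above, and which operators the ArcadeDB integration translates to SQL is an assumption to verify against the repository.

```python
import operator

# Comparison operators handled by this simplified evaluator
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(filters, meta):
    """Evaluate a Haystack-style filter dict against one metadata dict (simplified)."""
    op = filters["operator"]
    if op in ("AND", "OR"):
        results = [matches(condition, meta) for condition in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    if filters["field"] not in meta:
        return False  # simplification: a missing field never matches
    return COMPARATORS[op](meta[filters["field"]], filters["value"])


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "release_date", "operator": ">=", "value": "2018-01-01"},
        {"field": "category", "operator": "in", "value": ["database", "search"]},
    ],
}

print(matches(filters, {"release_date": "2018-12-09", "category": "database"}))  # True
print(matches(filters, {"release_date": "2017-05-01", "category": "database"}))  # False
```

ISO-formatted date strings compare correctly as plain strings, which is why the `>=` condition works here without date parsing.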
### More examples

You can find more examples in the [repository](https://github.com/ArcadeData/arcadedb-haystack/tree/main/examples):

- [embedding_retrieval.py](https://github.com/ArcadeData/arcadedb-haystack/blob/main/examples/embedding_retrieval.py): full workflow demonstrating document indexing and vector similarity retrieval with ArcadeDB.

## License

`arcadedb-haystack` is distributed under the terms of the [Apache 2.0](https://spdx.org/licenses/Apache-2.0.html) license.

logos/arcadedb.png

24.1 KB