description: Use Supabase as a Document Store for Haystack — pgvector for embedding search, PGroonga for full-text BM25 search, and Supabase Storage for file downloads
authors:
- name: deepset
socials:
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
  - [pgvector Components](#pgvector-components)
  - [PGroonga Components](#pgroonga-components)
  - [Supabase Storage](#supabase-storage)
- [License](#license)

## Overview

[Supabase](https://supabase.com/) is an open-source Postgres platform. The `supabase-haystack` package provides three sets of components for building Haystack pipelines:

1. **pgvector**: dense embedding and keyword retrieval via the `pgvector` extension (pre-installed on Supabase).
2. **PGroonga**: full-text BM25 search via the `pgroonga` extension (no embeddings required).
3. **Supabase Storage**: download files from a Supabase Storage bucket into `ByteStream` objects ready for indexing.

The pgvector components are a thin wrapper around [`pgvector-haystack`](https://haystack.deepset.ai/integrations/pgvector-documentstore) and inherit all of its functionality: three vector similarity functions (`cosine_similarity`, `inner_product`, `l2_distance`), exact or HNSW (approximate nearest-neighbor) search, metadata filtering, and keyword retrieval via PostgreSQL's `ts_rank_cd`. The two Supabase-specific defaults are that the connection string is read from `SUPABASE_DB_URL` and that `create_extension` is `False` (Supabase enables pgvector for you).
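
The three `vector_function` options can rank the same documents differently. The plain-Python sketch below computes each measure on toy vectors to show why the choice matters; it only illustrates the math and does not call pgvector. As in `pgvector-haystack`, higher scores are better for `cosine_similarity` and `inner_product`, while for `l2_distance` the smallest distance is the best match. The document ids and vectors are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Angle-based similarity; higher is better and vector length is ignored."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product; higher is better and longer vectors score higher."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    """Straight-line (Euclidean) distance; lower is better."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(
    query: Vector,
    docs: Dict[str, Vector],
    fn: Callable[[Vector, Vector], float],
    higher_is_better: bool,
) -> List[str]:
    """Order document ids from best to worst match under `fn`."""
    return sorted(docs, key=lambda doc_id: fn(query, docs[doc_id]), reverse=higher_is_better)


query = [1.0, 0.0]
docs = {
    "long_off_axis": [10.0, 1.0],  # large magnitude, slightly off the query direction
    "short_on_axis": [0.9, 0.0],   # small magnitude, exactly the query direction
    "orthogonal": [0.0, 1.0],      # unrelated direction
}

print(rank(query, docs, cosine_similarity, higher_is_better=True))
# ['short_on_axis', 'long_off_axis', 'orthogonal']
print(rank(query, docs, inner_product, higher_is_better=True))
# ['long_off_axis', 'short_on_axis', 'orthogonal']
print(rank(query, docs, l2_distance, higher_is_better=False))
# ['short_on_axis', 'orthogonal', 'long_off_axis']
```

If your embedding model outputs unit-length vectors, all three measures produce the same ordering, so the choice mostly matters for unnormalized embeddings.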

## Installation

```bash
pip install supabase-haystack
```

For the pgvector components, set the `SUPABASE_DB_URL` environment variable to your Supabase database connection string.

For local development, the [`docker-compose.yml`](https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/supabase/docker-compose.yml) in the repo starts a Postgres instance with pgvector on `localhost:5432`.
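
A missing or mistyped `SUPABASE_DB_URL` usually only surfaces as a connection error at runtime, so a small pre-flight check can save some debugging. The sketch below is a hypothetical helper (not part of `supabase-haystack`) that assumes a standard `postgresql://` URI. The example URL uses placeholder credentials, not necessarily the ones configured in the repo's `docker-compose.yml`.

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def check_db_url(environ: Mapping[str, str] = os.environ, var_name: str = "SUPABASE_DB_URL") -> str:
    """Fail fast if the connection string is missing or malformed; return "host:port"."""
    raw = environ.get(var_name)
    if not raw:
        raise RuntimeError(f"{var_name} is not set")
    parsed = urlparse(raw)
    # libpq accepts both the postgresql:// and postgres:// URI schemes.
    if parsed.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"{var_name} should be a postgresql:// URI, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError(f"{var_name} has no host")
    # Postgres listens on 5432 by default when the URI omits a port.
    return f"{parsed.hostname}:{parsed.port or 5432}"


# Placeholder credentials, for illustration only.
print(check_db_url({"SUPABASE_DB_URL": "postgresql://postgres:postgres@localhost:5432/postgres"}))
# localhost:5432
```

Calling `check_db_url()` with no arguments reads the real process environment.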

## Usage

### pgvector Components

These components use Supabase Postgres with the `pgvector` extension for embedding-based and keyword retrieval.

- `SupabasePgvectorDocumentStore`: stores Haystack `Document` objects (content, embedding, metadata, optional blob) in a Postgres table, and handles writes, filtering, and both sync and async retrieval.
- `SupabasePgvectorEmbeddingRetriever`: dense Retriever that compares a query embedding against stored embeddings using the configured `vector_function` (`cosine_similarity`, `inner_product`, or `l2_distance`).
- `SupabasePgvectorKeywordRetriever`: keyword Retriever that scores documents with PostgreSQL's `ts_rank_cd`, considering term frequency, proximity, and section weight.
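
Metadata filters use Haystack's filter dictionaries: a comparison condition names a `field`, an `operator`, and a `value`, and a logical condition (`AND`, `OR`) wraps a list of `conditions`. The toy evaluator below is a hypothetical helper that applies a subset of that syntax to an in-memory metadata dict, just to make the structure concrete. The document store does not work this way (it applies filters in the database), and its handling of edge cases such as missing fields may differ from this simplification.

```python
import operator
from typing import Any, Callable, Dict

# A subset of Haystack's comparison operators, mapped to Python callables.
COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda field_value, allowed: field_value in allowed,
}


def matches(filters: Dict[str, Any], meta: Dict[str, Any]) -> bool:
    """Return True if `meta` satisfies the (toy) Haystack-style filter dict."""
    op = filters["operator"]
    if op in ("AND", "OR"):
        results = [matches(cond, meta) for cond in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    # Comparison conditions address metadata as "meta.<key>".
    key = filters["field"].removeprefix("meta.")
    if key not in meta:
        return False  # simplification: a missing field never matches
    return bool(COMPARISONS[op](meta[key], filters["value"]))


flt = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
print(matches(flt, {"lang": "en", "year": 2023}))  # True
print(matches(flt, {"lang": "de", "year": 2023}))  # False
```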

#### Indexing

```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
```

For keyword or hybrid (dense + keyword) retrieval, use `SupabasePgvectorKeywordRetriever` instead of the embedding retriever, or alongside it. It takes a `query` string directly, and its results can be merged with the embedding retriever's through `DocumentJoiner` using reciprocal rank fusion (`join_mode="reciprocal_rank_fusion"`).
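
Reciprocal rank fusion scores each document by summing `1 / (k + rank)` over every result list it appears in, so documents that both retrievers rank well rise to the top. The plain-Python sketch below uses the commonly cited constant `k = 60` and 1-based ranks; it illustrates the technique only, and `DocumentJoiner`'s own constant and score normalization may differ. The document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # A higher position (smaller rank) contributes a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by id for a stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. from the embedding retriever
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from the keyword retriever
print(reciprocal_rank_fusion([dense_hits, keyword_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because only ranks are used, the two retrievers' raw scores (cosine similarity versus `ts_rank_cd`) never need to be put on a common scale.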

### PGroonga Components

These components use the [PGroonga](https://pgroonga.github.io/) PostgreSQL extension for fast, multilingual full-text BM25 search. No embeddings are required — retrieval works on plain text queries.
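
BM25 ranks a document higher when it contains the query terms often, when those terms are rare across the collection, and when the document is short relative to the average. The sketch below implements textbook Okapi BM25 (with the usual defaults `k1 = 1.2`, `b = 0.75` and a smoothed IDF that stays positive) over whitespace-split text, purely to illustrate the scoring idea. PGroonga computes scores inside the database with its own tokenizers, so its exact numbers will differ.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against `query` with textbook Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rare terms get a larger IDF; this smoothed form never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is scaled by document length via b.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "French is spoken in France",
    "Spanish is spoken in Spain and Mexico",
    "Python is a programming language",
]
# Documents 0 and 1 score above zero (document 0 higher, being shorter).
# Document 2 scores 0: there is no stemming, so "languages" != "language".
print(bm25_scores("spoken languages", docs))
```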

**Prerequisites:** enable the PGroonga extension in your Supabase project:

```sql
CREATE EXTENSION IF NOT EXISTS pgroonga;
```

- `SupabaseGroongaDocumentStore`: stores `Document` objects in a Postgres table with a PGroonga index on the content column. Supports both sync and async operations. Authenticates via `SUPABASE_SERVICE_KEY` and a project URL rather than a raw connection string.
- `SupabaseGroongaBM25Retriever`: full-text Retriever backed by `SupabaseGroongaDocumentStore`. Accepts a plain text `query` and returns ranked documents using PGroonga BM25 scoring. Supports both `run()` (sync) and `run_async()` (async).

#### Indexing

```python
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret

from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore

result = retriever.run(query="languages spoken around the world")
for doc in result["documents"]:
    print(doc.score, "—", doc.content)
```
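
`SupabaseGroongaBM25Retriever` also provides `run_async()`, which helps when one process serves several queries at once. The sketch below fans out queries with `asyncio.gather`. It assumes `run_async()` accepts `query` as a keyword argument and returns a dict with a `"documents"` key, as `run()` does in the example above. `StubRetriever` is a hypothetical stand-in so the snippet runs without a Supabase project; in real use you would pass your retriever instance instead.

```python
import asyncio
from typing import Any, Dict, List


async def run_queries_concurrently(retriever: Any, queries: List[str]) -> Dict[str, list]:
    """Issue one run_async call per query and await them all concurrently."""
    results = await asyncio.gather(*(retriever.run_async(query=q) for q in queries))
    return {q: r["documents"] for q, r in zip(queries, results)}


class StubRetriever:
    """Hypothetical stand-in for SupabaseGroongaBM25Retriever (local testing only)."""

    async def run_async(self, query: str) -> Dict[str, list]:
        await asyncio.sleep(0)  # yield to the event loop, as a real network call would
        return {"documents": [f"match for {query!r}"]}


if __name__ == "__main__":
    out = asyncio.run(run_queries_concurrently(StubRetriever(), ["french", "spanish"]))
    print(out)
```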

### Supabase Storage

- `SupabaseBucketDownloader`: downloads files from a Supabase Storage bucket and returns them as `ByteStream` objects. Each stream carries `meta["file_path"]` and `meta["bucket_name"]`. Supports optional extension filtering (e.g. `[".pdf", ".txt"]`). Designed to feed directly into document converters in indexing pipelines.
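
The extension filter can be pictured as a suffix check on each file path in the bucket. The sketch below is hypothetical: `filter_by_extension` is not part of the package, and whether the real component matches extensions case-insensitively is an assumption made here for illustration. The file paths are made up.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def filter_by_extension(paths: Iterable[str], extensions: Optional[List[str]] = None) -> List[str]:
    """Keep only paths whose suffix is in `extensions`; keep everything if it is None."""
    if extensions is None:
        return list(paths)
    # Compare lowercase suffixes so "Q2.PDF" matches ".pdf" (an assumption, see above).
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]


files = ["reports/q1.pdf", "notes/todo.txt", "images/logo.png", "archive/Q2.PDF"]
print(filter_by_extension(files, [".pdf", ".txt"]))
# ['reports/q1.pdf', 'notes/todo.txt', 'archive/Q2.PDF']
```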

```python
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.writers import DocumentWriter
from haystack.utils import Secret

from haystack_integrations.components.downloaders.supabase import SupabaseBucketDownloader
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
```