Commit c3e0226

docs: adding missing components to Supabase documentation (#507)
* initial import
* separating indexing and retrieval
1 parent 26e229b commit c3e0226

1 file changed

Lines changed: 116 additions & 12 deletions

integrations/supabase.md

@@ -1,7 +1,7 @@
---
layout: integration
name: Supabase
description: Use Supabase as a Document Store for Haystack, with pgvector for embedding search, PGroonga for full-text BM25 search, and Supabase Storage for file downloads
authors:
- name: deepset
socials:
@@ -21,47 +21,60 @@ toc: true
- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [pgvector Components](#pgvector-components)
- [PGroonga Components](#pgroonga-components)
- [Supabase Storage](#supabase-storage)
- [License](#license)

## Overview

[Supabase](https://supabase.com/) is an open-source Postgres platform. The `supabase-haystack` package provides three sets of components for building Haystack pipelines:

1. **pgvector**: dense embedding retrieval via the `pgvector` extension (pre-installed on Supabase), plus keyword retrieval via PostgreSQL full-text search.
2. **PGroonga**: full-text BM25 search via the `pgroonga` extension (no embeddings required).
3. **Supabase Storage**: download files from a Supabase Storage bucket into `ByteStream` objects ready for indexing.

The pgvector components are a thin wrapper around [`pgvector-haystack`](https://haystack.deepset.ai/integrations/pgvector-documentstore), inheriting all of its functionality: three vector similarity functions (`cosine_similarity`, `inner_product`, `l2_distance`), exact or HNSW search, metadata filtering, and keyword retrieval via PostgreSQL's `ts_rank_cd`. The two Supabase-specific defaults are that the connection string is read from `SUPABASE_DB_URL` and that `create_extension` is `False` (Supabase enables pgvector for you).
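
To make the three `vector_function` options concrete, here is a minimal pure-Python sketch of what each one computes for a query/document pair and which direction counts as "better". It only illustrates the math; pgvector performs these computations inside the database, and the helper names below (including `rank`) are hypothetical, not part of `supabase-haystack`.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query: Sequence[float], docs: Dict[str, Sequence[float]], vector_function) -> list:
    """Return document ids ordered from most to least similar to the query."""
    scores = {doc_id: vector_function(query, emb) for doc_id, emb in docs.items()}
    # Similarities sort descending; a distance sorts ascending.
    reverse = vector_function is not l2_distance
    return sorted(scores, key=scores.get, reverse=reverse)


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [2.0, 0.0]}
    for fn in (cosine_similarity, inner_product, l2_distance):
        print(fn.__name__, rank(query, docs, fn))
```

Document `c` points in the same direction as the query but is twice as long, so it ranks first under cosine similarity and inner product yet loses to `a` under L2 distance. This is why the `vector_function` should typically match the similarity the embedding model was trained for.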
## Installation

```bash
pip install supabase-haystack
```

For the pgvector components, set the database connection string:

```bash
export SUPABASE_DB_URL="postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:5432/postgres"
```
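
Before constructing the store, it can help to sanity-check `SUPABASE_DB_URL`. The sketch below uses only the standard library to split the URL into the parts Postgres will connect with, and to assemble a URL in the template format above. Both helper names are hypothetical, and the real host for your project should be copied from its connection settings rather than built by hand. One pitfall it highlights: characters such as `@`, `:` or `/` in the database password must be percent-encoded inside a URL.

```python
import os
from urllib.parse import quote, unquote, urlsplit


def describe_db_url(url: str) -> dict:
    """Split a postgresql:// URL into its components (the password is not returned)."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": unquote(parts.username or ""),
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "has_password": parts.password is not None,
    }


def build_db_url(project_ref: str, password: str, region: str) -> str:
    """Assemble a pooler URL following the template above, escaping the password."""
    safe_password = quote(password, safe="")  # e.g. "@" -> "%40", "/" -> "%2F"
    return (
        f"postgresql://postgres.{project_ref}:{safe_password}"
        f"@aws-0-{region}.pooler.supabase.com:5432/postgres"
    )


if __name__ == "__main__":
    url = os.environ.get("SUPABASE_DB_URL") or build_db_url("abcd1234", "p@ss/word", "eu-central-1")
    print(describe_db_url(url))
```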
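
For the PGroonga and Storage components, you need the project URL and the service role key. The examples below pass the URL (`https://<project-ref>.supabase.co`) as `supabase_url` and read the key from the environment:

```bash
export SUPABASE_SERVICE_KEY="<your-service-role-key>"
```
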
For local development, the [`docker-compose.yml`](https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/supabase/docker-compose.yml) in the repo spins up a pgvector Postgres on `localhost:5432`.

## Usage

### pgvector Components

These components use Supabase Postgres for embedding-based retrieval (via the `pgvector` extension) and keyword retrieval (via PostgreSQL's built-in full-text search).

- `SupabasePgvectorDocumentStore`: stores Haystack `Document` objects (content, embedding, metadata, optional blob) in a Postgres table, and handles writes, filtering, and both sync and async retrieval.
- `SupabasePgvectorEmbeddingRetriever`: dense Retriever that compares a query embedding against stored embeddings using the configured `vector_function` (`cosine_similarity`, `inner_product`, or `l2_distance`).
- `SupabasePgvectorKeywordRetriever`: keyword Retriever that scores documents with PostgreSQL's `ts_rank_cd`, considering term frequency, proximity, and section weight.

#### Indexing

```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

document_store = SupabasePgvectorDocumentStore(
    table_name="haystack_documents",
    # ... (lines not shown in this diff view) ...
indexing.add_component("writer", DocumentWriter(
    document_store=document_store, policy=DuplicatePolicy.OVERWRITE))
indexing.connect("embedder", "writer")
indexing.run({"embedder": {"documents": documents}})
```
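
The indexing example writes with `DuplicatePolicy.OVERWRITE`. As a rough mental model, the policies differ only in what happens when an incoming document has the same `id` as one already stored. The sketch below is a simplified in-memory illustration, not the document store's implementation: the `Policy` enum and `write` function are hypothetical stand-ins mirroring three of Haystack's `DuplicatePolicy` values (`NONE`, which defers to the store's default, is left out).

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical mirror of three haystack DuplicatePolicy values.
    OVERWRITE = "overwrite"
    SKIP = "skip"
    FAIL = "fail"


def write(store: dict, docs: list, policy: Policy) -> int:
    """Write {"id": ..., "content": ...} dicts into store; return how many were written."""
    written = 0
    for doc in docs:
        if doc["id"] in store:
            if policy is Policy.SKIP:
                continue  # keep the stored version untouched
            if policy is Policy.FAIL:
                # In this sketch, earlier docs in the batch are already stored;
                # a real database store may roll the whole batch back instead.
                raise ValueError(f"duplicate id: {doc['id']}")
        # New id, or OVERWRITE on an existing id: store the incoming version.
        store[doc["id"]] = doc
        written += 1
    return written
```

Because Haystack derives a `Document`'s default `id` from its content (and other fields such as metadata), re-running the same indexing pipeline with `OVERWRITE` replaces the matching rows instead of failing on duplicates.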

#### Retrieval

```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder

from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever

querying = Pipeline()
querying.add_component("text_embedder", SentenceTransformersTextEmbedder(
    # ... (remaining lines not shown in this diff view, including the loop `for doc in result["retriever"]["documents"]:`) ...
```

For keyword or hybrid (dense + keyword) retrieval, swap in or add `SupabasePgvectorKeywordRetriever`. It takes a `query` string directly, and its results can be merged with those of the embedding retriever through `DocumentJoiner` using reciprocal rank fusion (`join_mode="reciprocal_rank_fusion"`).
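
Reciprocal rank fusion scores each document by summing `1 / (k + rank)` over every result list it appears in, so documents ranked well by both retrievers rise to the top without their raw scores needing to be comparable. The sketch below is the textbook formulation with the commonly used constant `k = 60`; Haystack's `DocumentJoiner` has its own implementation, which may differ in details such as the constant and score scaling, so treat this as an illustration of the idea only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_k=None):
    """Fuse several best-first lists of document ids into one ranking.

    Rank positions start at 1; returns (doc_id, fused_score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k] if top_k is not None else fused


if __name__ == "__main__":
    dense = ["doc_a", "doc_b", "doc_c"]    # e.g. embedding retriever output
    keyword = ["doc_c", "doc_a", "doc_d"]  # e.g. keyword retriever output
    for doc_id, score in reciprocal_rank_fusion([dense, keyword]):
        print(f"{doc_id}: {score:.5f}")
```

Here `doc_a` (ranks 1 and 2) edges out `doc_c` (ranks 3 and 1), and documents found by only one retriever fall behind both.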

### PGroonga Components

These components use the [PGroonga](https://pgroonga.github.io/) PostgreSQL extension for fast, multilingual full-text BM25 search. No embeddings are required: retrieval works on plain-text queries.
**Prerequisites:** enable the PGroonga extension in your Supabase project:

```sql
CREATE EXTENSION IF NOT EXISTS pgroonga;
```

- `SupabaseGroongaDocumentStore`: stores `Document` objects in a Postgres table with a PGroonga index on the content column. Supports both sync and async operations. Authenticates via `SUPABASE_SERVICE_KEY` and a project URL rather than a raw connection string.
- `SupabaseGroongaBM25Retriever`: full-text Retriever backed by `SupabaseGroongaDocumentStore`. Accepts a plain text `query` and returns ranked documents using PGroonga BM25 scoring. Supports both `run()` (sync) and `run_async()` (async).
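
For intuition about what BM25 scoring rewards, here is the textbook Okapi BM25 formula in plain Python: rare query terms count more (IDF), repeated terms help with diminishing returns (`k1`), and longer documents are penalized (`b`). This is an illustrative sketch with naive lowercase/whitespace tokenization; PGroonga tokenizes and scores inside the database, and its implementation details may differ from this textbook version. The sample corpus reuses the documents from the indexing example.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Score every document in corpus against query with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "There are over 7,000 languages spoken around the world today.",
        "Elephants have been observed to recognize themselves in mirrors.",
        "Bioluminescent waves can be seen in the Maldives and Puerto Rico.",
    ]
    for score, text in sorted(zip(bm25_scores("languages spoken around the world", corpus), corpus), reverse=True):
        print(f"{score:.3f}  {text}")
```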

#### Indexing

```python
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret

from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore

document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_fts_documents",
    recreate_table=True,
)
document_store.warm_up()

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to recognize themselves in mirrors."),
    Document(content="Bioluminescent waves can be seen in the Maldives and Puerto Rico."),
]
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
```

#### Retrieval

```python
from haystack_integrations.components.retrievers.supabase import SupabaseGroongaBM25Retriever

# `document_store` is the SupabaseGroongaDocumentStore from the indexing example above.
retriever = SupabaseGroongaBM25Retriever(document_store=document_store, top_k=3)
result = retriever.run(query="languages spoken around the world")
for doc in result["documents"]:
    print(doc.score, doc.content)
```
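
Since `SupabaseGroongaBM25Retriever` also offers `run_async()`, several queries can be issued concurrently with `asyncio.gather`. The sketch below uses a hypothetical `FakeRetriever` stand-in so it runs without a Supabase project, and it assumes that `run_async()` accepts the same `query` argument as `run()` and returns the same `{"documents": [...]}` shape; confirm both against the component's signature before relying on this pattern.

```python
import asyncio


class FakeRetriever:
    """Hypothetical stand-in for SupabaseGroongaBM25Retriever (no network access)."""

    def __init__(self, corpus):
        self.corpus = corpus

    async def run_async(self, query: str):
        await asyncio.sleep(0.01)  # simulate a database round trip
        terms = set(query.lower().split())
        hits = [doc for doc in self.corpus if terms & set(doc.lower().split())]
        return {"documents": hits}


async def search_many(retriever, queries):
    """Run all queries concurrently and map each query to its documents."""
    results = await asyncio.gather(*(retriever.run_async(query=q) for q in queries))
    return {q: r["documents"] for q, r in zip(queries, results)}


if __name__ == "__main__":
    retriever = FakeRetriever([
        "Elephants recognize themselves in mirrors",
        "Bioluminescent waves in the Maldives",
    ])
    print(asyncio.run(search_many(retriever, ["elephants", "maldives waves"])))
```

With the real component, the retriever from the example above would be passed to `search_many` in place of the stand-in, inside an application that already runs an event loop or via `asyncio.run`.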

### Supabase Storage

- `SupabaseBucketDownloader`: downloads files from a Supabase Storage bucket and returns them as `ByteStream` objects. Each stream carries `meta["file_path"]` and `meta["bucket_name"]`. Supports optional extension filtering via `file_extensions` (e.g. `[".pdf", ".txt"]`). Designed to feed directly into document converters in indexing pipelines.
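
The extension filter can be pictured as a suffix check on object paths. This is a hypothetical re-implementation for illustration only; in particular, case-insensitive matching is an assumption of this sketch, not documented behavior of `SupabaseBucketDownloader`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def filter_by_extension(paths: Iterable[str], file_extensions: Optional[List[str]] = None) -> List[str]:
    """Keep paths whose final suffix is in file_extensions (all paths if None or empty).

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    paths = list(paths)
    if not file_extensions:
        return paths
    wanted = {ext.lower() for ext in file_extensions}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]


if __name__ == "__main__":
    objects = ["reports/q1.pdf", "notes/readme.TXT", "img/logo.png"]
    print(filter_by_extension(objects, [".pdf", ".txt"]))
```

Only the last suffix is compared, so `archive.tar.gz` matches `".gz"` but not `".tar.gz"` in this sketch.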

```python
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.writers import DocumentWriter
from haystack.utils import Secret

from haystack_integrations.components.downloaders.supabase import SupabaseBucketDownloader
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore

document_store = SupabasePgvectorDocumentStore(
    table_name="haystack_documents",
    embedding_dimension=384,
)

indexing = Pipeline()
indexing.add_component("downloader", SupabaseBucketDownloader(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-documents",
    file_extensions=[".pdf"],  # only keep PDF files
))
indexing.add_component("converter", PyPDFToDocument())
# The converter produces Documents without embeddings; to use them with the
# embedding retriever, add a splitter and a document embedder before the writer.
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("downloader.streams", "converter.sources")
indexing.connect("converter.documents", "writer.documents")

# `sources` lists the object paths inside the bucket to download.
indexing.run({"downloader": {"sources": ["reports/q1.pdf", "reports/q2.pdf"]}})
```

## License

`supabase-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license.
