
Commit ee2dedd

Add Superlinked integration (#441)
* Add Superlinked integration

  Adds an integration page for Superlinked (SIE), a self-hosted inference engine for embeddings, reranking, and extraction. The `sie-haystack` package provides Haystack 2.0 components for 85+ embedding models (dense, sparse, multivector/ColBERT, multimodal), cross-encoder and late-interaction reranking, and zero-shot entity/relation/classification extraction.

* docs(superlinked): address review feedback

  - Use the new `haystack_integrations.components.{embedders,rankers,extractors}.sie` namespace throughout (released in sie-haystack 0.2.0), matching the Haystack convention kacperlukawski pointed out.
  - Replace the bulleted link list with inline Python examples for every component (dense, sparse, multivector, image, reranker, extractor) so the integration page is self-contained and reviewers do not need the package source to see how components plug into pipelines.
  - Add a complete end-to-end RAG pipeline example chaining SIETextEmbedder, InMemoryEmbeddingRetriever, and SIERanker.
  - Tighten the description and overview to state explicitly that one SIE endpoint backs all components.

* docs(superlinked): fix broken model catalog URL and remove api_key claim

  - The components in sie-haystack (SIETextEmbedder, SIERanker, SIEExtractor, ...) do not currently accept an api_key init kwarg, so claiming it here was inaccurate. Reworded the Installation and End-to-End sections to only reference base_url, which is what the components actually take.
  - /docs/reference/models/ returns 404 on superlinked.com; the live model catalog is at /models. Updated both references.
  - Removed trailing slashes on superlinked.com docs URLs to avoid the 308 hop that drops hash fragments (notably on the #extraction anchor link).
* docs(superlinked): align with sie-web haystack guide

  Cross-referenced every example and reference against the canonical source at https://superlinked.com/docs/integrations/haystack (sie-web) so users who click through from here see the same models, conventions, and APIs.

  - Docker tag: sie-server:latest -> sie-server:default (matches sie-web). Added the --gpus all variant as an inline comment so GPU users do not need to leave the page.
  - Reranker model in both the standalone Reranking example and the End-to-End RAG pipeline: BAAI/bge-reranker-v2-m3 -> jinaai/jina-reranker-v2-base-multilingual (matches the sie-web default and the SIERanker init default).
  - SIERanker score access: SIERanker stores its reranker score in doc.meta["score"], not doc.score. Updated both examples to use doc.meta.get("score", 0), with a one-line note calling this out so readers are not surprised. sie-web uses the same pattern.
  - Relation extraction example labels and text aligned with sie-web (labels=["works_for", "ceo_of", "founded"] on the same sentence).
  - Added a short Text Classification example using knowledgator/gliclass-base-v1.0 (matches sie-web) so all four SIEExtractor output types (entities, relations, classifications, objects) have concrete coverage on the page.
  - Object detection kept as a prose pointer with a link to the full guide.

  Verified: all model IDs exist in the SIE catalog (packages/sie_server/models/), all URLs return 200 without redirects, no em dashes.
1 parent 2e13465 commit ee2dedd

2 files changed

Lines changed: 269 additions & 0 deletions

File tree

integrations/superlinked.md

Lines changed: 269 additions & 0 deletions
---
layout: integration
name: Superlinked
description: Use Superlinked (SIE) embeddings, reranking, and extraction in Haystack pipelines.
authors:
  - name: Superlinked
    socials:
      github: superlinked
      linkedin: superlinked
pypi: https://pypi.org/project/sie-haystack/
repo: https://github.com/superlinked/sie/tree/main/integrations/sie_haystack
type: Model Provider
report_issue: https://github.com/superlinked/sie/issues
logo: /logos/superlinked.png
version: Haystack 2.0
toc: true
---

### **Table of Contents**

- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
  - [Dense Embeddings](#dense-embeddings)
  - [Sparse Embeddings](#sparse-embeddings)
  - [Multivector (ColBERT) Embeddings](#multivector-colbert-embeddings)
  - [Image Embeddings](#image-embeddings)
  - [Reranking](#reranking)
  - [Extraction](#extraction)
  - [End-to-End RAG Pipeline](#end-to-end-rag-pipeline)
- [Resources](#resources)
- [License](#license)

## Overview

[Superlinked's](https://superlinked.com) Search Inference Engine (SIE) is a self-hosted inference server for embeddings, reranking, and extraction. The `sie-haystack` package provides Haystack 2.0 components that route requests through a single SIE endpoint for 85+ embedding models (dense, sparse, multivector/ColBERT, multimodal), cross-encoder reranking, and zero-shot entity, relation, classification, and object-detection extraction.

All components live under the standard Haystack integrations namespace: `haystack_integrations.components.{embedders,rankers,extractors}.sie`.

Start a local SIE server with Docker before running any of the examples below:

```bash
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default
# Or with an NVIDIA GPU:
# docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default
```

## Installation

```bash
pip install sie-haystack
```

This installs `sie-sdk` and `haystack-ai` as dependencies. Every component accepts `base_url` and a `model` identifier from the [Superlinked model catalog](https://superlinked.com/models). Swapping models is a parameter change, not a new deployment.
## Usage

### Dense Embeddings

Use `SIEDocumentEmbedder` in an indexing pipeline and `SIETextEmbedder` at query time.

Indexing:

```python
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.sie import SIEDocumentEmbedder

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Python is a high-level programming language."),
    Document(content="France is a country in Western Europe."),
    Document(content="Berlin is the capital of Germany."),
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component(
    "embedder",
    SIEDocumentEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3"),
)
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
```

Query-side:

```python
from haystack_integrations.components.embedders.sie import SIETextEmbedder

embedder = SIETextEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3")
result = embedder.run(text="What is Python?")
query_vector = result["embedding"]  # list[float]
```

### Sparse Embeddings

For hybrid search with SPLADE-style or BGE-M3 sparse vectors:

```python
from haystack_integrations.components.embedders.sie import SIESparseTextEmbedder

embedder = SIESparseTextEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3")
result = embedder.run(text="What is machine learning?")
print(result["sparse_embedding"].keys())  # dict_keys(['indices', 'values'])
```

Use `SIESparseDocumentEmbedder` on the indexing side with the same model.
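If you are wiring these sparse vectors into your own store, scoring reduces to a dot product over the shared indices. A minimal, library-free sketch, assuming only the `{'indices', 'values'}` dict shape shown above; the `sparse_dot` helper and the toy vectors are illustrative, not part of `sie-haystack`:

```python
def sparse_dot(query: dict, doc: dict) -> float:
    """Dot product of two sparse vectors given as {'indices': [...], 'values': [...]}."""
    doc_weights = dict(zip(doc["indices"], doc["values"]))
    return sum(
        value * doc_weights.get(index, 0.0)
        for index, value in zip(query["indices"], query["values"])
    )

# Toy vectors standing in for sparse embedder output.
query = {"indices": [3, 17, 42], "values": [0.9, 0.4, 0.2]}
doc = {"indices": [3, 42, 100], "values": [0.5, 0.8, 0.1]}
score = sparse_dot(query, doc)  # shared indices 3 and 42 contribute 0.9*0.5 + 0.2*0.8
```

Only indices present in both vectors contribute, which is what makes sparse retrieval cheap with an inverted index.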
### Multivector (ColBERT) Embeddings

For late-interaction retrieval (ColBERT, Jina-ColBERT, ModernColBERT):

```python
from haystack_integrations.components.embedders.sie import SIEMultivectorTextEmbedder

embedder = SIEMultivectorTextEmbedder(
    base_url="http://localhost:8080",
    model="jinaai/jina-colbert-v2",
)
result = embedder.run(text="What is machine learning?")
multivector = result["multivector_embedding"]  # list[list[float]], one vector per token
```

`SIEMultivectorDocumentEmbedder` provides the document-side equivalent for indexing.
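For readers consuming the per-token vectors directly, late-interaction scoring is MaxSim: each query token takes the best dot product against all document tokens, and those maxima are summed. A minimal, library-free sketch under that assumption; the `maxsim` helper and the toy 2-dimensional vectors are illustrative, not part of `sie-haystack`:

```python
def maxsim(query_vectors: list[list[float]], doc_vectors: list[list[float]]) -> float:
    """ColBERT-style late interaction: for each query token vector, take the
    maximum dot product over all document token vectors, then sum."""
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

# Toy token vectors standing in for multivector_embedding output.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8]]
score = maxsim(query, doc)  # first token matches [0.9, 0.1], second matches [0.2, 0.8]
```

This is why multivector models retrieve well on long documents: each query token matches the single most relevant document token instead of one pooled vector.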
### Image Embeddings

Multimodal retrieval through CLIP, SigLIP, or ColPali:

```python
from haystack_integrations.components.embedders.sie import SIEImageEmbedder

embedder = SIEImageEmbedder(
    base_url="http://localhost:8080",
    model="openai/clip-vit-large-patch14",
)
with open("photo.jpg", "rb") as f:
    result = embedder.run(images=[f.read()])
embeddings = result["embeddings"]  # list[list[float]]
```

### Reranking

Rerank retrieved documents with a cross-encoder or late-interaction reranker. `SIERanker` stores the reranker score in `doc.meta["score"]`:

```python
from haystack import Document
from haystack_integrations.components.rankers.sie import SIERanker

ranker = SIERanker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=3,
)
result = ranker.run(
    query="What is Python?",
    documents=[
        Document(content="Python is a high-level programming language."),
        Document(content="France is a country in Western Europe."),
        Document(content="Python snakes live in tropical climates."),
    ],
)
for doc in result["documents"]:
    score = doc.meta.get("score", 0)
    print(f"{score:.3f}: {doc.content}")
```

### Extraction

Zero-shot entities (GLiNER), relations (GLiREL), classifications (GLiClass), and object detection (GroundingDINO, OWL-v2) all use `SIEExtractor`. The output shape depends on the model family.

Named entity recognition:

```python
from haystack_integrations.components.extractors.sie import SIEExtractor

extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)
result = extractor.run(text="Tim Cook is the CEO of Apple in Cupertino.")
for entity in result["entities"]:
    print(f"{entity.text} ({entity.label}): {entity.score:.2f}")
```

Relation extraction:

```python
extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"],
)
result = extractor.run(text="Tim Cook is the CEO of Apple Inc.")
for relation in result["relations"]:
    print(f"{relation.head} --{relation.relation}--> {relation.tail}")
```

Text classification (GLiClass):

```python
extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"],
)
result = extractor.run(text="I absolutely loved this movie! The acting was superb.")
for classification in result["classifications"]:
    print(f"{classification.label}: {classification.score:.2f}")
```

Object detection (GroundingDINO, OWL-v2) fills `result["objects"]` with `label`, `score`, and `bbox` on each detection. See the [full integration guide](https://superlinked.com/docs/integrations/haystack#extraction) for the complete extractor reference.
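As a rough illustration of consuming that output shape, here is a score-threshold filter over made-up sample detections; the dicts below are hypothetical stand-ins, not real model output, and the exact detection types returned by `SIEExtractor` may differ:

```python
# Hypothetical detections mimicking the label/score/bbox shape described above.
detections = [
    {"label": "cat", "score": 0.92, "bbox": [34, 50, 210, 240]},
    {"label": "dog", "score": 0.31, "bbox": [300, 40, 420, 200]},
]

# Keep only confident detections before drawing boxes or indexing crops.
confident = [d for d in detections if d["score"] >= 0.5]
print([d["label"] for d in confident])  # ['cat']
```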
### End-to-End RAG Pipeline

Combine embedder, retriever, and ranker into one query pipeline:

```python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.sie import SIETextEmbedder
from haystack_integrations.components.rankers.sie import SIERanker

# Assumes documents were indexed with SIEDocumentEmbedder into document_store above.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    SIETextEmbedder(base_url="http://localhost:8080", model="BAAI/bge-m3"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store, top_k=10),
)
query_pipeline.add_component(
    "ranker",
    SIERanker(
        base_url="http://localhost:8080",
        model="jinaai/jina-reranker-v2-base-multilingual",
        top_k=3,
    ),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "ranker.documents")

result = query_pipeline.run({
    "text_embedder": {"text": "What is Python?"},
    "ranker": {"query": "What is Python?"},
})
for doc in result["ranker"]["documents"]:
    score = doc.meta.get("score", 0)
    print(f"{score:.3f}: {doc.content}")
```

One SIE server backs the full pipeline through the shared `base_url`. Swapping retrieval or reranking models is a configuration change, not a new deployment.

## Resources

- [`sie-haystack` source](https://github.com/superlinked/sie/tree/main/integrations/sie_haystack)
- [`sie-haystack` on PyPI](https://pypi.org/project/sie-haystack/)
- [Superlinked Haystack integration guide](https://superlinked.com/docs/integrations/haystack)
- [Superlinked model catalog](https://superlinked.com/models)

## License

`sie-haystack` is released under the Apache 2.0 license.

logos/superlinked.png

24 KB
