Commit 8d80ed9

anakin87 and bilgeyucel authored
Separate HF integration pages (#498)
* separate hf integrations - draft
* updates
* Update integrations/huggingface-api.md

Co-authored-by: Bilge Yücel <bilge.yucel@deepset.ai>
1 parent 607de6d commit 8d80ed9

5 files changed

Lines changed: 308 additions & 116 deletions

integrations/huggingface-api.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
1+
---
2+
layout: integration
3+
name: Hugging Face API
4+
description: Use models through Hugging Face APIs - Inference Providers, Inference Endpoints, TGI and TEI
5+
authors:
6+
  - name: deepset
7+
    socials:
8+
      github: deepset-ai
9+
      twitter: deepset_ai
10+
      linkedin: https://www.linkedin.com/company/deepset-ai/
11+
pypi: https://pypi.org/project/huggingface-api-haystack
12+
repo: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/huggingface_api
13+
type: Model Provider
14+
report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues
15+
logo: /logos/huggingface.png
16+
version: Haystack 2.0
17+
toc: true
18+
---
19+
20+
### **Table of Contents**
21+
22+
- [Overview](#overview)
23+
- [Installation](#installation)
24+
- [Usage](#usage)
25+
26+
## Overview
27+
28+
With this integration, you can use models through Hugging Face APIs:
29+
- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers): access many models from different providers through a unified API.
30+
- [Inference Endpoints](https://huggingface.co/inference-endpoints): deploy models on dedicated, fully managed infrastructure.
31+
- Self-hosted [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) servers.
32+
33+
Haystack supports Hugging Face models in other ways too:
34+
- [Hugging Face Transformers](https://haystack.deepset.ai/integrations/huggingface) for local models (LLMs, extractive QA, classification, NER)
35+
- [Sentence Transformers](https://haystack.deepset.ai/integrations/sentence-transformers) for local embedding and ranking models
36+
- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
37+
38+
## Installation
39+
40+
```bash
41+
pip install huggingface-api-haystack
42+
```
43+
44+
## Usage
45+
46+
Unless you are using a self-hosted TGI/TEI server, set your Hugging Face token as the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
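
As a rough illustration of the token setup described above, the sketch below looks up the token in either environment variable using only the standard library. The helper name `resolve_hf_token` and the lookup order (`HF_API_TOKEN` first, then `HF_TOKEN`) are assumptions made for this example; the integration's components resolve the token on their own, so this is only a pre-flight check.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token found, or None.

    Hypothetical helper: the lookup order is an assumption for this sketch.
    """
    for name in ("HF_API_TOKEN", "HF_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if resolve_hf_token() is None:
    # Without a token, only self-hosted TGI/TEI servers are expected to work.
    print("No Hugging Face token found in HF_API_TOKEN or HF_TOKEN.")
```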
47+
48+
### Components
49+
50+
This integration provides several components to interact with Hugging Face APIs:
51+
- [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator): chat generation with LLMs.
52+
- [`HuggingFaceAPITextEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder): creates an embedding for text (used in query/RAG pipelines).
53+
- [`HuggingFaceAPIDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder): enriches documents with embeddings (used in indexing pipelines).
54+
- [`HuggingFaceTEIRanker`](https://docs.haystack.deepset.ai/docs/huggingfaceteiranker): ranks documents based on their similarity to the query, using a TEI endpoint.
55+
56+
### Chat Generation
57+
58+
Use [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator) with the Serverless Inference API (Inference Providers):
59+
60+
```python
61+
from haystack.dataclasses import ChatMessage
62+
from haystack_integrations.components.generators.huggingface_api import HuggingFaceAPIChatGenerator
63+
64+
generator = HuggingFaceAPIChatGenerator(
65+
    api_type="serverless_inference_api",
66+
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"},
67+
)
68+
69+
result = generator.run([ChatMessage.from_user("What's Natural Language Processing? Be brief.")])
70+
print(result)
71+
```
72+
73+
To use a dedicated Inference Endpoint or a self-hosted TGI server, pass its URL instead:
74+
75+
```python
76+
generator = HuggingFaceAPIChatGenerator(
77+
    api_type="inference_endpoints",  # or "text_generation_inference" for self-hosted TGI
78+
    api_params={"url": "<your-endpoint-url>"},
79+
)
80+
```
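
The two snippets above differ only in `api_type` and `api_params`. A minimal sketch of that choice, using plain dictionaries and no Haystack import, might look like the following. The helper `build_hf_api_kwargs` and its `self_hosted` flag are hypothetical names introduced for illustration; only the `api_type` strings and the `model`, `provider`, and `url` keys come from the examples above.

```python
from typing import Any, Dict, Optional


def build_hf_api_kwargs(
    model: Optional[str] = None,
    url: Optional[str] = None,
    provider: Optional[str] = None,
    self_hosted: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: pick api_type and api_params for a chat generator."""
    # Exactly one of `model` (serverless) or `url` (endpoint/TGI) must be given.
    if (model is None) == (url is None):
        raise ValueError("Pass exactly one of `model` or `url`.")
    if model is not None:
        params: Dict[str, Any] = {"model": model}
        if provider is not None:
            params["provider"] = provider
        return {"api_type": "serverless_inference_api", "api_params": params}
    api_type = "text_generation_inference" if self_hosted else "inference_endpoints"
    return {"api_type": api_type, "api_params": {"url": url}}


print(build_hf_api_kwargs(model="Qwen/Qwen2.5-7B-Instruct", provider="together"))
print(build_hf_api_kwargs(url="http://localhost:8080", self_hosted=True))
```

The returned dictionary could then be unpacked into the component constructor, for example `HuggingFaceAPIChatGenerator(**kwargs)`.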
81+
82+
### Embedding Models
83+
84+
To create semantic embeddings for documents, use [`HuggingFaceAPIDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder) in your indexing pipeline. For generating embeddings for queries, use [`HuggingFaceAPITextEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder).
85+
86+
```python
87+
from haystack_integrations.components.embedders.huggingface_api import HuggingFaceAPITextEmbedder
88+
89+
text_embedder = HuggingFaceAPITextEmbedder(
90+
    api_type="serverless_inference_api",
91+
    api_params={"model": "BAAI/bge-small-en-v1.5"},
92+
)
93+
94+
print(text_embedder.run("I love pizza!"))
95+
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
96+
```
97+
98+
Both embedders also work with a self-hosted TEI server:
99+
100+
```python
101+
text_embedder = HuggingFaceAPITextEmbedder(
102+
    api_type="text_embeddings_inference",
103+
    api_params={"url": "http://localhost:8080"},
104+
)
105+
```
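
Once a query and some documents have embeddings, they are typically compared with a similarity function such as cosine similarity. The sketch below shows that comparison in plain Python. The two-dimensional vectors are toy values invented for this example, not real model output, and `cosine_similarity` is a helper written here rather than a function from the integration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values, for illustration only).
query_embedding = [1.0, 0.0]
doc_embeddings = {"doc_a": [2.0, 0.0], "doc_b": [0.0, 3.0]}

for name, embedding in doc_embeddings.items():
    print(name, cosine_similarity(query_embedding, embedding))
```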
106+
107+
### Ranking Models
108+
109+
Use [`HuggingFaceTEIRanker`](https://docs.haystack.deepset.ai/docs/huggingfaceteiranker) to rank documents with a reranking model served by a TEI endpoint:
110+
111+
```python
112+
from haystack import Document
113+
from haystack_integrations.components.rankers.huggingface_api import HuggingFaceTEIRanker
114+
115+
ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=2)
116+
117+
docs = [Document(content="The capital of France is Paris"),
118+
        Document(content="The capital of Germany is Berlin")]
119+
120+
result = ranker.run(query="What is the capital of France?", documents=docs)
121+
print(result["documents"][0].content)
122+
# The capital of France is Paris
123+
```
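
Conceptually, the `top_k` argument keeps only the highest-scoring documents after the reranking model has scored each one against the query. A minimal sketch of that selection step, with scores invented for illustration (a real TEI server would compute them) and a hypothetical helper `top_k_by_score`:

```python
from typing import List, Tuple


def top_k_by_score(scored_docs: List[Tuple[str, float]], top_k: int) -> List[str]:
    """Sort (content, score) pairs by descending score and keep the first top_k."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [content for content, _ in ranked[:top_k]]


# Assumed relevance scores for the query "What is the capital of France?".
scored = [
    ("The capital of Germany is Berlin", 0.02),
    ("The capital of France is Paris", 0.98),
    ("Lyon is in France", 0.40),
]
print(top_k_by_score(scored, top_k=2))
```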

integrations/huggingface.md

Lines changed: 72 additions & 116 deletions
@@ -1,18 +1,18 @@
11
---
22
layout: integration
3-
name: Hugging Face
4-
description: Use Models on Hugging Face with Haystack
3+
name: Hugging Face Transformers
4+
description: Run Transformers models locally in your Haystack pipelines
55
authors:
66
  - name: deepset
77
    socials:
88
      github: deepset-ai
99
      twitter: deepset_ai
1010
      linkedin: https://www.linkedin.com/company/deepset-ai/
11-
pypi: https://pypi.org/project/farm-haystack
11+
pypi: https://pypi.org/project/haystack-ai
1212
repo: https://github.com/deepset-ai/haystack
1313
type: Model Provider
1414
report_issue: https://github.com/deepset-ai/haystack/issues
15-
logo: /logos/huggingface.png
15+
logo: /logos/transformers.png
1616
version: Haystack 2.0
1717
toc: true
1818
---
@@ -25,130 +25,47 @@ toc: true
2525

2626
## Overview
2727

28-
You can use models on [Hugging Face](https://huggingface.co/) in your Haystack pipelines with [Generators](https://docs.haystack.deepset.ai/docs/generators), [Embedders](https://docs.haystack.deepset.ai/docs/embedders), [Rankers](https://docs.haystack.deepset.ai/docs/rankers) and [Readers](https://docs.haystack.deepset.ai/docs/readers)!
28+
[Transformers](https://huggingface.co/docs/transformers/index) is Hugging Face's library for state-of-the-art machine learning models. With this integration, you can run models from the [Hugging Face Hub](https://huggingface.co/models) **locally**, on your own machine, in your Haystack pipelines.
2929

30-
### Installation
30+
Haystack supports Hugging Face models in other ways too:
31+
- [Sentence Transformers](https://haystack.deepset.ai/integrations/sentence-transformers) for local embedding and ranking models
32+
- [Hugging Face API](https://haystack.deepset.ai/integrations/huggingface-api) to call models via Inference Providers, Inference Endpoints, or self-hosted TGI/TEI
33+
- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
34+
35+
## Installation
3136

3237
```bash
33-
pip install haystack-ai
38+
pip install haystack-ai "transformers[torch,sentencepiece]"
3439
```
3540

36-
### Usage
37-
38-
You can use models on Hugging Face in various ways:
39-
40-
#### Embedding Models
41+
## Usage
4142

42-
You can leverage embedding models from Hugging Face through four components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [HuggingFaceAPITextEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder) and [HuggingFaceAPIDocumentEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder).
43+
### Components
4344

44-
To create semantic embeddings for documents, use a Document Embedder in your indexing pipeline. For generating embeddings for queries, use a Text Embedder.
45+
Haystack provides several components that run Transformers models locally:
46+
- [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator): chat generation with local LLMs.
47+
- [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader): extracts answers from documents using question answering models.
48+
- [`TransformersTextRouter`](https://docs.haystack.deepset.ai/docs/transformerstextrouter) and [`TransformersZeroShotTextRouter`](https://docs.haystack.deepset.ai/docs/transformerszeroshottextrouter): route text to different pipeline branches based on classification.
49+
- [`TransformersZeroShotDocumentClassifier`](https://docs.haystack.deepset.ai/docs/transformerszeroshotdocumentclassifier): classifies documents with zero-shot classification models.
50+
- [`NamedEntityExtractor`](https://docs.haystack.deepset.ai/docs/namedentityextractor): annotates named entities in documents (with the `hugging_face` backend).
4551

46-
Depending on the hosting option (local Sentence Transformers model, Serverless Inference API, Inference Endpoints, or self-hosted Text Embeddings Inference), select the suitable Hugging Face Embedder component and initialize it with the model name.
52+
### Chat Generation
4753

48-
Below is the example indexing pipeline with `InMemoryDocumentStore`, `DocumentWriter` and `SentenceTransformersDocumentEmbedder`:
54+
Use [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) to run a chat model locally:
4955

5056
```python
51-
from haystack import Document
52-
from haystack import Pipeline
53-
from haystack.document_stores.in_memory import InMemoryDocumentStore
54-
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
55-
from haystack.components.writers import DocumentWriter
56-
57-
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
58-
59-
documents = [Document(content="My name is Wolfgang and I live in Berlin"),
60-
Document(content="I saw a black horse running"),
61-
Document(content="Germany has many big cities")]
62-
63-
indexing_pipeline = Pipeline()
64-
indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
65-
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
66-
indexing_pipeline.connect("embedder", "writer")
67-
indexing_pipeline.run({
68-
"embedder":{"documents":documents}
69-
})
70-
```
57+
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
58+
from haystack.dataclasses import ChatMessage
7159

72-
#### Generative Models (LLMs)
60+
generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
7361

74-
You can leverage text generation models from Hugging Face through three components: [HuggingFaceLocalGenerator](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator), [HuggingFaceAPIGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and [HuggingFaceAPIChatGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator).
75-
76-
Depending on the model type (chat or text completion) and hosting option (local Transformer model, Serverless Inference API, Inference Endpoints, or self-hosted Text Generation Inference), select the suitable Hugging Face Generator component and initialize it with the model name.
77-
78-
Below is the example query pipeline that uses `HuggingFaceH4/zephyr-7b-beta` hosted on Serverless Inference API with `HuggingFaceAPIGenerator`:
79-
80-
```python
81-
from haystack import Pipeline
82-
from haystack.utils import Secret
83-
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
84-
from haystack.components.builders.prompt_builder import PromptBuilder
85-
from haystack.components.generators import HuggingFaceAPIGenerator
86-
87-
template = """
88-
Given the following information, answer the question.
89-
90-
Context:
91-
{% for document in documents %}
92-
{{ document.text }}
93-
{% endfor %}
94-
95-
Question: What's the official language of {{ country }}?
96-
"""
97-
pipe = Pipeline()
98-
99-
generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
100-
api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
101-
token=Secret.from_token("YOUR_HF_API_TOKEN"))
102-
103-
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
104-
pipe.add_component("prompt_builder", PromptBuilder(template=template))
105-
pipe.add_component("llm", generator)
106-
pipe.connect("retriever", "prompt_builder.documents")
107-
pipe.connect("prompt_builder", "llm")
108-
109-
pipe.run({
110-
"prompt_builder": {
111-
"country": "France"
112-
}
113-
})
62+
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
63+
print(generator.run(messages))
11464
```
11565

116-
#### Ranker Models
117-
118-
To use cross encoder models on Hugging Face, initialize a `SentenceTransformersRanker` with the model name. You can then use this `SentenceTransformersRanker` to sort documents based on their relevancy to the query.
66+
### Extractive Question Answering
11967

120-
Below is the example of document retrieval pipeline with `InMemoryBM25Retriever` and `SentenceTransformersRanker`:
121-
122-
```python
123-
from haystack import Document, Pipeline
124-
from haystack.document_stores.in_memory import InMemoryDocumentStore
125-
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
126-
from haystack.components.rankers import TransformersSimilarityRanker
127-
128-
docs = [Document(content="Paris is in France"),
129-
Document(content="Berlin is in Germany"),
130-
Document(content="Lyon is in France")]
131-
document_store = InMemoryDocumentStore()
132-
document_store.write_documents(docs)
133-
134-
retriever = InMemoryBM25Retriever(document_store = document_store)
135-
ranker = TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
136-
137-
document_ranker_pipeline = Pipeline()
138-
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
139-
document_ranker_pipeline.add_component(instance=ranker, name="ranker")
140-
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
141-
142-
query = "Cities in France"
143-
document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
144-
"ranker": {"query": query, "top_k": 2}})
145-
```
146-
147-
#### Reader Models
148-
149-
To use question answering models on Hugging Face, initialize a `ExtractiveReader` with the model name. You can then use this `ExtractiveReader` to extract answers from the relevant context.
150-
151-
Below is the example of extractive question answering pipeline with `InMemoryBM25Retriever` and `ExtractiveReader`:
68+
Use [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader) to extract answers from the relevant context:
15269

15370
```python
15471
from haystack import Document, Pipeline
@@ -163,16 +80,55 @@ docs = [Document(content="Paris is the capital of France."),
16380
document_store = InMemoryDocumentStore()
16481
document_store.write_documents(docs)
16582

166-
retriever = InMemoryBM25Retriever(document_store = document_store)
83+
retriever = InMemoryBM25Retriever(document_store=document_store)
16784
reader = ExtractiveReader(model="deepset/roberta-base-squad2-distilled")
16885

16986
extractive_qa_pipeline = Pipeline()
17087
extractive_qa_pipeline.add_component(instance=retriever, name="retriever")
17188
extractive_qa_pipeline.add_component(instance=reader, name="reader")
172-
17389
extractive_qa_pipeline.connect("retriever.documents", "reader.documents")
17490

17591
query = "What is the capital of France?"
176-
extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
177-
"reader": {"query": query, "top_k": 2}})
92+
extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
93+
"reader": {"query": query, "top_k": 2}})
94+
```
95+
96+
### Zero-Shot Document Classification
97+
98+
Use [`TransformersZeroShotDocumentClassifier`](https://docs.haystack.deepset.ai/docs/transformerszeroshotdocumentclassifier) to classify documents with labels of your choice, without fine-tuning:
99+
100+
```python
101+
from haystack import Document
102+
from haystack.components.classifiers import TransformersZeroShotDocumentClassifier
103+
104+
documents = [Document(content="Today was a nice day!"),
105+
             Document(content="Yesterday was a bad day!")]
106+
107+
classifier = TransformersZeroShotDocumentClassifier(
108+
    model="cross-encoder/nli-deberta-v3-xsmall",
109+
    labels=["positive", "negative"],
110+
)
111+
112+
result = classifier.run(documents=documents)
113+
print([doc.meta["classification"]["label"] for doc in result["documents"]])
114+
# ['positive', 'negative']
115+
```
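
The predicted label is stored under `meta["classification"]["label"]`, so downstream code can group or route documents by it. The sketch below does this with plain dictionaries standing in for Haystack `Document` objects; the stand-in data and the helper `group_by_label` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Plain-dict stand-ins for classified documents (not real Document objects).
classified: List[Dict[str, Any]] = [
    {"content": "Today was a nice day!", "meta": {"classification": {"label": "positive"}}},
    {"content": "Yesterday was a bad day!", "meta": {"classification": {"label": "negative"}}},
]


def group_by_label(docs: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Collect document contents under their predicted label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        label = doc["meta"]["classification"]["label"]
        groups[label].append(doc["content"])
    return dict(groups)


print(group_by_label(classified))
```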
116+
117+
### Named Entity Recognition
118+
119+
Use [`NamedEntityExtractor`](https://docs.haystack.deepset.ai/docs/namedentityextractor) to annotate named entities in documents:
120+
121+
```python
122+
from haystack import Document
123+
from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor
124+
125+
documents = [
126+
    Document(content="I'm Merlin, the happy pig!"),
127+
    Document(content="My name is Clara and I live in Berkeley, California."),
128+
]
129+
extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
130+
131+
results = extractor.run(documents=documents)["documents"]
132+
annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
133+
print(annotations)
178134
```
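
To turn annotations into readable entity strings, the character offsets can be applied to the document text. The sketch below assumes each annotation carries an entity label plus `start` and `end` character offsets; the `Annotation` dataclass is a stand-in defined for this example, and the real annotation objects returned by `NamedEntityExtractor` may differ. The offsets are hand-written for illustration, not model output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    """Stand-in for a named entity annotation (assumed fields)."""
    entity: str
    start: int
    end: int


def entity_texts(text: str, annotations: List[Annotation]) -> List[Tuple[str, str]]:
    """Return (entity label, matched substring) pairs."""
    return [(ann.entity, text[ann.start:ann.end]) for ann in annotations]


text = "My name is Clara and I live in Berkeley, California."
annotations = [
    Annotation(entity="PER", start=11, end=16),
    Annotation(entity="LOC", start=31, end=39),
    Annotation(entity="LOC", start=41, end=51),
]
print(entity_texts(text, annotations))
```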
