diff --git a/integrations/huggingface-api.md b/integrations/huggingface-api.md
diff --git a/integrations/huggingface.md b/integrations/huggingface.md
@@ -0,0 +1,123 @@
+---
+layout: integration
+name: Hugging Face API
+description: Use models through Hugging Face APIs - Inference Providers, Inference Endpoints, TGI and TEI
+authors:
+    - name: deepset
+      socials:
+        github: deepset-ai
+        twitter: deepset_ai
+        linkedin: https://www.linkedin.com/company/deepset-ai/
+pypi: https://pypi.org/project/huggingface-api-haystack
+repo: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/huggingface_api
+type: Model Provider
+report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues
+logo: /logos/huggingface.png
+version: Haystack 2.0
+toc: true
+---
+
+### **Table of Contents**
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Usage](#usage)
+
+## Overview
+
+With this integration, you can use models through Hugging Face APIs:
+- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers): access many models from different providers through a unified API.
+- [Inference Endpoints](https://huggingface.co/inference-endpoints): deploy models on dedicated, fully managed infrastructure.
+- Self-hosted [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) servers.
+
+Haystack supports Hugging Face models in other ways too:
+- [Hugging Face Transformers](https://haystack.deepset.ai/integrations/huggingface) for local models (LLMs, extractive QA, classification, NER)
+- [Sentence Transformers](https://haystack.deepset.ai/integrations/sentence-transformers) for local embedding and ranking models
+- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
+
+## Installation
+
+```bash
+pip install huggingface-api-haystack
+```
+
+## Usage
+
+Unless you are using a self-hosted TGI/TEI server, set your Hugging Face token as the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
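To check up front whether a token is available, a small standard-library sketch like the one below can help. `resolve_hf_token` is a hypothetical helper, not part of Haystack or this integration, and the lookup order (`HF_API_TOKEN` first, then `HF_TOKEN`) is an assumption that simply follows the order in the sentence above.

```python
import os

# Assumed lookup order: HF_API_TOKEN first, then HF_TOKEN.
TOKEN_ENV_VARS = ("HF_API_TOKEN", "HF_TOKEN")


def resolve_hf_token(environ=os.environ):
    """Return the first non-empty token found, or None if neither variable is set.

    Hypothetical helper for illustration only.
    """
    for name in TOKEN_ENV_VARS:
        value = environ.get(name)
        if value:
            return value
    return None


# Report whether a token is configured, without printing the secret itself.
token = resolve_hf_token()
print("Hugging Face token found." if token else "No token set: export HF_API_TOKEN or HF_TOKEN.")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests.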
+
+### Components
+
+This integration provides several components to interact with Hugging Face APIs:
+- [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator): chat generation with LLMs.
+- [`HuggingFaceAPITextEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder): creates an embedding for text (used in query/RAG pipelines).
+- [`HuggingFaceAPIDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder): enriches documents with embeddings (used in indexing pipelines).
+- [`HuggingFaceTEIRanker`](https://docs.haystack.deepset.ai/docs/huggingfaceteiranker): ranks documents based on their similarity to the query, using a TEI endpoint.
+
+### Chat Generation
+
+Use [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator) with the Serverless Inference API (Inference Providers):
+
+```python
+from haystack.dataclasses import ChatMessage
+from haystack_integrations.components.generators.huggingface_api import HuggingFaceAPIChatGenerator
+
+generator = HuggingFaceAPIChatGenerator(
+    api_type="serverless_inference_api",
+    api_params={"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"},
+)
+
+result = generator.run([ChatMessage.from_user("What's Natural Language Processing? Be brief.")])
+print(result)
+```
+
+To use a dedicated Inference Endpoint or a self-hosted TGI server, pass its URL instead:
+
+```python
+generator = HuggingFaceAPIChatGenerator(
+    api_type="inference_endpoints",  # or "text_generation_inference" for self-hosted TGI
+    api_params={"url": "<your-endpoint-url>"},
+)
+```
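The hosting options above differ only in the `api_type` and `api_params` values. As a rough sketch, a hypothetical helper (`chat_generator_kwargs` is an illustration, not part of Haystack or this integration) could choose the pair from either a model ID or a URL. The `self_hosted` flag is likewise an assumption made for this example.

```python
def chat_generator_kwargs(model=None, url=None, provider=None, self_hosted=False):
    """Build the api_type/api_params pair for a chat generator.

    Hypothetical helper: pass `model` for the Serverless Inference API,
    or `url` for an Inference Endpoint or a self-hosted TGI server.
    """
    # Exactly one of `model` and `url` must be given.
    if (model is None) == (url is None):
        raise ValueError("Pass exactly one of `model` or `url`.")

    if model is not None:
        api_params = {"model": model}
        if provider is not None:
            api_params["provider"] = provider
        return {"api_type": "serverless_inference_api", "api_params": api_params}

    api_type = "text_generation_inference" if self_hosted else "inference_endpoints"
    return {"api_type": api_type, "api_params": {"url": url}}


print(chat_generator_kwargs(model="Qwen/Qwen2.5-7B-Instruct", provider="together"))
print(chat_generator_kwargs(url="http://localhost:8080", self_hosted=True))
```

The resulting dictionary can then be unpacked into the constructor, for example `HuggingFaceAPIChatGenerator(**chat_generator_kwargs(url="<your-endpoint-url>"))`.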
+
+### Embedding Models
+
+To create semantic embeddings for documents, use [`HuggingFaceAPIDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder) in your indexing pipeline. For generating embeddings for queries, use [`HuggingFaceAPITextEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder).
+
+```python
+from haystack_integrations.components.embedders.huggingface_api import HuggingFaceAPITextEmbedder
+
+text_embedder = HuggingFaceAPITextEmbedder(
+    api_type="serverless_inference_api",
+    api_params={"model": "BAAI/bge-small-en-v1.5"},
+)
+
+print(text_embedder.run("I love pizza!"))
+# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
+```
+
+Both embedders also work with a self-hosted TEI server:
+
+```python
+text_embedder = HuggingFaceAPITextEmbedder(
+    api_type="text_embeddings_inference",
+    api_params={"url": "http://localhost:8080"},
+)
+```
+
+### Ranking Models
+
+Use [`HuggingFaceTEIRanker`](https://docs.haystack.deepset.ai/docs/huggingfaceteiranker) to rank documents with a reranking model served by a TEI endpoint:
+
+```python
+from haystack import Document
+from haystack_integrations.components.rankers.huggingface_api import HuggingFaceTEIRanker
+
+ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=2)
+
+docs = [Document(content="The capital of France is Paris"),
+        Document(content="The capital of Germany is Berlin")]
+
+result = ranker.run(query="What is the capital of France?", documents=docs)
+print(result["documents"][0].content)
+# The capital of France is Paris
+```
@@ -1,18 +1,18 @@
 ---
 layout: integration
-name: Hugging Face
-description: Use Models on Hugging Face with Haystack
+name: Hugging Face Transformers
+description: Run Transformers models locally in your Haystack pipelines
 authors:
     - name: deepset
       socials:
         github: deepset-ai
         twitter: deepset_ai
         linkedin: https://www.linkedin.com/company/deepset-ai/
-pypi: https://pypi.org/project/farm-haystack
+pypi: https://pypi.org/project/haystack-ai
 repo: https://github.com/deepset-ai/haystack
 type: Model Provider
 report_issue: https://github.com/deepset-ai/haystack/issues
-logo: /logos/huggingface.png
+logo: /logos/transformers.png
 version: Haystack 2.0
 toc: true
 ---
@@ -25,130 +25,47 @@ toc: true
 
 ## Overview
 
-You can use models on [Hugging Face](https://huggingface.co/) in your Haystack pipelines with [Generators](https://docs.haystack.deepset.ai/docs/generators), [Embedders](https://docs.haystack.deepset.ai/docs/embedders), [Rankers](https://docs.haystack.deepset.ai/docs/rankers) and [Readers](https://docs.haystack.deepset.ai/docs/readers)!
+[Transformers](https://huggingface.co/docs/transformers/index) is Hugging Face's library for state-of-the-art machine learning models. With this integration, you can run models from the [Hugging Face Hub](https://huggingface.co/models) **locally**, on your own machine, in your Haystack pipelines.
 
-### Installation
+Haystack supports Hugging Face models in other ways too:
+- [Sentence Transformers](https://haystack.deepset.ai/integrations/sentence-transformers) for local embedding and ranking models
+- [Hugging Face API](https://haystack.deepset.ai/integrations/huggingface-api) to call models via Inference Providers, Inference Endpoints, or self-hosted TGI/TEI
+- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
+
+## Installation
 
 ```bash
-pip install haystack-ai
+pip install haystack-ai "transformers[torch,sentencepiece]"
 ```
 
-### Usage
-
-You can use models on Hugging Face in various ways:
-
-#### Embedding Models
+## Usage
 
-You can leverage embedding models from Hugging Face through four components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [HuggingFaceAPITextEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder) and [HuggingFaceAPIDocumentEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder).
+### Components
 
-To create semantic embeddings for documents, use a Document Embedder in your indexing pipeline. For generating embeddings for queries, use a Text Embedder.
+Haystack provides several components that run Transformers models locally:
+- [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator): chat generation with local LLMs.
+- [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader): extracts answers from documents using question answering models.
+- [`TransformersTextRouter`](https://docs.haystack.deepset.ai/docs/transformerstextrouter) and [`TransformersZeroShotTextRouter`](https://docs.haystack.deepset.ai/docs/transformerszeroshottextrouter): route text to different pipeline branches based on classification.
+- [`TransformersZeroShotDocumentClassifier`](https://docs.haystack.deepset.ai/docs/transformerszeroshotdocumentclassifier): classifies documents with zero-shot classification models.
+- [`NamedEntityExtractor`](https://docs.haystack.deepset.ai/docs/namedentityextractor): annotates named entities in documents (with the `hugging_face` backend).
 
-Depending on the hosting option (local Sentence Transformers model, Serverless Inference API, Inference Endpoints, or self-hosted Text Embeddings Inference), select the suitable Hugging Face Embedder component and initialize it with the model name.
+### Chat Generation
 
-Below is the example indexing pipeline with `InMemoryDocumentStore`, `DocumentWriter` and  `SentenceTransformersDocumentEmbedder`:
+Use [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) to run a chat model locally:
 
 ```python
-from haystack import Document
-from haystack import Pipeline
-from haystack.document_stores.in_memory import InMemoryDocumentStore
-from haystack.components.embedders import SentenceTransformersDocumentEmbedder
-from haystack.components.writers import DocumentWriter
-
-document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
-
-documents = [Document(content="My name is Wolfgang and I live in Berlin"),
-             Document(content="I saw a black horse running"),
-             Document(content="Germany has many big cities")]
-
-indexing_pipeline = Pipeline()
-indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
-indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
-indexing_pipeline.connect("embedder", "writer")
-indexing_pipeline.run({
-    "embedder":{"documents":documents}
-    })
-```
+from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
+from haystack.dataclasses import ChatMessage
 
-#### Generative Models (LLMs) 
+generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 
-You can leverage text generation models from Hugging Face through three components: [HuggingFaceLocalGenerator](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator), [HuggingFaceAPIGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and [HuggingFaceAPIChatGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator).
-
-Depending on the model type (chat or text completion) and hosting option (local Transformer model, Serverless Inference API, Inference Endpoints, or self-hosted Text Generation Inference), select the suitable Hugging Face Generator component and initialize it with the model name.
-
-Below is the example query pipeline that uses `HuggingFaceH4/zephyr-7b-beta` hosted on Serverless Inference API with `HuggingFaceAPIGenerator`:
-
-```python
-from haystack import Pipeline
-from haystack.utils import Secret
-from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
-from haystack.components.builders.prompt_builder import PromptBuilder
-from haystack.components.generators import HuggingFaceAPIGenerator
-
-template = """
-Given the following information, answer the question.
-
-Context: 
-{% for document in documents %}
-    {{ document.text }}
-{% endfor %}
-
-Question: What's the official language of {{ country }}?
-"""
-pipe = Pipeline()
-
-generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
-                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
-                                    token=Secret.from_token("YOUR_HF_API_TOKEN"))
-
-pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", generator)
-pipe.connect("retriever", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-pipe.run({
-    "prompt_builder": {
-        "country": "France"
-    }
-})
+messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
+print(generator.run(messages))
 ```
 
-#### Ranker Models
-
-To use cross encoder models on Hugging Face, initialize a `SentenceTransformersRanker` with the model name. You can then use this `SentenceTransformersRanker` to sort documents based on their relevancy to the query.
+### Extractive Question Answering
 
-Below is the example of document retrieval pipeline with `InMemoryBM25Retriever` and  `SentenceTransformersRanker`:
-
-```python
-from haystack import Document, Pipeline
-from haystack.document_stores.in_memory import InMemoryDocumentStore
-from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
-from haystack.components.rankers import TransformersSimilarityRanker
-
-docs = [Document(content="Paris is in France"), 
-        Document(content="Berlin is in Germany"),
-        Document(content="Lyon is in France")]
-document_store = InMemoryDocumentStore()
-document_store.write_documents(docs)
-
-retriever = InMemoryBM25Retriever(document_store = document_store)
-ranker = TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
-
-document_ranker_pipeline = Pipeline()
-document_ranker_pipeline.add_component(instance=retriever, name="retriever")
-document_ranker_pipeline.add_component(instance=ranker, name="ranker")
-document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
-
-query = "Cities in France"
-document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3}, 
-                                   "ranker": {"query": query, "top_k": 2}})
-```
-
-#### Reader Models
-
-To use question answering models on Hugging Face, initialize a `ExtractiveReader` with the model name. You can then use this `ExtractiveReader` to extract answers from the relevant context.
-
-Below is the example of extractive question answering pipeline with `InMemoryBM25Retriever` and  `ExtractiveReader`:
+Use [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader) to extract answers from the relevant context:
 
 ```python
 from haystack import Document, Pipeline
@@ -163,16 +80,55 @@ docs = [Document(content="Paris is the capital of France."),
 document_store = InMemoryDocumentStore()
 document_store.write_documents(docs)
 
-retriever = InMemoryBM25Retriever(document_store = document_store)
+retriever = InMemoryBM25Retriever(document_store=document_store)
 reader = ExtractiveReader(model="deepset/roberta-base-squad2-distilled")
 
 extractive_qa_pipeline = Pipeline()
 extractive_qa_pipeline.add_component(instance=retriever, name="retriever")
 extractive_qa_pipeline.add_component(instance=reader, name="reader")
-
 extractive_qa_pipeline.connect("retriever.documents", "reader.documents")
 
 query = "What is the capital of France?"
-extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3}, 
-                                   "reader": {"query": query, "top_k": 2}})
+extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
+                                 "reader": {"query": query, "top_k": 2}})
+```
+
+### Zero-Shot Document Classification
+
+Use [`TransformersZeroShotDocumentClassifier`](https://docs.haystack.deepset.ai/docs/transformerszeroshotdocumentclassifier) to classify documents with labels of your choice, without fine-tuning:
+
+```python
+from haystack import Document
+from haystack.components.classifiers import TransformersZeroShotDocumentClassifier
+
+documents = [Document(content="Today was a nice day!"),
+             Document(content="Yesterday was a bad day!")]
+
+classifier = TransformersZeroShotDocumentClassifier(
+    model="cross-encoder/nli-deberta-v3-xsmall",
+    labels=["positive", "negative"],
+)
+
+result = classifier.run(documents=documents)
+print([doc.meta["classification"]["label"] for doc in result["documents"]])
+# ['positive', 'negative']
+```
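Because the predicted label is stored under `doc.meta["classification"]["label"]`, downstream code can bucket documents by label. The sketch below uses a stand-in `FakeDocument` dataclass (an assumption, so that the snippet runs without Haystack or a model download) and a hypothetical `group_by_label` helper. Real `Document` objects returned by the classifier should work the same way, as long as they expose `content` and `meta`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for haystack.Document with only the fields used here."""
    content: str
    meta: dict = field(default_factory=dict)


def group_by_label(documents):
    """Group document contents by their predicted classification label."""
    groups = defaultdict(list)
    for doc in documents:
        # Documents without a classification entry land in "unclassified".
        label = doc.meta.get("classification", {}).get("label", "unclassified")
        groups[label].append(doc.content)
    return dict(groups)


docs = [
    FakeDocument("Today was a nice day!", {"classification": {"label": "positive"}}),
    FakeDocument("Yesterday was a bad day!", {"classification": {"label": "negative"}}),
    FakeDocument("No label here."),
]
print(group_by_label(docs))
```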
+
+### Named Entity Recognition
+
+Use [`NamedEntityExtractor`](https://docs.haystack.deepset.ai/docs/namedentityextractor) to annotate named entities in documents:
+
+```python
+from haystack import Document
+from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor
+
+documents = [
+    Document(content="I'm Merlin, the happy pig!"),
+    Document(content="My name is Clara and I live in Berkeley, California."),
+]
+extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
+
+results = extractor.run(documents=documents)["documents"]
+annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
+print(annotations)
 ```