diff --git a/integrations/huggingface-api.md b/integrations/huggingface-api.md
new file mode 100644
index 00000000..4d89f014
--- /dev/null
+++ b/integrations/huggingface-api.md
@@ -0,0 +1,123 @@
+---
+layout: integration
+name: Hugging Face API
+description: Use models through Hugging Face APIs - Inference Providers, Inference Endpoints, TGI and TEI
+authors:
+  - name: deepset
+    socials:
+      github: deepset-ai
+      twitter: deepset_ai
+      linkedin: https://www.linkedin.com/company/deepset-ai/
+pypi: https://pypi.org/project/huggingface-api-haystack
+repo: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/huggingface_api
+type: Model Provider
+report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues
+logo: /logos/huggingface.png
+version: Haystack 2.0
+toc: true
+---
+
+### **Table of Contents**
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Usage](#usage)
+
+## Overview
+
+With this integration, you can use models through Hugging Face APIs:
+- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers): access many models from different providers through a unified API.
+- [Inference Endpoints](https://huggingface.co/inference-endpoints): deploy models on dedicated, fully managed infrastructure.
+- Self-hosted [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) servers.
+
+Haystack supports Hugging Face models in other ways too:
+- [Hugging Face Transformers](https://haystack.deepset.ai/integrations/huggingface) for local models (LLMs, extractive QA, classification, NER)
+- [Sentence Transformers](https://haystack.deepset.ai/integrations/sentence-transformers) for local embedding and ranking models
+- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
+
+## Installation
+
+```bash
+pip install huggingface-api-haystack
+```
+
+## Usage
+
+Unless you are using a self-hosted TGI/TEI server, set your Hugging Face token as the `HF_API_TOKEN` or `HF_TOKEN` environment variable.
+
+### Components
+
+This integration provides several components to interact with Hugging Face APIs:
+- [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator): chat generation with LLMs.
+- [`HuggingFaceAPITextEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder): creates an embedding for text (used in query/RAG pipelines).
+- [`HuggingFaceAPIDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder): enriches documents with embeddings (used in indexing pipelines).
+- [`HuggingFaceTEIRanker`](https://docs.haystack.deepset.ai/docs/huggingfaceteiranker): ranks documents based on their similarity to the query, using a TEI endpoint.
+
+### Chat Generation
+
+Use [`HuggingFaceAPIChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator) with the Serverless Inference API (Inference Providers):
+
+```python
+from haystack.dataclasses import ChatMessage
+from haystack_integrations.components.generators.huggingface_api import HuggingFaceAPIChatGenerator
+
+generator = HuggingFaceAPIChatGenerator(
+    api_type="serverless_inference_api",
+    api_params={"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"},
+)
+
+result = generator.run([ChatMessage.from_user("What's Natural Language Processing? Be brief.")])
+print(result)
+```
+
+To use a dedicated Inference Endpoint or a self-hosted TGI server, pass its URL instead:
+
+```python
+generator = HuggingFaceAPIChatGenerator(
+    api_type="inference_endpoints",  # or "text_generation_inference" for self-hosted TGI
+    api_params={"url": ""},
+)
+```
+
+### Embedding Models
+
+To create semantic embeddings for documents, use [`HuggingFaceAPIDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder) in your indexing pipeline. For generating embeddings for queries, use [`HuggingFaceAPITextEmbedder`](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder).
+
+```python
+from haystack_integrations.components.embedders.huggingface_api import HuggingFaceAPITextEmbedder
+
+text_embedder = HuggingFaceAPITextEmbedder(
+    api_type="serverless_inference_api",
+    api_params={"model": "BAAI/bge-small-en-v1.5"},
+)
+
+print(text_embedder.run("I love pizza!"))
+# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
+```
+
+Both embedders also work with a self-hosted TEI server:
+
+```python
+text_embedder = HuggingFaceAPITextEmbedder(
+    api_type="text_embeddings_inference",
+    api_params={"url": "http://localhost:8080"},
+)
+```
+
+### Ranking Models
+
+Use [`HuggingFaceTEIRanker`](https://docs.haystack.deepset.ai/docs/huggingfaceteiranker) to rank documents with a reranking model served by a TEI endpoint:
+
+```python
+from haystack import Document
+from haystack_integrations.components.rankers.huggingface_api import HuggingFaceTEIRanker
+
+ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=2)
+
+docs = [Document(content="The capital of France is Paris"),
+        Document(content="The capital of Germany is Berlin")]
+
+result = ranker.run(query="What is the capital of France?", documents=docs)
+print(result["documents"][0].content)
+# The capital of France is Paris
+```
diff --git a/integrations/huggingface.md b/integrations/huggingface.md
index 6da56ca2..47796671 100644
--- a/integrations/huggingface.md
+++ b/integrations/huggingface.md
@@ -1,18 +1,18 @@
 ---
 layout: integration
-name: Hugging Face
-description: Use Models on Hugging Face with Haystack
+name: Hugging Face Transformers
+description: Run Transformers models locally in your Haystack pipelines
 authors:
   - name: deepset
     socials:
       github: deepset-ai
       twitter: deepset_ai
       linkedin: https://www.linkedin.com/company/deepset-ai/
-pypi: https://pypi.org/project/farm-haystack
+pypi: https://pypi.org/project/haystack-ai
 repo: https://github.com/deepset-ai/haystack
 type: Model Provider
 report_issue: https://github.com/deepset-ai/haystack/issues
-logo: /logos/huggingface.png
+logo: /logos/transformers.png
 version: Haystack 2.0
 toc: true
 ---
@@ -25,130 +25,47 @@ toc: true
 
 ## Overview
 
-You can use models on [Hugging Face](https://huggingface.co/) in your Haystack pipelines with [Generators](https://docs.haystack.deepset.ai/docs/generators), [Embedders](https://docs.haystack.deepset.ai/docs/embedders), [Rankers](https://docs.haystack.deepset.ai/docs/rankers) and [Readers](https://docs.haystack.deepset.ai/docs/readers)!
+[Transformers](https://huggingface.co/docs/transformers/index) is Hugging Face's library for state-of-the-art machine learning models. With this integration, you can run models from the [Hugging Face Hub](https://huggingface.co/models) **locally**, on your own machine, in your Haystack pipelines.
 
-### Installation
+Haystack supports Hugging Face models in other ways too:
+- [Sentence Transformers](https://haystack.deepset.ai/integrations/sentence-transformers) for local embedding and ranking models
+- [Hugging Face API](https://haystack.deepset.ai/integrations/huggingface-api) to call models via Inference Providers, Inference Endpoints, or self-hosted TGI/TEI
+- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
+
+## Installation
 
 ```bash
-pip install haystack-ai
+pip install haystack-ai "transformers[torch,sentencepiece]"
 ```
 
-### Usage
-
-You can use models on Hugging Face in various ways:
-
-#### Embedding Models
+## Usage
 
-You can leverage embedding models from Hugging Face through four components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder), [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [HuggingFaceAPITextEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder) and [HuggingFaceAPIDocumentEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder).
+### Components
 
-To create semantic embeddings for documents, use a Document Embedder in your indexing pipeline. For generating embeddings for queries, use a Text Embedder.
+Haystack provides several components that run Transformers models locally:
+- [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator): chat generation with local LLMs.
+- [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader): extracts answers from documents using question answering models.
+- [`TransformersTextRouter`](https://docs.haystack.deepset.ai/docs/transformerstextrouter) and [`TransformersZeroShotTextRouter`](https://docs.haystack.deepset.ai/docs/transformerszeroshottextrouter): route text to different pipeline branches based on classification.
+- [`TransformersZeroShotDocumentClassifier`](https://docs.haystack.deepset.ai/docs/transformerszeroshotdocumentclassifier): classifies documents with zero-shot classification models.
+- [`NamedEntityExtractor`](https://docs.haystack.deepset.ai/docs/namedentityextractor): annotates named entities in documents (with the `hugging_face` backend).
 
-Depending on the hosting option (local Sentence Transformers model, Serverless Inference API, Inference Endpoints, or self-hosted Text Embeddings Inference), select the suitable Hugging Face Embedder component and initialize it with the model name.
+### Chat Generation
 
-Below is the example indexing pipeline with `InMemoryDocumentStore`, `DocumentWriter` and `SentenceTransformersDocumentEmbedder`:
+Use [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) to run a chat model locally:
 
 ```python
-from haystack import Document
-from haystack import Pipeline
-from haystack.document_stores.in_memory import InMemoryDocumentStore
-from haystack.components.embedders import SentenceTransformersDocumentEmbedder
-from haystack.components.writers import DocumentWriter
-
-document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
-
-documents = [Document(content="My name is Wolfgang and I live in Berlin"),
-             Document(content="I saw a black horse running"),
-             Document(content="Germany has many big cities")]
-
-indexing_pipeline = Pipeline()
-indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
-indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
-indexing_pipeline.connect("embedder", "writer")
-indexing_pipeline.run({
-    "embedder":{"documents":documents}
-    })
-```
+from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
+from haystack.dataclasses import ChatMessage
 
-#### Generative Models (LLMs)
+generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 
-You can leverage text generation models from Hugging Face through three components: [HuggingFaceLocalGenerator](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator), [HuggingFaceAPIGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and [HuggingFaceAPIChatGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator).
-
-Depending on the model type (chat or text completion) and hosting option (local Transformer model, Serverless Inference API, Inference Endpoints, or self-hosted Text Generation Inference), select the suitable Hugging Face Generator component and initialize it with the model name.
-
-Below is the example query pipeline that uses `HuggingFaceH4/zephyr-7b-beta` hosted on Serverless Inference API with `HuggingFaceAPIGenerator`:
-
-```python
-from haystack import Pipeline
-from haystack.utils import Secret
-from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
-from haystack.components.builders.prompt_builder import PromptBuilder
-from haystack.components.generators import HuggingFaceAPIGenerator
-
-template = """
-Given the following information, answer the question.
-
-Context:
-{% for document in documents %}
-    {{ document.text }}
-{% endfor %}
-
-Question: What's the official language of {{ country }}?
-"""
-pipe = Pipeline()
-
-generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
-                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
-                                    token=Secret.from_token("YOUR_HF_API_TOKEN"))
-
-pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", generator)
-pipe.connect("retriever", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-pipe.run({
-    "prompt_builder": {
-        "country": "France"
-    }
-})
+messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
+print(generator.run(messages))
 ```
 
-#### Ranker Models
+### Extractive Question Answering
 
-To use cross encoder models on Hugging Face, initialize a `SentenceTransformersRanker` with the model name. You can then use this `SentenceTransformersRanker` to sort documents based on their relevancy to the query.
-
-Below is the example of document retrieval pipeline with `InMemoryBM25Retriever` and `SentenceTransformersRanker`:
-
-```python
-from haystack import Document, Pipeline
-from haystack.document_stores.in_memory import InMemoryDocumentStore
-from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
-from haystack.components.rankers import TransformersSimilarityRanker
-
-docs = [Document(content="Paris is in France"),
-        Document(content="Berlin is in Germany"),
-        Document(content="Lyon is in France")]
-document_store = InMemoryDocumentStore()
-document_store.write_documents(docs)
-
-retriever = InMemoryBM25Retriever(document_store = document_store)
-ranker = TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
-
-document_ranker_pipeline = Pipeline()
-document_ranker_pipeline.add_component(instance=retriever, name="retriever")
-document_ranker_pipeline.add_component(instance=ranker, name="ranker")
-document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
-
-query = "Cities in France"
-document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
-                                   "ranker": {"query": query, "top_k": 2}})
-```
-
-#### Reader Models
-
-To use question answering models on Hugging Face, initialize a `ExtractiveReader` with the model name. You can then use this `ExtractiveReader` to extract answers from the relevant context.
-
-Below is the example of extractive question answering pipeline with `InMemoryBM25Retriever` and `ExtractiveReader`:
+Use [`ExtractiveReader`](https://docs.haystack.deepset.ai/docs/extractivereader) to extract answers from the relevant context:
 
 ```python
 from haystack import Document, Pipeline
@@ -163,16 +80,55 @@ docs = [Document(content="Paris is the capital of France."),
 document_store = InMemoryDocumentStore()
 document_store.write_documents(docs)
 
-retriever = InMemoryBM25Retriever(document_store = document_store)
+retriever = InMemoryBM25Retriever(document_store=document_store)
 reader = ExtractiveReader(model="deepset/roberta-base-squad2-distilled")
 
 extractive_qa_pipeline = Pipeline()
 extractive_qa_pipeline.add_component(instance=retriever, name="retriever")
 extractive_qa_pipeline.add_component(instance=reader, name="reader")
-
 extractive_qa_pipeline.connect("retriever.documents", "reader.documents")
 
 query = "What is the capital of France?"
-extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
-                                 "reader": {"query": query, "top_k": 2}})
+extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
+                                 "reader": {"query": query, "top_k": 2}})
+```
+
+### Zero-Shot Document Classification
+
+Use [`TransformersZeroShotDocumentClassifier`](https://docs.haystack.deepset.ai/docs/transformerszeroshotdocumentclassifier) to classify documents with labels of your choice, without fine-tuning:
+
+```python
+from haystack import Document
+from haystack.components.classifiers import TransformersZeroShotDocumentClassifier
+
+documents = [Document(content="Today was a nice day!"),
+             Document(content="Yesterday was a bad day!")]
+
+classifier = TransformersZeroShotDocumentClassifier(
+    model="cross-encoder/nli-deberta-v3-xsmall",
+    labels=["positive", "negative"],
+)
+
+result = classifier.run(documents=documents)
+print([doc.meta["classification"]["label"] for doc in result["documents"]])
+# ['positive', 'negative']
+```
+
+### Named Entity Recognition
+
+Use [`NamedEntityExtractor`](https://docs.haystack.deepset.ai/docs/namedentityextractor) to annotate named entities in documents:
+
+```python
+from haystack import Document
+from haystack.components.extractors.named_entity_extractor import NamedEntityExtractor
+
+documents = [
+    Document(content="I'm Merlin, the happy pig!"),
+    Document(content="My name is Clara and I live in Berkeley, California."),
+]
+extractor = NamedEntityExtractor(backend="hugging_face", model="dslim/bert-base-NER")
+
+results = extractor.run(documents=documents)["documents"]
+annotations = [NamedEntityExtractor.get_stored_annotations(doc) for doc in results]
+print(annotations)
 ```
diff --git a/integrations/sentence-transformers.md b/integrations/sentence-transformers.md
new file mode 100644
index 00000000..df17d9cc
--- /dev/null
+++ b/integrations/sentence-transformers.md
@@ -0,0 +1,113 @@
+---
+layout: integration
+name: Sentence Transformers
+description: Use Sentence Transformers embedding and ranking models in your Haystack pipelines
+authors:
+  - name: deepset
+    socials:
+      github: deepset-ai
+      twitter: deepset_ai
+      linkedin: https://www.linkedin.com/company/deepset-ai/
+pypi: https://pypi.org/project/haystack-ai
+repo: https://github.com/deepset-ai/haystack
+type: Model Provider
+report_issue: https://github.com/deepset-ai/haystack/issues
+logo: /logos/sentence-transformers.png
+version: Haystack 2.0
+toc: true
+---
+
+### **Table of Contents**
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Usage](#usage)
+
+## Overview
+
+[Sentence Transformers](https://www.sbert.net/) is a library for state-of-the-art embedding and reranking models. With this integration, you can run Sentence Transformers compatible models from the [Hugging Face Hub](https://huggingface.co/models?library=sentence-transformers) **locally**, on your own machine, in your Haystack pipelines.
+
+Haystack supports Hugging Face models in other ways too:
+- [Hugging Face Transformers](https://haystack.deepset.ai/integrations/huggingface) for other local models (LLMs, extractive QA, classification, NER)
+- [Hugging Face API](https://haystack.deepset.ai/integrations/huggingface-api) to call models via Inference Providers, Inference Endpoints, or self-hosted TGI/TEI
+- [Optimum](https://haystack.deepset.ai/integrations/optimum) for high-performance inference with ONNX Runtime
+
+## Installation
+
+```bash
+pip install haystack-ai "sentence-transformers>=5.0.0"
+```
+
+## Usage
+
+### Components
+
+Haystack provides several components based on Sentence Transformers:
+- Embedders:
+  - [`SentenceTransformersTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerstextembedder): creates a dense embedding for text (used in query/RAG pipelines).
+  - [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder): enriches documents with dense embeddings (used in indexing pipelines).
+  - [`SentenceTransformersSparseTextEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerssparsetextembedder): creates a sparse embedding for text (used in query/RAG pipelines).
+  - [`SentenceTransformersSparseDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformerssparsedocumentembedder): enriches documents with sparse embeddings (used in indexing pipelines).
+  - [`SentenceTransformersDocumentImageEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentimageembedder): enriches documents with embeddings computed from their images.
+- Rankers:
+  - [`SentenceTransformersSimilarityRanker`](https://docs.haystack.deepset.ai/docs/sentencetransformerssimilarityranker): ranks documents based on their similarity to the query, using cross-encoder models.
+  - [`SentenceTransformersDiversityRanker`](https://docs.haystack.deepset.ai/docs/sentencetransformersdiversityranker): ranks documents to maximize their overall diversity.
+
+### Embedding Models
+
+To create semantic embeddings for documents, use `SentenceTransformersDocumentEmbedder` in your indexing pipeline. For generating embeddings for queries, use `SentenceTransformersTextEmbedder`.
+
+Below is an example of a document retrieval pipeline, after the documents have been indexed with their embeddings:
+
+```python
+from haystack import Document, Pipeline
+from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
+from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
+from haystack.document_stores.in_memory import InMemoryDocumentStore
+
+document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
+
+documents = [Document(content="My name is Wolfgang and I live in Berlin"),
+             Document(content="I saw a black horse running"),
+             Document(content="Germany has many big cities")]
+
+document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
+documents_with_embeddings = document_embedder.run(documents)["documents"]
+document_store.write_documents(documents_with_embeddings)
+
+query_pipeline = Pipeline()
+query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
+query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
+query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
+
+result = query_pipeline.run({"text_embedder": {"text": "Who lives in Berlin?"}})
+```
+
+### Sparse Embedding Models
+
+Sparse embedding models like SPLADE produce interpretable embeddings and can perform better than dense models in out-of-domain settings. Currently, sparse embedding retrieval is supported by the [Qdrant Document Store](https://haystack.deepset.ai/integrations/qdrant-document-store).
+
+```python
+from haystack.components.embedders import SentenceTransformersSparseTextEmbedder
+
+text_embedder = SentenceTransformersSparseTextEmbedder()
+
+print(text_embedder.run("I love pizza!"))
+# {'sparse_embedding': SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])}
+```
+
+### Ranking Models
+
+To rank documents based on their relevance to the query, use `SentenceTransformersSimilarityRanker` with a cross-encoder model:
+
+```python
+from haystack import Document
+from haystack.components.rankers import SentenceTransformersSimilarityRanker
+
+ranker = SentenceTransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")
+
+docs = [Document(content="Paris"), Document(content="Berlin")]
+result = ranker.run(query="City in Germany", documents=docs)
+print(result["documents"][0].content)
+# Berlin
+```
diff --git a/logos/sentence-transformers.png b/logos/sentence-transformers.png
new file mode 100644
index 00000000..20dc6abb
Binary files /dev/null and b/logos/sentence-transformers.png differ
diff --git a/logos/transformers.png b/logos/transformers.png
new file mode 100644
index 00000000..a24dd610
Binary files /dev/null and b/logos/transformers.png differ