haystack/docs-website/reference_versioned_docs/version-2.29/integrations-api/watsonx.md at f04ba18c50942524bdf6eff5b31b697a3e6866ed · deepset-ai/haystack

title	IBM watsonx.ai
id	integrations-watsonx
description	IBM watsonx.ai integration for Haystack
slug	/integrations-watsonx

haystack_integrations.components.embedders.watsonx.document_embedder

WatsonxDocumentEmbedder

Computes document embeddings using IBM watsonx.ai models.

Usage example

from haystack import Document
from haystack_integrations.components.embedders.watsonx.document_embedder import WatsonxDocumentEmbedder

documents = [
    Document(content="I love pizza!"),
    Document(content="Pasta is great too"),
]

document_embedder = WatsonxDocumentEmbedder(
    model="ibm/slate-30m-english-rtrvr-v2",
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    api_base_url="https://us-south.ml.cloud.ibm.com",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)

result = document_embedder.run(documents=documents)
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]

init

__init__(
    *,
    model: str = "ibm/slate-30m-english-rtrvr-v2",
    api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
    api_base_url: str = "https://us-south.ml.cloud.ibm.com",
    project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
    truncate_input_tokens: int | None = None,
    prefix: str = "",
    suffix: str = "",
    batch_size: int = 1000,
    concurrency_limit: int = 5,
    timeout: float | None = None,
    max_retries: int | None = None,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n"
)

Creates a WatsonxDocumentEmbedder component.

Parameters:

model (str) – The name of the model to use for calculating embeddings. Default is "ibm/slate-30m-english-rtrvr-v2".
api_key (Secret) – The WATSONX API key. Can be set via environment variable WATSONX_API_KEY.
api_base_url (str) – The WATSONX URL for the watsonx.ai service. Default is "https://us-south.ml.cloud.ibm.com".
project_id (Secret) – The ID of the Watson Studio project. Can be set via environment variable WATSONX_PROJECT_ID.
truncate_input_tokens (int | None) – Maximum number of tokens to use from the input text. If set to None (or not provided), the full input text is used, up to the model's maximum token limit.
prefix (str) – A string to add at the beginning of each text.
suffix (str) – A string to add at the end of each text.
batch_size (int) – Number of documents to embed in one API call. Default is 1000.
concurrency_limit (int) – Number of parallel requests to make. Default is 5.
timeout (float | None) – Timeout for API requests in seconds.
max_retries (int | None) – Maximum number of retries for API requests.

to_dict

to_dict() -> dict[str, Any]

Serialize the component to a dictionary.

Returns:

dict[str, Any] – The serialized component as a dictionary.

from_dict

from_dict(data: dict[str, Any]) -> 'WatsonxDocumentEmbedder'

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – The dictionary representation of this component.

Returns:

'WatsonxDocumentEmbedder' – The deserialized component instance.

run

run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]

Embeds a list of documents.

Parameters:

documents (list[Document]) – A list of documents to embed.

Returns:

dict[str, list[Document] | dict[str, Any]] – A dictionary with:
'documents': List of Documents with embeddings added
'meta': Information about the model usage

haystack_integrations.components.embedders.watsonx.text_embedder

WatsonxTextEmbedder

Embeds strings using IBM watsonx.ai foundation models.

You can use it to embed user query and send it to an embedding Retriever.

Usage example

from haystack_integrations.components.embedders.watsonx.text_embedder import WatsonxTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = WatsonxTextEmbedder(
    model="ibm/slate-30m-english-rtrvr-v2",
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    api_base_url="https://us-south.ml.cloud.ibm.com",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
#  'meta': {'model': 'ibm/slate-30m-english-rtrvr-v2',
#           'truncated_input_tokens': 3}}

init

__init__(
    *,
    model: str = "ibm/slate-30m-english-rtrvr-v2",
    api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
    api_base_url: str = "https://us-south.ml.cloud.ibm.com",
    project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
    truncate_input_tokens: int | None = None,
    prefix: str = "",
    suffix: str = "",
    timeout: float | None = None,
    max_retries: int | None = None
)

Creates an WatsonxTextEmbedder component.

Parameters:

model (str) – The name of the IBM watsonx model to use for calculating embeddings. Default is "ibm/slate-30m-english-rtrvr-v2".
api_key (Secret) – The WATSONX API key. Can be set via environment variable WATSONX_API_KEY.
api_base_url (str) – The WATSONX URL for the watsonx.ai service. Default is "https://us-south.ml.cloud.ibm.com".
project_id (Secret) – The ID of the Watson Studio project. Can be set via environment variable WATSONX_PROJECT_ID.
truncate_input_tokens (int | None) – Maximum number of tokens to use from the input text. If set to None (or not provided), the full input text is used, up to the model's maximum token limit.
prefix (str) – A string to add at the beginning of each text to embed.
suffix (str) – A string to add at the end of each text to embed.
timeout (float | None) – Timeout for API requests in seconds.
max_retries (int | None) – Maximum number of retries for API requests.

to_dict

to_dict() -> dict[str, Any]

Serialize the component to a dictionary.

Returns:

dict[str, Any] – The serialized component as a dictionary.

from_dict

from_dict(data: dict[str, Any]) -> WatsonxTextEmbedder

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – The dictionary representation of this component.

Returns:

WatsonxTextEmbedder – The deserialized component instance.

run

run(text: str) -> dict[str, list[float] | dict[str, Any]]

Embeds a single string.

Parameters:

text (str) – Text to embed.

Returns:

dict[str, list[float] | dict[str, Any]] – A dictionary with:
'embedding': The embedding of the input text
'meta': Information about the model usage

haystack_integrations.components.generators.watsonx.chat.chat_generator

WatsonxChatGenerator

Enables chat completions using IBM's watsonx.ai foundation models.

This component interacts with IBM's watsonx.ai platform to generate chat responses using various foundation models. It supports the ChatMessage format for both input and output, including multimodal inputs with text and images.

The generator works with IBM's foundation models that are listed here.

You can customize the generation behavior by passing parameters to the watsonx.ai API through the generation_kwargs argument. These parameters are passed directly to the watsonx.ai inference endpoint.

For details on watsonx.ai API parameters, see IBM watsonx.ai documentation.

Usage example

from haystack_integrations.components.generators.watsonx.chat.chat_generator import WatsonxChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("Explain quantum computing in simple terms")]

client = WatsonxChatGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    model="ibm/granite-4-h-small",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)
response = client.run(messages)
print(response)

Multimodal usage example

from haystack.dataclasses import ChatMessage, ImageContent

# Create an image from file path or base64
image_content = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["What's in this image?", image_content])]

# Use a multimodal model
client = WatsonxChatGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    model="meta-llama/llama-3-2-11b-vision-instruct",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)
response = client.run(messages)
print(response)

SUPPORTED_MODELS

SUPPORTED_MODELS: list[str] = [
    "ibm/granite-3-1-8b-base",
    "ibm/granite-3-8b-instruct",
    "ibm/granite-4-h-small",
    "ibm/granite-8b-code-instruct",
    "ibm/granite-guardian-3-8b",
    "meta-llama/llama-3-1-70b-gptq",
    "meta-llama/llama-3-1-8b",
    "meta-llama/llama-3-2-11b-vision-instruct",
    "meta-llama/llama-3-2-90b-vision-instruct",
    "meta-llama/llama-3-3-70b-instruct",
    "meta-llama/llama-3-405b-instruct",
    "meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    "meta-llama/llama-guard-3-11b-vision",
    "mistral-large-2512",
    "mistralai/mistral-medium-2505",
    "mistralai/mistral-small-3-1-24b-instruct-2503",
    "openai/gpt-oss-120b",
]

A non-exhaustive list of models supported by this component.

See https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-supported-foundation-models for the full list of models and up-to-date model IDs.

init

__init__(
    *,
    api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
    model: str = "ibm/granite-4-h-small",
    project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
    api_base_url: str = "https://us-south.ml.cloud.ibm.com",
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    verify: bool | str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None
) -> None

Creates an instance of WatsonxChatGenerator.

Before initializing the component, you can set environment variables:

WATSONX_TIMEOUT to override the default timeout
WATSONX_MAX_RETRIES to override the default retry count

Parameters:

api_key (Secret) – IBM Cloud API key for watsonx.ai access. Can be set via WATSONX_API_KEY environment variable or passed directly.
model (str) – The model ID to use for completions. Defaults to "ibm/granite-4-h-small". Available models can be found in your IBM Cloud account.
project_id (Secret) – IBM Cloud project ID
api_base_url (str) – Custom base URL for the API endpoint. Defaults to "https://us-south.ml.cloud.ibm.com".
generation_kwargs (dict[str, Any] | None) – Additional parameters to control text generation. These parameters are passed directly to the watsonx.ai inference endpoint. Supported parameters include:
temperature: Controls randomness (lower = more deterministic)
max_new_tokens: Maximum number of tokens to generate
min_new_tokens: Minimum number of tokens to generate
top_p: Nucleus sampling probability threshold
top_k: Number of highest probability tokens to consider
repetition_penalty: Penalty for repeated tokens
length_penalty: Penalty based on output length
stop_sequences: List of sequences where generation should stop
random_seed: Seed for reproducible results
timeout (float | None) – Timeout in seconds for API requests. Defaults to environment variable WATSONX_TIMEOUT or 30 seconds.
max_retries (int | None) – Maximum number of retry attempts for failed requests. Defaults to environment variable WATSONX_MAX_RETRIES or 5.
verify (bool | str | None) – SSL verification setting. Can be:
True: Verify SSL certificates (default)
False: Skip verification (insecure)
Path to CA bundle for custom certificates
streaming_callback (StreamingCallbackT | None) – A callback function for streaming responses.
tools (ToolsType | None) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.

to_dict

to_dict() -> dict[str, Any]

Serialize the component to a dictionary.

Returns:

dict[str, Any] – The serialized component as a dictionary.

from_dict

from_dict(data: dict[str, Any]) -> WatsonxChatGenerator

Deserialize this component from a dictionary.

Parameters:

data (dict[str, Any]) – The dictionary representation of this component.

Returns:

WatsonxChatGenerator – The deserialized component instance.

run

run(
    *,
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None
) -> dict[str, list[ChatMessage]]

Generate chat completions synchronously.

Parameters:

messages (list[ChatMessage]) – A list of ChatMessage instances representing the input messages.
generation_kwargs (dict[str, Any] | None) – Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method.
streaming_callback (StreamingCallbackT | None) – A callback function that is called when a new token is received from the stream. If provided this will override the streaming_callback set in the __init__ method.
tools (ToolsType | None) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. If set, it will override the tools parameter provided during initialization.

Returns:

dict[str, list[ChatMessage]] – A dictionary with the following key:
replies: A list containing the generated responses as ChatMessage instances.

run_async

run_async(
    *,
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None
) -> dict[str, list[ChatMessage]]

Generate chat completions asynchronously.

Parameters:

messages (list[ChatMessage]) – A list of ChatMessage instances representing the input messages.
generation_kwargs (dict[str, Any] | None) – Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method.
streaming_callback (StreamingCallbackT | None) – A callback function that is called when a new token is received from the stream. If provided this will override the streaming_callback set in the __init__ method.
tools (ToolsType | None) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. If set, it will override the tools parameter provided during initialization.

Returns:

dict[str, list[ChatMessage]] – A dictionary with the following key:
replies: A list containing the generated responses as ChatMessage instances.

haystack_integrations.components.generators.watsonx.generator

WatsonxGenerator

Bases: WatsonxChatGenerator

Enables text completions using IBM's watsonx.ai foundation models.

This component extends WatsonxChatGenerator to provide the standard Generator interface that works with prompt strings instead of ChatMessage objects.

The generator works with IBM's foundation models that are listed here.

For details on watsonx.ai API parameters, see IBM watsonx.ai documentation.

Usage example

from haystack_integrations.components.generators.watsonx.generator import WatsonxGenerator
from haystack.utils import Secret

generator = WatsonxGenerator(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    model="ibm/granite-4-h-small",
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
)

response = generator.run(
    prompt="Explain quantum computing in simple terms",
    system_prompt="You are a helpful physics teacher.",
)
print(response)

Output:

{
    "replies": ["Quantum computing uses quantum-mechanical phenomena like...."],
    "meta": [
        {
            "model": "ibm/granite-4-h-small",
            "project_id": "your-project-id",
            "usage": {
                "prompt_tokens": 12,
                "completion_tokens": 45,
                "total_tokens": 57,
            },
        }
    ],
}

SUPPORTED_MODELS

SUPPORTED_MODELS: list[str] = [
    "ibm/granite-3-1-8b-base",
    "ibm/granite-3-8b-instruct",
    "ibm/granite-4-h-small",
    "ibm/granite-8b-code-instruct",
    "ibm/granite-guardian-3-8b",
    "meta-llama/llama-3-1-70b-gptq",
    "meta-llama/llama-3-1-8b",
    "meta-llama/llama-3-2-11b-vision-instruct",
    "meta-llama/llama-3-2-90b-vision-instruct",
    "meta-llama/llama-3-3-70b-instruct",
    "meta-llama/llama-3-405b-instruct",
    "meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    "meta-llama/llama-guard-3-11b-vision",
    "mistral-large-2512",
    "mistralai/mistral-medium-2505",
    "mistralai/mistral-small-3-1-24b-instruct-2503",
    "openai/gpt-oss-120b",
]

A non-exhaustive list of models supported by this component.

See https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-supported-foundation-models for the full list of models and up-to-date model IDs.

init

__init__(
    *,
    api_key: Secret = Secret.from_env_var("WATSONX_API_KEY"),
    model: str = "ibm/granite-4-h-small",
    project_id: Secret = Secret.from_env_var("WATSONX_PROJECT_ID"),
    api_base_url: str = "https://us-south.ml.cloud.ibm.com",
    system_prompt: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    verify: bool | str | None = None,
    streaming_callback: StreamingCallbackT | None = None
) -> None

Creates an instance of WatsonxGenerator.

Before initializing the component, you can set environment variables:

WATSONX_TIMEOUT to override the default timeout
WATSONX_MAX_RETRIES to override the default retry count

Parameters:

api_key (Secret) – IBM Cloud API key for watsonx.ai access. Can be set via WATSONX_API_KEY environment variable or passed directly.
model (str) – The model ID to use for completions. Defaults to "ibm/granite-4-h-small". Available models can be found in your IBM Cloud account.
project_id (Secret) – IBM Cloud project ID
api_base_url (str) – Custom base URL for the API endpoint. Defaults to "https://us-south.ml.cloud.ibm.com".
system_prompt (str | None) – The system prompt to use for text generation.
generation_kwargs (dict[str, Any] | None) – Additional parameters to control text generation. These parameters are passed directly to the watsonx.ai inference endpoint. Supported parameters include:
temperature: Controls randomness (lower = more deterministic)
max_new_tokens: Maximum number of tokens to generate
min_new_tokens: Minimum number of tokens to generate
top_p: Nucleus sampling probability threshold
top_k: Number of highest probability tokens to consider
repetition_penalty: Penalty for repeated tokens
length_penalty: Penalty based on output length
stop_sequences: List of sequences where generation should stop
random_seed: Seed for reproducible results
timeout (float | None) – Timeout in seconds for API requests. Defaults to environment variable WATSONX_TIMEOUT or 30 seconds.
max_retries (int | None) – Maximum number of retry attempts for failed requests. Defaults to environment variable WATSONX_MAX_RETRIES or 5.
verify (bool | str | None) – SSL verification setting. Can be:
True: Verify SSL certificates (default)
False: Skip verification (insecure)
Path to CA bundle for custom certificates
streaming_callback (StreamingCallbackT | None) – A callback function for streaming responses.

to_dict

to_dict() -> dict[str, Any]

Serialize the component to a dictionary.

Returns:

dict[str, Any] – The serialized component as a dictionary.

from_dict

from_dict(data: dict[str, Any]) -> WatsonxGenerator

Deserialize this component from a dictionary.

Parameters:

data (dict[str, Any]) – The dictionary representation of this component.

Returns:

WatsonxGenerator – The deserialized component instance.

run

run(
    *,
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None
) -> dict[str, Any]

Generate text completions synchronously.

Parameters:

prompt (str) – The input prompt string for text generation.
system_prompt (str | None) – An optional system prompt to provide context or instructions for the generation. If not provided, the system prompt set in the __init__ method will be used.
streaming_callback (StreamingCallbackT | None) – A callback function that is called when a new token is received from the stream. If provided, this will override the streaming_callback set in the __init__ method.
generation_kwargs (dict[str, Any] | None) – Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method. Supported parameters include temperature, max_new_tokens, top_p, etc.

Returns:

dict[str, Any] – A dictionary with the following keys:
replies: A list of generated text completions as strings.
meta: A list of metadata dictionaries containing information about each generation, including model name, finish reason, and token usage statistics.

run_async

run_async(
    *,
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None
) -> dict[str, Any]

Generate text completions asynchronously.

Parameters:

prompt (str) – The input prompt string for text generation.
system_prompt (str | None) – An optional system prompt to provide context or instructions for the generation.
streaming_callback (StreamingCallbackT | None) – A callback function that is called when a new token is received from the stream. If provided, this will override the streaming_callback set in the __init__ method.
generation_kwargs (dict[str, Any] | None) – Additional keyword arguments for text generation. These parameters will potentially override the parameters passed in the __init__ method. Supported parameters include temperature, max_new_tokens, top_p, etc.

Returns:

dict[str, Any] – A dictionary with the following keys:
replies: A list of generated text completions as strings.
meta: A list of metadata dictionaries containing information about each generation, including model name, finish reason, and token usage statistics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

haystack_integrations.components.embedders.watsonx.document_embedder

WatsonxDocumentEmbedder

Usage example

init

to_dict

from_dict

run

haystack_integrations.components.embedders.watsonx.text_embedder

WatsonxTextEmbedder

Usage example

init

to_dict

from_dict

run

haystack_integrations.components.generators.watsonx.chat.chat_generator

WatsonxChatGenerator

Usage example

Multimodal usage example

SUPPORTED_MODELS

init

to_dict

from_dict

run

run_async

haystack_integrations.components.generators.watsonx.generator

WatsonxGenerator

Usage example

SUPPORTED_MODELS

init

to_dict

from_dict

run

run_async

Uh oh!

FilesExpand file tree

watsonx.md

Latest commit

History

watsonx.md

File metadata and controls

haystack_integrations.components.embedders.watsonx.document_embedder

WatsonxDocumentEmbedder

Usage example

init

to_dict

from_dict

run

haystack_integrations.components.embedders.watsonx.text_embedder

WatsonxTextEmbedder

Usage example

init

to_dict

from_dict

run

haystack_integrations.components.generators.watsonx.chat.chat_generator

WatsonxChatGenerator

Usage example

Multimodal usage example

SUPPORTED_MODELS

init

to_dict

from_dict

run

run_async

haystack_integrations.components.generators.watsonx.generator

WatsonxGenerator

Usage example

SUPPORTED_MODELS

init

to_dict

from_dict

run

run_async