| title | Utils |
|---|---|
| id | utils-api |
| description | Utility functions and classes used across the library. |
| slug | /utils-api |
is_callable_async_compatible(func: Callable) -> bool
Returns whether the given callable is usable inside a component's run_async method.
Parameters:
- func (Callable) – The callable to check.
Returns:
bool – True if the callable is compatible, False otherwise.
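The reference above does not say how compatibility is decided. As a rough illustration only, the sketch below assumes that "compatible" means "is a coroutine function"; the helper name `looks_async_compatible` is hypothetical and is not part of the library.

```python
import inspect
from typing import Callable


def looks_async_compatible(func: Callable) -> bool:
    """Hypothetical check: treat coroutine functions as awaitable-safe.

    Assumption: the real criterion may differ from this one.
    """
    return inspect.iscoroutinefunction(func)


async def fetch_async() -> str:
    return "done"


def fetch_sync() -> str:
    return "done"


print(looks_async_compatible(fetch_async))
print(looks_async_compatible(fetch_sync))
```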
Bases: Enum
Type of secret: token (API key) or environment variable.
from_str(string: str) -> SecretType
Convert a string to a SecretType.
Parameters:
- string (str) – The string to convert.
Bases: ABC
Encapsulates a secret used for authentication.
Usage example:
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
generator = OpenAIGenerator(api_key=Secret.from_token("<here_goes_your_token>"))
from_token(token: str) -> Secret
Create a token-based secret. Cannot be serialized.
Parameters:
- token (str) – The token to use for authentication.
from_env_var(env_vars: str | list[str], *, strict: bool = True) -> Secret
Create an environment variable-based secret. Accepts one or more environment variables.
Upon resolution, it returns a string token from the first environment variable that is set.
Parameters:
- env_vars (str | list[str]) – A single environment variable or an ordered list of candidate environment variables.
- strict (bool) – Whether to raise an exception if none of the environment variables are set.
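The "first environment variable that is set" rule and the strict flag can be illustrated with a standalone sketch. This is not the library's implementation: the helper `resolve_first_env_var`, the variable names, and the choice of `ValueError` are all assumptions made for the example.

```python
import os
from typing import Optional, Union


def resolve_first_env_var(
    env_vars: Union[str, list], *, strict: bool = True
) -> Optional[str]:
    """Hypothetical helper: return the value of the first variable that is set."""
    # Accept a single name or an ordered list of candidate names.
    names = [env_vars] if isinstance(env_vars, str) else list(env_vars)
    for name in names:
        value = os.environ.get(name)
        if value is not None:
            return value
    if strict:
        # Assumption: the exception type used by the real library may differ.
        raise ValueError(f"None of the environment variables are set: {names}")
    return None


# Illustrative variable names, not ones the library defines.
os.environ.pop("SKETCH_PRIMARY_KEY", None)
os.environ["SKETCH_FALLBACK_KEY"] = "abc"
print(resolve_first_env_var(["SKETCH_PRIMARY_KEY", "SKETCH_FALLBACK_KEY"]))
```

With strict=False, the sketch returns None instead of raising when no candidate is set.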
to_dict() -> dict[str, Any]
Convert the secret to a JSON-serializable dictionary.
Some secrets may not be serializable.
Returns:
dict[str, Any] – The serialized secret.
from_dict(dict: dict[str, Any]) -> Secret
Create a secret from a JSON-serializable dictionary.
Parameters:
- dict (dict[str, Any]) – The dictionary with the serialized data.
Returns:
Secret – The deserialized secret.
resolve_value() -> Any | None
Resolve the secret to an atomic value. The semantics of the value are secret-dependent.
Returns:
Any | None – The value of the secret, if any.
type: SecretType
The type of the secret.
Bases: Secret
A secret that uses a string token/API key.
Cannot be serialized.
resolve_value() -> Any | None
Return the token.
type: SecretType
The type of the secret.
Bases: Secret
A secret that accepts one or more environment variables.
Upon resolution, it returns a string token from the first environment variable that is set. Can be serialized.
resolve_value() -> Any | None
Resolve the secret to an atomic value. The semantics of the value are secret-dependent.
type: SecretType
The type of the secret.
deserialize_secrets_inplace(
    data: dict[str, Any], keys: Iterable[str], *, recursive: bool = False
) -> None
Deserialize secrets in a dictionary in place.
Parameters:
- data (dict[str, Any]) – The dictionary with the serialized data.
- keys (Iterable[str]) – The keys of the secrets to deserialize.
- recursive (bool) – Whether to recursively deserialize nested dictionaries.
default_azure_ad_token_provider() -> str
Get an Azure AD token using the DefaultAzureCredential and the "https://cognitiveservices.azure.com/.default" scope.
serialize_class_instance(obj: Any) -> dict[str, Any]
Serializes an object that has a to_dict method into a dictionary.
Parameters:
- obj (Any) – The object to be serialized.
Returns:
dict[str, Any] – A dictionary representation of the object.
Raises:
SerializationError – If the object does not have a to_dict method.
deserialize_class_instance(data: dict[str, Any]) -> Any
Deserializes an object from a dictionary representation generated by serialize_class_instance.
Parameters:
- data (dict[str, Any]) – The dictionary to deserialize from.
Returns:
Any – The deserialized object.
Raises:
DeserializationError – If the serialization data is malformed, the class type cannot be imported, or the class does not have a from_dict method.
serialize_callable(callable_handle: Callable) -> str
Serializes a callable to its full path.
Parameters:
- callable_handle (Callable) – The callable to serialize.
Returns:
str – The full path of the callable.
deserialize_callable(callable_handle: str) -> Callable
Deserializes a callable given its full import path as a string.
Parameters:
- callable_handle (str) – The full import path of the callable.
Returns:
Callable – The callable.
Raises:
DeserializationError – If the callable cannot be found.
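The round trip between a callable and its "full path" can be sketched with the standard library alone. The helpers `callable_to_path` and `callable_from_path` are hypothetical stand-ins, and the sketch assumes a module-level callable; it does not handle methods, lambdas, or nested functions, which the real functions may treat differently.

```python
import importlib
import json
from typing import Callable


def callable_to_path(func: Callable) -> str:
    """Hypothetical helper: build 'module.name' for a module-level callable."""
    return f"{func.__module__}.{func.__qualname__}"


def callable_from_path(path: str) -> Callable:
    """Hypothetical helper: import the module and look the attribute up."""
    # Split 'package.module.name' into the module part and the attribute name.
    module_name, _, attribute = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


path = callable_to_path(json.dumps)
restored = callable_from_path(path)
print(path, restored is json.dumps)
```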
deserialize_chatgenerator_inplace(
    data: dict[str, Any], key: str = "chat_generator"
) -> None
Deserialize a ChatGenerator in a dictionary in place.
Parameters:
- data (dict[str, Any]) – The dictionary with the serialized data.
- key (str) – The key in the dictionary where the ChatGenerator is stored.
Raises:
DeserializationError – If the key is missing in the serialized data, the value is not a dictionary, the type key is missing, the class cannot be imported, or the class lacks a from_dict method.
deserialize_component_inplace(
    data: dict[str, Any], key: str = "chat_generator"
) -> None
Deserialize a Component in a dictionary in place.
Parameters:
- data (dict[str, Any]) – The dictionary with the serialized data.
- key (str) – The key in the dictionary where the Component is stored. Default is "chat_generator".
Raises:
DeserializationError – If the key is missing in the serialized data, the value is not a dictionary, the type key is missing, the class cannot be imported, or the class lacks a from_dict method.
Bases: Enum
Represents device types supported by Haystack.
This also includes devices that are not directly used by models - for example, the disk device is exclusively used in device maps for frameworks that support offloading model weights to disk.
from_str(string: str) -> DeviceType
Create a device type from a string.
Parameters:
- string (str) – The string to convert.
Returns:
DeviceType – The device type.
A generic representation of a device.
Parameters:
- type (DeviceType) – The device type.
- id (int | None) – The optional device id.
__init__(type: DeviceType, id: int | None = None) -> None
Create a generic device.
Parameters:
- type (DeviceType) – The device type.
- id (int | None) – The device id.
cpu() -> Device
Create a generic CPU device.
Returns:
Device – The CPU device.
gpu(id: int = 0) -> Device
Create a generic GPU device.
Parameters:
- id (int) – The GPU id.
Returns:
Device – The GPU device.
disk() -> Device
Create a generic disk device.
Returns:
Device – The disk device.
mps() -> Device
Create a generic Apple Metal Performance Shader device.
Returns:
Device – The MPS device.
xpu() -> Device
Create a generic Intel GPU Optimization device.
Returns:
Device – The XPU device.
from_str(string: str) -> Device
Create a generic device from a string.
Returns:
Device – The device.
A generic mapping from strings to devices.
The semantics of the strings depend on the target framework. Primarily used to deploy HuggingFace models to multiple devices.
Parameters:
- mapping (dict[str, Device]) – Dictionary mapping strings to devices.
to_dict() -> dict[str, str]
Serialize the mapping to a JSON-serializable dictionary.
Returns:
dict[str, str] – The serialized mapping.
first_device: Device | None
Return the first device in the mapping, if any.
Returns:
Device | None – The first device.
from_dict(dict: dict[str, str]) -> DeviceMap
Create a generic device map from a JSON-serialized dictionary.
Parameters:
- dict (dict[str, str]) – The serialized mapping.
Returns:
DeviceMap – The generic device map.
from_hf(hf_device_map: dict[str, Union[int, str, torch.device]]) -> DeviceMap
Create a generic device map from a HuggingFace device map.
Parameters:
- hf_device_map (dict[str, Union[int, str, torch.device]]) – The HuggingFace device map.
Returns:
DeviceMap – The deserialized device map.
Raises:
TypeError – If a device value in the map is not an int, str, or torch.device.
A representation of a device for a component.
This can be either a single device or a device map.
from_str(device_str: str) -> ComponentDevice
Create a component device representation from a device string.
The device string can only represent a single device.
Parameters:
- device_str (str) – The device string.
Returns:
ComponentDevice – The component device representation.
from_single(device: Device) -> ComponentDevice
Create a component device representation from a single device.
Disks cannot be used as single devices.
Parameters:
- device (Device) – The device.
Returns:
ComponentDevice – The component device representation.
from_multiple(device_map: DeviceMap) -> ComponentDevice
Create a component device representation from a device map.
Parameters:
- device_map (DeviceMap) – The device map.
Returns:
ComponentDevice – The component device representation.
to_torch() -> torch.device
Convert the component device representation to PyTorch format.
Device maps are not supported.
Returns:
torch.device – The PyTorch device representation.
to_torch_str() -> str
Convert the component device representation to PyTorch string format.
Device maps are not supported.
Returns:
str – The PyTorch device string representation.
to_spacy() -> int
Convert the component device representation to spaCy format.
Device maps are not supported.
Returns:
int – The spaCy device representation.
to_hf() -> int | str | dict[str, int | str]
Convert the component device representation to HuggingFace format.
Returns:
int | str | dict[str, int | str] – The HuggingFace device representation.
update_hf_kwargs(
    hf_kwargs: dict[str, Any], *, overwrite: bool
) -> dict[str, Any]
Convert the component device representation to HuggingFace format and add it to the keyword arguments dictionary as canonical keyword arguments.
Parameters:
- hf_kwargs (dict[str, Any]) – The HuggingFace keyword arguments dictionary.
- overwrite (bool) – Whether to overwrite existing device arguments.
Returns:
dict[str, Any] – The HuggingFace keyword arguments dictionary.
has_multiple_devices: bool
Whether this component device representation contains multiple devices.
first_device: Optional[ComponentDevice]
Return either the single device or the first device in the device map, if any.
Returns:
Optional[ComponentDevice] – The first device.
resolve_device(device: Optional[ComponentDevice] = None) -> ComponentDevice
Select a device for a component. If a device is specified, it's used. Otherwise, the default device is used.
Parameters:
- device (Optional[ComponentDevice]) – The provided device, if any.
Returns:
ComponentDevice – The resolved device.
to_dict() -> dict[str, Any]
Convert the component device representation to a JSON-serializable dictionary.
Returns:
dict[str, Any] – The dictionary representation.
from_dict(dict: dict[str, Any]) -> ComponentDevice
Create a component device representation from a JSON-serialized dictionary.
Parameters:
- dict (dict[str, Any]) – The serialized representation.
Returns:
ComponentDevice – The deserialized component device.
raise_on_invalid_filter_syntax(filters: dict[str, Any] | None = None) -> None
Raise an error if the filter syntax is invalid.
document_matches_filter(
    filters: dict[str, Any], document: Document | ByteStream
) -> bool
Return whether filters match the Document or the ByteStream.
For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol documentation.
init_http_client(
    http_client_kwargs: dict[str, Any] | None = None, async_client: bool = False
) -> httpx.Client | httpx.AsyncClient | None
Initialize an httpx client based on the http_client_kwargs.
Parameters:
- http_client_kwargs (dict[str, Any] | None) – The kwargs to pass to the httpx client.
- async_client (bool) – Whether to initialize an async client.
Returns:
Client | AsyncClient | None – An httpx client or an async httpx client.
Bases: Extension
A Jinja2 extension for creating structured chat messages with mixed content types.
This extension provides a custom {% message %} tag that allows creating chat messages
with different attributes (role, name, meta) and mixed content types (text, images, etc.).
Inspired by Banks.
Example:
{% message role="system" %}
You are a helpful assistant. You like to talk with {{user_name}}.
{% endmessage %}
{% message role="user" %}
Hello! I am {{user_name}}. Please describe the images.
{% for image in images %}
{{ image | templatize_part }}
{% endfor %}
{% endmessage %}
- The {% message %} tag is used to define a chat message.
- The message can contain text and other structured content parts.
- To include a structured content part in the message, the | templatize_part filter is used. The filter serializes the content part into a JSON string and wraps it in a <haystack_content_part> tag.
- The _build_chat_message_json method of the extension parses the message content parts, converts them into a ChatMessage object and serializes it to a JSON string.
- The obtained JSON string is usable in the ChatPromptBuilder component, where templates are rendered to actual ChatMessage objects.
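The wrap-then-parse idea described in the list above can be sketched without Jinja2. Only the <haystack_content_part> tag name comes from the text; the helpers `wrap_part` and `split_rendered`, the dictionary shape of a content part, and the regex-based parsing are assumptions made for illustration.

```python
import json
import re

TAG = "haystack_content_part"
PART_PATTERN = re.compile(rf"<{TAG}>(.*?)</{TAG}>", re.DOTALL)


def wrap_part(part: dict) -> str:
    """Hypothetical filter: serialize a content part and wrap it in the tag."""
    return f"<{TAG}>{json.dumps(part)}</{TAG}>"


def split_rendered(rendered: str) -> list:
    """Hypothetical parser: return text pieces and decoded parts in order."""
    pieces = []
    position = 0
    for match in PART_PATTERN.finditer(rendered):
        text = rendered[position:match.start()].strip()
        if text:
            pieces.append(text)
        pieces.append(json.loads(match.group(1)))
        position = match.end()
    tail = rendered[position:].strip()
    if tail:
        pieces.append(tail)
    return pieces


# The dictionary below is an invented content part, not the library's schema.
rendered = "Please describe the image. " + wrap_part({"image": "abc"})
print(split_rendered(rendered))
```

The sketch would misparse a content part whose JSON itself contains the closing tag, which a real implementation would need to guard against.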
parse(parser: Any) -> nodes.Node | list[nodes.Node]
Parse the message tag and its attributes in the Jinja2 template.
This method handles the parsing of role (mandatory), name (optional), meta (optional) and message body content.
Parameters:
- parser (Any) – The Jinja2 parser instance.
Returns:
Node | list[Node] – A CallBlock node containing the parsed message configuration.
Raises:
TemplateSyntaxError – If an invalid role is provided.
templatize_part(value: ChatMessageContentT) -> Markup
Jinja filter to convert a ChatMessageContentT object into a JSON string wrapped in special XML content tags.
Parameters:
- value (ChatMessageContentT) – The ChatMessageContentT object to convert.
Returns:
Markup – A JSON string wrapped in special XML content tags, marked as safe.
Raises:
ValueError – If the value is not an instance of ChatMessageContentT.
Bases: Extension
A Jinja2 extension for formatting dates and times.
__init__(environment: Environment) -> None
Initializes the JinjaTimeExtension object.
Parameters:
- environment (Environment) – The Jinja2 environment to initialize the extension with. It provides the context where the extension will operate.
parse(parser: Any) -> nodes.Node | list[nodes.Node]
Parse the template expression to determine how to handle the datetime formatting.
Parameters:
- parser (Any) – The parser object that processes the template expressions and manages the syntax tree. It's used to interpret the template's structure.
is_in_jupyter() -> bool
Returns True if in Jupyter or Google Colab, False otherwise.
expand_page_range(page_range: list[str | int]) -> list[int]
Takes a list of page numbers and ranges and expands them into a list of page numbers.
For example, given page_range=['1-3', '5', '8', '10-12'], the function returns [1, 2, 3, 5, 8, 10, 11, 12].
Parameters:
- page_range (list[str | int]) – List of page numbers and ranges.
Returns:
list[int] – An expanded list of page integers.
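The expansion described above can be reproduced with a short standalone sketch. The helper `expand_ranges` is hypothetical and only mirrors the documented example; how the real function handles malformed input is not covered here.

```python
from typing import Union


def expand_ranges(page_range: list) -> list:
    """Hypothetical helper: expand 'start-end' strings into inclusive pages."""
    pages = []
    for item in page_range:
        if isinstance(item, int):
            pages.append(item)
        elif "-" in item:
            # 'start-end' is treated as inclusive on both sides.
            start, end = item.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(item))
    return pages


print(expand_ranges(["1-3", "5", "8", "10-12"]))
# → [1, 2, 3, 5, 8, 10, 11, 12]
```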
expit(x: float | ndarray[Any, Any]) -> float | ndarray[Any, Any]
Compute the logistic sigmoid function. Maps input values to a range between 0 and 1.
Parameters:
- x (float | ndarray[Any, Any]) – Input value. Can be a scalar or a numpy array.
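For a scalar input, the logistic sigmoid is 1 / (1 + e^(-x)). The sketch below is a scalar-only illustration using the standard library; the helper `sigmoid` and its two-branch form, chosen to avoid overflow for large negative inputs, are assumptions and may not match how the library computes the value or handles arrays.

```python
import math


def sigmoid(x: float) -> float:
    """Hypothetical scalar logistic sigmoid, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


print(sigmoid(0.0))
# → 0.5
```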
request_with_retry(
    attempts: int = 3,
    status_codes_to_retry: list[int] | None = None,
    **kwargs: Any
) -> httpx.Response
Executes an HTTP request with a configurable exponential backoff retry on failures.
Usage example:
from haystack.utils import request_with_retry
# Sending an HTTP request with default retry configs
res = request_with_retry(method="GET", url="https://example.com")
# Sending an HTTP request with custom number of attempts
res = request_with_retry(method="GET", url="https://example.com", attempts=10)
# Sending an HTTP request with custom HTTP codes to retry
res = request_with_retry(method="GET", url="https://example.com", status_codes_to_retry=[408, 503])
# Sending an HTTP request with custom timeout in seconds
res = request_with_retry(method="GET", url="https://example.com", timeout=5)
# Sending an HTTP request with custom headers
res = request_with_retry(method="GET", url="https://example.com", headers={"Authorization": "Bearer <token>"})
# Sending a POST request
res = request_with_retry(method="POST", url="https://example.com", json={"key": "value"}, attempts=10)
# Retry all 5xx status codes
res = request_with_retry(method="GET", url="https://example.com", status_codes_to_retry=list(range(500, 600)))
Parameters:
- attempts (int) – Maximum number of attempts to retry the request.
- status_codes_to_retry (list[int] | None) – List of HTTP status codes that will trigger a retry. When the parameter is None, HTTP 408, 418, 429 and 503 will be retried.
- kwargs (Any) – Optional arguments that httpx.Client.request accepts.
Returns:
Response – The httpx.Response object.
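The retry behavior can be pictured as a loop that waits longer after each retryable status code. The sketch below is not the library's implementation: `call_with_backoff`, the `send` callable that returns a status code, the one-second base delay, the doubling schedule, and returning the last status once attempts run out are all assumptions. Only the default retry codes are taken from the parameter description above.

```python
import time
from typing import Callable, Iterable, Optional

DEFAULT_RETRY_CODES = (408, 418, 429, 503)  # defaults listed in the reference


def call_with_backoff(
    send: Callable[[], int],
    attempts: int = 3,
    status_codes_to_retry: Optional[Iterable[int]] = None,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: call send() until a non-retryable status arrives."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    codes = set(DEFAULT_RETRY_CODES if status_codes_to_retry is None else status_codes_to_retry)
    for attempt in range(attempts):
        status = send()
        last_attempt = attempt == attempts - 1
        if status not in codes or last_attempt:
            return status
        # Assumed schedule: the delay doubles after every failed attempt.
        sleep(base_delay * 2 ** attempt)
    return status


# A fake sender that fails twice, and a recorder instead of real sleeping.
responses = iter([503, 429, 200])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting the sleep function keeps the sketch testable without real waiting.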
async_request_with_retry(
    attempts: int = 3,
    status_codes_to_retry: list[int] | None = None,
    **kwargs: Any
) -> httpx.Response
Executes an asynchronous HTTP request with a configurable exponential backoff retry on failures.
Usage example:
import asyncio
from haystack.utils import async_request_with_retry
# Sending an async HTTP request with default retry configs
async def example():
    res = await async_request_with_retry(method="GET", url="https://example.com")
    return res
# Sending an async HTTP request with custom number of attempts
async def example_with_attempts():
    res = await async_request_with_retry(method="GET", url="https://example.com", attempts=10)
    return res
# Sending an async HTTP request with custom HTTP codes to retry
async def example_with_status_codes():
    res = await async_request_with_retry(method="GET", url="https://example.com", status_codes_to_retry=[408, 503])
    return res
# Sending an async HTTP request with custom timeout in seconds
async def example_with_timeout():
    res = await async_request_with_retry(method="GET", url="https://example.com", timeout=5)
    return res
# Sending an async HTTP request with custom headers
async def example_with_headers():
    headers = {"Authorization": "Bearer <my_token_here>"}
    res = await async_request_with_retry(method="GET", url="https://example.com", headers=headers)
    return res
# All of the above combined
async def example_combined():
    headers = {"Authorization": "Bearer <my_token_here>"}
    res = await async_request_with_retry(
        method="GET",
        url="https://example.com",
        headers=headers,
        attempts=10,
        status_codes_to_retry=[408, 503],
        timeout=5
    )
    return res
# Sending an async POST request
async def example_post():
    res = await async_request_with_retry(
        method="POST",
        url="https://example.com",
        json={"key": "value"},
        attempts=10
    )
    return res
# Retry all 5xx status codes
async def example_5xx():
    res = await async_request_with_retry(
        method="GET",
        url="https://example.com",
        status_codes_to_retry=list(range(500, 600))
    )
    return res
Parameters:
- attempts (int) – Maximum number of attempts to retry the request.
- status_codes_to_retry (list[int] | None) – List of HTTP status codes that will trigger a retry. When the parameter is None, HTTP 408, 418, 429 and 503 will be retried.
- kwargs (Any) – Optional arguments that httpx.AsyncClient.request accepts.
Returns:
Response – The httpx.Response object.
serialize_type(target: Any) -> str
Serializes a type or an instance to its string representation, including the module name.
This function handles types, instances of types, and special typing objects. It assumes that non-typing objects will have a '__name__' attribute.
Parameters:
- target (Any) – The object to serialize, can be an instance or a type.
Returns:
str – The string representation of the type.
deserialize_type(type_str: str) -> Any
Deserializes a type given its full import path as a string, including nested generic types.
This function will dynamically import the module if it's not already imported and then retrieve the type object from it. It also handles nested generic types like list[dict[int, str]].
Parameters:
- type_str (str) – The string representation of the type's full import path.
Returns:
Any – The deserialized type object.
Raises:
DeserializationError – If the type cannot be deserialized due to a missing module or type.
thread_safe_import(module_name: str) -> ModuleType
Import a module in a thread-safe manner.
Importing modules in a multi-threaded environment can lead to race conditions. This function ensures that the module is imported in a thread-safe manner without affecting import performance in single-threaded environments.
Parameters:
- module_name (str) – The module to import.
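One common way to serialize imports across threads is a module-level lock around importlib.import_module. The sketch below takes that approach as an assumption; the helper `locked_import` and the lock name are hypothetical, and the library may use a different mechanism.

```python
import importlib
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from types import ModuleType

_import_lock = threading.Lock()


def locked_import(module_name: str) -> ModuleType:
    """Hypothetical helper: allow only one thread to import at a time."""
    with _import_lock:
        return importlib.import_module(module_name)


# Several threads request the same module and all get the same object.
with ThreadPoolExecutor(max_workers=4) as pool:
    modules = list(pool.map(locked_import, ["json"] * 8))
print(all(module is json for module in modules))
```

An uncontended lock is cheap to acquire, so a single-threaded caller pays very little for the extra safety in this sketch.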
is_valid_http_url(url: str) -> bool
Check if a URL is a valid HTTP/HTTPS URL.
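A check of this kind can be sketched with urllib.parse. The helper `looks_like_http_url` is hypothetical, and its rule (an http or https scheme plus a non-empty host part) is an assumption; the library's validation may be stricter or looser.

```python
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """Hypothetical check: require an http(s) scheme and a network location."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs, e.g. bad IPv6.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_http_url("https://example.com"))
print(looks_like_http_url("ftp://example.com"))
```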