| title | Websearch |
|---|---|
| id | websearch-api |
| description | Web search engine for Haystack. |
| slug | /websearch-api |
Uses SearchApi to search the web for relevant documents.
Usage example:
from haystack.components.websearch import SearchApiWebSearch
from haystack.utils import Secret
websearch = SearchApiWebSearch(top_k=10, api_key=Secret.from_env_var("SEARCHAPI_API_KEY"))
results = websearch.run(query="Who is the boyfriend of Olivia Wilde?")
assert results["documents"]
assert results["links"]
__init__(
    api_key: Secret = Secret.from_env_var("SEARCHAPI_API_KEY"),
    top_k: int | None = 10,
    allowed_domains: list[str] | None = None,
    search_params: dict[str, Any] | None = None,
) -> None
Initialize the SearchApiWebSearch component.
Parameters:
- api_key (Secret) – API key for the SearchApi API.
- top_k (int | None) – Number of documents to return.
- allowed_domains (list[str] | None) – List of domains to limit the search to.
- search_params (dict[str, Any] | None) – Additional parameters passed to the SearchApi API. For example, you can set 'num' to 100 to increase the number of search results. See the SearchApi website for more details. The default search engine is Google; to use a different one, set the engine parameter in search_params.
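As a rough illustration of how `search_params` could override defaults such as the engine, the sketch below merges user-supplied parameters over a base payload. `build_payload` is a hypothetical helper written for this example and is not part of Haystack; the `q` key and the `"bing"` engine value are assumptions, and the request the component actually sends may differ.

```python
from __future__ import annotations

from typing import Any


def build_payload(query: str, search_params: dict[str, Any] | None = None) -> dict[str, Any]:
    """Hypothetical helper: merge extra search parameters over a base payload."""
    # Assumption: Google is the default engine, as stated in the description above.
    payload: dict[str, Any] = {"q": query, "engine": "google"}
    # User-supplied parameters take precedence over the defaults.
    payload.update(search_params or {})
    return payload


default_payload = build_payload("haystack framework")
custom_payload = build_payload("haystack framework", {"engine": "bing", "num": 100})
```

With the real component, the same dictionary would be passed as the `search_params` argument of the constructor.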
to_dict() -> dict[str, Any]
Serializes the component to a dictionary.
Returns:
- dict[str, Any] – Dictionary with serialized data.
from_dict(data: dict[str, Any]) -> SearchApiWebSearch
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – The dictionary to deserialize from.
Returns:
- SearchApiWebSearch – The deserialized component.
run(query: str) -> dict[str, list[Document] | list[str]]
Uses SearchApi to search the web.
Parameters:
- query (str) – Search query.
Returns:
- dict[str, list[Document] | list[str]] – A dictionary with the following keys:
  - "documents": List of documents returned by the search engine.
  - "links": List of links returned by the search engine.
Raises:
- TimeoutError – If the request to the SearchApi API times out.
- SearchApiError – If an error occurs while querying the SearchApi API.
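Because a timeout is often transient, callers may want to retry a search. The following is a minimal sketch under stated assumptions: `run_with_retry` and `flaky_run` are hypothetical names introduced here, and `flaky_run` is a stand-in for the component's `run` method so that the example works without network access. Other errors, such as SearchApiError, are deliberately not caught and propagate to the caller.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def run_with_retry(
    run: Callable[..., dict[str, Any]], query: str, attempts: int = 3, delay: float = 0.0
) -> dict[str, Any]:
    """Hypothetical helper: call run(query=...) and retry when it times out."""
    for attempt in range(1, attempts + 1):
        try:
            return run(query=query)
        except TimeoutError:
            # Re-raise after the last attempt; otherwise wait and try again.
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


# Stand-in for websearch.run that times out twice before succeeding.
calls = {"count": 0}


def flaky_run(query: str) -> dict[str, Any]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return {"documents": [], "links": [f"https://example.com/?q={query}"]}


results = run_with_retry(flaky_run, "haystack")
```

In real use, `websearch.run` would be passed in place of `flaky_run`, with a non-zero `delay` between attempts.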
run_async(query: str) -> dict[str, list[Document] | list[str]]
Asynchronously uses SearchApi to search the web.
This is the asynchronous version of the run method with the same parameters and return values.
Parameters:
- query (str) – Search query.
Returns:
- dict[str, list[Document] | list[str]] – A dictionary with the following keys:
  - "documents": List of documents returned by the search engine.
  - "links": List of links returned by the search engine.
Raises:
- TimeoutError – If the request to the SearchApi API times out.
- SearchApiError – If an error occurs while querying the SearchApi API.
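One reason to prefer the asynchronous method is to issue several searches concurrently. The sketch below shows the pattern with `asyncio.gather`; `FakeWebSearch` is a hypothetical stand-in that only mimics the `run_async` signature described above, and `search_many` is a helper name invented for this example.

```python
from __future__ import annotations

import asyncio
from typing import Any


class FakeWebSearch:
    """Stand-in exposing the same run_async signature as described above."""

    async def run_async(self, query: str) -> dict[str, list[Any]]:
        await asyncio.sleep(0)  # Simulates waiting on the network.
        return {"documents": [], "links": [f"https://example.com/search?q={query}"]}


async def search_many(websearch: Any, queries: list[str]) -> list[dict[str, list[Any]]]:
    """Run several searches concurrently; results come back in query order."""
    return await asyncio.gather(*(websearch.run_async(query=q) for q in queries))


all_results = asyncio.run(search_many(FakeWebSearch(), ["alpha", "beta"]))
```

Passing a real web search component instead of `FakeWebSearch()` should work the same way, provided its `run_async` accepts a `query` keyword argument as documented.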
Uses Serper to search the web for relevant documents.
See the Serper Dev website for more details.
Usage example:
from haystack.components.websearch import SerperDevWebSearch
from haystack.utils import Secret
serper_dev_api = Secret.from_env_var("SERPERDEV_API_KEY")
websearch = SerperDevWebSearch(top_k=10, api_key=serper_dev_api)
results = websearch.run(query="Who is the boyfriend of Olivia Wilde?")
assert results["documents"]
assert results["links"]
# Example with domain filtering - exclude subdomains
websearch_filtered = SerperDevWebSearch(
    top_k=10,
    allowed_domains=["example.com"],
    exclude_subdomains=True,  # Only results from example.com, not blog.example.com
    api_key=serper_dev_api,
)
results_filtered = websearch_filtered.run(query="search query")
__init__(
    api_key: Secret = Secret.from_env_var("SERPERDEV_API_KEY"),
    top_k: int | None = 10,
    allowed_domains: list[str] | None = None,
    search_params: dict[str, Any] | None = None,
    *,
    exclude_subdomains: bool = False
) -> None
Initialize the SerperDevWebSearch component.
Parameters:
- api_key (Secret) – API key for the Serper API.
- top_k (int | None) – Number of documents to return.
- allowed_domains (list[str] | None) – List of domains to limit the search to.
- exclude_subdomains (bool) – Whether to exclude subdomains when filtering by allowed_domains. If True, only results from the exact domains in allowed_domains will be returned. If False, results from subdomains will also be included. Defaults to False.
- search_params (dict[str, Any] | None) – Additional parameters passed to the Serper API. For example, you can set 'num' to 20 to increase the number of search results. See the Serper website for more details.
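To make the difference between the two `exclude_subdomains` settings concrete, here is a small sketch of the described semantics. `is_allowed` is a hypothetical function written for this illustration; it is not the component's implementation, which may filter results differently.

```python
from __future__ import annotations

from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str], exclude_subdomains: bool = False) -> bool:
    """Hypothetical check mirroring the described exclude_subdomains semantics."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # An exact domain match is always accepted.
        if host == domain:
            return True
        # Subdomains such as blog.example.com match only when not excluded.
        if not exclude_subdomains and host.endswith("." + domain):
            return True
    return False


links = ["https://example.com/a", "https://blog.example.com/b", "https://other.org/c"]
with_subdomains = [u for u in links if is_allowed(u, ["example.com"])]
exact_only = [u for u in links if is_allowed(u, ["example.com"], exclude_subdomains=True)]
```

Here `with_subdomains` keeps both the example.com and blog.example.com links, while `exact_only` keeps only the example.com link.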
to_dict() -> dict[str, Any]
Serializes the component to a dictionary.
Returns:
- dict[str, Any] – Dictionary with serialized data.
from_dict(data: dict[str, Any]) -> SerperDevWebSearch
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – The dictionary to deserialize from.
Returns:
- SerperDevWebSearch – The deserialized component.
run(query: str) -> dict[str, list[Document] | list[str]]
Uses Serper to search the web.
Parameters:
- query (str) – Search query.
Returns:
- dict[str, list[Document] | list[str]] – A dictionary with the following keys:
  - "documents": List of documents returned by the search engine.
  - "links": List of links returned by the search engine.
Raises:
- SerperDevError – If an error occurs while querying the SerperDev API.
- TimeoutError – If the request to the SerperDev API times out.
run_async(query: str) -> dict[str, list[Document] | list[str]]
Asynchronously uses Serper to search the web.
This is the asynchronous version of the run method with the same parameters and return values.
Parameters:
- query (str) – Search query.
Returns:
- dict[str, list[Document] | list[str]] – A dictionary with the following keys:
  - "documents": List of documents returned by the search engine.
  - "links": List of links returned by the search engine.
Raises:
- SerperDevError – If an error occurs while querying the SerperDev API.
- TimeoutError – If the request to the SerperDev API times out.