| title | FAISS |
|---|---|
| id | integrations-faiss |
| description | FAISS integration for Haystack |
| slug | /integrations-faiss |

`FAISSEmbeddingRetriever` retrieves documents from the `FAISSDocumentStore` based on their dense embeddings.

Example usage:

```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.faiss import FAISSDocumentStore
from haystack_integrations.components.retrievers.faiss import FAISSEmbeddingRetriever

document_store = FAISSDocumentStore(embedding_dim=768)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to behave in a way that indicates a high level of intelligence."),
    Document(content="In certain places, you can witness the phenomenon of bioluminescent waves."),
]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings, policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", FAISSEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"
res = query_pipeline.run({"text_embedder": {"text": query}})

assert res["retriever"]["documents"][0].content == "There are over 7,000 languages spoken around the world today."
```

```python
__init__(
    *,
    document_store: FAISSDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Initialize FAISSEmbeddingRetriever.

Parameters:

- `document_store` (`FAISSDocumentStore`) – An instance of `FAISSDocumentStore`.
- `filters` (`dict[str, Any] | None`) – Filters applied to the retrieved Documents at initialisation time. At runtime, these are merged with any runtime filters according to the `filter_policy`.
- `top_k` (`int`) – Maximum number of Documents to return.
- `filter_policy` (`str | FilterPolicy`) – Policy to determine how init-time and runtime filters are combined. See `FilterPolicy` for details. Defaults to `FilterPolicy.REPLACE`.

Raises:

- `ValueError` – If `document_store` is not an instance of `FAISSDocumentStore`.

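The interaction between init-time and runtime filters can be pictured with a small sketch. This is a simplified stand-in written in plain Python, not the library's implementation: `FilterPolicySketch` and `combine_filters` are hypothetical names, and the assumption that merging wraps both filters in an `AND` condition is an illustration of the idea only.

```python
from enum import Enum
from typing import Any, Optional


class FilterPolicySketch(Enum):
    """Hypothetical stand-in for the two policies named above."""

    REPLACE = "replace"
    MERGE = "merge"


def combine_filters(
    policy: FilterPolicySketch,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Simplified view of how init-time and runtime filters are combined."""
    if policy is FilterPolicySketch.MERGE and init_filters and runtime_filters:
        # Assumption: a merged filter requires both filters to match.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE, or only one side present: runtime filters win when given.
    return runtime_filters or init_filters


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = combine_filters(FilterPolicySketch.REPLACE, init, runtime)
merged = combine_filters(FilterPolicySketch.MERGE, init, runtime)
fallback = combine_filters(FilterPolicySketch.REPLACE, init, None)
```

With the default `REPLACE` policy, passing filters at runtime discards the init-time filters for that call; when no runtime filters are passed, the init-time filters still apply.
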
```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

Returns:

- `dict[str, Any]` – Dictionary with serialized data.

```python
from_dict(data: dict[str, Any]) -> FAISSEmbeddingRetriever
```

Deserializes the component from a dictionary.

Parameters:

- `data` (`dict[str, Any]`) – Dictionary to deserialize from.

Returns:

- `FAISSEmbeddingRetriever` – Deserialized component.

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents from the FAISSDocumentStore, based on their embeddings.

Parameters:

- `query_embedding` (`list[float]`) – Embedding of the query.
- `filters` (`dict[str, Any] | None`) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the `filter_policy` chosen at retriever initialization. See init method docstring for more details.
- `top_k` (`int | None`) – Maximum number of Documents to return. Overrides the value set at initialization.

Returns:

- `dict[str, list[Document]]` – A dictionary with the following keys:
  - `documents`: List of `Document`s that are similar to `query_embedding`.

```python
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Asynchronously retrieve documents from the FAISSDocumentStore, based on their embeddings.

Since FAISS search is CPU-bound and fully in-memory, this delegates directly to the synchronous `run()` method. No I/O or network calls are involved.

Parameters:

- `query_embedding` (`list[float]`) – Embedding of the query.
- `filters` (`dict[str, Any] | None`) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the `filter_policy` chosen at retriever initialization. See init method docstring for more details.
- `top_k` (`int | None`) – Maximum number of Documents to return. Overrides the value set at initialization.

Returns:

- `dict[str, list[Document]]` – A dictionary with the following keys:
  - `documents`: List of `Document`s that are similar to `query_embedding`.

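The delegation described for `run_async()` can be sketched with a toy class. `SketchRetriever` is a hypothetical stand-in that returns placeholder IDs; it only illustrates the pattern of an `async` method calling its synchronous counterpart directly.

```python
import asyncio


class SketchRetriever:
    """Hypothetical stand-in, not the real FAISSEmbeddingRetriever."""

    def run(self, query_embedding: list[float], top_k: int = 10) -> dict[str, list[str]]:
        # Pretend search: return placeholder document IDs.
        return {"documents": [f"doc-{i}" for i in range(top_k)]}

    async def run_async(self, query_embedding: list[float], top_k: int = 10) -> dict[str, list[str]]:
        # Nothing to await: the work is in-memory, so call run() directly.
        return self.run(query_embedding=query_embedding, top_k=top_k)


retriever = SketchRetriever()
sync_result = retriever.run([0.1, 0.2], top_k=2)
async_result = asyncio.run(retriever.run_async([0.1, 0.2], top_k=2))
```

One consequence of this pattern is that the coroutine does not yield while the search runs, so other tasks on the same event loop wait until it returns.
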
`FAISSDocumentStore` is a Document Store that uses FAISS for vector search and a simple JSON file for metadata storage.

It is suitable for small to medium-sized datasets where simplicity is preferred over scalability. It supports basic persistence by saving the FAISS index to a `.faiss` file and the documents to a `.json` file.

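As a rough picture of the two-file layout, the sketch below derives a `.faiss` and a `.json` path from one base path. The helper `persistence_paths` is hypothetical, and the exact way the store builds its file names is an assumption here.

```python
from pathlib import Path


def persistence_paths(index_path: str) -> tuple[Path, Path]:
    """Hypothetical helper: derive the two persisted files from one base path."""
    base = Path(index_path)
    # Assumption: both files share the base name and differ only in suffix.
    return base.with_suffix(".faiss"), base.with_suffix(".json")


faiss_file, json_file = persistence_paths("data/my_index")
```
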
```python
__init__(
    index_path: str | None = None,
    index_string: str = "Flat",
    embedding_dim: int = 768,
) -> None
```

Initializes the FAISSDocumentStore.

Parameters:

- `index_path` (`str | None`) – Path to save/load the index and documents. If `None`, the store is in-memory only.
- `index_string` (`str`) – The FAISS index factory string. Default is `"Flat"`.
- `embedding_dim` (`int`) – The dimension of the embeddings. Default is 768.

Raises:

- `DocumentStoreError` – If the FAISS index cannot be initialized.
- `ValueError` – If `index_path` points to a missing `.faiss` file when loading persisted data.

```python
count_documents() -> int
```

Returns the number of documents in the store.

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Returns documents that match the provided filters.

Parameters:

- `filters` (`dict[str, Any] | None`) – A dictionary of filters to apply.

Returns:

- `list[Document]` – A list of matching Documents.

Raises:

- `FilterError` – If the filter structure is invalid.

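To make the filter structure concrete, here is a minimal sketch that evaluates a Haystack-style filter dictionary against plain dicts. The `matches` function is hypothetical and simplified: it handles only `AND`/`OR` and a few comparison operators, treats missing fields as non-matching, and raises `ValueError` where the store would raise `FilterError`.

```python
import operator
from typing import Any

COMPARATORS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(doc: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Evaluate a simplified Haystack-style filter against a plain dict."""
    if "conditions" in filters:
        results = [matches(doc, cond) for cond in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {filters['operator']}")
    # Comparison filter: resolve a dotted field such as "meta.year".
    value: Any = doc
    for key in filters["field"].split("."):
        value = value.get(key) if isinstance(value, dict) else None
    if value is None:
        return False  # in this sketch, missing fields never match
    return COMPARATORS[filters["operator"]](value, filters["value"])


docs = [
    {"content": "a", "meta": {"lang": "en", "year": 2019}},
    {"content": "b", "meta": {"lang": "en", "year": 2022}},
    {"content": "c", "meta": {"lang": "de", "year": 2023}},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
selected = [d["content"] for d in docs if matches(d, filters)]
```
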
```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.FAIL
) -> int
```

Writes documents to the store.

Parameters:

- `documents` (`list[Document]`) – The list of documents to write.
- `policy` (`DuplicatePolicy`) – The policy to handle duplicate documents.

Returns:

- `int` – The number of documents written.

Raises:

- `ValueError` – If `documents` is not an iterable of `Document` objects.
- `DuplicateDocumentError` – If a duplicate document is found and `policy` is `DuplicatePolicy.FAIL`.
- `DocumentStoreError` – If the FAISS index is unexpectedly unavailable when adding embeddings.

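The effect of the duplicate policy on the return value can be sketched with a dict-backed toy store. `PolicySketch` and `write` are hypothetical; the sketch raises `ValueError` where the real store raises `DuplicateDocumentError`, and it makes no claim about whether the real store writes earlier documents before failing.

```python
from enum import Enum


class PolicySketch(Enum):
    """Hypothetical stand-in mirroring three DuplicatePolicy members."""

    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store: dict[str, str], docs: list[tuple[str, str]], policy: PolicySketch) -> int:
    """Write (id, content) pairs into a dict and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is PolicySketch.FAIL:
                raise ValueError(f"Duplicate document ID: {doc_id}")
            if policy is PolicySketch.SKIP:
                continue  # keep the stored version, do not count it
        store[doc_id] = content
        written += 1
    return written


store = {"a": "old"}
skipped = write(store, [("a", "new"), ("b", "x")], PolicySketch.SKIP)
overwritten = write(store, [("a", "new")], PolicySketch.OVERWRITE)
```

In this sketch, the first call writes only `"b"` and the second call replaces `"a"`, so each call reports one written document.
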
```python
delete_documents(document_ids: list[str]) -> None
```

Deletes documents from the store.

Raises:

- `DocumentStoreError` – If the FAISS index is unexpectedly unavailable when removing embeddings.

```python
delete_all_documents() -> None
```

Deletes all documents from the store.

```python
search(
    query_embedding: list[float],
    top_k: int = 10,
    filters: dict[str, Any] | None = None,
) -> list[Document]
```

Performs a vector search.

Parameters:

- `query_embedding` (`list[float]`) – The query embedding.
- `top_k` (`int`) – The number of results to return.
- `filters` (`dict[str, Any] | None`) – Filters to apply.

Returns:

- `list[Document]` – A list of matching Documents.

Raises:

- `FilterError` – If the filter structure is invalid.

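For intuition about what a flat index does, the sketch below runs an exhaustive nearest-neighbour search in plain Python. It is an illustration only: `flat_search` is a hypothetical function, Euclidean distance is an assumption about the metric, and the real store delegates this work to FAISS.

```python
import heapq
import math


def flat_search(
    query: list[float], vectors: dict[str, list[float]], top_k: int = 10
) -> list[tuple[str, float]]:
    """Exhaustive nearest-neighbour search by Euclidean distance."""
    scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in vectors.items())
    # nsmallest keeps the top_k closest vectors, nearest first.
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
hits = flat_search([0.9, 0.0], vectors, top_k=2)
nearest_ids = [doc_id for doc_id, _ in hits]
```

Every stored vector is compared with the query, which keeps results exact but makes the cost grow linearly with the number of documents.
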
```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Deletes documents that match the provided filters from the store.

Parameters:

- `filters` (`dict[str, Any]`) – A dictionary of filters to apply to find documents to delete.

Returns:

- `int` – The number of documents deleted.

Raises:

- `FilterError` – If the filter structure is invalid.
- `DocumentStoreError` – If the FAISS index is unexpectedly unavailable when removing embeddings.

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Returns the number of documents that match the provided filters.

Parameters:

- `filters` (`dict[str, Any]`) – A dictionary of filters to apply.

Returns:

- `int` – The number of matching documents.

Raises:

- `FilterError` – If the filter structure is invalid.

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Updates documents that match the provided filters with the new metadata.

Note: Updates are performed in-memory only. To persist these changes, you must explicitly call `save()` after updating.

Parameters:

- `filters` (`dict[str, Any]`) – A dictionary of filters to apply to find documents to update.
- `meta` (`dict[str, Any]`) – A dictionary of metadata key-value pairs to update in the matching documents.

Returns:

- `int` – The number of documents updated.

Raises:

- `FilterError` – If the filter structure is invalid.

```python
get_metadata_fields_info() -> dict[str, dict[str, Any]]
```

Infers and returns the types of all metadata fields from the stored documents.

Returns:

- `dict[str, dict[str, Any]]` – A dictionary mapping field names to dictionaries with a `"type"` key (e.g. `{"field": {"type": "long"}}`).

```python
get_metadata_field_min_max(field_name: str) -> dict[str, Any]
```

Returns the minimum and maximum values for a specific metadata field.

Parameters:

- `field_name` (`str`) – The name of the metadata field.

Returns:

- `dict[str, Any]` – A dictionary with keys `"min"` and `"max"` containing the respective min and max values.

```python
get_metadata_field_unique_values(field_name: str) -> list[Any]
```

Returns all unique values for a specific metadata field.

Parameters:

- `field_name` (`str`) – The name of the metadata field.

Returns:

- `list[Any]` – A list of unique values for the specified field.

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Returns a count of unique values for multiple metadata fields, optionally scoped by a filter.

Parameters:

- `filters` (`dict[str, Any]`) – A dictionary of filters to apply.
- `metadata_fields` (`list[str]`) – A list of metadata field names to count unique values for.

Returns:

- `dict[str, int]` – A dictionary mapping each field name to the count of its unique values.

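The shape of the returned mapping can be illustrated with a small sketch over plain dicts. `count_unique` is a hypothetical function that skips the filtering step and assumes metadata values are hashable; it is not the store's implementation.

```python
from typing import Any


def count_unique(docs: list[dict[str, Any]], fields: list[str]) -> dict[str, int]:
    """Count distinct values per metadata field across the given documents."""
    return {
        field: len({doc["meta"][field] for doc in docs if field in doc["meta"]})
        for field in fields
    }


docs = [
    {"meta": {"lang": "en", "year": 2022}},
    {"meta": {"lang": "en", "year": 2023}},
    {"meta": {"lang": "de"}},
]
counts = count_unique(docs, ["lang", "year"])
```

Documents that lack a field are ignored for that field, so here both `lang` and `year` end up with two distinct values.
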
```python
to_dict() -> dict[str, Any]
```

Serializes the store to a dictionary.

```python
from_dict(data: dict[str, Any]) -> FAISSDocumentStore
```

Deserializes the store from a dictionary.

```python
save(index_path: str | Path) -> None
```

Saves the index and documents to disk.

Raises:

- `DocumentStoreError` – If the FAISS index is unexpectedly unavailable.

```python
load(index_path: str | Path) -> None
```

Loads the index and documents from disk.

Raises:

- `ValueError` – If the `.faiss` file does not exist.