title: Caching
id: caching-api
description: Checks if any document coming from the given URL is already present in the store.
slug: /caching-api

cache_checker

CacheChecker

Checks for the presence of documents in a Document Store based on a specified field in each document's metadata.

Documents whose metadata field value matches one of the input items are returned as "hits". Input items that match no document in the store are returned as "misses".

Usage example

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.caching.cache_checker import CacheChecker

# Each URL is stored in the "url" metadata field of two documents.
docstore = InMemoryDocumentStore()
documents = [
    Document(content="doc1", meta={"url": "https://example.com/1"}),
    Document(content="doc2", meta={"url": "https://example.com/2"}),
    Document(content="doc3", meta={"url": "https://example.com/1"}),
    Document(content="doc4", meta={"url": "https://example.com/2"}),
]
docstore.write_documents(documents)

# Check one URL that is in the store and one that was never written.
checker = CacheChecker(docstore, cache_field="url")
results = checker.run(items=["https://example.com/1", "https://example.com/5"])
# Both documents for .../1 are hits; .../5 matches nothing and is reported as a miss.
assert results == {"hits": [documents[0], documents[2]], "misses": ["https://example.com/5"]}

__init__

__init__(document_store: DocumentStore, cache_field: str) -> None

Creates a CacheChecker component.

Parameters:

  • document_store (DocumentStore) – Document Store to check for the presence of specific documents.
  • cache_field (str) – Name of the document's metadata field to check for cache hits.

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

  • dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> CacheChecker

Deserializes the component from a dictionary.

Parameters:

  • data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

  • CacheChecker – Deserialized component.
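to_dict and from_dict are meant to round-trip: deserializing the output of to_dict should rebuild an equivalent component. The sketch below illustrates that contract with a hypothetical stand-in class (CacheCheckerStandIn), not the real CacheChecker. The {"type": ..., "init_parameters": ...} layout follows the convention Haystack components use, but the document store here is only a placeholder dictionary, and the real serialized output may contain more fields.

```python
import json
from typing import Any


class CacheCheckerStandIn:
    """Hypothetical stand-in that mimics a to_dict/from_dict round-trip."""

    def __init__(self, store_config: dict[str, Any], cache_field: str) -> None:
        # store_config is a placeholder for a serialized Document Store.
        self.store_config = store_config
        self.cache_field = cache_field

    def to_dict(self) -> dict[str, Any]:
        # Only plain, JSON-compatible data goes into the serialized form.
        return {
            "type": "CacheCheckerStandIn",
            "init_parameters": {"document_store": self.store_config, "cache_field": self.cache_field},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CacheCheckerStandIn":
        # Rebuild the object from the init parameters recorded by to_dict.
        params = data["init_parameters"]
        return cls(store_config=params["document_store"], cache_field=params["cache_field"])


original = CacheCheckerStandIn({"type": "InMemoryDocumentStore"}, cache_field="url")
payload = json.dumps(original.to_dict())  # e.g. persist to a file or config
restored = CacheCheckerStandIn.from_dict(json.loads(payload))
print(restored.cache_field)  # url
```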

run

run(items: list[Any]) -> dict[str, Any]

For each item, checks whether the store already contains documents whose cache field value equals that item.

Parameters:

  • items (list[Any]) – Values to be checked against the cache field.

Returns:

  • dict[str, Any] – A dictionary with two keys:
    • hits - Documents whose cache field value matched at least one of the items.
    • misses - Items that did not match any document.
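The hits/misses contract of run can be modeled in plain Python. This is a minimal sketch, not Haystack's source: it assumes documents are plain dicts with a "meta" dict and scans a list instead of querying a Document Store, and check_cache is a hypothetical helper name.

```python
from typing import Any


def check_cache(documents: list[dict[str, Any]], cache_field: str, items: list[Any]) -> dict[str, list]:
    """Hypothetical model of CacheChecker.run over a plain list of documents."""
    hits: list[dict[str, Any]] = []
    misses: list[Any] = []
    for item in items:
        # Every document whose metadata value for cache_field equals the item is a hit.
        found = [doc for doc in documents if doc["meta"].get(cache_field) == item]
        if found:
            hits.extend(found)
        else:
            # Items with no matching document are reported back unchanged.
            misses.append(item)
    return {"hits": hits, "misses": misses}


documents = [
    {"content": "doc1", "meta": {"url": "https://example.com/1"}},
    {"content": "doc2", "meta": {"url": "https://example.com/2"}},
    {"content": "doc3", "meta": {"url": "https://example.com/1"}},
]
result = check_cache(documents, "url", ["https://example.com/1", "https://example.com/5"])
print([doc["content"] for doc in result["hits"]])  # ['doc1', 'doc3']
print(result["misses"])  # ['https://example.com/5']
```

Note that the two lists hold different kinds of values: hits are documents, while misses are the original input items, which is what lets a caller pass the misses straight on to whatever step produces new documents.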

run_async

run_async(items: list[Any]) -> dict[str, Any]

Asynchronously checks, for each item, whether the store already contains documents whose cache field value equals that item.

Parameters:

  • items (list[Any]) – Values to be checked against the cache field.

Returns:

  • dict[str, Any] – A dictionary with two keys:
    • hits - Documents whose cache field value matched at least one of the items.
    • misses - Items that did not match any document.
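Since run_async is asynchronous, it is driven from an event loop, for example with asyncio.run. The sketch below models that flow in plain Python. AsyncListStore and check_cache_async are hypothetical stand-ins rather than Haystack classes, and the in-list lookup replaces the real component's query against its Document Store.

```python
import asyncio
from typing import Any


class AsyncListStore:
    """Hypothetical async document store backed by a list of dicts."""

    def __init__(self, documents: list[dict[str, Any]]) -> None:
        self._documents = documents

    async def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        await asyncio.sleep(0)  # stand-in for real asynchronous I/O
        return [doc for doc in self._documents if doc["meta"].get(field) == value]


async def check_cache_async(store: AsyncListStore, cache_field: str, items: list[Any]) -> dict[str, list]:
    """Hypothetical async model of the hits/misses split."""
    hits: list[dict[str, Any]] = []
    misses: list[Any] = []
    for item in items:
        # Await one lookup at a time so hits keep the order of items.
        found = await store.find(cache_field, item)
        if found:
            hits.extend(found)
        else:
            misses.append(item)
    return {"hits": hits, "misses": misses}


store = AsyncListStore([{"content": "doc1", "meta": {"url": "https://example.com/1"}}])
result = asyncio.run(check_cache_async(store, "url", ["https://example.com/1", "https://example.com/5"]))
print(result["misses"])  # ['https://example.com/5']
```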