I’ve stored a ChromaDocumentStore locally using store.py, and it works perfectly: the DB is created and persisted as expected.
However, when I try to reuse this persisted ChromaDocumentStore in query.py, I get an error.
store.py: used to create and persist the ChromaDocumentStore

```python
import os
from pathlib import Path
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

file_paths = ["data" / Path(name) for name in os.listdir("data")]

# Persist the collection to ./chroma_db_test so that query.py can reuse it later
document_store = ChromaDocumentStore(persist_path="./chroma_db_test", collection_name="my_documents")

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")
indexing.run({"converter": {"sources": file_paths}})
print('done')
```
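`os.listdir("data")` returns every entry in the folder, including subdirectories and any non-text files, and all of them are passed to `TextFileToDocument`. A minimal sketch for collecting only plain-text files, assuming the source documents all end in `.txt`; the helper name `collect_text_files` is hypothetical and not part of Haystack:

```python
from pathlib import Path


def collect_text_files(folder: str, pattern: str = "*.txt") -> list[Path]:
    """Return the regular files in `folder` matching `pattern`, sorted by name.

    Hypothetical helper: the "*.txt" default is an assumption about the data folder.
    """
    # glob() can also match directories named like "x.txt", so keep regular files only
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


# Usage in store.py, in place of the os.listdir() comprehension:
# file_paths = collect_text_files("data")
```

Sorting keeps the indexing order deterministic across runs, which makes it easier to compare two indexing passes.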
❌ query.py: trying to use the persisted ChromaDocumentStore

```python
from haystack import Pipeline
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
import os

os.environ["OPENAI_API_KEY"] = "my-key"

# Load the document store from persisted DB
# This prevents Chroma from wiping the existing DB and recreating it
document_store = ChromaDocumentStore(persist_path="./chroma_db_test", collection_name="my_documents")

prompt = [
    ChatMessage.from_user(
        """
According to the contents of this website:
{% for document in documents %}
{{document.content}}
{% endfor %}
Answer the given question: {{query}}
Answer:
"""
    )
]

prompt_builder = ChatPromptBuilder(template=prompt)
llm = OpenAIChatGenerator()
retriever = ChromaQueryTextRetriever(document_store)

querying = Pipeline()
querying.add_component("retriever", retriever)
querying.add_component("prompt_builder", prompt_builder)
querying.add_component("llm", llm)
querying.connect("retriever.documents", "prompt_builder.documents")
querying.connect("prompt_builder", "llm")

query = "How to apply discount before tax in POS ?"
results = querying.run(data={"retriever": {"query": query},
                             "prompt_builder": {"query": query}})
print(results["llm"]["replies"][0].text)
```
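The relative path `./chroma_db_test` is resolved against the working directory the script is launched from, so store.py and query.py only see the same DB when both are run from the same folder; otherwise the store may simply open (or create) a different, empty location. A minimal sanity check, assuming you want query.py to fail early in that case; `require_persisted_db` is a hypothetical helper that only inspects the filesystem and does not open Chroma:

```python
from pathlib import Path


def require_persisted_db(persist_path: str) -> Path:
    """Resolve `persist_path` and fail early if it is not an existing directory.

    Hypothetical helper: it checks the filesystem only and never touches Chroma.
    """
    # Relative paths are resolved against the current working directory
    path = Path(persist_path).resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"No persisted Chroma DB found at {path}; run store.py first")
    return path


# Usage at the top of query.py, before creating the ChromaDocumentStore:
# db_path = require_persisted_db("./chroma_db_test")
# document_store = ChromaDocumentStore(persist_path=str(db_path), collection_name="my_documents")
```

Passing the resolved absolute path to the store also makes log messages unambiguous about which DB is in use.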
Error from the query.py code:

```text
haystack.core.errors.PipelineRuntimeError: The following component failed to run:
Component name: 'retriever'
Component type: 'ChromaQueryTextRetriever'
Error: Collection [my_documents] already exists
```
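Read literally, the message says that something on the query path tried to *create* `my_documents` rather than *open* the collection that store.py already wrote. The difference between a strict create and a get-or-create lookup can be illustrated with a toy, stdlib-only registry. `ToyClient` and `CollectionExistsError` are hypothetical stand-ins that only mimic the shape of Chroma's `create_collection` / `get_or_create_collection` client calls; this is not the Haystack integration's actual code path:

```python
class CollectionExistsError(Exception):
    """Raised by the toy client when a collection name is already taken."""


class ToyClient:
    """Hypothetical in-memory stand-in for a vector DB client (not Chroma's API)."""

    def __init__(self) -> None:
        self._collections: dict[str, list[str]] = {}

    def create_collection(self, name: str) -> list[str]:
        # Strict create: fails if the name exists, mirroring the error above
        if name in self._collections:
            raise CollectionExistsError(f"Collection [{name}] already exists")
        self._collections[name] = []
        return self._collections[name]

    def get_or_create_collection(self, name: str) -> list[str]:
        # Reuse the existing collection (and its documents) if present
        return self._collections.setdefault(name, [])


client = ToyClient()
docs = client.create_collection("my_documents")  # first run, as in store.py
docs.append("doc-1")

try:
    client.create_collection("my_documents")  # second strict create on the same name
except CollectionExistsError as err:
    print(err)  # Collection [my_documents] already exists

reused = client.get_or_create_collection("my_documents")
print(len(reused))  # 1: the stored document is still there
```

In this toy model, only the strict create fails on a second run; the get-or-create path reuses the stored data.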
🔍 Problem
I want to reuse the existing vector DB (chroma_db_test) without recreating it every time. However, the query.py script throws an error when trying to load the stored ChromaDocumentStore.
💬 Request
Can you please help me correctly load and reuse the existing Chroma vector DB? I want to avoid re-indexing or wiping the DB each time I run the query pipeline.
Thanks in advance!