This might be a stupid question. I have set the embedding dimension of the document store to 384 to match the output dimension of the embedder, all-MiniLM-L6-v2:
document_store = QdrantDocumentStore(
    path=config["document_store"]["persist_path"],
    recreate_index=True,
    return_embedding=True,
    wait_result_from_api=True,
    embedding_dim=384,
)
# Define the filetype router (this will route the files to their appropriate converter)
# In this example, we only allow markdown files.
file_type_router = FileTypeRouter(mime_types=["text/markdown"])
# Define the converter used for .md -> Document
markdown_converter = MarkdownToDocument()
# Define the document cleaner, which will remove extraneous material (empty lines, extra whitespace, etc.)
# You can change this behaviour by passing different parameters into DocumentCleaner()
document_cleaner = DocumentCleaner()
# Define the embedder. This is where the cleaned documents will be converted to vectors/embeddings
# These vectors will then be searched against when we submit our query, to find the most relevant documents
document_embedder = SentenceTransformersDocumentEmbedder(
    model="all-MiniLM-L6-v2",
    device=ComponentDevice.from_str("cuda"),
    local_files_only=True,
)
# nvidia-smi -l
# Define the document writer, this will actually write the vectors to the DB
document_writer = DocumentWriter(document_store)
# This is where the pipeline is actually created
# First we add the router, then the converter, cleaner, embedder, and finally the writer.
# Adding the components...
preprocessing_pipeline = Pipeline()
preprocessing_pipeline.add_component(instance=file_type_router, name="file_type_router")
preprocessing_pipeline.add_component(instance=markdown_converter, name="markdown_converter")
preprocessing_pipeline.add_component(instance=document_cleaner, name="document_cleaner")
preprocessing_pipeline.add_component(instance=document_embedder, name="document_embedder")
preprocessing_pipeline.add_component(instance=document_writer, name="document_writer")
# Connecting the components...
preprocessing_pipeline.connect("file_type_router.text/markdown", "markdown_converter.sources")
preprocessing_pipeline.connect("markdown_converter", "document_cleaner")
preprocessing_pipeline.connect("document_cleaner", "document_embedder")
preprocessing_pipeline.connect("document_embedder", "document_writer")
But I get the following error. I am pretty sure I am doing something stupid, so any help would be appreciated:
haystack.core.errors.PipelineRuntimeError: The following component failed to run:
Component name: 'document_writer'
Component type: 'DocumentWriter'
Error: could not broadcast input array from shape (384,) into shape (768,)
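For reference, the wording of the error has the form of a plain NumPy shape mismatch: a 384-dimensional vector being written into storage sized for 768 dimensions. That would be consistent with the collection having been created at a different size than the 384 passed above, though this is an assumption and not something the traceback confirms. A minimal sketch of the same kind of failure using only NumPy (the `try_store` helper and its preallocated array are hypothetical stand-ins, not Haystack or Qdrant code):

```python
import numpy as np


def try_store(storage_dim: int, vector_dim: int) -> str:
    """Write one vector into a preallocated matrix; return "ok" or the error text."""
    # Hypothetical stand-in for a collection whose vector size was fixed at creation time.
    storage = np.zeros((1, storage_dim), dtype=np.float32)
    # Hypothetical stand-in for one embedding produced by the embedder.
    vector = np.ones(vector_dim, dtype=np.float32)
    try:
        storage[0] = vector  # fails when the two dimensions differ
    except ValueError as exc:
        return str(exc)
    return "ok"


print(try_store(768, 384))  # mismatched sizes: prints NumPy's broadcast error text
print(try_store(384, 384))  # matching sizes: prints "ok"
```

If this reading is right, the thing to check would be the vector size of the collection actually stored under the persist path, rather than the embedder.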