Add FAISS DocumentStore integration#2844
Conversation
- Remove duplicate _check_condition method (lines 111-148) - Remove unreachable comparison operator checks (lines 348-355) - Reduces codebase by 46 lines of unreachable code - All tests still passing (55 passed, 2 skipped)
- Addresses review feedback from @davidsbatista - Removed: sql_url, faiss_index_factory_str, similarity, isolation_level, duplicate_documents, return_embedding, progress_bar - Updated docstrings
davidsbatista
left a comment
There was a problem hiding this comment.
Thanks again for this @GunaPalanivel, and sorry for the delay. I did another review round, a few more things to fix/improve.
| @@ -0,0 +1,49 @@ | |||
| import pytest | |||
There was a problem hiding this comment.
There are few tests which are still not part of our Mixin, so we need to add them for each doc store:
- count_documents_by_filter
- get_metadata_fields_info
- count_unique_metadata_by_filter
- search(query_embedding, top_k)
- search with filters
Also, we need to add tests for:
- to_dict / from_dict with round-trip
- Tests for load when .faiss or .json is missing, and for save/load when some documents have no embeddings
…e cases - Added custom test methods for \count_documents_by_filter\, \get_metadata_fields_info\, and \count_unique_metadata_by_filter\ - Added test for search with and without filters - Added tests for persistence with and without embeddings, and missing files - Added test for to_dict / from_dict roundtrip
…github.com/GunaPalanivel/haystack-core-integrations into integration/issue-717-faiss-document-store
- Fixed Ruff E501, EM101, EM102, B905, E721, and PLC0415 - Added empty py.typed marker to src package - Removed redundant FilterableDocsFixtureMixin from TestFAISSDocumentStore
- Cleaned up developer comments in _matches_filters and count_unique_metadata_by_filter - Added ValueError raising for invalid input in write_documents - Added missing docstrings for get_metadata_fields_info, delete_by_filter, etc. - Added explicit utf-8 encoding to open() calls - Removed dead pass blocks
Thanks for the thorough review @davidsbatista Here’s what’s been addressed in this round:
All tests are passing locally. Please let me know if anything else needs tightening. |
|
Thanks @GunaPalanivel for the changes. In order for this Documentstore to be used with Haystack there's one essential component missing, an EmbeddingRetriever. We need to add a |
Thanks for the clear feedback, @davidsbatista! Understood — I'll add a
Will push the changes shortly! |
- Add components/retrievers/faiss/embedding_retriever.py with @component decorator, run(), run_async(), to_dict(), from_dict() with FilterPolicy support and backward-compat deserialization guard - Add components/__init__.py, components/retrievers/__init__.py, components/retrievers/faiss/__init__.py namespace packages - Add tests/test_embedding_retriever.py with 8 tests covering: basic run, runtime filters, top_k override, to_dict/from_dict roundtrip, FilterPolicy REPLACE/MERGE, ValueError on wrong store type, and end-to-end pipeline execution - Update pyproject.toml types script to also typecheck haystack_integrations.components.retrievers.faiss
Added
Ready for re-review @davidsbatista |
davidsbatista
left a comment
There was a problem hiding this comment.
LGTM!
@GunaPalanivel, thanks again for this contribution.
I've made a few extra fixes and alignments, and configured docusaurus.
|
Thank you so much, @davidsbatista! 🙏 Really appreciate the thorough review rounds and the extra fixes + Docusaurus configuration — that's above and beyond. Learned a lot going through each round of feedback, especially around the Excited to have this land in the repo. Looking forward to contributing more! 🚀 |
|
Thanks for your continuous contributions @GunaPalanivel - always glad to have good quality PRs, it also helps a lot on our side during the reviews. |
Description
Adds a new FAISSDocumentStore integration for Haystack 2.x.
Changes
Core Implementation
FAISSDocumentStorewith full Haystack 2.x protocol supportFeatures
Persistence
Error Handling
FilterErrorfor invalid filter operationsTesting
55 tests passing, 2 skipped
Uses Haystack’s standard DocumentStore test mixins
Covers:
Run:
hatch run test:all # 55 passed, 2 skippedChecklist