
Add FAISS DocumentStore integration #2844

Merged
davidsbatista merged 16 commits into deepset-ai:main from
GunaPalanivel:integration/issue-717-faiss-document-store
Feb 26, 2026


@GunaPalanivel
Contributor

@GunaPalanivel commented Feb 16, 2026

Description

Adds a new FAISSDocumentStore integration for Haystack 2.x.

Changes

Core Implementation

  • Introduced FAISSDocumentStore, implementing the Haystack 2.x DocumentStore protocol

Features

  • In-memory and persistent storage (FAISS index + JSON metadata)
  • Configurable FAISS index types
  • Vector similarity search
  • Comprehensive metadata filtering (comparison + logical operators)
  • Full document CRUD support
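Vector similarity search on a flat (exhaustive) FAISS index compares the query against every stored embedding. As a rough illustration only, the pure-Python sketch below computes what a flat L2 index would: the squared Euclidean distance to every vector, followed by a top-k sort. It does not call FAISS, and `flat_l2_search` is a hypothetical helper, not part of the integration.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: (position, squared L2 distance) of the top_k nearest vectors.

    This is an exhaustive scan, i.e. what a flat (non-approximate) index does.
    """
    if top_k <= 0:
        return []
    scored = []
    for position, vector in enumerate(vectors):
        if len(vector) != len(query):
            raise ValueError("Embedding dimension does not match the query dimension")
        # Squared Euclidean distance: smaller means more similar
        distance = sum((v - q) ** 2 for v, q in zip(vector, query))
        scored.append((position, distance))
    # Stable sort, so ties keep insertion order
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
    print(flat_l2_search(stored, [0.0, 0.0], top_k=2))
```

Non-flat index types avoid scanning every vector, trading some recall for speed.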

Persistence

  • FAISS index for vector storage
  • In-memory metadata dictionary
  • JSON serialization for metadata persistence
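The persistence split described above can be pictured as two files side by side: the FAISS index, which FAISS itself can write and read with `faiss.write_index` / `faiss.read_index`, and a JSON file holding each document's content and metadata. The sketch below covers only the JSON half. It assumes a hypothetical `<base_path>.json` naming scheme and a hypothetical `faiss_id` field that links each document to its row in the index. It also fails loudly when the JSON file is missing, which is the case the review later asks to test.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_metadata(base_path: str, documents: Dict[str, Dict[str, Any]]) -> str:
    """Write document content/metadata to <base_path>.json (hypothetical layout)."""
    json_path = f"{base_path}.json"
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(documents, f)
    return json_path


def load_metadata(base_path: str) -> Dict[str, Dict[str, Any]]:
    """Read the metadata back, raising if the JSON file is missing."""
    json_path = f"{base_path}.json"
    if not os.path.exists(json_path):
        raise FileNotFoundError(f"Metadata file not found: {json_path}")
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "store")
        # faiss_id is a hypothetical pointer to the document's row in the FAISS index
        docs = {"doc-1": {"content": "hello", "meta": {"lang": "en"}, "faiss_id": 0}}
        save_metadata(base, docs)
        print(load_metadata(base) == docs)
```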

Error Handling

  • Proper validation with FilterError for invalid filter operations
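Haystack 2.x filters are nested dicts. Comparison conditions carry `field`, `operator`, and `value`, while logical conditions carry `operator` (`AND`, `OR`, `NOT`) and a list of `conditions`. Here is a minimal recursive evaluator, offered as an assumption-laden sketch rather than the PR's actual `_matches_filters`. `FilterError` is defined locally as a stand-in for `haystack.errors.FilterError`, so the snippet runs without Haystack installed.

```python
from typing import Any, Dict


class FilterError(Exception):
    """Local stand-in for haystack.errors.FilterError (keeps the sketch dependency-free)."""


# Comparison operators; ordering comparisons treat a missing field (None) as "no match"
COMPARISONS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a is not None and a > b,
    ">=": lambda a, b: a is not None and a >= b,
    "<": lambda a, b: a is not None and a < b,
    "<=": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def _resolve(doc: Dict[str, Any], field: str) -> Any:
    """Follow a dotted path such as "meta.lang"; return None if any part is missing."""
    value: Any = doc
    for part in field.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Evaluate a Haystack 2.x-style filter dict against a plain document dict."""
    operator = filters.get("operator")
    if "conditions" in filters:
        results = [matches(doc, condition) for condition in filters["conditions"]]
        if operator == "AND":
            return all(results)
        if operator == "OR":
            return any(results)
        if operator == "NOT":
            # NOT negates the conjunction of its conditions
            return not all(results)
        raise FilterError(f"Unknown logical operator: {operator!r}")
    if "field" not in filters or "value" not in filters or operator not in COMPARISONS:
        raise FilterError(f"Invalid comparison condition: {filters!r}")
    value = filters["value"]
    if operator in ("in", "not in") and not isinstance(value, list):
        raise FilterError(f"'{operator}' expects a list value")
    try:
        return bool(COMPARISONS[operator](_resolve(doc, filters["field"]), value))
    except TypeError as exc:
        raise FilterError(f"Cannot compare values with '{operator}'") from exc
```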

Testing

  • 55 tests passing, 2 skipped

  • Uses Haystack’s standard DocumentStore test mixins

  • Covers:

    • Filtering logic
    • Persistence
    • Metadata operations
    • Edge cases

Run:

hatch run test:all
# 55 passed, 2 skipped

Checklist

  • Follows Haystack 2.x conventions
  • Comprehensive test coverage (standard mixins)
  • Formatted and linted (ruff)
  • Persistence verified
  • Metadata filtering validated
  • Documentation updated (docstrings + README)

- Remove duplicate _check_condition method (lines 111-148)
- Remove unreachable comparison operator checks (lines 348-355)
- Reduces codebase by 46 lines of duplicate and unreachable code
- All tests still passing (55 passed, 2 skipped)
@GunaPalanivel requested a review from a team as a code owner February 16, 2026 14:53
@GunaPalanivel requested review from davidsbatista and removed request for a team February 16, 2026 14:53
@github-actions bot added the type:documentation Improvements or additions to documentation label Feb 16, 2026
- Addresses review feedback from @davidsbatista
- Removed: sql_url, faiss_index_factory_str, similarity, isolation_level, duplicate_documents, return_embedding, progress_bar
- Updated docstrings
Contributor

@davidsbatista left a comment


Thanks again for this @GunaPalanivel, and sorry for the delay. I did another review round, a few more things to fix/improve.

@@ -0,0 +1,49 @@
import pytest
Contributor


There are a few tests that are still not part of our mixins, so we need to add them for each document store:

  • count_documents_by_filter
  • get_metadata_fields_info
  • count_unique_metadata_by_filter
  • search(query_embedding, top_k)
  • search with filters
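To pin down what two of these methods are expected to return, a test can compare the store's output against a naive reference computed over plain Python data. The helpers below are hypothetical. Documents are plain dicts with a `meta` dict, the filter is a Python predicate rather than a Haystack filter dict, and the return shape of the unique-count helper (field name mapped to the number of distinct values) is an assumption, not the integration's documented API.

```python
from typing import Any, Callable, Dict, List, Set

Doc = Dict[str, Any]


def count_documents_by_filter(docs: List[Doc], predicate: Callable[[Doc], bool]) -> int:
    """Hypothetical reference: number of documents matching the predicate."""
    return sum(1 for doc in docs if predicate(doc))


def count_unique_metadata_by_filter(
    docs: List[Doc], predicate: Callable[[Doc], bool], fields: List[str]
) -> Dict[str, int]:
    """Hypothetical reference: distinct values per metadata field among matching docs.

    Assumes metadata values are hashable; documents lacking a field are skipped for it.
    """
    seen: Dict[str, Set[Any]] = {field: set() for field in fields}
    for doc in docs:
        if not predicate(doc):
            continue
        meta = doc.get("meta", {})
        for field in fields:
            if field in meta:
                seen[field].add(meta[field])
    return {field: len(values) for field, values in seen.items()}
```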

Also, we need to add tests for:

  • to_dict / from_dict with round-trip
  • Tests for load when .faiss or .json is missing, and for save/load when some documents have no embeddings
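A to_dict / from_dict round-trip test usually serializes a store, rebuilds it from the resulting dict, and checks that nothing was lost. The sketch uses a hypothetical `TinyStore` whose init parameters (`index_path`, `embedding_dim`) are illustrative only. Its dict layout, a `type` string plus `init_parameters`, mirrors the shape Haystack's `default_to_dict` produces.

```python
from typing import Any, Dict, Optional


class TinyStore:
    """Hypothetical stand-in; its init parameters are illustrative, not FAISSDocumentStore's."""

    def __init__(self, index_path: Optional[str] = None, embedding_dim: int = 768) -> None:
        self.index_path = index_path
        self.embedding_dim = embedding_dim

    def to_dict(self) -> Dict[str, Any]:
        # A qualified type name plus the init parameters needed to rebuild the object
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"index_path": self.index_path, "embedding_dim": self.embedding_dim},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TinyStore":
        return cls(**data["init_parameters"])


def test_to_dict_from_dict_roundtrip() -> None:
    original = TinyStore(index_path="/tmp/faiss_store", embedding_dim=384)
    restored = TinyStore.from_dict(original.to_dict())
    # The rebuilt store must serialize to exactly the same dict
    assert restored.to_dict() == original.to_dict()
    assert restored.embedding_dim == 384
```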

Comment thread integrations/faiss/tests/test_document_store.py
davidsbatista and others added 5 commits February 23, 2026 12:00
…e cases

- Added custom test methods for count_documents_by_filter,
  get_metadata_fields_info, and count_unique_metadata_by_filter
- Added test for search with and without filters
- Added tests for persistence with and without embeddings, and missing files
- Added test for to_dict / from_dict roundtrip
- Fixed Ruff E501, EM101, EM102, B905, E721, and PLC0415
- Added empty py.typed marker to src package
- Removed redundant FilterableDocsFixtureMixin from TestFAISSDocumentStore
- Cleaned up developer comments in _matches_filters and count_unique_metadata_by_filter
- Added ValueError raising for invalid input in write_documents
- Added missing docstrings for get_metadata_fields_info, delete_by_filter, etc.
- Added explicit utf-8 encoding to open() calls
- Removed dead pass blocks
@GunaPalanivel
Contributor Author


Thanks for the thorough review, @davidsbatista!

Here’s what’s been addressed in this round:

  • Added SPDX license headers to both document_store.py and test_document_store.py.

  • Removed the unused _get_result_to_documents method.

  • Added docstrings to:

    • update_by_filter (including note on in-memory behavior),
    • delete_by_filter,
    • count_documents_by_filter,
    • get_metadata_fields_info.
  • Added the requested test coverage for:

    • count_documents_by_filter
    • get_metadata_fields_info
    • count_unique_metadata_by_filter
    • search (with and without filters)
    • to_dict / from_dict roundtrip
    • persistence (with and without embeddings)
    • loading with missing files

All tests are passing locally. Please let me know if anything else needs tightening.

@davidsbatista
Contributor

Thanks @GunaPalanivel for the changes. For this DocumentStore to be usable with Haystack, one essential component is still missing: an EmbeddingRetriever.

We need to add a FAISSEmbeddingRetriever that accepts a query_embedding (and optional filters, top_k) and calls document_store.search(query_embedding, top_k, filters=filters).
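The retriever described here is a thin wrapper. It resolves the effective `filters` and `top_k`, then delegates to `document_store.search(query_embedding, top_k, filters=filters)`. The sketch below shows that delegation using hypothetical `ToyStore` and `ToyEmbeddingRetriever` classes and a simplified equality-only filter format. A real Haystack component would also be decorated with `@component` and would declare its output with `@component.output_types(documents=List[Document])`.

```python
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]


class ToyStore:
    """Hypothetical store exposing search(query_embedding, top_k, filters=...)."""

    def __init__(self, docs: List[Doc]) -> None:
        self.docs = docs

    def search(
        self, query_embedding: List[float], top_k: int, filters: Optional[Dict[str, Any]] = None
    ) -> List[Doc]:
        # Simplified filters: exact match on metadata keys, NOT Haystack's filter syntax
        wanted = filters or {}
        candidates = [d for d in self.docs if all(d["meta"].get(k) == v for k, v in wanted.items())]

        def score(doc: Doc) -> float:
            # Dot-product similarity: higher means more similar
            return sum(a * b for a, b in zip(doc["embedding"], query_embedding))

        return sorted(candidates, key=score, reverse=True)[:top_k]


class ToyEmbeddingRetriever:
    """Hypothetical retriever: resolves filters/top_k, then delegates to the store."""

    def __init__(
        self, document_store: ToyStore, filters: Optional[Dict[str, Any]] = None, top_k: int = 10
    ) -> None:
        if not isinstance(document_store, ToyStore):
            raise ValueError("document_store must be a ToyStore instance")
        self.document_store = document_store
        self.filters = filters
        self.top_k = top_k

    def run(
        self,
        query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> Dict[str, List[Doc]]:
        # Runtime values, when given, take precedence over init-time defaults
        effective_filters = filters if filters is not None else self.filters
        effective_top_k = top_k if top_k is not None else self.top_k
        documents = self.document_store.search(query_embedding, effective_top_k, filters=effective_filters)
        return {"documents": documents}
```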

@GunaPalanivel
Contributor Author


Thanks for the clear feedback, @davidsbatista! Understood — I'll add a FAISSEmbeddingRetriever as a proper @component-decorated class with:

  • run(query_embedding, filters, top_k) calling document_store.search(...)
  • FilterPolicy support with apply_filter_policy
  • to_dict / from_dict for pipeline serialization
  • Tests and updated __init__.py exports

Will push the changes shortly!
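The two filter policies differ only in how init-time and runtime filters are combined. The following is a simplified sketch, not Haystack's `apply_filter_policy`. REPLACE lets runtime filters win when present. MERGE here simply wraps both filter dicts in an `AND`, whereas Haystack's own helper may combine conditions more carefully. `SketchFilterPolicy` and `apply_policy` are hypothetical names.

```python
from enum import Enum
from typing import Any, Dict, Optional

Filters = Optional[Dict[str, Any]]


class SketchFilterPolicy(Enum):
    REPLACE = "replace"
    MERGE = "merge"


def apply_policy(policy: SketchFilterPolicy, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Hypothetical, simplified combination of init-time and runtime filters."""
    if policy is SketchFilterPolicy.REPLACE:
        # Runtime filters, when provided, override the init-time ones entirely
        return runtime_filters or init_filters
    # MERGE: keep both constraint sets by nesting them under a single AND
    if not init_filters:
        return runtime_filters
    if not runtime_filters:
        return init_filters
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
```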

- Add components/retrievers/faiss/embedding_retriever.py with @component
  decorator, run(), run_async(), to_dict(), from_dict() with FilterPolicy
  support and backward-compat deserialization guard
- Add components/__init__.py, components/retrievers/__init__.py,
  components/retrievers/faiss/__init__.py namespace packages
- Add tests/test_embedding_retriever.py with 8 tests covering:
  basic run, runtime filters, top_k override, to_dict/from_dict
  roundtrip, FilterPolicy REPLACE/MERGE, ValueError on wrong store
  type, and end-to-end pipeline execution
- Update pyproject.toml types script to also typecheck
  haystack_integrations.components.retrievers.faiss
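A backward-compatibility guard in `from_dict` typically lets pipelines that were serialized before `filter_policy` existed still load. Here is a hypothetical sketch of such a guard. It assumes REPLACE as the default policy and a string-valued enum in the serialized form. `Policy` and `restore_init_params` are illustrative names, not the retriever's actual code.

```python
from enum import Enum
from typing import Any, Dict


class Policy(Enum):
    REPLACE = "replace"
    MERGE = "merge"


def restore_init_params(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical from_dict helper: tolerate dicts serialized before filter_policy existed."""
    # Copy so the caller's serialized dict is left untouched
    init_params = dict(data.get("init_parameters", {}))
    policy = init_params.get("filter_policy")
    if policy is None:
        # Older pipelines lack the key: fall back to the (assumed) default policy
        init_params["filter_policy"] = Policy.REPLACE
    elif isinstance(policy, str):
        # The serialized form stores the enum's value; convert it back to the enum
        init_params["filter_policy"] = Policy(policy)
    return init_params
```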
@GunaPalanivel
Contributor Author


Added FAISSEmbeddingRetriever in components/retrievers/faiss/embedding_retriever.py:

  • @component-decorated with run(), run_async(), to_dict(), from_dict()
  • FilterPolicy support via apply_filter_policy
  • Full pipeline serialization with backward-compat filter_policy guard
  • 8 new tests (72 total, 0 regressions)
  • pyproject.toml mypy now covers both sub-packages

Ready for re-review @davidsbatista

Contributor

@davidsbatista left a comment


LGTM!

@GunaPalanivel, thanks again for this contribution.

I've made a few extra fixes and alignments, and configured Docusaurus.

Contributor

@davidsbatista left a comment


LGTM!

@davidsbatista merged commit a7f4732 into deepset-ai:main Feb 26, 2026
6 checks passed
@anakin87 mentioned this pull request Feb 26, 2026
@GunaPalanivel
Contributor Author

Thank you so much, @davidsbatista! 🙏

Really appreciate the thorough review rounds and the extra fixes + Docusaurus configuration — that's above and beyond. Learned a lot going through each round of feedback, especially around the FilterPolicy pattern and pipeline serialization contracts.

Excited to have this land in the repo. Looking forward to contributing more! 🚀

@davidsbatista
Contributor

Thanks for your continuous contributions, @GunaPalanivel. We're always glad to have good-quality PRs; they also help a lot on our side during reviews.


Labels

type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Integration] Request to support FAISSDocumentStore in 2.x

2 participants