refactor!: Elasticsearch - use async mixin tests and fix write/delete behaviour#3196
Conversation
Replace duplicated async test methods with mixin classes from `haystack.testing.document_store_async`, available since haystack v2.28.0. Elasticsearch-specific tests (BM25, embedding retrieval, sparse vectors, index recreation, SQL queries) are kept in place. Closes deepset-ai#3047
Coverage report (elasticsearch)Click to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||
|
While wiring up the mixin tests I noticed a few places where Elasticsearch's behaviour differs from the standard
I've overridden those mixin tests to match the current ES behaviour so CI passes, but these look like bugs worth fixing in a follow-up PR to bring ES in line with the rest of the document stores. |
| """ | ||
| Basic fixture providing a document store instance for async tests | ||
| """ | ||
| hosts = ["http://localhost:9200"] | ||
| # Use a different index for each test so we can run them in parallel |
There was a problem hiding this comment.
do not remove these comments, please
|
|
||
| @pytest.mark.asyncio | ||
| async def test_write_documents_async_with_sparse_embedding_warning(self, document_store, caplog): | ||
| """Test write_documents with document containing sparse_embedding field""" |
There was a problem hiding this comment.
restore all removed docstrings/comments
I agree, let's fix these behaviors and then remove tests overrides |
Fix three async behavior deviations from the standard DocumentStore interface: - write_documents_async with DuplicatePolicy.FAIL now raises DuplicateDocumentError instead of DocumentStoreError (outer try/except was swallowing the inner DuplicateDocumentError) - write_documents_async with DuplicatePolicy.SKIP now returns 0 when skipping an existing document (errors are now categorized like the sync version: version_conflict_engine_exception with SKIP policy is silently ignored) - delete_documents_async no longer raises on non-existent IDs (adds raise_on_error=False to match the sync version behaviour) Also restore docstrings and inline comments that were accidentally removed in the previous refactor commit.
|
@SyedShahmeerAli12 ping me when you need another review. |
|
PR renamed and comment left. We addressed both inline comments (restored docstrings/comments) and the bug fixes you asked for. Now waiting for your re-review. |
There was a problem hiding this comment.
Hi @anakin87 inline comments restored in test_embedding_retrieval_async.
For the two try/except removals, both were intentional bug fixes:
write_documents_async (line 714): The original outer except Exception was catching DuplicateDocumentError (a subclass of DocumentStoreError) that we raised inside the try block, then re-wrapping it as a generic DocumentStoreError. So callers using DuplicatePolicy.FAIL never actually received DuplicateDocumentError it was always swallowed into DocumentStoreError. Removing it and rewriting the error categorization to match the sync write_documents fixes this.
delete_documents_async (line 788): The original had no raise_on_error=False, so async_bulk would raise BulkIndexError when deleting non-existent IDs, which the try/except then re-raised as DocumentStoreError. The sync delete_documents uses raise_on_error=False and no try/except to silently ignore missing IDs. Adding raise_on_error=False makes the try/except redundant and aligns the behaviour with the sync version.
|
https://github.com/deepset-ai/haystack-core-integrations/pull/3196/files There are still several removed comments |
@davidsbatista these changes look right to me but could you please double-check? |
|
@anakin87 all comments from kept tests are restored. The remaining removed lines in the diff are from test method intentionally deleted as part of this refactor they're now covered by the mixin classes, so their comments go away with them. |
| docs = [Document(id="1", content="A first document"), Document(id="2", content="Second document")] | ||
| await document_store.write_documents_async(docs) | ||
|
|
||
| # capture index structure before deletion |
| await document_store.delete_all_documents_async(recreate_index=True) | ||
| assert await document_store.count_documents_async() == 0 | ||
|
|
||
| # verify index structure is preserved |
| settings_before["index"].pop("creation_date", None) | ||
| assert settings_after == settings_before, "delete_all_documents_async should preserve index settings" | ||
|
|
||
| # verify index can accept new documents and retrieve |
| results = await document_store.filter_documents_async() | ||
| assert len(results) == 1 | ||
| assert results[0].content == "New document after delete all" |
|
So, state of this PR, after having a look. The mixin tests enforced the standard Haystack interface contract, revealing that the existing
Regarding the tests
|
|
after this one is merged we should do a new release |
|
Thank you @davidsbatista! Once comments are restored, I'll release a new major version (due to breaking changes). |
Summary
Closes #3047
Replaces the duplicated async test methods in
test_document_store_async.pywith mixin classes fromhaystack.testing.document_store_async, which shipped in haystack v2.28.0.What changed:
TestElasticsearchDocumentStoreAsyncnow inherits from all applicable async mixin classes (CountDocumentsAsyncTest,WriteDocumentsAsyncTest,DeleteDocumentsAsyncTest,DeleteAllAsyncTest,DeleteByFilterAsyncTest,FilterDocumentsAsyncTest,UpdateByFilterAsyncTest,CountDocumentsByFilterAsyncTest,CountUniqueMetadataByFilterAsyncTest,GetMetadataFieldsInfoAsyncTest,GetMetadataFieldMinMaxAsyncTest,GetMetadataFieldUniqueValuesAsyncTest)pytest_asyncio.fixture(async)haystack-aiminimum version bumped to>=2.28.0(when async mixin was released)Overridden mixin tests (ES behavior deviations):
Several mixin tests were overridden because Elasticsearch's behavior differs from the standard interface contract:
test_write_documents_duplicate_fail_asyncDuplicateDocumentErrorDocumentStoreErrortest_write_documents_duplicate_skip_async0when skipping existing doc1test_delete_documents_empty_document_store_asyncDocumentStoreErrortest_delete_documents_non_existing_document_asyncDocumentStoreErrorThese deviations look like bugs in the Elasticsearch document store implementation and are worth fixing in a follow-up PR to align ES with the standard interface.
Test plan