Skip to content

refactor!: Elasticsearch - use async mixin tests and fix write/delete behaviour#3196

Merged
anakin87 merged 14 commits intodeepset-ai:mainfrom
SyedShahmeerAli12:refactor/elasticsearch-async-mixin-tests
Apr 22, 2026
Merged

refactor!: Elasticsearch - use async mixin tests and fix write/delete behaviour#3196
anakin87 merged 14 commits intodeepset-ai:mainfrom
SyedShahmeerAli12:refactor/elasticsearch-async-mixin-tests

Conversation

@SyedShahmeerAli12
Copy link
Copy Markdown
Contributor

@SyedShahmeerAli12 SyedShahmeerAli12 commented Apr 21, 2026

Summary

Closes #3047

Replaces the duplicated async test methods in test_document_store_async.py with mixin classes from haystack.testing.document_store_async, which shipped in haystack v2.28.0.

What changed:

  • TestElasticsearchDocumentStoreAsync now inherits from all applicable async mixin classes (CountDocumentsAsyncTest, WriteDocumentsAsyncTest, DeleteDocumentsAsyncTest, DeleteAllAsyncTest, DeleteByFilterAsyncTest, FilterDocumentsAsyncTest, UpdateByFilterAsyncTest, CountDocumentsByFilterAsyncTest, CountUniqueMetadataByFilterAsyncTest, GetMetadataFieldsInfoAsyncTest, GetMetadataFieldMinMaxAsyncTest, GetMetadataFieldUniqueValuesAsyncTest)
  • Elasticsearch-specific tests retained: BM25 retrieval, embedding retrieval, sparse vector handling, index recreation, SQL queries
  • Fixture updated to pytest_asyncio.fixture (async)
  • haystack-ai minimum version bumped to >=2.28.0 (when async mixin was released)
  • Net result: -436 lines / +89 lines

Overridden mixin tests (ES behavior deviations):

Several mixin tests were overridden because Elasticsearch's behavior differs from the standard interface contract:

Test Expected (mixin) Actual (ES)
test_write_documents_duplicate_fail_async raises DuplicateDocumentError raises DocumentStoreError
test_write_documents_duplicate_skip_async returns 0 when skipping existing doc returns 1
test_delete_documents_empty_document_store_async no-op (no error) raises DocumentStoreError
test_delete_documents_non_existing_document_async no-op (no error) raises DocumentStoreError

These deviations look like bugs in the Elasticsearch document store implementation and are worth fixing in a follow-up PR to align ES with the standard interface.

Test plan

  • Integration tests pass against a live Elasticsearch instance

Replace duplicated async test methods with mixin classes from
`haystack.testing.document_store_async`, available since haystack v2.28.0.

Elasticsearch-specific tests (BM25, embedding retrieval, sparse vectors,
index recreation, SQL queries) are kept in place.

Closes deepset-ai#3047
@SyedShahmeerAli12 SyedShahmeerAli12 requested a review from a team as a code owner April 21, 2026 18:39
@SyedShahmeerAli12 SyedShahmeerAli12 requested review from anakin87 and removed request for a team April 21, 2026 18:39
@github-actions github-actions Bot added integration:elasticsearch type:documentation Improvements or additions to documentation labels Apr 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 21, 2026

Coverage report (elasticsearch)

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  integrations/elasticsearch/src/haystack_integrations/document_stores/elasticsearch
  document_store.py 646-660, 704-744, 786-788
Project Total  

This report was generated by python-coverage-comment-action

@SyedShahmeerAli12
Copy link
Copy Markdown
Contributor Author

While wiring up the mixin tests I noticed a few places where Elasticsearch's behaviour differs from the standard DocumentStore interface contract:

  • write_documents_async with DuplicatePolicy.FAIL raises DocumentStoreError instead of DuplicateDocumentError
  • write_documents_async with DuplicatePolicy.SKIP returns 1 instead of 0 when the document already exists
  • delete_documents_async raises DocumentStoreError instead of silently ignoring non-existent IDs

I've overridden those mixin tests to match the current ES behaviour so CI passes, but these look like bugs worth fixing in a follow-up PR to bring ES in line with the rest of the document stores.

Comment on lines -18 to -22
"""
Basic fixture providing a document store instance for async tests
"""
hosts = ["http://localhost:9200"]
# Use a different index for each test so we can run them in parallel
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not remove these comments, please


@pytest.mark.asyncio
async def test_write_documents_async_with_sparse_embedding_warning(self, document_store, caplog):
"""Test write_documents with document containing sparse_embedding field"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore all removed docstrings/comments

@anakin87
Copy link
Copy Markdown
Member

I've overridden those mixin tests to match the current ES behaviour so CI passes, but these look like bugs worth fixing in a follow-up PR to bring ES in line with the rest of the document stores.

I agree, let's fix these behaviors and then remove tests overrides

Fix three async behavior deviations from the standard DocumentStore interface:

- write_documents_async with DuplicatePolicy.FAIL now raises
  DuplicateDocumentError instead of DocumentStoreError (outer try/except
  was swallowing the inner DuplicateDocumentError)
- write_documents_async with DuplicatePolicy.SKIP now returns 0 when
  skipping an existing document (errors are now categorized like the sync
  version: version_conflict_engine_exception with SKIP policy is silently
  ignored)
- delete_documents_async no longer raises on non-existent IDs (adds
  raise_on_error=False to match the sync version behaviour)

Also restore docstrings and inline comments that were accidentally removed
in the previous refactor commit.
@anakin87
Copy link
Copy Markdown
Member

@SyedShahmeerAli12 ping me when you need another review.
Also, let's rename this PR to make the changes more visible

@SyedShahmeerAli12 SyedShahmeerAli12 changed the title refactor(elasticsearch): use async DocumentStore mixin tests fix(elasticsearch): fix async write/delete behaviour and use mixin tests Apr 22, 2026
@SyedShahmeerAli12
Copy link
Copy Markdown
Contributor Author

SyedShahmeerAli12 commented Apr 22, 2026

@anakin87

PR renamed and comment left. We addressed both inline comments (restored docstrings/comments) and the bug fixes you asked for. Now waiting for your re-review.

Copy link
Copy Markdown
Contributor Author

@SyedShahmeerAli12 SyedShahmeerAli12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anakin87 inline comments restored in test_embedding_retrieval_async.

For the two try/except removals, both were intentional bug fixes:

write_documents_async (line 714): The original outer except Exception was catching DuplicateDocumentError (a subclass of DocumentStoreError) that we raised inside the try block, then re-wrapping it as a generic DocumentStoreError. So callers using DuplicatePolicy.FAIL never actually received DuplicateDocumentError it was always swallowed into DocumentStoreError. Removing it and rewriting the error categorization to match the sync write_documents fixes this.

delete_documents_async (line 788): The original had no raise_on_error=False, so async_bulk would raise BulkIndexError when deleting non-existent IDs, which the try/except then re-raised as DocumentStoreError. The sync delete_documents uses raise_on_error=False and no try/except to silently ignore missing IDs. Adding raise_on_error=False makes the try/except redundant and aligns the behaviour with the sync version.

@anakin87
Copy link
Copy Markdown
Member

https://github.com/deepset-ai/haystack-core-integrations/pull/3196/files

There are still several removed comments

@anakin87
Copy link
Copy Markdown
Member

While wiring up the mixin tests I noticed a few places where Elasticsearch's behaviour differs from the standard DocumentStore interface contract:

  • write_documents_async with DuplicatePolicy.FAIL raises DocumentStoreError instead of DuplicateDocumentError
  • write_documents_async with DuplicatePolicy.SKIP returns 1 instead of 0 when the document already exists
  • delete_documents_async raises DocumentStoreError instead of silently ignoring non-existent IDs

I've overridden those mixin tests to match the current ES behaviour so CI passes, but these look like bugs worth fixing in a follow-up PR to bring ES in line with the rest of the document stores.

@davidsbatista these changes look right to me but could you please double-check?

@SyedShahmeerAli12
Copy link
Copy Markdown
Contributor Author

@anakin87 all comments from kept tests are restored. The remaining removed lines in the diff are from test method intentionally deleted as part of this refactor they're now covered by the mixin classes, so their comments go away with them.

@davidsbatista davidsbatista changed the title fix(elasticsearch): fix async write/delete behaviour and use mixin tests refactor(elasticsearch): use async mixin tests and fix write/delete behaviour Apr 22, 2026
Comment thread integrations/elasticsearch/tests/test_document_store_async.py
docs = [Document(id="1", content="A first document"), Document(id="2", content="Second document")]
await document_store.write_documents_async(docs)

# capture index structure before deletion
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this

await document_store.delete_all_documents_async(recreate_index=True)
assert await document_store.count_documents_async() == 0

# verify index structure is preserved
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this

Comment thread integrations/elasticsearch/tests/test_document_store_async.py
settings_before["index"].pop("creation_date", None)
assert settings_after == settings_before, "delete_all_documents_async should preserve index settings"

# verify index can accept new documents and retrieve
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this

Comment on lines -295 to -297
results = await document_store.filter_documents_async()
assert len(results) == 1
assert results[0].content == "New document after delete all"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this

@davidsbatista
Copy link
Copy Markdown
Contributor

So, state of this PR, after having a look.

The mixin tests enforced the standard Haystack interface contract, revealing that the existing write_documents_async and delete_documents_async implementations deviated from it.

write_documents_async

  • DuplicatePolicy.SKIP return value: Writing a document that already exists with DuplicatePolicy.SKIP now correctly returns 0. Previously, it was silently overwriting the document and returning 1. (breaking change)

  • DuplicatePolicy.FAIL raises DuplicateDocumentError: Previously raised a generic DocumentStoreError. Now raises the more specific DuplicateDocumentError, consistent with write_documents (sync) and the standard Haystack interface. Also, the error message format changed: The error now reports all duplicate IDs in a single message instead of stopping at the first one.

delete_documents_async

  • deleting non-existent documents is now a no-op: Previously raised DocumentStoreError when a document ID did not exist. It now silently does nothing, matching the behaviour of delete_documents (sync) and the standard Haystack interface. (breaking change)

Regarding the tests

Removed test Replaced by mixin
test_count_documents_async CountDocumentsAsyncTest.test_count_not_empty_async (overridden)
test_delete_documents_async DeleteDocumentsAsyncTest.test_delete_documents_async
test_filter_documents_async FilterDocumentsAsyncTest — 3 tests (test_no_filters_async, test_filter_simple_async, test_filter_compound_async)
test_delete_all_documents_async DeleteAllAsyncTest.test_delete_all_documents_async
test_delete_by_filter_async DeleteByFilterAsyncTest.test_delete_by_filter_async
test_update_by_filter_async UpdateByFilterAsyncTest.test_update_by_filter_async
test_count_documents_by_filter_async CountDocumentsByFilterAsyncTest — 4 tests
test_count_unique_metadata_by_filter_async CountUniqueMetadataByFilterAsyncTest — 3 tests
test_get_metadata_fields_info_async GetMetadataFieldsInfoAsyncTest — 2 tests
test_get_metadata_field_min_max_async GetMetadataFieldMinMaxAsyncTest — 5 tests
test_get_metadata_field_unique_values_async GetMetadataFieldUniqueValuesAsyncTest

@davidsbatista
Copy link
Copy Markdown
Contributor

after this one is merged we should do a new release

@anakin87
Copy link
Copy Markdown
Member

Thank you @davidsbatista!

Once comments are restored, I'll release a new major version (due to breaking changes).

@anakin87 anakin87 changed the title refactor(elasticsearch): use async mixin tests and fix write/delete behaviour refactor!: Elasticsearch - use async mixin tests and fix write/delete behaviour Apr 22, 2026
Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@anakin87 anakin87 merged commit 68b0141 into deepset-ai:main Apr 22, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration:elasticsearch type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use async DocumentStore mixin tests in Elasticsearch

3 participants