feat: adding count with filtering operations to OpenSearchDocumentStore#2653

Merged

davidsbatista merged 74 commits into main from feat/add-count-filtering-to-OpenSearchDocumentStore on Jan 16, 2026

davidsbatista (Contributor) commented Jan 5, 2026

Related Issues

Proposed Changes:

  • count_documents_by_filter() - count documents matching filter criteria
  • count_distinct_values_by_filter() - get distinct value counts for metadata fields with optional filtering
  • get_fields_info() - retrieve field type information from index mapping
  • get_field_min_max() - get min/max values for numeric metadata fields
  • get_field_unique_values() - get unique values for a field with pagination and content-based filtering
  • query_sql() - execute SQL queries against OpenSearch with support for multiple response formats (JSON, CSV, JDBC, RAW)

How did you test it?

  • added integration tests covering the new methods, in both sync and async versions

Notes for the reviewer

  • added httpx>=0.28.1 dependency
  • the query_sql() method performs a raw HTTP request (using httpx) if the specified response format is not JSON
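
For the non-JSON path of query_sql(), the request goes directly to the OpenSearch SQL plugin endpoint. The sketch below builds (but does not send) such a request with the standard library instead of httpx. It assumes the `_plugins/_sql` endpoint with a `format` query parameter; `build_sql_request`, the host, and the index name are placeholders, not the PR's code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Formats named in the PR description, as lowercase values for the `format` parameter.
SQL_FORMATS = {"json", "csv", "jdbc", "raw"}


def build_sql_request(host: str, query: str, response_format: str = "csv") -> Request:
    """Build (without sending) a POST request to the OpenSearch SQL plugin endpoint."""
    if response_format not in SQL_FORMATS:
        raise ValueError(f"Unsupported response format: {response_format!r}")
    base = host.rstrip("/")
    params = urlencode({"format": response_format})
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base}/_plugins/_sql?{params}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sql_request("http://localhost:9200", "SELECT * FROM my_index LIMIT 5")
print(req.get_method(), req.full_url)
# POST http://localhost:9200/_plugins/_sql?format=csv
```

Against a live cluster, `urllib.request.urlopen(req)` would send it; for CSV or raw output the response body would be read as text rather than parsed as JSON.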

Checklist

github-actions (bot) added the integration:opensearch and type:documentation labels Jan 5, 2026
davidsbatista changed the title "Feat/add count filtering to open search document store" to "feat: adding count with filtering operations to open search document store" Jan 5, 2026
davidsbatista changed the title "feat: adding count with filtering operations to open search document store" to "feat: adding count with filtering operations to OpenSearchDocumentStore" Jan 5, 2026
davidsbatista marked this pull request as ready for review January 6, 2026 11:16
davidsbatista requested a review from a team as a code owner January 6, 2026 11:16
davidsbatista requested review from sjrl and removed request for a team January 6, 2026 11:16
sjrl requested a review from tstadel January 7, 2026 08:36

sjrl (Contributor) commented Jan 7, 2026

Hey @tstadel, I'd also appreciate your review on this, since we want to make sure it will work in platform as well.

Comment thread on integrations/opensearch/pyproject.toml (outdated)
davidsbatista requested a review from sjrl January 13, 2026 14:57
davidsbatista and others added 2 commits January 14, 2026 10:46
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
…res/opensearch/document_store.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Comment thread on integrations/opensearch/tests/test_document_store.py
Comment on lines +1433 to +1438
# Build aggregations
# Terms aggregation for paginated unique values
# Note: Terms aggregation doesn't support 'from' parameter directly,
# so we fetch from_ + size results and slice them
# Cardinality aggregation for total count
terms_size = from_ + size if from_ > 0 else size
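
The offset emulation in the excerpt above can be reproduced on a plain list of buckets. This is a sketch that assumes `buckets` is already ordered the way a terms aggregation returns them by default (descending doc_count); `paginate_terms_buckets` is a hypothetical helper, not part of the PR.

```python
from typing import Any


def paginate_terms_buckets(buckets: list[dict[str, Any]], from_: int, size: int) -> list[dict[str, Any]]:
    """Emulate offset pagination over terms-aggregation buckets (hypothetical helper)."""
    # A terms aggregation has no `from`, so ask for from_ + size buckets...
    terms_size = from_ + size if from_ > 0 else size
    fetched = buckets[:terms_size]  # stands in for a terms agg with "size": terms_size
    # ...then drop the first from_ buckets on the client side.
    return fetched[from_:]


buckets = [{"key": k, "doc_count": c} for k, c in [("a", 5), ("b", 4), ("c", 3), ("d", 2), ("e", 1)]]
print([b["key"] for b in paginate_terms_buckets(buckets, from_=2, size=2)])  # ['c', 'd']
```

Each page re-requests every earlier bucket, so the cost grows with from_, and the reachable depth is still capped by how many buckets a single aggregation may return.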

Member commented:

Note that this only works up to 10k. Overall, the pagination is not ideal here. I'd suggest switching to composite aggregations, which support proper pagination. However, you'd need to change the signature to support the after param instead of from.

Member commented:

I'd suggest removing the from param altogether, as it gives the wrong impression here. That would be the quickest fix.

For full pagination support, I'd recommend:

  1. support both from and after at the protocol level (since some document stores might support from-based pagination and others after-based pagination)
  2. raise NotSupportedError for OpenSearch when from is set
  3. implement after-based pagination for OpenSearch using composite aggregations

davidsbatista (Contributor, Author) commented:

I went with option 3.

def get_metadata_field_unique_values(
    self,
    metadata_field: str,
    search_term: str | None = None,
    size: int | None = 10000,
    after: dict[str, Any] | None = None,
): ...

Regarding the Protocol: I support your suggestion of adding both from and after, but let's first see how this applies to other DocumentStores, and then we can decide whether it's easier to add it to the Protocol.

I think there are common operations that can already be added to the Protocol. I need to review it in detail and make use of it to avoid so much duplicated test code. There's an issue for it, and I will probably pick it up soon.

tstadel (Member) left a comment

@davidsbatista I left some comments. The biggest issue I see concerns the pagination of unique_values. I'd probably just not support it here; we can always come back and implement a proper solution.

davidsbatista requested review from sjrl and tstadel January 15, 2026 14:14

tstadel (Member) left a comment

LGTM now!

sjrl (Contributor) left a comment

One minor comment, otherwise looks good!

davidsbatista merged commit f2001e8 into main Jan 16, 2026
7 checks passed
davidsbatista deleted the feat/add-count-filtering-to-OpenSearchDocumentStore branch January 16, 2026 11:05

Labels: integration:opensearch, type:documentation


Development

Successfully merging this pull request may close these issues.

add the following operations to OpenSearchDocumentStore

3 participants