Skip to content

optimize must_not exisits on dense fields#21126

Open
ThyTran1402 wants to merge 2 commits intoopensearch-project:mainfrom
ThyTran1402:feat/optimize_must_not_exists
Open

optimize must_not exisits on dense fields#21126
ThyTran1402 wants to merge 2 commits intoopensearch-project:mainfrom
ThyTran1402:feat/optimize_must_not_exists

Conversation

@ThyTran1402
Copy link
Copy Markdown
Contributor

@ThyTran1402 ThyTran1402 commented Apr 4, 2026

Description

This PR: optimizes must_not: exists queries on dense fields by rewriting them into a cacheable positive filter.

  • The root cause was that ExistsQueryBuilder did not implement ComplementAwareQueryBuilder, so MustNotToShouldRewriter skipped it entirely.
  • Fixed: added NotExistsQuery and NotExistsQueryBuilder as the complement of ExistsQueryBuilder, and wires them into the existing query rewrite infrastructure.

Related Issues

Resolves #12426

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

PR Reviewer Guide 🔍

(Review updated until commit e21a7c9)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add NotExistsQuery Lucene query implementation

Relevant files:

  • server/src/main/java/org/opensearch/common/lucene/search/NotExistsQuery.java

Sub-PR theme: Wire NotExistsQueryBuilder into query rewrite infrastructure

Relevant files:

  • server/src/main/java/org/opensearch/index/query/NotExistsQueryBuilder.java
  • server/src/main/java/org/opensearch/index/query/ExistsQueryBuilder.java
  • server/src/main/java/org/opensearch/index/query/ComplementAwareQueryBuilder.java
  • server/src/main/java/org/opensearch/search/SearchModule.java
  • server/src/main/java/org/opensearch/search/query/rewriters/MustNotToShouldRewriter.java

⚡ Recommended focus areas for review

Deleted Docs

The ComplementDocIdSetIterator does not account for deleted documents. In a segment with deletions, maxDoc includes deleted docs, but the iterator will return deleted document IDs as matches. The scorer should use a Bits (live docs) filter to skip deleted documents, similar to how other Lucene queries handle this.

static final class ComplementDocIdSetIterator extends DocIdSetIterator {

    private final DocIdSetIterator inner;
    private final int maxDoc;
    private int doc = -1;
    /** The next doc ID in the inner (exists) iterator; NO_MORE_DOCS when exhausted. */
    private int nextInner = -1;

    ComplementDocIdSetIterator(DocIdSetIterator inner, int maxDoc) {
        this.inner = inner;
        this.maxDoc = maxDoc;
    }

    @Override
    public int docID() {
        return doc;
    }

    @Override
    public int nextDoc() throws IOException {
        return advance(doc + 1);
    }

    @Override
    public int advance(int target) throws IOException {
        doc = target;
        while (doc < maxDoc) {
            // Make sure nextInner is >= doc
            if (nextInner < doc) {
                nextInner = inner.advance(doc);
            }
            if (nextInner == doc) {
                // This doc IS in the inner iterator (field exists) — skip it.
                doc++;
            } else {
                // nextInner > doc: this doc is NOT in the inner iterator — it matches.
                return doc;
            }
        }
        doc = NO_MORE_DOCS;
        return NO_MORE_DOCS;
    }

    @Override
    public long cost() {
        // Approximate: total docs minus the cost of the inner (exists) iterator.
        return Math.max(0, maxDoc - inner.cost());
    }
}
Missing Registration

NotExistsQueryBuilder is registered in SearchModule for parsing, but it may also need to be registered in the NamedWriteableRegistry (e.g., in SearchModule.getNamedWriteables() or equivalent) for serialization across nodes. If it is not registered as a NamedWriteable, cross-node queries using this builder will fail during deserialization.

public class NotExistsQueryBuilder extends AbstractQueryBuilder<NotExistsQueryBuilder> implements WithFieldName {

    public static final String NAME = "not_exists";
    public static final ParseField FIELD_FIELD = new ParseField("field");

    private final String fieldName;

    public NotExistsQueryBuilder(String fieldName) {
        if (Strings.isEmpty(fieldName)) {
            throw new IllegalArgumentException("field name is null or empty");
        }
        this.fieldName = fieldName;
    }

    /**
     * Read from a stream.
     */
    public NotExistsQueryBuilder(StreamInput in) throws IOException {
        super(in);
        fieldName = in.readString();
    }

    @Override
    protected void doWriteTo(StreamOutput out) throws IOException {
        out.writeString(fieldName);
    }

    @Override
    public String fieldName() {
        return fieldName;
    }

    @Override
    protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) throws IOException {
        QueryShardContext context = queryRewriteContext.convertToShardContext();
        if (context != null) {
            // If the field is not mapped, every document is effectively "missing" the field.
            Query innerExists = ExistsQueryBuilder.newFilter(context, fieldName, false);
            if (innerExists instanceof MatchNoDocsQuery) {
                return new MatchAllQueryBuilder();
            }
        }
        return super.doRewrite(queryRewriteContext);
    }

    @Override
    protected Query doToQuery(QueryShardContext context) throws IOException {
        Query innerExists = ExistsQueryBuilder.newFilter(context, fieldName, false);

        if (innerExists instanceof MatchNoDocsQuery) {
            // Field is unmapped or absent everywhere → all docs match "not exists".
            return new MatchAllDocsQuery();
        }

        return new NotExistsQuery(innerExists, fieldName);
    }

    @Override
    protected void doXContent(XContentBuilder builder, Params params) throws IOException {
        builder.startObject(NAME);
        builder.field(FIELD_FIELD.getPreferredName(), fieldName);
        printBoostAndQueryName(builder);
        builder.endObject();
    }

    public static NotExistsQueryBuilder fromXContent(XContentParser parser) throws IOException {
        String fieldPattern = null;
        String queryName = null;
        float boost = AbstractQueryBuilder.DEFAULT_BOOST;

        XContentParser.Token token;
        String currentFieldName = null;
        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
            if (token == XContentParser.Token.FIELD_NAME) {
                currentFieldName = parser.currentName();
            } else if (token.isValue()) {
                if (FIELD_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    fieldPattern = parser.text();
                } else if (AbstractQueryBuilder.NAME_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    queryName = parser.text();
                } else if (AbstractQueryBuilder.BOOST_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
                    boost = parser.floatValue();
                } else {
                    throw new ParsingException(
                        parser.getTokenLocation(),
                        "[" + NotExistsQueryBuilder.NAME + "] query does not support [" + currentFieldName + "]"
                    );
                }
            } else {
                throw new ParsingException(
                    parser.getTokenLocation(),
                    "[" + NotExistsQueryBuilder.NAME + "] unknown token [" + token + "] after [" + currentFieldName + "]"
                );
            }
        }

        if (fieldPattern == null) {
            throw new ParsingException(parser.getTokenLocation(), "[" + NotExistsQueryBuilder.NAME + "] must be provided with a [field]");
        }

        NotExistsQueryBuilder builder = new NotExistsQueryBuilder(fieldPattern);
        builder.queryName(queryName);
        builder.boost(boost);
        return builder;
    }

    @Override
    protected int doHashCode() {
        return Objects.hash(fieldName);
    }

    @Override
    protected boolean doEquals(NotExistsQueryBuilder other) {
        return Objects.equals(fieldName, other.fieldName);
    }

    @Override
    public String getWriteableName() {
        return NAME;
    }
}
Rewrite Duplication

In doRewrite, ExistsQueryBuilder.newFilter is called to check for MatchNoDocsQuery, and then in doToQuery the same call is made again. This results in duplicate work. Consider caching or restructuring to avoid calling newFilter twice per query execution.

protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) throws IOException {
    QueryShardContext context = queryRewriteContext.convertToShardContext();
    if (context != null) {
        // If the field is not mapped, every document is effectively "missing" the field.
        Query innerExists = ExistsQueryBuilder.newFilter(context, fieldName, false);
        if (innerExists instanceof MatchNoDocsQuery) {
            return new MatchAllQueryBuilder();
        }
    }
    return super.doRewrite(queryRewriteContext);
}

@Override
protected Query doToQuery(QueryShardContext context) throws IOException {
    Query innerExists = ExistsQueryBuilder.newFilter(context, fieldName, false);

    if (innerExists instanceof MatchNoDocsQuery) {
        // Field is unmapped or absent everywhere → all docs match "not exists".
        return new MatchAllDocsQuery();
    }

    return new NotExistsQuery(innerExists, fieldName);
}
Javadoc Stale

The existing Javadoc for getComplement still says "Returns a list of RangeQueryBuilder whose elements..." which is now inaccurate since ExistsQueryBuilder also implements this interface. The documentation should be updated to be more general.

/**
 * Returns a list of RangeQueryBuilder whose elements, when combined, form the complement of this range query.
 * May be null, in which case the complement couldn't be determined.
 * @return the complement
 */
List<? extends QueryBuilder> getComplement(QueryShardContext context);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

PR Code Suggestions ✨

Latest suggestions up to e21a7c9
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle match-all inner query complement case

The doRewrite method checks if the inner exists query returns MatchNoDocsQuery and
rewrites to MatchAllQueryBuilder, but doToQuery duplicates this same check. More
importantly, if innerExists is a MatchAllDocsQuery (meaning every document has the
field), the complement should be MatchNoneQueryBuilder, but neither doRewrite nor
doToQuery handles this case — doToQuery would wrap a MatchAllDocsQuery inside a
NotExistsQuery, which would return no documents but not be recognized as a no-op.

server/src/main/java/org/opensearch/index/query/NotExistsQueryBuilder.java [73-83]

 @Override
 protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) throws IOException {
     QueryShardContext context = queryRewriteContext.convertToShardContext();
     if (context != null) {
-        // If the field is not mapped, every document is effectively "missing" the field.
         Query innerExists = ExistsQueryBuilder.newFilter(context, fieldName, false);
         if (innerExists instanceof MatchNoDocsQuery) {
             return new MatchAllQueryBuilder();
+        }
+        if (innerExists instanceof MatchAllDocsQuery) {
+            return new MatchNoneQueryBuilder();
         }
     }
     return super.doRewrite(queryRewriteContext);
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid edge case: if innerExists is a MatchAllDocsQuery (every document has the field), the complement should be MatchNoneQueryBuilder. Without this, doToQuery would wrap a MatchAllDocsQuery in a NotExistsQuery that returns no documents but isn't recognized as a no-op, potentially causing correctness issues and missed optimizations.

Medium
Fix null occur in query visitor delegation

Passing null as the BooleanClause.Occur argument to visitor.getSubVisitor(null,
this) is not valid — getSubVisitor expects a non-null BooleanClause.Occur. Since
NotExistsQuery is semantically a negation, the correct occur value should be
BooleanClause.Occur.MUST_NOT.

server/src/main/java/org/opensearch/common/lucene/search/NotExistsQuery.java [98-102]

 @Override
 public void visit(QueryVisitor visitor) {
     if (visitor.acceptField(fieldName)) {
-        innerExistsQuery.visit(visitor.getSubVisitor(null, this));
+        innerExistsQuery.visit(visitor.getSubVisitor(BooleanClause.Occur.MUST_NOT, this));
     }
 }
Suggestion importance[1-10]: 6

__

Why: Passing null as the BooleanClause.Occur to getSubVisitor may cause NullPointerException in visitor implementations that don't handle null occur values. Using BooleanClause.Occur.MUST_NOT is semantically correct for a negation query and avoids potential runtime errors.

Low
General
Document sentinel value intent clearly

The nextInner field is initialized to -1, but the advance logic assumes it starts
uninitialized relative to the inner iterator. On the first call to advance(0), the
condition nextInner < doc (i.e., -1 < 0) is true, so inner.advance(0) is called
correctly. However, this is fragile — if advance is called with target = 0 after
construction, nextInner should be initialized to a sentinel that clearly means "not
yet advanced", such as -1 with a comment, or better, use Integer.MIN_VALUE to avoid
confusion with valid doc IDs. More critically, the constructor should eagerly
advance the inner iterator to its first document to avoid the implicit reliance on
-1 < 0.
Actually, the real bug is: if nextDoc() is called before advance(), doc
starts at -1, and advance(doc + 1) = advance(0) is called. This is correct. But if
advance(0) is called directly, nextInner = -1 < 0 triggers inner.advance(0). This is
fine. The initialization is actually correct, but should be documented clearly.

server/src/main/java/org/opensearch/common/lucene/search/NotExistsQuery.java [131-136]

+/** Sentinel: inner iterator has not yet been advanced past this point. */
 private int nextInner = -1;
 
 ComplementDocIdSetIterator(DocIdSetIterator inner, int maxDoc) {
     this.inner = inner;
     this.maxDoc = maxDoc;
 }
Suggestion importance[1-10]: 1

__

Why: The suggestion only adds a comment to document the sentinel value -1 for nextInner. This is a minor documentation improvement with no functional impact, and the suggestion itself acknowledges the initialization is actually correct.

Low

Previous suggestions

Suggestions up to commit 74a9528
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix null Occur in QueryVisitor delegation

Passing null as the BooleanClause.Occur argument to visitor.getSubVisitor(null,
this) is incorrect. The NotExistsQuery semantically represents a MUST_NOT
relationship with the inner exists query. Passing null may cause
NullPointerException in some QueryVisitor implementations that dereference the Occur
parameter. It should be BooleanClause.Occur.MUST_NOT.

server/src/main/java/org/opensearch/common/lucene/search/NotExistsQuery.java [98-102]

 @Override
 public void visit(QueryVisitor visitor) {
     if (visitor.acceptField(fieldName)) {
-        innerExistsQuery.visit(visitor.getSubVisitor(null, this));
+        innerExistsQuery.visit(visitor.getSubVisitor(BooleanClause.Occur.MUST_NOT, this));
     }
 }
Suggestion importance[1-10]: 6

__

Why: Passing null as the BooleanClause.Occur to getSubVisitor could cause NullPointerException in visitor implementations that dereference the Occur parameter. Using BooleanClause.Occur.MUST_NOT is semantically correct and prevents potential NPEs, making this a valid bug fix with moderate impact.

Low
Pre-fetch inner iterator in constructor

The nextInner field is initialized to -1, but the advance method checks if
(nextInner < doc) to decide whether to advance the inner iterator. On the very first
call with target=0, nextInner=-1 < 0 is true, so it will correctly advance. However,
the initial state should be explicitly set to NO_MORE_DOCS or advanced eagerly in
the constructor to make the intent clear and avoid subtle bugs if advance(0) is
never called before nextDoc(). Consider initializing nextInner by calling
inner.nextDoc() in the constructor so the first inner document is pre-fetched.

server/src/main/java/org/opensearch/common/lucene/search/NotExistsQuery.java [131-136]

-private int nextInner = -1;
+private int nextInner;
 
-ComplementDocIdSetIterator(DocIdSetIterator inner, int maxDoc) {
+ComplementDocIdSetIterator(DocIdSetIterator inner, int maxDoc) throws IOException {
     this.inner = inner;
     this.maxDoc = maxDoc;
+    this.nextInner = inner.nextDoc();
 }
Suggestion importance[1-10]: 3

__

Why: The current initialization of nextInner = -1 works correctly because the advance method checks if (nextInner < doc) before advancing. Pre-fetching in the constructor would change the semantics and add throws IOException to the constructor, which is a more invasive change. The existing logic is functionally correct, so this is a minor style/clarity suggestion with low impact.

Low
General
Handle MatchAllDocsQuery case in rewrite

The doRewrite method only handles the MatchNoDocsQuery case but not the
MatchAllDocsQuery case. If innerExists is a MatchAllDocsQuery (field exists in every
document), the complement should be MatchNoneQueryBuilder. Without this, doToQuery
will wrap a MatchAllDocsQuery in a NotExistsQuery, which will correctly return no
documents but is less efficient than returning MatchNoneQueryBuilder directly.

server/src/main/java/org/opensearch/index/query/NotExistsQueryBuilder.java [73-83]

 protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) throws IOException {
     QueryShardContext context = queryRewriteContext.convertToShardContext();
     if (context != null) {
-        // If the field is not mapped, every document is effectively "missing" the field.
         Query innerExists = ExistsQueryBuilder.newFilter(context, fieldName, false);
         if (innerExists instanceof MatchNoDocsQuery) {
             return new MatchAllQueryBuilder();
+        }
+        if (innerExists instanceof MatchAllDocsQuery) {
+            return new MatchNoneQueryBuilder();
         }
     }
     return super.doRewrite(queryRewriteContext);
 }
Suggestion importance[1-10]: 5

__

Why: This is a valid optimization: when innerExists is a MatchAllDocsQuery, the complement should be MatchNoneQueryBuilder for efficiency. Without this, doToQuery would wrap a MatchAllDocsQuery in a NotExistsQuery which still works correctly but is less efficient. The suggestion is accurate and the improved_code correctly reflects the change.

Low

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

❌ Gradle check result for 74a9528: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ThyTran1402 ThyTran1402 changed the title optimize must_ot exisits on dense fields optimize must_not exisits on dense fields Apr 4, 2026
Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

Persistent review updated to latest commit e21a7c9

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

❌ Gradle check result for e21a7c9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot Bot added the stalled Issues that have stalled label May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request lucene Search:Performance stalled Issues that have stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] More efficiently handle NOT exists(...) queries

1 participant