Skip to content

[BugFix] Fix SQL window functions with ORDER BY/LIMIT on unified query path#5592

Open
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:fix/sql-window-order-limit
Open

[BugFix] Fix SQL window functions with ORDER BY/LIMIT on unified query path#5592
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:fix/sql-window-order-limit

Conversation

@dai-chen

@dai-chen dai-chen commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Description

This PR fixes SQL window functions used with ORDER BY / LIMIT, which produced wrong plans for Analytics Engine). Because fixing the shared AstBuilder directly would impact the SQL V2 engine (V2 requires the top operator to be a Project), the fix is implemented in the extended AST builder of the unified query API only.

Known limitation: this change does NOT fix a window query with ORDER BY on a non-projected column plus LIMIT — e.g. SELECT a, ROW_NUMBER() OVER (...) AS rn FROM t ORDER BY b LIMIT n. Correctly supporting it requires Calcite-style ordering-column handling (SqlToRelConverter.gatherOrderExprs).

Case 1 — window over GROUP BY + bare LIMIT

SELECT region, COUNT(*) AS cnt, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
FROM clickbench
GROUP BY region
LIMIT 5

# BEFORE (base — buggy): Limit is below the window Project
Project[region, cnt, rn=WindowFunction(...)]
  Limit(5)
    Aggregation[group=region, COUNT(*)]
      Relation(clickbench)

→ window ranks only the 5 rows that survive LIMIT. Wrong / non-deterministic.

# AFTER (fix): Limit above the window Project
Limit(5)
  Project[region, cnt, rn=WindowFunction(...)]
    Aggregation[group=region, COUNT(*)]
      Relation(clickbench)

→ rn ranked over all groups, then top-5 taken.

Case 2 — ORDER BY on the window alias

SELECT region, COUNT(*) AS cnt, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rn
FROM clickbench
GROUP BY region
ORDER BY rn
LIMIT 5

# BEFORE (base — buggy): Sort below the Project, and rn re-expanded into a second window
Project[region, cnt, rn=WindowFunction(...)]          ← window #1
  Limit(5)
    Sort[ key = WindowFunction(ROW_NUMBER, ...) ]      ← window #2 (re-expanded from "rn")
      Aggregation[group=region, COUNT(*)]
        Relation(clickbench)

→ two RexOver ⇒ "duplicate unqualified field name" + RelCollation cast ⇒ HTTP 500.

# AFTER (fix): Limit(Sort(Project)), sort key is the column reference rn
Limit(5)
  Sort[ key = QualifiedName("rn"), asc ]               ← references projected column
    Project[region, cnt, rn=WindowFunction(...)]       ← single window
      Aggregation[group=region, COUNT(*)]
        Relation(clickbench)

→ one RexOver; Calcite resolves rn to the projected field. No crash; true top-5 by rn.

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 53958e8)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The hasWindowFunctionInProjectList(QuerySpecificationContext) method creates a new QuerySpecification and calls collect() on it, but does not push/pop the parsing context. If collect() relies on context state or modifies it, this could cause incorrect behavior or leave the context in an inconsistent state when called during visitQuerySpecification.

private boolean hasWindowFunctionInProjectList(QuerySpecificationContext queryContext) {
  if (queryContext.fromClause() == null) {
    return false;
  }
  QuerySpecification probe = new QuerySpecification();
  probe.collect(queryContext, query);
  return hasWindowFunctionInProjectList(probe);
Null Pointer Risk

If queryContext.fromClause() returns null (checked at line 112), the code at line 88 will throw a NullPointerException when calling queryContext.fromClause().orderByClause(). The null check should occur before attempting to access orderByClause().

OrderByClauseContext orderByClause = queryContext.fromClause().orderByClause();
if (orderByClause != null) {
  result = new ExtendedAstSortBuilder(context.peek()).visit(orderByClause).attach(result);
}

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 53958e8
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate child list before access

The code assumes sort.getChild() returns a non-empty list without validation. If the
Sort node has no children, calling get(0) will throw an IndexOutOfBoundsException.
Add a check to ensure the child list is not empty before accessing the first
element.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [101-108]

 @Override
 public UnresolvedPlan visitFromClause(FromClauseContext ctx) {
   UnresolvedPlan from = super.visitFromClause(ctx);
   if (hasWindowFunctionInProjectList(context.peek()) && from instanceof Sort sort) {
     // Drop the ORDER BY Sort for window queries; it is re-attached above the Project
+    if (sort.getChild().isEmpty()) {
+      throw new IllegalStateException("Sort node has no children");
+    }
     return sort.getChild().get(0);
   }
   return from;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential IndexOutOfBoundsException when accessing sort.getChild().get(0) without validating the list is non-empty. However, this is a defensive programming improvement rather than a critical bug, as the Sort node is expected to have children in normal operation.

Medium
Add null check for fromClause

The code calls queryContext.fromClause() without null-checking, but earlier checks
only verify queryContext.fromClause() == null in hasWindowFunctionInProjectList. If
fromClause() returns null, calling orderByClause() on it will cause a
NullPointerException. Add a null check before accessing orderByClause().

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [77-98]

 @Override
 public UnresolvedPlan visitQuerySpecification(QuerySpecificationContext queryContext) {
   if (!hasWindowFunctionInProjectList(queryContext)) {
     return super.visitQuerySpecification(queryContext);
   }
 
   context.push();
   context.peek().collect(queryContext, query);
   Project project = (Project) visit(queryContext.selectClause());
   UnresolvedPlan result = project.attach(visit(queryContext.fromClause()));
 
   // Window output must be computed before ORDER BY/LIMIT, so build Limit(Sort(Project(from)))
-  OrderByClauseContext orderByClause = queryContext.fromClause().orderByClause();
-  if (orderByClause != null) {
-    result = new ExtendedAstSortBuilder(context.peek()).visit(orderByClause).attach(result);
+  FromClauseContext fromClause = queryContext.fromClause();
+  if (fromClause != null) {
+    OrderByClauseContext orderByClause = fromClause.orderByClause();
+    if (orderByClause != null) {
+      result = new ExtendedAstSortBuilder(context.peek()).visit(orderByClause).attach(result);
+    }
   }
   if (queryContext.limitClause() != null) {
     result = visit(queryContext.limitClause()).attach(result);
   }
 
   context.pop();
   return result;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion identifies a potential NullPointerException, but the logic is flawed. The method hasWindowFunctionInProjectList already returns false when fromClause() is null (line 111-113), so execution only reaches line 88 when fromClause() is guaranteed to be non-null. The suggested check is redundant.

Low

Previous suggestions

Suggestions up to commit 70ff9d8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate child list before access

The method assumes sort.getChild() returns a non-empty list without validation. If
the Sort node has no children, calling get(0) will throw an
IndexOutOfBoundsException. Add a check to ensure the child list is not empty before
accessing it.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [101-108]

 @Override
 public UnresolvedPlan visitFromClause(FromClauseContext ctx) {
   UnresolvedPlan from = super.visitFromClause(ctx);
   if (hasWindowFunctionInProjectList(context.peek()) && from instanceof Sort sort) {
     // Drop Sort operator if any window function since it is already attached atop Project above
-    return sort.getChild().get(0);
+    if (!sort.getChild().isEmpty()) {
+      return sort.getChild().get(0);
+    }
   }
   return from;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential IndexOutOfBoundsException if sort.getChild() returns an empty list. However, in typical AST structures, a Sort node should always have at least one child, making this an edge case. The suggestion improves defensive programming but may not address a likely runtime issue.

Medium

Scope the fix to the unified query API parser (ExtendedAstBuilder); fixing the shared AstBuilder directly would break the V2/legacy engine.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the fix/sql-window-order-limit branch from 70ff9d8 to 53958e8 Compare June 26, 2026 22:21
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 53958e8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants