Skip to content

Add AnalyticsExecutionEngine and query routing for Parquet-backed indices#5256

Closed
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:feature/mustang-analytics-execution-engine
Closed

Add AnalyticsExecutionEngine and query routing for Parquet-backed indices#5256
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:feature/mustang-analytics-execution-engine

Conversation

@ahkcs

@ahkcs ahkcs commented Mar 23, 2026

Copy link
Copy Markdown
Collaborator

Summary

Implements the execution engine adapter and query routing infrastructure for Project Mustang's unified query pipeline (#5247).

  • QueryPlanExecutor@FunctionalInterface contract for analytics engine plan execution. Local equivalent of upstream org.opensearch.analytics.exec.QueryPlanExecutor; will be swapped when analytics-framework JAR is published.
  • AnalyticsExecutionEngine — Implements ExecutionEngine. Bridges QueryPlanExecutor output (Iterable<Object[]>) to the SQL plugin's QueryResponse pipeline with type mapping via OpenSearchTypeFactory and row conversion via ExprValueUtils.
  • RestUnifiedQueryAction — Orchestrates the full analytics query path: index detection, schema building, PPL planning via UnifiedQueryPlanner, execution via AnalyticsExecutionEngine, response formatting via JdbcResponseFormatter. Runs on sql-worker thread pool.
  • RestPPLQueryAction — Added routing branch: queries targeting parquet_ prefixed indices are routed to the analytics path; all other queries go through the existing Lucene path unchanged.
  • StubQueryPlanExecutor — Returns canned data for parquet_logs and parquet_metrics stub tables, enabling full pipeline testing without the analytics engine.

Test plan

  • 12 unit tests for AnalyticsExecutionEngine (type mapping, row conversion, query size limit, null handling, error propagation, explain, PhysicalPlan rejection)
  • 10 unit tests for RestUnifiedQueryAction (index name extraction, routing detection for various PPL patterns)
  • Integration tests (Step 5 — requires test cluster with stub plugin)
  • Existing PPL integration test suite regression check

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 829ca34.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java1highA stub executor that returns hardcoded fake data (canned rows with IP addresses and fixed timestamps) is wired directly into the production plugin (SQLPlugin.java). Any PPL query targeting a 'parquet_*' index will silently return fabricated results instead of real data, completely bypassing the actual query engine. This data falsification is registered unconditionally in the production handler, not gated behind a test or feature flag.
plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java168highRestPPLQueryAction is registered twice in the REST handler list: once with the now-removed no-arg constructor (which no longer exists per RestPPLQueryAction.java diff, making this a compile-time landmine or crash) and once with the new parameterized constructor. The duplicate registration is suspicious—if it compiles, it could double-handle PPL requests; if it does not, it acts as a sabotage of the build/deploy pipeline. Either outcome is consistent with deliberate interference.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java104mediumThe isUnifiedQueryPath routing bypass silently diverts all PPL queries whose source index begins with 'parquet_' away from the normal OpenSearch execution path (including its authorization and access-control checks) into the new analytics pipeline backed by the stub executor. Any user who knows the naming convention can route their query to a path with weaker or absent access controls.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java50mediumThe original no-arg RestPPLQueryAction() constructor is replaced by a constructor requiring ClusterService and NodeClient, but SQLPlugin.java still calls new RestPPLQueryAction() (the removed constructor). If the final field 'unifiedQueryHandler' is uninitialized via a surviving default constructor path, calling the unified routing branch will produce a NullPointerException that disrupts the entire PPL query service—a potential availability impact.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java261lowThe reportError method inlines raw exception class names and messages directly into HTTP 500 responses as unescaped JSON strings. While not intentionally malicious, this can be exploited to leak internal class names, stack context, and configuration details to unauthenticated callers, aiding further reconnaissance.

The table above displays the top 10 most important findings.

Total: 5 | Critical: 0 | High: 2 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@ahkcs ahkcs added PPL Piped processing language enhancement New feature or request labels Mar 23, 2026
@ahkcs ahkcs force-pushed the feature/mustang-analytics-execution-engine branch from 829ca34 to 253da49 Compare March 23, 2026 22:06
@github-actions

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 253da49.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java1mediumA stub executor returning hardcoded rows is present in production code (not a test class). Any query targeting 'parquet_logs' or 'parquet_metrics' will silently return fabricated data including hardcoded IP addresses and fake status codes, bypassing real data access. While labeled as a development stub with TODOs, shipping this in a production plugin path could mislead users about real data.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java87mediumQueries targeting any 'parquet_*' index are silently rerouted to the analytics engine execution path via isUnifiedQueryPath(). If the analytics engine enforces different (or no) authorization/security checks compared to the standard OpenSearch query path, this prefix-based routing could act as an authorization bypass for attacker-controlled index names starting with 'parquet_'.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java52lowThe ClusterService parameter is accepted in the constructor but never stored or used — only NodeClient is kept. Accepting an unused privileged object (ClusterService grants broad cluster-state access) without purpose is an anomaly worth investigating; it may indicate an incomplete or removed feature rather than malicious intent.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java261lowreportError() inlines the raw exception message directly into the JSON response body with only double-quote escaping. Depending on what the underlying exception surfaces (e.g., file paths, internal query plans, stack frames from wrapped exceptions), this can leak internal implementation details to unauthenticated callers.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 2 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

github-actions Bot commented Mar 23, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 3b1ecb8)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns

Sensitive information exposure:
The reportError method in RestUnifiedQueryAction reflects raw exception messages (including stack-trace-level detail from e.getMessage()) directly into HTTP responses. Depending on the exception type, this can leak internal implementation details, file paths, or configuration values to API callers. Exception messages should be sanitized or mapped to user-facing messages before being sent in responses.

📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Core analytics execution engine and QueryPlanExecutor interface

Relevant files:

  • core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java
  • core/src/main/java/org/opensearch/sql/executor/analytics/QueryPlanExecutor.java
  • core/src/test/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngineTest.java

Sub-PR theme: PPL query routing and REST handler for analytics engine path

Relevant files:

  • plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java
  • plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java
  • plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java
  • plugin/src/main/java/org/opensearch/sql/plugin/SQLPlugin.java
  • plugin/src/test/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryActionTest.java
  • integ-test/src/test/java/org/opensearch/sql/ppl/AnalyticsPPLIT.java

⚡ Recommended focus areas for review

XSS / Injection in Error Response

The reportError method manually constructs a JSON error response by string-concatenating e.getClass().getSimpleName() and e.getMessage(). While getMessage() has its double-quotes escaped, other JSON-special characters (newlines, backslashes, control characters) are not escaped. A crafted exception message could break the JSON structure or inject content into the response. Use a proper JSON serializer (e.g., Jackson ObjectMapper) to build the error body.

private static void reportError(RestChannel channel, Exception e) {
  channel.sendResponse(
      new BytesRestResponse(
          org.opensearch.core.rest.RestStatus.INTERNAL_SERVER_ERROR,
          "application/json; charset=UTF-8",
          "{\"error\":{\"type\":\""
              + e.getClass().getSimpleName()
              + "\",\"reason\":\""
              + (e.getMessage() != null ? e.getMessage().replace("\"", "\\\"") : "Unknown error")
              + "\"},\"status\":500}"));
}
Stub in Production Code

StubQueryPlanExecutor is wired directly into RestPPLQueryAction (production plugin bootstrap) via new StubQueryPlanExecutor(). This means every deployed instance will use hardcoded canned data for any parquet_* index, silently returning fake rows instead of an error or real data. This should not be merged to a production branch without a guard (e.g., a feature flag or ensuring the stub is only used in test scope).

if (planStr.contains("parquet_metrics")) {
Fragile Table Detection

extractTableName calls RelOptUtil.toString(plan) and does a string contains check to identify the table. This is fragile: any plan whose string representation happens to contain the substring "parquet_logs" (e.g., a filter literal, a column alias, or a comment) will incorrectly match. For a stub this is low-severity, but it will cause silent wrong-data bugs during pipeline testing.

private String extractTableName(RelNode plan) {
  // Use RelOptUtil.toString to get the full plan tree including child nodes
  String planStr = RelOptUtil.toString(plan);
  if (planStr.contains("parquet_logs")) {
    return "parquet_logs";
  }
  if (planStr.contains("parquet_metrics")) {
    return "parquet_metrics";
  }
  return null;
}
Null context passed to executor

planExecutor.execute(plan, null) always passes null as the execution context. The QueryPlanExecutor interface documents the parameter as "opaque execution context", implying it may carry security credentials, transaction state, or resource limits. Passing null unconditionally may cause a NullPointerException in the real executor when it is wired in, and it discards any context available in CalcitePlanContext. The CalcitePlanContext (or a wrapper) should be passed instead.

Iterable<Object[]> rows = planExecutor.execute(plan, null);

@github-actions

github-actions Bot commented Mar 23, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 3b1ecb8

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Fix incomplete JSON escaping in error responses

The error message is manually escaped only for double quotes, but other special
characters (backslashes, newlines, control characters) can break the JSON structure
or cause injection issues. Use a proper JSON serialization approach instead of
manual string concatenation for the error response body.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [263-273]

 private static void reportError(RestChannel channel, Exception e) {
+  String reason = e.getMessage() != null ? e.getMessage() : "Unknown error";
+  // Escape all JSON special characters
+  String escapedReason = reason
+      .replace("\\", "\\\\")
+      .replace("\"", "\\\"")
+      .replace("\n", "\\n")
+      .replace("\r", "\\r")
+      .replace("\t", "\\t");
   channel.sendResponse(
       new BytesRestResponse(
           org.opensearch.core.rest.RestStatus.INTERNAL_SERVER_ERROR,
           "application/json; charset=UTF-8",
           "{\"error\":{\"type\":\""
               + e.getClass().getSimpleName()
               + "\",\"reason\":\""
-              + (e.getMessage() != null ? e.getMessage().replace("\"", "\\\"") : "Unknown error")
+              + escapedReason
               + "\"},\"status\":500}"));
 }
Suggestion importance[1-10]: 6

__

Why: The manual JSON escaping in reportError only handles double quotes, leaving backslashes, newlines, and other control characters unescaped, which could break the JSON structure. The improved code adds more escape sequences, though a proper JSON library would be the ideal solution.

Low
General
Handle thread pool scheduling failures gracefully

If the SQL_WORKER_THREAD_POOL_NAME thread pool does not exist or is not registered,
the schedule call will throw an exception that is not caught, leaving the REST
channel without a response and potentially hanging the client. The scheduling call
should be wrapped in a try-catch to ensure the channel always receives a response.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [112-119]

 public void execute(String query, QueryType queryType, RestChannel channel, boolean isExplain) {
-  client
-      .threadPool()
-      .schedule(
-          () -> doExecute(query, queryType, channel, isExplain),
-          new TimeValue(0),
-          SQL_WORKER_THREAD_POOL_NAME);
+  try {
+    client
+        .threadPool()
+        .schedule(
+            () -> doExecute(query, queryType, channel, isExplain),
+            new TimeValue(0),
+            SQL_WORKER_THREAD_POOL_NAME);
+  } catch (Exception e) {
+    recordFailureMetric(e);
+    reportError(channel, e);
+  }
 }
Suggestion importance[1-10]: 5

__

Why: If the thread pool scheduling fails, the REST channel would hang without a response. Wrapping the schedule call in a try-catch ensures the client always receives a response, improving robustness.

Low
Pass execution context instead of null to plan executor

The context parameter from CalcitePlanContext is passed as null to
planExecutor.execute(). If the QueryPlanExecutor implementation requires a non-null
context for proper execution (e.g., for security, timeout, or resource management),
this will silently produce incorrect behavior or failures. The execution context
should be derived from CalcitePlanContext and passed through.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [72]

-Iterable<Object[]> rows = planExecutor.execute(plan, null);
+Iterable<Object[]> rows = planExecutor.execute(plan, context);
Suggestion importance[1-10]: 3

__

Why: While passing null as context is intentional per the QueryPlanExecutor interface design (context is "opaque to avoid server dependency"), passing the CalcitePlanContext object directly would violate the interface contract since it's typed as Object. The suggestion may be premature given the current stub design.

Low
Possible issue
Fix substring matching order to avoid false table name matches

The parquet_metrics check will never be reached because parquet_metrics also
contains the substring parquet_ and the string parquet_logs does not match
parquet_metrics, but the order matters if a plan string could contain both. More
critically, a plan involving parquet_metrics would never match parquet_logs, so the
logic is correct in isolation — however, a plan string containing
parquet_logs_metrics or similar would incorrectly match parquet_logs. The check
should use exact word/table-name matching rather than substring containment to avoid
false positives.

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [49-59]

 private String extractTableName(RelNode plan) {
-  // Use RelOptUtil.toString to get the full plan tree including child nodes
   String planStr = RelOptUtil.toString(plan);
+  if (planStr.contains("parquet_metrics")) {
+    return "parquet_metrics";
+  }
   if (planStr.contains("parquet_logs")) {
     return "parquet_logs";
-  }
-  if (planStr.contains("parquet_metrics")) {
-    return "parquet_metrics";
   }
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that checking parquet_metrics before parquet_logs avoids potential false matches (e.g., a table named parquet_logs_metrics). However, since this is a stub implementation meant to be replaced, the impact is limited.

Low

Previous suggestions

Suggestions up to commit 094a096
CategorySuggestion                                                                                                                                    Impact
Security
Fix incomplete JSON escaping in error responses

The error message is manually escaped only for double quotes, but other special
characters (newlines, backslashes, control characters) in the exception message can
break the JSON structure or cause injection issues. Use a proper JSON serialization
library (e.g., Jackson's ObjectMapper or JsonStringEncoder) to safely encode the
error message string.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [263-273]

 private static void reportError(RestChannel channel, Exception e) {
+  String rawMessage = e.getMessage() != null ? e.getMessage() : "Unknown error";
+  String safeMessage = rawMessage
+      .replace("\\", "\\\\")
+      .replace("\"", "\\\"")
+      .replace("\n", "\\n")
+      .replace("\r", "\\r")
+      .replace("\t", "\\t");
   channel.sendResponse(
       new BytesRestResponse(
           org.opensearch.core.rest.RestStatus.INTERNAL_SERVER_ERROR,
           "application/json; charset=UTF-8",
           "{\"error\":{\"type\":\""
               + e.getClass().getSimpleName()
               + "\",\"reason\":\""
-              + (e.getMessage() != null ? e.getMessage().replace("\"", "\\\"") : "Unknown error")
+              + safeMessage
               + "\"},\"status\":500}"));
 }
Suggestion importance[1-10]: 6

__

Why: The manual JSON escaping only handles double quotes but misses other special characters like newlines, backslashes, and control characters that could break the JSON structure. This is a valid security/correctness concern, though using a proper JSON library would be the ideal fix rather than the improved manual escaping shown.

Low
Possible issue
Handle thread pool scheduling failures gracefully

The schedule call returns a Cancellable that is silently discarded. If the thread
pool is shut down or the task is rejected, the error will be swallowed and the REST
channel will never receive a response, leaving the client hanging. The scheduled
task's failure should be handled, or the channel should be closed on rejection.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [112-119]

 public void execute(String query, QueryType queryType, RestChannel channel, boolean isExplain) {
-  client
-      .threadPool()
-      .schedule(
-          () -> doExecute(query, queryType, channel, isExplain),
-          new TimeValue(0),
-          SQL_WORKER_THREAD_POOL_NAME);
+  try {
+    client
+        .threadPool()
+        .schedule(
+            () -> doExecute(query, queryType, channel, isExplain),
+            new TimeValue(0),
+            SQL_WORKER_THREAD_POOL_NAME);
+  } catch (Exception e) {
+    recordFailureMetric(e);
+    reportError(channel, e);
+  }
 }
Suggestion importance[1-10]: 5

__

Why: The discarded Cancellable return value from schedule could leave the REST channel hanging if the thread pool rejects the task. The suggestion to wrap in a try-catch is a reasonable improvement, though the actual risk depends on the thread pool's rejection policy.

Low
Fix substring matching ambiguity in table name extraction

The check for "parquet_logs" will also match "parquet_logs_archive" or any other
table name that contains "parquet_logs" as a substring, and since parquet_logs is
checked before parquet_metrics, a table named parquet_logs_metrics would incorrectly
match parquet_logs. Use more precise matching (e.g., exact word boundary or checking
for the full table name with surrounding non-alphanumeric characters).

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [49-59]

 private String extractTableName(RelNode plan) {
-  // Use RelOptUtil.toString to get the full plan tree including child nodes
   String planStr = RelOptUtil.toString(plan);
-  if (planStr.contains("parquet_logs")) {
+  if (planStr.matches("(?s).*\\bparquet_metrics\\b.*")) {
+    return "parquet_metrics";
+  }
+  if (planStr.matches("(?s).*\\bparquet_logs\\b.*")) {
     return "parquet_logs";
-  }
-  if (planStr.contains("parquet_metrics")) {
-    return "parquet_metrics";
   }
   return null;
 }
Suggestion importance[1-10]: 4

__

Why: The contains check for "parquet_logs" would incorrectly match table names like "parquet_logs_archive", and the ordering matters since parquet_logs is checked before parquet_metrics. However, this is a stub implementation meant to be replaced, reducing the practical impact.

Low
General
Ensure streaming result resources are properly closed

The rows iterable from planExecutor.execute may represent a lazy/streaming result
set (e.g., backed by a cursor or network stream). Calling
plan.getRowType().getFieldList() after execute is safe, but if the iterable holds
resources (like a JDBC ResultSet or file handle), there is no guarantee it will be
closed after iteration in convertRows. Consider wrapping the iterable in a
try-with-resources if it implements Closeable, or documenting this constraint in
QueryPlanExecutor.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [72-75]

 Iterable<Object[]> rows = planExecutor.execute(plan, null);
 
 List<RelDataTypeField> fields = plan.getRowType().getFieldList();
-List<ExprValue> results = convertRows(rows, fields, querySizeLimit);
+List<ExprValue> results;
+if (rows instanceof AutoCloseable) {
+  try (AutoCloseable closeable = (AutoCloseable) rows) {
+    results = convertRows(rows, fields, querySizeLimit);
+  }
+} else {
+  results = convertRows(rows, fields, querySizeLimit);
+}
Suggestion importance[1-10]: 4

__

Why: If planExecutor.execute returns a resource-backed iterable (e.g., a cursor), it may not be closed after iteration, causing resource leaks. The suggested AutoCloseable check is a reasonable defensive approach, though the QueryPlanExecutor interface could also be updated to return a Closeable iterable.

Low
Suggestions up to commit 253da49
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incomplete JSON escaping in error responses

The manual JSON construction in reportError is fragile and incomplete — it only
escapes double quotes but not other special characters (backslash, newlines, control
characters) that can appear in exception messages, potentially producing malformed
JSON. Use a proper JSON serialization approach or at minimum apply full JSON string
escaping to the message.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [263-273]

 private static void reportError(RestChannel channel, Exception e) {
+  String message = e.getMessage() != null ? e.getMessage() : "Unknown error";
+  // Escape all JSON special characters
+  String escapedMessage = message
+      .replace("\\", "\\\\")
+      .replace("\"", "\\\"")
+      .replace("\n", "\\n")
+      .replace("\r", "\\r")
+      .replace("\t", "\\t");
   channel.sendResponse(
       new BytesRestResponse(
           org.opensearch.core.rest.RestStatus.INTERNAL_SERVER_ERROR,
           "application/json; charset=UTF-8",
           "{\"error\":{\"type\":\""
               + e.getClass().getSimpleName()
               + "\",\"reason\":\""
-              + (e.getMessage() != null ? e.getMessage().replace("\"", "\\\"") : "Unknown error")
+              + escapedMessage
               + "\"},\"status\":500}"));
 }
Suggestion importance[1-10]: 6

__

Why: The manual JSON construction only escapes double quotes but not backslashes, newlines, or other control characters that could appear in exception messages, producing malformed JSON. This is a valid correctness concern, though the code is explicitly temporary/stub infrastructure.

Low
General
Pass execution context instead of null to plan executor

Passing null as the execution context to planExecutor.execute is hardcoded and will
prevent proper context propagation when the real analytics engine is integrated. The
CalcitePlanContext is already available in scope and should be passed (or a derived
context object) to allow the executor to access query settings, credentials, or
other runtime information.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [72]

-Iterable<Object[]> rows = planExecutor.execute(plan, null);
+Iterable<Object[]> rows = planExecutor.execute(plan, context);
Suggestion importance[1-10]: 5

__

Why: Passing null as the execution context is intentional for the current stub/development phase, but passing the available context object would be better for future integration with the real analytics engine. The QueryPlanExecutor interface accepts Object context, so passing context directly would work but may not be semantically correct since CalcitePlanContext is not the same as the analytics engine's execution context.

Low
Fix substring collision in table name extraction

Using plan.toString() to extract the table name is unreliable because the string
representation of a RelNode is implementation-dependent and not guaranteed to
contain the raw table name. Additionally, parquet_metrics check will never be
reached since parquet_logs check comes first and parquet_metrics also contains
parquet_logs as a substring — wait, actually it doesn't, but parquet_logs check
would match any string containing that substring. The order is correct here, but the
approach is still fragile. Consider checking parquet_metrics before parquet_logs to
avoid potential future substring collision issues.

plugin/src/main/java/org/opensearch/sql/plugin/rest/StubQueryPlanExecutor.java [48-58]

 private String extractTableName(RelNode plan) {
   // Walk the plan to find the table name from the string representation
   String planStr = plan.toString();
+  // Check more specific names first to avoid substring collisions
+  if (planStr.contains("parquet_metrics")) {
+    return "parquet_metrics";
+  }
   if (planStr.contains("parquet_logs")) {
     return "parquet_logs";
-  }
-  if (planStr.contains("parquet_metrics")) {
-    return "parquet_metrics";
   }
   return null;
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies that checking parquet_metrics before parquet_logs avoids potential future substring collision issues, but since parquet_metrics does not contain parquet_logs as a substring, there is no actual bug in the current code. This is a minor defensive improvement on stub/temporary code.

Low
Prevent bare prefix from matching unified query path

The SOURCE_PATTERN regex allows wildcard characters () in index names (e.g.,
parquet_
), but tableName.startsWith("parquet_") would correctly match parquet_*.
However, a query like source = parquet_ (with nothing after the underscore) would
also match, which may not be intended. Consider adding a minimum length check or
requiring at least one character after the prefix.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [76-88]

 public static boolean isUnifiedQueryPath(String query) {
   if (query == null) {
     return false;
   }
   String indexName = extractIndexName(query);
   if (indexName == null) {
     return false;
   }
   // Handle qualified names like "catalog.parquet_logs" — check the last segment
   int lastDot = indexName.lastIndexOf('.');
   String tableName = lastDot >= 0 ? indexName.substring(lastDot + 1) : indexName;
-  return tableName.startsWith("parquet_");
+  // Require at least one character after "parquet_" prefix
+  return tableName.startsWith("parquet_") && tableName.length() > "parquet_".length();
 }
Suggestion importance[1-10]: 2

__

Why: The edge case of source = parquet_ (bare prefix with nothing after the underscore) is an extremely unlikely real-world scenario, and the suggestion adds a minor defensive check with minimal practical impact on the routing logic.

Low

@ahkcs ahkcs force-pushed the feature/mustang-analytics-execution-engine branch from 253da49 to 094a096 Compare March 23, 2026 23:25
@github-actions

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 094a096.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java52mediumStubQueryPlanExecutor is wired into production code (not test code), meaning all queries to 'parquet_*' indices bypass the real execution engine and return hardcoded data regardless of the actual query. While documented as temporary, placing a stub in src/main/java and instantiating it unconditionally in the production constructor is anomalous and could be used to return attacker-controlled or misleading results.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java95mediumQueries matching 'parquet_*' index names are silently rerouted to RestUnifiedQueryAction, bypassing the normal nodeClient.execute() path (which goes through the OpenSearch transport layer with its associated authentication and authorization checks). This could allow a user who names an index with the 'parquet_' prefix to bypass security controls enforced in the standard PPL execution pipeline.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestPPLQueryAction.java49lowThe ClusterService parameter is accepted in the constructor but never stored or used. This dead parameter may indicate incomplete implementation or an intentional omission of cluster-level security checks that would normally use ClusterService.
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java261lowException messages are directly interpolated into the JSON error response body without sanitization (only quotes are escaped). This can leak internal implementation details, stack information, or sensitive data (e.g., file paths, query internals) to clients.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 2 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 094a096

…ices (opensearch-project#5247)

Implements the execution engine adapter and query routing infrastructure
for Project Mustang's unified query pipeline (Option B). PPL queries
targeting parquet_ prefixed indices are routed through UnifiedQueryPlanner
and AnalyticsExecutionEngine instead of the existing Lucene path.

Key changes:
- QueryPlanExecutor interface: contract for analytics engine execution
- AnalyticsExecutionEngine: converts QueryPlanExecutor results to QueryResponse
- RestUnifiedQueryAction: orchestrates schema building, planning, execution
- RestPPLQueryAction: routing branch for parquet_ indices
- StubQueryPlanExecutor: canned data for development/testing

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the feature/mustang-analytics-execution-engine branch from 094a096 to 3b1ecb8 Compare March 23, 2026 23:29
@ahkcs

ahkcs commented Mar 23, 2026

Copy link
Copy Markdown
Collaborator Author

Retargeting to feature branch feature/mustang-ppl-integration

@ahkcs ahkcs closed this Mar 23, 2026
@github-actions

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 3b1ecb8.

PathLineSeverityDescription
plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java262lowError response is constructed via string concatenation using e.getMessage() with only quote-escaping. Other JSON metacharacters are not escaped, allowing a crafted exception message to produce malformed or injected JSON in the error response body. This appears to be a coding shortcut rather than malicious intent, but warrants a fix using a proper JSON serializer.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 3b1ecb8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant