Skip to content

Fix/ai/datetime clusters#5520

Closed
vinaykpud wants to merge 2 commits into
opensearch-project:mainfrom
vinaykpud:fix/ai/datetime-clusters
Closed

Fix/ai/datetime clusters#5520
vinaykpud wants to merge 2 commits into
opensearch-project:mainfrom
vinaykpud:fix/ai/datetime-clusters

Conversation

@vinaykpud

Copy link
Copy Markdown
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Companion to OpenSearch fix/ai/datetime-clusters @ 9009736c2dc.
list(<TIME-typed field>) now returns "HH:mm:ss[.fraction]" without the
1970-01-01 epoch-date prefix.

The analytics-engine path rewrites PPL list() to DataFusion's list_merge,
so the legacy ListAggFunction never fires. Instead, AnalyticsExecutionEngine
now post-processes List-typed cells in the result conversion: when an
element string matches "1970-01-01[ T]HH:mm:ss[.fraction]", only the time
portion is kept. Scalar cells are untouched, preserving the wider
timestamp-stringification regression baseline.

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 2405204)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The regex DATE_WITH_MIDNIGHT_TIME matches timestamps with any time component (not just midnight). For example, "2024-01-15 14:30:45" would match and be truncated to "2024-01-15", potentially losing non-midnight time data if the DateOnlyType marker is incorrectly applied to a full timestamp column.

private static final Pattern DATE_WITH_MIDNIGHT_TIME =
    Pattern.compile("^(\\d{4}-\\d{2}-\\d{2})[ T]\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?$");

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 2405204

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard list processing by type check

The method stripEpochDatePrefixInList is called unconditionally for all List<?> values
in toExprValue, but it should only process lists when the column type is
TimeOnlyType. This causes unnecessary regex matching on all list elements regardless
of their semantic type, potentially altering non-time data that happens to match the
pattern.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [266-268]

-private static List<Object> stripEpochDatePrefixInList(List<?> list) {
-  List<Object> out = new ArrayList<>(list.size());
-  for (Object element : list) {
-    if (element instanceof String s) {
-      var m = EPOCH_DATE_TIME_PREFIX.matcher(s);
-      out.add(m.matches() ? m.group(1) : s);
-    } else {
-      out.add(element);
+private static ExprValue toExprValue(Object value, RelDataType type) {
+  if (value instanceof byte[] bytes) {
+    if (type instanceof IpType) {
+      try {
+        return ExprValueUtils.stringValue(
+            InetAddresses.toAddrString(InetAddress.getByAddress(bytes)));
+      } catch (UnknownHostException e) {
+        throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e);
+      }
+    } else if (type instanceof BinaryType) {
+      return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes));
     }
   }
-  return out;
+  if (type instanceof DateOnlyType && value instanceof String s) {
+    var m = DATE_WITH_MIDNIGHT_TIME.matcher(s);
+    if (m.matches()) {
+      return ExprValueUtils.stringValue(m.group(1));
+    }
+  }
+  if (type instanceof TimeOnlyType && value instanceof String s) {
+    var m = EPOCH_DATE_TIME_PREFIX.matcher(s);
+    if (m.matches()) {
+      return ExprValueUtils.stringValue(m.group(1));
+    }
+  }
+  if (type instanceof TimeOnlyType && value instanceof List<?> list) {
+    return ExprValueUtils.collectionValue(stripEpochDatePrefixInList(list));
+  }
+  if (value instanceof List<?> list) {
+    return ExprValueUtils.collectionValue(new ArrayList<>(list));
+  }
+  return ExprValueUtils.fromObjectValue(value);
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that stripEpochDatePrefixInList is called unconditionally for all lists, which could incorrectly transform non-time data matching the epoch pattern. The fix properly guards the transformation to only apply when type instanceof TimeOnlyType, preventing unintended data alterations.

Medium

Previous suggestions

Suggestions up to commit a5eb156
CategorySuggestion                                                                                                                                    Impact
General
Handle nested lists recursively

The method doesn't handle nested lists. If element is itself a List, it should
recursively call stripEpochDatePrefixInList to process nested time values. This
ensures consistent handling of multi-dimensional list structures.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [276-287]

 private static List<Object> stripEpochDatePrefixInList(List<?> list) {
   List<Object> out = new ArrayList<>(list.size());
   for (Object element : list) {
     if (element instanceof String s) {
       var m = EPOCH_DATE_TIME_PREFIX.matcher(s);
       out.add(m.matches() ? m.group(1) : s);
+    } else if (element instanceof List<?> nestedList) {
+      out.add(stripEpochDatePrefixInList(nestedList));
     } else {
       out.add(element);
     }
   }
   return out;
 }
Suggestion importance[1-10]: 6

__

Why: This is a valid enhancement that would handle nested list structures containing time values. However, the PR context suggests this handles list-aggregation elements from DataFusion, which may not produce nested lists in practice. The suggestion improves robustness but may address an edge case not currently encountered.

Low
Handle non-matching DateOnlyType format explicitly

After the pattern match fails, the code falls through to fromObjectValue(value)
which may not handle the malformed date string appropriately. Consider adding an
else clause to handle non-matching DateOnlyType strings explicitly or log a warning
about unexpected format.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [252-257]

 if (type instanceof DateOnlyType && value instanceof String s) {
   var m = DATE_WITH_MIDNIGHT_TIME.matcher(s);
   if (m.matches()) {
     return ExprValueUtils.stringValue(m.group(1));
   }
+  // Fall through to fromObjectValue for non-matching formats
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion to add a comment for fall-through behavior is marginally helpful for code clarity, but the existing code already handles non-matching formats correctly by falling through to fromObjectValue(value). The improvement is minimal and doesn't change functionality.

Low
Suggestions up to commit 457fb21
CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove duplicate stack management operations

The stack.push(rel) and stack.pop() operations duplicate the stack management
already performed by the parent visit(RelNode) method in RelHomogeneousShuttle. This
double-push creates an incorrect stack state that could cause issues in nested
traversals or when the shuttle's stack is inspected by other methods.

api/src/main/java/org/opensearch/sql/api/spec/datetime/DatetimeUdtNormalizeRule.java [64-75]

 private List<RelNode> recurseChildren(RelNode rel) {
   java.util.ArrayList<RelNode> out = new java.util.ArrayList<>(rel.getInputs().size());
-  stack.push(rel);
-  try {
-    for (RelNode input : rel.getInputs()) {
-      out.add(input.accept(this));
-    }
-  } finally {
-    stack.pop();
+  for (RelNode input : rel.getInputs()) {
+    out.add(input.accept(this));
   }
   return out;
 }
Suggestion importance[1-10]: 9

__

Why: The stack.push(rel) and stack.pop() operations in recurseChildren duplicate stack management already performed by the parent class's visit method. This creates incorrect stack state that could cause serious issues in nested traversals, as the stack would contain duplicate entries.

High
General
Handle non-matching date format strings

When the DATE_WITH_MIDNIGHT_TIME pattern doesn't match a DateOnlyType string value,
the method falls through to fromObjectValue(value) without returning. This could
lead to incorrect type conversion or unexpected behavior for date values that don't
match the expected format.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [252-258]

 if (type instanceof DateOnlyType && value instanceof String s) {
   var m = DATE_WITH_MIDNIGHT_TIME.matcher(s);
   if (m.matches()) {
     return ExprValueUtils.stringValue(m.group(1));
   }
+  return ExprValueUtils.stringValue(s);
 }
Suggestion importance[1-10]: 7

__

Why: When DateOnlyType values don't match the expected pattern, falling through to fromObjectValue may cause inconsistent behavior. Explicitly returning the string value ensures consistent handling of all DateOnlyType values, though the current fallthrough may be intentional for error cases.

Medium
Handle non-matching time format strings

Similar to the DateOnlyType case, when the EPOCH_DATE_TIME_PREFIX pattern doesn't
match a TimeOnlyType string value, the method falls through without returning. This
could result in incorrect handling of time values that don't match the expected
epoch-prefixed format.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [259-265]

 if (type instanceof TimeOnlyType && value instanceof String s) {
   var m = EPOCH_DATE_TIME_PREFIX.matcher(s);
   if (m.matches()) {
     return ExprValueUtils.stringValue(m.group(1));
   }
+  return ExprValueUtils.stringValue(s);
 }
Suggestion importance[1-10]: 7

__

Why: Similar to the date case, when TimeOnlyType values don't match the epoch prefix pattern, falling through to fromObjectValue may cause inconsistent behavior. Explicitly returning the string value ensures consistent handling, though the current fallthrough may be intentional for error cases.

Medium
Suggestions up to commit a5eb156
CategorySuggestion                                                                                                                                    Impact
General
Use getMessage instead of getLocalizedMessage

Using getLocalizedMessage() may return null or locale-specific text that doesn't
match the English pattern. Use getMessage() instead to ensure consistent pattern
matching across different locales and avoid potential NullPointerException in the
matcher.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/error/ErrorMessage.java [47-50]

-String message = exception.getLocalizedMessage();
+String message = exception.getMessage();
 if (message != null && UNSUPPORTED_FORMAT_HINT.matcher(message).find()) {
   return "ExpressionEvaluationException";
 }
 return exception.getClass().getSimpleName();
Suggestion importance[1-10]: 7

__

Why: Using getMessage() instead of getLocalizedMessage() ensures consistent pattern matching across locales and avoids potential locale-specific variations that could break the regex match. This is a valid improvement for reliability.

Medium
Handle non-matching date pattern case

The code doesn't handle the case where DateOnlyType value doesn't match the expected
pattern. This could lead to incorrect data being passed through without
transformation. Add an else clause to handle non-matching date strings or log a
warning for unexpected formats.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [252-257]

 if (type instanceof DateOnlyType && value instanceof String s) {
   var m = DATE_WITH_MIDNIGHT_TIME.matcher(s);
   if (m.matches()) {
     return ExprValueUtils.stringValue(m.group(1));
   }
+  // If pattern doesn't match, value may already be in correct format or invalid
 }
Suggestion importance[1-10]: 3

__

Why: While adding a comment for non-matching cases could improve code clarity, the current implementation correctly falls through to ExprValueUtils.fromObjectValue(value) at line 269, which handles non-matching values appropriately. The suggestion adds minimal value.

Low
Suggestions up to commit 5250de3
CategorySuggestion                                                                                                                                    Impact
General
Use non-localized exception message

Using getLocalizedMessage() may return localized text that doesn't match the English
pattern. Use getMessage() instead to ensure consistent pattern matching across
different locales, preventing the type remapping logic from failing in non-English
environments.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/error/ErrorMessage.java [47-50]

-String message = exception.getLocalizedMessage();
+String message = exception.getMessage();
 if (message != null && UNSUPPORTED_FORMAT_HINT.matcher(message).find()) {
   return "ExpressionEvaluationException";
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern about locale-dependent behavior. Using getMessage() instead of getLocalizedMessage() ensures consistent pattern matching across different locales, which is important for the type remapping logic to work reliably. This addresses a potential bug in internationalized environments.

Medium
Handle nested list structures

The recursive processing only handles top-level lists. If nested lists contain
epoch-prefixed strings, they won't be processed. Consider implementing recursive
processing to handle nested list structures that may contain epoch-prefixed time
strings at any depth.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [245-247]

 if (value instanceof List<?> list) {
-  return ExprValueUtils.collectionValue(stripEpochDatePrefixInList(list));
+  return ExprValueUtils.collectionValue(stripEpochDatePrefixInList(list, type));
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that nested lists aren't handled, which could be a limitation. However, the improved_code doesn't actually implement the recursive processing mentioned in the suggestion_content, and adds a type parameter that isn't used or explained. The issue is valid but the solution is incomplete.

Low
Validate time component ranges

The regex pattern should be compiled with Pattern.DOTALL or use [\s\S] instead of
. to handle potential newlines in the fractional seconds part. Additionally,
consider making the pattern more restrictive by validating hour/minute/second ranges
(00-23, 00-59, 00-59) to avoid matching invalid time values.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [54-55]

 private static final Pattern EPOCH_DATE_TIME_PREFIX =
-    Pattern.compile("^1970-01-01[ T](\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?)$");
+    Pattern.compile("^1970-01-01[ T]([01]\\d|2[0-3]):[0-5]\\d:[0-5]\\d(?:\\.\\d+)?$");
Suggestion importance[1-10]: 4

__

Why: The suggestion to validate hour/minute/second ranges is reasonable for robustness, but the current pattern already handles the expected input format from the analytics engine. The mention of Pattern.DOTALL is incorrect since . is not used in the pattern. The improvement is minor and may not be necessary given the controlled input source.

Low

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit a5eb156.

PathLineSeverityDescription
opensearch/src/main/java/org/opensearch/sql/opensearch/response/error/ErrorMessage.java44lowException type remapping based on message content: when an exception's message matches the UNSUPPORTED_FORMAT_HINT pattern, the reported exception class name is replaced with 'ExpressionEvaluationException' regardless of the actual exception type. While the stated purpose is backward compatibility, altering error type classification based on message content could confuse security monitoring or alerting systems that use exception type names to detect anomalous conditions. The pattern is specific and behavior appears intentional for legacy compatibility, but warrants review.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a5eb156

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 457fb21

@vinaykpud vinaykpud force-pushed the fix/ai/datetime-clusters branch from 457fb21 to a5eb156 Compare June 7, 2026 19:42
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a5eb156

Recognize the new sandbox DateOnlyType / TimeOnlyType UDT markers in:
- OpenSearchTypeFactory.convertAnalyticsEngineRelDataTypeToExprType:
  DateOnlyType → ExprCoreType.DATE, TimeOnlyType → ExprCoreType.TIME so the
  user-visible response schema labels span() bucket columns as `date` /
  `time` instead of `timestamp`.
- AnalyticsExecutionEngine.toExprValue: when the column carries a
  DateOnlyType marker, strip the trailing ` HH:MM:SS` from the
  Timestamp(ms)-formatted wire value so dates render as `YYYY-MM-DD`.
  Symmetric handling for TimeOnlyType strips the `1970-01-01 ` prefix.

Pairs with the sandbox schema-builder change in
opensearch-project/OpenSearch@b69c5ff8888.
@vinaykpud vinaykpud force-pushed the fix/ai/datetime-clusters branch from a5eb156 to 2405204 Compare June 7, 2026 20:16
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2405204

@vinaykpud vinaykpud closed this Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant