Skip to content

Added coercion rules and placeholder UDF to handle VARBINARY#5443

Merged
penghuo merged 4 commits into
opensearch-project:mainfrom
vinaykpud:feature/binary_field
May 15, 2026
Merged

Added coercion rules and placeholder UDF to handle VARBINARY#5443
penghuo merged 4 commits into
opensearch-project:mainfrom
vinaykpud:feature/binary_field

Conversation

@vinaykpud

@vinaykpud vinaykpud commented May 14, 2026

Copy link
Copy Markdown
Contributor

Description

Adds coercion support so PPL queries can compare a VARBINARY column with a VARCHAR literal.

eg: source=logs | where client_ip = '192.168.1.1'

For ip and binary fields types OpenSearchSchemaBuilder.java‎ maps to SqlTypeName.VARBINARY.

The change introduces a BINARY(varchar) placeholder UDF and threads VARBINARY through the type/coercion pipeline so that:

  1. A VARCHAR literal compared to a VARBINARY column is recognized as coercible to BINARY.
  2. CoercionUtils.cast triggers ExtendedRexBuilder.makeCast, which emits BINARY(varchar) instead of a generic Calcite CAST. the on-disk encoding of ip / binary fields.
  3. The placeholder BINARY(varchar) RexCall is intended to be rewritten by a backend adapter (OpenSearch repo, separate PR) into a VARBINARY RexLiteral whose bytes match the on-disk format.

@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 0c4b263)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The leastRestrictiveVarbinaryVarchar method returns null when no VARBINARY is present, but does not check if the list is empty. If called with an empty list, the loop never sets hasVarbinary, and the method returns null at line 442 without verifying that types is non-empty. This could cause unexpected null returns when the list is empty, though the caller (leastRestrictive) may handle it.

private @Nullable RelDataType leastRestrictiveVarbinaryVarchar(List<RelDataType> types) {
  boolean hasVarbinary = false;
  boolean anyNullable = false;
  for (RelDataType t : types) {
    SqlTypeName name = t.getSqlTypeName();
    if (name == SqlTypeName.VARBINARY) {
      hasVarbinary = true;
    } else if (name != SqlTypeName.VARCHAR && name != SqlTypeName.CHAR) {
      return null;
    }
    anyNullable |= t.isNullable();
  }
  if (!hasVarbinary) {
    return null;
  }
  return createTypeWithNullability(createSqlType(SqlTypeName.VARBINARY), anyNullable);
}
Possible Issue

The parseCidrToIpv6Range method throws SemanticCheckException if cidr is null, but does not handle the case where IPUtils.toRange(cidr) returns null or throws an exception for invalid CIDR strings. If toRange fails or returns null, calling .getLower() or .getUpper() on line 1633-1634 will throw NullPointerException. The method should validate the range is non-null before dereferencing it.

private static byte[][] parseCidrToIpv6Range(String cidr) {
  if (cidr == null) {
    throw new SemanticCheckException("cidrmatch range argument is null");
  }
  IPAddress range = IPUtils.toRange(cidr);
  byte[] low = range.getLower().toIPv6().getBytes();
  byte[] high = range.getUpper().toIPv6().getBytes();
  return new byte[][] {low, high};
}

@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 0c4b263

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle invalid CIDR format exceptions

Add exception handling for invalid CIDR strings. The IPUtils.toRange(cidr) call may
throw exceptions for malformed CIDR input, which should be caught and wrapped in a
SemanticCheckException with a descriptive message to prevent unexpected runtime
failures during query planning.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1628-1636]

 private static byte[][] parseCidrToIpv6Range(String cidr) {
   if (cidr == null) {
     throw new SemanticCheckException("cidrmatch range argument is null");
   }
-  IPAddress range = IPUtils.toRange(cidr);
-  byte[] low = range.getLower().toIPv6().getBytes();
-  byte[] high = range.getUpper().toIPv6().getBytes();
-  return new byte[][] {low, high};
+  try {
+    IPAddress range = IPUtils.toRange(cidr);
+    byte[] low = range.getLower().toIPv6().getBytes();
+    byte[] high = range.getUpper().toIPv6().getBytes();
+    return new byte[][] {low, high};
+  } catch (Exception e) {
+    throw new SemanticCheckException("Invalid CIDR format: " + cidr, e);
+  }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that IPUtils.toRange(cidr) may throw exceptions for malformed CIDR input. Adding exception handling improves robustness by catching parsing errors and wrapping them in a SemanticCheckException with a descriptive message. However, this is a defensive improvement rather than fixing a critical bug.

Medium
Validate CIDR literal is non-null

Validate that lit.getValueAs(String.class) returns a non-null value before passing
it to parseCidrToIpv6Range. If the literal value is null, the method will throw an
exception, but it's safer to check explicitly and provide a clearer error message at
the call site.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [919-936]

 (FunctionImp2)
     (builder, col, cidr) -> {
       if (cidr instanceof RexLiteral lit
           && col.getType().getSqlTypeName() == SqlTypeName.VARBINARY) {
-        byte[][] range = parseCidrToIpv6Range(lit.getValueAs(String.class));
+        String cidrStr = lit.getValueAs(String.class);
+        if (cidrStr == null) {
+          throw new SemanticCheckException("cidrmatch literal cannot be null");
+        }
+        byte[][] range = parseCidrToIpv6Range(cidrStr);
         ...
       }
       return builder.makeCall(PPLBuiltinOperators.CIDRMATCH, col, cidr);
     }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds an explicit null check for lit.getValueAs(String.class) before passing it to parseCidrToIpv6Range. While parseCidrToIpv6Range already handles null input, checking at the call site provides a clearer error context. This is a minor defensive improvement that enhances code clarity.

Low

Previous suggestions

Suggestions up to commit d3157be
CategorySuggestion                                                                                                                                    Impact
General
Handle CIDR parsing exceptions gracefully

Wrap the parseCidrToIpv6Range call in a try-catch block to handle potential parsing
exceptions gracefully. Invalid CIDR strings could cause runtime exceptions that
should be caught and converted to semantic errors at plan time.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [922-936]

 if (cidr instanceof RexLiteral lit
     && col.getType().getSqlTypeName() == SqlTypeName.VARBINARY) {
-  byte[][] range = parseCidrToIpv6Range(lit.getValueAs(String.class));
-  ...
+  try {
+    byte[][] range = parseCidrToIpv6Range(lit.getValueAs(String.class));
+    ...
+  } catch (Exception e) {
+    throw new SemanticCheckException("Invalid CIDR literal: " + lit.getValueAs(String.class), e);
+  }
 }
Suggestion importance[1-10]: 7

__

Why: Adding exception handling around parseCidrToIpv6Range improves robustness by catching potential parsing errors and converting them to semantic errors at plan time. This prevents unexpected runtime failures and provides clearer error messages to users.

Medium
Possible issue
Add null-safety checks for IP range

Add null-safety checks for the IPAddress objects returned by toRange() and their
getLower()/getUpper() methods. If IPUtils.toRange(cidr) or the subsequent method
calls return null, the code will throw a NullPointerException instead of a
meaningful error.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1628-1636]

 private static byte[][] parseCidrToIpv6Range(String cidr) {
   if (cidr == null) {
     throw new SemanticCheckException("cidrmatch range argument is null");
   }
   IPAddress range = IPUtils.toRange(cidr);
+  if (range == null || range.getLower() == null || range.getUpper() == null) {
+    throw new SemanticCheckException("Invalid CIDR range: " + cidr);
+  }
   byte[] low = range.getLower().toIPv6().getBytes();
   byte[] high = range.getUpper().toIPv6().getBytes();
   return new byte[][] {low, high};
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion adds defensive null checks for IPAddress objects. However, if IPUtils.toRange() is designed to never return null and always throw exceptions for invalid input, these checks may be unnecessary. The score reflects uncertainty about the actual behavior of the utility method.

Low
Suggestions up to commit 43b460e
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null-safety checks for IP conversion

Add null-safety checks for the IPAddress objects returned by toRange() and the
subsequent getLower()/getUpper() calls. If IPUtils.toRange() or the conversion
methods return null, the code will throw a NullPointerException instead of a
meaningful error.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1609-1617]

 private static byte[][] parseCidrToIpv6Range(String cidr) {
   if (cidr == null) {
     throw new SemanticCheckException("cidrmatch range argument is null");
   }
   IPAddress range = IPUtils.toRange(cidr);
+  if (range == null || range.getLower() == null || range.getUpper() == null) {
+    throw new SemanticCheckException("Invalid CIDR notation: " + cidr);
+  }
   byte[] low = range.getLower().toIPv6().getBytes();
   byte[] high = range.getUpper().toIPv6().getBytes();
   return new byte[][] {low, high};
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential NullPointerException if IPUtils.toRange() or the conversion methods return null. Adding null checks improves robustness, though the likelihood depends on the IPUtils implementation. This is a valid defensive programming practice for error handling.

Medium
Add null check for SqlTypeName

Add a null check for t.getSqlTypeName() to prevent potential NullPointerException
when iterating over the types list. If a RelDataType in the list has a null
SqlTypeName, the code will fail unexpectedly.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [427-443]

 private @Nullable RelDataType leastRestrictiveVarbinaryVarchar(List<RelDataType> types) {
   boolean hasVarbinary = false;
   boolean anyNullable = false;
   for (RelDataType t : types) {
     SqlTypeName name = t.getSqlTypeName();
+    if (name == null) {
+      return null;
+    }
     if (name == SqlTypeName.VARBINARY) {
       hasVarbinary = true;
     } else if (name != SqlTypeName.VARCHAR && name != SqlTypeName.CHAR) {
       return null;
     }
     anyNullable |= t.isNullable();
   }
   if (!hasVarbinary) {
     return null;
   }
   return createTypeWithNullability(createSqlType(SqlTypeName.VARBINARY), anyNullable);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds a null check for t.getSqlTypeName() to prevent potential NullPointerException. While this is defensive programming, RelDataType.getSqlTypeName() typically should not return null in well-formed Calcite types. The check adds safety but may be unnecessary depending on the framework guarantees.

Low
Suggestions up to commit 43b460e
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null-safety checks for IP range

Add null-safety checks for the IPAddress objects returned by toRange() and their
getLower()/getUpper() methods. If IPUtils.toRange() or the subsequent calls return
null, the code will throw a NullPointerException instead of a meaningful error.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1609-1617]

 private static byte[][] parseCidrToIpv6Range(String cidr) {
   if (cidr == null) {
     throw new SemanticCheckException("cidrmatch range argument is null");
   }
   IPAddress range = IPUtils.toRange(cidr);
+  if (range == null || range.getLower() == null || range.getUpper() == null) {
+    throw new SemanticCheckException("Invalid CIDR range: " + cidr);
+  }
   byte[] low = range.getLower().toIPv6().getBytes();
   byte[] high = range.getUpper().toIPv6().getBytes();
   return new byte[][] {low, high};
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that IPUtils.toRange() and its subsequent method calls could potentially return null, which would cause a NullPointerException. However, this depends on the implementation of IPUtils.toRange() which is not visible in the PR. The check adds defensive programming but may be redundant if toRange() already throws exceptions for invalid input.

Low
Handle null SqlTypeName safely

Add a null check for t.getSqlTypeName() to prevent potential NullPointerException
when processing the type list. If any RelDataType in the list has a null
SqlTypeName, the method should handle it gracefully.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [427-443]

 private @Nullable RelDataType leastRestrictiveVarbinaryVarchar(List<RelDataType> types) {
   boolean hasVarbinary = false;
   boolean anyNullable = false;
   for (RelDataType t : types) {
     SqlTypeName name = t.getSqlTypeName();
+    if (name == null) {
+      return null;
+    }
     if (name == SqlTypeName.VARBINARY) {
       hasVarbinary = true;
     } else if (name != SqlTypeName.VARCHAR && name != SqlTypeName.CHAR) {
       return null;
     }
     anyNullable |= t.isNullable();
   }
   if (!hasVarbinary) {
     return null;
   }
   return createTypeWithNullability(createSqlType(SqlTypeName.VARBINARY), anyNullable);
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion adds a null check for getSqlTypeName(), which is a defensive programming practice. However, in the Calcite framework, RelDataType.getSqlTypeName() typically does not return null for standard types, making this check potentially unnecessary. The suggestion is valid but addresses a low-probability edge case.

Low
Suggestions up to commit da004e3
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle CIDR parsing exceptions gracefully

Wrap the parseCidrToIpv6Range() call in a try-catch block to handle potential
parsing exceptions gracefully. Invalid CIDR strings could cause runtime exceptions
that should be caught and converted to semantic errors during query planning.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [921-942]

 register(
     CIDRMATCH,
     (FunctionImp2)
         (builder, col, cidr) -> {
           if (cidr instanceof RexLiteral lit
               && col.getType().getSqlTypeName() == SqlTypeName.VARBINARY) {
-            byte[][] range = parseCidrToIpv6Range(lit.getValueAs(String.class));
-            ...
+            try {
+              byte[][] range = parseCidrToIpv6Range(lit.getValueAs(String.class));
+              ...
+            } catch (Exception e) {
+              throw new SemanticCheckException("Invalid CIDR notation in cidrmatch: " + e.getMessage(), e);
+            }
           }
           return builder.makeCall(PPLBuiltinOperators.CIDRMATCH, col, cidr);
         },
     PPLTypeChecker.family(SqlTypeFamily.BINARY, SqlTypeFamily.STRING));
Suggestion importance[1-10]: 8

__

Why: This suggestion addresses a critical issue where invalid CIDR strings could cause uncaught exceptions during query planning. Wrapping parseCidrToIpv6Range() in a try-catch block and converting exceptions to SemanticCheckException provides better error handling and user feedback.

Medium
Add null-safety checks for IP range

Add null-safety checks for the IPAddress objects returned by toRange() and the
subsequent getLower()/getUpper() calls. If IPUtils.toRange() returns null or the
bounds are null, the code will throw a NullPointerException instead of a meaningful
error.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1614-1622]

 private static byte[][] parseCidrToIpv6Range(String cidr) {
   if (cidr == null) {
     throw new SemanticCheckException("cidrmatch range argument is null");
   }
   IPAddress range = IPUtils.toRange(cidr);
+  if (range == null || range.getLower() == null || range.getUpper() == null) {
+    throw new SemanticCheckException("Invalid CIDR notation: " + cidr);
+  }
   byte[] low = range.getLower().toIPv6().getBytes();
   byte[] high = range.getUpper().toIPv6().getBytes();
   return new byte[][] {low, high};
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential NullPointerException if IPUtils.toRange() or its methods return null. Adding null checks improves robustness, though the likelihood depends on IPUtils.toRange() implementation details not visible in the PR.

Medium
Suggestions up to commit e135281
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate operand list before access

The PassThroughImplementor directly returns the first operand without validation. If
translatedOperands is empty, this will throw an IndexOutOfBoundsException. Add a
check to ensure the list is not empty before accessing the first element.

core/src/main/java/org/opensearch/sql/expression/function/udf/conversion/BinaryFunction.java [48-54]

 public static class PassThroughImplementor implements NotNullImplementor {
   @Override
   public Expression implement(
       RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
+    if (translatedOperands.isEmpty()) {
+      throw new IllegalArgumentException("Expected at least one operand for BINARY function");
+    }
     return translatedOperands.get(0);
   }
 }
Suggestion importance[1-10]: 3

__

Why: While adding validation is generally good practice, the NullPolicy.STRICT in the constructor and OperandTypes.CHARACTER checker already ensure exactly one operand is present before this implementor is called. The suggestion addresses a theoretical issue that cannot occur in practice given the existing constraints.

Low

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e135281

@penghuo penghuo added the PPL Piped processing language label May 14, 2026

@penghuo penghuo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Please failed Fix IT/UT.

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit da004e3

@dai-chen dai-chen added the enhancement New feature or request label May 15, 2026

@dai-chen dai-chen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI is failing due to spotless check. Please fix it. Thanks!

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 43b460e

ahkcs
ahkcs previously approved these changes May 15, 2026
@vinaykpud vinaykpud closed this May 15, 2026
@vinaykpud vinaykpud reopened this May 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 43b460e

vinaykpud added 3 commits May 15, 2026 17:33
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@vinaykpud vinaykpud force-pushed the feature/binary_field branch from 43b460e to d3157be Compare May 15, 2026 17:34
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d3157be

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0c4b263

@penghuo penghuo merged commit f78df64 into opensearch-project:main May 15, 2026
39 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants