Skip to content

Fix #5176: Return actual null from JSON_EXTRACT for missing/null paths#5196

Merged
LantaoJin merged 4 commits into
opensearch-project:mainfrom
opensearchpplteam:fix/issue-5176
Mar 5, 2026
Merged

Fix #5176: Return actual null from JSON_EXTRACT for missing/null paths#5196
LantaoJin merged 4 commits into
opensearch-project:mainfrom
opensearchpplteam:fix/issue-5176

Conversation

@opensearchpplteam

Copy link
Copy Markdown
Contributor

Description

Root cause: JsonExtractFunctionImpl.doJsonize() explicitly returned the string "null" when the candidate object was null. When JSON_EXTRACT encounters a missing path or an explicit JSON null value, both Calcite's jsonQuery() and jsonValue() return Java null, and doJsonize(null) converted this to string "null" instead of propagating actual null.

Fix: Changed doJsonize() to return null (Java null) when candidate is null. The function's return type is already declared as STRING_FORCE_NULLABLE (nullable VARCHAR), so Calcite's expression framework correctly handles null returns. This is consistent with other JSON UDFs in the codebase (e.g., JsonExtractAllFunctionImpl).

Testing:

  • Unit tests: Added tests for single-path null (missing path, explicit JSON null), multi-path with missing paths, and multi-path all missing
  • Integration tests: Added end-to-end tests in CalcitePPLJsonBuiltinFunctionIT verifying null-return behavior through the REST API for both missing paths and explicit JSON null values, plus multi-path extraction with missing paths

Related Issues

#5176

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… missing/null paths

The doJsonize() method in JsonExtractFunctionImpl was explicitly converting
null values to the string "null". When JSON_EXTRACT encounters a missing
path or an explicit JSON null value, it should return actual null instead
of the string "null".

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
…arch-project#5176)

Address review feedback:
- Add integration test for null-return behavior (missing path and explicit null)
- Add integration test for multi-path extraction with missing path
- Add unit tests for multi-path extraction where some/all paths are missing

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
@github-actions

github-actions Bot commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit abdc19d)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix doJsonize null handling with unit tests

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java
  • core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java

Sub-PR theme: Integration and REST API tests for JSON_EXTRACT null fix

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJsonBuiltinFunctionIT.java
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5176.yml

⚡ Recommended focus areas for review

Behavior Change

The change from returning string "null" to returning Java null may break existing users or downstream code that relied on the previous behavior of getting the string "null" for missing/null JSON paths. This is a semantic breaking change that should be carefully validated against all callers and any stored query results or comparisons that expected the string "null".

if (candidate == null) {
  return null;
Multi-path Null Serialization

The multi-path tests expect [\"John\",null] and [null,null] as string results. It should be verified that the JSON serialization of null values in the array result is consistent and intentional — specifically that null values in multi-path results are serialized as JSON null (not omitted or represented differently) across all JSON serialization paths.

public void testMultiPathWithMissingPath() {
  Object result = JsonExtractFunctionImpl.eval("{\"name\": \"John\"}", "name", "age");
  assertEquals("[\"John\",null]", result);
}

@Test
public void testMultiPathAllMissing() {
  Object result = JsonExtractFunctionImpl.eval("{}", "name", "age");
  assertEquals("[null,null]", result);
}

@github-actions

github-actions Bot commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to abdc19d

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify null handling in array serialization path

Returning null from doJsonize when candidate is null may cause issues in callers
that build multi-path result arrays, since null elements in a list serialized to
JSON should appear as the string "null" within the array (e.g., ["John",null]). The
fix is correct for the single-value case, but the multi-path array construction path
must explicitly handle null returns from doJsonize to emit JSON null tokens rather
than the Java string "null". Verify that the array-building logic correctly
distinguishes between a Java null (missing/null value) and the string "null" when
serializing the result list.

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java [96-97]

 if (candidate == null) {
-    return null;
+    return null; // Callers building arrays must handle null explicitly to emit JSON null
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify behavior rather than proposing a concrete fix. The improved_code is essentially identical to the existing_code (just adds a comment), and the test testMultiPathWithMissingPath already validates that ["John",null] is returned correctly, suggesting the array-building logic already handles null returns from doJsonize properly.

Low

Previous suggestions

Suggestions up to commit d3f1413
CategorySuggestion                                                                                                                                    Impact
General
Fragile numeric serialization assertion may fail

The test asserts that a numeric value 30 is returned as the string "30", but this
assumption may be fragile depending on how the JSON library serializes numbers
(e.g., it could return "30.0" for doubles). Consider verifying the actual
serialization behavior and using a more precise assertion or documenting the
expected type conversion explicitly.

core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java [44-47]

 @Test
 public void testExtractNumericValue() {
   Object result = JsonExtractFunctionImpl.eval("{\"age\": 30}", "age");
-  assertEquals("30", result);
+  // JSON integers may be serialized as "30" or "30.0" depending on the parser
+  assertTrue(result != null && result.toString().startsWith("30"),
+      "Numeric value should serialize to a string starting with '30', got: " + result);
 }
Suggestion importance[1-10]: 3

__

Why: The concern about numeric serialization is valid in general, but the improved_code changes the assertion to a assertTrue with a comment rather than a clear fix, and the existing assertEquals("30", result) is a reasonable test for the specific JSON library behavior being used. The suggestion adds marginal value.

Low
Verify null serialization in multi-path array result

The test expects null to appear as the literal string null inside the JSON array
result ["John",null]. However, after the fix in doJsonize that returns Java null
instead of the string "null", the serialization of the array containing a null
element should be verified to ensure it still produces the correct JSON string
representation and not something like ["John"] or ["John","null"].

core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java [68-71]

 @Test
 public void testMultiPathWithMissingPath() {
   Object result = JsonExtractFunctionImpl.eval("{\"name\": \"John\"}", "name", "age");
-  assertEquals("[\"John\",null]", result);
+  // Verify that missing path in multi-path returns JSON null (not string "null") in array
+  assertEquals("[\"John\",null]", result,
+      "Missing path in multi-path extraction should produce JSON null in array");
 }
Suggestion importance[1-10]: 1

__

Why: The improved_code is essentially identical to the existing_code with only a comment added, making this suggestion trivial. The test already correctly verifies the expected behavior after the fix.

Low
Suggestions up to commit b6e1150
CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify null handling in multi-path array serialization

The multi-path tests expect null values to be serialized as the JSON string null
within the array (e.g., "["John",null]"). However, since doJsonize now returns
Java null for missing/null values, the array serialization logic must explicitly
handle null elements and render them as the JSON token null rather than omitting
them or throwing a NullPointerException. Verify that the array-building code in eval
correctly handles null entries from doJsonize.

core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java [69-78]

 @Test
 public void testMultiPathWithMissingPath() {
   Object result = JsonExtractFunctionImpl.eval("{\"name\": \"John\"}", "name", "age");
+  // null entries in multi-path results should be serialized as JSON null token, not omitted
   assertEquals("[\"John\",null]", result);
 }
 
 @Test
 public void testMultiPathAllMissing() {
   Object result = JsonExtractFunctionImpl.eval("{}", "name", "age");
   assertEquals("[null,null]", result);
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about whether the array-building code in eval correctly handles Java null returned by doJsonize and serializes it as the JSON token null. However, the improved_code is functionally identical to existing_code (only adds a comment), so it doesn't actually fix the potential issue.

Low
General
Clarify expected numeric return type in test

The test expects the numeric value 30 to be returned as the string "30", but this
assumption should be verified against the actual behavior of the implementation. If
the underlying JSON parser returns an Integer or Long object, calling toString() on
it would produce "30", but if the implementation changes, this test could give a
false positive. Consider adding a comment or asserting the type explicitly to
document the expected behavior.

core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImplTest.java [45-47]

 Object result = JsonExtractFunctionImpl.eval("{\"age\": 30}", "age");
-assertEquals("30", result);
+// Numeric values are returned as their string representation via toString()
+assertEquals("30", result, "Numeric JSON values should be returned as their string representation");
Suggestion importance[1-10]: 2

__

Why: The suggestion only adds a comment to the test, which is a minor documentation improvement. The existing_code and improved_code are functionally equivalent, and the suggestion score criteria penalizes adding comments.

Low

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
@github-actions

github-actions Bot commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d3f1413

@penghuo penghuo added the PPL Piped processing language label Mar 3, 2026
…t#5176)

Signed-off-by: opensearchpplteam <opensearchpplteam@gmail.com>
@github-actions

github-actions Bot commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit abdc19d

@penghuo penghuo added the bugFix label Mar 3, 2026
@LantaoJin LantaoJin merged commit ce68b0c into opensearch-project:main Mar 5, 2026
59 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants