Skip to content

[BugFix] Fix transpose command name collision with 'value' field (#5172)#5352

Merged
qianheng-aws merged 2 commits into
opensearch-project:mainfrom
songkant-aws:worktree-agent-aa5e49ba
Apr 16, 2026
Merged

[BugFix] Fix transpose command name collision with 'value' field (#5172)#5352
qianheng-aws merged 2 commits into
opensearch-project:mainfrom
songkant-aws:worktree-agent-aa5e49ba

Conversation

@songkant-aws

Copy link
Copy Markdown
Collaborator

Description

The transpose command's internal unpivot column was hardcoded as value, which collides with input fields named value. When a field named value exists in the input data, all transposed rows incorrectly show duplicated values from that field instead of each column's actual values.

Root cause: In CalciteRelNodeVisitor.visitTranspose(), the unpivot step uses ImmutableList.of("value") as the value column alias, and the subsequent pivot step references b.field("value"). When the input has a value field, Calcite resolves the ambiguous reference to the wrong column.

Fix: Replaced the hardcoded "value" with a unique internal name _value_transpose_ (constant PlanUtils.VALUE_COLUMN_FOR_TRANSPOSE), following the same naming convention as _row_number_transpose_.

Related Issues

Resolves #5172

Check List

  • New functionality includes testing
  • Commits signed per DCO (-s)
  • spotlessCheck passed
  • Unit tests passed
  • Integration tests passed

@songkant-aws

Copy link
Copy Markdown
Collaborator Author

Decision Log

Root Cause: The visitTranspose() method in CalciteRelNodeVisitor hardcodes "value" as the unpivot column name (line 926: ImmutableList.of("value")). When input data contains a field named value, Calcite's field resolution picks the wrong column during the pivot step (b.field("value") on line 948), causing all transposed rows to show the same duplicated data from the input's value field.

Approach: Introduced a unique internal column name _value_transpose_ (as constant PlanUtils.VALUE_COLUMN_FOR_TRANSPOSE) following the existing naming convention used for _row_number_transpose_. This avoids collision with any user-defined field name since it uses leading/trailing underscores.

Alternatives Rejected:

  • Dynamic name generation (e.g., appending UUID) — overly complex and would break plan determinism for explain/testing.
  • Checking if value exists in input fields and only renaming then — more fragile and still leaves potential collision if the fallback name also clashes.

Pitfalls: None encountered. The fix is mechanical — replace string literal with constant in two places.

Things to Watch: The same _value_transpose_ naming convention means a user field named _value_transpose_ would still collide. This is astronomically unlikely given the naming pattern, consistent with how _row_number_transpose_ is handled.

@github-actions

github-actions Bot commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 371ef50)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core fix - replace hardcoded 'value' with VALUE_COLUMN_FOR_TRANSPOSE constant

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTransposeTest.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_transpose.yaml

Sub-PR theme: Integration and REST tests for transpose value field name collision

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteTransposeCommandIT.java
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5172.yml

⚡ Recommended focus areas for review

Fragile Test Assertion

The integration test testTransposeWithValueFieldNameCollision hardcodes the expected count as "1000" (total accounts in TEST_INDEX_ACCOUNT). If the test index data changes, this test will break. Consider using a more flexible assertion or a dedicated test index with known data.

  assertEquals("1000", row.getString(1));
} else if ("avg_age".equals(colName)) {
  foundAvgAge = true;
  // avg_age should not equal the count value
  assertNotEquals("1000", row.getString(1));
}
Collision Risk

The new constant VALUE_COLUMN_FOR_TRANSPOSE is set to _value_transpose_. While this reduces collision likelihood, it is still possible for a user to have a field literally named _value_transpose_. Consider documenting this limitation or adding a validation/check that rejects input fields with this reserved name.

String VALUE_COLUMN_FOR_TRANSPOSE = "_value_transpose_";

@github-actions

github-actions Bot commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 371ef50
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix JUnit 5 assertTrue argument order

The assertTrue(String, boolean) overload from JUnit 4 is being used, but the imports
use org.junit.jupiter.api.Assertions.assertTrue (JUnit 5), where the argument order
is reversed: assertTrue(boolean, String). This will cause a compilation error or
incorrect behavior at runtime.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteTransposeCommandIT.java [187-188]

-assertTrue("Should have 'value' row in transposed result", foundValue);
-assertTrue("Should have 'avg_age' row in transposed result", foundAvgAge);
+assertTrue(foundValue, "Should have 'value' row in transposed result");
+assertTrue(foundAvgAge, "Should have 'avg_age' row in transposed result");
Suggestion importance[1-10]: 8

__

Why: The code uses org.junit.jupiter.api.Assertions.assertTrue (JUnit 5) but passes arguments in JUnit 4 order (String, boolean). In JUnit 5, the correct order is assertTrue(boolean, String), so this is a real bug that would cause compilation errors or incorrect behavior.

Medium
Fix incorrect schema assertion for single-row transpose

The stats count() as value, avg(age) as avg_age query produces a single row of
aggregated results, so transposing it should yield only 2 rows (one per field) with
a single data column (row 1). Expecting 5 row columns (row 1 through row 5) in the
schema is incorrect for this input and may cause the test to fail or produce
misleading assertions.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteTransposeCommandIT.java [159-166]

 verifySchema(
     result,
     schema("column", "string"),
-    schema("row 1", "string"),
-    schema("row 2", "string"),
-    schema("row 3", "string"),
-    schema("row 4", "string"),
-    schema("row 5", "string"));
+    schema("row 1", "string"));
Suggestion importance[1-10]: 7

__

Why: The stats count() as value, avg(age) as avg_age produces a single aggregated row, so transposing it yields only 1 data column (row 1). Asserting 5 row columns in the schema is likely incorrect and could cause test failures or misleading assertions.

Medium
General
Use more collision-resistant internal column name

The new internal column name value_transpose could still collide if a user has a
field literally named value_transpose. Consider using a more unique sentinel name
(e.g., with double underscores or a UUID-like suffix) consistent with the pattern
used by other internal columns like row_number_transpose, or add a
collision-detection guard in the visitor before using this name.

core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java [87]

-String VALUE_COLUMN_FOR_TRANSPOSE = "_value_transpose_";
+String VALUE_COLUMN_FOR_TRANSPOSE = "__value_transpose__";
Suggestion importance[1-10]: 3

__

Why: While the suggestion is valid in principle, the PR already improves the situation by replacing the hardcoded "value" with a constant _value_transpose_. The marginal benefit of adding more underscores is low, and the existing naming pattern in the codebase uses single underscores consistently.

Low

Previous suggestions

Suggestions up to commit e692ea7
CategorySuggestion                                                                                                                                    Impact
General
Use more collision-resistant internal column name

The internal column name value_transpose could still collide if a user has a field
named value_transpose. Consider using a more unique sentinel name (e.g., with
double underscores or a UUID-like suffix) to further reduce collision risk, similar
to how other internal columns use double underscores (e.g., stream_seq).

core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java [87]

-String VALUE_COLUMN_FOR_TRANSPOSE = "_value_transpose_";
+String VALUE_COLUMN_FOR_TRANSPOSE = "__value_transpose__";
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid - using double underscores like __value_transpose__ would be more consistent with __stream_seq__ and reduce collision risk. However, this is a minor naming improvement and the current name _value_transpose_ already significantly reduces collision probability compared to the original value.

Low
Schema assertion may be environment-dependent

The stats count() as value, avg(age) as avg_age query on TEST_INDEX_ACCOUNT produces
a single aggregated row, so the transpose result should have at most 5 row columns
but only 1 non-null value per row. The schema verification expects 6 columns (row 1
through row 5), but the number of row columns depends on the configured default
limit. If the default limit differs from 5 in some environments, this assertion will
fail. Consider making the schema assertion more flexible or documenting the assumed
default.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteTransposeCommandIT.java [157-164]

 verifySchema(
     result,
     schema("column", "string"),
-    schema("row 1", "string"),
-    schema("row 2", "string"),
-    schema("row 3", "string"),
-    schema("row 4", "string"),
-    schema("row 5", "string"));
+    schema("row 1", "string"));
Suggestion importance[1-10]: 3

__

Why: The concern about environment-dependent default limits is valid, but the improved_code only keeps row 1 which would be incorrect if the default is indeed 5. The suggestion oversimplifies the fix and the existing test likely matches the configured default of 5 rows used throughout the test suite.

Low
Strengthen avg_age assertion in regression test

The assertion assertNotEquals("1000", row.getString(1)) only verifies that avg_age
is not 1000, but does not confirm it is a valid average age value. If the average
age happened to be 1000 for some reason, or if the column is null/empty due to a
bug, this check would pass incorrectly. Consider asserting a more meaningful range
or exact expected value for avg_age.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteTransposeCommandIT.java [179-183]

 } else if ("avg_age".equals(colName)) {
     foundAvgAge = true;
-    // avg_age should not equal the count value
-    assertNotEquals("1000", row.getString(1));
+    // avg_age for TEST_INDEX_ACCOUNT should be a reasonable age, not the count
+    String avgAgeStr = row.getString(1);
+    assertNotEquals("1000", avgAgeStr);
+    assertNotNull(avgAgeStr);
+    double avgAge = Double.parseDouble(avgAgeStr);
+    assertTrue("avg_age should be a plausible age value", avgAge > 0 && avgAge < 150);
   }
Suggestion importance[1-10]: 3

__

Why: Adding a range check for avg_age would make the regression test more robust, but the primary goal of this test is to verify the field name collision fix (not 1000), so the current assertion is sufficient for its purpose.

Low

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 371ef50

@qianheng-aws qianheng-aws merged commit 93a646f into opensearch-project:main Apr 16, 2026
38 checks passed
@songkant-aws songkant-aws deleted the worktree-agent-aa5e49ba branch May 15, 2026 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Transpose command name collision: hardcoded 'value' unpivot column collides with input field named 'value'

3 participants