Skip to content

Commit da1410b

Browse files
authored
[BugFix] Fix eval overwrites MAP root field when assigning multiple dotted-path names (opensearch-project#5185) (opensearch-project#5351)
* [BugFix] Fix eval overwrites MAP root field when assigning multiple dotted-path names (opensearch-project#5185) Signed-off-by: Songkan Tang <songkant@amazon.com> * Remove prefix-override branch in shouldOverrideField; scope struct-parent pruning to bin PR opensearch-project#4606 introduced a `newName.startsWith(originalName + ".")` branch in `shouldOverrideField` to make `bin` on a nested field work without an explicit `fields` projection (issue opensearch-project#4482). That heuristic silently removed any column that happened to be a prefix of an eval target, which broke two real cases: * Issue opensearch-project#5185 — assigning multiple dotted-path fields from a `spath` MAP root dropped the root column, so subsequent evals failed with "Field not found". * Reviewer's case on PR opensearch-project#5351 — `source=big5 | fields agent | eval agent.name = "test"` dropped the `agent` column entirely, violating SPL1 semantics where a dotted assignment creates a literal column without touching any sibling or parent field. Rather than layer another heuristic (`hasSubFields`) on top of the prefix branch, this change removes the prefix branch entirely and returns `shouldOverrideField` to an exact-name match. The command-specific behaviour that actually needed parent-pruning — `bin` on a nested leaf inside a struct that OpenSearch exposes as both parent and flattened leaves — is now handled locally in `visitBin` via `dropStructParentsFor`, which only fires when the prefixes really exist as struct-parent columns in the current row schema. Tests: * Kept the two spath regression tests from PR opensearch-project#5351 (`CalcitePPLSpathTest`, `CalcitePPLSpathCommandIT`, `issues/5185.yml`). * Added `CalciteEvalCommandIT.testEvalDottedNameDoesNotDropStructParent` and `testEvalDottedNamePreservesStructParent_ImplicitProject` to cover the reviewer's `agent` case. * `CalciteBinCommandIT.testBinWithNestedFieldWithoutExplicitProjection` and `testBinWithEvalCreatedDottedFieldName` still pass via the new `dropStructParentsFor` path instead of the global prefix branch. Resolves opensearch-project#5185 Signed-off-by: Songkan Tang <songkant@amazon.com> * Drop struct parents in projectPlusOverriding when overriding a nested leaf The previous commit scoped dropStructParentsFor to visitBin only. But projectPlusOverriding is also called from eval, rex/sed, trendline, expand, flatten, and patterns. An empirical scratch test showed that `eval resource.attributes.telemetry.sdk.version = "OVERRIDE"` on the telemetry index silently lost the override at the implicit `fields *` projection because tryToRemoveNestedFields deleted the overridden flat leaf (its parent `resource.attributes.telemetry.sdk` was still in the row schema). Move dropStructParentsFor inside projectPlusOverriding, gated on "the override actually matched an existing column name" (i.e. the pre-override row schema already had that exact dotted name). This covers every command that funnels through projectPlusOverriding without reintroducing the opensearch-project#5185 / reviewer regressions — both of those paths introduce brand-new dotted names that were not already in the row schema, so overriddenNames stays empty and the struct parents (e.g. `agent`, `doc`) survive untouched. Also add regression coverage in CalciteEvalCommandIT: - testEvalOverrideOfFlattenedNestedLeafSurvivesImplicitProject - testEvalOverrideOfFlattenedNestedLeafWithExplicitProject Signed-off-by: Songkan Tang <songkant@amazon.com> --------- Signed-off-by: Songkan Tang <songkant@amazon.com>
1 parent 8543e63 commit da1410b

5 files changed

Lines changed: 317 additions & 13 deletions

File tree

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Lines changed: 95 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -951,10 +951,60 @@ public RelNode visitBin(Bin node, CalcitePlanContext context) {
951951

952952
String alias = node.getAlias() != null ? node.getAlias() : fieldName;
953953
projectPlusOverriding(List.of(binExpression), List.of(alias), context);
954+
dropStructParentsFor(alias, context);
954955

955956
return context.relBuilder.peek();
956957
}
957958

959+
/**
960+
* If {@code dottedName} addresses a nested leaf inside a struct that OpenSearch has exposed
961+
* through both its struct-parent columns and its flattened leaf columns (e.g. the telemetry
962+
* mapping exposes {@code resource}, {@code resource.attributes}, ..., {@code
963+
* resource.attributes.telemetry.sdk.version} side-by-side), drop the struct-parent prefixes from
964+
* the current row. This keeps a subsequent {@link #tryToRemoveNestedFields(CalcitePlanContext)}
965+
* pass from collapsing the flattened leaves back into the parents when the final implicit {@code
966+
* fields *} projection runs.
967+
*
968+
* <p>This preserves the behaviour that issue #4482 originally required for {@code bin} on a
969+
* nested field without an explicit {@code fields} projection. It is invoked from two places:
970+
*
971+
* <ul>
972+
* <li>{@link #projectPlusOverriding(List, List, CalcitePlanContext)} — for every override whose
973+
* new name exactly matched a pre-existing column. This catches {@code eval} (and every
974+
* other command that funnels through {@code projectPlusOverriding}) assigning to an
975+
* existing flattened nested leaf.
976+
* <li>{@link #visitBin(Bin, CalcitePlanContext)} — defensively, so that {@code bin} keeps
977+
* dropping struct parents even when the alias happens not to match an existing field name
978+
* (e.g. when the user supplied a custom alias). This is also what the regression test in
979+
* {@code CalciteBinCommandIT#testBinWithNestedFieldWithoutExplicitProjection} exercises.
980+
* </ul>
981+
*
982+
* Using this narrowly-scoped pruning instead of a global prefix-override in {@link
983+
* #shouldOverrideField} is what keeps issue #5185 and the reviewer's {@code eval agent.name =
984+
* ...} case safe.
985+
*
986+
* <p>No-op when no such struct-parent columns exist (e.g. flat columns or MAP roots from {@code
987+
* spath}).
988+
*/
989+
private void dropStructParentsFor(String dottedName, CalcitePlanContext context) {
990+
if (dottedName == null || dottedName.indexOf('.') < 0) {
991+
return;
992+
}
993+
List<String> fieldNames = context.relBuilder.peek().getRowType().getFieldNames();
994+
List<RexNode> parentsToDrop = new ArrayList<>();
995+
int dotIdx = dottedName.indexOf('.');
996+
while (dotIdx >= 0) {
997+
String prefix = dottedName.substring(0, dotIdx);
998+
if (fieldNames.contains(prefix)) {
999+
parentsToDrop.add(context.relBuilder.field(prefix));
1000+
}
1001+
dotIdx = dottedName.indexOf('.', dotIdx + 1);
1002+
}
1003+
if (!parentsToDrop.isEmpty()) {
1004+
context.relBuilder.projectExcept(parentsToDrop);
1005+
}
1006+
}
1007+
9581008
@Override
9591009
public RelNode visitParse(Parse node, CalcitePlanContext context) {
9601010
visitChildren(node, context);
@@ -1270,12 +1320,12 @@ private RelNode buildConversionProjection(ConversionState state, CalcitePlanCont
12701320

12711321
private void projectPlusOverriding(
12721322
List<RexNode> newFields, List<String> newNames, CalcitePlanContext context) {
1273-
List<String> originalFieldNames = context.relBuilder.peek().getRowType().getFieldNames();
1323+
Set<String> originalFieldNameSet =
1324+
new HashSet<>(context.relBuilder.peek().getRowType().getFieldNames());
1325+
List<String> overriddenNames =
1326+
newNames.stream().filter(originalFieldNameSet::contains).toList();
12741327
List<RexNode> toOverrideList =
1275-
originalFieldNames.stream()
1276-
.filter(originalName -> shouldOverrideField(originalName, newNames))
1277-
.map(a -> (RexNode) context.relBuilder.field(a))
1278-
.toList();
1328+
overriddenNames.stream().map(a -> (RexNode) context.relBuilder.field(a)).toList();
12791329
// 1. add the new fields, For example "age0, country0"
12801330
context.relBuilder.projectPlus(newFields);
12811331
// 2. drop the overriding field list, it's duplicated now. For example "age, country"
@@ -1291,17 +1341,49 @@ private void projectPlusOverriding(
12911341
expectedRenameFields.addAll(newNames);
12921342
// 5. rename
12931343
context.relBuilder.rename(expectedRenameFields);
1344+
// 6. For each overridden dotted-path name that matched an existing flattened nested leaf,
1345+
// prune the struct-parent columns that OpenSearch exposed side-by-side with that leaf. Without
1346+
// this, a downstream implicit `fields *` invokes `tryToRemoveNestedFields`, which would drop
1347+
// the freshly-assigned dotted leaf back out again because its struct-parent prefix is still in
1348+
// the row schema (see issue #4482 and the scratch coverage in CalciteEvalCommandIT).
1349+
//
1350+
// Gating on "the override actually fired" is what keeps the reviewer's PR #5351 case safe:
1351+
// `source=idx | fields agent | eval agent.name = "test"` has no pre-existing `agent.name`
1352+
// column, so overriddenNames is empty and the struct-parent `agent` survives untouched.
1353+
// It also keeps issue #5185 safe — spath introduces a MAP root and subsequent eval assigns
1354+
// to brand-new dotted paths that were not already in the row schema.
1355+
for (String overridden : overriddenNames) {
1356+
dropStructParentsFor(overridden, context);
1357+
}
12941358
}
12951359

1360+
/**
1361+
* Determine whether the column {@code originalName} should be replaced when a batch of new
1362+
* columns named {@code newNames} is being added. Only exact-name matches count as overrides —
1363+
* {@code eval foo.bar = ...} creates a brand new field literally named {@code foo.bar} and must
1364+
* never drop sibling or parent fields. This mirrors SPL1 semantics, where assigning a dotted name
1365+
* introduces a literal column of that name without touching any other field.
1366+
*
1367+
* <p>Earlier revisions (see PR #4606 / #5351) attempted to broaden this to a {@code
1368+
* newName.startsWith(originalName + ".")} prefix match. That prefix branch silently dropped any
1369+
* column that happened to be a prefix of an eval target, which caused two regressions:
1370+
*
1371+
* <ul>
1372+
* <li>Issue #5185 — a MAP-typed root column produced by {@code spath} got dropped when eval
1373+
* introduced multiple dotted-path fields under it.
1374+
* <li>The reviewer's case on PR #5351 — {@code source=big5 | fields agent | eval agent.name =
1375+
* "test"} dropped the {@code agent} column entirely.
1376+
* </ul>
1377+
*
1378+
* Struct-parent pruning for the "override on a real flattened nested leaf" case is handled
1379+
* uniformly in {@link #projectPlusOverriding(List, List, CalcitePlanContext)}, which invokes
1380+
* {@link #dropStructParentsFor(String, CalcitePlanContext)} only for overrides that actually
1381+
* replaced an existing column. This keeps issue #4482 fixed across every command that funnels
1382+
* through {@code projectPlusOverriding} (bin, eval, rex/sed, trendline, expand, flatten,
1383+
* patterns) without reintroducing the #5185 / reviewer regressions here.
1384+
*/
12961385
private boolean shouldOverrideField(String originalName, List<String> newNames) {
1297-
return newNames.stream()
1298-
.anyMatch(
1299-
newName ->
1300-
// Match exact field names (e.g., "age" == "age") for flat fields
1301-
newName.equals(originalName)
1302-
// OR match nested paths (e.g., "resource.attributes..." starts with
1303-
// "resource.")
1304-
|| newName.startsWith(originalName + "."));
1386+
return newNames.contains(originalName);
13051387
}
13061388

13071389
private List<List<RexInputRef>> extractInputRefList(List<RelBuilder.AggCall> aggCalls) {

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteEvalCommandIT.java

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,11 +6,13 @@
66
package org.opensearch.sql.calcite.remote;
77

88
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK;
9+
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_TELEMETRY;
910
import static org.opensearch.sql.util.MatcherUtils.rows;
1011
import static org.opensearch.sql.util.MatcherUtils.schema;
1112
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
1213
import static org.opensearch.sql.util.MatcherUtils.verifySchema;
1314

15+
import com.google.common.collect.ImmutableMap;
1416
import java.io.IOException;
1517
import org.json.JSONObject;
1618
import org.junit.jupiter.api.Test;
@@ -25,6 +27,7 @@ public void init() throws Exception {
2527
enableCalcite();
2628

2729
loadIndex(Index.BANK);
30+
loadIndex(Index.TELEMETRY);
2831

2932
// Create test data for string concatenation
3033
Request request1 = new Request("PUT", "/test_eval/_doc/1?refresh=true");
@@ -38,6 +41,21 @@ public void init() throws Exception {
3841
Request request3 = new Request("PUT", "/test_eval/_doc/3?refresh=true");
3942
request3.setJsonEntity("{\"name\": \"Charlie\", \"age\": null, \"title\": \"Analyst\"}");
4043
client().performRequest(request3);
44+
45+
// Index with a struct field `agent` to reproduce the reviewer's case from PR #5351:
46+
// source=<idx with agent struct> | fields agent | eval agent.name = "test"
47+
// Rely on dynamic mapping — OpenSearch infers `agent` as an object with string children
48+
// from the document contents. Using dynamic mapping keeps the init idempotent across
49+
// repeated `@Before` invocations in the preserved cluster.
50+
Request agentDoc1 = new Request("PUT", "/test_eval_agent/_doc/1?refresh=true");
51+
agentDoc1.setJsonEntity(
52+
"{\"agent\": {\"name\": \"winlogbeat\", \"version\": \"7.0\"}, \"message\": \"hello\"}");
53+
client().performRequest(agentDoc1);
54+
55+
Request agentDoc2 = new Request("PUT", "/test_eval_agent/_doc/2?refresh=true");
56+
agentDoc2.setJsonEntity(
57+
"{\"agent\": {\"name\": \"filebeat\", \"version\": \"8.1\"}, \"message\": \"world\"}");
58+
client().performRequest(agentDoc2);
4159
}
4260

4361
@Test
@@ -86,6 +104,90 @@ public void testEvalStringConcatenationWithLiterals() throws IOException {
86104
rows("Charlie", "Analyst", "Name: Charlie, Title: Analyst"));
87105
}
88106

107+
@Test
108+
public void testEvalDottedNameDoesNotDropStructParent() throws IOException {
109+
// Reviewer's case from PR #5351: assigning a new dotted-path column must not remove the
110+
// struct-parent column that happens to be a prefix of the eval target.
111+
// Equivalent SPL1 query:
112+
// source=<idx with agent struct> | fields agent | eval agent.name = "test"
113+
// Before the fix, the prefix-override in shouldOverrideField silently dropped the `agent`
114+
// column entirely from the result schema. With the fix, `agent` is preserved.
115+
// The newly-created literal column `agent.name` is also available (verified via an
116+
// explicit trailing `fields` projection that bypasses tryToRemoveNestedFields).
117+
JSONObject result =
118+
executeQuery(
119+
"source=test_eval_agent | fields agent | eval `agent.name` = 'test' | fields agent,"
120+
+ " `agent.name`");
121+
verifySchema(result, schema("agent", "struct"), schema("agent.name", "string"));
122+
verifyDataRows(
123+
result,
124+
rows(ImmutableMap.of("name", "winlogbeat", "version", "7.0"), "test"),
125+
rows(ImmutableMap.of("name", "filebeat", "version", "8.1"), "test"));
126+
}
127+
128+
@Test
129+
public void testEvalDottedNamePreservesStructParent_ImplicitProject() throws IOException {
130+
// Complementary coverage for the reviewer's case without the explicit trailing projection.
131+
// With the implicit `fields *` (AllFields) that the PPL parser appends, the downstream
132+
// `tryToRemoveNestedFields` pass still collapses the flattened leaf back into its struct
133+
// parent -- but the important regression guard is that the struct parent `agent` is no
134+
// longer dropped by `shouldOverrideField`'s prefix branch.
135+
JSONObject result =
136+
executeQuery("source=test_eval_agent | fields agent | eval `agent.name` = 'test'");
137+
verifySchema(result, schema("agent", "struct"));
138+
}
139+
140+
@Test
141+
public void testEvalOverrideOfFlattenedNestedLeafSurvivesImplicitProject() throws IOException {
142+
// Regression guard for PR #5351: eval assigning a new value to a dotted name that matches an
143+
// existing OpenSearch flattened nested leaf must not have that value silently eaten by the
144+
// implicit `fields *` pass.
145+
//
146+
// The telemetry mapping exposes struct parents (resource, resource.attributes, ...,
147+
// resource.attributes.telemetry.sdk) side-by-side with the flattened leaves. When eval
148+
// overrides the leaf, projectPlusOverriding now prunes those struct parents so a subsequent
149+
// tryToRemoveNestedFields pass does not delete the overridden leaf on the way out.
150+
//
151+
// Before the fix, this query returned rows with the original `resource` struct (still
152+
// containing the pre-override integer version) and no `resource.attributes.telemetry.sdk.*`
153+
// flattened leaves at all -- the "OVERRIDE" string was completely lost.
154+
JSONObject result =
155+
executeQuery(
156+
String.format(
157+
"source=%s | eval `resource.attributes.telemetry.sdk.version` = 'OVERRIDE'",
158+
TEST_INDEX_TELEMETRY));
159+
160+
verifyDataRows(
161+
result,
162+
rows(true, "java", "opentelemetry", 9, "OVERRIDE"),
163+
rows(false, "python", "opentelemetry", 12, "OVERRIDE"),
164+
rows(true, "javascript", "opentelemetry", 9, "OVERRIDE"),
165+
rows(false, "go", "opentelemetry", 16, "OVERRIDE"),
166+
rows(true, "rust", "opentelemetry", 12, "OVERRIDE"));
167+
}
168+
169+
@Test
170+
public void testEvalOverrideOfFlattenedNestedLeafWithExplicitProject() throws IOException {
171+
// Control for the test above: with an explicit trailing `fields` projection, the implicit
172+
// `fields *` codepath (and tryToRemoveNestedFields) does not run, so eval always returned
173+
// the overridden value even before the fix. This test pins the user-facing contract for the
174+
// explicit-projection variant regardless of internal pruning behaviour.
175+
JSONObject result =
176+
executeQuery(
177+
String.format(
178+
"source=%s | eval `resource.attributes.telemetry.sdk.version` = 'OVERRIDE' | fields"
179+
+ " `resource.attributes.telemetry.sdk.version`",
180+
TEST_INDEX_TELEMETRY));
181+
verifySchema(result, schema("resource.attributes.telemetry.sdk.version", "string"));
182+
verifyDataRows(
183+
result,
184+
rows("OVERRIDE"),
185+
rows("OVERRIDE"),
186+
rows("OVERRIDE"),
187+
rows("OVERRIDE"),
188+
rows("OVERRIDE"));
189+
}
190+
89191
@Test
90192
public void testEvalStringConcatenationWithExistingData() throws IOException {
91193
JSONObject result =

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -216,4 +216,27 @@ public void testSpathAutoExtractWithSort() throws IOException {
216216
verifySchema(result, schema("doc.user.name", "string"));
217217
verifyDataRowsInOrder(result, rows("Alice"), rows("John"));
218218
}
219+
220+
@Test
221+
public void testSpathAutoExtractWithMultiFieldEval() throws IOException {
222+
JSONObject result =
223+
executeQuery(
224+
"source=test_spath_cmd | spath input=doc"
225+
+ " | eval doc.user.name=doc.user.name, doc.user.age=doc.user.age"
226+
+ " | fields doc.user.name, doc.user.age");
227+
verifySchema(result, schema("doc.user.name", "string"), schema("doc.user.age", "string"));
228+
verifyDataRows(result, rows("Alice", "25"), rows("John", "30"));
229+
}
230+
231+
@Test
232+
public void testSpathAutoExtractWithSeparateEvalCommands() throws IOException {
233+
JSONObject result =
234+
executeQuery(
235+
"source=test_spath_cmd | spath input=doc"
236+
+ " | eval doc.user.name=doc.user.name"
237+
+ " | eval doc.user.age=doc.user.age"
238+
+ " | fields doc.user.name, doc.user.age");
239+
verifySchema(result, schema("doc.user.name", "string"), schema("doc.user.age", "string"));
240+
verifyDataRows(result, rows("Alice", "25"), rows("John", "30"));
241+
}
219242
}
Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
setup:
2+
- do:
3+
query.settings:
4+
body:
5+
transient:
6+
plugins.calcite.enabled: true
7+
- do:
8+
indices.create:
9+
index: issue5185
10+
body:
11+
settings:
12+
number_of_shards: 1
13+
number_of_replicas: 0
14+
mappings:
15+
properties:
16+
doc:
17+
type: text
18+
- do:
19+
bulk:
20+
refresh: true
21+
body:
22+
- '{"index": {"_index": "issue5185", "_id": "1"}}'
23+
- '{"doc": "{\"user\":{\"name\":\"John\",\"age\":30}}"}'
24+
- '{"index": {"_index": "issue5185", "_id": "2"}}'
25+
- '{"doc": "{\"user\":{\"name\":\"Alice\",\"age\":25}}"}'
26+
27+
---
28+
teardown:
29+
- do:
30+
indices.delete:
31+
index: issue5185
32+
ignore_unavailable: true
33+
- do:
34+
query.settings:
35+
body:
36+
transient:
37+
plugins.calcite.enabled: false
38+
39+
---
40+
"Issue 5185: eval with multiple dotted-path assignments from MAP column":
41+
- skip:
42+
features:
43+
- headers
44+
- allowed_warnings
45+
- do:
46+
headers:
47+
Content-Type: 'application/json'
48+
ppl:
49+
body:
50+
query: "source=issue5185 | spath input=doc | eval doc.user.name=doc.user.name, doc.user.age=doc.user.age | fields doc.user.name, doc.user.age"
51+
52+
- match: { total: 2 }
53+
- length: { datarows: 2 }
54+
55+
---
56+
"Issue 5185: separate eval commands with dotted-path from MAP column":
57+
- skip:
58+
features:
59+
- headers
60+
- allowed_warnings
61+
- do:
62+
headers:
63+
Content-Type: 'application/json'
64+
ppl:
65+
body:
66+
query: "source=issue5185 | spath input=doc | eval doc.user.name=doc.user.name | eval doc.user.age=doc.user.age | fields doc.user.name, doc.user.age"
67+
68+
- match: { total: 2 }
69+
- length: { datarows: 2 }

0 commit comments

Comments
 (0)