Commit 109375d

[Calcite] Make containsNestedAggregator tolerate flat-leaf schemas (opensearch-project#5423)
* [Calcite] Make containsNestedAggregator tolerate flat-leaf schemas

The check splits each aggregator argument on '.' and looks up the leading token as a top-level ARRAY column (the OpenSearch `nested` marker). It used relBuilder.field(root), which throws when the column doesn't exist.

That works for the classic path, which always exposes a top-level column for object/nested parents. It breaks the analytics-engine path, which emits only flat leaves ("city.name", "city.location.latitude") because parent placeholders can't round-trip through Substrait. Any aggregation against an object-field leaf (e.g. stats max(city.location.latitude)) crashed with "Field [city] not found".

Use RelDataType.getField, which returns null on miss. A null lookup is the correct "not nested" answer. Behavior unchanged for relations that do have a top-level ARRAY column.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
1 parent 9124664 commit 109375d

1 file changed

Lines changed: 17 additions & 2 deletions

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

@@ -1538,10 +1538,25 @@ private Pair<List<RexNode>, List<AggCall>> aggregateWithTrimming(
    * count(a.b)] returns true.
    */
   private boolean containsNestedAggregator(RelBuilder relBuilder, List<RexInputRef> aggCallRefs) {
+    // For each aggregator argument, take the part of its column name before the first dot
+    // (e.g. "city" from "city.location.latitude") and check whether that's a top-level
+    // ARRAY column — the marker for an OpenSearch `nested` field.
+    //
+    // The classic path always exposes a top-level column for object/nested parents. The
+    // analytics-engine path emits only the flat leaves ("city.name", "city.location.latitude")
+    // because parent placeholder types (MAP<VARCHAR, ANY>) can't round-trip through Substrait.
+    // RelDataType.getField returns null when the column doesn't exist — for analytics-engine,
+    // that null just means "not nested," which is the right answer.
+    RelDataType rowType = relBuilder.peek().getRowType();
     return aggCallRefs.stream()
-        .map(r -> relBuilder.peek().getRowType().getFieldNames().get(r.getIndex()))
+        .map(r -> rowType.getFieldNames().get(r.getIndex()))
         .map(name -> org.apache.commons.lang3.StringUtils.substringBefore(name, "."))
-        .anyMatch(root -> relBuilder.field(root).getType().getSqlTypeName() == SqlTypeName.ARRAY);
+        .anyMatch(
+            root -> {
+              RelDataTypeField field =
+                  rowType.getField(root, /* caseSensitive= */ true, /* elideRecord= */ false);
+              return field != null && field.getType().getSqlTypeName() == SqlTypeName.ARRAY;
+            });
   }
 
   /**
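
The null-tolerant lookup in the hunk above can be illustrated without Calcite on the classpath. The following is a minimal sketch, not the project's code: the class name `NestedCheckSketch`, the `Map<String, String>` stand-in for a row type (column name to SQL type name), and the `items` columns are all assumptions made for illustration. `Map.get` plays the role of `RelDataType.getField` here, since both return null on a miss.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the check in CalciteRelNodeVisitor; a plain Map models the row type.
class NestedCheckSketch {

  // Returns true if any aggregator argument has a leading path token that names
  // a top-level ARRAY column. A missing column is treated as "not nested".
  static boolean containsNestedAggregator(Map<String, String> rowType, List<String> aggArgs) {
    return aggArgs.stream()
        // Keep the text before the first dot; keep the whole name if there is no dot.
        .map(name -> {
          int dot = name.indexOf('.');
          return dot < 0 ? name : name.substring(0, dot);
        })
        // Map.get returns null on a miss, and "ARRAY".equals(null) is false.
        .anyMatch(root -> "ARRAY".equals(rowType.get(root)));
  }

  public static void main(String[] args) {
    // Classic-style schema (assumed): the nested parent is exposed as a top-level ARRAY column.
    Map<String, String> classic = Map.of("items", "ARRAY", "items.price", "DOUBLE");
    // Flat-leaf schema as described in the commit message: no "city" parent column at all.
    Map<String, String> flat =
        Map.of("city.name", "VARCHAR", "city.location.latitude", "DOUBLE");

    System.out.println(containsNestedAggregator(classic, List.of("items.price"))); // true
    System.out.println(containsNestedAggregator(flat, List.of("city.location.latitude"))); // false
    System.out.println(containsNestedAggregator(flat, List.of("city.name"))); // false
  }
}
```

In this sketch the flat-leaf case returns false instead of failing, which mirrors the intent stated in the commit message: a missing parent column is read as "not nested" rather than raised as an error.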
