Skip to content

Commit 459bffc

Browse files
committed
[Analytics Backend / DataFusion] Don't let substrait AssertionError kill the cluster
Substrait's plan validators (VariadicParameterConsistencyValidator, RelOptUtil.eq via Litmus.THROW, etc.) throw AssertionError directly via explicit `throw new AssertionError(...)` rather than via the `assert` keyword, so the JVM -da flag doesn't gate them. When a malformed plan triggers one inside a search-thread call to SubstraitRelVisitor.apply, the AssertionError propagates uncaught up the analytics-engine fragment handler stack, OpenSearchUncaughtExceptionHandler classifies it as fatal, and the entire cluster JVM exits. Wrap the visitor.apply call in a narrow try/catch that re-raises the AssertionError as IllegalStateException with the original message and cause preserved. The analytics-engine error path already buckets IllegalStateException at the fragment boundary into a normal HTTP 500 response — the cluster stays up and the failure shows in the per-query report instead. This came up while diagnosing CalciteMVAppendFunctionIT failures: malformed ARRAY<ANY> plans were taking down the cluster mid-test instead of producing per-test failures, masking the underlying substrait conversion error. Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 5b22de4 commit 459bffc

1 file changed

Lines changed: 14 additions & 1 deletion

File tree

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionFragmentConvertor.java

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -230,7 +230,20 @@ private byte[] convertToSubstrait(RelNode fragment) {
230230
RelNode preprocessed = UntypedNullPreprocessor.rewrite(fragment);
231231
RelRoot root = RelRoot.of(preprocessed, SqlKind.SELECT);
232232
SubstraitRelVisitor visitor = createVisitor(preprocessed);
233-
Rel substraitRel = visitor.apply(root.rel);
233+
Rel substraitRel;
234+
try {
235+
substraitRel = visitor.apply(root.rel);
236+
} catch (AssertionError e) {
237+
// Substrait validators (e.g. VariadicParameterConsistencyValidator,
238+
// RelOptUtil.eq via Litmus.THROW) throw AssertionError directly via Java
239+
// code rather than via the `assert` keyword, so JVM -da doesn't gate them.
240+
// If one fires inside a search thread, OpenSearchUncaughtExceptionHandler
241+
// exits the cluster JVM. Convert to IllegalStateException so the analytics-
242+
// engine error path treats it as a normal per-query failure (HTTP 500 with
243+
// a bucketable message) instead of taking down the cluster.
244+
throw new IllegalStateException(
245+
"Substrait conversion rejected the plan: " + e.getMessage(), e);
246+
}
234247

235248
List<String> fieldNames = root.fields.stream().map(field -> field.getValue()).toList();
236249

0 commit comments

Comments
 (0)