-
Notifications
You must be signed in to change notification settings - Fork 208
Register DISTINCT_COUNT_APPROX logical marker in PPLFuncImpTable #5466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
yuancu
merged 1 commit into
opensearch-project:main
from
songkant-aws:dc-marker-registration
May 26, 2026
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
61 changes: 61 additions & 0 deletions
61
.../main/java/org/opensearch/sql/calcite/udf/udaf/DistinctCountApproxLogicalAggFunction.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| /* | ||
| * Copyright OpenSearch Contributors | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
|
|
||
| package org.opensearch.sql.calcite.udf.udaf; | ||
|
|
||
| import org.opensearch.sql.calcite.udf.UserDefinedAggFunction; | ||
|
|
||
| /** | ||
| * Logical marker UDAF for {@code DISTINCT_COUNT_APPROX}. Lets the PPL parser produce a RelNode that | ||
| * contains the operator without committing to a JVM execution path; backends are expected to push | ||
| * it down or rewrite it before execution: | ||
| * | ||
| * <ul> | ||
| * <li>OpenSearch V3 path: {@code OpenSearchExecutionEngine#registerOpenSearchFunctions} registers | ||
| * a real HyperLogLog++ implementation in {@code PPLFuncImpTable.aggExternalFunctionRegistry}, | ||
| * which overrides this marker (external registry has lookup precedence in {@code | ||
| * getImplementation}). {@code AggregateAnalyzer} then translates the operator to OpenSearch | ||
| * cardinality DSL. | ||
| * <li>Unified-query / DataFusion / analytics-engine path: backend planner rewrites the RexOver to | ||
| * {@code APPROX_COUNT_DISTINCT} (Calcite stdop) before substrait emission; the DataFusion | ||
| * substrait reader consumes that natively. | ||
| * </ul> | ||
| * | ||
| * <p>This class deliberately throws on every method. Reaching a method body means a backend either | ||
| * failed to push down or did not register an adapter — that is a configuration bug, not a runtime | ||
| * fallback. {@code RelevanceQueryFunction.RelevanceQueryImplementor} (used by {@code match}, {@code | ||
| * match_phrase}, etc.) follows the same pattern for relevance search functions that have no JVM | ||
| * execution semantics. | ||
| */ | ||
| public class DistinctCountApproxLogicalAggFunction | ||
| implements UserDefinedAggFunction<DistinctCountApproxLogicalAggFunction.MarkerAccumulator> { | ||
|
|
||
| private static final String NOT_EXECUTABLE = | ||
| "DISTINCT_COUNT_APPROX logical marker reached Enumerable execution; " | ||
| + "an engine-specific implementation must be registered or rewritten before execution."; | ||
|
|
||
| @Override | ||
| public MarkerAccumulator init() { | ||
| throw new UnsupportedOperationException(NOT_EXECUTABLE); | ||
| } | ||
|
|
||
| @Override | ||
| public MarkerAccumulator add(MarkerAccumulator acc, Object... values) { | ||
| throw new UnsupportedOperationException(NOT_EXECUTABLE); | ||
| } | ||
|
|
||
| @Override | ||
| public Object result(MarkerAccumulator accumulator) { | ||
| throw new UnsupportedOperationException(NOT_EXECUTABLE); | ||
| } | ||
|
|
||
| /** Placeholder accumulator. Never actually constructed because {@link #init()} throws. */ | ||
| public static class MarkerAccumulator implements Accumulator { | ||
| @Override | ||
| public Object value(Object... argList) { | ||
| throw new UnsupportedOperationException(NOT_EXECUTABLE); | ||
| } | ||
| } | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how many udf/udaf are skipping the JVM execution path now? How about add a new parent class for it, such as
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only a couple of functions in this category today:
Both are registered from OpenSearchExecutionEngine.registerOpenSearchFunctions():
https://github.com/opensearch-project/sql/blob/main/opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java#L329-L355
That registration only happens on the SQL/PPL V3 OpenSearchExecutionEngine path. The unified-query / analytics-engine path does not construct OpenSearchExecutionEngine; it uses UnifiedQueryPlanner to produce a Calcite RelNode via CalciteRelNodeVisitor, then passes that RelNode to the backend planner. As a result, PPLFuncImpTable cannot rely on aggExternalFunctionRegistry being populated on that path.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LantaoJin Could we hold on adding a new parent marker class? Because there is only a single UDAF case impacted.
I just checked that a NativeSpecificFunction marker would mostly only serve as a self-documenting type hint here — sandbox/analytics plugins don't depend on sql plugin's concrete classes (they dispatch on SqlOperator.getName() to their own enums via Calcite RelNode as a neutral ABI), and inside sql plugin there's no centralized hook (e.g., validator pass or RexImpTable) that would actually leverage instanceof on it today. Happy to revisit the abstraction in a follow-up once a third instance and a concrete consumer (validation pass /centralized rejection point) emerge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a blocker. LGTM