Add pluggable filter predicate registry for custom query semantics #18129
xiangfu0 wants to merge 9 commits into apache:master
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18129 +/- ##
============================================
+ Coverage 63.30% 64.04% +0.73%
+ Complexity 1627 789 -838
============================================
Files 3226 3231 +5
Lines 196636 196839 +203
Branches 30401 30440 +39
============================================
+ Hits 124490 126062 +1572
+ Misses 62170 60699 -1471
- Partials 9976 10078 +102
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a ServiceLoader-backed plugin SPI to register custom filter predicates and their execution operators, enabling new query semantics without core modifications.
Changes:
- Introduces registries for custom filter predicate plugins and custom filter operator factories.
- Wires registry checks into parsing/rewriting/predicate construction and adds Predicate.Type.CUSTOM.
- Initializes registries in broker/controller/server starters and registers custom predicates into Calcite’s operator table.
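To make the registry idea concrete, here is a minimal sketch of a ServiceLoader-backed registry with a programmatic registration fallback. The names `PredicatePluginSketch` and `PredicateRegistrySketch` are hypothetical stand-ins, not the PR's actual classes, and only a `name()` method is modeled; the upper-case key normalization is likewise an assumption of this sketch.

```java
import java.util.Locale;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the PR's FilterPredicatePlugin SPI; only name() is modeled.
interface PredicatePluginSketch {
  String name();
}

// Hypothetical registry: ServiceLoader discovery plus programmatic registration.
final class PredicateRegistrySketch {
  private static final Map<String, PredicatePluginSketch> PLUGINS = new ConcurrentHashMap<>();

  private PredicateRegistrySketch() {
  }

  // Discovers providers listed under META-INF/services on the classpath.
  // With no provider files present, the loop body simply never runs.
  static void init() {
    for (PredicatePluginSketch plugin : ServiceLoader.load(PredicatePluginSketch.class)) {
      register(plugin);
    }
  }

  // Programmatic fallback, e.g. for tests or embedded deployments.
  static void register(PredicatePluginSketch plugin) {
    PLUGINS.put(normalize(plugin.name()), plugin);
  }

  // Returns null when no plugin is registered under the given name.
  static PredicatePluginSketch get(String name) {
    return PLUGINS.get(normalize(name));
  }

  // Assumption of this sketch: predicate names are matched case-insensitively.
  private static String normalize(String name) {
    return name.toUpperCase(Locale.ROOT);
  }
}
```

In this sketch a plugin JAR would list its implementation class in a `META-INF/services` file named after the SPI interface, while tests could call `register()` directly.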
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java | Initializes predicate and operator registries on server startup |
| pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java | Registers custom predicates as Calcite SQL operators |
| pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java | Executes CUSTOM predicates via custom operator factories |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorRegistry.java | New ServiceLoader/programmatic registry for operator factories |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorFactory.java | New SPI for creating filter operators for custom predicates |
| pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java | Initializes predicate registry on controller startup |
| pinot-common/src/main/java/org/apache/pinot/sql/parsers/rewriter/PredicateComparisonRewriter.java | Delegates rewriting to custom predicate plugins when applicable |
| pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java | Delegates validation to custom predicate plugins when applicable |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/Predicate.java | Adds Predicate.Type.CUSTOM sentinel |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/CustomPredicate.java | Introduces base class for custom predicates |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java | Creates predicates via plugin when name matches registry |
| pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicateRegistry.java | New ServiceLoader/programmatic registry for predicate plugins |
| pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicatePlugin.java | New SPI for validation/rewriting/operator metadata/predicate creation |
| pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java | Initializes predicate registry on broker startup |
a83a1c0 to 45d13af
xiangfu0 left a comment
Found a few high-signal issues; see inline comments.
434247b to c9a4dcb
Introduce a plugin framework that allows registering custom filter predicates (e.g., SEMANTIC_MATCH) without modifying Pinot core code. This is a one-time surgery to add extension points at each layer of the query pipeline:
- FilterPredicatePlugin SPI: validation, rewriting, Calcite operator metadata, and predicate creation
- CustomFilterOperatorFactory SPI: filter operator construction at execution time
- ServiceLoader-based discovery with programmatic registration fallback
- CUSTOM sentinel in Predicate.Type for plugin-defined predicates
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Example custom filter predicate plugin showing how to use the registry:
- LikeAnyPredicate: matches rows where column matches ANY LIKE pattern
- LikeAnyPlugin: implements FilterPredicatePlugin (parsing/validation)
- LikeAnyFilterOperatorFactory: implements CustomFilterOperatorFactory
- Dictionary and raw-value evaluators using combined regex
- Unit tests covering all plugin interfaces
Usage: SELECT * FROM t WHERE LIKE_ANY(col, 'foo%', '%bar', 'A_e')
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
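The "combined regex" evaluator mentioned in this commit could work roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `LikeAnySketch` and its methods are hypothetical, matching is case-sensitive, and LIKE escape characters are not handled.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: folds several SQL LIKE patterns into one regex alternation.
final class LikeAnySketch {
  private LikeAnySketch() {
  }

  // Translates one LIKE pattern: % -> .*, _ -> ., every other character is quoted literally.
  static String likeToRegex(String like) {
    StringBuilder regex = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return regex.toString();
  }

  // Joins the translated patterns with '|' so one compiled Pattern covers all of them.
  static Pattern combine(List<String> likePatterns) {
    StringBuilder combined = new StringBuilder();
    for (String like : likePatterns) {
      if (combined.length() > 0) {
        combined.append('|');
      }
      combined.append("(?:").append(likeToRegex(like)).append(')');
    }
    // DOTALL so that % and _ also match line terminators.
    return Pattern.compile(combined.toString(), Pattern.DOTALL);
  }

  // A value matches LIKE_ANY when the whole value matches at least one alternative.
  static boolean matches(Pattern combined, String value) {
    return combined.matcher(value).matches();
  }
}
```

Compiling a single pattern once per query means each value is tested with one matcher call rather than one call per LIKE pattern.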
The META-INF/services files for FilterPredicatePlugin and CustomFilterOperatorFactory cannot have license headers (ServiceLoader format). Add them to both apache-rat-plugin and license-maven-plugin exclusion lists.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…single get()
- Use Locale.ROOT for toUpperCase() in both registries to avoid locale-sensitive behavior (e.g., Turkish locale)
- Add AtomicBoolean guard to init() in both registries to make repeated calls a cheap no-op
- Use single get() instead of containsKey()+get() in CalciteSqlParser to avoid potential race condition
- Add instanceof CustomPredicate check before cast in FilterPlanNode (both function-LHS and column-LHS paths) for clear error messages
- Add hasMetadataFilter overload to CustomFilterOperatorFactory with default delegation to the basic method
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
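The first three fixes in this commit can be illustrated together. The class below is a hypothetical sketch, not the PR's registry: `GuardedRegistrySketch`, its fields, and the `initRuns` counter exist only for demonstration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical registry illustrating the init guard, Locale.ROOT keys, and single get().
final class GuardedRegistrySketch {
  private static final Map<String, Runnable> HANDLERS = new ConcurrentHashMap<>();
  private static final AtomicBoolean INITIALIZED = new AtomicBoolean(false);

  // Demonstration-only counter of how often discovery actually ran.
  static int initRuns = 0;

  private GuardedRegistrySketch() {
  }

  static void init() {
    // compareAndSet lets exactly one caller through; later calls return immediately.
    if (!INITIALIZED.compareAndSet(false, true)) {
      return;
    }
    initRuns++;
    HANDLERS.put(key("like_any"), () -> { });
  }

  // Locale.ROOT keeps the key stable regardless of the JVM's default locale.
  static String key(String name) {
    return name.toUpperCase(Locale.ROOT);
  }

  // One get() plus a null check by the caller, instead of containsKey() followed by get():
  // the entry cannot disappear between the two calls because there is only one call.
  static Runnable lookup(String name) {
    return HANDLERS.get(key(name));
  }
}
```

The locale issue is easy to reproduce: under a Turkish locale, upper-casing a lowercase `i` yields the dotted capital `İ` (U+0130), so a name such as `title` would no longer map to the key `TITLE`.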
- Fix LikeAnyDictionaryEvaluator.applySV: O(1) boolean[] lookup instead of O(k) linear scan through matching dict IDs
- Fix fully qualified java.util.ArrayList in PinotOperatorTable
- Add @Nullable annotation to dataSource param in CustomFilterOperatorFactory with Javadoc explaining when it is null (function-based LHS predicates)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
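The applySV change trades a little memory for constant-time membership checks. A minimal sketch of the idea, assuming dictionary IDs are dense integers in `[0, dictionarySize)`; the class and method names here are hypothetical and do not mirror the PR's evaluator:

```java
// Hypothetical illustration of replacing a linear scan with a boolean[] lookup table.
final class DictIdLookupSketch {
  private DictIdLookupSketch() {
  }

  // Built once per segment: lookup[id] is true when the dictionary value matched a pattern.
  static boolean[] buildLookup(int[] matchingDictIds, int dictionarySize) {
    boolean[] lookup = new boolean[dictionarySize];
    for (int dictId : matchingDictIds) {
      lookup[dictId] = true;
    }
    return lookup;
  }

  // Baseline: O(k) per row, where k is the number of matching dictionary IDs.
  static boolean linearScan(int[] matchingDictIds, int dictId) {
    for (int candidate : matchingDictIds) {
      if (candidate == dictId) {
        return true;
      }
    }
    return false;
  }

  // Replacement: O(1) per row, a single array read.
  static boolean applySV(boolean[] lookup, int dictId) {
    return lookup[dictId];
  }
}
```

The table costs one boolean per dictionary entry, which is usually small next to the per-row savings on large segments.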
ae8c39f to 40974e9
Summary
Introduces a plugin framework (FilterPredicatePlugin + CustomFilterOperatorFactory) that allows registering custom filter predicates (e.g., SEMANTIC_MATCH) via ServiceLoader without modifying Pinot core code.
Motivation
Adding a new filter predicate today requires touching 8 hardcoded points across 4 modules (enums, switch statements, if-else chains). This one-time change adds registry-based extension points so future predicates can be added purely via plugin JARs on the classpath.
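The shift from hardcoded branches to a registry-based extension point can be sketched as a single CUSTOM case that delegates to a factory map. Everything below is hypothetical: `DispatchSketch`, its nested types, and the operator names returned as strings are illustrative stand-ins, not Pinot's classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical planner fragment: built-ins stay in the switch, plugins go through a map.
final class DispatchSketch {
  enum Type { EQ, CUSTOM }

  static final class Predicate {
    final Type type;
    final String customTypeName; // null for built-in predicates

    Predicate(Type type, String customTypeName) {
      this.type = type;
      this.customTypeName = customTypeName;
    }
  }

  // Factories keyed by custom type name; the String result stands in for a filter operator.
  static final Map<String, Function<Predicate, String>> FACTORIES = new ConcurrentHashMap<>();

  private DispatchSketch() {
  }

  static String plan(Predicate predicate) {
    switch (predicate.type) {
      case EQ:
        return "EqualsFilterOperator";
      case CUSTOM:
        // One map lookup replaces a new enum constant and switch branch per predicate.
        Function<Predicate, String> factory = FACTORIES.get(predicate.customTypeName);
        if (factory == null) {
          throw new IllegalStateException(
              "No factory registered for custom predicate: " + predicate.customTypeName);
        }
        return factory.apply(predicate);
      default:
        throw new IllegalStateException("Unsupported predicate type: " + predicate.type);
    }
  }
}
```

In this sketch, adding a new predicate means registering one more factory entry; the switch itself never changes again.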
New SPI Interfaces
FilterPredicatePlugin (pinot-common)
Handles the query parsing pipeline:
- name(): predicate name (e.g., "SEMANTIC_MATCH")
- validateFilterExpression(): SQL validation (CalciteSqlParser)
- rewriteExpression(): predicate rewriting (PredicateComparisonRewriter)
- getOperandTypes() / getOptionalOperandIndices(): Calcite operator registration
- createPredicate(): convert parsed operands to a Predicate object
CustomFilterOperatorFactory (pinot-core)
Handles query execution:
- predicateName(): matches the predicate name
- createFilterOperator(): creates the filter operator for a segment
- createFilterOperator(..., hasMetadataFilter): metadata-filter-aware variant for AND queries
CustomPredicate (pinot-common)
Base class for plugin predicates. Returns Predicate.Type.CUSTOM and carries a customTypeName for factory lookup at execution time.
New Files
- FilterPredicatePlugin.java
- FilterPredicateRegistry.java
- CustomPredicate.java
- CustomFilterOperatorFactory.java
- CustomFilterOperatorRegistry.java
Extension Points Modified
- Predicate.Type: CUSTOM sentinel
- CalciteSqlParser.validateFilter()
- PredicateComparisonRewriter
- RequestContextUtils.getFilterInner()
- FilterPlanNode: CUSTOM case + function-LHS handling
- PinotOperatorTable: registerCustomFilterPredicates() from registry
- BaseBrokerStarter: FilterPredicateRegistry.init() at startup
- BaseServerStarter
- BaseControllerStarter: FilterPredicateRegistry.init() at startup
What a Plugin Looks Like
Performance Impact
Zero for existing built-in predicates. Custom predicates add one ConcurrentHashMap.get() per query (planning) and per segment (execution). Per-row filter execution has zero overhead.
Test plan
Follow-up PR will extract VECTOR_SIMILARITY into pinot-plugins/pinot-vector as proof-of-concept.
🤖 Generated with Claude Code