
Add pluggable filter predicate registry for custom query semantics #18129

Closed
xiangfu0 wants to merge 9 commits into apache:master from xiangfu0:pluggable-filter-registry

Conversation

xiangfu0 (Contributor) commented Apr 8, 2026

Summary

Introduces a plugin framework (FilterPredicatePlugin + CustomFilterOperatorFactory) that allows registering custom filter predicates (e.g., SEMANTIC_MATCH) via ServiceLoader without modifying Pinot core code.

Motivation

Adding a new filter predicate today requires touching 8 hardcoded points across 4 modules (enums, switch statements, if-else chains). This one-time change adds registry-based extension points so future predicates can be added purely via plugin JARs on the classpath.

New SPI Interfaces

FilterPredicatePlugin (pinot-common)

Handles the query parsing pipeline:

  • name() — predicate name (e.g., "SEMANTIC_MATCH")
  • validateFilterExpression() — SQL validation (CalciteSqlParser)
  • rewriteExpression() — predicate rewriting (PredicateComparisonRewriter)
  • getOperandTypes() / getOptionalOperandIndices() — Calcite operator registration
  • createPredicate() — convert parsed operands to a Predicate object
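As a rough illustration of how the operand metadata could drive Calcite operator registration, the sketch below derives the accepted operand count from `getOperandTypes()` and `getOptionalOperandIndices()`. `OperandSpec` and its methods are hypothetical, the `OperandType` enum here is a stand-in, and the rule that optional operands must be trailing is an assumption. The PR text does not say how `PinotOperatorTable` actually consumes these values.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the plugin's OperandType enum.
enum OperandType { STRING, INTEGER, DOUBLE }

// Hypothetical helper: turns plugin operand metadata into an accepted arity range.
final class OperandSpec {
  final List<OperandType> types;
  final Set<Integer> optional;

  OperandSpec(List<OperandType> types, List<Integer> optionalIndices) {
    this.types = List.copyOf(types);
    this.optional = new TreeSet<>(optionalIndices);
    for (int i : optional) {
      if (i < 0 || i >= types.size()) {
        throw new IllegalArgumentException("Optional index out of range: " + i);
      }
    }
    // Assumption: optional operands form a contiguous suffix, so a call may drop
    // the last k operands but never one in the middle.
    int firstOptional = types.size() - optional.size();
    for (int i : optional) {
      if (i < firstOptional) {
        throw new IllegalArgumentException("Optional operands must be trailing: " + i);
      }
    }
  }

  int minArgs() {
    return types.size() - optional.size();
  }

  int maxArgs() {
    return types.size();
  }

  boolean accepts(int numOperands) {
    return numOperands >= minArgs() && numOperands <= maxArgs();
  }
}
```

For the SEMANTIC_MATCH example (STRING, STRING, INTEGER with index 2 optional), this accepts two or three operands and rejects one or four.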

CustomFilterOperatorFactory (pinot-core)

Handles query execution:

  • predicateName() — matches the predicate name
  • createFilterOperator() — creates the filter operator for a segment
  • createFilterOperator(..., hasMetadataFilter) — metadata-filter-aware variant for AND queries
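The PR's commit notes describe the `hasMetadataFilter` variant as an overload whose default implementation delegates to the basic method, so factories that don't care about the flag keep compiling. A minimal sketch of that pattern follows. `Segment`, `FilterOperator`, `OperatorFactory` and both factories are hypothetical placeholders for Pinot's `IndexSegment`, `BaseFilterOperator` and the real SPI, and the "pre-filtered scan" strategy is purely illustrative.

```java
// Hypothetical placeholder for Pinot's IndexSegment.
final class Segment {
  final String name;
  Segment(String name) { this.name = name; }
}

// Hypothetical placeholder for BaseFilterOperator.
final class FilterOperator {
  final String description;
  FilterOperator(String description) { this.description = description; }
}

// Simplified sketch of the factory SPI. The real method also takes a QueryContext,
// the Predicate, a @Nullable DataSource (null for function-LHS predicates) and numDocs.
interface OperatorFactory {
  String predicateName();

  FilterOperator createFilterOperator(Segment segment, Object predicate);

  // Metadata-filter-aware overload: by default it ignores the flag and delegates.
  default FilterOperator createFilterOperator(Segment segment, Object predicate,
      boolean hasMetadataFilter) {
    return createFilterOperator(segment, predicate);
  }
}

// Implements only the basic method; the overload falls back to it.
final class BasicFactory implements OperatorFactory {
  @Override public String predicateName() { return "SEMANTIC_MATCH"; }

  @Override public FilterOperator createFilterOperator(Segment segment, Object predicate) {
    return new FilterOperator("scan " + segment.name);
  }
}

// Overrides the overload to pick a different strategy when a metadata filter is present.
final class MetadataAwareFactory implements OperatorFactory {
  @Override public String predicateName() { return "SEMANTIC_MATCH"; }

  @Override public FilterOperator createFilterOperator(Segment segment, Object predicate) {
    return createFilterOperator(segment, predicate, false);
  }

  @Override public FilterOperator createFilterOperator(Segment segment, Object predicate,
      boolean hasMetadataFilter) {
    return new FilterOperator((hasMetadataFilter ? "pre-filtered scan " : "scan ") + segment.name);
  }
}
```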

CustomPredicate (pinot-common)

Base class for plugin predicates. Returns Predicate.Type.CUSTOM and carries a customTypeName for factory lookup at execution time.
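A minimal sketch of how a CUSTOM sentinel plus `customTypeName` could route a predicate to its factory at execution time, including the `instanceof` check before the cast that a later commit adds to `FilterPlanNode`. `PredicateType`, `SimplePredicate`, `CustomDispatch` and the map of functions standing in for operator factories are hypothetical simplifications, not the PR's classes.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified mirror of Predicate.Type with the CUSTOM sentinel.
enum PredicateType { EQ, RANGE, CUSTOM }

// Hypothetical stand-in for Pinot's Predicate.
abstract class SimplePredicate {
  abstract PredicateType getType();
}

// Every plugin predicate reports CUSTOM and carries the name used to find its factory.
abstract class CustomPredicate extends SimplePredicate {
  private final String customTypeName;

  protected CustomPredicate(String customTypeName) {
    this.customTypeName = customTypeName.toUpperCase(Locale.ROOT);
  }

  @Override
  final PredicateType getType() {
    return PredicateType.CUSTOM;
  }

  String getCustomTypeName() {
    return customTypeName;
  }
}

final class SemanticMatchPredicate extends CustomPredicate {
  final String column;
  final String text;

  SemanticMatchPredicate(String column, String text) {
    super("SEMANTIC_MATCH");
    this.column = column;
    this.text = text;
  }
}

final class CustomDispatch {
  // Check instanceof before casting so a mis-declared CUSTOM predicate fails with a
  // clear message, then resolve the factory by customTypeName.
  static String plan(SimplePredicate p, Map<String, Function<CustomPredicate, String>> factories) {
    if (p.getType() != PredicateType.CUSTOM) {
      return "built-in " + p.getType();
    }
    if (!(p instanceof CustomPredicate)) {
      throw new IllegalStateException("CUSTOM predicate must extend CustomPredicate: " + p.getClass());
    }
    CustomPredicate custom = (CustomPredicate) p;
    Function<CustomPredicate, String> factory = factories.get(custom.getCustomTypeName());
    if (factory == null) {
      throw new IllegalStateException("No operator factory registered for " + custom.getCustomTypeName());
    }
    return factory.apply(custom);
  }
}
```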

New Files

| File | Module | Purpose |
|---|---|---|
| FilterPredicatePlugin.java | pinot-common | SPI: validation, rewriting, operand types, predicate creation |
| FilterPredicateRegistry.java | pinot-common | ServiceLoader + programmatic registration |
| CustomPredicate.java | pinot-common | Base class for plugin predicates |
| CustomFilterOperatorFactory.java | pinot-core | SPI: filter operator creation |
| CustomFilterOperatorRegistry.java | pinot-core | ServiceLoader + programmatic registration |
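A minimal sketch of the registry pattern both registries share, combining the details given here and in the commit notes: ServiceLoader discovery, programmatic registration, `Locale.ROOT`-normalized keys in a `ConcurrentHashMap`, an `AtomicBoolean` so repeated `init()` calls are no-ops, and a single `get()` per lookup. `PluginRegistry` and `NamedPlugin` are hypothetical names, and rejecting duplicate names is an assumption the PR does not state.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal SPI: anything with a predicate name.
interface NamedPlugin {
  String name();
}

final class PluginRegistry<T extends NamedPlugin> {
  private final Class<T> spi;
  private final Map<String, T> plugins = new ConcurrentHashMap<>();
  private final AtomicBoolean initialized = new AtomicBoolean(false);

  PluginRegistry(Class<T> spi) {
    this.spi = spi;
  }

  // Only the first call scans META-INF/services; later calls return immediately.
  void init() {
    if (initialized.compareAndSet(false, true)) {
      for (T plugin : ServiceLoader.load(spi)) {
        register(plugin);
      }
    }
  }

  // Programmatic registration, e.g. for tests or embedded use.
  void register(T plugin) {
    T previous = plugins.putIfAbsent(key(plugin.name()), plugin);
    if (previous != null && previous != plugin) {
      throw new IllegalStateException("Duplicate plugin name: " + plugin.name());
    }
  }

  // One get() per lookup; a null result means "not a plugin predicate".
  T get(String name) {
    return plugins.get(key(name));
  }

  Collection<T> all() {
    return plugins.values();
  }

  // Locale.ROOT keeps normalization independent of the JVM default locale.
  private static String key(String name) {
    return name.toUpperCase(Locale.ROOT);
  }
}
```

Because `init()` is idempotent, each starter could call it unconditionally without coordinating with the others.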

Extension Points Modified

| File | Change |
|---|---|
| Predicate.Type | Added CUSTOM sentinel |
| CalciteSqlParser.validateFilter() | Registry check before default branch |
| PredicateComparisonRewriter | Registry check before boolean-conversion fallback |
| RequestContextUtils.getFilterInner() | Registry check before EnumUtils fallback |
| FilterPlanNode | CUSTOM case + function-LHS handling |
| PinotOperatorTable | registerCustomFilterPredicates() from registry |
| BaseBrokerStarter | FilterPredicateRegistry.init() at startup |
| BaseServerStarter | Both registries init at startup |
| BaseControllerStarter | FilterPredicateRegistry.init() at startup |
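Most of these changes follow one ordering: built-in names keep their existing hardcoded handling, and only names that fall through are looked up in the registry before the original error path. A sketch of that ordering for validation; `FilterValidator`, `ValidatingPlugin` and the four-name built-in set are hypothetical simplifications of `CalciteSqlParser.validateFilter()`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical plugin hook: validates operand count, throws on bad input.
interface ValidatingPlugin {
  void validate(int numOperands);
}

final class FilterValidator {
  // Simplified stand-in for the existing hardcoded cases.
  private static final Set<String> BUILT_INS = Set.of("EQUALS", "RANGE", "LIKE", "IN");
  private final Map<String, ValidatingPlugin> registry;

  FilterValidator(Map<String, ValidatingPlugin> registry) {
    this.registry = registry;
  }

  String validate(String function, int numOperands) {
    String name = function.toUpperCase(Locale.ROOT);
    if (BUILT_INS.contains(name)) {
      return "built-in";  // existing branch, never touches the registry
    }
    ValidatingPlugin plugin = registry.get(name);  // single get(), no containsKey()
    if (plugin != null) {
      plugin.validate(numOperands);
      return "plugin";
    }
    // Original default branch.
    throw new IllegalArgumentException("Unsupported filter function: " + function);
  }
}
```

Placing the registry lookup after the built-in cases is what keeps the added cost at zero for existing predicates.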

What a Plugin Looks Like

// 1. Implement FilterPredicatePlugin
public class SemanticMatchPlugin implements FilterPredicatePlugin {
    @Override public String name() { return "SEMANTIC_MATCH"; }
    @Override public List<OperandType> getOperandTypes() {
        return List.of(OperandType.STRING, OperandType.STRING, OperandType.INTEGER);
    }
    @Override public List<Integer> getOptionalOperandIndices() { return List.of(2); }
    @Override public void validateFilterExpression(List<Expression> operands) { /* ... */ }
    @Override public Predicate createPredicate(List<ExpressionContext> operands) {
        return new SemanticMatchPredicate(operands.get(0), /* ... */);
    }
}

// 2. Implement CustomFilterOperatorFactory
public class SemanticMatchOperatorFactory implements CustomFilterOperatorFactory {
    @Override public String predicateName() { return "SEMANTIC_MATCH"; }
    @Override public BaseFilterOperator createFilterOperator(
        IndexSegment segment, QueryContext ctx, Predicate pred, DataSource ds, int numDocs) {
        return new SemanticMatchFilterOperator(/* ... */);
    }
}

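// 3. Register both classes via META-INF/services (ServiceLoader auto-discovery): list each
//    implementation's fully qualified class name in
//    META-INF/services/org.apache.pinot.common.filter.FilterPredicatePlugin and
//    META-INF/services/org.apache.pinot.core.operator.filter.custom.CustomFilterOperatorFactory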

Performance Impact

No overhead for existing built-in predicates. Custom predicates add one ConcurrentHashMap.get() per query (during planning) and one per segment (during execution). Per-row filter evaluation adds no overhead.

Test plan

  • All modified modules compile cleanly
  • Checkstyle and license checks pass
  • Integration test with a sample custom predicate plugin

Follow-up PR will extract VECTOR_SIMILARITY into pinot-plugins/pinot-vector as proof-of-concept.

🤖 Generated with Claude Code


codecov-commenter commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 56.17530% with 110 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.04%. Comparing base (22b3b6f) to head (40974e9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ache/pinot/calcite/sql/fun/PinotOperatorTable.java | 8.33% | 32 Missing and 1 partial ⚠️ |
| ...org/apache/pinot/sql/parsers/CalciteSqlParser.java | 47.05% | 7 Missing and 11 partials ⚠️ |
| ...e/pinot/common/filter/FilterPredicateRegistry.java | 70.00% | 8 Missing and 4 partials ⚠️ |
| ...or/filter/custom/CustomFilterOperatorRegistry.java | 63.63% | 6 Missing and 2 partials ⚠️ |
| ...ava/org/apache/pinot/core/plan/FilterPlanNode.java | 66.66% | 3 Missing and 5 partials ⚠️ |
| ...ot/common/request/context/RequestContextUtils.java | 75.00% | 2 Missing and 2 partials ⚠️ |
| .../pinot/server/starter/helix/BaseServerStarter.java | 0.00% | 4 Missing ⚠️ |
| ...ava/org/apache/pinot/spi/plugin/PluginManager.java | 77.77% | 3 Missing and 1 partial ⚠️ |
| ...mentpruner/MultiPartitionColumnsSegmentPruner.java | 0.00% | 3 Missing ⚠️ |
| ...mentpruner/SinglePartitionColumnSegmentPruner.java | 0.00% | 3 Missing ⚠️ |

... and 6 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18129      +/-   ##
============================================
+ Coverage     63.30%   64.04%   +0.73%     
+ Complexity     1627      789     -838     
============================================
  Files          3226     3231       +5     
  Lines        196636   196839     +203     
  Branches      30401    30440      +39     
============================================
+ Hits         124490   126062    +1572     
+ Misses        62170    60699    -1471     
- Partials       9976    10078     +102     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 64.01% <56.17%> (+0.74%) ⬆️ |
| java-21 | 64.00% <56.17%> (+0.72%) ⬆️ |
| temurin | 64.04% <56.17%> (+0.73%) ⬆️ |
| unittests | 64.04% <56.17%> (+0.73%) ⬆️ |
| unittests1 | 56.26% <54.27%> (+0.99%) ⬆️ |
| unittests2 | 34.97% <22.31%> (+<0.01%) ⬆️ |
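The headline percentages can be reproduced from the coverage diff. Codecov defines coverage as hits / (hits + misses + partials), and these figures are consistent with truncating to two decimals: 124490 / 196636 is 63.3098...%, reported as 63.30%. The patch figure implies about 251 patch lines, of which 110 are uncovered and 141 are hit. `CoverageMath` is a hypothetical helper for checking that arithmetic, assuming round-down to two decimals.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class CoverageMath {
  // coverage % = hits / (hits + misses + partials) * 100, truncated to 2 decimals.
  static BigDecimal percent(long hits, long misses, long partials) {
    long lines = hits + misses + partials;
    return BigDecimal.valueOf(hits * 100)
        .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    System.out.println(percent(124490, 62170, 9976));   // base (master): 63.30
    System.out.println(percent(126062, 60699, 10078));  // head (#18129): 64.04
    // Patch: 110 uncovered lines (misses + partials) against 141 hits.
    System.out.println(percent(141, 110, 0));           // 56.17
  }
}
```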


xiangfu0 added the query (Related to query processing), extension-point (Adds or modifies an extension/SPI point), plugins (Related to the plugin system), and feature (New functionality) labels Apr 8, 2026

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a ServiceLoader-backed plugin SPI to register custom filter predicates and their execution operators, enabling new query semantics without core modifications.

Changes:

  • Introduces registries for custom filter predicate plugins and custom filter operator factories.
  • Wires registry checks into parsing/rewriting/predicate construction and adds Predicate.Type.CUSTOM.
  • Initializes registries in broker/controller/server starters and registers custom predicates into Calcite’s operator table.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.

| File | Description |
|---|---|
| pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java | Initializes predicate and operator registries on server startup |
| pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java | Registers custom predicates as Calcite SQL operators |
| pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java | Executes CUSTOM predicates via custom operator factories |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorRegistry.java | New ServiceLoader/programmatic registry for operator factories |
| pinot-core/src/main/java/org/apache/pinot/core/operator/filter/custom/CustomFilterOperatorFactory.java | New SPI for creating filter operators for custom predicates |
| pinot-controller/src/main/java/org/apache/pinot/controller/BaseControllerStarter.java | Initializes predicate registry on controller startup |
| pinot-common/src/main/java/org/apache/pinot/sql/parsers/rewriter/PredicateComparisonRewriter.java | Delegates rewriting to custom predicate plugins when applicable |
| pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java | Delegates validation to custom predicate plugins when applicable |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/Predicate.java | Adds Predicate.Type.CUSTOM sentinel |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/CustomPredicate.java | Introduces base class for custom predicates |
| pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java | Creates predicates via plugin when name matches registry |
| pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicateRegistry.java | New ServiceLoader/programmatic registry for predicate plugins |
| pinot-common/src/main/java/org/apache/pinot/common/filter/FilterPredicatePlugin.java | New SPI for validation/rewriting/operator metadata/predicate creation |
| pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java | Initializes predicate registry on broker startup |

Comment thread on pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java (outdated)
xiangfu0 force-pushed the pluggable-filter-registry branch 2 times, most recently from a83a1c0 to 45d13af, April 9, 2026 06:52
xiangfu0 (Contributor, Author) left a comment


Found a few high-signal issues; see inline comments.

xiangfu0 force-pushed the pluggable-filter-registry branch 3 times, most recently from 434247b to c9a4dcb, April 13, 2026 04:10
xiangfu0 and others added 9 commits April 13, 2026 01:43
Introduce a plugin framework that allows registering custom filter predicates
(e.g., SEMANTIC_MATCH) without modifying Pinot core code. This is a one-time
surgery to add extension points at each layer of the query pipeline:

- FilterPredicatePlugin SPI: validation, rewriting, Calcite operator metadata,
  and predicate creation
- CustomFilterOperatorFactory SPI: filter operator construction at execution time
- ServiceLoader-based discovery with programmatic registration fallback
- CUSTOM sentinel in Predicate.Type for plugin-defined predicates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Example custom filter predicate plugin showing how to use the registry:
- LikeAnyPredicate: matches rows where column matches ANY LIKE pattern
- LikeAnyPlugin: implements FilterPredicatePlugin (parsing/validation)
- LikeAnyFilterOperatorFactory: implements CustomFilterOperatorFactory
- Dictionary and raw-value evaluators using combined regex
- Unit tests covering all plugin interfaces

Usage: SELECT * FROM t WHERE LIKE_ANY(col, 'foo%', '%bar', 'A_e')

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
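The commit above mentions evaluators built on a combined regex. One way to compile LIKE_ANY's patterns into a single `java.util.regex.Pattern` is sketched below, assuming standard LIKE semantics (`%` matches any sequence, `_` exactly one character), no ESCAPE clause, and case-sensitive matching. `LikeAnyRegex` is a hypothetical name, and the PR's evaluators may translate patterns differently.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LikeAnyRegex {
  // Translate one LIKE pattern into a regex fragment, quoting literal runs.
  static String likeToRegex(String like) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (char c : like.toCharArray()) {
      if (c == '%' || c == '_') {
        if (literal.length() > 0) {
          regex.append(Pattern.quote(literal.toString()));
          literal.setLength(0);
        }
        regex.append(c == '%' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
    }
    return regex.toString();
  }

  // One alternation compiled once; use matcher(value).matches() for whole-value matching.
  static Pattern combine(List<String> likePatterns) {
    String alternation = likePatterns.stream()
        .map(p -> "(?:" + likeToRegex(p) + ")")
        .collect(Collectors.joining("|"));
    return Pattern.compile(alternation, Pattern.DOTALL);
  }
}
```

For the usage line above, `combine(List.of("foo%", "%bar", "A_e"))` matches "foobaz", "crowbar" and "Ace" but not "Axxe". A dictionary-based evaluator would run it once per dictionary entry rather than once per row.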
The META-INF/services files for FilterPredicatePlugin and
CustomFilterOperatorFactory cannot have license headers (ServiceLoader
format). Add them to both apache-rat-plugin and license-maven-plugin
exclusion lists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…single get()

- Use Locale.ROOT for toUpperCase() in both registries to avoid
  locale-sensitive behavior (e.g., Turkish locale)
- Add AtomicBoolean guard to init() in both registries to make
  repeated calls a cheap no-op
- Use single get() instead of containsKey()+get() in CalciteSqlParser
  to avoid a potential race condition
- Add instanceof CustomPredicate check before cast in FilterPlanNode
  (both function-LHS and column-LHS paths) for clear error messages
- Add hasMetadataFilter overload to CustomFilterOperatorFactory with
  default delegation to the basic method

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
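Why `Locale.ROOT` matters for the registry keys: under a Turkish locale, `String.toUpperCase` maps `i` to the dotted capital `İ` (U+0130), so a name like `like_any` would not normalize to `LIKE_ANY`. A small self-contained demonstration; `registryKey` is a hypothetical helper mirroring the fix.

```java
import java.util.Locale;

final class RegistryKeys {
  // Normalize plugin names independently of the JVM default locale.
  static String registryKey(String name) {
    return name.toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");
    // Turkish rules upper-case 'i' to U+0130 (dotted capital I), so this is not "LIKE_ANY".
    System.out.println("like_any".toUpperCase(turkish).equals("LIKE_ANY"));  // false
    System.out.println(registryKey("like_any").equals("LIKE_ANY"));          // true
  }
}
```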
- Fix LikeAnyDictionaryEvaluator.applySV: O(1) boolean[] lookup instead
  of O(k) linear scan through matching dict IDs
- Fix fully qualified java.util.ArrayList in PinotOperatorTable
- Add @Nullable annotation to dataSource param in CustomFilterOperatorFactory
  with javadoc explaining when it is null (function-based LHS predicates)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
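The dictionary-evaluator fix above replaces a scan over the k matching dictionary IDs with a `boolean[]` indexed by dictionary ID. A minimal sketch of that idea, in which the predicate is evaluated once per dictionary entry up front so each row costs one array read; `DictIdMatcher` and its constructor arguments are hypothetical.

```java
import java.util.function.IntFunction;
import java.util.function.Predicate;

final class DictIdMatcher {
  private final boolean[] matches;

  // Evaluate the (possibly expensive) value predicate once per dictionary entry.
  DictIdMatcher(int dictionarySize, IntFunction<String> dictionary, Predicate<String> valuePredicate) {
    matches = new boolean[dictionarySize];
    for (int dictId = 0; dictId < dictionarySize; dictId++) {
      matches[dictId] = valuePredicate.test(dictionary.apply(dictId));
    }
  }

  // O(1) per row: a single array read, no scan over matching IDs.
  boolean applySV(int dictId) {
    return matches[dictId];
  }

  int numMatchingDictIds() {
    int count = 0;
    for (boolean m : matches) {
      if (m) {
        count++;
      }
    }
    return count;
  }
}
```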
xiangfu0 force-pushed the pluggable-filter-registry branch from ae8c39f to 40974e9, April 13, 2026 08:44
xiangfu0 closed this Apr 14, 2026
xiangfu0 deleted the pluggable-filter-registry branch April 14, 2026 10:37

3 participants