[GLUTEN-12143][VL] Route bitmap_construct_agg to native Velox execution#12142
Conversation
Register bitmap_construct_agg as a supported aggregate expression in the Velox backend, allowing it to be executed natively instead of falling back to vanilla Spark. Changes: - Add BITMAP_CONSTRUCT_AGG constant to ExpressionNames - Register Sig[BitmapConstructAgg] in Spark 3.5, 4.0, and 4.1 shims - Add bitmap_construct_agg to C++ plan validator allowed list Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
weiting-chen
left a comment
There was a problem hiding this comment.
Thanks for the clean implementation — the three-layer plumbing (constant + shim registration + C++ validator) follows the established pattern correctly.
Two suggestions for consideration (non-blocking):
| Sig[RegrSXY](ExpressionNames.REGR_SXY), | ||
| Sig[RegrReplacement](ExpressionNames.REGR_REPLACEMENT) | ||
| Sig[RegrReplacement](ExpressionNames.REGR_REPLACEMENT), | ||
| Sig[BitmapConstructAgg](ExpressionNames.BITMAP_CONSTRUCT_AGG) |
There was a problem hiding this comment.
Consider adding plan-shape assertion in tests
Problem: GlutenBitmapExpressionsQuerySuite inherits upstream Spark tests that only use checkAnswer() — they verify correctness but do not assert that native Velox execution is actually active. A regression that silently disables offloading would go undetected since vanilla Spark also produces correct results.
Evidence:
class GlutenBitmapExpressionsQuerySuite
extends BitmapExpressionsQuerySuite
with GlutenSQLTestsTrait {}
// All upstream tests only call checkAnswer(df, expected)Suggested Fix: Add at least one Gluten-specific test asserting the native transformer is present:
test("bitmap_construct_agg routes to native") {
val df = spark.sql("SELECT bitmap_construct_agg(col) FROM values (1L), (2L) AS t(col)")
val plan = df.queryExecution.executedPlan
assert(collect(plan) { case h: HashAggregateExecTransformer => h }.nonEmpty,
"Expected native HashAggregateExecTransformer in plan")
}There was a problem hiding this comment.
Added in all spark version. Thanks @weiting-chen
| "regr_sxy", | ||
| "regr_replacement"}; | ||
| "regr_replacement", | ||
| "bitmap_construct_agg"}; |
There was a problem hiding this comment.
Follow-up opportunity: register bitmap_or_agg (and bitmap_and_agg for Spark 4.1)
Problem: bitmap_or_agg was introduced in the same Spark 3.5 commit as bitmap_construct_agg, and bitmap_and_agg in Spark 4.1. If native Velox implementations exist for these, registering them together would avoid unnecessary columnar-to-row transitions when users combine bitmap functions in the same query stage.
Investigation Needed: Confirm whether bitmap_or_agg and bitmap_and_agg have native Velox implementations (look for registration in cpp/velox/udf/ or the Velox aggregate function registry). If yes, consider adding them in a follow-up PR:
"bitmap_construct_agg",
"bitmap_or_agg",
"bitmap_and_agg"};There was a problem hiding this comment.
I've Velox PR for bitmap_or_agg in review phase. Will add follow up PR for Gluten once Velox PR merges. Thanks!
Adds a test verifying that bitmap_construct_agg routes to native Velox execution (HashAggregateExecBaseTransformer) rather than falling back to vanilla Spark. Test added for Spark 3.5, 4.0, and 4.1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
|
+1. Three-layer plumbing ( Grounded the plan-shape test guard: |
bitmap_construct_agg offloaded to Velox throws GlutenException instead of SparkArrayIndexOutOfBoundsException. Exclude these error-path tests following the established pattern for native execution error type mismatches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Following up on my earlier +1 — looked deeper after the ClickHouse CI failures and I think the registration is at the wrong layer. CH CI is PR-tied (not infra flake). 5 tests fail in Root cause: Sibling pattern —
Suggest moving That should keep CH on the vanilla fallback path and let Velox continue to route natively. (Sorry for the late catch — my +1 cited |
…pings Move Sig[BitmapConstructAgg] from SparkXShims.aggregateExpressionMappings (all-backend) to VeloxSparkPlanExecApi.extraExpressionMappings (Velox-only). This prevents ClickHouse backend from attempting to push down bitmap_construct_agg, which it does not support. Follows the same pattern as BloomFilterAgg registration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
…ionMappings BitmapConstructAgg only exists in Spark 3.5+. Use runtime class loading with scala.util.Try to gracefully handle Spark 3.3/3.4 where the class is absent. This keeps the registration Velox-only (CH unaffected) and avoids compilation failures on older Spark versions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
| // For test purpose. | ||
| Sig[VeloxDummyExpression](VeloxDummyExpression.VELOX_DUMMY_EXPRESSION) | ||
| ) | ||
| ) ++ scala.util.Try( |
There was a problem hiding this comment.
This reflection-based registration is a workaround, not the right long-term shape — let's move it to the shims.
There was a problem hiding this comment.
Done. Moved to Shim and put it in blacklist on CH backend.
|
The CH CI failure ( |
Move Sig[BitmapConstructAgg] back to shims/aggregateExpressionMappings (spark35/40/41) following the established pattern for version-specific expressions. Add BITMAP_CONSTRUCT_AGG -> DefaultValidator() to CH_AGGREGATE_FUNC_BLACKLIST so ClickHouse backend falls back to vanilla Spark instead of attempting unsupported native push-down. Remove the Class.forName reflection workaround from VeloxSparkPlanExecApi. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
The plan-shape assertion test verifies native Velox execution, which is not applicable to the ClickHouse backend. Exclude it using .excludeCH() following the same pattern as GlutenBloomFilterAggregateQuerySuite. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Run Gluten Clickhouse CI on x86 |
I discussed with @baibaichen and he suggested to put this function in blacklist at CH backend and put it in SHIM layers only for Velox. Updated the code. Thanks @yaooqinn for the review. |
Done, addressed. Please help with review @baibaichen |
|
LGTM |
| .exclude("column pruning - non-readable file") | ||
| enableSuite[GlutenBitmapExpressionsQuerySuite] | ||
| // bitmap_construct_agg is not supported natively in CH backend. | ||
| .excludeCH("bitmap_construct_agg routes to native") |
There was a problem hiding this comment.
This one checks the native plan path, specifically added to test the native plan. Since for CH backend it runs on spark, this test fail in CH backend.
Route
bitmap_construct_aggto Native Velox ExecutionThis PR enables native Velox execution for the
bitmap_construct_aggaggregate function, avoiding fallback to vanilla Spark.Changes
ExpressionNames.scala): AddedBITMAP_CONSTRUCT_AGGconstant.Spark35Shims.scala,Spark40Shims.scala,Spark41Shims.scala): RegisterSig[BitmapConstructAgg]mapping to the substrait function name.SubstraitToVeloxPlanValidator.cc): Added"bitmap_construct_agg"to thesupportedAggFuncsset so the native validator accepts this function.GlutenBitmapExpressionsQuerySuite.scalafor spark35/40/41): Verifies the query plan containsHashAggregateExecBaseTransformer(native execution) rather than falling back.Test Results
All tests include the new plan-shape assertion test confirming native Velox execution.
Related issue: #12143