feat: Add plan conversion statistics to extended explain info#2412
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2412 +/- ##
============================================
+ Coverage 56.12% 58.54% +2.41%
- Complexity 976 1445 +469
============================================
Files 119 146 +27
Lines 11743 13534 +1791
Branches 2251 2356 +105
============================================
+ Hits 6591 7923 +1332
- Misses 4012 4379 +367
- Partials 1140 1232 +92 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Subset of output from benchmarks |
| --conf spark.comet.explain.enabled=true \ | ||
| --conf spark.memory.offHeap.enabled=true \ | ||
| --conf spark.memory.offHeap.size=16g | ||
| --conf spark.memory.offHeap.size=2g |
There was a problem hiding this comment.
Unrelated change, but we don't need 16g for this simple example
|
|
||
| withSQLConf(CometConf.COMET_EXPLAIN_VERBOSE_ENABLED.key -> "true") { | ||
| val extendedExplain = new ExtendedExplainInfo().generateExtendedInfo(cometPlan) | ||
| assert(extendedExplain.contains("Comet accelerated 33% of eligible operators")) |
There was a problem hiding this comment.
would be this number fluctuating?
There was a problem hiding this comment.
It is currently stable across all Spark versions that we test with.
There was a problem hiding this comment.
I was wrong. There is a failure due to a different percentage. I will make the test less specific.
- DPP fallback *** FAILED *** (1 second, 553 milliseconds)
"BroadcastHashJoin
:- ColumnarToRow
: +- Scan parquet [COMET: Dynamic Partition Pruning is not supported]
: +- SubqueryBroadcast
: +- BroadcastExchange
: +- CometColumnarToRow
: +- CometFilter
: +- CometScan [native_iceberg_compat] parquet
+- BroadcastExchange
+- CometColumnarToRow
+- CometFilter
+- CometScan [native_iceberg_compat] parquet
Comet accelerated 4 out of 9 eligible operators (44%). Final plan contains 3 transitions between Spark and Comet." did not contain "Comet accelerated 33% of eligible operators" (CometExecSuite.scala:124)
From the user perspective it might be not very clear what is accelerated? which is not the case? |
Thanks, I will add documentation as part of this PR |
| var transitions: Int = 0 | ||
|
|
||
| override def toString: String = { | ||
| s"sparkOperators=$sparkOperators, cometOperators=$cometOperators, " + |
There was a problem hiding this comment.
Perhaps we could use a more verbose string here so that the meaning of these stats is a little more obvious
|
We could potentially add a method to get the plan stats independent of the explain string too. I can see this being useful in writing some tool to analyse queries/plans without executing them. |
|
Thanks for the reviews @comphead and @parthchandra. Could you take another look? |
parthchandra
left a comment
There was a problem hiding this comment.
lgtm. Minor improvement suggested. Also, is the comment re the unit test addressed?
Co-authored-by: Parth Chandra <parthc@apache.org>
|
Thanks for the reviews @parthchandra @comphead. I have addressed feedback and plan on merging this later today, unless you have any other feedback |
Which issue does this PR close?
N/A
Rationale for this change
I would like to make it easier to see what % of operators are accelerated by Comet.
What changes are included in this PR?
Add some basic statistics to the "extended explain" output to show % of operators accelerated by Comet.
Example output:
How are these changes tested?
Manually