Fix scalar partial ordering comparison#6999
Conversation
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Merging this PR will improve performance by 18.36%
Performance Changes
Comparing Footnotes
|
robert3005
left a comment
There was a problem hiding this comment.
we are missing some imports here
0f6645b to
24a99fc
Compare
|
lets run benchmarks first just to check for regressions |
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 1.026x ➖ datafusion / vortex-file-compressed (1.026x ➖, 0↑ 0↓)
|
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.009x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.008x ➖, 0↑ 0↓)
datafusion / parquet (1.028x ➖, 1↑ 2↓)
datafusion / arrow (1.048x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (1.018x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.022x ➖, 0↑ 0↓)
duckdb / parquet (1.018x ➖, 3↑ 3↓)
duckdb / duckdb (1.013x ➖, 1↑ 2↓)
Full attributed analysis
|
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.159x ❌, 0↑ 7↓)
datafusion / vortex-compact (1.037x ➖, 0↑ 0↓)
datafusion / parquet (1.087x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (1.129x ❌, 0↑ 6↓)
duckdb / vortex-compact (1.076x ➖, 0↑ 1↓)
duckdb / parquet (1.066x ➖, 0↑ 1↓)
Full attributed analysis
|
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.000x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.990x ➖, 2↑ 0↓)
datafusion / parquet (0.985x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (1.002x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.999x ➖, 1↑ 2↓)
duckdb / parquet (0.997x ➖, 1↑ 0↓)
duckdb / duckdb (0.987x ➖, 6↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.941x ➖, 7↑ 0↓)
datafusion / vortex-compact (0.901x ➖, 13↑ 0↓)
datafusion / parquet (0.947x ➖, 5↑ 0↓)
datafusion / arrow (0.911x ➖, 9↑ 0↓)
duckdb / vortex-file-compressed (0.887x ✅, 14↑ 0↓)
duckdb / vortex-compact (1.000x ➖, 0↑ 0↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
duckdb / duckdb (0.953x ➖, 0↑ 0↓)
Full attributed analysis
|
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
24a99fc to
d541313
Compare
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.907x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.078x ➖, 0↑ 3↓)
datafusion / parquet (0.963x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.993x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.962x ➖, 0↑ 0↓)
duckdb / parquet (0.983x ➖, 0↑ 0↓)
Full attributed analysis
|
|
test failure fixed, but that shouldnt stop the benchmarks from running |
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.713x ➖, 8↑ 0↓)
datafusion / vortex-compact (1.029x ➖, 1↑ 3↓)
datafusion / parquet (1.083x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (0.899x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.970x ➖, 0↑ 0↓)
duckdb / parquet (1.011x ➖, 0↑ 1↓)
Full attributed analysis
|
Benchmarks: Random AccessVortex (geomean): 0.864x ✅ unknown / unknown (0.960x ➖, 10↑ 4↓)
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (1.041x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.004x ➖, 0↑ 0↓)
duckdb / parquet (1.008x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.911x ➖, 23↑ 1↓)
datafusion / parquet (0.907x ➖, 20↑ 0↓)
duckdb / vortex-file-compressed (0.921x ➖, 14↑ 0↓)
duckdb / parquet (0.942x ➖, 4↑ 0↓)
duckdb / duckdb (0.978x ➖, 6↑ 2↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.023x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.811x ➖, 4↑ 0↓)
datafusion / parquet (1.021x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.957x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.901x ➖, 1↑ 0↓)
duckdb / parquet (0.967x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: CompressionVortex (geomean): 0.994x ➖ unknown / unknown (1.054x ➖, 6↑ 54↓)
|
## Summary When we compare `Scalar`s, we have to be careful that we do not use `<` or `>` if we do not know that the scalars have different types. If they have different types, the result will not panic or raise an error, it will just return `false` for `>`. This changes the `scalar_cmp` function inside the comparison execution to use the `partial_cmp` method directly, where we map the optional result to an error and raise it. ## Testing Adds 2 regression tests, the first one is the one I stumbled on and the second is a more targeted one ## Unresolved Questions - Why is this function public? - Are there any other places that we do scalar comparison with the built-in `>` operators? If so, we may need to fix them. --------- Signed-off-by: Connor Tsui <connor.tsui20@gmail.com> Co-authored-by: Joe Isaacs <joe.isaacs@live.co.uk>
Summary
When we compare
Scalars, we have to be careful that we do not use<or>if we do not know that the scalars have different types. If they have different types, the result will not panic or raise an error, it will just returnfalsefor>.This changes the
scalar_cmpfunction inside the comparison execution to use thepartial_cmpmethod directly, where we map the optional result to an error and raise it.Testing
Adds 2 regression tests, the first one is the one I stumbled on and the second is a more targeted one
Unresolved Questions
>operators? If so, we may need to fix them.