Closed
Changes from 8 commits
21 commits
ca01860
add crypto dependencies and implement crypto_utils with tests
Shekharrajak Jan 17, 2026
7748b8a
implement ECB, CBC, and GCM cipher modes with tests
Shekharrajak Jan 17, 2026
20a7bbb
implement aes_encrypt with vectorized processing and tests
Shekharrajak Jan 17, 2026
92ce310
register aes_encrypt function in scalar function registry
Shekharrajak Jan 17, 2026
8147ebc
add scala serde layer for aes_encrypt expression
Shekharrajak Jan 17, 2026
d4d53d7
add encryption benchmark for aes_encrypt performance testing
Shekharrajak Jan 17, 2026
6c658ff
fix aes_encrypt serde to handle StaticInvoke properly
Shekharrajak Jan 18, 2026
6a9d2f8
fix: pass return type in scalar function serialization
Shekharrajak Jan 18, 2026
bd897f9
test: add CometStaticInvokeSuite for aes_encrypt
Shekharrajak Jan 18, 2026
b0e52a8
fix: address clippy warnings in encryption code
Shekharrajak Jan 18, 2026
ec407ff
Delete benchmarks/CometDPPBenchmark-jdk17-results.txt
Shekharrajak Jan 19, 2026
52dea17
fix: address clippy warnings in encryption code
Shekharrajak Jan 18, 2026
da932ca
fix: remove invalid AesEncrypt.enabled config entry
Shekharrajak Jan 20, 2026
f3ca5f2
fix: update auto-generated compatibility docs formatting
Shekharrajak Jan 20, 2026
86a2841
Merge upstream/main into feature/aes-encrypt-support
Shekharrajak Jan 29, 2026
2401225
docs: update compatibility and config tables
Shekharrajak Jan 29, 2026
65c5a61
Generate and format compatibility docs after merge
Shekharrajak Jan 31, 2026
538068a
Fix Miri undefined behavior in CBC/ECB encryption
Shekharrajak Feb 6, 2026
3ff9e6b
Handle Array inputs in aes_encrypt scalar path
Shekharrajak Feb 6, 2026
997315c
Remove debug output from scalar function fallback
Shekharrajak Feb 6, 2026
071ad2a
Update encryption tests to use deterministic modes
Shekharrajak Feb 6, 2026
64 changes: 64 additions & 0 deletions benchmarks/CometDPPBenchmark-jdk17-results.txt
@@ -0,0 +1,64 @@

================================================================================
DPP Row Reduction Analysis (selectivity=0.01)
================================================================================
Fact rows: 5242880 | Dim rows: 10000 (filtered to 100)
Expected with DPP: ~52428 rows

Implementation                           numOutputRows          Reduction Factor
--------------------------------------------------------------------------------
Spark (baseline)                            15,781,140                      1.0x
Comet (auto scan)                            5,295,380                      3.0x
Comet (native_datafusion + DPP)                 52,500                      301x
================================================================================

Key Metrics:
- I/O Reduction: 301x fewer rows scanned with DPP
- Row Reduction: 15728640 fewer rows processed
- Selectivity Impact: 1.0%% of data actually needed
================================================================================


================================================================================
DPP Row Reduction Analysis (selectivity=0.1)
================================================================================
Fact rows: 5242880 | Dim rows: 10000 (filtered to 1000)
Expected with DPP: ~524288 rows

Implementation                           numOutputRows          Reduction Factor
--------------------------------------------------------------------------------
Spark (baseline)                            16,253,640                      1.0x
Comet (auto scan)                            5,767,880                      2.8x
Comet (native_datafusion + DPP)                525,000                       31x
================================================================================

Key Metrics:
- I/O Reduction: 31x fewer rows scanned with DPP
- Row Reduction: 15728640 fewer rows processed
- Selectivity Impact: 10.0%% of data actually needed
================================================================================

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.1
Apple M4 Max
DPP Join (fact=5242880, dim=10000, selectivity=0.01):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------------------------------------
Spark (baseline)                                                  64            73         11       81.5         12.3      1.0X
Comet (auto scan)                                                 64            71          7       81.4         12.3      1.0X
Comet (native_datafusion + DPP)                                   55            59          3       95.1         10.5      1.2X

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.1
Apple M4 Max
DPP Join (fact=5242880, dim=10000, selectivity=0.1):   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------------------------------------
Spark (baseline)                                                  65            69          4       80.7         12.4      1.0X
Comet (auto scan)                                                 62            66          3       83.9         11.9      1.0X
Comet (native_datafusion + DPP)                                   56            59          3       94.4         10.6      1.2X

OpenJDK 64-Bit Server VM 17.0.13+11 on Mac OS X 26.1
Apple M4 Max
DPP Join (fact=5242880, dim=10000, selectivity=0.5):   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------------------------------------
Spark (baseline)                                                  75            78          3       69.7         14.3      1.0X
Comet (auto scan)                                                 73            77          3       72.2         13.8      1.0X
Comet (native_datafusion + DPP)                                   66            71          5       79.4         12.6      1.1X
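
The reduction factors and the "Row Reduction" figures in the analysis blocks above appear to follow from simple arithmetic on the reported `numOutputRows` values. The sketch below is a minimal illustration, assuming the factor is baseline rows divided by implementation rows and the expected DPP row count is fact rows multiplied by selectivity. The function names are hypothetical and are not taken from the benchmark code.

```rust
/// Hypothetical helper: rows expected to survive DPP,
/// assuming fact rows scale linearly with selectivity (truncated).
fn expected_rows_with_dpp(fact_rows: u64, selectivity: f64) -> u64 {
    (fact_rows as f64 * selectivity) as u64
}

/// Hypothetical helper: how many times fewer rows an implementation
/// outputs compared with the baseline.
fn reduction_factor(baseline_rows: u64, impl_rows: u64) -> f64 {
    baseline_rows as f64 / impl_rows as f64
}

/// Hypothetical helper: absolute number of rows avoided.
fn rows_saved(baseline_rows: u64, impl_rows: u64) -> u64 {
    baseline_rows.saturating_sub(impl_rows)
}
```

With the selectivity=0.01 numbers, `reduction_factor(15_781_140, 52_500)` is about 300.6, consistent with the reported 301x, and `rows_saved(15_781_140, 52_500)` gives the reported 15728640.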

9 changes: 3 additions & 6 deletions docs/source/user-guide/latest/compatibility.md
@@ -105,15 +105,14 @@ Cast operations in Comet fall into three levels of support:
<!-- prettier-ignore-end -->

**Notes:**

- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
- **double -> decimal**: There can be rounding differences
- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
- **float -> decimal**: There can be rounding differences
- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
- **string -> date**: Only supports years between 262143 BC and 262142 AD
- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
or strings containing null bytes (e.g \\u0000)
or strings containing null bytes (e.g \\u0000)
- **string -> timestamp**: Not all valid formats are supported
<!--END:CAST_LEGACY_TABLE-->

@@ -140,15 +139,14 @@ Cast operations in Comet fall into three levels of support:
<!-- prettier-ignore-end -->

**Notes:**

- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
- **double -> decimal**: There can be rounding differences
- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
- **float -> decimal**: There can be rounding differences
- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
- **string -> date**: Only supports years between 262143 BC and 262142 AD
- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
or strings containing null bytes (e.g \\u0000)
or strings containing null bytes (e.g \\u0000)
- **string -> timestamp**: Not all valid formats are supported
<!--END:CAST_TRY_TABLE-->

@@ -175,15 +173,14 @@ Cast operations in Comet fall into three levels of support:
<!-- prettier-ignore-end -->

**Notes:**

- **decimal -> string**: There can be formatting differences in some case due to Spark using scientific notation where Comet does not
- **double -> decimal**: There can be rounding differences
- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
- **float -> decimal**: There can be rounding differences
- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
- **string -> date**: Only supports years between 262143 BC and 262142 AD
- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
or strings containing null bytes (e.g \\u0000)
or strings containing null bytes (e.g \\u0000)
- **string -> timestamp**: ANSI mode not supported
<!--END:CAST_ANSI_TABLE-->

1 change: 1 addition & 0 deletions docs/source/user-guide/latest/configs.md
@@ -189,6 +189,7 @@ These settings can be used to determine which parts of the plan are accelerated
| `spark.comet.expression.Abs.enabled` | Enable Comet acceleration for `Abs` | true |
| `spark.comet.expression.Acos.enabled` | Enable Comet acceleration for `Acos` | true |
| `spark.comet.expression.Add.enabled` | Enable Comet acceleration for `Add` | true |
| `spark.comet.expression.AesEncrypt.enabled` | Enable Comet acceleration for `AesEncrypt` | true |
| `spark.comet.expression.Alias.enabled` | Enable Comet acceleration for `Alias` | true |
| `spark.comet.expression.And.enabled` | Enable Comet acceleration for `And` | true |
| `spark.comet.expression.ArrayAppend.enabled` | Enable Comet acceleration for `ArrayAppend` | true |
135 changes: 135 additions & 0 deletions native/Cargo.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions native/spark-expr/Cargo.toml
@@ -40,6 +40,11 @@ twox-hash = "2.1.2"
rand = { workspace = true }
hex = "0.4.3"
base64 = "0.22.1"
aes = "0.8"
aes-gcm = "0.10"
cbc = { version = "0.1", features = ["alloc"] }
cipher = "0.4"
ecb = "0.1"

[dev-dependencies]
arrow = {workspace = true}
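
The `aes`, `cbc`, and `ecb` dependencies added above work on 16-byte AES blocks, so ECB and CBC inputs have to be padded to a block boundary, and AES itself only accepts 128-, 192-, or 256-bit keys. The standard-library-only sketch below illustrates those two constraints with PKCS#7 padding. It is an illustration under those assumptions, not the code from this PR (which is not visible in this part of the diff), and the function names are hypothetical.

```rust
/// AES always operates on 16-byte blocks, regardless of key size.
const AES_BLOCK_SIZE: usize = 16;

/// Hypothetical check: AES keys must be 16, 24, or 32 bytes long.
fn is_valid_aes_key_len(key: &[u8]) -> bool {
    matches!(key.len(), 16 | 24 | 32)
}

/// Hypothetical helper: append PKCS#7 padding so the result is a
/// multiple of the block size. Input that is already aligned gets a
/// full extra block, so the padding can always be removed unambiguously.
fn pkcs7_pad(input: &[u8]) -> Vec<u8> {
    let pad_len = AES_BLOCK_SIZE - (input.len() % AES_BLOCK_SIZE);
    let mut out = Vec::with_capacity(input.len() + pad_len);
    out.extend_from_slice(input);
    // Every padding byte stores the padding length itself.
    out.resize(input.len() + pad_len, pad_len as u8);
    out
}
```

For example, `pkcs7_pad(b"Spark")` returns 16 bytes: the 5 input bytes followed by 11 bytes of value `0x0B`.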
5 changes: 5 additions & 0 deletions native/spark-expr/src/comet_scalar_funcs.rs
@@ -15,6 +15,7 @@
// specific language governing permissions and limitations
// under the License.

use crate::encryption_funcs::spark_aes_encrypt;
use crate::hash_funcs::*;
use crate::math_funcs::abs::abs;
use crate::math_funcs::checked_arithmetic::{checked_add, checked_div, checked_mul, checked_sub};
@@ -165,6 +166,10 @@ pub fn create_comet_physical_fun_with_eval_mode(
            let func = Arc::new(spark_xxhash64);
            make_comet_scalar_udf!("xxhash64", func, without data_type)
        }
        "aes_encrypt" => {
            let func = Arc::new(spark_aes_encrypt);
            make_comet_scalar_udf!("aes_encrypt", func, without data_type)
        }
        "isnan" => {
            let func = Arc::new(spark_isnan);
            make_comet_scalar_udf!("isnan", func, without data_type)
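
The hunk above registers `aes_encrypt` by adding one more arm to a `match` on the function name, wrapping the implementation in an `Arc`. The sketch below shows the same name-to-function dispatch pattern in isolation using only the standard library. It does not reproduce `make_comet_scalar_udf!` or DataFusion's UDF types; `ScalarFn`, `lookup_scalar_fn`, and the two toy functions are hypothetical stand-ins.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a scalar function: bytes in, bytes out.
type ScalarFn = Arc<dyn Fn(&[u8]) -> Result<Vec<u8>, String> + Send + Sync>;

/// Toy function used only to populate the registry.
fn toy_reverse(input: &[u8]) -> Result<Vec<u8>, String> {
    Ok(input.iter().rev().copied().collect())
}

/// Toy function used only to populate the registry.
fn toy_identity(input: &[u8]) -> Result<Vec<u8>, String> {
    Ok(input.to_vec())
}

/// Resolve a function by name; unknown names produce an error
/// instead of a panic, so the caller can fall back gracefully.
fn lookup_scalar_fn(name: &str) -> Result<ScalarFn, String> {
    match name {
        "reverse_bytes" => {
            let func: ScalarFn = Arc::new(toy_reverse);
            Ok(func)
        }
        "identity" => {
            let func: ScalarFn = Arc::new(toy_identity);
            Ok(func)
        }
        other => Err(format!("unknown scalar function: {other}")),
    }
}
```

Supporting a new function in this style means adding one arm and one import, which matches the shape of the change to `comet_scalar_funcs.rs`.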