perf: zero copy validity export to duckdb #7371
joseph-isaacs wants to merge 16 commits into develop
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.993x ➖
datafusion / vortex-file-compressed (0.993x ➖, 3↑ 3↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.951x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.199x ❌, 1↑ 7↓)
datafusion / parquet (0.991x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.888x ✅, 4↑ 0↓)
duckdb / vortex-compact (0.911x ➖, 2↑ 0↓)
duckdb / parquet (0.938x ➖, 2↑ 0↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.923x ➖, 7↑ 1↓)
datafusion / vortex-compact (0.980x ➖, 0↑ 0↓)
datafusion / parquet (1.025x ➖, 0↑ 2↓)
datafusion / arrow (0.973x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.018x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.036x ➖, 0↑ 1↓)
duckdb / parquet (1.004x ➖, 1↑ 1↓)
duckdb / duckdb (1.011x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.942x ➖, 22↑ 0↓)
datafusion / vortex-compact (1.076x ➖, 3↑ 42↓)
datafusion / parquet (1.001x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (1.014x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.005x ➖, 0↑ 2↓)
duckdb / parquet (1.006x ➖, 0↑ 1↓)
duckdb / duckdb (1.002x ➖, 2↑ 4↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.983x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.212x ➖, 0↑ 5↓)
datafusion / parquet (0.975x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.949x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.005x ➖, 0↑ 0↓)
duckdb / parquet (1.027x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.965x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.987x ➖, 0↑ 0↓)
datafusion / parquet (0.994x ➖, 0↑ 0↓)
datafusion / arrow (0.981x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.996x ➖, 0↑ 0↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
duckdb / duckdb (0.997x ➖, 0↑ 0↓)
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.919x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.115x ➖, 0↑ 3↓)
datafusion / parquet (0.940x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.982x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.048x ➖, 0↑ 0↓)
duckdb / parquet (1.010x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.037x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.000x ➖, 0↑ 0↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (0.880x ➖, 1↑ 1↓)
datafusion / vortex-compact (1.052x ➖, 0↑ 0↓)
datafusion / parquet (0.912x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.019x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.037x ➖, 0↑ 0↓)
duckdb / parquet (0.994x ➖, 0↑ 0↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.868x ✅, 14↑ 1↓)
datafusion / parquet (0.995x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.958x ➖, 3↑ 0↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
duckdb / duckdb (0.955x ➖, 5↑ 0↓)
File Sizes: Clickbench on NVME
File Size Changes (3 files changed, -0.0% overall, 0↑ 3↓)
Benchmarks: Compression
Vortex (geomean): 1.009x ➖
unknown / unknown (0.993x ➖, 1↑ 2↓)
Benchmarks: Random Access
Vortex (geomean): 1.295x ❌
unknown / unknown (1.173x ❌, 2↑ 47↓)
```cpp
// Set the validity pointer for the vector to external data, and store the buffer in auxiliary
// to keep it alive. This enables zero-copy export of validity masks.
void duckdb_vx_vector_set_validity_data(duckdb_vector ffi_vector, void *validity_ptr, idx_t capacity,
```
If validity_ptr points into the buffer, just pass the buffer instead.
validity_ptr is not the buffer itself; it is a pointer a few levels of indirection deep. We could fix that, but we would also want to change the Primitive/Decimal export at the same time.
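A minimal sketch of the keep-alive pattern the code comment describes, assuming simplified stand-in types: OwnedBitmap, ConsumerVector and SetExternalValidity are hypothetical and are not DuckDB's Vector, its auxiliary buffer, or the vortex-duckdb FFI. The consumer stores a raw pointer into the producer's bitmap words and holds a shared owner next to it, so no bits are copied and the memory stays valid after the producer drops its own handle. The helper takes only the owning buffer and derives the raw pointer from it, roughly the shape suggested in review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical producer-side buffer: a validity bitmap stored as 64-bit words,
// where bit (row % 64) of word (row / 64) is set when the row is valid.
struct OwnedBitmap {
  std::vector<std::uint64_t> words;
};

// Hypothetical consumer-side vector: a raw pointer into memory it does not
// own, plus an opaque owner ("auxiliary") that keeps that memory alive.
struct ConsumerVector {
  const std::uint64_t *validity = nullptr;
  std::size_t capacity = 0;
  std::shared_ptr<void> auxiliary;

  bool IsValid(std::size_t row) const {
    if (validity == nullptr) {
      return true;  // no mask attached: every row is valid
    }
    assert(row < capacity);
    return ((validity[row / 64] >> (row % 64)) & 1u) != 0;
  }
};

// Zero-copy export: point at the producer's words and share ownership of the
// buffer instead of copying the bits into consumer-owned storage.
void SetExternalValidity(ConsumerVector &vec, std::shared_ptr<OwnedBitmap> buf,
                         std::size_t capacity) {
  assert(buf != nullptr);
  assert(buf->words.size() * 64 >= capacity);
  vec.validity = buf->words.data();  // derived from the buffer, not passed in
  vec.capacity = capacity;
  vec.auxiliary = std::move(buf);    // moving the shared_ptr does not move the words
}
```

Holding the owner as std::shared_ptr<void> keeps the vector indifferent to what kind of buffer backs the pointer, which is what an auxiliary slot needs when the memory comes from another library.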
```cpp
// Same hack for ValidityMask: access protected fields via inheritance.
class ExternalValidityMask : public ValidityMask {
public:
    inline void SetExternal(validity_t *ptr, idx_t cap,
```
Same here: pass just the buffer and derive the pointer from it.
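The ExternalValidityMask approach appears to treat an existing ValidityMask as the derived type in order to reach its protected fields. Because that object was never constructed as the derived type, standard C++ makes such a downcast undefined behavior, even if it works in practice when the derived class adds no members. If a conforming variant is ever wanted, a different technique gives the same access: a derived class may form pointers to the base's protected members through its own name and apply them to any base object. The sketch below uses hypothetical Mask and MaskAccess types, not DuckDB's ValidityMask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a mask whose storage fields are protected
// (not DuckDB's ValidityMask).
class Mask {
 public:
  const std::uint64_t *Data() const { return data_; }
  std::size_t Capacity() const { return capacity_; }

 protected:
  std::uint64_t *data_ = nullptr;
  std::size_t capacity_ = 0;
};

// Accessor with no data members. It never casts a Mask to MaskAccess. Inside a
// derived class, &MaskAccess::data_ may name the protected member, and its
// type is std::uint64_t *Mask::*, so it can be applied to any Mask object.
struct MaskAccess : Mask {
  static void SetExternal(Mask &mask, std::uint64_t *ptr, std::size_t cap) {
    assert(ptr != nullptr || cap == 0);  // a non-empty mask needs storage
    mask.*(&MaskAccess::data_) = ptr;
    mask.*(&MaskAccess::capacity_) = cap;
  }
};
```

The parentheses matter: &MaskAccess::data_ forms a pointer to member, whereas &(MaskAccess::data_) would not.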
```cpp
*ext_buf, reinterpret_cast<TemplatedValidityData<validity_t> *>(ext_buf->get()));
```
```cpp
// Set validity_mask, capacity, and validity_data (which keeps the buffer alive).
ext_validity->SetExternal(reinterpret_cast<validity_t *>(validity_ptr), capacity,
```
Technically this slices the object down to the base ValidityMask, but since the derived class has no members of its own, it's fine. Worth adding a comment to say so.
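One way to act on the request for a comment is to write the assumption down next to the class and let the compiler check it. This sketch uses hypothetical Mask and ExternalMask stand-ins rather than DuckDB's ValidityMask and the PR's ExternalValidityMask. The static_asserts only guard the layout assumption (no extra data members, no vtable); they do not by themselves make treating a base object as the derived type well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a mask whose storage fields are protected.
class Mask {
 public:
  const std::uint64_t *Data() const { return data_; }
  std::size_t Capacity() const { return capacity_; }

 protected:
  std::uint64_t *data_ = nullptr;
  std::size_t capacity_ = 0;
};

// ExternalMask exists only to reach Mask's protected fields. Any code that
// views an object constructed as a Mask through an ExternalMask pointer relies
// on ExternalMask adding no data members and no virtual functions, so that the
// two layouts are identical. The checks below fail the build if that changes.
class ExternalMask : public Mask {
 public:
  void SetExternal(std::uint64_t *ptr, std::size_t cap) {
    assert(ptr != nullptr || cap == 0);  // a non-empty mask needs storage
    data_ = ptr;
    capacity_ = cap;
  }
};

static_assert(sizeof(ExternalMask) == sizeof(Mask),
              "ExternalMask must not add data members");
static_assert(!std::is_polymorphic_v<ExternalMask>,
              "ExternalMask must not introduce virtual functions");
```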
```
…ty-export

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

# Conflicts:
#	vortex-duckdb/src/exporter/constant.rs
#	vortex-duckdb/src/exporter/vector.rs
```
```
…ty-export

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
```
Zero-copy export of validity, similar to how we already export data for Primitive and Decimal.