Add UDF's for sketch support #6349
Merged
alexanderbianchi merged 9 commits into quickwit-oss:main from Apr 29, 2026
mattmkim approved these changes Apr 29, 2026
# Conflicts: # quickwit/quickwit-datafusion/src/sources/metrics/mod.rs
Description
Adds DataFusion query support for DDSketch-backed parquet indexes.
This PR adds:
`dd_sketch(keys, counts, count, min, max, flags)`, a decomposable aggregate UDF that merges sparse DDSketch bucket arrays and scalar bounds into a merged sketch state.
`dd_quantile(sketch, q)`, a scalar UDF that computes a percentile from the merged sketch by rank-scanning merged buckets and clamping the mapped value to the sketch min/max.
`quickwit-df-core`, so runtime plugins can register both scalar UDFs and UDAFs.
`STORED AS sketches` and automatic `sketches-*` index resolution in `quickwit-datafusion`.
`list_sketch_splits`, while keeping existing metrics split routing unchanged.
The intended query shape is:
`dd_sketch` is the merge step. `dd_quantile` is only the final projection over the merged sketch. For now, only `flags = 0` is accepted; non-zero flags are rejected rather than silently decoded with the wrong layout.
How was this PR tested?
Follow-up work intentionally not in this PR
This PR is scoped to making the sketch query path available and validating the UDF/UDAF contract. The items below are valuable, but they are larger planning/runtime/catalog changes and should be handled as follow-up PRs.
Implement a grouped accumulator for `dd_sketch`.
`state()`/`merge_batch()` behavior.
Support non-zero sketch flags and reference DDSketch layout decoding.
Add production DataFusion memory policy and query-level memory tracking.
Unify predicate lowering for split pruning and parquet pruning.
Add a runtime-scoped parquet index catalog with split planning bounds and parquet metadata caching.
Make DDL aliases and Substrait routing preserve authoritative table metadata.
Advertise scan ordering only when selected split metadata proves it.
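For readers unfamiliar with the merge-then-quantile contract described under Description, here is a minimal standalone sketch in plain Rust (no DataFusion). It is not the code in this PR. The `MergedSketch` type, its method names, the `gamma` parameter, and the bucket mapping `2 * gamma^k / (gamma + 1)` (the common logarithmic DDSketch mapping for positive values) are all assumptions made for illustration; the actual bucket layout used by `dd_sketch` may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical merged sketch state; fields mirror the `dd_sketch` arguments.
#[derive(Debug, Clone)]
struct MergedSketch {
    buckets: BTreeMap<i32, u64>, // bucket key -> merged count, ordered by key
    count: u64,
    min: f64,
    max: f64,
}

impl MergedSketch {
    fn new() -> Self {
        Self {
            buckets: BTreeMap::new(),
            count: 0,
            min: f64::INFINITY,
            max: f64::NEG_INFINITY,
        }
    }

    /// Merge step: fold one row's sparse bucket arrays and scalar bounds in.
    /// Non-zero flags are rejected before any state is touched.
    fn merge_row(
        &mut self,
        keys: &[i32],
        counts: &[u64],
        count: u64,
        min: f64,
        max: f64,
        flags: u32,
    ) -> Result<(), String> {
        if flags != 0 {
            return Err(format!("unsupported sketch flags: {}", flags));
        }
        if keys.len() != counts.len() {
            return Err("keys and counts must have the same length".to_string());
        }
        for (k, c) in keys.iter().zip(counts) {
            *self.buckets.entry(*k).or_insert(0) += *c;
        }
        self.count += count;
        self.min = self.min.min(min);
        self.max = self.max.max(max);
        Ok(())
    }

    /// Final projection: rank-scan the merged buckets, map the selected key
    /// to a value (assumed logarithmic mapping), then clamp to [min, max].
    fn quantile(&self, q: f64, gamma: f64) -> Option<f64> {
        if self.count == 0 || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let rank = q * (self.count - 1) as f64;
        let mut seen = 0u64;
        for (&key, &c) in &self.buckets {
            seen += c;
            if seen as f64 > rank {
                let value = 2.0 * gamma.powi(key) / (gamma + 1.0);
                return Some(value.max(self.min).min(self.max));
            }
        }
        // Bucket counts summed to less than `count`; fall back to the max.
        Some(self.max)
    }
}

/// Merges two rows, then reads p0, p50 and p100 from the merged state.
fn example() -> Result<(f64, f64, f64), String> {
    let gamma = 2.0; // deliberately coarse so the arithmetic is easy to follow
    let mut merged = MergedSketch::new();
    merged.merge_row(&[1, 2], &[2, 1], 3, 1.5, 3.5, 0)?;
    merged.merge_row(&[2, 3], &[1, 1], 2, 2.5, 5.0, 0)?;
    let q = |p: f64| {
        merged
            .quantile(p, gamma)
            .ok_or_else(|| "empty sketch".to_string())
    };
    Ok((q(0.0)?, q(0.5)?, q(1.0)?))
}
```

In this toy setup the merged buckets are `{1: 2, 2: 2, 3: 1}` with bounds `[1.5, 5.0]`. The lowest and highest quantiles are clamped to those bounds because the bucket representatives fall outside them, which is the reason the description pairs the rank scan with a min/max clamp.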