Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,7 @@ rpath = false
strip = false # Retain debug info for flamegraphs

[profile.ci]
debug = false
inherits = "dev"
incremental = false

Expand Down
30 changes: 30 additions & 0 deletions datafusion/common/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -842,6 +842,36 @@ config_namespace! {
/// will be collected into a single partition
pub hash_join_single_partition_threshold_rows: usize, default = 1024 * 128

/// Maximum size in bytes for the build side of a hash join to be pushed down as an InList expression for dynamic filtering.
/// Build sides larger than this will use hash table lookups instead.
/// Set to 0 to always use hash table lookups.
///
/// InList pushdown can be more efficient for small build sides because it can result in better
/// statistics pruning as well as use any bloom filters present on the scan side.
/// InList expressions are also more transparent and easier to serialize over the network in distributed uses of DataFusion.
/// On the other hand InList pushdown requires making a copy of the data and thus adds some overhead to the build side and uses more memory.
///
/// This setting is per-partition, so we may end up using `hash_join_inlist_pushdown_max_size` * `target_partitions` memory.
///
/// The default is 128kB per partition.
/// This should allow point lookup joins (e.g. joining on a unique primary key) to use InList pushdown in most cases
/// but avoids excessive memory usage or overhead for larger joins.
pub hash_join_inlist_pushdown_max_size: usize, default = 128 * 1024

/// Maximum number of distinct values (rows) in the build side of a hash join to be pushed down as an InList expression for dynamic filtering.
/// Build sides with more rows than this will use hash table lookups instead.
/// Set to 0 to always use hash table lookups.
///
/// This provides an additional limit beyond `hash_join_inlist_pushdown_max_size` to prevent
/// very large IN lists that might not provide much benefit over hash table lookups.
///
/// This uses the deduplicated row count once the build side has been evaluated.
///
/// The default is 150 values per partition.
/// This is inspired by Trino's `max-filter-keys-per-column` setting.
/// See: <https://trino.io/docs/current/admin/dynamic-filtering.html#dynamic-filter-collection-thresholds>
pub hash_join_inlist_pushdown_max_distinct_values: usize, default = 150

/// The default filter selectivity used by Filter Statistics
/// when an exact selectivity cannot be determined. Valid values are
/// between 0 (no selectivity) and 100 (all rows are selected).
Expand Down
Loading
Loading