datafusion.execution.batch_size 8192 Default batch size when creating new batches. This is especially useful for batches buffered in memory, since creating tiny batches would result in too much metadata memory consumption
datafusion.execution.coalesce_batches true When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting
datafusion.execution.collect_statistics false Should DataFusion collect statistics after listing files
+ datafusion.execution.enable_aggregation_blocked_groups true Should DataFusion use the blocked approach to manage group values and their related states in accumulators. By default, the blocked approach is used. It first allocates capacity for a block based on a predefined block size. When the block reaches its limit, a new block is allocated (with capacity based on the same predefined block size) instead of expanding the current one and copying the data. If this flag is set to `false`, DataFusion falls back to the single approach, in which values are managed within a single large block (think of it as a `Vec`). As this block grows, it often triggers numerous copies, resulting in poor performance.
datafusion.execution.enable_recursive_ctes true Should DataFusion support recursive CTEs
datafusion.execution.enforce_batch_size_in_joins false Should DataFusion enforce batch size in joins or not. By default, DataFusion will not enforce batch size in joins. Enforcing batch size in joins can reduce memory usage when joining large tables with a highly-selective join filter, but is also slightly slower.
datafusion.execution.keep_partition_by_columns false Should DataFusion keep the columns used for partition_by in the output RecordBatches
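
The blocked approach described for `enable_aggregation_blocked_groups` can be illustrated with a minimal, standard-library-only sketch. This is not DataFusion's accumulator code: the `BlockedValues` type and its methods are hypothetical names chosen for illustration. It shows only the core idea, that a full block is left in place and a fresh fixed-capacity block is allocated, so existing values are never copied when storage grows.

```rust
/// Hypothetical illustration of blocked storage; not DataFusion's implementation.
pub struct BlockedValues<T> {
    /// Predefined capacity of every block.
    block_size: usize,
    /// Blocks are filled in order; only the last one may be partially full.
    blocks: Vec<Vec<T>>,
}

impl<T> BlockedValues<T> {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    /// Appends a value and returns its flat index.
    pub fn push(&mut self, value: T) -> usize {
        let needs_new_block = match self.blocks.last() {
            Some(block) => block.len() == self.block_size,
            None => true,
        };
        if needs_new_block {
            // Allocate a fresh block instead of growing (and copying) an old one.
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        let block = self.blocks.last_mut().expect("a block was just ensured");
        block.push(value);
        let offset = block.len() - 1;
        (self.blocks.len() - 1) * self.block_size + offset
    }

    /// Looks up a value by flat index: block number, then offset in the block.
    pub fn get(&self, index: usize) -> Option<&T> {
        self.blocks
            .get(index / self.block_size)?
            .get(index % self.block_size)
    }

    /// Number of blocks allocated so far.
    pub fn num_blocks(&self) -> usize {
        self.blocks.len()
    }

    /// Total number of stored values.
    pub fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }
}
```

In the single approach, by contrast, all values would live in one `Vec<T>`, and each time its capacity is exceeded the whole buffer may be reallocated and its contents moved. The sketch only moves the small outer vector of block handles when it grows.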
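
The idea behind `coalesce_batches`, merging small batches into larger ones up to a target size, can be sketched in the same spirit. This is a simplified model under stated assumptions: batches are plain `Vec<T>` rows rather than Arrow record batches, the function name `coalesce_batches` and its signature are hypothetical, and the policy (emit exactly when the buffer reaches the target, then flush the remainder) may differ from what DataFusion actually does.

```rust
/// Hypothetical sketch: merges small input batches into batches of
/// `target_batch_size` rows; the final batch may be smaller.
pub fn coalesce_batches<T>(batches: Vec<Vec<T>>, target_batch_size: usize) -> Vec<Vec<T>> {
    assert!(target_batch_size > 0, "target_batch_size must be non-zero");
    let mut output = Vec::new();
    let mut buffer: Vec<T> = Vec::with_capacity(target_batch_size);

    for batch in batches {
        for row in batch {
            buffer.push(row);
            if buffer.len() == target_batch_size {
                // Emit the full buffer and start a new one.
                let full = std::mem::replace(
                    &mut buffer,
                    Vec::with_capacity(target_batch_size),
                );
                output.push(full);
            }
        }
    }

    // Flush whatever is left once the input is exhausted.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    output
}
```

With a highly selective filter, the input might look like many one- or two-row batches (some empty); after coalescing, downstream operators see fewer, larger batches, which reduces per-batch overhead.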