chore: Fix all sqllogictest dangling configs#21108
Conversation
There was a problem hiding this comment.
Pull request overview
This PR tightens datafusion-sqllogictest hygiene by making dangling DataFusion session configuration changes a hard test failure, and updates existing .slt files to restore any modified configs so each file leaves the runner in a clean state.
Changes:
- Update the SLT runner to collect “config drift” errors per file and fail the test run (instead of only printing warnings on drop).
- Add/adjust config reset statements across existing
.sltfiles to eliminate current dangling-config violations. - Add a unit test to ensure modified configs are detected by the new validation logic.
Reviewed changes
Copilot reviewed 55 out of 56 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| datafusion/sqllogictest/test_files/window_topk_pushdown.slt | Adds explicit resets for enable_topk_repartition and documents/sets target_partitions baseline. |
| datafusion/sqllogictest/test_files/window.slt | Adds cleanup reset statements for several execution/optimizer settings at end of file section. |
| datafusion/sqllogictest/test_files/update.slt | Resets optimizer.max_passes during cleanup. |
| datafusion/sqllogictest/test_files/union.slt | Resets execution.batch_size at end to avoid config drift. |
| datafusion/sqllogictest/test_files/subquery.slt | Resets execution.batch_size and explain.logical_plan_only during cleanup. |
| datafusion/sqllogictest/test_files/struct.slt | Resets SQL dialect setting at end of test. |
| datafusion/sqllogictest/test_files/string/string_view.slt | Resets explain.logical_plan_only at end of test. |
| datafusion/sqllogictest/test_files/sort_pushdown.slt | Restores runner baseline partitions and resets parquet/sort-pushdown related configs. |
| datafusion/sqllogictest/test_files/sort_merge_join.slt | Resets optimizer.prefer_hash_join at end of test. |
| datafusion/sqllogictest/test_files/set_variable.slt | Adds cleanup resets for catalog and execution flags altered during the file. |
| datafusion/sqllogictest/test_files/select.slt | Restores runner baseline partitions and resets optimizer.max_passes. |
| datafusion/sqllogictest/test_files/repartition_subset_satisfaction.slt | Restores runner baseline partitions and resets repartition/subset settings before cleanup drops. |
| datafusion/sqllogictest/test_files/repartition_scan.slt | Adds cleanup resets for catalog creation and repartition file sizing. |
| datafusion/sqllogictest/test_files/repartition.slt | Restores runner baseline partitions and resets round-robin repartition setting. |
| datafusion/sqllogictest/test_files/push_down_filter_unnest.slt | Resets explain.physical_plan_only at end of test. |
| datafusion/sqllogictest/test_files/push_down_filter_regression.slt | Restores runner baseline partitions and resets explain/parquet/dynamic-filter configs. |
| datafusion/sqllogictest/test_files/push_down_filter_parquet.slt | Resets explain.physical_plan_only and parquet filter pushdown. |
| datafusion/sqllogictest/test_files/projection_pushdown.slt | Adds “config reset” blocks restoring runner baseline partitions for multi-partition sections. |
| datafusion/sqllogictest/test_files/preserve_file_partitioning.slt | Restores runner baseline partitions and resets preserve-file-partitions and join thresholds. |
| datafusion/sqllogictest/test_files/predicates.slt | Resets catalog.information_schema at end. |
| datafusion/sqllogictest/test_files/pipe_operator.slt | Resets SQL dialect setting at end. |
| datafusion/sqllogictest/test_files/parquet_statistics.slt | Resets statistics/explain flags after tests. |
| datafusion/sqllogictest/test_files/parquet_sorted_statistics.slt | Restores runner baseline partitions and resets collect/split-by-statistics configs. |
| datafusion/sqllogictest/test_files/parquet.slt | Adds cleanup resets for listing/parquet/runtime/catalog settings and restores baseline partitions. |
| datafusion/sqllogictest/test_files/order.slt | Adds cleanup resets for catalog/optimizer/explain/sql_parser configs and restores baseline partitions. |
| datafusion/sqllogictest/test_files/optimizer_group_by_constant.slt | Restores runner baseline partitions and resets explain.logical_plan_only. |
| datafusion/sqllogictest/test_files/listing_table_statistics.slt | Converts cleanup from set ... = false to reset ... for stats-related settings. |
| datafusion/sqllogictest/test_files/limit_single_row_batches.slt | Resets batch size and restores runner baseline partitions during cleanup. |
| datafusion/sqllogictest/test_files/limit_pruning.slt | Resets parquet pushdown filters at end. |
| datafusion/sqllogictest/test_files/limit.slt | Restores runner baseline partitions and resets explain/repartition-file-min-size settings. |
| datafusion/sqllogictest/test_files/joins.slt | Adds cleanup resets for batch size, explain flags, and join/sort optimizer settings; restores baseline partitions. |
| datafusion/sqllogictest/test_files/join_only.slt | Adds cleanup reset for repartition-joins after including join test part. |
| datafusion/sqllogictest/test_files/join_limit_pushdown.slt | Restores runner baseline partitions and resets explain/hash-join preference. |
| datafusion/sqllogictest/test_files/insert_to_external.slt | Restores runner baseline partitions at end. |
| datafusion/sqllogictest/test_files/insert.slt | Restores runner baseline partitions at end. |
| datafusion/sqllogictest/test_files/information_schema_table_types.slt | Resets catalog.information_schema at end. |
| datafusion/sqllogictest/test_files/information_schema_multiple_catalogs.slt | Resets default catalog/schema and information_schema settings. |
| datafusion/sqllogictest/test_files/information_schema_columns.slt | Resets default catalog/schema and information_schema settings. |
| datafusion/sqllogictest/test_files/information_schema.slt | Adds cleanup resets for information_schema, partitions baseline, planning concurrency, and parquet created_by. |
| datafusion/sqllogictest/test_files/ident_normalization.slt | Resets catalog.information_schema after enabling ident normalization. |
| datafusion/sqllogictest/test_files/group_by.slt | Resets batch size and restores runner baseline partitions. |
| datafusion/sqllogictest/test_files/floor_preimage.slt | Resets explain.logical_plan_only at end. |
| datafusion/sqllogictest/test_files/explain_tree.slt | Resets explain format and tree render width at end. |
| datafusion/sqllogictest/test_files/explain.slt | Converts cleanup from set collect_statistics=false to reset collect_statistics. |
| datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt | Adds a parquet max row group size reset and updates cleanup comment block. |
| datafusion/sqllogictest/test_files/distinct_on.slt | Resets explain.logical_plan_only at end. |
| datafusion/sqllogictest/test_files/delete.slt | Resets optimizer.max_passes at end. |
| datafusion/sqllogictest/test_files/datetime/current_time_timezone.slt | Resets execution.time_zone at end. |
| datafusion/sqllogictest/test_files/datetime/current_date_timezone.slt | Resets execution.time_zone at end. |
| datafusion/sqllogictest/test_files/cte.slt | Resets batch size, recursive CTE enablement, and ident normalization setting. |
| datafusion/sqllogictest/test_files/arrow_files.slt | Resets sql_parser.map_string_types_to_utf8view at end. |
| datafusion/sqllogictest/test_files/aggregate_skip_partial.slt | Adds multiple cleanup blocks to reset batch size, partitions baseline, and skip-partial-aggregation knobs. |
| datafusion/sqllogictest/test_files/aggregate_repartition.slt | Restores runner baseline partitions at end. |
| datafusion/sqllogictest/test_files/aggregate.slt | Adds explicit baseline partition restoration before final cleanup. |
| datafusion/sqllogictest/src/engines/datafusion_engine/runner.rs | Replaces Drop-time warning with explicit validation, collects errors for the harness, and adds a unit test. |
| datafusion/sqllogictest/bin/sqllogictests.rs | Aggregates config-drift errors per file, calls shutdown_async, and fails the run when drift is detected. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
alamb
left a comment
There was a problem hiding this comment.
Looks good to me -- thank you @2010YOUY01
I left some suggestions on comments and maybe some cleanup but nothing that is required in my mind
I tested by reverting one of the fixes
diff --git a/datafusion/sqllogictest/test_files/arrow_files.slt b/datafusion/sqllogictest/test_files/arrow_files.slt
index 94e150738..c3bc967ba 100644
--- a/datafusion/sqllogictest/test_files/arrow_files.slt
+++ b/datafusion/sqllogictest/test_files/arrow_files.slt
@@ -388,7 +388,3 @@ physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/
# querying corrupted stream format should result in error
query error DataFusion error: Arrow error: Parser error: Unsupported message header type in IPC stream: 'NONE'
SELECT * FROM arrow_stream_corrupted_metadata_length
-
-# Config reset
-statement ok
-RESET datafusion.sql_parser.map_string_types_to_utf8view;and running
cargo test --profile=ci --test sqllogictestsWhich errors (as expected) with a useful message:
Completed 414 test files in 4 seconds External error: Other Error: SLT file arrow_files.slt left modified configuration
datafusion.sql_parser.map_string_types_to_utf8view: true -> false
Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
Caused by:
process didn't exit successfully: `/Users/andrewlamb/Software/datafusion2/target/ci/deps/sqllogictests-7e2a8cc6115b158a` (exit status: 1)
|
|
||
| // If there was no correctness error, check that the config is unchanged. | ||
| runner.shutdown_async().await; | ||
| take_config_change_result(&config_change_errors) |
There was a problem hiding this comment.
Why does this need to be a mutex? Is it because there is no way to get the errros back from the runner?
Maybe we could add some documentation about what is going on here?
There was a problem hiding this comment.
Yes, it's due sqllogictest is an external dependency, and it's not able to directly access the inner DataFusion runner, and I can't find an easier way to do it.
Added comments to explain.
| fn take_config_change_result( | ||
| config_change_errors: &DataFusionConfigChangeErrors, | ||
| ) -> Result<()> { | ||
| let errors = std::mem::take(&mut *config_change_errors.lock().unwrap()); |
There was a problem hiding this comment.
since this just copies the strings anyways, it seems like the take is somewhat unecessary. Maybe you could just get a lock rather than trying to take it?
There was a problem hiding this comment.
Good point, simplified.
|
I re-ran this locally and everything looks very nice -- thank you |
Which issue does this PR close?
Follow up to #20838
Rationale for this change
in #20838 it adds a way to check and warn about dangling config settings, it has detected several existing violations but has not fixed them:
This PR does
I tested to add a config statement at the end of one
sltfile, and run the test, it runs and failed as expectedWhat changes are included in this PR?
Are these changes tested?
UT
Are there any user-facing changes?
no