Check sqllogictests for any dangling config settings(#17914) by cj-zhukov · Pull Request #20474 · apache/datafusion

cj-zhukov · 2026-02-22T07:59:19Z

Which issue does this PR close?

Closes #17914 (Check sqllogictests for any dangling config settings).

Rationale for this change

This PR introduces a validation script to prevent dangling configuration settings in sqllogictest (.slt) files.

What changes are included in this PR?

A new shell script, ci/scripts/check_slt_configs.sh, that scans all .slt files and detects DataFusion configuration options that are enabled with set datafusion.<config> = true but never reset with set datafusion.<config> = false.
Updates to existing .slt files where dangling boolean configs were found.
All identified configs have been explicitly reset to false so that each file restores the default session state.
A new CI job, sqllogictest-config-check, that runs the validation script and fails the workflow if any dangling boolean configuration is detected.
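A minimal Bash sketch of the core check, for illustration only: it is not the script from this PR, and the function name check_file and its output format are hypothetical. For each line matching set datafusion.<config> = true it looks for a matching = false anywhere in the same file, so, like the PR's script, it does not check ordering.

```bash
# Hypothetical sketch of the boolean "dangling config" check (not the PR's script).
# check_file FILE prints "FILE:LINE: <config> ..." for every
#   set datafusion.<config> = true
# that has no matching
#   set datafusion.<config> = false
# anywhere in FILE, and returns 1 if any were found.
check_file() {
  local file=$1 status=0
  local matches match line_number line_content config
  local name_re='(datafusion\.[A-Za-z0-9_.]+)'

  # -n prefixes each hit with "LINE:"; no hits means nothing to check.
  matches=$(grep -En 'set[[:space:]]+datafusion\.[A-Za-z0-9_.]+[[:space:]]*=[[:space:]]*true' "$file")
  [ -z "$matches" ] && return 0

  while IFS= read -r match; do
    line_number=${match%%:*}
    line_content=${match#*:}
    # Pull the config name (e.g. datafusion.execution.collect_statistics) out of the line.
    [[ $line_content =~ $name_re ]] || continue
    config=${BASH_REMATCH[1]}
    # Only checks that a reset exists somewhere in the file, not that it comes later.
    # Dots in ${config} act as regex wildcards here, which is loose but harmless.
    if ! grep -Eq "set[[:space:]]+${config}[[:space:]]*=[[:space:]]*false" "$file"; then
      echo "$file:$line_number: $config is set to true but never reset to false"
      status=1
    fi
  done <<< "$matches"

  return "$status"
}
```

Running check_file over every .slt file and OR-ing the return codes would give the CI job its pass/fail signal.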

Are these changes tested?

Yes, these changes are tested.

Are there any user-facing changes?

No, there are no user-facing changes.

cj-zhukov · 2026-02-22T08:12:01Z

High-Level Overview

Current Limitations:

Only checks boolean flags set to true; it does not validate non-boolean configuration changes (e.g. default_catalog, default_schema, etc.).
Does not respect ordering: it only verifies that a matching = false exists somewhere in the file, and does not ensure that the reset occurs after the corresponding = true.
Does not validate toggle correctness: it does not verify correct pairing, nesting, or the number of enable/disable occurrences.

Follow-Up Improvements:

Parse (.slt) files properly instead of relying on regex/grep.
Track configuration state transitions in order.
Validate correct pairing and restoration semantics.
Support non-boolean configuration values.
This could be implemented in Rust using a proper parser (e.g., nom), similar to what is already used in datafusion-examples.
I’m happy to work on a follow-up PR to evolve this validation into a more robust and structured implementation.
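To make "track configuration state transitions in order" concrete, here is a hedged Bash sketch (the follow-up itself was proposed in Rust). It walks a file top to bottom, treats a setting as modified after any SET regardless of its value, and considers it restored only by a later RESET. That restoration rule is an assumption of the sketch: deciding whether set ... = false restores the default would require knowing each option's default. The function name find_unrestored is hypothetical, and Bash 4+ is needed for associative arrays.

```bash
# Hypothetical sketch: find_unrestored FILE prints "LINE: config" for every
# datafusion setting that is still modified at the end of FILE.
# Any SET (boolean or not) marks a setting dirty; only a later RESET clears it.
find_unrestored() {
  local file=$1 line n=0 config
  local -A dirty=()   # config name -> line number of its most recent SET
  local set_re='^[[:space:]]*[sS][eE][tT][[:space:]]+(datafusion\.[A-Za-z0-9_.]+)[[:space:]]*='
  local reset_re='^[[:space:]]*[rR][eE][sS][eE][tT][[:space:]]+(datafusion\.[A-Za-z0-9_.]+)'

  # "|| [ -n "$line" ]" also processes a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if [[ $line =~ $set_re ]]; then
      dirty[${BASH_REMATCH[1]}]=$n
    elif [[ $line =~ $reset_re ]]; then
      # A RESET only helps if it comes after the SET; earlier resets are no-ops.
      unset "dirty[${BASH_REMATCH[1]}]"
    fi
  done < "$file"

  for config in "${!dirty[@]}"; do
    echo "${dirty[$config]}: $config"
  done | sort -n
}
```

Because state is tracked line by line, a RESET that appears before the SET no longer hides a dangling setting, and non-boolean values such as default_catalog are covered too.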

cj-zhukov · 2026-02-22T09:48:54Z

@Jefffrey Since you originally opened this issue, could you please take a look at this PR?

martin-g · 2026-02-23T12:40:30Z

ci/scripts/check_slt_configs.sh

+# Any configuration set with:
+#     set datafusion.<config> = true
+# must be reset in the same file with:
+#     set datafusion.<config> = false


What if the setting's default is true and an .slt test needed to set it to false but forgot to unset it?
It would be better to use the reset datafusion.***; command instead.

Good point - using RESET is semantically more correct since it restores the default value rather than assuming it is false. I'm going to update the script to accept RESET datafusion.<config> as a valid restoration mechanism.
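A sketch of a restoration check that accepts RESET as well as = false, with case-insensitive keywords. The helper name is_restored is hypothetical, and like the original script it only checks that a restoring line exists somewhere in the file, not that it follows the SET.

```bash
# Hypothetical helper: is_restored FILE CONFIG succeeds if FILE contains
#   set CONFIG = false   or   reset CONFIG
# in any letter case (grep -i), e.g. "RESET datafusion.execution.collect_statistics;".
is_restored() {
  local file=$1 config=$2 re_config
  # Escape dots so the config name is matched literally, not as "any character".
  re_config=$(sed 's/[.]/\\./g' <<< "$config")
  # The trailing group stops "reset x" from also matching "reset x_other".
  grep -Eiq "(set[[:space:]]+${re_config}[[:space:]]*=[[:space:]]*false|reset[[:space:]]+${re_config})([^A-Za-z0-9_.]|\$)" "$file"
}
```

Accepting RESET means the check no longer assumes the default is false, which addresses the case where a test flips a default-true option off.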

martin-g · 2026-02-23T12:40:59Z

ci/scripts/check_slt_configs.sh

+    return
+  fi
+
+  echo ""


This looks strange.
Why not log the error from line 74 here and exit?

I agree with you - let's do it
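One way to act on that suggestion, sketched with hypothetical names: a small die helper that writes the error to stderr and exits non-zero, so a failed precondition stops the CI step immediately instead of returning silently. The directory check is only an illustrative use; the excerpt does not show what the script actually tests at that point.

```bash
# die MESSAGE...: print an error to stderr and stop with a non-zero status,
# so the CI step fails immediately.
die() {
  echo "ERROR: $*" >&2
  exit 1
}

# Hypothetical usage: fail fast if the directory to scan is missing.
check_dir() {
  local dir=$1
  [ -d "$dir" ] || die "directory not found: $dir"
  echo "checking $dir"
}
```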

martin-g · 2026-02-23T13:29:43Z

ci/scripts/check_slt_configs.sh

+  # Process each match line-by-line
+  while IFS= read -r match; do
+    line_number=$(echo "$match" | cut -d: -f1)
+    line_content=$(echo "$match" | cut -d: -f2-)


nit: You could avoid starting a new process by using Bash-isms like:

line_number=${match%%:*}
line_content=${match#*:}

Good suggestion
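The two expansions behave like the cut calls they replace: ${match%%:*} removes everything from the first colon onward, and ${match#*:} removes everything up to and including the first colon, so any later colons in the line are preserved, just as with cut -d: -f2-. A quick check:

```bash
# A grep -n style result: "<line number>:<line content>".
match='42:set datafusion.execution.collect_statistics = true'

# Longest suffix matching ":*" removed -> text before the first colon.
line_number=${match%%:*}
# Shortest prefix matching "*:" removed -> text after the first colon.
line_content=${match#*:}

echo "$line_number"    # 42
echo "$line_content"   # set datafusion.execution.collect_statistics = true
```

Both are pure parameter expansions, so no echo | cut pipeline (and no extra processes) is needed per match.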

martin-g · 2026-02-23T13:31:46Z

datafusion/sqllogictest/test_files/push_down_filter.slt

+set datafusion.execution.collect_statistics = false;
+
+statement ok
+set datafusion.execution.parquet.pushdown_filters = false;


duplicate of line 749

Thank you for catching this - I missed that duplicate configuration in the push_down_filter.slt

martin-g · 2026-02-23T13:33:56Z

datafusion/sqllogictest/test_files/set_variable.slt

 datafusion.runtime.temp_directory
+
+statement ok
+set datafusion.catalog.information_schema = false


Suggested change

set datafusion.catalog.information_schema = false

set datafusion.catalog.information_schema = false;

I noticed that other SET statements in this file do not use semicolons.
Should we standardize on including ;, or keep the existing style for consistency?

martin-g · 2026-02-23T13:37:51Z

.github/workflows/rust.yml

+    needs: linux-build-lib
+    runs-on: ubuntu-latest
+    container:
+      image: amd64/rust


Any reason to use a container?
The CI job just runs a Bash shell script.

That's correct - no reason for a container. I'll run it directly on ubuntu-latest instead of using the Rust container so it runs earlier and fails fast.

martin-g · 2026-02-23T13:39:08Z

ci/scripts/check_slt_configs.sh

+  fi
+
+  matches=$(grep -En \
+    'set[[:space:]]+datafusion\.[a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*true' \


set could also be SET
See https://github.com/cj-zhukov/datafusion/blob/fe921dca542e84a48bc02e3c409ce7ac45686283/datafusion/sqllogictest/test_files/set_variable.slt#L23 for an example.

Good catch - I missed that SET can appear in uppercase.
I’ll update the script to perform case-insensitive matching so both set and SET are handled correctly.
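A hedged sketch of the case-insensitive variant: adding -i to grep lets the same pattern match set, SET, or any mixed case (and also TRUE). The function name find_enabled is hypothetical; the pattern is the one from the diff excerpt.

```bash
# find_enabled FILE: print "LINE:CONTENT" for each line that enables a
# datafusion setting; -i makes "set"/"SET" (and "true"/"TRUE") match alike.
find_enabled() {
  grep -Ein 'set[[:space:]]+datafusion\.[a-zA-Z0-9_.]+[[:space:]]*=[[:space:]]*true' "$1"
}
```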

martin-g · 2026-02-23T13:41:31Z

datafusion/sqllogictest/test_files/information_schema_multiple_catalogs.slt

 drop table t3

 statement ok
 set datafusion.catalog.default_catalog = my_catalog;


The catalog and the schema below are never reset.
This is an example of non-boolean settings.

Thank you for pointing this out. You’re right - this is currently a limitation. The script only checks boolean settings enabled via = true and does not detect non-boolean configuration changes.

At this stage, I’d prefer to keep the script simple rather than significantly increasing its complexity in Bash. If we decide to expand the scope to cover all SET datafusion.* mutations (including non-boolean values), I think it would be better to address that in a follow-up by implementing a more robust solution in Rust (e.g., using nom). That would allow us to properly track configuration changes and ensure they are restored in a more reliable way. I'm ready to work on this.

Happy to hear your thoughts on whether we should keep this PR focused or broaden its scope.

Agreed! Keep it simple for the first version!
Maybe add a comment/TODO to the new Bash script about this use case.

martin-g · 2026-02-23T13:42:25Z

ci/scripts/check_slt_configs.sh

+
+    # Check if reset exists anywhere in file
+    if ! grep -Eq \
+      "set[[:space:]]+$config[[:space:]]*=[[:space:]]*false" \


Suggested change

"set[[:space:]]+$config[[:space:]]*=[[:space:]]*false" \

"set[[:space:]]+${config}[[:space:]]*=[[:space:]]*false" \

martin-g · 2026-02-23T13:43:14Z

datafusion/sqllogictest/test_files/type_coercion.slt

 DROP TABLE t0;
+
+statement ok
+set datafusion.explain.logical_plan_only = false;


This is already done at https://github.com/apache/datafusion/pull/20474/changes#diff-7f1b4e520d77b90cae272fc9d03f3c397d1ab10e47de434ff9f8058ffd7cc083R249

Jefffrey

Is there a way to do this check in the SLT runner test code itself? e.g. after it runs an SLT file it checks if the config state is dirty and reports it via a warning in test output? I'm not particularly sure of adding more and more bash scripts like this


cj-zhukov · 2026-02-25T06:34:42Z


Thank you - that makes sense. I agree that enforcing this at the SLT runner level would be a cleaner and more robust solution than adding additional Bash-based checks in CI.

I initially considered implementing this directly in Rust, but the shell script felt like a simpler and less invasive first step. That said, I understand the concern about accumulating more CI scripting logic, and I agree that handling this inside the SLT runner is architecturally the better approach.

We can mark this PR as draft (without closing the root issue). I'm going to work on a follow-up PR that implements the check directly in the SLT runner - e.g., verifying after each SLT file execution that the configuration state matches the default and reporting it in the test output. That should give us a more complete and future-proof solution.

Let me know if that sounds like a good direction, and I’ll proceed accordingly.

Jefffrey · 2026-02-25T08:53:47Z

That sounds good

## Which issue does this PR close?

- Closes #17914.

## Rationale for this change

In a previous PR #20474, I added a bash script that parsed the `SLT` files and checked whether any DataFusion configuration options were modified without being reset. While that approach worked, it relied on external scripting and additional parsing logic.

This PR introduces a simpler and more direct solution implemented in Rust. At the end of each `SLT` test file execution, the current configuration is compared with the default configuration using a `Drop` implementation. If any configuration values were modified and not restored, a warning is printed.

This approach is easier to maintain and keeps the validation logic within the Rust codebase rather than relying on an external bash script.

## What changes are included in this PR?

- Capture the default DataFusion configuration when the `SLT` runner is initialized.
- Implement `Drop` for the DataFusion SLT engine.
- When an `SLT` file finishes executing, compare the current configuration with the default configuration.
- If differences are detected, print a warning showing which configuration options were modified.

## Are these changes tested?

This behavior is exercised by the existing `SLT` test suite. The configuration check runs automatically when each `SLT` file completes execution.

## Are there any user-facing changes?

No. This change only affects internal `SLT` test infrastructure and does not modify any public APIs.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

cj-zhukov · 2026-03-13T11:55:41Z

I'm closing this PR as all the work was completed and merged in #20838


fe921dc Check sqllogictests for any dangling config settings(apache#17914)

github-actions bot added the labels development-process (Related to development process of DataFusion) and sqllogictest (SQL Logic Tests (.slt)) Feb 22, 2026

martin-g reviewed Feb 23, 2026


Jefffrey reviewed Feb 25, 2026


cj-zhukov and others added 2 commits February 25, 2026 10:20

0dec112 improvement the script after feadback

f6321cc Merge branch 'main' into cj-zhukov/check-sqllogictests-for-any-dangling-config-settings

cj-zhukov marked this pull request as draft February 26, 2026 05:44

cj-zhukov mentioned this pull request Mar 10, 2026 in #20838, Check sqllogictests for any dangling config settings (#17914) (Merged)

cj-zhukov closed this Mar 13, 2026

cj-zhukov deleted the cj-zhukov/check-sqllogictests-for-any-dangling-config-settings branch March 13, 2026 11:56

