Add T006 select-into-without-typed-fields (T-SQL runtime schema inference) #48
Merged
T-SQL's "SELECT * INTO new_table FROM source" derives the destination
schema from whatever the source produces at execution time. If the
source columns change shape (a column added, a type widened, a
nullability changed) the destination silently adopts those changes, and
any code reading from new_table finds the schema has shifted underneath
it. The result is delayed, hard-to-trace data-integrity bugs.
Severity = warning. The rule fires only on the wildcard form; the
explicit-column form ("SELECT col1, col2 INTO target FROM source") still
derives types from source columns but is at least naming what is
being copied, so it stays a green path here and is covered by the
contracts pack at C001/C003 if the source schema drifts.
Implementation is a one-pattern regex "\bSELECT\s+\*\s+INTO\s+\S+",
modelled on T001/T005's shape. Hand-tested against 9 cases (4 should-
flag, 5 should-pass) including multi-line, lowercase, T-SQL variable
assignment ("SELECT @x = COUNT(*) FROM ..."), and SELECT * inside a
CTE without INTO.
Registry counts move to 41 total / 30 warnings (was 40 / 29).
Closes #43.
10 new tests on test_tsql.py covering:
- The issue example (SELECT * INTO staging_orders FROM orders)
- SELECT * INTO with WHERE
- Multi-line SELECT * INTO
- Case-insensitive matching
- The recommended pass form (SELECT typed_col INTO target)
- Single-column INTO is fine
- SELECT * without INTO is fine
- T-SQL variable assignment (SELECT @x = COUNT(*) FROM ...) is not a
  SELECT * INTO target and must not flag
- SELECT * inside a CTE without INTO must not flag
- Rule message mentions "runtime" or "source" so the operator
  understands why the schema-derivation matters

test_tsql.py 28 pass (was 18).
[pre-commit.ci] auto-applied fixes from configured hooks
Pawansingh3889 added a commit that referenced this pull request on May 9, 2026
…eck (#50)

The previous pr_title.yml workflow used amannn/action-semantic-pull-request@v5 to enforce strict Conventional Commits format on every PR title (feat:, fix:, etc.) plus a lowercase-subject regex. That conflicted with the project's actual title convention ("Add Wxxx rule-name (description)", "Fix Snnn false negatives ...") and so the validate check went red on every merged PR (e.g. #46, #47, #48, #49). Branch protection does not require this check, so the merge button stayed enabled and the failures shipped silently in the merged commits' status timelines.

Replacing the strict format enforcer with a small inline shell script that catches the actual problems we have hit in practice:

1. Empty / whitespace-only titles -- noticeable in `gh pr list`.
2. Auto-generated branch-name titles (e.g. "Feat/w024-select-distinct-suspicious") that GitHub picks when an async PR-template render wipes out a freshly typed title. Matches a regex of the form ^[A-Z][a-z]+/[a-z0-9-] which catches the real cases without touching anything that looks like a sentence.
3. Titles longer than 100 characters, which truncate badly in the UI.

Project titles like "Add W024 select-distinct-suspicious (...)" pass the sanity check; titles like "Feat/foo-bar" fail. Verified the regex matrix locally before committing.

To restore stricter enforcement later, swap the inline script back for amannn/action-semantic-pull-request@v5 and adjust the project title convention to start with feat: / fix: prefixes.
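The three title checks described in that commit message can be sketched in Python as follows. This is only an illustration of the logic, not the workflow's actual inline shell script; the names `BRANCH_NAME_TITLE`, `MAX_TITLE_LENGTH`, and `title_problems` are hypothetical, and the only details taken from the commit message are the regex and the 100-character limit.

```python
import re
from typing import List

# Regex quoted in the commit message: a capitalised word, a slash, then a
# lowercase/digit/hyphen character, as in an auto-generated "Feat/foo-bar".
BRANCH_NAME_TITLE = re.compile(r"^[A-Z][a-z]+/[a-z0-9-]")

# Length limit quoted in the commit message.
MAX_TITLE_LENGTH = 100


def title_problems(title: str) -> List[str]:
    """Return the sanity-check failures for a PR title (hypothetical helper)."""
    # Check 1: empty or whitespace-only titles. Nothing else is worth
    # checking on an empty title, so return early.
    if not title.strip():
        return ["empty"]

    problems = []
    # Check 2: titles that look like an auto-generated branch name.
    if BRANCH_NAME_TITLE.match(title):
        problems.append("branch-name")
    # Check 3: titles longer than the limit.
    if len(title) > MAX_TITLE_LENGTH:
        problems.append("too-long")
    return problems


if __name__ == "__main__":
    for candidate in ("Add W024 select-distinct-suspicious (...)", "Feat/foo-bar", "   "):
        print(repr(candidate), title_problems(candidate))
```

Returning a list of problem labels rather than a boolean keeps the sketch easy to extend if further checks are added later.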
Closes #43.
T-SQL accepts `SELECT * INTO new_table FROM source` and derives the destination schema from whatever the source produces at execution time. If the source columns change shape (a column added, a type widened, a nullability changed) the destination silently adopts those changes, and any code reading from `new_table` finds the schema has shifted underneath it. The result is delayed, hard-to-trace data-integrity bugs.

Recommended pattern is the explicit form: `CREATE TABLE new_table (col1 TYPE, ...)` then `INSERT INTO new_table (col1, ...) SELECT col1, ... FROM source`. The destination schema lives in source control and a contract breakage shows up as a compile error rather than as silently propagated wrong types.

Severity is `warning`, modelled on T001 / T005. The rule fires only on the wildcard form. The explicit-column variant `SELECT col1, col2 INTO target FROM source` still derives types from source columns but at least names what is being copied; it stays a green path here and is covered by the contracts pack at C001 / C003 if a column type drifts.

Implementation is a one-pattern regex `\bSELECT\s+\*\s+INTO\s+\S+`, modelled on the shape of T001 and T005.

Samples:
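A minimal Python sketch of how the pattern behaves on statements of the kinds this PR describes. It assumes the pattern is compiled case-insensitively with the standard `re` module, which is inferred from the case-insensitive test and not confirmed against `tsql.py`. The names `SELECT_STAR_INTO` and `flags_select_star_into` and the sample statements other than the issue example are hypothetical.

```python
import re

# Pattern quoted in the PR description. IGNORECASE is an assumption based on
# the "case-insensitive matching" test, not a confirmed detail of tsql.py.
# \s+ already matches newlines, so multi-line statements need no extra flag.
SELECT_STAR_INTO = re.compile(r"\bSELECT\s+\*\s+INTO\s+\S+", re.IGNORECASE)


def flags_select_star_into(sql: str) -> bool:
    """Return True when the text contains a wildcard SELECT * INTO target."""
    return SELECT_STAR_INTO.search(sql) is not None


# Statements the rule is expected to flag.
should_flag = [
    "SELECT * INTO staging_orders FROM orders",
    "SELECT * INTO staging_orders FROM orders WHERE region = 'EU'",
    "select * into staging_orders from orders",
    "SELECT *\n  INTO staging_orders\n  FROM orders",
]

# Statements the rule is expected to leave alone.
should_pass = [
    "SELECT col1, col2 INTO target FROM source",
    "SELECT col1 INTO target FROM source",
    "SELECT * FROM orders",
    "SELECT @x = COUNT(*) FROM orders",
    "WITH recent AS (SELECT * FROM orders) SELECT id FROM recent",
]

if __name__ == "__main__":
    for sql in should_flag + should_pass:
        print(flags_select_star_into(sql), repr(sql))
```

The variable-assignment and CTE cases pass because the pattern requires `*` directly after `SELECT` and `INTO` directly after `*`, with only whitespace in between.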
Tests: 10 new methods on `test_tsql.py` covering the issue example, multi-line layout, lowercase, the recommended pass form, single-column INTO, SELECT * without INTO, T-SQL variable assignment, CTE without INTO, and a message-content assertion. `test_tsql.py` 28 pass (was 18). Full suite: 233 pass, 1 skipped, no regressions.

Four commits on this branch. The first adds the rule in `tsql.py` and registers it in `__init__.py`; registry counts move to 41 total and 30 warnings. The second adds the tests. The third documents the rule in CHANGELOG under `[Unreleased]` Added. The fourth adds the T006 row to the README rule table.

Commits signed (ED25519).