Add T006 select-into-without-typed-fields (T-SQL runtime schema inference) #48

Merged
Pawansingh3889 merged 5 commits into main from feat/t006-select-into-without-typed-fields
May 9, 2026

Conversation

@Pawansingh3889

Owner

Closes #43.

T-SQL accepts SELECT * INTO new_table FROM source and derives the destination schema from whatever the source produces at execution time. If the source columns change shape (a column added, a type widened, an index swapped) the destination silently adopts those changes, and any code reading from new_table finds the schema has shifted underneath it. The result is delayed, hard-to-trace data-integrity bugs.

The recommended pattern is the explicit form: CREATE TABLE new_table (col1 TYPE, ...) followed by INSERT INTO new_table (col1, ...) SELECT col1, ... FROM source. The destination schema then lives in source control, and a contract breakage shows up as a compile error rather than as silently propagated wrong types.
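To make the drift concrete, here is a minimal sketch of the two patterns side by side. It is an illustration under stated assumptions, not part of this PR: it uses Python's built-in sqlite3 module, with SQLite's CREATE TABLE ... AS SELECT * standing in for T-SQL's SELECT * INTO (SQLite does not support SELECT INTO, and SQL Server's type handling differs in detail). The table and column names (orders, coupon_code, staging_wildcard, staging_explicit) and the helper column_names are hypothetical.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of `table` (hypothetical helper for this sketch)."""
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 10)")

# The source changes shape after the copy statements were written.
conn.execute("ALTER TABLE orders ADD COLUMN coupon_code TEXT")

# Wildcard copy (SQLite stand-in for SELECT * INTO): the destination schema
# is whatever the source happens to produce at execution time.
conn.execute("CREATE TABLE staging_wildcard AS SELECT * FROM orders")

# Explicit form: the destination schema is declared up front and the copied
# columns are named, so the new source column does not leak through.
conn.execute(
    "CREATE TABLE staging_explicit (order_id INTEGER, customer_id INTEGER)"
)
conn.execute(
    "INSERT INTO staging_explicit (order_id, customer_id) "
    "SELECT order_id, customer_id FROM orders"
)

wildcard_columns = column_names(conn, "staging_wildcard")
explicit_columns = column_names(conn, "staging_explicit")
print(wildcard_columns)
print(explicit_columns)
```

In this sketch the wildcard destination picks up coupon_code without any statement having been edited, while the explicit destination keeps the two declared columns.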

Severity is warning, modelled on T001 / T005. The rule fires only on the wildcard form. The explicit-column variant SELECT col1, col2 INTO target FROM source still derives types from source columns but at least names what is being copied; it stays a green path here and is covered by the contracts pack at C001 / C003 if a column type drifts.

Implementation is a one-pattern regex \bSELECT\s+\*\s+INTO\s+\S+, modelled on the shape of T001 and T005.

Samples:

-- Flagged
SELECT * INTO staging_orders FROM orders;
SELECT * INTO archive_2024 FROM orders WHERE year = 2024;
SELECT *
INTO staging_orders
FROM orders;

-- Not flagged
SELECT order_id, customer_id INTO staging_orders FROM orders;  -- recommended pass form from #43
SELECT id INTO ids FROM orders;
SELECT * FROM orders WHERE id = 1;                              -- no INTO
SELECT @x = COUNT(*) FROM orders;                                -- T-SQL local-variable assignment
WITH s AS (SELECT * FROM orders) SELECT id FROM s;               -- CTE without INTO
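
The samples above can be checked against the pattern with a short standalone sketch. This is not the rule as implemented in tsql.py: the names SELECT_STAR_INTO and is_flagged are hypothetical, and the use of re.IGNORECASE is an assumption made to reproduce the lowercase case covered by the tests.

```python
import re

# Pattern quoted in the PR description. IGNORECASE is an assumption here,
# chosen so that lowercase input matches as the tests describe.
SELECT_STAR_INTO = re.compile(r"\bSELECT\s+\*\s+INTO\s+\S+", re.IGNORECASE)


def is_flagged(sql: str) -> bool:
    """Return True if the statement contains the wildcard SELECT * INTO form."""
    return SELECT_STAR_INTO.search(sql) is not None


flagged = [
    "SELECT * INTO staging_orders FROM orders;",
    "SELECT * INTO archive_2024 FROM orders WHERE year = 2024;",
    "SELECT *\nINTO staging_orders\nFROM orders;",  # \s+ also spans newlines
    "select * into staging_orders from orders;",    # lowercase
]

not_flagged = [
    "SELECT order_id, customer_id INTO staging_orders FROM orders;",
    "SELECT id INTO ids FROM orders;",
    "SELECT * FROM orders WHERE id = 1;",
    "SELECT @x = COUNT(*) FROM orders;",
    "WITH s AS (SELECT * FROM orders) SELECT id FROM s;",
]

for sql in flagged + not_flagged:
    print(is_flagged(sql), repr(sql))
```

The variable-assignment and CTE cases stay green because the pattern requires the asterisk immediately after SELECT and INTO immediately after the asterisk, with only whitespace in between.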

Tests: 10 new methods in test_tsql.py covering the issue example, SELECT * INTO with WHERE, multi-line layout, lowercase, the recommended pass form, single-column INTO, SELECT * without INTO, T-SQL variable assignment, CTE without INTO, and a message-content assertion. test_tsql.py: 28 pass (was 18). Full suite: 233 pass, 1 skipped, no regressions.

Four commits on this branch. The first adds the rule in tsql.py and registers it in __init__.py; registry counts move to 41 total and 30 warnings. The second adds the tests. The third documents the rule in CHANGELOG under [Unreleased] Added. The fourth adds the T006 row to the README rule table.

Commits signed (ED25519).

Pawansingh3889 and others added 5 commits May 8, 2026 06:40
T-SQL's "SELECT * INTO new_table FROM source" derives the destination
schema from whatever the source produces at execution time. If the
source columns change shape (a column added, type widened, an index
changed) the destination silently adopts those changes, and any code
reading from new_table finds the schema has shifted underneath it.
Delayed, hard-to-trace data-integrity bugs.

Severity = warning. The rule fires only on the wildcard form; the
explicit-column form ("SELECT col1, col2 INTO target FROM source") still
derives types from source columns but is at least naming what is
being copied, so it stays a green path here and is covered by the
contracts pack at C001/C003 if the source schema drifts.

Implementation is a one-pattern regex "\bSELECT\s+\*\s+INTO\s+\S+",
modelled on T001/T005's shape. Hand-tested against 9 cases (4 should-
flag, 5 should-pass) including multi-line, lowercase, T-SQL variable
assignment ("SELECT @x = COUNT(*) FROM ..."), and SELECT * inside a
CTE without INTO.

Registry counts move to 41 total / 30 warnings (was 40 / 29).

Closes #43.

10 new tests on test_tsql.py covering:

- The issue example (SELECT * INTO staging_orders FROM orders)
- SELECT * INTO with WHERE
- Multi-line SELECT * INTO
- Case-insensitive matching
- The recommended pass form (SELECT typed_col INTO target)
- Single-column INTO is fine
- SELECT * without INTO is fine
- T-SQL variable assignment (SELECT @x = COUNT(*) FROM ...) is not a
  SELECT * INTO target and must not flag
- SELECT * inside a CTE without INTO must not flag
- Rule message mentions "runtime" or "source" so the operator
  understands why the schema-derivation matters

test_tsql.py 28 pass (was 18).

[pre-commit.ci] auto-applied fixes from configured hooks
@Pawansingh3889 Pawansingh3889 merged commit b193c0d into main May 9, 2026
5 of 7 checks passed
Pawansingh3889 added a commit that referenced this pull request May 9, 2026
…eck (#50)

The previous pr_title.yml workflow used amannn/action-semantic-pull-request@v5
to enforce strict Conventional Commits format on every PR title (feat:,
fix:, etc.) plus a lowercase-subject regex. That conflicted with the
project's actual title convention ("Add Wxxx rule-name (description)",
"Fix Snnn false negatives ...") and so the validate check went red on
every merged PR (e.g. #46, #47, #48, #49). Branch protection does not
require this check, so the merge button stayed enabled and the failures
shipped silently in the merged commits' status timelines.

Replacing the strict format enforcer with a small inline shell script
that catches the actual problems we have hit in practice:

1. Empty / whitespace-only titles -- noticeable in `gh pr list`.
2. Auto-generated branch-name titles (e.g. "Feat/w024-select-distinct
   -suspicious") that GitHub picks when an async PR-template render
   wipes out a freshly typed title. Matches a regex of the form
   ^[A-Z][a-z]+/[a-z0-9-] which catches the real cases without
   touching anything that looks like a sentence.
3. Titles longer than 100 characters, which truncate badly in the UI.

Project titles like "Add W024 select-distinct-suspicious (...)" pass
the sanity check; titles like "Feat/foo-bar" fail. Verified the regex
matrix locally before committing.

To restore stricter enforcement later, swap the inline script back for
amannn/action-semantic-pull-request@v5 and adjust the project title
convention to start with feat: / fix: prefixes.
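
The three sanity checks described in this commit message can be sketched as follows. The workflow itself uses an inline shell script that is not shown here, so this Python version is only an illustration of the stated rules: the function name title_problem, the constant names, and the returned labels are hypothetical, and the order in which the checks run is an assumption.

```python
import re
from typing import Optional

# Regex quoted in the commit message for auto-generated branch-name titles.
BRANCH_NAME_TITLE = re.compile(r"^[A-Z][a-z]+/[a-z0-9-]")
MAX_TITLE_LENGTH = 100  # titles longer than this truncate badly in the UI


def title_problem(title: str) -> Optional[str]:
    """Return a short label for the first problem found, or None if the title passes."""
    if not title.strip():
        return "empty"
    if BRANCH_NAME_TITLE.match(title):
        return "branch-name"
    if len(title) > MAX_TITLE_LENGTH:
        return "too-long"
    return None


for candidate in [
    "Add W024 select-distinct-suspicious (...)",
    "Feat/foo-bar",
    "   ",
    "x" * 101,
]:
    print(repr(candidate[:40]), title_problem(candidate))
```

A title in the project's own convention passes because the character after the leading capitalised word is a space, not a slash.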

Development

Successfully merging this pull request may close these issues.

rule: T006 select-into-without-typed-fields - warn on SELECT INTO *
