Commit b193c0d

Add T006 select-into-without-typed-fields (T-SQL runtime schema inference) (#48)
* Add T006 select-into-without-typed-fields rule

  T-SQL's "SELECT * INTO new_table FROM source" derives the destination schema from whatever the source produces at execution time. If the source columns change shape (a column added, type widened, an index changed) the destination silently adopts those changes, and any code reading from new_table finds the schema has shifted underneath it. Delayed, hard-to-trace data-integrity bugs.

  Severity = warning. The rule fires only on the wildcard form; the explicit-column form ("SELECT col1, col2 INTO target FROM source") still derives types from source columns but is at least naming what is being copied, so it stays a green path here and is covered by the contracts pack at C001/C003 if the source schema drifts.

  Implementation is a one-pattern regex "\bSELECT\s+\*\s+INTO\s+\S+", modelled on T001/T005's shape. Hand-tested against 9 cases (4 should-flag, 5 should-pass) including multi-line, lowercase, T-SQL variable assignment ("SELECT @x = COUNT(*) FROM ..."), and SELECT * inside a CTE without INTO.

  Registry counts move to 41 total / 30 warnings (was 40 / 29).

  Closes #43.

* Add T006 select-into-without-typed-fields tests

  10 new tests on test_tsql.py covering:
  - The issue example (SELECT * INTO staging_orders FROM orders)
  - SELECT * INTO with WHERE
  - Multi-line SELECT * INTO
  - Case-insensitive matching
  - The recommended pass form (SELECT typed_col INTO target)
  - Single-column INTO is fine
  - SELECT * without INTO is fine
  - T-SQL variable assignment (SELECT @x = COUNT(*) FROM ...) is not a SELECT * INTO target and must not flag
  - SELECT * inside a CTE without INTO must not flag
  - Rule message mentions "runtime" or "source" so the operator understands why the schema-derivation matters

  test_tsql.py 28 pass (was 18).

* Document T006 in CHANGELOG under [Unreleased] Added

* Add T006 row to README rule table

* style: pre-commit auto-fixes

  [pre-commit.ci] auto-applied fixes from configured hooks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent ff44acf commit b193c0d

6 files changed

Lines changed: 141 additions & 7 deletions

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -34,6 +34,18 @@ a deprecation window (see `GOVERNANCE.md` § Scope discipline).
   helper from `base.py`. `, LATERAL ...` is recognised as a
   legitimate Snowflake/Postgres lateral join and not flagged.
   Resolves #42.
+- **T006 `select-into-without-typed-fields`** (warning) - flags
+  T-SQL `SELECT * INTO target FROM source` because the destination
+  table's schema is derived from whatever the source produces at
+  execution time. If the source columns change shape (a column added,
+  type widened, an index changed) the destination silently adopts
+  those changes and any code reading from `target` finds the schema
+  has shifted underneath it. Recommended pattern: `CREATE TABLE
+  target (...)` with explicit typed columns, then `INSERT INTO target
+  (col1, ...) SELECT col1, ...`. Fires only on the wildcard form;
+  the explicit-column variant (`SELECT col1, col2 INTO target FROM
+  source`) is allowed through here and covered by the contracts pack
+  at C001/C003 if a column type drifts. Resolves #43.

 ### Fixed
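
The drift described in this entry can be reproduced locally. The sketch below is an analogy, not T-SQL: SQLite has no `SELECT ... INTO`, so it uses `CREATE TABLE ... AS SELECT *`, which likewise takes its column list from whatever the source has at execution time. The table names, the `discount_code` column, and the helpers `column_names` and `demo` are all hypothetical.

```python
import sqlite3


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def demo() -> tuple[list[str], list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

    # Wildcard copy: the destination columns are whatever the source has now.
    conn.execute("CREATE TABLE staging_wild AS SELECT * FROM orders")
    before = column_names(conn, "staging_wild")

    # The source changes shape; rerunning the same statement yields a new schema.
    conn.execute("ALTER TABLE orders ADD COLUMN discount_code TEXT")
    conn.execute("DROP TABLE staging_wild")
    conn.execute("CREATE TABLE staging_wild AS SELECT * FROM orders")
    after = column_names(conn, "staging_wild")

    # Recommended pattern: declare the destination, then insert named columns.
    conn.execute(
        "CREATE TABLE staging_typed ("
        "order_id INTEGER NOT NULL, customer_id INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO staging_typed (order_id, customer_id) "
        "SELECT order_id, customer_id FROM orders"
    )
    typed = column_names(conn, "staging_typed")

    conn.close()
    return before, after, typed


if __name__ == "__main__":
    for label, cols in zip(("wildcard before", "wildcard after", "typed"), demo()):
        print(f"{label}: {cols}")
```

The wildcard copy gains `discount_code` without any change to the copying statement, while the explicitly declared table keeps the two columns it was created with.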

README.md

Lines changed: 1 addition & 0 deletions
@@ -265,6 +265,7 @@ positives on non-T-SQL input.
 | T003 | `cursor-declaration` | `DECLARE c CURSOR FOR ...` -- row-by-row processing |
 | T004 | `deprecated-outer-join` | `WHERE a.x *= b.y` -- removed in SQL Server 2012+ |
 | T005 | `create-index-without-online` | `CREATE INDEX ix ON t (...)` -- locks table; add `WITH (ONLINE = ON)` |
+| T006 | `select-into-without-typed-fields` | `SELECT * INTO target FROM source` -- destination schema is inferred at runtime |

 ### Contracts (opt-in via `--contract`)

sql_guard/rules/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -48,6 +48,7 @@
     CreateIndexWithoutOnline,
     CursorDeclaration,
     DeprecatedOuterJoin,
+    SelectStarInto,
     WithNolock,
     XpCmdshell,
 )
@@ -113,12 +114,13 @@
     ImplicitCrossJoin(),
     DeeplyNestedSubquery(),
     UnusedCTE(),
-    # T-SQL specific (T001-T005)
+    # T-SQL specific (T001-T006)
     WithNolock(),
     XpCmdshell(),
     CursorDeclaration(),
     DeprecatedOuterJoin(),
     CreateIndexWithoutOnline(),
+    SelectStarInto(),
 ]


sql_guard/rules/tsql.py

Lines changed: 52 additions & 0 deletions
@@ -165,3 +165,55 @@ def check_statement(self, statement: str, start_line: int, file: str) -> Finding
                 suggestion="Add WITH (ONLINE = ON) on Enterprise; disable T005 on Standard/Express",
             )
         return None
+
+
+class SelectStarInto(Rule):
+    """T006: ``SELECT * INTO target`` infers column types at runtime.
+
+    T-SQL's ``SELECT * INTO new_table FROM source`` derives the schema of
+    ``new_table`` from whatever the source produces at execution time.
+    If the source columns change shape (a column added, type widened, an
+    index changed) the destination table silently adopts those changes,
+    and any code reading from ``new_table`` finds the schema has shifted
+    underneath it. The data-integrity hit is delayed and hard to trace.
+
+    Recommended pattern: ``CREATE TABLE new_table (...)`` with explicit
+    typed columns, then ``INSERT INTO new_table (col1, col2, ...) SELECT
+    ...``. The destination schema lives in source control and a contract
+    breakage shows up as a compile error rather than silently propagated
+    wrong types.
+
+    The rule fires only on the wildcard form. ``SELECT col1, col2 INTO
+    target FROM source`` still derives types from source columns but at
+    least names what is being copied; it stays a green path and gets
+    caught (if a column type drifts) by the contracts pack at C001/C003.
+
+    Suppress with an inline ``-- noqa: T006`` comment on the same line,
+    or use the project-wide ``-- sql-guard: disable=T006`` directive.
+    """
+
+    id = "T006"
+    name = "select-into-without-typed-fields"
+    severity = "warning"
+    description = "SELECT * INTO derives the destination schema from the source at runtime"
+    multiline = True
+
+    _pattern = Rule._compile(r"\bSELECT\s+\*\s+INTO\s+\S+")
+
+    def check_statement(self, statement: str, start_line: int, file: str) -> Finding | None:
+        if self._pattern.search(statement):
+            return Finding(
+                rule_id=self.id,
+                severity=self.severity,
+                file=file,
+                line=start_line,
+                message=(
+                    "SELECT * INTO derives the destination schema from the source "
+                    "at runtime -- silent breakage when the source changes shape"
+                ),
+                suggestion=(
+                    "CREATE TABLE target (col1 TYPE, ...) explicitly, then "
+                    "INSERT INTO target (col1, ...) SELECT col1, ... FROM source"
+                ),
+            )
+        return None
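
The behaviour of the pattern can be checked in isolation with the standard `re` module. This is a minimal sketch, assuming `Rule._compile` amounts to `re.compile` with `re.IGNORECASE` (the lowercase test case suggests case-insensitive matching, but the helper's body is not shown in this diff). `T006_PATTERN`, `flags`, and the three case lists are hypothetical names, not part of sql_guard. The last list holds wildcard statements that the literal pattern does not reach, which may be worth a follow-up.

```python
import re

# Assumption: Rule._compile(pattern) behaves like re.compile(pattern, re.IGNORECASE).
T006_PATTERN = re.compile(r"\bSELECT\s+\*\s+INTO\s+\S+", re.IGNORECASE)


def flags(statement: str) -> bool:
    """Return True when the bare T006 pattern matches anywhere in the statement."""
    return T006_PATTERN.search(statement) is not None


# Statements taken from the T006 tests that are expected to be flagged.
SHOULD_FLAG = [
    "SELECT * INTO staging_orders FROM orders;",
    "SELECT * INTO archive_2024 FROM orders WHERE year = 2024;",
    "SELECT *\nINTO staging_orders\nFROM orders;",  # \s+ also spans newlines
    "select * into staging from orders;",
]

# Statements taken from the T006 tests that are expected to pass.
SHOULD_PASS = [
    "SELECT order_id, customer_id INTO staging_orders FROM orders;",
    "SELECT id INTO ids FROM orders;",
    "SELECT * FROM orders WHERE id = 1;",
    "SELECT @x = COUNT(*) FROM orders;",
    "WITH s AS (SELECT * FROM orders) SELECT id FROM s;",
]

# Wildcard forms the literal pattern misses, because a token sits
# between SELECT and the asterisk (or the asterisk is qualified).
KNOWN_GAPS = [
    "SELECT TOP 10 * INTO t FROM orders;",
    "SELECT DISTINCT * INTO t FROM orders;",
    "SELECT o.* INTO t FROM orders AS o;",
]

if __name__ == "__main__":
    for group, cases in (
        ("flag", SHOULD_FLAG),
        ("pass", SHOULD_PASS),
        ("gap", KNOWN_GAPS),
    ):
        for sql in cases:
            print(f"{group:>4}  matched={flags(sql)}  {sql!r}")
```

Keeping the pattern strict avoids false positives on `COUNT(*)` and on plain `SELECT *`, at the cost of the gaps listed above; whether that trade-off suits the rule is a judgment call for the maintainers.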

tests/test_rules.py

Lines changed: 6 additions & 5 deletions
@@ -18,18 +18,19 @@

 class TestRuleRegistry:
     def test_all_rules_loaded(self) -> None:
-        assert len(ALL_RULES) == 40
+        assert len(ALL_RULES) == 41

     def test_11_errors(self) -> None:
         # 9 E-series + 2 T-series (T002 xp-cmdshell, T004 deprecated-outer-join).
         errors = [r for r in ALL_RULES if r.severity == "error"]
         assert len(errors) == 11

-    def test_29_warnings(self) -> None:
-        # 23 W-series + 3 S-series + 3 T-series (T001 with-nolock,
-        # T003 cursor-declaration, T005 create-index-without-online).
+    def test_30_warnings(self) -> None:
+        # 23 W-series + 3 S-series + 4 T-series (T001 with-nolock,
+        # T003 cursor-declaration, T005 create-index-without-online,
+        # T006 select-into-without-typed-fields).
         warnings = [r for r in ALL_RULES if r.severity == "warning"]
-        assert len(warnings) == 29
+        assert len(warnings) == 30

     def test_unique_ids(self) -> None:
         ids = [r.id for r in ALL_RULES]

tests/test_tsql.py

Lines changed: 67 additions & 1 deletion
@@ -1,10 +1,11 @@
-"""Tests for T-SQL-specific rules (T001-T004)."""
+"""Tests for T-SQL-specific rules (T001-T006)."""

 from __future__ import annotations

 from sql_guard.rules.tsql import (
     CursorDeclaration,
     DeprecatedOuterJoin,
+    SelectStarInto,
     WithNolock,
     XpCmdshell,
 )
@@ -138,3 +139,68 @@ def test_w002_still_accepts_fetch_first():
     rule = MissingLimit()
     sql = "SELECT id FROM orders ORDER BY id FETCH FIRST 10 ROWS ONLY"
     assert _check_statement(rule, sql) is None
+
+
+# T006 select-into-without-typed-fields
+
+
+def test_t006_flags_basic_select_star_into():
+    rule = SelectStarInto()
+    finding = _check_statement(rule, "SELECT * INTO staging_orders FROM orders;")
+    assert finding is not None
+    assert finding.rule_id == "T006"
+    assert finding.severity == "warning"
+
+
+def test_t006_flags_select_star_into_with_where():
+    rule = SelectStarInto()
+    sql = "SELECT * INTO archive_2024 FROM orders WHERE year = 2024;"
+    assert _check_statement(rule, sql) is not None
+
+
+def test_t006_flags_multiline_select_star_into():
+    rule = SelectStarInto()
+    sql = "SELECT *\nINTO staging_orders\nFROM orders;"
+    assert _check_statement(rule, sql) is not None
+
+
+def test_t006_case_insensitive():
+    rule = SelectStarInto()
+    assert _check_statement(rule, "select * into staging from orders;") is not None
+
+
+def test_t006_does_not_flag_typed_columns():
+    # The recommended pass form from the issue.
+    rule = SelectStarInto()
+    sql = "SELECT order_id, customer_id INTO staging_orders FROM orders;"
+    assert _check_statement(rule, sql) is None
+
+
+def test_t006_does_not_flag_single_column_into():
+    rule = SelectStarInto()
+    assert _check_statement(rule, "SELECT id INTO ids FROM orders;") is None
+
+
+def test_t006_does_not_flag_select_star_without_into():
+    rule = SelectStarInto()
+    assert _check_statement(rule, "SELECT * FROM orders WHERE id = 1;") is None
+
+
+def test_t006_does_not_flag_tsql_variable_assignment():
+    # SELECT @x = COUNT(*) FROM ... is a T-SQL local-variable assignment,
+    # not a SELECT * INTO target. No schema is being inferred.
+    rule = SelectStarInto()
+    assert _check_statement(rule, "SELECT @x = COUNT(*) FROM orders;") is None
+
+
+def test_t006_does_not_flag_select_star_inside_cte():
+    rule = SelectStarInto()
+    sql = "WITH s AS (SELECT * FROM orders) SELECT id FROM s;"
+    assert _check_statement(rule, sql) is None
+
+
+def test_t006_message_mentions_runtime_schema():
+    rule = SelectStarInto()
+    finding = _check_statement(rule, "SELECT * INTO staging FROM orders;")
+    assert finding is not None
+    assert "runtime" in finding.message.lower() or "source" in finding.message.lower()
