Skip to content

Commit ff44acf

Browse files
Add E009 update-from-without-join (T-SQL implicit-join Cartesian risk) (#47)
* Lift strip_strings_and_comments helper into base.py Moves the SQL string/comment stripper that S001 introduced into a package-shared helper exposed from sql_guard.rules.base. structural.py now imports it; the next rule (E009) and any future paren-aware rule get the same behaviour without duplicating the regexes. Behaviour-preserving refactor only. structural test suite stays at 23 passing. * Add E009 update-from-without-join rule T-SQL accepts UPDATE customers SET status = o.status FROM customers c, orders o WHERE c.customer_id = o.customer_id. The comma form is a legacy implicit join: get the WHERE wrong (or omit it) and every row in the target table is updated against the Cartesian product. Silent data corruption, no syntax error. Reported as severity=error rather than warning (the sister rule S001 covers SELECT FROM comma-joins as a warning) because the failure mode here is a write that touches every row of a table, not a read. Implementation mirrors the post-fix S001: walk from each UPDATE keyword to the next FROM, then forward through the FROM clause tracking parenthesis depth, stopping at WHERE / GROUP BY / ORDER BY / HAVING / LIMIT / explicit JOIN / UNION / EXCEPT / INTERSECT, flagging the first depth-0 comma. Uses the strip_strings_and_comments helper lifted to base.py in the previous commit. Comma followed by LATERAL is recognised as a legitimate Snowflake/Postgres lateral join. Hand-tested against 12 cases (4 should-flag, 8 should-pass) including the issue example, multi-line layout, three-way comma joins, Postgres single-table UPDATE FROM, LATERAL, LEFT JOIN, and CTE-with-UPDATE. Registry counts updated: 40 total (was 39), 11 errors (was 10). Closes #42. * Add E009 update-from-without-join tests Adds an E009 line to the errors.sql fixture (T-SQL UPDATE FROM with a comma-separated table list, the issue example shape) and seven test methods on TestErrorRules: - test_e009_update_from_implicit_join: fixture-based, asserts at least one E009 finding with cartesian / comma-separated wording - test_e009_explicit_join_ok: the rule message's recommended fix pattern (UPDATE ... FROM a INNER JOIN b ON ...) must not flag - test_e009_postgres_single_from_table_ok: canonical Postgres UPDATE ... FROM single-table form - test_e009_lateral_after_comma_ok: , LATERAL is a real Snowflake/Postgres lateral join, not flagged - test_e009_update_without_from_ok: plain UPDATE with no FROM - test_e009_three_table_comma_join_flagged - test_e009_multiline_comma_join_flagged test_rules.py 49 pass, full suite 226 pass / 1 skipped. * Document E009 in CHANGELOG under [Unreleased] Added * style: pre-commit auto-fixes [pre-commit.ci] auto-applied fixes from configured hooks --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 47db8f0 commit ff44acf

7 files changed

Lines changed: 239 additions & 27 deletions

File tree

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,18 @@ a deprecation window (see `GOVERNANCE.md` § Scope discipline).
2222
Contributed by [@mvanhorn](https://github.com/mvanhorn)
2323
([#39](https://github.com/Pawansingh3889/sql-guard/pull/39)).
2424
Resolves #3.
25+
- **E009 `update-from-without-join`** (error) - flags
26+
`UPDATE ... FROM a, b WHERE ...` and similar comma-separated FROM
27+
clauses on UPDATE statements. T-SQL accepts this legacy implicit
28+
join: get the WHERE wrong (or omit it) and every row in the target
29+
table is updated against the Cartesian product. Severity is error
30+
rather than warning (the sister rule S001 covers SELECT FROM
31+
comma-joins as a warning) because the failure mode is a write that
32+
touches every row of a table. Walk-from-UPDATE-keyword,
33+
paren-depth-aware scan reusing the `strip_strings_and_comments`
34+
helper from `base.py`. `, LATERAL ...` is recognised as a
35+
legitimate Snowflake/Postgres lateral join and not flagged.
36+
Resolves #42.
2537

2638
### Fixed
2739

sql_guard/rules/__init__.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
GrantRevoke,
1212
InsertWithoutColumns,
1313
StringConcatInWhere,
14+
UpdateFromImplicitJoin,
1415
UpdateWithoutWhere,
1516
)
1617
from sql_guard.rules.warnings import (
@@ -74,7 +75,7 @@
7475
]
7576

7677
ALL_RULES: list[Rule] = [
77-
# Errors (E001-E008)
78+
# Errors (E001-E009)
7879
DeleteWithoutWhere(),
7980
DropWithoutIfExists(),
8081
GrantRevoke(),
@@ -83,6 +84,7 @@
8384
UpdateWithoutWhere(),
8485
AlterAddNotNullNoDefault(),
8586
DropColumn(),
87+
UpdateFromImplicitJoin(),
8688
# Warnings (W001-W023 with gaps)
8789
SelectStar(),
8890
MissingLimit(),

sql_guard/rules/base.py

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,26 @@
66
from dataclasses import dataclass
77

88

9+
_SQL_STRING = re.compile(r"'(?:[^']|'')*'")
10+
_SQL_LINE_COMMENT = re.compile(r"--[^\n]*")
11+
_SQL_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
12+
13+
14+
def strip_strings_and_comments(text: str) -> str:
15+
"""Replace SQL string literals and comments with empty equivalents.
16+
17+
Single-quoted strings (with `''` escapes), `--` line comments, and
18+
`/* ... */` block comments are all removed (strings collapse to ``''``,
19+
comments to empty). Useful before paren-depth or keyword scanning so
20+
commas, parentheses, or keywords inside literals/comments do not
21+
affect the scan.
22+
"""
23+
text = _SQL_STRING.sub("''", text)
24+
text = _SQL_LINE_COMMENT.sub("", text)
25+
text = _SQL_BLOCK_COMMENT.sub("", text)
26+
return text
27+
28+
929
@dataclass
1030
class Finding:
1131
"""A single issue found by a rule."""

sql_guard/rules/errors.py

Lines changed: 119 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,10 @@
1-
"""Error rules (E001-E008) -- these block commits."""
1+
"""Error rules (E001-E009) -- these block commits."""
22

33
from __future__ import annotations
44

5-
from sql_guard.rules.base import Finding, Rule
5+
import re
6+
7+
from sql_guard.rules.base import Finding, Rule, strip_strings_and_comments
68

79

810
class DeleteWithoutWhere(Rule):
@@ -223,3 +225,118 @@ def check_statement(self, statement: str, start_line: int, file: str) -> Finding
223225
suggestion="Stop reading the column for one release, then drop in a follow-up",
224226
)
225227
return None
228+
229+
230+
class UpdateFromImplicitJoin(Rule):
231+
"""E009: ``UPDATE ... FROM`` with comma-separated tables.
232+
233+
T-SQL accepts ``UPDATE customers SET status = o.status FROM customers c,
234+
orders o WHERE c.customer_id = o.customer_id``. The comma form is a
235+
legacy implicit join: get the WHERE wrong (or omit it) and every row in
236+
the target table is updated against the Cartesian product. Silent data
237+
corruption, no syntax error.
238+
239+
Reported as **error** rather than warning (the sister rule S001 covers
240+
`SELECT FROM` comma-joins as a warning) because the failure mode here
241+
is a write that touches every row of a table, not a read.
242+
243+
The scan walks from each ``UPDATE`` keyword to the next ``FROM``, then
244+
forward through the ``FROM`` clause tracking parenthesis depth, stops
245+
at ``WHERE`` / ``GROUP BY`` / ``ORDER BY`` / ``HAVING`` / ``LIMIT`` /
246+
explicit ``JOIN`` / ``UNION`` / ``EXCEPT`` / ``INTERSECT``, and flags
247+
the first depth-0 comma. Strings and comments are stripped first.
248+
Comma followed by ``LATERAL`` is recognised as a legitimate
249+
Snowflake/Postgres lateral join and not flagged.
250+
251+
Suppress with an inline ``-- noqa: E009`` comment on the same line, or
252+
use the project-wide ``-- sql-guard: disable=E009`` directive.
253+
"""
254+
255+
id = "E009"
256+
name = "update-from-without-join"
257+
severity = "error"
258+
description = "UPDATE ... FROM with comma-separated tables silently creates a Cartesian product"
259+
multiline = True
260+
261+
_update_pattern = re.compile(r"\bUPDATE\b", re.IGNORECASE)
262+
_from_pattern = re.compile(r"\bFROM\b", re.IGNORECASE)
263+
_stop_pattern = re.compile(
264+
r"\b("
265+
r"WHERE|"
266+
r"GROUP\s+BY|ORDER\s+BY|HAVING|"
267+
r"LIMIT|FETCH|OFFSET|"
268+
r"INNER\s+JOIN|"
269+
r"LEFT\s+(?:OUTER\s+)?JOIN|"
270+
r"RIGHT\s+(?:OUTER\s+)?JOIN|"
271+
r"FULL\s+(?:OUTER\s+)?JOIN|"
272+
r"CROSS\s+JOIN|"
273+
r"NATURAL\s+(?:INNER\s+|LEFT\s+|RIGHT\s+|FULL\s+)?JOIN|"
274+
r"JOIN|"
275+
r"UNION|EXCEPT|INTERSECT"
276+
r")\b",
277+
re.IGNORECASE,
278+
)
279+
_lateral_pattern = re.compile(r"\s*LATERAL\b", re.IGNORECASE)
280+
281+
def check_statement(self, statement: str, start_line: int, file: str) -> Finding | None:
282+
cleaned = strip_strings_and_comments(statement)
283+
n = len(cleaned)
284+
285+
update_match = self._update_pattern.search(cleaned)
286+
if not update_match:
287+
return None
288+
289+
# The FROM clause must come after the UPDATE keyword. A FROM that
290+
# appears before UPDATE (e.g. inside a CTE) does not belong to this
291+
# UPDATE.
292+
from_match = self._from_pattern.search(cleaned, update_match.end())
293+
if not from_match:
294+
return None
295+
296+
depth = 0
297+
i = from_match.end()
298+
while i < n:
299+
ch = cleaned[i]
300+
301+
if ch == "(":
302+
depth += 1
303+
i += 1
304+
continue
305+
if ch == ")":
306+
depth -= 1
307+
if depth < 0:
308+
break
309+
i += 1
310+
continue
311+
312+
if depth != 0:
313+
i += 1
314+
continue
315+
316+
stop = self._stop_pattern.match(cleaned, i)
317+
if stop:
318+
break
319+
if ch == ";":
320+
break
321+
if ch == ",":
322+
if self._lateral_pattern.match(cleaned, i + 1):
323+
i += 1
324+
continue
325+
line_offset = statement[:i].count("\n")
326+
return Finding(
327+
rule_id=self.id,
328+
severity=self.severity,
329+
file=file,
330+
line=start_line + line_offset,
331+
message=(
332+
"UPDATE ... FROM with comma-separated tables -- "
333+
"silent Cartesian product risks corrupting every row"
334+
),
335+
suggestion=(
336+
"Use UPDATE ... FROM a INNER JOIN b ON a.id = b.id "
337+
"(explicit JOIN with ON clause)"
338+
),
339+
)
340+
341+
i += 1
342+
return None

sql_guard/rules/structural.py

Lines changed: 2 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@
1212

1313
import re
1414

15-
from sql_guard.rules.base import Finding, Rule
15+
from sql_guard.rules.base import Finding, Rule, strip_strings_and_comments
1616

1717
try:
1818
import sqlparse
@@ -23,24 +23,6 @@
2323
HAS_SQLPARSE = False
2424

2525

26-
_SQL_STRING = re.compile(r"'(?:[^']|'')*'")
27-
_SQL_LINE_COMMENT = re.compile(r"--[^\n]*")
28-
_SQL_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
29-
30-
31-
def _strip_strings_and_comments(text: str) -> str:
32-
"""Replace string literals and comments with empty equivalents.
33-
34-
Preserves character positions roughly (replaces strings with ``''`` and
35-
comments with empty), so commas inside literals or comments do not
36-
trigger comma-join detection but the surrounding structure is unchanged.
37-
"""
38-
text = _SQL_STRING.sub("''", text)
39-
text = _SQL_LINE_COMMENT.sub("", text)
40-
text = _SQL_BLOCK_COMMENT.sub("", text)
41-
return text
42-
43-
4426
class ImplicitCrossJoin(Rule):
4527
"""S001: Implicit cross join via comma-separated tables in FROM.
4628
@@ -89,7 +71,7 @@ class ImplicitCrossJoin(Rule):
8971
_lateral_pattern = re.compile(r"\s*LATERAL\b", re.IGNORECASE)
9072

9173
def check_statement(self, statement: str, start_line: int, file: str) -> Finding | None:
92-
cleaned = _strip_strings_and_comments(statement)
74+
cleaned = strip_strings_and_comments(statement)
9375
n = len(cleaned)
9476

9577
for from_match in self._from_pattern.finditer(cleaned):

tests/fixtures/errors.sql

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,3 +17,6 @@ INSERT INTO orders VALUES (1, 'test', 100);
1717

1818
-- E006: UPDATE without WHERE
1919
UPDATE orders SET status = 'cancelled';
20+
21+
-- E009: UPDATE FROM with comma-separated tables (T-SQL legacy implicit join)
22+
UPDATE customers SET status = o.status FROM customers c, orders o WHERE c.customer_id = o.customer_id;

tests/test_rules.py

Lines changed: 80 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,12 +18,12 @@
1818

1919
class TestRuleRegistry:
2020
def test_all_rules_loaded(self) -> None:
21-
assert len(ALL_RULES) == 39
21+
assert len(ALL_RULES) == 40
2222

23-
def test_10_errors(self) -> None:
24-
# 8 E-series + 2 T-series (T002 xp-cmdshell, T004 deprecated-outer-join).
23+
def test_11_errors(self) -> None:
24+
# 9 E-series + 2 T-series (T002 xp-cmdshell, T004 deprecated-outer-join).
2525
errors = [r for r in ALL_RULES if r.severity == "error"]
26-
assert len(errors) == 10
26+
assert len(errors) == 11
2727

2828
def test_29_warnings(self) -> None:
2929
# 23 W-series + 3 S-series + 3 T-series (T001 with-nolock,
@@ -91,6 +91,82 @@ def test_e006_update_with_where_ok(self, tmp_path) -> None:
9191
e006 = [f for f in result.findings if f.rule_id == "E006"]
9292
assert not e006
9393

94+
def test_e009_update_from_implicit_join(self) -> None:
95+
findings = check([str(FIXTURES / "errors.sql")])
96+
e009 = [f for f in findings.findings if f.rule_id == "E009"]
97+
assert len(e009) >= 1
98+
assert "UPDATE" in e009[0].message
99+
assert (
100+
"cartesian" in e009[0].message.lower() or "comma-separated" in e009[0].message.lower()
101+
)
102+
103+
def test_e009_explicit_join_ok(self, tmp_path) -> None:
104+
# The recommended fix from the rule message must NOT trigger E009.
105+
sql = tmp_path / "safe_update_from.sql"
106+
sql.write_text(
107+
"UPDATE c SET c.status = o.status "
108+
"FROM customers c INNER JOIN orders o "
109+
"ON c.customer_id = o.customer_id;\n"
110+
)
111+
result = check([str(sql)])
112+
e009 = [f for f in result.findings if f.rule_id == "E009"]
113+
assert not e009
114+
115+
def test_e009_postgres_single_from_table_ok(self, tmp_path) -> None:
116+
# Postgres UPDATE ... FROM with a single table is the canonical
117+
# form and must not flag.
118+
sql = tmp_path / "postgres_update.sql"
119+
sql.write_text(
120+
"UPDATE customers SET status = o.status "
121+
"FROM orders o WHERE customers.id = o.customer_id;\n"
122+
)
123+
result = check([str(sql)])
124+
e009 = [f for f in result.findings if f.rule_id == "E009"]
125+
assert not e009
126+
127+
def test_e009_lateral_after_comma_ok(self, tmp_path) -> None:
128+
# `, LATERAL ...` is a real Snowflake / Postgres lateral join.
129+
sql = tmp_path / "lateral_update.sql"
130+
sql.write_text(
131+
"UPDATE c SET tag = sub.tag FROM customers c, "
132+
"LATERAL (SELECT tag FROM tags WHERE customer_id = c.id LIMIT 1) sub;\n"
133+
)
134+
result = check([str(sql)])
135+
e009 = [f for f in result.findings if f.rule_id == "E009"]
136+
assert not e009
137+
138+
def test_e009_update_without_from_ok(self, tmp_path) -> None:
139+
# No FROM clause at all is a plain single-table UPDATE.
140+
sql = tmp_path / "plain_update.sql"
141+
sql.write_text("UPDATE customers SET status = 'active' WHERE id = 1;\n")
142+
result = check([str(sql)])
143+
e009 = [f for f in result.findings if f.rule_id == "E009"]
144+
assert not e009
145+
146+
def test_e009_three_table_comma_join_flagged(self, tmp_path) -> None:
147+
sql = tmp_path / "three_table.sql"
148+
sql.write_text(
149+
"UPDATE c SET c.label = p.label "
150+
"FROM customers c, orders o, products p "
151+
"WHERE c.id = o.customer_id AND o.product_id = p.id;\n"
152+
)
153+
result = check([str(sql)])
154+
e009 = [f for f in result.findings if f.rule_id == "E009"]
155+
assert len(e009) == 1
156+
157+
def test_e009_multiline_comma_join_flagged(self, tmp_path) -> None:
158+
sql = tmp_path / "multiline.sql"
159+
sql.write_text(
160+
"UPDATE customers\n"
161+
"SET status = o.status\n"
162+
"FROM customers c,\n"
163+
" orders o\n"
164+
"WHERE c.customer_id = o.customer_id;\n"
165+
)
166+
result = check([str(sql)])
167+
e009 = [f for f in result.findings if f.rule_id == "E009"]
168+
assert len(e009) == 1
169+
94170

95171
# ---------------------------------------------------------------------------
96172
# Warning rules

0 commit comments

Comments
 (0)