Skip to content

Add E009 update-from-without-join (T-SQL implicit-join Cartesian risk)#47

Merged
Pawansingh3889 merged 5 commits into
mainfrom
feat/e009-update-from-without-join
May 8, 2026
Merged

Add E009 update-from-without-join (T-SQL implicit-join Cartesian risk)#47
Pawansingh3889 merged 5 commits into
mainfrom
feat/e009-update-from-without-join

Conversation

@Pawansingh3889

Copy link
Copy Markdown
Owner

Closes #42.

T-SQL accepts UPDATE customers SET status = o.status FROM customers c, orders o WHERE c.customer_id = o.customer_id. The comma form is a legacy implicit join: get the WHERE wrong (or omit it) and every row in the target table is updated against the Cartesian product. Silent data corruption, no syntax error.

Reported as severity = error rather than warning, because the failure mode is a write that touches every row of a table, not a read. The sister rule S001 covers SELECT FROM comma-joins as a warning.

Implementation mirrors the post-fix S001 from #46. Walk from each UPDATE keyword to the next FROM, then forward through the FROM clause tracking parenthesis depth, stop at WHERE / GROUP BY / ORDER BY / HAVING / LIMIT / explicit JOIN / UNION / EXCEPT / INTERSECT, and flag the first depth-0 comma. The strip_strings_and_comments helper that S001 introduced has been lifted to sql_guard.rules.base (first commit of this PR) so E009 reuses it without duplicating the regexes.

Deliberate suppression cases:

UPDATE c SET tag = sub.tag FROM customers c, LATERAL (SELECT tag FROM tags WHERE customer_id = c.id LIMIT 1) sub;
UPDATE customers SET status = o.status FROM orders o WHERE customers.id = o.customer_id;
UPDATE customers SET status = 'active' WHERE id = 1;
UPDATE c SET c.status = o.status FROM customers c INNER JOIN orders o ON c.customer_id = o.customer_id;

LATERAL after a comma is a real Snowflake / Postgres lateral join. Postgres canonical UPDATE FROM with a single table is the recommended form. Plain UPDATE with no FROM is fine. The recommended fix from the rule message must not flag itself.

Tests: 7 new methods on TestErrorRules covering the issue example via the fixture, the recommended fix, Postgres single-table UPDATE FROM, LATERAL after comma, plain UPDATE without FROM, three-table comma join, and multi-line comma join. Plus an E009 line added to tests/fixtures/errors.sql.

Four commits on this branch. The first is a behaviour-preserving refactor lifting strip_strings_and_comments from structural.py to base.py so E009 reuses it. The next adds the rule in errors.py and registers it; registry counts move to 40 total and 11 errors. The third adds tests and the fixture line. The fourth documents the rule in CHANGELOG under [Unreleased] Added.

Full suite: 226 pass, 1 skipped. No regressions. Commits signed (ED25519).

Moves the SQL string/comment stripper that S001 introduced into a
package-shared helper exposed from sql_guard.rules.base. structural.py
now imports it; the next rule (E009) and any future paren-aware rule
get the same behaviour without duplicating the regexes.

Behaviour-preserving refactor only. structural test suite stays at
23 passing.
T-SQL accepts UPDATE customers SET status = o.status FROM customers c,
orders o WHERE c.customer_id = o.customer_id. The comma form is a
legacy implicit join: get the WHERE wrong (or omit it) and every row
in the target table is updated against the Cartesian product. Silent
data corruption, no syntax error.

Reported as severity=error rather than warning (the sister rule S001
covers SELECT FROM comma-joins as a warning) because the failure mode
here is a write that touches every row of a table, not a read.

Implementation mirrors the post-fix S001: walk from each UPDATE
keyword to the next FROM, then forward through the FROM clause
tracking parenthesis depth, stopping at WHERE / GROUP BY / ORDER BY /
HAVING / LIMIT / explicit JOIN / UNION / EXCEPT / INTERSECT, flagging
the first depth-0 comma. Uses the strip_strings_and_comments helper
lifted to base.py in the previous commit. Comma followed by LATERAL
is recognised as a legitimate Snowflake/Postgres lateral join.

Hand-tested against 12 cases (4 should-flag, 8 should-pass) including
the issue example, multi-line layout, three-way comma joins, Postgres
single-table UPDATE FROM, LATERAL, LEFT JOIN, and CTE-with-UPDATE.

Registry counts updated: 40 total (was 39), 11 errors (was 10).

Closes #42.
Adds an E009 line to the errors.sql fixture (T-SQL UPDATE FROM with a
comma-separated table list, the issue example shape) and seven test
methods on TestErrorRules:

- test_e009_update_from_implicit_join: fixture-based, asserts at
  least one E009 finding with cartesian / comma-separated wording
- test_e009_explicit_join_ok: the rule message's recommended fix
  pattern (UPDATE ... FROM a INNER JOIN b ON ...) must not flag
- test_e009_postgres_single_from_table_ok: canonical Postgres
  UPDATE ... FROM single-table form
- test_e009_lateral_after_comma_ok: , LATERAL is a real
  Snowflake/Postgres lateral join, not flagged
- test_e009_update_without_from_ok: plain UPDATE with no FROM
- test_e009_three_table_comma_join_flagged
- test_e009_multiline_comma_join_flagged

test_rules.py 49 pass, full suite 226 pass / 1 skipped.
@Pawansingh3889 Pawansingh3889 changed the title Feat/e009 update from without join Add E009 update-from-without-join (T-SQL implicit-join Cartesian risk) May 8, 2026
[pre-commit.ci] auto-applied fixes from configured hooks
@Pawansingh3889 Pawansingh3889 merged commit ff44acf into main May 8, 2026
5 of 7 checks passed
Pawansingh3889 added a commit that referenced this pull request May 9, 2026
…eck (#50)

The previous pr_title.yml workflow used amannn/action-semantic-pull-request@v5
to enforce strict Conventional Commits format on every PR title (feat:,
fix:, etc.) plus a lowercase-subject regex. That conflicted with the
project's actual title convention ("Add Wxxx rule-name (description)",
"Fix Snnn false negatives ...") and so the validate check went red on
every merged PR (e.g. #46, #47, #48, #49). Branch protection does not
require this check, so the merge button stayed enabled and the failures
shipped silently in the merged commits' status timelines.

Replacing the strict format enforcer with a small inline shell script
that catches the actual problems we have hit in practice:

1. Empty / whitespace-only titles -- noticeable in `gh pr list`.
2. Auto-generated branch-name titles (e.g. "Feat/w024-select-distinct
   -suspicious") that GitHub picks when an async PR-template render
   wipes out a freshly typed title. Matches a regex of the form
   ^[A-Z][a-z]+/[a-z0-9-] which catches the real cases without
   touching anything that looks like a sentence.
3. Titles longer than 100 characters, which truncate badly in the UI.

Project titles like "Add W024 select-distinct-suspicious (...)" pass
the sanity check; titles like "Feat/foo-bar" fail. Verified the regex
matrix locally before committing.

To restore stricter enforcement later, swap the inline script back for
amannn/action-semantic-pull-request@v5 and adjust the project title
convention to start with feat: / fix: prefixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rule: E009 update-from-without-join - error on UPDATE FROM without an explicit JOIN clause

1 participant