Add E009 update-from-without-join (T-SQL implicit-join Cartesian risk)#47
Merged
Merged
Conversation
Moves the SQL string/comment stripper that S001 introduced into a package-shared helper exposed from sql_guard.rules.base. structural.py now imports it; the next rule (E009) and any future paren-aware rule get the same behaviour without duplicating the regexes. Behaviour-preserving refactor only. structural test suite stays at 23 passing.
T-SQL accepts UPDATE customers SET status = o.status FROM customers c, orders o WHERE c.customer_id = o.customer_id. The comma form is a legacy implicit join: get the WHERE wrong (or omit it) and every row in the target table is updated against the Cartesian product. Silent data corruption, no syntax error. Reported as severity=error rather than warning (the sister rule S001 covers SELECT FROM comma-joins as a warning) because the failure mode here is a write that touches every row of a table, not a read. Implementation mirrors the post-fix S001: walk from each UPDATE keyword to the next FROM, then forward through the FROM clause tracking parenthesis depth, stopping at WHERE / GROUP BY / ORDER BY / HAVING / LIMIT / explicit JOIN / UNION / EXCEPT / INTERSECT, flagging the first depth-0 comma. Uses the strip_strings_and_comments helper lifted to base.py in the previous commit. Comma followed by LATERAL is recognised as a legitimate Snowflake/Postgres lateral join. Hand-tested against 12 cases (4 should-flag, 8 should-pass) including the issue example, multi-line layout, three-way comma joins, Postgres single-table UPDATE FROM, LATERAL, LEFT JOIN, and CTE-with-UPDATE. Registry counts updated: 40 total (was 39), 11 errors (was 10). Closes #42.
Adds an E009 line to the errors.sql fixture (T-SQL UPDATE FROM with a comma-separated table list, the issue example shape) and seven test methods on TestErrorRules: - test_e009_update_from_implicit_join: fixture-based, asserts at least one E009 finding with cartesian / comma-separated wording - test_e009_explicit_join_ok: the rule message's recommended fix pattern (UPDATE ... FROM a INNER JOIN b ON ...) must not flag - test_e009_postgres_single_from_table_ok: canonical Postgres UPDATE ... FROM single-table form - test_e009_lateral_after_comma_ok: , LATERAL is a real Snowflake/Postgres lateral join, not flagged - test_e009_update_without_from_ok: plain UPDATE with no FROM - test_e009_three_table_comma_join_flagged - test_e009_multiline_comma_join_flagged test_rules.py 49 pass, full suite 226 pass / 1 skipped.
[pre-commit.ci] auto-applied fixes from configured hooks
11 tasks
Pawansingh3889
added a commit
that referenced
this pull request
May 9, 2026
…eck (#50) The previous pr_title.yml workflow used amannn/action-semantic-pull-request@v5 to enforce strict Conventional Commits format on every PR title (feat:, fix:, etc.) plus a lowercase-subject regex. That conflicted with the project's actual title convention ("Add Wxxx rule-name (description)", "Fix Snnn false negatives ...") and so the validate check went red on every merged PR (e.g. #46, #47, #48, #49). Branch protection does not require this check, so the merge button stayed enabled and the failures shipped silently in the merged commits' status timelines. Replacing the strict format enforcer with a small inline shell script that catches the actual problems we have hit in practice: 1. Empty / whitespace-only titles -- noticeable in `gh pr list`. 2. Auto-generated branch-name titles (e.g. "Feat/w024-select-distinct -suspicious") that GitHub picks when an async PR-template render wipes out a freshly typed title. Matches a regex of the form ^[A-Z][a-z]+/[a-z0-9-] which catches the real cases without touching anything that looks like a sentence. 3. Titles longer than 100 characters, which truncate badly in the UI. Project titles like "Add W024 select-distinct-suspicious (...)" pass the sanity check; titles like "Feat/foo-bar" fail. Verified the regex matrix locally before committing. To restore stricter enforcement later, swap the inline script back for amannn/action-semantic-pull-request@v5 and adjust the project title convention to start with feat: / fix: prefixes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #42.
T-SQL accepts
UPDATE customers SET status = o.status FROM customers c, orders o WHERE c.customer_id = o.customer_id. The comma form is a legacy implicit join: get the WHERE wrong (or omit it) and every row in the target table is updated against the Cartesian product. Silent data corruption, no syntax error.Reported as
severity = errorrather than warning, because the failure mode is a write that touches every row of a table, not a read. The sister rule S001 covers SELECT FROM comma-joins as a warning.Implementation mirrors the post-fix S001 from #46. Walk from each
UPDATEkeyword to the nextFROM, then forward through the FROM clause tracking parenthesis depth, stop atWHERE/GROUP BY/ORDER BY/HAVING/LIMIT/ explicitJOIN/UNION/EXCEPT/INTERSECT, and flag the first depth-0 comma. Thestrip_strings_and_commentshelper that S001 introduced has been lifted tosql_guard.rules.base(first commit of this PR) so E009 reuses it without duplicating the regexes.Deliberate suppression cases:
LATERAL after a comma is a real Snowflake / Postgres lateral join. Postgres canonical UPDATE FROM with a single table is the recommended form. Plain UPDATE with no FROM is fine. The recommended fix from the rule message must not flag itself.
Tests: 7 new methods on
TestErrorRulescovering the issue example via the fixture, the recommended fix, Postgres single-table UPDATE FROM, LATERAL after comma, plain UPDATE without FROM, three-table comma join, and multi-line comma join. Plus an E009 line added totests/fixtures/errors.sql.Four commits on this branch. The first is a behaviour-preserving refactor lifting
strip_strings_and_commentsfromstructural.pytobase.pyso E009 reuses it. The next adds the rule inerrors.pyand registers it; registry counts move to 40 total and 11 errors. The third adds tests and the fixture line. The fourth documents the rule in CHANGELOG under[Unreleased]Added.Full suite: 226 pass, 1 skipped. No regressions. Commits signed (ED25519).