Fix S001 implicit-cross-join false negatives (aliases, schema-qual, multi-line)#46
Merged
Merged
Conversation
The previous detection regex `\bFROM\s+(\w+\s*,\s*\w+)` only matched
when the comma came immediately after the first word after FROM. Most
real-world comma-joins missed it: aliased tables, schema-qualified
names, three-or-more-way joins, and multi-line layouts all slipped
through silently. Reproductions on main:
FROM orders o, customers c WHERE ... -> not flagged
FROM raw.orders, raw.customers WHERE ... -> not flagged
FROM o1, o2, o3 WHERE ... -> not flagged
FROM orders o,\n customers c\nWHERE ... -> not flagged
Replaces the regex with a paren-depth-aware scan: from each FROM
keyword, walk forward tracking parenthesis depth, stop at WHERE /
GROUP BY / ORDER BY / HAVING / LIMIT / explicit JOIN / UNION / EXCEPT
/ INTERSECT, and flag the first depth-0 comma. Strings and comments
are stripped first so commas inside them do not register.
Two suppression cases added:
- DELETE FROM is skipped (single-target DELETE; bare comma after
DELETE FROM is invalid syntax in most dialects)
- Comma followed by LATERAL is treated as a legitimate
Snowflake/Postgres lateral join, not flagged
This addresses #41 (the W027 comma-join request) — the rule already
existed as S001 but was not catching the patterns it claimed to
cover. Improving S001 in place avoids a duplicate W027.
14 new tests under TestImplicitCrossJoin: aliases, schema-qualified names, three-or-more-way joins, multi-line layout, comma-joins inside CTEs, plus the suppression cases (function-call commas, subquery commas, VALUES rows, LATERAL after comma, LEFT OUTER JOIN, string literals with commas, DELETE FROM, comments between tables). Each case has a one-line comment explaining what it pins down. Total tests in this class go from 3 to 17. Full suite stays at 226 passing.
[pre-commit.ci] auto-applied fixes from configured hooks
This was referenced May 8, 2026
Pawansingh3889
added a commit
that referenced
this pull request
May 9, 2026
…eck (#50) The previous pr_title.yml workflow used amannn/action-semantic-pull-request@v5 to enforce strict Conventional Commits format on every PR title (feat:, fix:, etc.) plus a lowercase-subject regex. That conflicted with the project's actual title convention ("Add Wxxx rule-name (description)", "Fix Snnn false negatives ...") and so the validate check went red on every merged PR (e.g. #46, #47, #48, #49). Branch protection does not require this check, so the merge button stayed enabled and the failures shipped silently in the merged commits' status timelines. Replacing the strict format enforcer with a small inline shell script that catches the actual problems we have hit in practice: 1. Empty / whitespace-only titles -- noticeable in `gh pr list`. 2. Auto-generated branch-name titles (e.g. "Feat/w024-select-distinct -suspicious") that GitHub picks when an async PR-template render wipes out a freshly typed title. Matches a regex of the form ^[A-Z][a-z]+/[a-z0-9-] which catches the real cases without touching anything that looks like a sentence. 3. Titles longer than 100 characters, which truncate badly in the UI. Project titles like "Add W024 select-distinct-suspicious (...)" pass the sanity check; titles like "Feat/foo-bar" fail. Verified the regex matrix locally before committing. To restore stricter enforcement later, swap the inline script back for amannn/action-semantic-pull-request@v5 and adjust the project title convention to start with feat: / fix: prefixes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #41.
S001 (
implicit-cross-join) already exists, but the detection regex\bFROM\s+(\w+\s*,\s*\w+)only matches when the comma comes immediately after the first word afterFROM. Most real-world comma-joins slip through silently.Reproductions on
main(none of these flag):Replaces the regex with a paren-depth-aware scan. From each
FROMkeyword, walk forward tracking parenthesis depth, stop atWHERE/GROUP BY/ORDER BY/HAVING/LIMIT/FETCH/OFFSET/ explicitJOIN(all variants) /UNION/EXCEPT/INTERSECT, and flag the first depth-0 comma. Strings and comments are stripped first so commas inside literals or comments do not trigger.Two deliberate suppression cases.
DELETE FROMis skipped because a single-target DELETE is the normal case in standard SQL and bareDELETE FROM x, yis invalid in Postgres, SQL Server, and MySQL. Comma followed byLATERALis recognised as a legitimate Snowflake / Postgres lateral join and not flagged, soFROM events e, LATERAL FLATTEN(input => e.tags)andFROM t1, LATERAL (SELECT ...)both pass.Why fix S001 rather than ship a duplicate W027: the rule already existed and was simply not catching the patterns it claimed to cover. Adding W027 would have left the buggy S001 in place. Closing #41 here.
Tests: 14 new regression cases under
TestImplicitCrossJoincovering aliases, schema-qualified names, three-or-more-way joins, multi-line layout, comma-join inside CTE, plus all the suppression cases. Test class size goes from 3 to 17. Full suite: 226 pass and 1 skipped, no regressions.CHANGELOG entry added under
[Unreleased]Fixed.Commits signed (ED25519).