Skip to content

Fix S001 implicit-cross-join false negatives (aliases, schema-qual, multi-line)#46

Merged
Pawansingh3889 merged 4 commits into
mainfrom
fix/s001-comma-join-false-negatives
May 7, 2026
Merged

Fix S001 implicit-cross-join false negatives (aliases, schema-qual, multi-line)#46
Pawansingh3889 merged 4 commits into
mainfrom
fix/s001-comma-join-false-negatives

Conversation

@Pawansingh3889
Copy link
Copy Markdown
Owner

Closes #41.

S001 (implicit-cross-join) already exists, but the detection regex \bFROM\s+(\w+\s*,\s*\w+) only matches when the comma comes immediately after the first word after FROM. Most real-world comma-joins slip through silently.

Reproductions on main (none of these flag):

SELECT * FROM orders o, customers c WHERE o.cust_id = c.id;
SELECT * FROM raw.orders, raw.customers WHERE raw.orders.cust_id = raw.customers.id;
SELECT * FROM o1, o2, o3 WHERE o1.id = o2.id AND o2.id = o3.id;
SELECT *
FROM orders o,
     customers c
WHERE o.cust_id = c.id;

Replaces the regex with a paren-depth-aware scan. From each FROM keyword, walk forward tracking parenthesis depth, stop at WHERE / GROUP BY / ORDER BY / HAVING / LIMIT / FETCH / OFFSET / explicit JOIN (all variants) / UNION / EXCEPT / INTERSECT, and flag the first depth-0 comma. Strings and comments are stripped first so commas inside literals or comments do not trigger.

Two deliberate suppression cases. DELETE FROM is skipped because a single-target DELETE is the normal case in standard SQL and bare DELETE FROM x, y is invalid in Postgres, SQL Server, and MySQL. Comma followed by LATERAL is recognised as a legitimate Snowflake / Postgres lateral join and not flagged, so FROM events e, LATERAL FLATTEN(input => e.tags) and FROM t1, LATERAL (SELECT ...) both pass.

Why fix S001 rather than ship a duplicate W027: the rule already existed and was simply not catching the patterns it claimed to cover. Adding W027 would have left the buggy S001 in place. Closing #41 here.

Tests: 14 new regression cases under TestImplicitCrossJoin covering aliases, schema-qualified names, three-or-more-way joins, multi-line layout, comma-join inside CTE, plus all the suppression cases. Test class size goes from 3 to 17. Full suite: 226 pass and 1 skipped, no regressions.

CHANGELOG entry added under [Unreleased] Fixed.

Commits signed (ED25519).

Pawansingh3889 and others added 4 commits May 6, 2026 06:40
The previous detection regex `\bFROM\s+(\w+\s*,\s*\w+)` only matched
when the comma came immediately after the first word after FROM. Most
real-world comma-joins missed it: aliased tables, schema-qualified
names, three-or-more-way joins, and multi-line layouts all slipped
through silently. Reproductions on main:

  FROM orders o, customers c WHERE ...                  -> not flagged
  FROM raw.orders, raw.customers WHERE ...              -> not flagged
  FROM o1, o2, o3 WHERE ...                             -> not flagged
  FROM orders o,\n     customers c\nWHERE ...            -> not flagged

Replaces the regex with a paren-depth-aware scan: from each FROM
keyword, walk forward tracking parenthesis depth, stop at WHERE /
GROUP BY / ORDER BY / HAVING / LIMIT / explicit JOIN / UNION / EXCEPT
/ INTERSECT, and flag the first depth-0 comma. Strings and comments
are stripped first so commas inside them do not register.

Two suppression cases added:

  - DELETE FROM is skipped (single-target DELETE; bare comma after
    DELETE FROM is invalid syntax in most dialects)
  - Comma followed by LATERAL is treated as a legitimate
    Snowflake/Postgres lateral join, not flagged

This addresses #41 (the W027 comma-join request) — the rule already
existed as S001 but was not catching the patterns it claimed to
cover. Improving S001 in place avoids a duplicate W027.
14 new tests under TestImplicitCrossJoin: aliases, schema-qualified
names, three-or-more-way joins, multi-line layout, comma-joins inside
CTEs, plus the suppression cases (function-call commas, subquery
commas, VALUES rows, LATERAL after comma, LEFT OUTER JOIN, string
literals with commas, DELETE FROM, comments between tables). Each
case has a one-line comment explaining what it pins down.

Total tests in this class go from 3 to 17. Full suite stays at 226
passing.
[pre-commit.ci] auto-applied fixes from configured hooks
@Pawansingh3889 Pawansingh3889 merged commit 47db8f0 into main May 7, 2026
5 of 7 checks passed
Pawansingh3889 added a commit that referenced this pull request May 9, 2026
…eck (#50)

The previous pr_title.yml workflow used amannn/action-semantic-pull-request@v5
to enforce strict Conventional Commits format on every PR title (feat:,
fix:, etc.) plus a lowercase-subject regex. That conflicted with the
project's actual title convention ("Add Wxxx rule-name (description)",
"Fix Snnn false negatives ...") and so the validate check went red on
every merged PR (e.g. #46, #47, #48, #49). Branch protection does not
require this check, so the merge button stayed enabled and the failures
shipped silently in the merged commits' status timelines.

Replacing the strict format enforcer with a small inline shell script
that catches the actual problems we have hit in practice:

1. Empty / whitespace-only titles -- noticeable in `gh pr list`.
2. Auto-generated branch-name titles (e.g. "Feat/w024-select-distinct
   -suspicious") that GitHub picks when an async PR-template render
   wipes out a freshly typed title. Matches a regex of the form
   ^[A-Z][a-z]+/[a-z0-9-] which catches the real cases without
   touching anything that looks like a sentence.
3. Titles longer than 100 characters, which truncate badly in the UI.

Project titles like "Add W024 select-distinct-suspicious (...)" pass
the sanity check; titles like "Feat/foo-bar" fail. Verified the regex
matrix locally before committing.

To restore stricter enforcement later, swap the inline script back for
amannn/action-semantic-pull-request@v5 and adjust the project title
convention to start with feat: / fix: prefixes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rule: W027 comma-join - warn on comma-separated FROM clauses (implicit JOIN)

1 participant