Skip to content

fix flatten regression#4197

Merged
sfc-gh-aling merged 4 commits into
mainfrom
aling-fix-flatten-fitler
Apr 23, 2026
Merged

fix flatten regression#4197
sfc-gh-aling merged 4 commits into
mainfrom
aling-fix-flatten-fitler

Conversation

@sfc-gh-aling

@sfc-gh-aling sfc-gh-aling commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator
  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-NNNNNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

Commit 07689e0 ("Loosen flattening rules for sort and filter") loosened can_clause_dependent_columns_flatten so that, when both _is_snowpark_connect_compatible_mode and _snowpark_connect_flatten_select_after_sort are True, a filter depending on a NEW (e.g. aliased) column in a subquery is allowed to be flattened.
That loosening is correct for sort / select / distinct, but it is unsound for filter across LIMIT/OFFSET. Flattening pushes the WHERE in front of the LIMIT, which changes the result set:

-- Input DataFrame (in Snowpark Connect compatible mode):
src.select(col("ID").alias("id_a"))
  .select(col("id_a").alias("id_b"))
  .limit(5)
  .filter(col("id_b") > 3)
-- Before 07689e09c (correct):            -- After 07689e09c (buggy):
SELECTWHERE "id_b" > 3                 SELECTWHERE "id_b" > 3 LIMIT 5
FROM (… LIMIT 5)                          --       ^^^^^ WHERE now applied
                                         --             BEFORE LIMIT

Repro SQL on Snowpark Connect: SELECT * FROM (SELECT * FROM range(10) LIMIT 5) WHERE id > 3 used to return [4], now returns [4, 5, 6, 7, 8].

Fix
Narrow the loosening that 07689e0 introduced instead of reverting it. Inside the NEW-column branch of can_clause_dependent_columns_flatten — which is only reachable when both compat-mode flags are True — add a single guard that rejects flattening when clause == "filter" and the subquery has LIMIT or OFFSET:

elif dc_state.change_state == ColumnChangeState.NEW:
  ...
  if not (
      context._is_snowpark_connect_compatible_mode
      and context._snowpark_connect_flatten_select_after_sort
  ):
      return False
  # NEW: WHERE cannot cross LIMIT/OFFSET without changing semantics.
  if clause == "filter" and subquery_has_limit_or_offset:
      return False

The helper gains one optional kwarg, subquery_has_limit_or_offset: bool = False, passed in from SelectStatement.filter. SelectStatement.sort already short-circuits on not self.limit_ / not self.offset at the call site, so no change is needed there.

Is this a BCR?
No. Existing Snowpark customers (legacy mode) observe no change in generated SQL. The only behavioral delta is for Snowpark Connect / compatible-mode callers on the exact pattern the bad commit newly enabled — where we are restoring pre-commit semantics.

@github-actions

Copy link
Copy Markdown


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@sfc-gh-aling sfc-gh-aling changed the title try fix fix flatten regression Apr 23, 2026
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.16%. Comparing base (0aeccd9) to head (e2fccaa).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4197      +/-   ##
==========================================
- Coverage   95.41%   95.16%   -0.26%     
==========================================
  Files         171      171              
  Lines       43786    43788       +2     
  Branches     7502     7503       +1     
==========================================
- Hits        41778    41670     -108     
- Misses       1227     1295      +68     
- Partials      781      823      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sfc-gh-aling sfc-gh-aling marked this pull request as ready for review April 23, 2026 15:57
@sfc-gh-aling sfc-gh-aling requested review from a team as code owners April 23, 2026 15:57
@sfc-gh-aling sfc-gh-aling merged commit f9f540f into main Apr 23, 2026
31 of 34 checks passed
@sfc-gh-aling sfc-gh-aling deleted the aling-fix-flatten-fitler branch April 23, 2026 16:59
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 23, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants