SNOW-3384967: reduce describe query generated by alias by sfc-gh-yuwang · Pull Request #4183 · snowflakedb/snowpark-python

sfc-gh-yuwang · 2026-04-16T23:11:42Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-NNNNNNN
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
- If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
Please describe how your code solves the related issue.

When describe reduction is on and the inner select already has resolved
attributes, infer new.attributes for this outer select by reusing datatype and
nullable from the subquery: (0) skip if parent column names collide, (1) index
attributes by normalized name, (2) walk new.projection, (3) only handle plain
columns or Alias(column), (4) resolve source via quoted-identifier-aware lookup,
(5) assign only if every output column was inferred (length matches projection).
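
The steps in the description can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `Attr`, `Column`, `Alias`, `normalize`, and `infer_attributes` are hypothetical stand-ins for Snowpark's internal types, and the identifier normalization is a simplified assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a resolved attribute."""
    name: str
    datatype: str
    nullable: bool = True


@dataclass(frozen=True)
class Column:
    """Hypothetical plain column reference in a projection."""
    name: str


@dataclass(frozen=True)
class Alias:
    """Hypothetical Alias(child, name) projection entry."""
    child: object
    name: str


def normalize(name: str) -> str:
    """Simplified assumption: quoted names keep case, unquoted names are uppercased."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name
    return '"' + name.upper() + '"'


def infer_attributes(
    parent: Optional[List[Attr]], projection: Optional[List[object]]
) -> Optional[List[Attr]]:
    """Return inferred attributes, or None to signal that inference was skipped."""
    if parent is None or projection is None:
        return None
    # (0)/(1) index parent attributes by normalized name; give up on a collision.
    by_name: Dict[str, Attr] = {}
    for attr in parent:
        key = normalize(attr.name)
        if key in by_name:
            return None
        by_name[key] = attr
    inferred: List[Attr] = []
    # (2) walk the projection.
    for expr in projection:
        # (3) only plain columns or Alias(column) are handled.
        if isinstance(expr, Column):
            source_name, output_name = expr.name, expr.name
        elif isinstance(expr, Alias) and isinstance(expr.child, Column):
            source_name, output_name = expr.child.name, expr.name
        else:
            return None
        # (4) quoted-identifier-aware lookup of the source column.
        source = by_name.get(normalize(source_name))
        if source is None:
            return None
        inferred.append(Attr(normalize(output_name), source.datatype, source.nullable))
    # (5) reaching this point means every output column was inferred.
    return inferred
```

Returning `None` for every unsupported case keeps the sketch all-or-nothing: a caller would either get a full attribute list or leave the attributes unset.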

https://snowpark-python-001.jenkinsdev1.us-west-2.aws-dev.app.snowflake.com/view/SnowparkPython/job/PythonStoredProcBuildSnowfortTest/860/
https://snowpark-python-001.jenkinsdev1.us-west-2.aws-dev.app.snowflake.com/job/SnowparkConnectRegressionRunner/486/
Snowpark and SCOS regression test runs (linked above) to verify the change.

github-actions · 2026-04-16T23:11:55Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

codecov-commenter · 2026-04-16T23:47:23Z

Codecov Report

❌ Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.41%. Comparing base (160c418) to head (f198187).

Files with missing lines	Patch %	Lines
...ke/snowpark/_internal/analyzer/select_statement.py	97.43%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4183      +/-   ##
==========================================
+ Coverage   95.17%   95.41%   +0.24%     
==========================================
  Files         171      171              
  Lines       43801    43840      +39     
  Branches     7505     7517      +12     
==========================================
+ Hits        41686    41832     +146     
+ Misses       1294     1227      -67     
+ Partials      821      781      -40


sfc-gh-aling · 2026-04-21T21:38:10Z

+        # column was inferred (length matches projection).
+        if self._session.reduce_describe_query_enabled and self.attributes is not None:
+            # subquery lookup by name
+            attributes_by_name = {attr.name: attr for attr in self.attributes}


Could there be a name collision?

https://github.com/snowflakedb/snowpark-python/pull/4175/changes#diff-ba9463440d935e1d5667a2909f7d2b8061f56de7bdabd8207a29903e9eae8f59

Do we have a conclusion on:

whether there can be a name collision?

if there is a collision, is the logic still safe?

By collision I mean the same column name showing up twice in the last projection: "a": StringType, "a": IntType
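
To make the concern concrete: a dict comprehension keyed by name keeps only the last entry when two attributes share a name, and it does so without raising. The sketch below uses plain `(name, datatype)` tuples as a hypothetical stand-in for Snowpark attributes.

```python
from typing import Dict, List, Tuple

# Hypothetical (name, datatype) pairs; the same name appears twice.
attributes: List[Tuple[str, str]] = [("a", "StringType"), ("a", "IntType")]

# The later entry overwrites the earlier one without any error.
attributes_by_name: Dict[str, str] = {name: dtype for name, dtype in attributes}

# Comparing sizes reveals that at least one name collided.
has_collision = len(attributes_by_name) != len(attributes)
```

With a collision like this, a lookup for `"a"` would return `IntType` for both output columns, so skipping the inference is the safe choice.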

sfc-gh-joshi · 2026-04-21T22:16:52Z

        _ = df.col("a")


+def test_project_alias_infers_attributes_from_parent_metadata(session):


Can you add tests for more complicated aliasing scenarios like what Adam described? Can you also test some cases where the describe query cannot be skipped, even with reduce_describe_query_enabled set?

sfc-gh-aling · 2026-04-24T18:02:37Z

+            # subquery lookup by name
+            attributes_by_name = {attr.name: attr for attr in self.attributes}
+            inferred_attributes: List[Attribute] = []
+            assert new.projection is not None


This can raise an AssertionError, which will break customers.

Could new.projection be None, and what happened before when new.projection was None?
Should we just fall back and skip the optimization in this case?
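
One way to read this suggestion is a plain guard instead of an assertion. In Python, `assert` raises `AssertionError` when its condition is false and is stripped entirely under `python -O`, so it is a poor fit for a condition that can occur in user workloads. A minimal sketch, where `maybe_infer` is a hypothetical name and the body is a trivial placeholder for the real inference:

```python
from typing import List, Optional


def maybe_infer(projection: Optional[List[str]]) -> Optional[List[str]]:
    """Hypothetical helper: return None (skip the optimization) when there is no projection."""
    if projection is None:
        # Fall back: the caller leaves attributes unset instead of raising.
        return None
    # Placeholder for the real inference work.
    return [name.upper() for name in projection]
```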

sfc-gh-aling · 2026-04-24T18:07:11Z

+                    inferred_attributes = []
+                    break
+
+                source_attr = attributes_by_name.get(source_column_name)


do we need to handle double quoted identifiers?
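
For illustration, here is why quoting matters for the lookup key. In Snowflake, unquoted identifiers resolve as uppercase while double-quoted identifiers preserve case, so `a` and `"a"` name different columns. The sketch below is a simplified, hypothetical normalizer (`quote_name_sketch` and its regex are assumptions made for this example, not the library's implementation).

```python
import re

# Assumption: a name counts as already quoted when it is wrapped in double
# quotes and any inner double quote is doubled.
ALREADY_QUOTED_SKETCH = re.compile(r'^"(?:[^"]|"")+"$')


def quote_name_sketch(name: str) -> str:
    """Return a canonical quoted key for an identifier."""
    if ALREADY_QUOTED_SKETCH.match(name):
        # Delimited identifier: keep the case exactly as written.
        return name
    # Unquoted identifier: uppercase, escape inner quotes, then wrap in quotes.
    return '"' + name.upper().replace('"', '""') + '"'
```

Using one canonical key on both sides, when building the dictionary and when looking up the source column, means `a` and `A` hit the same entry while `"a"` stays distinct.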

sfc-gh-aling · 2026-04-24T18:17:02Z

+        # else aborts without setting partial attributes, (4) map each case to an
+        # Attribute named for the projected column, (5) assign only if every output
+        # column was inferred (length matches projection).
+        if self._session.reduce_describe_query_enabled and self.attributes is not None:


We can call this out in the changelog; this is a public-facing Snowpark improvement.

sfc-gh-aling

let's use a scos artificial workload to verify the improvement and paste the number here

sfc-gh-yuwang · 2026-04-24T20:39:59Z

tests/workload_tests/artificial_workloads/test_artificial_workload_09_26.py
For the workload above, it removes 6 queries in total.

sfc-gh-aling · 2026-04-24T23:02:49Z

+            # Skip: no projection to walk (do not assert; leave new.attributes unchanged).
+            if projection is not None:
+                # Skip: duplicate output names on the parent — dict/lookup would be ambiguous.
+                if len(parent_attributes) == len({a.name for a in parent_attributes}):


Is this if condition redundant?

You are already doing the collision check in the 1612-1618 loop.
The check at 1609 is another O(N) loop.

Originally this if was included to determine whether there is a collision before we normalize the attr name, but I think we can remove this check to save an O(N) loop.
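
The point about redundancy can be sketched as a single pass that builds the index and detects collisions at the same time. Because the check runs on the normalized key, it also catches names that differ only before normalization. `index_by_key` is a hypothetical helper, and `str.upper` is a placeholder for the real normalization.

```python
from typing import Dict, Iterable, Optional


def index_by_key(names: Iterable[str]) -> Optional[Dict[str, int]]:
    """Map each normalized key to its position; return None on the first collision."""
    index: Dict[str, int] = {}
    for position, name in enumerate(names):
        key = name.upper()  # placeholder normalization, an assumption for this sketch
        if key in index:
            return None
        index[key] = position
    return index
```

A separate distinct-name check before this loop would cost another pass over the attributes without catching anything the loop misses.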

sfc-gh-aling · 2026-04-27T22:39:44Z

+def _normalized_snowflake_identifier_key(name: str) -> str:
+    """Canonical quoted key: delimited identifiers preserve case; unquoted follow Snowflake uppercasing."""
+    if ALREADY_QUOTED.match(name):
+        return quote_name_without_upper_casing(unquote_if_quoted(name))
+    return quote_name(name)


as per discussion, we do not need this util, quote_name is good enough

sfc-gh-aling · 2026-04-27T22:41:08Z

Please update to use quote_name for the attribute key lookup before merging the PR.

sfc-gh-yuwang marked this pull request as ready for review April 21, 2026 02:34

sfc-gh-yuwang requested review from a team as code owners April 21, 2026 02:34

sfc-gh-yuwang requested review from sfc-gh-aalam, sfc-gh-aling, sfc-gh-jdu and sfc-gh-yixie and removed request for sfc-gh-aalam, sfc-gh-jdu and sfc-gh-yixie April 21, 2026 02:34

sfc-gh-yuwang added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Apr 21, 2026

sfc-gh-aling reviewed Apr 21, 2026

Comment thread src/snowflake/snowpark/_internal/analyzer/select_statement.py Outdated

sfc-gh-aling reviewed Apr 21, 2026

sfc-gh-joshi reviewed Apr 21, 2026

sfc-gh-yuwang requested review from sfc-gh-aling and sfc-gh-joshi April 22, 2026 20:30

sfc-gh-joshi approved these changes Apr 22, 2026

sfc-gh-aling reviewed Apr 24, 2026

sfc-gh-yuwang force-pushed the SNOW-3384967 branch from 0a7bc27 to 2d1cbf1 Compare April 24, 2026 20:42

sfc-gh-aling reviewed Apr 24, 2026

sfc-gh-aling reviewed Apr 27, 2026

sfc-gh-aling approved these changes Apr 27, 2026

test remove alias describe query

a9096fb

sfc-gh-yuwang added 8 commits April 28, 2026 09:59

fix test

858a609

add comment

d4a9279

add tests

80cba92

add more test

fed4876

more defensive code

58c7a32

add changelog and remove redundant code

7e6042a

increase coverage

c3d8746

remove redundant code

01c423e

sfc-gh-yuwang force-pushed the SNOW-3384967 branch from ad2bd4e to 01c423e Compare April 28, 2026 16:59

Merge branch 'main' into SNOW-3384967

f198187

sfc-gh-yuwang merged commit c3cc950 into main Apr 29, 2026
29 checks passed

sfc-gh-yuwang deleted the SNOW-3384967 branch April 29, 2026 01:21

github-actions Bot locked and limited conversation to collaborators Apr 29, 2026

sfc-gh-yuwang restored the SNOW-3384967 branch April 30, 2026 20:13


4 participants
