
Bump connector upper bound version #3835

Closed
sfc-gh-jdu wants to merge 2683 commits into main from jdu-conn

Conversation

@sfc-gh-jdu

Collaborator
  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-NNNNNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

sfc-gh-joshi and others added 30 commits June 30, 2025 15:00
…nd native Series constructor switching bugs (#3498)

SNOW-2157873 occurs because upstream modin does not implement __array_function__; instead, it converts modin objects to numpy ndarrays via __array__ when a numpy function is called on them. The extension wrapper for __array_function__ introduced by Snowpark pandas confuses numpy dispatch, causing unexpected AttributeErrors. This is fixed upstream with modin-project/modin#7617, and will presumably become available in the next modin release. On the Snowpark side, this PR adds relevant tests, and adds a version-guarded flag to remove the extension function and push it down to the query compiler.

SNOW-2173644 occurs in specific circumstances when determining switching conditions for the DataFrame constructor. Series objects are treated as dict-like, but Series.values is a property rather than a function. We thus skip over native_pd.Series objects in the dict-like check in move_to_me_cost.
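
To illustrate the second bug, here is a minimal sketch that uses only the standard library. `FakeSeries`, `is_dict_like`, and `column_values` are hypothetical stand-ins invented for this example; they are not the actual `move_to_me_cost` code. The duck-typing check is loosely modeled on `pandas.api.types.is_dict_like`. The sketch shows why an object that passes a dict-like check can still break a `.values()` call, and how skipping the type avoids the error.

```python
class FakeSeries:
    """Hypothetical stand-in for native_pd.Series: dict-like, but `values` is a property."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data

    @property
    def values(self):
        # A property, not a method: `obj.values` already returns the data.
        return list(self._data.values())


def is_dict_like(obj):
    """Duck-typed check: the object exposes the core mapping attributes."""
    attrs = ("keys", "__getitem__", "__contains__")
    return all(hasattr(obj, a) for a in attrs) and not isinstance(obj, type)


def column_values(data, skip_types=()):
    """Return the values of dict-like input, or None if the input is skipped."""
    if isinstance(data, skip_types) or not is_dict_like(data):
        return None
    # Assumes `values` is callable, which holds for dict but not for FakeSeries.
    return list(data.values())
```

Calling `column_values(FakeSeries({"a": 1}))` raises `TypeError` because the property result is not callable, while passing `skip_types=(FakeSeries,)` returns `None` without touching `values`.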
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
…#3488)

Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
…ndas methods (#3512)

When AutoSwitchBackend (hybrid execution PrPr) is enabled, the client will automatically switch to the pandas backend when a method registered via a register_*_not_implemented annotation is called.
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
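
The automatic switching described in #3512 can be pictured with a toy sketch. Everything below is hypothetical: `register_not_implemented`, `ToyFrame`, and the `_pandas_` prefix convention are invented for illustration and do not reflect the real `register_*_not_implemented` annotations or the AutoSwitchBackend machinery.

```python
import functools
import statistics

# Names of methods that the toy "snowflake" backend does not implement.
NOT_IMPLEMENTED = set()


def register_not_implemented(func):
    """Hypothetical decorator: mark a method as unsupported on the default backend."""
    NOT_IMPLEMENTED.add(func.__name__)

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if self.auto_switch:
            # Switch backends, then delegate to the fallback implementation.
            self.backend = "pandas"
            fallback = getattr(self, "_pandas_" + func.__name__)
            return fallback(*args, **kwargs)
        raise NotImplementedError(
            f"{func.__name__} is not implemented on the {self.backend} backend"
        )

    return wrapper


class ToyFrame:
    """Hypothetical frame with a backend label and an auto-switch flag."""

    def __init__(self, data, auto_switch=False):
        self.data = list(data)
        self.backend = "snowflake"
        self.auto_switch = auto_switch

    @register_not_implemented
    def cumulative_median(self):
        """Placeholder: the wrapper never calls this body."""

    def _pandas_cumulative_median(self):
        # Fallback that runs locally once the backend has been switched.
        return [statistics.median(self.data[: i + 1]) for i in range(len(self.data))]
```

With `auto_switch=True`, calling `cumulative_median()` changes `backend` to `"pandas"` and returns the fallback result; with the flag off, the same call raises `NotImplementedError`.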
sfc-gh-yuwang and others added 19 commits September 26, 2025 11:04
…ctions ( part 1) (#3811)

Co-authored-by: Jamison Rose <Jamison.Rose@snowflake.com>
Add some changes that we need to support Modin 0.37.0:

- add a Snowflake extension for `Series.to_json` that raises `NotImplementedError`.
- `test_inplace_false_with_assignment_does_not_mutate_df` now passes
- change some expectations relating to merges

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <jonathan.shi@snowflake.com>
…t. (#3820)

To implement to_snowflake() for a Snowpark pandas dataframe on the pandas backend with large enough data, upload data via a parquet file instead of via a Snowpark dataframe. The Snowpark dataframe typically inserts values through parametrized SQL queries.

Benchmarking showed that a good threshold for switching to parquet is roughly 3 MB, so I've set that as the configurable default. The advantage of this approach seems to grow with dataset size: exporting an 800 MB dataframe took about 55 seconds via parquet versus about 429 seconds via the old method, a speedup of roughly 7.8x.

We can take a similar approach to speed up pandas_backend_df.move_to('snowflake').
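
The size-based switch can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: `PARQUET_THRESHOLD_BYTES`, `choose_upload_path`, and `speedup` are hypothetical names, "3 MB" is taken as 3,000,000 bytes, and treating the threshold as inclusive is a guess.

```python
# Assumption: "roughly 3 MB" is interpreted as 3,000,000 bytes.
PARQUET_THRESHOLD_BYTES = 3 * 1000 * 1000


def choose_upload_path(nbytes, threshold=PARQUET_THRESHOLD_BYTES):
    """Pick an upload strategy from the in-memory size of the data.

    Assumption: data at or above the threshold goes through parquet.
    """
    return "parquet" if nbytes >= threshold else "sql_insert"


def speedup(old_seconds, new_seconds):
    """Ratio of the old runtime to the new runtime."""
    return old_seconds / new_seconds
```

For the 800 MB case quoted above, `choose_upload_path(800_000_000)` selects the parquet path, and `speedup(429, 55)` evaluates to 7.8.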

Here are benchmark results with a 3XL warehouse:

<img width="571" height="432" alt="to_snowflake_timing_2" src="https://github.com/user-attachments/assets/0f652f6e-4510-4a94-9a75-028f5e09f2b0" />

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
@sfc-gh-jdu sfc-gh-jdu requested review from a team as code owners October 2, 2025 00:27
@github-actions github-actions Bot locked and limited conversation to collaborators Nov 12, 2025