SNOW-2387227: Use session-scoped session. #3831

Closed
sfc-gh-mvashishtha wants to merge 2684 commits into main from mvashishtha/SNOW-2387227/use-session-scoped-session

Conversation

@sfc-gh-mvashishtha

Contributor

No description provided.

sfc-gh-lju and others added 30 commits June 30, 2025 17:12
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
…#3488)

Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
…ndas methods (#3512)

When AutoSwitchBackend (hybrid execution PrPr) is enabled, the client will automatically switch to the pandas backend when a method registered via a register_*_not_implemented annotation is called.
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
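
The switching rule described in that commit message can be sketched in a few lines. This is only an illustration under stated assumptions: `register_not_implemented`, `pick_backend`, and the backend names below are hypothetical stand-ins, not the actual Snowpark pandas or Modin API, and the real `register_*_not_implemented` annotations may work differently.

```python
# Hypothetical sketch: none of these names are the real Snowpark pandas API.
_NOT_IMPLEMENTED = set()


def register_not_implemented(method_name):
    """Decorator factory that records `method_name` as lacking a native implementation."""
    def decorator(func):
        _NOT_IMPLEMENTED.add(method_name)
        return func
    return decorator


@register_not_implemented("some_method")
def some_method_stub(*args, **kwargs):
    # Placeholder body for a method the primary backend does not support.
    raise NotImplementedError("some_method is not implemented on this backend")


def pick_backend(method_name, current_backend, auto_switch_enabled):
    """Return the backend that should run `method_name`.

    Assumption: when auto-switching is on and the method was registered as
    not implemented, the call is routed to the pandas backend.
    """
    if auto_switch_enabled and method_name in _NOT_IMPLEMENTED:
        return "pandas"
    return current_backend
```

With auto-switching disabled, the sketch leaves the backend unchanged, so the stub's `NotImplementedError` would surface to the caller instead.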
sfc-gh-jrose and others added 23 commits September 25, 2025 00:44
Make `__len__()` an alias for `ngroups`.

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
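
For readers unfamiliar with the pattern, here is a minimal sketch of making `__len__()` defer to `ngroups`. The `ToyGroupBy` class and its constructor are hypothetical and exist only for illustration; the commit message does not say which class was changed or how `ngroups` is computed there.

```python
class ToyGroupBy:
    """Hypothetical groupby-like object; not the Snowpark pandas class."""

    def __init__(self, keys):
        self._keys = list(keys)

    @property
    def ngroups(self):
        # Assumption for this sketch: one group per distinct key.
        return len(set(self._keys))

    def __len__(self):
        # len(obj) reports the number of groups.
        return self.ngroups
```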
…ctions ( part 1) (#3811)

Co-authored-by: Jamison Rose <Jamison.Rose@snowflake.com>
Add some changes that we need to support Modin 0.37.0:

- add a Snowflake extension for `Series.to_json` that raises `NotImplementedError`.
- `test_inplace_false_with_assignment_does_not_mutate_df` now passes
- change some expectations relating to merges

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <jonathan.shi@snowflake.com>
…t. (#3820)

For to_snowflake() on a Snowpark pandas dataframe that is on the pandas backend and holds large enough data, upload the data via a parquet file instead of via a Snowpark dataframe. The Snowpark dataframe path typically inserts values through parametrized SQL queries.

Benchmarking showed that a good threshold to switch to parquet was roughly 3 MB, so I've set that as the configurable default switching threshold. Performance of this approach seems to improve with dataset size. Exporting an 800 MB dataframe took about 55 seconds via parquet versus about 429 seconds via the old method, a speedup of roughly 7.8x.

We can take a similar approach to speed up pandas_backend_df.move_to('snowflake').
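
The size-based switch described above might look roughly like the following. This is a sketch under assumptions: the function name, the returned labels, the interpretation of "3 MB" as 3 * 1024 * 1024 bytes, and the choice to use parquet at exactly the threshold are all hypothetical and not taken from the actual change.

```python
# Assumed reading of "roughly 3 MB"; the real default may be defined differently.
DEFAULT_PARQUET_THRESHOLD_BYTES = 3 * 1024 * 1024


def choose_upload_path(data_size_bytes, threshold_bytes=DEFAULT_PARQUET_THRESHOLD_BYTES):
    """Pick an upload strategy from the in-memory size of the data.

    Hypothetical helper: returns "parquet" at or above the threshold,
    otherwise "snowpark_dataframe" (the older, SQL-insert-based path).
    """
    if data_size_bytes < 0:
        raise ValueError("data_size_bytes must be non-negative")
    if data_size_bytes >= threshold_bytes:
        return "parquet"
    return "snowpark_dataframe"
```

Passing `threshold_bytes` explicitly mirrors the point that the threshold is configurable rather than fixed.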

Here are benchmark results with a 3XL warehouse:

<img width="571" height="432" alt="to_snowflake_timing_2" src="https://github.com/user-attachments/assets/0f652f6e-4510-4a94-9a75-028f5e09f2b0" />

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
sfc-gh-mvashishtha added the NO-CHANGELOG-UPDATES (This pull request does not need to update CHANGELOG.md) and NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs) labels on Oct 1, 2025
sfc-gh-aalam force-pushed the mvashishtha/SNOW-2387227/use-session-scoped-session branch from 870c417 to 81b505f on November 12, 2025 23:20
github-actions (Bot) locked and limited conversation to collaborators on Nov 12, 2025

Labels

- NO-CHANGELOG-UPDATES: This pull request does not need to update CHANGELOG.md
- NO-PANDAS-CHANGEDOC-UPDATES: This PR does not update Snowpark pandas docs
- snowpark-pandas
