Commit e4584e2
SNOW-2372686: Write large_pandas_backend_df.to_snowflake() via parquet. (#3820)
To implement to_snowflake() for a Snowpark pandas dataframe on the pandas backend with large enough data, upload data via a parquet file instead of via a Snowpark dataframe. The Snowpark dataframe typically inserts values through parametrized SQL queries.
Benchmarking showed that a good threshold to switch to parquet was roughly 3 MB, so I've set that as the configurable default switching threshold. Performance of this approach seems to improve with dataset size. Exporting an 800 MB dataframe took about 55 seconds via parquet versus about 429 seconds via the old method, so we get over 7x speedup.
We can take a similar approach to speed up pandas_backend_df.move_to('snowflake').
Here are benchmark results with a 3XL warehouse:
<img width="571" height="432" alt="to_snowflake_timing_2" src="https://github.com/user-attachments/assets/0f652f6e-4510-4a94-9a75-028f5e09f2b0" />
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>1 parent 183a392 commit e4584e2
8 files changed
Lines changed: 553 additions & 150 deletions
File tree
- src/snowflake/snowpark/modin
- config
- plugin
- _internal
- compiler
- extensions
- tests/integ/modin/frame
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
126 | 126 | | |
127 | 127 | | |
128 | 128 | | |
| 129 | + | |
129 | 130 | | |
130 | 131 | | |
131 | 132 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
116 | 116 | | |
117 | 117 | | |
118 | 118 | | |
119 | | - | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
120 | 134 | | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
121 | 138 | | |
122 | 139 | | |
123 | 140 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
980 | 980 | | |
981 | 981 | | |
982 | 982 | | |
| 983 | + | |
| 984 | + | |
| 985 | + | |
| 986 | + | |
| 987 | + | |
| 988 | + | |
| 989 | + | |
| 990 | + | |
| 991 | + | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
983 | 1015 | | |
984 | 1016 | | |
985 | 1017 | | |
| |||
2300 | 2332 | | |
2301 | 2333 | | |
2302 | 2334 | | |
| 2335 | + | |
| 2336 | + | |
| 2337 | + | |
| 2338 | + | |
| 2339 | + | |
| 2340 | + | |
| 2341 | + | |
| 2342 | + | |
| 2343 | + | |
| 2344 | + | |
| 2345 | + | |
| 2346 | + | |
| 2347 | + | |
| 2348 | + | |
| 2349 | + | |
| 2350 | + | |
| 2351 | + | |
| 2352 | + | |
| 2353 | + | |
| 2354 | + | |
| 2355 | + | |
| 2356 | + | |
| 2357 | + | |
| 2358 | + | |
| 2359 | + | |
| 2360 | + | |
| 2361 | + | |
| 2362 | + | |
| 2363 | + | |
| 2364 | + | |
| 2365 | + | |
| 2366 | + | |
| 2367 | + | |
| 2368 | + | |
| 2369 | + | |
| 2370 | + | |
| 2371 | + | |
| 2372 | + | |
| 2373 | + | |
| 2374 | + | |
| 2375 | + | |
| 2376 | + | |
| 2377 | + | |
| 2378 | + | |
| 2379 | + | |
| 2380 | + | |
| 2381 | + | |
| 2382 | + | |
| 2383 | + | |
Lines changed: 13 additions & 35 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
95 | 95 | | |
96 | 96 | | |
97 | 97 | | |
98 | | - | |
99 | 98 | | |
100 | 99 | | |
101 | 100 | | |
| |||
350 | 349 | | |
351 | 350 | | |
352 | 351 | | |
353 | | - | |
| 352 | + | |
| 353 | + | |
354 | 354 | | |
355 | 355 | | |
356 | 356 | | |
| |||
366 | 366 | | |
367 | 367 | | |
368 | 368 | | |
369 | | - | |
370 | 369 | | |
371 | 370 | | |
372 | 371 | | |
| |||
383 | 382 | | |
384 | 383 | | |
385 | 384 | | |
| 385 | + | |
| 386 | + | |
386 | 387 | | |
387 | 388 | | |
388 | 389 | | |
| |||
1896 | 1897 | | |
1897 | 1898 | | |
1898 | 1899 | | |
1899 | | - | |
1900 | | - | |
1901 | | - | |
1902 | | - | |
1903 | | - | |
| 1900 | + | |
| 1901 | + | |
| 1902 | + | |
1904 | 1903 | | |
| 1904 | + | |
1905 | 1905 | | |
1906 | 1906 | | |
1907 | 1907 | | |
| |||
1920 | 1920 | | |
1921 | 1921 | | |
1922 | 1922 | | |
1923 | | - | |
1924 | | - | |
1925 | | - | |
1926 | | - | |
1927 | | - | |
1928 | | - | |
1929 | | - | |
1930 | | - | |
1931 | | - | |
1932 | | - | |
1933 | | - | |
1934 | | - | |
1935 | | - | |
1936 | | - | |
1937 | | - | |
1938 | | - | |
1939 | | - | |
| 1923 | + | |
| 1924 | + | |
| 1925 | + | |
| 1926 | + | |
1940 | 1927 | | |
1941 | 1928 | | |
1942 | 1929 | | |
| |||
2038 | 2025 | | |
2039 | 2026 | | |
2040 | 2027 | | |
| 2028 | + | |
2041 | 2029 | | |
2042 | | - | |
2043 | | - | |
2044 | | - | |
2045 | 2030 | | |
2046 | 2031 | | |
2047 | 2032 | | |
2048 | 2033 | | |
2049 | 2034 | | |
2050 | 2035 | | |
2051 | 2036 | | |
2052 | | - | |
2053 | | - | |
2054 | | - | |
2055 | | - | |
2056 | | - | |
2057 | | - | |
2058 | | - | |
2059 | 2037 | | |
2060 | 2038 | | |
2061 | 2039 | | |
| |||
Lines changed: 127 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
14 | 19 | | |
15 | 20 | | |
16 | 21 | | |
| |||
21 | 26 | | |
22 | 27 | | |
23 | 28 | | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
24 | 40 | | |
25 | 41 | | |
26 | 42 | | |
| |||
37 | 53 | | |
38 | 54 | | |
39 | 55 | | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
40 | 79 | | |
41 | 80 | | |
42 | 81 | | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
43 | 170 | | |
44 | 171 | | |
45 | 172 | | |
| |||
0 commit comments