SNOW-2257191: Bugfix join bug due to dataframe alias #3685
Closed
sfc-gh-aalam wants to merge 2829 commits into
…troduce `_spark_session_tz` param (#3659)
…function name is out of spec (#3691)
…ument` in functions (#3697)
…ine (#3610) Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
…3706) This is working towards running most of our snowpandas tests with hybrid mode.
…sts for the integration module (#3715) The new test parameter is called '--enable_modin_hybrid_mode' and is only applied to the integ modin module. It is not used yet, but it allows hybrid mode to be enabled in an ad hoc way. Eventually there will be a new pre-commit test which enables hybrid mode just for the integration modin module. This change also disables the sql_counter when running under hybrid mode, because virtually no SQL queries are issued.
…das (#3717) SNOW-2305345 - Eliminate duplicate casing parameter checks in snowpandas. While working on SHOW OBJECT usage to see if we can fetch row size quickly, I noticed that we issue a SHOW PARAMETERS LIKE 'QUOTED_IDENTIFIERS_IGNORE_CASE' IN SESSION query every time we fetch the session. This is done to issue a warning, but we really only need to do it once.
…#3975) While testing #3973, I noticed that aggregations on single-column frames/series were producing queries with JSON serialization and unnecessary UNPIVOT operations. The QC's `transpose_single_row` helper method is used in aggregations to skip a PIVOT operation used in the general transpose case, but for transposing a 1x1 frame, we don't even need to UNPIVOT and need only re-label the index, since we already know that the column's dtype will not change.

This PR adds a fast path for 1x1 `transpose_single_row` operations, which replaces JSON/UNPIVOT operations with simple projections. It produces some modest performance improvements for operations on a 2000x1 frame:

- `DataFrame.count`: 1.48s -> 1.31s (11.2% improvement)
- `DataFrame.describe`: 2.64s -> 2.36s (10.9% improvement)
- `DataFrame.nunique`: 1.25s -> 1.21s (3.4% improvement)

These improvements are likely to be more noticeable on frames produced from more complex queries.

This PR also adds explicit row count caching for the general transpose case. We currently cannot directly use the `transpose_single_row` path for the `transpose` API itself, since the helper function drops the column labels of the result.
…of driver reference on top level (#3897)
…ces) in faster pandas (#3984)
…umn names like '"ab"' and 'ab' (#3986)
…k/weekday/dayofyear/isocalendar (already supported in faster pandas) (#3992)
50b5998 to 74f91f2
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2257191
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
In this PR we fix the child-to-parent update of `df_aliased_col_name_to_real_col_name` by making sure all dictionaries within the default dict are copied by value instead of by reference.
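
To illustrate the kind of aliasing problem described above, here is a minimal, self-contained sketch. It is not the Snowpark implementation: the helper names `propagate_by_reference` and `propagate_by_value`, the alias `"L"`, and the column names are all hypothetical. It only assumes that the mapping is a `defaultdict` whose values are plain dictionaries, as the description suggests.

```python
from collections import defaultdict


def propagate_by_reference(child):
    """Hypothetical buggy pattern: the parent reuses the child's inner dicts."""
    parent = defaultdict(dict)
    parent.update(child)  # copies only the outer keys; inner dicts are shared
    return parent


def propagate_by_value(child):
    """Hypothetical fixed pattern: each inner dict is copied before it is stored."""
    parent = defaultdict(dict)
    for alias, mapping in child.items():
        parent[alias] = dict(mapping)  # new dict object per alias
    return parent


# Illustrative data: alias "L" maps the aliased column "A" to a real column name.
child = defaultdict(dict)
child["L"]["A"] = '"A_L"'

shared = propagate_by_reference(child)
independent = propagate_by_value(child)

# A later change to the child's mapping leaks into the parent that shares
# inner dicts, but not into the parent that copied them.
child["L"]["A"] = '"A_CHANGED"'
child["L"]["B"] = '"B_L"'

print(shared["L"] is child["L"])       # True: same inner dict object
print(independent["L"] is child["L"])  # False: separate copy
print(independent["L"])                # {'A': '"A_L"'}
```

Copying with `dict(mapping)` is enough here because the inner values are strings. If the inner values were themselves mutable, `copy.deepcopy` would be the safer choice.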