Sanitized special character column name before writing to parquet #590
Merged: Fokko merged 20 commits into apache:main from kevinjqliu:kevinjqliu/special-character-column-parquet on Apr 17, 2024
06dd647 write with sanitized column names (kevinjqliu)
d7b5147 push down to when parquet writes (kevinjqliu)
d278ee5 add test for writing special character column name (kevinjqliu)
168931f parameterize format_version (kevinjqliu)
ca11640 use to_requested_schema (kevinjqliu)
ce9a587 refactor to_requested_schema (kevinjqliu)
bf87a8a more refactor (kevinjqliu)
25bf991 test nested schema (kevinjqliu)
3b6ecad special character inside nested field (kevinjqliu)
f6a5ac2 comment on why arrow is enabled (kevinjqliu)
b51b5ce use existing variable (kevinjqliu)
41f5354 Merge branch 'main' into kevinjqliu/special-character-column-parquet (kevinjqliu)
d264ac3 move spark config to conftest (kevinjqliu)
5de0b1c pyspark arrow turns pandas df from tuple to dict (kevinjqliu)
e81472e Revert refactor to_requested_schema (kevinjqliu)
f6b72e9 reorder args (kevinjqliu)
22be232 Merge branch 'main' into kevinjqliu/special-character-column-parquet (kevinjqliu)
9ea64c6 refactor (kevinjqliu)
e5f2611 pushdown schema (kevinjqliu)
177a6b7 only tranform when necessary (kevinjqliu)
We need to change the schema (column names) of the Arrow data frame before writing. If there's a better way to do this, please let me know.
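To make the renaming step concrete, here is a minimal, dependency-free sketch of what sanitizing a column name could look like. The escaping rule (replace a disallowed character with `_x` plus its hex code point, and prefix a leading digit with `_`) is an assumption modeled on Avro-style name rules, and `sanitize_name` and `sanitize_char` are hypothetical helpers, not the functions used in this PR.

```python
def sanitize_char(ch: str) -> str:
    """Escape one disallowed character (assumed rule, not the PR's code)."""
    if ch.isdigit():
        # A digit is only disallowed in the leading position: prefix it.
        return "_" + ch
    # Anything else becomes _x followed by its upper-case hex code point.
    return "_x" + format(ord(ch), "X")


def sanitize_name(name: str) -> str:
    """Return a name containing only ASCII letters, digits, and underscores."""
    if not name:
        return name
    first = name[0]
    first_ok = first == "_" or (first.isascii() and first.isalpha())
    parts = [first if first_ok else sanitize_char(first)]
    for ch in name[1:]:
        ch_ok = ch == "_" or (ch.isascii() and ch.isalnum())
        parts.append(ch if ch_ok else sanitize_char(ch))
    return "".join(parts)


# Hypothetical column names, used only for illustration.
original_names = ["letter/abc", "9lives", "plain_name"]
sanitized_names = [sanitize_name(n) for n in original_names]
```

The sanitized list would then be the new set of top-level column names applied to the table before the Parquet write, while the table's logical schema keeps the original names.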
Shall we extend the integration test to test the nested schema case? For example,
Updated: I got
when trying with the following dataset
Good catch! I don't think `rename_columns` works well with nested schemas.
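A flat rename only touches top-level names, so a nested schema needs a recursive walk. The sketch below illustrates the idea on a plain-dict schema model; the dict layout (`name`, `type`, optional `fields`), the simplified escaping rule, and the helper names are all assumptions for illustration, not the PR's implementation or an Arrow API.

```python
import re


def sanitize_name(name: str) -> str:
    """Simplified, assumed rule: escape every character outside [A-Za-z0-9_]."""
    return re.sub(
        r"[^A-Za-z0-9_]",
        lambda m: "_x" + format(ord(m.group()), "X"),
        name,
    )


def sanitize_fields(fields: list) -> list:
    """Return a copy of the field list with names sanitized at every depth."""
    result = []
    for field in fields:
        renamed = {"name": sanitize_name(field["name"]), "type": field["type"]}
        children = field.get("fields")
        if children is not None:
            # Recurse into struct children so nested names are rewritten too.
            renamed["fields"] = sanitize_fields(children)
        result.append(renamed)
    return result


# Hypothetical nested schema with a special character inside a nested field.
schema = [
    {"name": "id", "type": "long"},
    {
        "name": "address",
        "type": "struct",
        "fields": [{"name": "zip/code", "type": "string"}],
    },
]
sanitized = sanitize_fields(schema)
```

The function builds new dicts rather than mutating its input, so the original schema stays available for reads that need the unsanitized names.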