fix: Convert Polars container dtype entries to list in _get_cell (#778)#794
Conversation
machow
left a comment
There was a problem hiding this comment.
Thanks so much for working on this, and I'm so sorry for how long it took to review. I think this relates to #122 , and is incredibly useful!
I'm happy to pick things up if useful, but a few things I want to double check (mostly b/c I can't remember what happens in Great Tables right now for these cases, and even if they produce errors, we don't have to address them here necessarily)
- How do formatters respond to list cells? I think that formatters already might sometimes respond poorly to types they don't expect, so failing on lists might not be a problem
|
|
||
| # if container dtype, convert pl.Series to list | ||
| if isinstance(data[column].dtype, (pl.List, pl.Array)): | ||
| return data[column][row].to_list() |
There was a problem hiding this comment.
Right now, the code uses the dtype to infer when data[column][row] will be a Series. I wonder if it might be beneficial to check directly if the outcome is a series? rather than the dtype checks, something like...
res = data[column][row]
if isinstance(res, pl.Series): ...There was a problem hiding this comment.
Yeah, that sounds like a more readable/compact way. I assume efficiency differences would be negligible in this case.
There was a problem hiding this comment.
I see the pattern proposed here was partially implemented in 754f21c
|
@rich-iannone mentioned being down to investigate! |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #794 +/- ##
=======================================
Coverage 92.31% 92.32%
=======================================
Files 48 48
Lines 6039 6043 +4
=======================================
+ Hits 5575 5579 +4
Misses 464 464 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Thanks @machow Oh, didn't see #122, but glad it's useful.
Ah, you're quite right. Calling
Cool! Happy to think along if useful at all, though it sounds like a deeper bug. MREimport polars as pl
from great_tables import GT
import gt_extras as gte
df = pl.DataFrame({
"x": ["Example A", "Example B", "Example C"],
"col": [
[10, 40, 50],
[30, 30, 40],
[50, 20, 30],
],
"col2": [3, 5, 2]
})
gt = GT(df)
gt.fmt_number(columns="col", decimals=2) # <-- gives traceback belowShow traceback |
|
I ran a Claude review of this PR to get comments on the fix (and the issue with the failing documentation example). Here it is: I addressed issue 1 by updating |
rich-iannone
left a comment
There was a problem hiding this comment.
I added a few commits to address a test failure which exposed a larger issue. LGTM now!
Thanks @rich-iannone for looking into this and working on it! I was able to get this branch passing locally as well. I see one CI failing, but looks like that's unrelated to this PR (?). The only thing I notice is that the I don't have any more comments here at this point myself. But just let me know. |
machow
left a comment
There was a problem hiding this comment.
I've made the change I originally requested -- @rich-iannone do you mind merging if it looks okay (definitely feel free to push back, I made the change more to try and get this merged quickly :)
Summary
This pull request adds a small fix that ensures that
_get_cellreturns alistrather thanpl.Seriesif cell entry is a container dtype (pl.Listorpl.Array). This makes its behavior with polars dataframes consistent with _get_cell for pandas data frames which return lists for container dtypes.I went ahead and thought this was the best place to address #778 which surfaced as a syntax bug when running one of the
gt-extrastutorials with polars. Happy to consider other solutions in case it warrants a fix elsewhere in the code base!Changes
_tbl_data._get_cell: check if retrieved cell entry is either a container data type and convert to list.tests/test_tbl_data.py: addsdf_container_dtypesfixture andtest_get_cell_container_dtypestestRelated GitHub Issues and PRs
PlDataFrame._get_cell()type inconsistency breaks gt-extrasgt_plt_bar_stack#778Checklist