Skip to content

fix: Convert Polars container dtype entries to list in _get_cell (#778)#794

Merged
rich-iannone merged 12 commits into
posit-dev:mainfrom
KristijanArmeni:778-get-cell-polars
Mar 11, 2026
Merged

fix: Convert Polars container dtype entries to list in _get_cell (#778)#794
rich-iannone merged 12 commits into
posit-dev:mainfrom
KristijanArmeni:778-get-cell-polars

Conversation

@KristijanArmeni
Copy link
Copy Markdown
Contributor

Summary

This pull request adds a small fix that ensures that _get_cell returns a list rather than pl.Series if cell entry is a container dtype (pl.List or pl.Array). This makes its behavior with polars dataframes consistent with _get_cell for pandas data frames which return lists for container dtypes.

I went ahead and thought this was the best place to address #778 which surfaced as a syntax bug when running one of the gt-extras tutorials with polars. Happy to consider other solutions in case it warrants a fix elsewhere in the code base!

Changes

  • _tbl_data._get_cell: check if retrieved cell entry is either a container data type and convert to list.
  • tests/test_tbl_data.py: adds df_container_dtypes fixture and test_get_cell_container_dtypes test

Related GitHub Issues and PRs

Checklist

Copy link
Copy Markdown
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for working on this, and I'm so sorry for how long it took to review. I think this relates to #122 , and is incredibly useful!

I'm happy to pick things up if useful, but a few things I want to double check (mostly b/c I can't remember what happens in Great Tables right now for these cases, and even if they produce errors, we don't have to address them here necessarily)

  • How do formatters respond to list cells? I think that formatters already might sometimes respond poorly to types they don't expect, so failing on lists might not be a problem

Comment thread great_tables/_tbl_data.py Outdated

# if container dtype, convert pl.Series to list
if isinstance(data[column].dtype, (pl.List, pl.Array)):
return data[column][row].to_list()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, the code uses the dtype to infer when data[column][row] will be a Series. I wonder if it might be beneficial to check directly if the outcome is a series? rather than the dtype checks, something like...

res = data[column][row]
if isinstance(res, pl.Series): ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds like a more readable/compact way. I assume efficiency differences would be negligible in this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the pattern proposed here was partially implemented in 754f21c

@machow
Copy link
Copy Markdown
Collaborator

machow commented Feb 5, 2026

@rich-iannone mentioned being down to investigate!

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.32%. Comparing base (3596881) to head (7770f55).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #794   +/-   ##
=======================================
  Coverage   92.31%   92.32%           
=======================================
  Files          48       48           
  Lines        6039     6043    +4     
=======================================
+ Hits         5575     5579    +4     
  Misses        464      464           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@KristijanArmeni
Copy link
Copy Markdown
Contributor Author

Thanks so much for working on this, and I'm so sorry for how long it took to review. I think this relates to #122 , and is incredibly useful!

Thanks @machow Oh, didn't see #122, but glad it's useful.

How do formatters respond to list cells? I think that formatters already might sometimes respond poorly to types they don't expect, so failing on lists might not be a problem

Ah, you're quite right. Calling .fmt_number on a pl column with nested cell data, yields traceback below (see MRE). And it persists on this branch to my quick testing, so the the fix proposed in this PR doesn't cover this scenario and might need to be done elsewhere?

@rich-iannone mentioned being down to investigate!

Cool! Happy to think along if useful at all, though it sounds like a deeper bug.

MRE

import polars as pl
from great_tables import GT
import gt_extras as gte


df = pl.DataFrame({
    "x": ["Example A", "Example B", "Example C"],
    "col": [
        [10, 40, 50],
        [30, 30, 40],
        [50, 20, 30],
    ],
    "col2": [3, 5, 2]
})

gt = GT(df)

gt.fmt_number(columns="col", decimals=2)     # <-- gives traceback below
Show traceback
TypeError: unsupported format string passed to Series.__format__
File [/Applications/Positron.app/Contents/Resources/app/extensions/positron-python/python_files/lib/ipykernel/py3/IPython/core/formatters.py:344](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:
Hide Traceback
File [~/code/great-tables/great_tables/gt.py:306](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in GT._repr_html_(self)
    303 make_page = defaults["make_page"]
    304 all_important = defaults["all_important"]
--> 306 rendered = self.as_raw_html(
    307     make_page=make_page,
    308     all_important=all_important,
    309 )
    311 return rendered

File [~/code/great-tables/great_tables/_export.py:229](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in as_raw_html(self, inline_css, make_page, all_important)
    129 def as_raw_html(
    130     self: GT,
    131     inline_css: bool = False,
    132     make_page: bool = False,
    133     all_important: bool = False,
    134 ) -> str:
    135     """
    136     Get the HTML content of a GT object.
    137 
   (...)
    226     ```
    227     """
--> 229     built_table = self._build_data(context="html")
    231     table_html = built_table._render_as_html(
    232         make_page=make_page,
    233         all_important=all_important,
    234     )
    236     if inline_css:

File [~/code/great-tables/great_tables/gt.py:324](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in GT._build_data(self, context)
    321 def _build_data(self, context: str) -> Self:
    322     # Build the body of the table by generating a dictionary
    323     # of lists with cells initially set to nan values
--> 324     built = self._render_formats(context)
    326     if context == "latex":
    327         built = _migrate_unformatted_to_output(
    328             data=built, data_tbl=self._tbl_data, formats=self._formats, context=context
    329         )

File [~/code/great-tables/great_tables/gt.py:317](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in GT._render_formats(self, context)
    314 new_body = self._body.copy()
    316 # TODO: this body method performs a mutation. Should we make a copy of body?
--> 317 new_body.render_formats(self._tbl_data, self._formats, context)
    318 new_body.render_formats(self._tbl_data, self._substitutions, context)
    319 return self._replace(_body=new_body)

File [~/code/great-tables/great_tables/_gt_data.py:196](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in Body.render_formats(self, data_tbl, formats, context)
    194     raise Exception("Internal Error")
    195 for col, row in fmt.cells.resolve():
--> 196     result = eval_func(_get_cell(data_tbl, row, col))
    197     if isinstance(result, FormatterSkipElement):
    198         continue

File [~/code/great-tables/great_tables/_formats.py:368](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in fmt_number_context(x, data, decimals, n_sigfig, drop_trailing_zeros, drop_trailing_dec_mark, use_seps, accounting, scale_by, compact, sep_mark, dec_mark, force_sign, pattern, context)
    356     x_formatted = _format_number_compactly(
    357         value=x,
    358         decimals=decimals,
   (...)
    365         force_sign=force_sign,
    366     )
    367 else:
--> 368     x_formatted = _value_to_decimal_notation(
    369         value=x,
    370         decimals=decimals,
    371         n_sigfig=n_sigfig,
    372         drop_trailing_zeros=drop_trailing_zeros,
    373         drop_trailing_dec_mark=drop_trailing_dec_mark,
    374         use_seps=use_seps,
    375         sep_mark=sep_mark,
    376         dec_mark=dec_mark,
    377         force_sign=force_sign,
    378     )
    380 # Implement minus sign replacement for `x_formatted` or use accounting style
    381 if is_negative:

File [~/code/great-tables/great_tables/_formats.py:3224](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in _value_to_decimal_notation(value, decimals, n_sigfig, drop_trailing_zeros, drop_trailing_dec_mark, use_seps, sep_mark, dec_mark, force_sign)
   3211     result = _format_number_n_sigfig(
   3212         value=value,
   3213         n_sigfig=n_sigfig,
   (...)
   3217         preserve_integer=False,
   3218     )
   3220 else:
   3221     # If there is nothing provided to `n_sigfig` then the conventional decimal number formatting
   3222     # pathway is taken; this formats to a specific number of decimal places and removal of
   3223     # trailing zero values can be undertaken
-> 3224     result = _format_number_fixed_decimals(
   3225         value=value,
   3226         decimals=decimals,
   3227         drop_trailing_zeros=drop_trailing_zeros,
   3228         use_seps=use_seps,
   3229         sep_mark=sep_mark,
   3230         dec_mark=dec_mark,
   3231     )
   3233 # Drop the trailing decimal mark if it is present
   3234 if drop_trailing_dec_mark:

File [~/code/great-tables/great_tables/_formats.py:3344](vscode-file://vscode-app/Applications/Positron.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html#), in _format_number_fixed_decimals(value, decimals, drop_trailing_zeros, use_seps, sep_mark, dec_mark)
   3341 fmt_spec = f".{decimals}f"
   3343 # Get the formatted `x` value
-> 3344 value_str = format(value, fmt_spec)
   3346 # Very small or very large numbers can be represented in exponential
   3347 # notation but we don't want that; we need the string value to be fully
   3348 # expanded with zeros so if an 'e' is detected we'll use a helper function
   3349 # to ensure it's back to a number
   3350 if "e" in value_str or "E" in value_str:

@rich-iannone
Copy link
Copy Markdown
Member

I ran a Claude review of this PR to get comments on the fix (and the issue with the failing documentation example). Here it is:

Code Review: PR #794        
                                                                                                                                                               
  "fix: Convert Polars container dtype entries to list in _get_cell (#778)" by @KristijanArmeni                                                              
                                                                                                                                                               
  Summary                                                                                                                                                      

  The PR adds a 6-line fix to _get_cell so that Polars container dtypes (pl.List, pl.Array) return Python lists instead of pl.Series, matching the Pandas      
  behavior. Includes a parametrized test covering both dtype variants. Also includes minor import reordering in the test file.                                 
                                                                                                                                                               
  ---                                                                                                                                                          
  Issues Found (all scores, unfiltered)                                                                                                                        

  1. cast_frame_to_string does not handle pl.Array — Score: 75 (Standard 8)

  This is the likely cause of the CI docs-building error. The PR adds pl.Array handling to _get_cell but does not update cast_frame_to_string, which only
  checks for pl.List:

  great_tables/_tbl_data.py:583:
  list_cols = [
      name for name, dtype in df.schema.items() if issubclass(dtype.base_type(), pl.List)
  ]

  When as_latex() (or as_raw_html()) calls cast_frame_to_string on a DataFrame with pl.Array columns, it will attempt to cast(pl.String) on the Array column,
  which raises InvalidOperationError: cannot cast Array type. The fix would be to also check for pl.Array here, mirroring the _get_cell change.

  Standard 8 (Code Organization) says: "Use existing methods instead of reimplementing behavior... This keeps behavior consistent and avoids divergent code
  paths." The inconsistency between what _get_cell handles and what cast_frame_to_string handles is exactly this kind of divergent behavior.

  Severity: High — causes runtime failure for any table with pl.Array columns rendered to LaTeX or HTML.

  ---
  2. Test only asserts type, not values — Score: 60 (Standard 1)

  tests/test_tbl_data.py:104-106:
  def test_get_cell_container_dtypes(df_container_dtypes: pl.DataFrame):
      "Checks that container dtype entries in polars dfs are returned as lists"
      assert isinstance(_get_cell(df_container_dtypes, 1, "col2"), list)

  Standard 1 anti-pattern: "Asserting only that 'output exists' vs checking specific values." The test could also assert the actual content (e.g., assert
  result == [4, 5] for the List case). This would catch bugs where .to_list() returns incorrect values.

  Severity: Low — the test works as a regression test for the type issue, but is weaker than it could be.

  ---
  3. Import reordering mixed with feature — Score: 65 (Standard 7)

  The test file diff includes import alphabetization and blank-line changes unrelated to the bug fix. Standard 7 says: "One PR = one logical change." This is a
   very minor instance — likely auto-formatter output — but technically mixes refactoring with the feature.

  Severity: Negligible — does not affect reviewability in practice.

  ---
  4. Reviewer's alternative approach (check result type directly) — Score: 25

  @machow suggested on the PR: res = data[column][row]; if isinstance(res, pl.Series): return res.to_list(). This is more future-proof (handles any dtype
  returning a Series) and avoids double column access. Already discussed on the PR; not a new finding.

  Severity: N/A — existing discussion point, not a bug.

  ---
  5. isinstance inside dispatched function (Standard 3) — Score: 15 (false positive)

  Agent 6 flagged this as a Standard 3 violation, but the patterns audit (Agent 7) correctly identified it as consistent with existing codebase patterns. The
  isinstance check examines a column's dtype property, not the type of the function's input parameter. The same file uses this pattern in cast_frame_to_string
  (line 583) and in the Pandas _get_cell (line 236). Not a Standard 3 violation.

  ---
  Recommendation

  Issue #1 is the only actionable item. The cast_frame_to_string gap for pl.Array is almost certainly the root cause of the CI docs-building error. The fix is
  straightforward — update line 583 to also check pl.Array:

  list_cols = [
      name for name, dtype in df.schema.items()
      if issubclass(dtype.base_type(), (pl.List, pl.Array))
  ]

  No issues scored above the 80 threshold for auto-posting as a PR comment.

I addressed issue 1 by updating cast_frame_to_string in _tbl_data.py to check for both pl.List and pl.Array when identifying container-dtype columns. Previously only pl.List was checked and this would cause InvalidOperationError when rendering tables with pl.Array columns via as_latex() or as_raw_html(). This was the cause of the CI docs-building failure. Verified locally by running the exact failing coffee-sales LaTeX example (and passes now in CI).

@rich-iannone rich-iannone requested a review from machow February 19, 2026 16:54
@machow machow assigned machow and unassigned rich-iannone Mar 3, 2026
Copy link
Copy Markdown
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few commits to address a test failure which exposed a larger issue. LGTM now!

@KristijanArmeni
Copy link
Copy Markdown
Contributor Author

I added a few commits to address a test failure which exposed a larger issue. LGTM now!

Thanks @rich-iannone for looking into this and working on it! I was able to get this branch passing locally as well. I see one CI failing, but looks like that's unrelated to this PR (?).

The only thing I notice is that the cell = data[column][row] assignment could be placed before the if condition check in _get_cell as @machow suggested in revision. Thus performing indexing only once, rather than twice. Possibly negligible effect.

I don't have any more comments here at this point myself. But just let me know.

Copy link
Copy Markdown
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the change I originally requested -- @rich-iannone do you mind merging if it looks okay (definitely feel free to push back, I made the change more to try and get this merged quickly :)

@rich-iannone rich-iannone merged commit ff3709c into posit-dev:main Mar 11, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants