Summary
Working towards Python 3.12 support in openscm-runner exposed an issue in scmdata.
ScmRun.groupby raises TypeError: Cannot interpret '<StringDtype(...)>' as a data type on Python 3.12 with pandas 3.x and numpy 2.x. This blocks any caller that goes through ScmRun.convert_unit (which internally calls groupby), so it breaks downstream code that worked on Python 3.11 / numpy 1.x.
Repro
import scmdata
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.0]],
    index=pd.MultiIndex.from_tuples(
        [("FaIR", "ssp245", "m", "World", "Emissions|CO2", "GtC/yr", 0)],
        names=[
            "climate_model", "scenario", "model", "region",
            "variable", "unit", "run_id",
        ],
    ),
    columns=[2020, 2021],
)
run = scmdata.ScmRun(df)
run.convert_unit("PgC/yr", variable="Emissions|CO2")
Traceback:
  File ".../scmdata/groupby.py", line 61, in __init__
    if any([np.issubdtype(m[c].dtype, np.number) for c in m]):
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../numpy/_core/numerictypes.py", line 534, in issubdtype
    arg1 = dtype(arg1).type
           ^^^^^^^^^^^
TypeError: Cannot interpret '<StringDtype(storage='python', na_value=nan)>' as a data type
Environment
- Python 3.12.12
- scmdata 0.18.0
- numpy 2.4.6
- pandas 3.0.3
- macOS (also reproduces on Linux per CI of a downstream project)
Root cause
scmdata/groupby.py:61 calls np.issubdtype(m[c].dtype, np.number) for each meta column. Under pandas 3.x, string-valued meta columns default to StringDtype rather than object, and numpy 2.x rejects StringDtype as an argument to np.issubdtype (it cannot be coerced via dtype()). On Python 3.11 / numpy 1.x the same call returned False silently, so the bug only surfaces on the newer stack.
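The failure can be isolated from scmdata entirely. The snippet below is a minimal sketch, not code from scmdata: `issubdtype_outcome` is a hypothetical helper written only for this illustration, and it assumes numpy and pandas are importable. It runs the same `np.issubdtype` check against an object dtype, a float dtype, and a pandas `StringDtype`.

```python
import numpy as np
import pandas as pd


def issubdtype_outcome(dtype):
    """Hypothetical helper: run the check used in groupby.py and report
    either its boolean result or the name of the exception it raised."""
    try:
        return bool(np.issubdtype(dtype, np.number))
    except TypeError as exc:
        return type(exc).__name__


outcomes = {
    # What string-valued meta columns looked like on the older stack.
    "object": issubdtype_outcome(np.dtype(object)),
    # A genuinely numeric column.
    "float64": issubdtype_outcome(np.dtype("float64")),
    # What string-valued meta columns look like under pandas 3.x.
    "StringDtype": issubdtype_outcome(pd.StringDtype(storage="python")),
}

for name, outcome in outcomes.items():
    print(f"{name}: {outcome}")
```

The object and float cases return a plain boolean, while the `StringDtype` case raises because numpy cannot convert the pandas extension dtype into a numpy dtype.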
Suggested fix
Guard the issubdtype call against non-numpy-coercible dtypes. Two options:
# (a) Use the dtype.kind shortcut, which is well-defined for all pandas dtypes:
if any(getattr(m[c].dtype, "kind", "O") in "biufc" for c in m):

# (b) Or wrap in try/except and treat unknown dtypes as non-numeric (semantically
# correct: a StringDtype is not numeric):
def _is_numeric(dtype):
    try:
        return np.issubdtype(dtype, np.number)
    except TypeError:
        return False

if any(_is_numeric(m[c].dtype) for c in m):
(a) is cheaper and more idiomatic. Happy to PR whichever you'd prefer.
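One thing worth checking before choosing: the two options do not classify every dtype identically. The sketch below wraps each option in a standalone function (`is_numeric_by_kind` and `is_numeric_by_issubdtype` are hypothetical names for this comparison, not scmdata functions) and applies both to a few dtypes that could plausibly appear in meta columns.

```python
import numpy as np
import pandas as pd


def is_numeric_by_kind(dtype):
    """Option (a): classify by the one-character dtype.kind code."""
    return getattr(dtype, "kind", "O") in "biufc"


def is_numeric_by_issubdtype(dtype):
    """Option (b): keep np.issubdtype, treating uncoercible dtypes as non-numeric."""
    try:
        return bool(np.issubdtype(dtype, np.number))
    except TypeError:
        return False


candidates = {
    "float64": np.dtype("float64"),
    "object": np.dtype(object),
    "bool": np.dtype(bool),
    "StringDtype": pd.StringDtype(storage="python"),
    "Int64Dtype": pd.Int64Dtype(),
}

# Map each dtype name to (result of option a, result of option b).
comparison = {
    name: (is_numeric_by_kind(dtype), is_numeric_by_issubdtype(dtype))
    for name, dtype in candidates.items()
}

for name, (by_kind, by_issubdtype) in comparison.items():
    print(f"{name}: kind={by_kind}, issubdtype={by_issubdtype}")
```

Both options agree on float, object and string columns. They differ on boolean columns, because `np.bool_` is not a subtype of `np.number` but has kind `"b"`, and on pandas nullable integers, which numpy cannot coerce but which report kind `"i"`. If matching the existing behaviour for boolean meta columns matters, dropping `"b"` from the kind string in (a) would be one way to align the two.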
Downstream impact
This blocks Python 3.12 support in openscm/openscm-runner (any path that round-trips through convert_unit). Filed from work at github.com/benmsanderson/openscm-runner (AR7 modernisation fork).