Commit c95bcfb

Support dask 2025.4.0
Dask 2025.4.0 optimizes multiple DataFrames together, which exposes division mismatches when a pandas Series is assigned to a dask DataFrame column. The old reset_index/set_index workaround no longer avoids this. Replace it with a compute-assign-rewrap sequence: compute the dask DataFrame to pandas, assign the Series values positionally, then rewrap the result with dd.from_pandas, which builds a clean expression graph. This is safe because __getitem__ already computes the DataFrame to produce the Series being assigned.
1 parent 3fb77e7 commit c95bcfb

2 files changed

Lines changed: 4 additions & 4 deletions

cdisc_rules_engine/models/dataset/dask_dataset.py

Lines changed: 3 additions & 3 deletions
@@ -81,9 +81,9 @@ def __setitem__(self, key, value):
             array_values = da.from_array(value, chunks=tuple(chunks))
             self._data[key] = array_values
         elif isinstance(value, pd.Series):
-            self._data = self._data.reset_index()
-            self._data = self._data.set_index("index")
-            self._data[key] = value
+            pdf = self._data.compute()
+            pdf[key] = value.values
+            self._data = dd.from_pandas(pdf, npartitions=1)
         elif isinstance(value, dd.DataFrame):
             for column in value:
                 self._data[column] = value[column]

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ dependencies = [
     "cachetools >=6.1.0",
     "cdisc-library-client >=0.1.6",
     "click >=8.1.7",
-    "dask[dataframe,array] >=2024.6.0, <2025.4.0",
+    "dask[dataframe,array] >=2024.6.0",
     "fastparquet >=2024.2.0",
     "importlib-metadata >=8.5.0",
     "jsonata-python >=0.6.0",
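
The effect of dropping the upper bound can be checked with a small comparison. This is a simplified sketch that assumes plain dotted release numbers. It does not implement full version-specifier semantics as an installer would, and the helper names `parse`, `allowed_before`, and `allowed_after` are hypothetical.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a simple dotted release such as '2025.4.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def allowed_before(version: str) -> bool:
    # Old constraint: >=2024.6.0, <2025.4.0
    return parse("2024.6.0") <= parse(version) < parse("2025.4.0")


def allowed_after(version: str) -> bool:
    # New constraint: >=2024.6.0 (no upper bound)
    return parse(version) >= parse("2024.6.0")


for candidate in ("2024.5.2", "2025.3.0", "2025.4.0"):
    print(candidate, allowed_before(candidate), allowed_after(candidate))
```

Under the old constraint 2025.4.0 is rejected. Under the new one it, and any later release, is accepted.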
