fix: handle pandas 3.0 default StringDtype by filippsatverily · Pull Request #1777 · cdisc-org/cdisc-rules-engine

filippsatverily · 2026-06-22T21:35:28Z

Pandas 3.0 changes the default string dtype from object to StringDtype, which requires these changes:

Regex operators: .map() now returns nullable BooleanDtype, in which missing values stay pd.NA instead of becoming False (pd.NA & True evaluates to pd.NA, and converting pd.NA to a plain bool raises TypeError). Adds a _map_regex() helper that normalizes the result to numpy bool via .fillna(False).astype(bool); all prefix/suffix/matches regex operators now use it.
Case-insensitive comparisons: calling .lower() on a non-string value (e.g. pd.NA) raises AttributeError. Adds an isinstance(target_val, str) guard before calling .lower().
Empty-column detection in record_count: the existing check dtype == "object" identifies string columns but misses StringDtype. Uses pd.api.types.is_string_dtype() instead, which matches both object and StringDtype columns.
Date validation: simplifies the is_valid_date guard to not isinstance(date_string, str), which already handles None, pd.NA, and any other non-string type.
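The four fixes can be sketched in isolation as below. This is a minimal illustration, not the engine's code: the function names (map_regex, equals_ignore_case, is_string_column, is_valid_iso_date) are hypothetical, and only the .fillna(False).astype(bool) normalization, the isinstance guards, and pd.api.types.is_string_dtype() come from the description above. Matching with re.search, the extra .astype("boolean") step, treating non-string operands as non-matching, and using datetime.fromisoformat() as the date parser are all assumptions made so the snippet runs on its own.

```python
import re
from datetime import datetime

import pandas as pd


def map_regex(series: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical stand-in for _map_regex: plain numpy-bool mask of re.search hits."""
    compiled = re.compile(pattern)
    # Non-string cells (None, NaN, pd.NA) yield pd.NA, so the intermediate
    # dtype may be object or nullable boolean depending on the pandas version.
    mapped = series.map(
        lambda value: compiled.search(value) is not None
        if isinstance(value, str)
        else pd.NA
    )
    # Assumed extra step: coerce to nullable "boolean" so fillna behaves the same
    # everywhere, then normalize as in the PR: missing -> False, cast to numpy bool.
    return mapped.astype("boolean").fillna(False).astype(bool)


def equals_ignore_case(value: object, target_val: object) -> bool:
    """Case-insensitive equality that never calls .lower() on a non-string."""
    if isinstance(value, str) and isinstance(target_val, str):
        return value.lower() == target_val.lower()
    # Assumption: a missing or non-string operand simply does not match.
    return False


def is_string_column(column: pd.Series) -> bool:
    """True for both legacy object columns and StringDtype columns."""
    # column.dtype == "object" is False for StringDtype, so test the dtype family.
    return pd.api.types.is_string_dtype(column.dtype)


def is_valid_iso_date(date_string: object) -> bool:
    """Date check whose only guard is a single isinstance test."""
    # One check rejects None, float NaN, pd.NA and any other non-string.
    if not isinstance(date_string, str):
        return False
    try:
        # Assumption: fromisoformat stands in for the engine's real date parser.
        datetime.fromisoformat(date_string)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    codes = pd.Series(["AESER", pd.NA, "CMTRT"], dtype="string")
    print(map_regex(codes, r"^AE").tolist())  # [True, False, False]
    print(equals_ignore_case(pd.NA, "aeser"))  # False
    print(is_string_column(codes), codes.dtype == "object")  # True False
    print(is_valid_iso_date(pd.NA), is_valid_iso_date("2024-01-15"))  # False True
```

Because the regex helper always returns plain numpy bool, its result can be combined with & or | and used as a mask without pd.NA's three-valued logic leaking into the operators.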

Tested scenarios:

Full pytest suite: 1746 passed, 11 skipped, 0 failed (pandas 2.3.3, dask 2025.12.0)
Ran validation on CDISC_Pilot_Study_v4_FIXED.json: 201 SUCCESS, 6 SKIPPED, 0 errors

filippsatverily · 2026-06-22T21:40:45Z

@SFJohnson24 another commit from #1745

fix: handle pandas 3.0 default StringDtype

b1eaee8

filippsatverily marked this pull request as ready for review June 22, 2026 21:40

filippsatverily mentioned this pull request Jun 22, 2026

Support pandas 3.0 #1745

Draft
