Optimize cleaning providers for faster type fixes #2
💡 Codex Review
Here are some automated review suggestions for this pull request.
    updated_df = df.with_columns([
        pl.when(converted.is_not_null())
        .then(converted)
        .when(invalid_mask)
        .then(pl.lit(None, dtype=pl.Boolean))
        .otherwise(original)
        .alias(col)
Boolean cleaner leaves column as strings
The refactored `_clean_boolean_column` builds a `with_columns` expression whose branches mix Boolean values (`converted` and the typed null literal) with the original string values (`original`), and it never casts the result. When the branches of a `when`/`then`/`otherwise` chain have different dtypes, Polars resolves them to a single output dtype, and here that is a string (Utf8) column. The cleaned dataframe therefore still holds strings such as "true"/"false" rather than True/False, even though the report claims the types were fixed. Downstream code that expects an actual Boolean dtype will keep seeing strings, so the type-cleaning stage no longer enforces boolean typing. Consider casting the final expression to `pl.Boolean` (or casting the `original` branch) so the column becomes a real boolean series.
https://chatgpt.com/codex/tasks/task_e_6902641b0434832995bfa7ad87b3d6f9