Optimize cleaning providers for faster type fixes by LinklyLuck · Pull Request #2 · LinklyLuck/DataSearchAndCleanTool

LinklyLuck · 2025-10-29T19:44:15Z

Summary

Refactor the type-based cleaning provider to use vectorized Polars expressions for numeric, date, and boolean normalization.
Skip unnecessary work for empty or already-typed columns while keeping the detailed change reports intact.
Ensure the cleaning helpers return cleanly and that the modules end with a trailing newline.

Testing

python -m compileall cleaning_providers/type_cleaner.py cleaning_providers/__init__.py

https://chatgpt.com/codex/tasks/task_e_6902641b0434832995bfa7ad87b3d6f9

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


chatgpt-codex-connector · 2025-10-29T19:49:00Z

+        updated_df = df.with_columns([
+            pl.when(converted.is_not_null())
+            .then(converted)
+            .when(invalid_mask)
+            .then(pl.lit(None, dtype=pl.Boolean))
+            .otherwise(original)
+            .alias(col)


Boolean cleaner leaves column as strings

The refactored _clean_boolean_column builds a with_columns expression that mixes Boolean literals with the original string values but never casts the result. In Polars, combining Boolean and string branches produces a string (UTF-8) column, so the cleaned dataframe still contains strings like "true"/"false" rather than True/False, even though the report claims type fixes. Downstream code that expects an actual Boolean dtype will keep seeing strings, so the type-cleaning stage no longer enforces boolean typing. Consider casting the final expression to pl.Boolean (or casting the original branch) so the column becomes a real Boolean series.


Optimize cleaning providers for faster type fixes

aced7c8

LinklyLuck added the codex label Oct 29, 2025 — with ChatGPT Codex Connector

chatgpt-codex-connector Bot reviewed Oct 29, 2025

LinklyLuck wants to merge 1 commit into master from c17e2p-codex/modify-app.py-and-the_pipeline_v2.py
