Commit a119226

Author: The TensorFlow Datasets Authors
Fix missing value handling in Radon dataset.
PiperOrigin-RevId: 911903022
1 parent b4c81e2 commit a119226

1 file changed

Lines changed: 7 additions & 3 deletions

tensorflow_datasets/datasets/radon/radon_dataset_builder.py

@@ -127,9 +127,13 @@ def _generate_examples(self, file_path_srrs2, file_path_cty):
     df = df.drop_duplicates(subset='idnum')
     df.drop('fips', axis=1, inplace=True)
 
-    df['wave'].replace({' .': '-1'}, inplace=True)
-    df['rep'].replace({' .': '-1'}, inplace=True)
-    df['zip'].replace({' ': '-1'}, inplace=True)
+    # The raw data uses whitespace padding for missing values (e.g., " ." or
+    # " "). We cast to string, strip whitespace, and explicitly assign the
+    # result back to avoid pandas silent mutation failures with inplace dict
+    # replacements on object columns.
+    df['wave'] = df['wave'].astype(str).str.strip().replace('.', '-1')
+    df['rep'] = df['rep'].astype(str).str.strip().replace('.', '-1')
+    df['zip'] = df['zip'].astype(str).str.strip().replace('', '-1')
 
     for i, (_, row) in enumerate(df.iterrows()):
       radon_val = row.pop('activity')
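
The strip-then-replace pattern in this change can be tried in isolation. The following is a minimal sketch, not the builder's actual code: the sample rows and the helper name `normalize_missing` are hypothetical, and the padded markers only approximate the raw Radon files.

```python
import pandas as pd

# Hypothetical sample rows that imitate the whitespace-padded
# missing-value markers described in the commit comment.
df = pd.DataFrame({
    'wave': [' .', '12', ' 3'],
    'rep': ['1', ' .', '2'],
    'zip': ['55413', ' ', '55104'],
})


def normalize_missing(series, missing_token):
  """Strips padding, then maps the bare missing token to '-1'.

  `normalize_missing` is an illustrative helper, not part of TFDS.
  """
  # Series.replace with scalar arguments matches whole values only,
  # so '.' does not touch values that merely contain a dot.
  return series.astype(str).str.strip().replace(missing_token, '-1')


# Assign the result back instead of relying on inplace=True applied
# to the intermediate object returned by df['col'].
df['wave'] = normalize_missing(df['wave'], '.')
df['rep'] = normalize_missing(df['rep'], '.')
df['zip'] = normalize_missing(df['zip'], '')

print(df)
```

Stripping first means the match no longer depends on how many padding spaces precede the marker, which is presumably why the exact-key dict replacements were brittle.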
