Commit 89c6d21
fix(datalake): _resolve_col_type uses frequency-first majority vote (open-metadata#28093)
A single date-parseable token (e.g. the surname "May") was enough to
flip an entire string column to DATETIME because _TYPE_PRECEDENCE puts
datetime64[ns] above str. The fix counts occurrences of each inferred
type in the sample and picks the most frequent one, breaking ties with
_TYPE_PRECEDENCE. A column with hundreds of plain strings and a handful
of month-name values now correctly resolves to STRING.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 4cef6a6 commit 89c6d21
2 files changed
Lines changed: 42 additions & 4 deletions
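The frequency-first vote described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: the function names, the type labels, and the contents of `TYPE_PRECEDENCE` below are assumptions made for illustration, and the only detail taken from the message is that `datetime64[ns]` ranks above `str`. The sketch assumes a non-empty list of already-inferred per-value types.

```python
from collections import Counter

# Hypothetical precedence list, highest first. The real _TYPE_PRECEDENCE is
# not shown on this page; only "datetime64[ns] above str" comes from the message.
TYPE_PRECEDENCE = ["datetime64[ns]", "float", "int", "str"]


def resolve_by_precedence(inferred_types):
    """Behaviour described as the bug: the highest-precedence type present wins."""
    return min(set(inferred_types), key=TYPE_PRECEDENCE.index)


def resolve_by_majority(inferred_types):
    """Frequency-first vote: most common type wins, precedence only breaks ties."""
    counts = Counter(inferred_types)
    # Sort key: higher count first, then earlier position in TYPE_PRECEDENCE.
    return min(counts, key=lambda t: (-counts[t], TYPE_PRECEDENCE.index(t)))


# 300 plain strings plus three values that happened to parse as dates.
sample = ["str"] * 300 + ["datetime64[ns]"] * 3
old_result = resolve_by_precedence(sample)
new_result = resolve_by_majority(sample)
tie_result = resolve_by_majority(["str", "datetime64[ns]"])
```

With this sample the precedence-only rule picks the datetime label because it appears at least once, while the majority rule picks the string label. On an exact tie the majority rule falls back to precedence, which matches the tie-breaking the commit message describes.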