Commit 87a1f86
authored
[feat] support messages column as JSON string in iterable datasets (#147)
Add automatic JSON string decoding for the messages column in iterable
dataset loaders. This allows parquet files to store chat messages as
JSON strings instead of nested Arrow structs, avoiding schema inference
issues with deeply nested message formats.
Co-authored-by: mwxely <mwxely@users.noreply.github.com>1 parent 749eeef commit 87a1f86
2 files changed
Lines changed: 10 additions & 0 deletions
Lines changed: 4 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
1 | 2 | | |
2 | 3 | | |
3 | 4 | | |
| |||
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
25 | 29 | | |
26 | 30 | | |
27 | 31 | | |
| |||
Lines changed: 6 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
1 | 2 | | |
2 | 3 | | |
3 | 4 | | |
| |||
63 | 64 | | |
64 | 65 | | |
65 | 66 | | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
66 | 70 | | |
67 | 71 | | |
68 | 72 | | |
| |||
92 | 96 | | |
93 | 97 | | |
94 | 98 | | |
| 99 | + | |
| 100 | + | |
95 | 101 | | |
96 | 102 | | |
97 | 103 | | |
| |||
0 commit comments