Why did I get a list instead of datasets.arrow_dataset.Column after the map?
#7838
Unanswered
Septemberlemon asked this question in Q&A
Replies: 1 comment
This is expected behavior. After calling map(), accessing a column returns a plain Python list.

How to work with the result correctly

    # After map, column access returns a list
    result = dataset.map(lambda x: {"new_col": x["text"].upper()})
    print(type(result["new_col"]))  # <class 'list'>

    # To get a proper column object, use .data
    col = result.data["new_col"]  # pyarrow.ChunkedArray

    # Or convert to pandas
    df = result.to_pandas()
    print(df["new_col"])  # pandas Series

If you need the Arrow column object

    # Access via the underlying Arrow table
    # (_data is a private attribute; .data above is the public property)
    arrow_col = result._data["new_col"]  # ChunkedArray

Why this happens

HuggingFace datasets stores data in Arrow format internally. For most use cases, the list access is what you want.
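If your code has to work whether column access gives you a list or some other column object, one option is to normalize the value where you read it. Below is a minimal sketch, assuming the column object is iterable (which holds for a plain list; I have not checked it against every datasets version). `column_as_list` and `mapped` are hypothetical names, and a plain dict stands in for the mapped dataset so the snippet runs without datasets installed.

```python
from collections.abc import Iterable
from typing import Any


def column_as_list(dataset_like: Any, name: str) -> list:
    """Return column `name` as a plain Python list.

    Works whether column access yields a list or another iterable
    column object, so the caller does not depend on the exact type.
    """
    column = dataset_like[name]
    if isinstance(column, list):
        return column  # already a list, nothing to copy
    if isinstance(column, Iterable):
        return list(column)  # materialize a wrapped or lazy column
    raise TypeError(f"column {name!r} is not iterable: {type(column).__name__}")


# Hypothetical stand-in for a mapped dataset: a dict of columns.
mapped = {
    "new_col": ["HELLO", "WORLD"],                 # list-valued column
    "lazy_col": (s.lower() for s in ["A", "B"]),   # non-list iterable column
}

upper = column_as_list(mapped, "new_col")
lower = column_as_list(mapped, "lazy_col")
print(upper, lower)
```

With a real dataset the call would look like `column_as_list(result, "new_col")`. Keep in mind that materializing a large column copies it into Python objects, so for big data the Arrow or pandas routes above are likely cheaper.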