When writing data from a PyArrow DataFrame, how should we handle 'null' Fields? #2119
Closed
Description
Question
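import pyarrow as pa

# The Iceberg table (`table`) was created with the PyArrow schema below
schema = pa.schema(
    [
        pa.field("col1", pa.string(), nullable=True),
    ]
)

# No schema is passed here, so PyArrow infers the type of col1 from the data
df = pa.Table.from_pylist(
    [
        {"col1": None}
    ]
)

table.overwrite(df)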
In the above example, we encounter an error like this:

UnsupportedPyArrowTypeException: Column 'col1' has an unsupported type: null

with the underlying cause:

in _ConvertToIceberg.primitive(self, primitive)
   1211 return FixedType(primitive.byte_width)
-> 1213 raise TypeError(f"Unsupported type: {primitive}")
TypeError: Unsupported type: null
Is there any reason we wouldn't want to support the case where PyArrow has typed a field as null? As a workaround or fix, I was thinking that we could skip pa.null() fields in visit_pyarrow(obj: pa.StructType, visitor: PyArrowSchemaVisitor[T]). The column would then effectively be missing from the converted schema, and any required/nullable enforcement would be performed accordingly. Would this have any undesired consequences?