When writing data from a PyArrow Table, how should we handle 'null' fields? #2119

### Question

```
import pyarrow as pa

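# `table` is an existing Iceberg table created with the PyArrow schema below
schema = pa.schema(
    [
        pa.field("col1", pa.string(), nullable=True),
    ]
)

# No schema is passed here, so PyArrow infers `col1` as pa.null()
# because every value in the column is None
df = pa.Table.from_pylist(
    [
        {"col1": None}
    ]
)

table.overwrite(df)  # raises UnsupportedPyArrowTypeException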
```

The example above fails with `UnsupportedPyArrowTypeException: Column 'col1' has an unsupported type: null`. The underlying cause is:
```
in _ConvertToIceberg.primitive(self, primitive)
   1211     return FixedType(primitive.byte_width)
-> 1213 raise TypeError(f"Unsupported type: {primitive}")

TypeError: Unsupported type: null
```

Is there any reason we wouldn't want to support the case where PyArrow has typed a field as `null`? As a workaround/fix, I was thinking that we could skip `pa.null()` fields in `visit_pyarrow(obj: pa.StructType, visitor: PyArrowSchemaVisitor[T])`. The column would then effectively be missing from the converted schema, and any required/nullable enforcement would be applied accordingly. Would this have any undesired consequences?
