@@ -2039,3 +2039,82 @@ DataFrame()
20392039| 3 | 6 |
20402040+ -- -+ -- -+
20412041```
2042+
2043+ ## Type mapping
2044+
2045+ ### PyArrow
2046+
2047+ The Iceberg specification defines type mappings only for Avro, Parquet, and ORC:
2048+
2049+ - [Iceberg to Avro](https://iceberg.apache.org/spec/#avro)
2050+
2051+ - [Iceberg to Parquet](https://iceberg.apache.org/spec/#parquet)
2052+
2053+ - [Iceberg to ORC](https://iceberg.apache.org/spec/#orc)
2054+
2055+ The following tables describe the type mappings between PyIceberg and PyArrow. In the tables below, ` pa ` refers to the ` pyarrow ` module:
2056+
2057+ ``` python
2058+ import pyarrow as pa
2059+ ```
2060+
2061+ #### PyIceberg to PyArrow type mapping
2062+
2063+ | PyIceberg type class | PyArrow type | Notes |
2064+ | ---------------------------------| -------------------------------------| ----------------------------------------|
2065+ | ` BooleanType ` | ` pa.bool_() ` | |
2066+ | ` IntegerType ` | ` pa.int32() ` | |
2067+ | ` LongType ` | ` pa.int64() ` | |
2068+ | ` FloatType ` | ` pa.float32() ` | |
2069+ | ` DoubleType ` | ` pa.float64() ` | |
2070+ | ` DecimalType(p, s) ` | ` pa.decimal128(p, s) ` | |
2071+ | ` DateType ` | ` pa.date32() ` | |
2072+ | ` TimeType ` | ` pa.time64("us") ` | |
2073+ | ` TimestampType ` | ` pa.timestamp("us") ` | |
2074+ | ` TimestampNanoType ` | ` pa.timestamp("ns") ` | |
2075+ | ` TimestamptzType ` | ` pa.timestamp("us", tz="UTC") ` | |
2076+ | ` TimestamptzNanoType ` | ` pa.timestamp("ns", tz="UTC") ` | |
2077+ | ` StringType ` | ` pa.large_string() ` | |
2078+ | ` UUIDType ` | ` pa.uuid() ` | |
2079+ | ` BinaryType ` | ` pa.large_binary() ` | |
2080+ | ` FixedType(L) ` | ` pa.binary(L) ` | |
2081+ | ` StructType ` | ` pa.struct([...]) ` | |
2082+ | ` ListType(e) ` | ` pa.large_list(e) ` | |
2083+ | ` MapType(k, v) ` | ` pa.map_(k, v) ` | |
2084+ | ` UnknownType ` | ` pa.null() ` | |
2085+
2086+ ---
2087+
2088+ #### PyArrow to PyIceberg type mapping
2089+
2090+ | PyArrow type | PyIceberg type class | Notes |
2091+ | ------------------------------------| -----------------------------| --------------------------------|
2092+ | ` pa.bool_() ` | ` BooleanType ` | |
2093+ | ` pa.int32() ` | ` IntegerType ` | |
2094+ | ` pa.int64() ` | ` LongType ` | |
2095+ | ` pa.float32() ` | ` FloatType ` | |
2096+ | ` pa.float64() ` | ` DoubleType ` | |
2097+ | ` pa.decimal128(p, s) ` | ` DecimalType(p, s) ` | |
2098+ | ` pa.decimal256(p, s) ` | Unsupported | |
2099+ | ` pa.date32() ` | ` DateType ` | |
2100+ | ` pa.date64() ` | Unsupported | |
2101+ | ` pa.time64("us") ` | ` TimeType ` | |
2102+ | ` pa.timestamp("us") ` | ` TimestampType ` | |
2103+ | ` pa.timestamp("ns") ` | ` TimestampNanoType ` | |
2104+ | ` pa.timestamp("us", tz="UTC") ` | ` TimestamptzType ` | |
2105+ | ` pa.timestamp("ns", tz="UTC") ` | ` TimestamptzNanoType ` | |
2106+ | ` pa.string() ` / ` pa.large_string() ` | ` StringType ` | |
2107+ | ` pa.uuid() ` | ` UUIDType ` | |
2108+ | ` pa.binary() ` / ` pa.large_binary() ` | ` BinaryType ` | |
2109+ | ` pa.binary(L) ` | ` FixedType(L) ` | Fixed-length byte arrays |
2110+ | ` pa.struct([...]) ` | ` StructType ` | |
2111+ | ` pa.list_(e) ` / ` pa.large_list(e) ` | ` ListType(e) ` | |
2112+ | ` pa.map_(k, v) ` | ` MapType(k, v) ` | |
2113+ | ` pa.null() ` | ` UnknownType ` | |
2114+
2115+ ---
2116+
2117+ ***Notes***
2118+
2119+ - PyIceberg ` GeometryType ` and ` GeographyType ` are mapped to a GeoArrow WKB extension type where that extension type is available.
2120+ Otherwise, they fall back to ` pa.large_binary() `, which stores the WKB bytes.
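
Read together, the two tables imply that a PyArrow to PyIceberg to PyArrow round trip is not always the identity: several PyArrow types collapse onto one PyIceberg type, which then maps back to the "large" variant. The sketch below illustrates this with a few rows copied from the tables and encoded as plain strings, so it runs without `pyarrow` or `pyiceberg` installed. The dictionaries and the `round_trip` helper are hypothetical names used only for illustration; they are not part of the PyIceberg API.

```python
# Illustrative only: a few rows of the two mapping tables, as plain strings.
# These names are hypothetical and not part of PyIceberg or PyArrow.
ICEBERG_TO_ARROW = {
    "LongType": "pa.int64()",
    "StringType": "pa.large_string()",
    "BinaryType": "pa.large_binary()",
    "ListType(e)": "pa.large_list(e)",
}

ARROW_TO_ICEBERG = {
    "pa.int64()": "LongType",
    "pa.string()": "StringType",
    "pa.large_string()": "StringType",
    "pa.binary()": "BinaryType",
    "pa.large_binary()": "BinaryType",
    "pa.list_(e)": "ListType(e)",
    "pa.large_list(e)": "ListType(e)",
    "pa.decimal256(p, s)": None,  # listed as "Unsupported" in the table
    "pa.date64()": None,  # listed as "Unsupported" in the table
}


def round_trip(arrow_type: str) -> str:
    """Map a PyArrow type name to PyIceberg and back, per the tables."""
    iceberg_type = ARROW_TO_ICEBERG.get(arrow_type)
    if iceberg_type is None:
        # Either marked Unsupported or simply not included in this sketch.
        raise ValueError(f"no PyIceberg mapping for {arrow_type}")
    return ICEBERG_TO_ARROW[iceberg_type]


for name in ("pa.int64()", "pa.string()", "pa.binary()", "pa.list_(e)"):
    print(f"{name} -> {ARROW_TO_ICEBERG[name]} -> {round_trip(name)}")
```

In this sketch, `pa.int64()` survives the round trip unchanged, while `pa.string()`, `pa.binary()`, and `pa.list_(e)` come back as their `large_` counterparts.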