Commit 87fbca6

docs: type mapping between pyiceberg and pyarrow
1 parent 4173ef7 commit 87fbca6

1 file changed: +72 -0 lines changed
pyiceberg/type_mapping.py

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
"""Type mapping between PyArrow and Iceberg types.

## PyArrow

The Iceberg specification defines type mappings only for Avro, Parquet, and ORC:

- [Iceberg to Avro](https://iceberg.apache.org/spec/#avro)

- [Iceberg to Parquet](https://iceberg.apache.org/spec/#parquet)

- [Iceberg to ORC](https://iceberg.apache.org/spec/#orc)

Refer to the following tables for the type mapping in both directions between PyIceberg types and PyArrow types.

### PyIceberg to PyArrow type mapping

| PyIceberg type class            | PyArrow type                        | Notes                                  |
|---------------------------------|-------------------------------------|----------------------------------------|
| `BooleanType`                   | `pa.bool_()`                        |                                        |
| `IntegerType`                   | `pa.int32()`                        |                                        |
| `LongType`                      | `pa.int64()`                        |                                        |
| `FloatType`                     | `pa.float32()`                      |                                        |
| `DoubleType`                    | `pa.float64()`                      |                                        |
| `DecimalType(p, s)`             | `pa.decimal128(p, s)`               |                                        |
| `DateType`                      | `pa.date32()`                       |                                        |
| `TimeType`                      | `pa.time64("us")`                   |                                        |
| `TimestampType`                 | `pa.timestamp("us")`                |                                        |
| `TimestampNanoType`             | `pa.timestamp("ns")`                |                                        |
| `TimestamptzType`               | `pa.timestamp("us", tz="UTC")`      |                                        |
| `TimestamptzNanoType`           | `pa.timestamp("ns", tz="UTC")`      |                                        |
| `StringType`                    | `pa.large_string()`                 |                                        |
| `UUIDType`                      | `pa.uuid()`                         |                                        |
| `BinaryType`                    | `pa.large_binary()`                 |                                        |
| `FixedType(L)`                  | `pa.binary(L)`                      |                                        |
| `StructType`                    | `pa.struct()`                       |                                        |
| `ListType(e)`                   | `pa.large_list(e)`                  |                                        |
| `MapType(k, v)`                 | `pa.map_(k, v)`                     |                                        |
| `UnknownType`                   | `pa.null()`                         |                                        |

---
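In the PyIceberg to PyArrow table above, the parameterized rows (`DecimalType(p, s)`, `FixedType(L)`) carry their parameters over unchanged, while the remaining primitive rows are fixed one-to-one lookups. The following is a minimal, dependency-free sketch of that idea and not PyIceberg's actual implementation: `ICEBERG_TO_ARROW`, `PARAMETERIZED`, and `to_arrow_expr` are hypothetical names, and the values are PyArrow constructor expressions written as plain strings so the snippet runs without `pyarrow` installed.

```python
# Hypothetical lookup mirroring the primitive rows of the table above.
# Values are PyArrow constructor expressions kept as strings (assumption:
# illustration only, no real pyarrow objects are created).
ICEBERG_TO_ARROW = {
    "BooleanType": "pa.bool_()",
    "IntegerType": "pa.int32()",
    "LongType": "pa.int64()",
    "FloatType": "pa.float32()",
    "DoubleType": "pa.float64()",
    "DateType": "pa.date32()",
    "TimeType": 'pa.time64("us")',
    "TimestampType": 'pa.timestamp("us")',
    "TimestampNanoType": 'pa.timestamp("ns")',
    "TimestamptzType": 'pa.timestamp("us", tz="UTC")',
    "TimestamptzNanoType": 'pa.timestamp("ns", tz="UTC")',
    "StringType": "pa.large_string()",
    "UUIDType": "pa.uuid()",
    "BinaryType": "pa.large_binary()",
    "UnknownType": "pa.null()",
}

# Parameterized types keep their parameters (precision/scale, byte length).
PARAMETERIZED = {
    "DecimalType": "pa.decimal128({}, {})",
    "FixedType": "pa.binary({})",
}


def to_arrow_expr(iceberg_type, *params):
    """Return the PyArrow expression (as a string) for a PyIceberg type name."""
    if iceberg_type in PARAMETERIZED:
        return PARAMETERIZED[iceberg_type].format(*params)
    return ICEBERG_TO_ARROW[iceberg_type]


print(to_arrow_expr("DecimalType", 38, 9))  # pa.decimal128(38, 9)
print(to_arrow_expr("FixedType", 16))       # pa.binary(16)
print(to_arrow_expr("StringType"))          # pa.large_string()
```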
### PyArrow to PyIceberg type mapping

| PyArrow type                       | PyIceberg type class        | Notes                          |
|------------------------------------|-----------------------------|--------------------------------|
| `pa.bool_()`                       | `BooleanType`               |                                |
| `pa.int32()`                       | `IntegerType`               |                                |
| `pa.int64()`                       | `LongType`                  |                                |
| `pa.float32()`                     | `FloatType`                 |                                |
| `pa.float64()`                     | `DoubleType`                |                                |
| `pa.decimal128(p, s)`              | `DecimalType(p, s)`         |                                |
| `pa.decimal256(p, s)`              | Unsupported                 |                                |
| `pa.date32()`                      | `DateType`                  |                                |
| `pa.date64()`                      | Unsupported                 |                                |
| `pa.time64("us")`                  | `TimeType`                  |                                |
| `pa.timestamp("us")`               | `TimestampType`             |                                |
| `pa.timestamp("ns")`               | `TimestampNanoType`         |                                |
| `pa.timestamp("us", tz="UTC")`     | `TimestamptzType`           |                                |
| `pa.timestamp("ns", tz="UTC")`     | `TimestamptzNanoType`       |                                |
| `pa.string()` / `pa.large_string()`| `StringType`                |                                |
| `pa.uuid()`                        | `UUIDType`                  |                                |
| `pa.binary()` / `pa.large_binary()`| `BinaryType`                |                                |
| `pa.binary(L)`                     | `FixedType(L)`              | Fixed-length byte arrays       |
| `pa.struct([...])`                 | `StructType`                |                                |
| `pa.list_(e)` / `pa.large_list(e)` | `ListType(e)`               |                                |
| `pa.map_(k, v)`                    | `MapType(k, v)`             |                                |
| `pa.null()`                        | `UnknownType`               |                                |

---
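Read together, the two tables are not exact inverses: several PyArrow types collapse onto one PyIceberg type (for example `pa.string()` and `pa.large_string()` both map to `StringType`), and `pa.decimal256` and `pa.date64` have no PyIceberg counterpart. The sketch below illustrates what that implies for a PyArrow to PyIceberg to PyArrow round trip. It is a dependency-free illustration under stated assumptions, not PyIceberg's implementation: the dictionaries cover only a subset of rows, type expressions are plain strings, and `round_trip` is a hypothetical helper.

```python
# Subset of the PyArrow -> PyIceberg table; None marks "Unsupported".
ARROW_TO_ICEBERG = {
    "pa.string()": "StringType",
    "pa.large_string()": "StringType",
    "pa.binary()": "BinaryType",
    "pa.large_binary()": "BinaryType",
    "pa.int32()": "IntegerType",
    "pa.date32()": "DateType",
    "pa.date64()": None,
}

# Matching subset of the PyIceberg -> PyArrow table.
ICEBERG_TO_ARROW = {
    "StringType": "pa.large_string()",
    "BinaryType": "pa.large_binary()",
    "IntegerType": "pa.int32()",
    "DateType": "pa.date32()",
}


def round_trip(arrow_type):
    """Map a PyArrow type to PyIceberg and back, per the tables above."""
    iceberg_type = ARROW_TO_ICEBERG.get(arrow_type)
    if iceberg_type is None:
        # Covers both rows marked Unsupported and types missing from the subset.
        raise TypeError(f"no PyIceberg equivalent for {arrow_type}")
    return ICEBERG_TO_ARROW[iceberg_type]


print(round_trip("pa.string()"))  # pa.large_string(), not pa.string()
print(round_trip("pa.int32()"))   # pa.int32()
```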

### Notes
- PyIceberg `GeometryType` and `GeographyType` types are mapped to a GeoArrow WKB extension type when that extension type is available.
  Otherwise, they fall back to `pa.large_binary()`, which stores the WKB bytes.
"""
