Commit 26b12e0

move md file to API section
1 parent 54905d1 commit 26b12e0

2 files changed: +79 -88 lines changed

mkdocs/docs/api.md

Lines changed: 79 additions & 0 deletions
@@ -2039,3 +2039,82 @@ DataFrame()
| 3 | 6 |
+---+---+
```

## Type mapping

### PyArrow

The Iceberg specification defines type mappings only for the Avro, Parquet, and ORC file formats:

- [Iceberg to Avro](https://iceberg.apache.org/spec/#avro)
- [Iceberg to Parquet](https://iceberg.apache.org/spec/#parquet)
- [Iceberg to ORC](https://iceberg.apache.org/spec/#orc)

The following tables describe the type mappings between PyIceberg and PyArrow. In the tables below, `pa` refers to the `pyarrow` module:

```python
import pyarrow as pa
```

#### PyIceberg to PyArrow type mapping

| PyIceberg type class | PyArrow type | Notes |
|---------------------------------|-------------------------------------|----------------------------------------|
| `BooleanType` | `pa.bool_()` | |
| `IntegerType` | `pa.int32()` | |
| `LongType` | `pa.int64()` | |
| `FloatType` | `pa.float32()` | |
| `DoubleType` | `pa.float64()` | |
| `DecimalType(p, s)` | `pa.decimal128(p, s)` | |
| `DateType` | `pa.date32()` | |
| `TimeType` | `pa.time64("us")` | |
| `TimestampType` | `pa.timestamp("us")` | |
| `TimestampNanoType` | `pa.timestamp("ns")` | |
| `TimestamptzType` | `pa.timestamp("us", tz="UTC")` | |
| `TimestamptzNanoType` | `pa.timestamp("ns", tz="UTC")` | |
| `StringType` | `pa.large_string()` | |
| `UUIDType` | `pa.uuid()` | |
| `BinaryType` | `pa.large_binary()` | |
| `FixedType(L)` | `pa.binary(L)` | |
| `StructType` | `pa.struct()` | |
| `ListType(e)` | `pa.large_list(e)` | |
| `MapType(k, v)` | `pa.map_(k, v)` | |
| `UnknownType` | `pa.null()` | |
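
The parameterized and nested rows of this table (`DecimalType(p, s)`, `FixedType(L)`, `ListType(e)`, `MapType(k, v)`) compose recursively. The following is a minimal, standard-library-only sketch of that composition. It does not call PyIceberg or PyArrow: the tuple notation for types and the helper `arrow_expr` are hypothetical names made up for this illustration, and the result is only the PyArrow expression rendered as a string.

```python
# Illustration only: a few primitive rows copied from the table above,
# written as strings (type class name -> PyArrow expression).
PRIMITIVES = {
    "BooleanType": "pa.bool_()",
    "IntegerType": "pa.int32()",
    "LongType": "pa.int64()",
    "StringType": "pa.large_string()",
    "BinaryType": "pa.large_binary()",
    "TimestamptzType": 'pa.timestamp("us", tz="UTC")',
}


def arrow_expr(iceberg_type):
    """Render the PyArrow expression for a type, following the table.

    `iceberg_type` is either a primitive class name (str) or a tuple
    `(class_name, *args)`. This encoding is hypothetical, not a PyIceberg API.
    """
    if isinstance(iceberg_type, str):
        return PRIMITIVES[iceberg_type]
    name, *args = iceberg_type
    if name == "DecimalType":
        precision, scale = args
        return f"pa.decimal128({precision}, {scale})"
    if name == "FixedType":
        (length,) = args
        return f"pa.binary({length})"
    if name == "ListType":
        (element,) = args
        return f"pa.large_list({arrow_expr(element)})"
    if name == "MapType":
        key, value = args
        return f"pa.map_({arrow_expr(key)}, {arrow_expr(value)})"
    raise KeyError(name)


nested = ("MapType", "StringType", ("ListType", ("DecimalType", 10, 2)))
print(arrow_expr(nested))
# pa.map_(pa.large_string(), pa.large_list(pa.decimal128(10, 2)))
```

Note that, per the table, list elements and string values end up in the `large_` variants at every nesting level.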

---

#### PyArrow to PyIceberg type mapping

| PyArrow type | PyIceberg type class | Notes |
|------------------------------------|-----------------------------|--------------------------------|
| `pa.bool_()` | `BooleanType` | |
| `pa.int32()` | `IntegerType` | |
| `pa.int64()` | `LongType` | |
| `pa.float32()` | `FloatType` | |
| `pa.float64()` | `DoubleType` | |
| `pa.decimal128(p, s)` | `DecimalType(p, s)` | |
| `pa.decimal256(p, s)` | Unsupported | |
| `pa.date32()` | `DateType` | |
| `pa.date64()` | Unsupported | |
| `pa.time64("us")` | `TimeType` | |
| `pa.timestamp("us")` | `TimestampType` | |
| `pa.timestamp("ns")` | `TimestampNanoType` | |
| `pa.timestamp("us", tz="UTC")` | `TimestamptzType` | |
| `pa.timestamp("ns", tz="UTC")` | `TimestamptzNanoType` | |
| `pa.string()` / `pa.large_string()`| `StringType` | |
| `pa.uuid()` | `UUIDType` | |
| `pa.binary()` / `pa.large_binary()`| `BinaryType` | |
| `pa.binary(L)` | `FixedType(L)` | Fixed-length byte arrays |
| `pa.struct([...])` | `StructType` | |
| `pa.list_(e)` / `pa.large_list(e)` | `ListType(e)` | |
| `pa.map_(k, v)` | `MapType(k, v)` | |
| `pa.null()` | `UnknownType` | |
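
Read together, the two tables are not symmetric: several PyArrow types collapse into one PyIceberg type, and the reverse table picks a single PyArrow type for it. The sketch below encodes a handful of rows as plain strings to show what a round trip looks like according to the tables. It uses only the standard library; the dictionaries and the `round_trip` helper are hypothetical names for this illustration, not part of PyIceberg.

```python
# Illustration only: selected rows from the two tables, as strings.
# None marks a PyArrow type the table lists as Unsupported.
ARROW_TO_ICEBERG = {
    "pa.int32()": "IntegerType",
    "pa.string()": "StringType",
    "pa.large_string()": "StringType",
    "pa.binary()": "BinaryType",
    "pa.large_binary()": "BinaryType",
    "pa.decimal256(p, s)": None,
    "pa.date64()": None,
}
ICEBERG_TO_ARROW = {
    "IntegerType": "pa.int32()",
    "StringType": "pa.large_string()",
    "BinaryType": "pa.large_binary()",
}


def round_trip(arrow_type: str) -> str:
    """Map PyArrow -> PyIceberg -> PyArrow using the table rows above."""
    iceberg_type = ARROW_TO_ICEBERG[arrow_type]
    if iceberg_type is None:
        raise TypeError(f"{arrow_type} has no PyIceberg equivalent in the table")
    return ICEBERG_TO_ARROW[iceberg_type]


for arrow_type in ("pa.int32()", "pa.string()", "pa.binary()"):
    print(f"{arrow_type} -> {round_trip(arrow_type)}")
# pa.int32() -> pa.int32()
# pa.string() -> pa.large_string()
# pa.binary() -> pa.large_binary()
```

In other words, going by the tables, `pa.string()` and `pa.binary()` do not survive a round trip unchanged, and `pa.decimal256` and `pa.date64` inputs have no target type at all.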
2114+
2115+
---
2116+
2117+
***Notes***
2118+
2119+
- PyIceberg `GeometryType` and `GeographyType` types are mapped to a GeoArrow WKB extension type.
2120+
Otherwise, falls back to `pa.large_binary()` which stores WKB bytes.

pyiceberg/type_mapping.py

Lines changed: 0 additions & 88 deletions
This file was deleted.
