Commit 10fbafc
This PR extends PyIceberg geospatial support in three areas:
1. Adds geospatial bounds metric computation from WKB values (geometry + geography).
2. Adds spatial predicate expression/binding support (`st-contains`, `st-intersects`, `st-within`, `st-overlaps`) with conservative evaluator behavior.
3. Improves Arrow/Parquet interoperability for GeoArrow WKB, including explicit handling of geometry vs planar-geography ambiguity at the schema-compatibility boundary.
This increment is compatibility-first and does **not** introduce new runtime dependencies.
Base `geometry`/`geography` types existed, but there were still practical gaps:
- Geospatial columns were not contributing spec-encoded bounds in data-file metrics.
- Spatial predicates were not modeled end-to-end in expression binding/visitor plumbing.
- GeoArrow metadata can be ambiguous for `geometry` vs `geography(..., "planar")`, causing false compatibility failures during import/add-files flows.
- Added pure-Python geospatial utilities in `pyiceberg/utils/geospatial.py`:
  - WKB envelope extraction
  - antimeridian-aware geography envelope merge
  - Iceberg geospatial bound serialization/deserialization
- Added `GeospatialStatsAggregator` and geospatial aggregate helpers in `pyiceberg/io/pyarrow.py`.
- Updated write/import paths to compute geospatial bounds from actual row values (not Parquet binary min/max stats):
  - `write_file(...)`
  - `parquet_file_to_data_file(...)`
- Prevented incorrect partition inference from geospatial envelope bounds.
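
As a rough illustration of what WKB envelope extraction involves, here is a minimal standalone sketch. It is not the code in `pyiceberg/utils/geospatial.py`: the names `wkb_envelope`, `_read_geometry`, and `_merge` are hypothetical, only XY coordinates are handled, and empty geometries (NaN coordinates) are not treated specially.

```python
import struct
from typing import Optional, Tuple

# (xmin, ymin, xmax, ymax); hypothetical alias used only in this sketch.
Envelope = Tuple[float, float, float, float]


def _merge(a: Optional[Envelope], b: Optional[Envelope]) -> Optional[Envelope]:
    """Union of two planar envelopes; None means 'nothing seen yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def _read_geometry(buf: bytes, pos: int) -> Tuple[Optional[Envelope], int]:
    """Parse one XY geometry starting at pos; return (envelope, next position)."""
    # The first byte is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if buf[pos] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", buf, pos + 1)
    pos += 5

    def read_count(pos: int) -> Tuple[int, int]:
        (n,) = struct.unpack_from(endian + "I", buf, pos)
        return n, pos + 4

    def read_points(pos: int, count: int) -> Tuple[Optional[Envelope], int]:
        env = None
        for _ in range(count):
            x, y = struct.unpack_from(endian + "dd", buf, pos)
            pos += 16
            env = _merge(env, (x, y, x, y))
        return env, pos

    if geom_type == 1:  # Point
        return read_points(pos, 1)
    if geom_type == 2:  # LineString
        n, pos = read_count(pos)
        return read_points(pos, n)
    if geom_type == 3:  # Polygon: a count of rings, each a counted point list
        rings, pos = read_count(pos)
        env = None
        for _ in range(rings):
            n, pos = read_count(pos)
            ring_env, pos = read_points(pos, n)
            env = _merge(env, ring_env)
        return env, pos
    if geom_type in (4, 5, 6, 7):  # Multi* and GeometryCollection nest full geometries
        parts, pos = read_count(pos)
        env = None
        for _ in range(parts):
            part_env, pos = _read_geometry(buf, pos)
            env = _merge(env, part_env)
        return env, pos
    raise ValueError(f"unsupported WKB geometry type: {geom_type}")


def wkb_envelope(wkb: bytes) -> Optional[Envelope]:
    """Return the XY bounding box of a WKB value."""
    env, _ = _read_geometry(wkb, 0)
    return env
```

Computing bounds this way, from the decoded coordinates of each row, is what distinguishes geospatial metrics from Parquet's byte-wise min/max statistics on the binary column, which say nothing about spatial extent.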
- Added expression types in `pyiceberg/expressions/__init__.py`:
  - `STContains`, `STIntersects`, `STWithin`, `STOverlaps`
  - bound counterparts and JSON parsing support
- Added visitor dispatch/plumbing in `pyiceberg/expressions/visitors.py`.
- Behavior intentionally conservative in this increment:
  - row-level expression evaluator raises `NotImplementedError`
  - manifest/metrics evaluators return conservative might-match defaults
  - translation paths preserve spatial predicates where possible
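
The conservative split between row-level and metrics-level evaluation can be pictured with the toy classes below. Everything here is a hypothetical stand-in (`SpatialPredicate`, `RowEvaluator`, `MetricsEvaluator`, and the local `ROWS_MIGHT_MATCH` constant); the sketch does not reproduce the actual visitor classes in `pyiceberg/expressions/visitors.py`.

```python
from dataclasses import dataclass

# Illustrative constant: True means "this file cannot be ruled out".
ROWS_MIGHT_MATCH = True


@dataclass(frozen=True)
class SpatialPredicate:
    """Hypothetical stand-in for a bound spatial predicate."""

    op: str  # e.g. "st-intersects"
    column: str
    geometry_wkb: bytes


class RowEvaluator:
    """Row-level evaluation: refuses rather than guessing a result."""

    def visit_spatial(self, pred: SpatialPredicate, row: dict) -> bool:
        raise NotImplementedError(f"{pred.op} is not evaluated per row in this sketch")


class MetricsEvaluator:
    """File-level evaluation: never prunes on a spatial predicate."""

    def visit_spatial(self, pred: SpatialPredicate, file_metrics: dict) -> bool:
        # Answering "might match" is always safe: it can cost extra reads,
        # but it can never drop a file that holds matching rows.
        return ROWS_MIGHT_MATCH
```

The asymmetry is the point of the design: a pruning evaluator may over-include without affecting correctness, whereas a row evaluator that returned a default answer would silently produce wrong results, so it fails loudly instead.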
- Added GeoArrow WKB decoding helper in `pyiceberg/io/pyarrow.py` to map extension metadata to Iceberg geospatial types.
- Added boundary-only compatibility option in `pyiceberg/schema.py`:
  - `_check_schema_compatible(..., allow_planar_geospatial_equivalence=False)`
- Enabled that option only in `_check_pyarrow_schema_compatible(...)` to allow:
  - `geometry` <-> `geography(..., "planar")` when CRS strings match
  - while still rejecting spherical geography mismatches
- Added one-time warning log when `geoarrow-pyarrow` is unavailable and code falls back to binary.
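
The planar-equivalence rule described above can be sketched as a small predicate. The `Geometry` and `Geography` dataclasses and the `geo_types_compatible` function are hypothetical simplifications for illustration, not the types or checks in `pyiceberg/schema.py`; only the flag name `allow_planar_geospatial_equivalence` is taken from the PR description.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Geometry:
    """Hypothetical simplified geometry type."""

    crs: str


@dataclass(frozen=True)
class Geography:
    """Hypothetical simplified geography type."""

    crs: str
    algorithm: str  # e.g. "planar" or "spherical"


GeoType = Union[Geometry, Geography]


def geo_types_compatible(
    expected: GeoType,
    actual: GeoType,
    allow_planar_geospatial_equivalence: bool = False,
) -> bool:
    """Strict equality by default; optionally accept geometry <-> planar geography."""
    if expected == actual:
        return True
    if not allow_planar_geospatial_equivalence:
        return False
    # The relaxation only applies to a geometry/geography pair.
    if {type(expected), type(actual)} != {Geometry, Geography}:
        return False
    geography = expected if isinstance(expected, Geography) else actual
    # Same CRS string and a planar edge algorithm are both required.
    return geography.algorithm == "planar" and expected.crs == actual.crs
```

Keeping the flag off by default mirrors the stated intent: the relaxation is opted into at the Arrow/Parquet boundary only, and every other caller keeps strict comparison.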
- Updated user docs: `mkdocs/docs/geospatial.md`
- Added decisions record: `mkdocs/docs/dev/geospatial-types-decisions-v1.md`
Added/updated tests across:
- `tests/utils/test_geospatial.py`
- `tests/io/test_pyarrow_stats.py`
- `tests/io/test_pyarrow.py`
- `tests/expressions/test_spatial_predicates.py`
- `tests/integration/test_geospatial.py`
Coverage includes:
- geospatial bound encoding/decoding (XY/XYZ/XYM/XYZM)
- geography antimeridian behavior
- geospatial metrics generation from write/import paths
- spatial predicate modeling/binding/translation behavior
- planar ambiguity compatibility guardrails
- warning behavior for missing `geoarrow-pyarrow`
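
To make the antimeridian case in the coverage list concrete, the sketch below merges two longitude ranges on a circle. It assumes the convention that a range with `west > east` wraps across the antimeridian; the names `lon_width`, `lon_covers`, and `merge_lon` are hypothetical, and the merge logic in this PR may differ in detail.

```python
from typing import Tuple

# (west, east) in degrees; west > east means the range crosses the antimeridian.
LonRange = Tuple[float, float]


def lon_width(r: LonRange) -> float:
    """Eastward extent of a range, accounting for wrap-around."""
    west, east = r
    return east - west if west <= east else east - west + 360.0


def lon_covers(outer: LonRange, inner: LonRange) -> bool:
    """True if inner lies entirely inside outer when walking east from outer's west edge."""
    offset = (inner[0] - outer[0]) % 360.0
    return offset + lon_width(inner) <= lon_width(outer) + 1e-9


def merge_lon(a: LonRange, b: LonRange) -> LonRange:
    """Narrowest range covering both inputs, or the full circle if none exists."""
    candidates = [a, b, (a[0], b[1]), (b[0], a[1])]
    valid = [c for c in candidates if lon_covers(c, a) and lon_covers(c, b)]
    if not valid:
        return (-180.0, 180.0)
    return min(valid, key=lon_width)
```

For example, `merge_lon((170.0, 180.0), (-180.0, -170.0))` yields `(170.0, -170.0)`, a 20-degree wrapped range, where a naive min/max merge would yield `(-180.0, 180.0)` and make the bound useless for pruning.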
- No user-facing API removals.
- New compatibility relaxation is intentionally scoped to Arrow/Parquet schema-compatibility boundary only.
- Core schema/type compatibility remains strict elsewhere.
- No spatial pushdown/row execution implementation in this PR.
- Spatial predicate execution semantics.
- Spatial predicate pushdown/pruning.
- Runtime WKB <-> WKT conversion strategy.

1 parent 9687d08 commit 10fbafc

File tree (11 files changed, +1588 -11 lines changed):
- mkdocs/docs
- pyiceberg
  - expressions
  - io
  - utils
- tests
  - expressions
  - integration
  - io
  - utils