
Commit 18ee579

masawdah, pre-commit-ci[bot], and martinfleis authored
ENH: add zonal_stats (#52)
* example
* assign the pixels ids based on their polygons then group them using their ids
* fix importing xagg and run an example
* xagg using openEO data , it needs to implement parallel processing to make it faster
* clean up
* spatial_aggregation
* clean up
* rioxarray
* packages
* deps
* deps
* deps
* fix the notes
* zonal stats in pytest
* handle long lines
* handle long lines
* fix pre-commit
* updated files
* support lat&lon names
* fix the dimension names on the fly
* testpy
* dask support just axis as integer
* Dimension Mapping
* updated files
* [pre-commit.ci] pre-commit autoupdate (#53)
* MAINT: test on Python 3.12, update actions, use ruff format (#54)
* DOC: fix formatting
* MAINT: add PYthon 3.12, use ruff formatter
* ignore conflicting rules
* Update xvec/accessor.py
  Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
* Update xvec/accessor.py
  Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
* Update xvec/accessor.py
  Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
* Update xvec/accessor.py
  Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
* clean uo
* clean accessor
* remove dask option
* clean
* internal funcs
* pytest
* clean up
* example
* assign the pixels ids based on their polygons then group them using their ids
* fix importing xagg and run an example
* xagg using openEO data , it needs to implement parallel processing to make it faster
* clean up
* spatial_aggregation
* clean up
* rioxarray
* packages
* deps
* deps
* deps
* fix the notes
* zonal stats in pytest
* handle long lines
* handle long lines
* fix pre-commit
* updated files
* support lat&lon names
* fix the dimension names on the fly
* testpy
* dask support just axis as integer
* Dimension Mapping
* updated files
* clean uo
* clean accessor
* remove dask option
* clean
* internal funcs
* pytest
* clean up
* Update xvec/accessor.py
  Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
* accessor
* dimension-agnostic
* clean up
* update pytest
* refactor
* refactor structure
* geodatasets
* pyogrio for IO
* api
* include vectorized rasterize-based method
* rasterio link
* fix docstring
* fmt
* fix for DataArray
* testing
* stat -> stats
* fix keyword

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
1 parent 718c414 commit 18ee579
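
The approach named in the commit message ("assign the pixels ids based on their polygons then group them using their ids") can be sketched in plain Python. This is an illustration only, not the code added in `xvec/zonal.py`, which is not shown on this page. The helpers `point_in_polygon`, `label_pixels` and `group_stats` are hypothetical, and pixels are assigned by testing their centres, which roughly corresponds to the `all_touched=False` behaviour described in the docstring.

```python
from statistics import mean, median


def point_in_polygon(x, y, ring):
    """Ray-casting test; ``ring`` is a list of (x, y) vertices (hypothetical helper)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_pixels(xs, ys, polygons):
    """Give every pixel the id of the polygon containing its centre, else None."""
    labels = [[None] * len(xs) for _ in ys]
    for pid, ring in enumerate(polygons):
        for r, y in enumerate(ys):
            for c, x in enumerate(xs):
                if point_in_polygon(x, y, ring):
                    # A pixel keeps a single id, so in this sketch a later
                    # polygon overwrites an earlier one where they overlap.
                    labels[r][c] = pid
    return labels


def group_stats(values, labels, n_polygons, stat="mean"):
    """Group pixel values by polygon id and reduce each group."""
    funcs = {"mean": mean, "median": median, "min": min, "max": max, "sum": sum}
    groups = [[] for _ in range(n_polygons)]
    for value_row, label_row in zip(values, labels):
        for value, pid in zip(value_row, label_row):
            if pid is not None:
                groups[pid].append(value)
    # Polygons that captured no pixel centre yield None.
    return [funcs[stat](g) if g else None for g in groups]


# A 2 x 4 grid of pixel centres and two adjacent square polygons.
xs = [0.5, 1.5, 2.5, 3.5]
ys = [0.5, 1.5]
values = [[1, 2, 3, 4], [5, 6, 7, 8]]
polygons = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(2, 0), (4, 0), (4, 2), (2, 2)],
]
labels = label_pixels(xs, ys, polygons)
print(group_stats(values, labels, len(polygons)))
```

Because each pixel carries only one id, a polygon that covers no pixel centre, or whose pixels are claimed by an overlapping polygon, ends up with no values in this sketch. That is one way the loss of information mentioned for small polygons can arise.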

12 files changed

Lines changed: 584 additions & 3 deletions


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -144,3 +144,5 @@ doc/source/generated
 .ruff_cache
 doc/source/cube.joblib.compressed
 doc/source/cube.pickle
+
+cache/

ci/310.yaml

Lines changed: 7 additions & 0 deletions
@@ -6,10 +6,17 @@ dependencies:
   # required
   - shapely >=2
   - xarray
+  - rioxarray
+  - joblib
+  - rasterio
+  - tqdm
   - pyproj
   # testing
   - pytest
   - pytest-cov
   - pytest-xdist
   - pytest-reportlog
   - geopandas-base
+  - geodatasets
+  - pyogrio
+

ci/311.yaml

Lines changed: 7 additions & 0 deletions
@@ -6,10 +6,17 @@ dependencies:
   # required
   - shapely >=2
   - xarray
+  - rioxarray
+  - joblib
+  - rasterio
+  - tqdm
   - pyproj
   # testing
   - pytest
   - pytest-cov
   - pytest-xdist
   - pytest-reportlog
   - geopandas-base
+  - geodatasets
+  - pyogrio
+

ci/312.yaml

Lines changed: 7 additions & 0 deletions
@@ -6,10 +6,17 @@ dependencies:
   # required
   - shapely >=2
   - xarray
+  - rioxarray
+  - joblib
+  - rasterio
+  - tqdm
   - pyproj
   # testing
   - pytest
   - pytest-cov
   - pytest-xdist
   - pytest-reportlog
   - geopandas-base
+  - geodatasets
+  - pyogrio
+

ci/39.yaml

Lines changed: 6 additions & 0 deletions
@@ -6,10 +6,16 @@ dependencies:
   # required
   - shapely >=2
   - xarray
+  - rioxarray
+  - joblib
+  - rasterio
+  - tqdm
   - pyproj
   # testing
   - pytest
   - pytest-cov
   - pytest-xdist
   - pytest-reportlog
   - geopandas-base
+  - geodatasets
+  - pyogrio

ci/dev.yaml

Lines changed: 6 additions & 0 deletions
@@ -7,12 +7,18 @@ dependencies:
   - cython
   - geos
   - proj
+  - rioxarray
+  - joblib
+  - rasterio
+  - tqdm
   # testing
   - pytest
   - pytest-cov
   - pytest-xdist
   - pytest-reportlog
   - geopandas-base
+  - geodatasets
+  - pyogrio
   - pip
   - pip:
     - git+https://github.com/shapely/shapely.git@main

doc/source/api.rst

Lines changed: 3 additions & 1 deletion
@@ -55,6 +55,7 @@ Methods
    Dataset.xvec.to_geodataframe
    Dataset.xvec.to_geopandas
    Dataset.xvec.extract_points
+   Dataset.xvec.zonal_stats
 
 
 DataArray.xvec
@@ -89,4 +90,5 @@ Methods
    DataArray.xvec.query
    DataArray.xvec.to_geodataframe
    DataArray.xvec.to_geopandas
-   DataArray.xvec.extract_points
+   DataArray.xvec.extract_points
+   DataArray.xvec.zonal_stats

doc/source/conf.py

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@
     "xarray": ("https://docs.xarray.dev/en/latest/", None),
     "geopandas": ("https://geopandas.org/en/latest", None),
     "pandas": ("https://pandas.pydata.org/docs", None),
+    "rasterio": ("https://rasterio.readthedocs.io/en/latest/", None),
+
 }
 
 # -- Options for HTML output -------------------------------------------------

xvec/accessor.py

Lines changed: 115 additions & 1 deletion
@@ -11,6 +11,7 @@
 from pyproj import CRS, Transformer
 
 from .index import GeometryIndex
+from .zonal import _zonal_stats_iterative, _zonal_stats_rasterize
 
 
 @xr.register_dataarray_accessor("xvec")
@@ -918,6 +919,119 @@ def to_geodataframe(
             )
         return df
 
+    def zonal_stats(
+        self,
+        polygons: Sequence[shapely.Geometry],
+        x_coords: Hashable,
+        y_coords: Hashable,
+        stats: str = "mean",
+        name: Hashable = "geometry",
+        index: bool = None,
+        method: str = "rasterize",
+        all_touched: bool = False,
+        n_jobs: int = -1,
+        **kwargs,
+    ):
+        """Extract the values from a dataset indexed by a set of geometries
+
+        The CRS of the raster and that of polygons need to be equal.
+        Xvec does not verify their equality.
+
+        Parameters
+        ----------
+        polygons : Sequence[shapely.Geometry]
+            An array-like (1-D) of shapely geometries, like a numpy array or
+            :class:`geopandas.GeoSeries`.
+        x_coords : Hashable
+            name of the coordinates containing ``x`` coordinates (i.e. the first value
+            in the coordinate pair encoding the vertex of the polygon)
+        y_coords : Hashable
+            name of the coordinates containing ``y`` coordinates (i.e. the second value
+            in the coordinate pair encoding the vertex of the polygon)
+        stats : string
+            Spatial aggregation statistic method, by default "mean". It supports the
+            following statistics: ['mean', 'median', 'min', 'max', 'sum']
+        name : Hashable, optional
+            Name of the dimension that will hold the ``polygons``, by default "geometry"
+        index : bool, optional
+            If `polygons` is a GeoSeries, ``index=True`` will attach its index as another
+            coordinate to the geometry dimension in the resulting object. If
+            ``index=None``, the index will be stored if the `polygons.index` is a named
+            or non-default index. If ``index=False``, it will never be stored. This is
+            useful as an attribute link between the resulting array and the GeoPandas
+            object from which the polygons are sourced.
+        method : str, optional
+            The method of data extraction. The default is ``"rasterize"``, which uses
+            :func:`rasterio.features.rasterize` and is faster, but can lead to loss
+            of information in case of small polygons. The other option is ``"iterate"``,
+            which iterates over polygons and uses :func:`rasterio.features.geometry_mask`.
+        all_touched : bool, optional
+            If True, all pixels touched by geometries will be considered. If False, only
+            pixels whose center is within the polygon or that are selected by
+            Bresenham’s line algorithm will be considered.
+        n_jobs : int, optional
+            Number of parallel threads to use. It is recommended to set this to the
+            number of physical cores of the CPU. ``-1`` uses all available cores. Applies
+            only if ``method="iterate"``.
+        **kwargs : optional
+            Keyword arguments to be passed to the aggregation function
+            (e.g., ``Dataset.mean(**kwargs)``).
+
+        Returns
+        -------
+        Dataset
+            A subset of the original object with N-1 dimensions indexed by
+            the GeometryIndex.
+
+        """
+        # TODO: allow multiple stats at the same time (concat along a new axis),
+        # TODO: possibly as a list of tuples to include names?
+        # TODO: allow callable in stat (via .reduce())
+        if method == "rasterize":
+            result = _zonal_stats_rasterize(
+                self,
+                polygons=polygons,
+                x_coords=x_coords,
+                y_coords=y_coords,
+                stats=stats,
+                name=name,
+                all_touched=all_touched,
+                **kwargs,
+            )
+        elif method == "iterate":
+            result = _zonal_stats_iterative(
+                self,
+                polygons=polygons,
+                x_coords=x_coords,
+                y_coords=y_coords,
+                stats=stats,
+                name=name,
+                all_touched=all_touched,
+                n_jobs=n_jobs,
+                **kwargs,
+            )
+        else:
+            raise ValueError(
+                f"method '{method}' is not supported. Allowed options are 'rasterize' "
+                "and 'iterate'."
+            )
+
+        # save the index as a data variable
+        if isinstance(polygons, pd.Series):
+            if index is None:
+                if polygons.index.name is not None or not polygons.index.equals(
+                    pd.RangeIndex(0, len(polygons))
+                ):
+                    index = True
+            if index:
+                index_name = polygons.index.name if polygons.index.name else "index"
+                result = result.assign_coords({index_name: (name, polygons.index)})
+
+        # standardize the shape - each method comes with a different one
+        return result.transpose(
+            name, *tuple(d for d in self._obj.dims if d not in [x_coords, y_coords])
+        )
+
     def extract_points(
         self,
         points: Sequence[shapely.Geometry],
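
The closing `transpose` call in `zonal_stats` puts the geometry dimension first and keeps every non-spatial dimension in its original order. The sketch below mirrors only that ordering rule on plain tuples. `output_dims` is a hypothetical helper written for illustration, not part of xvec.

```python
def output_dims(dims, x_coords, y_coords, name="geometry"):
    """Return the dimension order after the two spatial dims are aggregated away.

    Hypothetical helper: it mirrors the generator expression used in the
    ``transpose`` call, applied to a plain tuple of dimension names.
    """
    return (name, *(d for d in dims if d not in (x_coords, y_coords)))


# A (time, y, x) cube loses its two spatial dimensions and gains one
# geometry dimension, so three dimensions become two (N - 1).
print(output_dims(("time", "y", "x"), x_coords="x", y_coords="y"))
```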
@@ -965,7 +1079,7 @@ def extract_points(
             ``index=None``, the index will be stored if the `points.index` is a named
             or non-default index. If ``index=False``, it will never be stored. This is
             useful as an attribute link between the resulting array and the GeoPandas
-            object from which the points are sourced from.
+            object from which the points are sourced.
 
         Returns
         -------
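
The `index` rule described in this docstring, and reused by `zonal_stats`, can be stated as a small predicate. The sketch below is a pure-Python stand-in under stated assumptions: `should_store_index` is a hypothetical name, and it takes the index labels and index name directly instead of a GeoSeries, treating `0..n-1` as the default index.

```python
def should_store_index(index_values, index_name=None, index=None):
    """Decide whether the source index becomes a coordinate (hypothetical helper).

    ``index_values`` and ``index_name`` stand in for ``series.index`` and
    ``series.index.name``; ``index`` is the user-facing keyword.
    """
    if index is None:
        # Default: store only a named index or one that differs from 0..n-1.
        is_default = list(index_values) == list(range(len(index_values)))
        return index_name is not None or not is_default
    # An explicit True or False always wins.
    return bool(index)


print(should_store_index([0, 1, 2]))                   # unnamed default index
print(should_store_index([10, 20, 30]))                # non-default labels
print(should_store_index([0, 1, 2], index_name="id"))  # named index
```

In the code on this page the check also requires the input to be a `pd.Series`; for any other sequence nothing is stored, whatever the value of `index`.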

xvec/tests/test_accessor.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 from geopandas.testing import assert_geodataframe_equal
 from pandas.testing import assert_frame_equal
 
-import xvec  # noqa
+import xvec  # noqa: F401
 from xvec import GeometryIndex
 
 