feat(GeoDataFrame support): geopandas support for shapefile exporting#2671
Conversation
…frame()` routines
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #2671 +/- ##
===========================================
+ Coverage 55.5% 72.3% +16.7%
===========================================
Files 644 667 +23
Lines 124135 130120 +5985
===========================================
+ Hits 68947 94127 +25180
+ Misses 55188 35993 -19195
🚀 New features to boost your workflow:
|
|
@wpbonelli it looks like the smoke tests are failing on |
There was a problem hiding this comment.
Great to be able to lean on geopandas for shapefile export. I guess we should look at using xarray for netcdf export similarly
General comment re: deprecations, we usually put an expected removal version in the deprecation warning which I think is typically ~2 minor releases ahead?
| @property | ||
| def geo_dataframe(self): | ||
| """ | ||
| Returns a geopandas GeoDataFrame of the model grid |
There was a problem hiding this comment.
I think the running convention is to add deprecation notes to docstrings as well as the warning in the code itself
| filename : str or PathLike | ||
| path of the shapefile to write | ||
| mg : flopy.discretization.grid.Grid object | ||
| modelgrid : flopy.discretization.grid.Grid object |
There was a problem hiding this comment.
suggest keeping/deprecating mg just making it optional and introducing modelgrid alongside it to avoid breakage. or, since this function will go away soon, is it worth renaming this parameter at all?
There was a problem hiding this comment.
Changed back to mg and added deprecation notes
|
|
||
| def model_attributes_to_shapefile( | ||
| path: Union[str, PathLike], | ||
| filename: Union[str, PathLike], |
There was a problem hiding this comment.
same comment as mg/modelgrid above
There was a problem hiding this comment.
reverted naming convention and added deprecation notes
| Parameters | ||
| ---------- | ||
| shpname : str or PathLike | ||
| filename : str or PathLike |
There was a problem hiding this comment.
reverted naming convention and added deprecation notes
| geoms, | ||
| shpname: Union[str, PathLike] = "recarray.shp", | ||
| mg=None, | ||
| filename: Union[str, PathLike] = "recarray.shp", |
There was a problem hiding this comment.
reverted naming convention and added deprecation notes
|
|
||
| return gdf | ||
|
|
||
| def export(self, f, **kwargs): |
There was a problem hiding this comment.
I guess we would ideally deprecate export() now that the geodataframe approach is the recommended way to export to shapefile, but since export() also handles netcdf we have to wait til there is a similar netcdf export capability via e.g. xarray? working in the same way as the geopandas approach in this PR like to_xarray().to_netcdf()?
There was a problem hiding this comment.
In the long term I think we should deprecate export() for methods like .to_netcdf(), .to_vtk(), .to_raster() to be explicit and follow similar conventions as pandas, geopandas, etc...
| self.repeating = True | ||
| self.empty_keys = {} | ||
|
|
||
| def to_geodataframe(self, gdf=None, kper=0, sparse=False, truncate_attrs=False, **kwargs): |
There was a problem hiding this comment.
What's the use case you have in mind for sparse=False? Since stress period data is sparse by default (at least in 3.x) it seems reasonable to return a sparse geodataframe too, either exclusively or at least by default?
Kind of like what you're doing in shapefile_export_example.py for WEL.
I guess I'm wondering when someone calling to_geodataframe() on stress period data would want one containing the whole grid, rather than just the relevant boundary conditions. In that case couldn't they get the grid geodataframe and .merge() it manually with the SPD frame or pass the grid frame into the gdf parameter here.
Performance wise, it seems expensive to always do the full grid then dropna().
There was a problem hiding this comment.
sparse=False is useful when exporting from higher level objects like gwf.to_geodataframe(), from multiple individual packages into the same geodataframe, and if a user is exporting multiple stress periods with different boundary conditions from the same package. Because the geodataframe is constructed in the modelgrid object, it is created as a single layer representation of the whole grid and then we filter after the fact. For operations like gwf.to_geodataframe() there is a single construction of geometries and the geodataframe is then passed through to each package/datatype export.
Additionally sparse operates a little differently depending on if it's used on a TransientList or a MFList like object. On a TransientList the dataframe and attributes are constructed using a full grid representation and then filtered after all stress-period data is added. This is a much simpler operation than trying to reconcile multiple sparse dataframes from many stress-periods into a single dataframe.
We could cache a copy of the polygons after creation to improve performance; however, we don't currently give an option on the grid to construct a GeoDataFrame with only certain cells. We could add a nodes= parameter or kwarg to the grid .to_geodataframe() method which would only construct geometries for those nodes, however this will most likely be used in limited cases like exporting data from a single stress period or exporting SFR/LAK/UZF packagedata.
|
|
||
| @property | ||
| def geo_dataframe(self): | ||
| def geodataframe(self): |
There was a problem hiding this comment.
why to_geodataframe... elsewhere and not here, other than the fact the old property didn't have to_...?
suggest also deprecating the old one to avoid breakage?
There was a problem hiding this comment.
Two comments here:
-
geodataframewas used to keep the convention consistent with other properties on theGeoSpatialUtilclass (e.g.,shapely,geojson,points, etc...). -
I did not add a deprecation here because this is a low-level object that generally shouldn't be called by end users. This serves as a "catch all" in methods that handle vector geospatial data and converts it to the type that the function was originally written for. I can keep the old method around and add a deprecation warning in the off chance someone is using this method to convert vector data outside of the intended use case.
| optional attribute name, default uses util2d name | ||
| forgive : bool | ||
| optional flag to continue if data shape not compatible with GeoDataFrame | ||
| truncate_attrs : bool |
There was a problem hiding this comment.
I think GeoDataFrame.to_file() already truncates for you when writing to shapefile, is this parameter necessary? It seems like having our own API for this would be useful if we let the user customize the truncation but otherwise redundant?
There was a problem hiding this comment.
The truncate_attrs flag is definitely useful. to_file() does truncate the attributes, but it just cuts off the end and can potentially write a dbf file with multiple attributes that have identical names. truncate_attrs maintains the way we previously constructed attributes in flopy by removing text from the middle of the attribute name and keeping the layer and stress period number.
There was a problem hiding this comment.
got it- didn't realize this. thanks!
| geoms = GeoSpatialCollection(geoms, "Point") | ||
| gdf = gpd.GeoDataFrame.from_features(geoms) | ||
|
|
||
| for col in list(welldata): |
There was a problem hiding this comment.
instead of looping to add columns, could this just be
gdf = gpd.GeoDataFrame(welldata, geometry=geoms)There was a problem hiding this comment.
updated this, need to use:
geoms = GeoSpatialCollection(geoms, "Point").shape
gdf = gpd.GeoDataFrame(welldata, geometry=geoms)for this type of GeoDataFrame construction because it expects an arraylike object
christianlangevin
left a comment
There was a problem hiding this comment.
I took a quick look through this PR. Really nice, @jlarsen-usgs. Excited to see it rolled out.
…full_grid` and `shorten_attr`
…flopy into conflict_resolution
Improve the shared face finding algorithm for HFB (Horizontal Flow Barrier) plotting. Factor out @jlarsen-usgs's index-based approach from #2671 into reusable function(s) and use it instead of floating point comparisons as before. Should be faster. I moved the new face-related functions to a new module faceutil.py instead of plotutil.py as they are used not only for plotting but for export, and may be useful in other cases. I considered putting them in gridutil.py but that is used by the grid classes, while the face functions use the grid classes, so it would have created a circular import. I also changed the name of is_vertical_barrier -> is_vertical and inverted the semantics so it works on faces, not barriers (a face between horizontally adjacent cells is a vertical polygon, and a face between vertically adjacent cells is a horizontal polygon, but we call the former a "horizontal" flow barrier and the latter a "vertical" flow barrier). I figure writing these utils in terms of faces for the general case makes sense, and just interpreting the results for the specific case of HFBs. This PR builds on - #2671's shared face finding approach - #2680 which gives all grid types a consistent get_node() method - #2681 which adds an ihc init parameter/attribute to UnstructuredGrid Some ad hoc cell ID -> node number conversion utilities are removed and get_node() used instead. I am classifying all this a refactor though it introduces a new utility module, as the new utils are in service of the refactor.
This PR supersedes the draft PR #2573 and continues the cleanup of legacy shapefile exporting code and integration of geopandas GeoDataFrame support.