[GH-2043] Geopandas.GeoSeries: Implement is_valid_reason, make_valid#2044
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds full implementations for GeoSeries.is_valid_reason and GeoSeries.make_valid and covers them with new tests.
- Implemented
is_valid_reasonto call the underlying Spark functionST_IsValidReason. - Implemented
make_validsupporting thestructuremethod viaST_MakeValid. - Added tests for both methods in two test modules, with version guards.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/sedona/geopandas/geoseries.py | Added implementations for is_valid_reason and make_valid. |
| python/tests/geopandas/test_match_geopandas_series.py | Added tests for is_valid_reason and make_valid with version guards. |
| python/tests/geopandas/test_geoseries.py | Added direct tests of is_valid_reason and make_valid. |
Comments suppressed due to low confidence (2)
python/sedona/geopandas/geoseries.py:707
- [nitpick] The example docblock for
is_valid_reasonshows 5 input geometries but only indexes 0–3 beforeNone. Update the printed snippet to include the fifth index (4) for clarity.
>>> s = GeoSeries(
python/tests/geopandas/test_match_geopandas_series.py:370
- The alias
psis not imported in this test, sops.Serieswill raise a NameError. Addimport pyspark.pandas as psor use the existing alias for pandas-on-Spark.
assert isinstance(sgpd_result, ps.Series)
| def test_is_valid_reason(self): | ||
| pass | ||
| # is_valid_reason was added in geopandas 1.0.0 | ||
| if gpd.__version__ < "1.0.0": |
There was a problem hiding this comment.
Comparing versions as plain strings can fail for multi-digit versions (e.g. "1.10" < "1.2"). Consider using packaging.version.parse or distutils.version.LooseVersion for reliable version checks.
| import shapely | ||
|
|
||
| # 'structure' method requires shapely >= 2.1.0 | ||
| if shapely.__version__ < "2.1.0": |
There was a problem hiding this comment.
Using plain string comparison for shapely.__version__ can be incorrect; switch to a version parser (e.g., packaging.version.parse) to ensure accurate comparisons.
| if shapely.__version__ < "2.1.0": | |
| if parse(shapely.__version__) < parse("2.1.0"): |
| sgpd_result = GeoSeries(geom).make_valid(method="structure") | ||
| gpd_result = gpd.GeoSeries(geom).make_valid(method="structure") | ||
| self.check_sgpd_equals_gpd(sgpd_result, gpd_result) | ||
|
|
||
| for _, geom in self.geoms: | ||
| sgpd_result = GeoSeries(geom).make_valid( |
There was a problem hiding this comment.
Passing a single geometry object directly to GeoSeries treats it as a sequence of coords. Wrap geom in a list (e.g. GeoSeries([geom])) to construct a series with one element.
| sgpd_result = GeoSeries(geom).make_valid(method="structure") | |
| gpd_result = gpd.GeoSeries(geom).make_valid(method="structure") | |
| self.check_sgpd_equals_gpd(sgpd_result, gpd_result) | |
| for _, geom in self.geoms: | |
| sgpd_result = GeoSeries(geom).make_valid( | |
| sgpd_result = GeoSeries([geom]).make_valid(method="structure") | |
| gpd_result = gpd.GeoSeries(geom).make_valid(method="structure") | |
| self.check_sgpd_equals_gpd(sgpd_result, gpd_result) | |
| for _, geom in self.geoms: | |
| sgpd_result = GeoSeries([geom]).make_valid( |
…valid (apache#2044) * Implement is_valid_reason * Implement make_valid * Skip match make_valid tests for shapely < 2.1.0 * Skip match is_valid_reason for gpd version < 1.0.0 * Copy over modified check_sgpd_equals_gpd * Use sedona instead of geopandas in doc strings
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes Geopandas.GeoSeries: Implement is_valid_reason, make_valid #2043What changes were proposed in this PR?
How was this patch tested?
Added tests
Did this PR include necessary documentation updates?