Add match_wsp_to_tracks for WSP polygon-to-track matching #39
Open
t-downing wants to merge 15 commits into
Adds three functions to utils/storm.py:

- _filled_geom: strips interior rings so containment tests work on donut-shaped probability-band polygons
- _best_atcf_for_polygon: scores atcf_id candidates by track-point containment, with centroid fallback
- match_wsp_to_tracks: matches storms.nhc_wsp_polygon rows to storms.nhc_tracks_geo by issued_time, handling single-storm (direct merge), multi-storm (explode + spatial), and no-track cases
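To illustrate why stripping interior rings matters, here is a minimal sketch using shapely. The function name `filled_geom` and the toy coordinates are assumptions for illustration; the actual `_filled_geom` in this PR may be written differently.

```python
from shapely.geometry import MultiPolygon, Point, Polygon


def filled_geom(geom):
    """Return geom with interior rings (holes) removed.

    Hypothetical stand-in for the PR's _filled_geom; the real body may differ.
    """
    if geom.geom_type == "Polygon":
        return Polygon(geom.exterior)
    if geom.geom_type == "MultiPolygon":
        return MultiPolygon([Polygon(part.exterior) for part in geom.geoms])
    return geom  # other geometry types pass through unchanged


# A donut-shaped band: a 10x10 square with a 4x4 hole in the middle.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(3, 3), (7, 3), (7, 7), (3, 7)]
band = Polygon(outer, [hole])
track_point = Point(5, 5)  # falls inside the hole

in_raw = band.contains(track_point)
in_filled = filled_geom(band).contains(track_point)
print(in_raw, in_filled)
```

The raw band does not contain a point in its hole, while the filled version does, which is the property the containment tests rely on.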
…world intersection
… before world intersection" This reverts commit 19fb07c.
match_wsp_to_tracks now matches a WSP polygon part to an atcf_id only when that storm's track LineString (the polyline through its track points) intersects the hole-filled polygon, at either the WSP's issued_time or +3h (WSPs publish ~3h after the nominal advisory cycle). There is no centroid-distance fallback: unmatched parts return atcf_id=None.

Why: track points are sparse (12–24h apart) while inner probability bands can be narrow ribbons. Point-in-polygon matching was leaking multi-part 60–80% bands to the wrong storm.

Also adds pandera schemas for the downstream WSP tables defined in ds-storms-pipeline: WSP_POLYGON_MATCHED_SCHEMA, WSP_FCASTONLY_POLYGON_SCHEMA, WSP_EXPOSURE_SCHEMA, WSP_FCASTONLY_EXPOSURE_SCHEMA.

Adds a tqdm progress bar to _load_nhc_wsp_archive for the long historical loads.
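The difference between point-in-polygon and line-intersection matching can be seen in a small shapely sketch. The coordinates are made up for illustration and do not come from real track data.

```python
from shapely.geometry import LineString, Point, Polygon

# Sparse track points, standing in for positions 12-24h apart (made-up values).
track_points = [Point(0, 0), Point(10, 0), Point(20, 0)]
track_line = LineString([(p.x, p.y) for p in track_points])

# A narrow, ribbon-like band sitting between two consecutive track points.
ribbon = Polygon([(4, -1), (6, -1), (6, 1), (4, 1)])

# Point-in-polygon: no track point lands inside the ribbon.
points_inside = sum(ribbon.contains(p) for p in track_points)

# Line intersection: the polyline through the same points crosses it.
line_hits = track_line.intersects(ribbon)

print(points_inside, line_hits)
```

With point containment the ribbon gets no votes for its own storm, so any scoring or fallback step can hand it to a neighbour; the LineString test catches it directly.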
…acks

After the strict line-intersection pass, an unmatched polygon part now falls back to checking whether it sits fully inside any already-matched polygon at the same (issued_time, wind_threshold_kt). The smallest qualifying container wins; its atcf_id is inherited.

Parts are processed in ascending percentage order within each (issued_time, wind_threshold_kt) so the big outer bands get assigned via line-intersection first and are available as containers for the small inner bands that follow.

_filled_geom is used for both sides of the containment test so a part sitting in the donut hole of an annular band of the same storm is treated as contained. This is important because NHC WSP bands are nested annuli (90% in the hole of 70% in the hole of 50%...).

New keyword arg `extra_containers`: a caller-supplied GeoDataFrame of already-matched polygons that participate as containment donors only, never re-matched. Enables surgical "fill-NULLs" rebuilds where only the currently-unmatched parts of an existing matched table are re-processed.
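A rough sketch of the containment fallback, assuming plain (atcf_id, geometry) pairs rather than the GeoDataFrames the PR works with. The helpers `square`, `filled`, and `inherit_atcf_id`, and the storm ids, are hypothetical names used only for this example.

```python
from shapely.geometry import Polygon


def square(cx, cy, half):
    """Axis-aligned square centred on (cx, cy); helper for this example only."""
    return Polygon([(cx - half, cy - half), (cx + half, cy - half),
                    (cx + half, cy + half), (cx - half, cy + half)])


def filled(geom):
    """Drop holes from a Polygon (simplified stand-in for _filled_geom)."""
    return Polygon(geom.exterior)


def inherit_atcf_id(part, matched):
    """Return the atcf_id of the smallest matched polygon containing part.

    matched is a list of (atcf_id, geometry) pairs that were already assigned
    by the line-intersection pass. Returns None if nothing contains part.
    """
    containers = [
        (filled(geom).area, atcf_id)
        for atcf_id, geom in matched
        if filled(geom).contains(filled(part))
    ]
    if not containers:
        return None
    return min(containers, key=lambda item: item[0])[1]


# An annular band for one storm, and a much larger band for another storm.
annulus = Polygon(square(0, 0, 10).exterior.coords,
                  [square(0, 0, 5).exterior.coords])
matched = [("EP09", square(0, 0, 30)), ("AL01", annulus)]

inner_part = square(0, 0, 2)    # sits in the annulus's donut hole
stray_part = square(100, 100, 1)  # inside nothing

print(inherit_atcf_id(inner_part, matched))
print(inherit_atcf_id(stray_part, matched))
```

Both candidates contain the inner part once holes are filled, so the smaller one wins. Processing parts in ascending percentage order is what ensures the outer bands are already in `matched` by the time the inner ones are tested.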
…litting

calculate_wind_buffers_gdf finishes by reprojecting from a basin-specific lon_wrap CRS back to EPSG:4326. It previously then clipped the result naively with `.intersection(box(-180, -90, 180, 90))`. Two problems:

1. For storms whose cone crosses the dateline, the reprojected polygon has vertices near both ±180 that shapely interprets as the long-way-around polygon: a wraparound shape covering the entire globe. `.intersection(world)` doesn't fix this; downstream `.intersects()` then false-matches every country on Earth.
2. Some inputs are also self-intersecting at the dateline, so the intersection itself throws `GEOSException: TopologyException: side location conflict`.

Replace with `antimeridian.fix_shape(make_valid(geom))`. make_valid repairs the self-intersection; fix_shape splits a dateline-crossing polygon into a MultiPolygon with parts on each side of ±180.

Adds `antimeridian>=0.4.0` to dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
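The wraparound problem in item 1 can be reproduced with shapely alone. The coordinates below are a toy box, not a real wind buffer.

```python
from shapely.geometry import Point, Polygon, box

# Intended shape: a 20-degree-wide box straddling the dateline (170E to 170W).
# In plain EPSG:4326 coordinates its vertices jump from +170 to -170.
wrapped = Polygon([(170, 0), (-170, 0), (-170, 10), (170, 10)])
world = box(-180, -90, 180, 90)

# Planar geometry reads the ring as spanning almost the whole longitude range.
width_deg = wrapped.bounds[2] - wrapped.bounds[0]

# Clipping to the world box leaves the shape unchanged.
clipped = wrapped.intersection(world)
unchanged = clipped.equals(wrapped)

# A point on the prime meridian, nowhere near the dateline, still matches.
far_away = Point(0, 5)
false_match = clipped.intersects(far_away)

print(width_deg, unchanged, false_match)
```

Because the misread polygon already lies inside the world box, the clip is a no-op, and any later `.intersects()` check against it will match geometries far from the storm.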
antimeridian.fix_polygon only accepts Polygon; on a MultiPolygon (which calculate_wind_buffers_gdf can emit when projecting back from the lon_wrap CRS) it raises AttributeError. Dispatch on geom_type so MultiPolygons go through fix_multi_polygon.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Running make_valid first on a dateline-crossing polygon (which shapely misreads as self-intersecting at ±180) was slicing the polygon along latitude lines to "repair" the apparent self-intersection. Result: extra MultiPolygon parts on lat boundaries instead of just the two parts on each side of ±180.

Reorder: run antimeridian.fix_(multi_)polygon first (it understands the dateline-crossing semantics and splits cleanly at ±180), then make_valid only as a fallback if the result still isn't valid.

Verified: EP102018 34kt buffer now stores as a 2-part MultiPolygon (one west of dateline, one east), down from 4 parts with spurious latitude-line splits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
antimeridian.fix_multi_polygon iterates parts and calls fix_polygon on each. If one sub-polygon has <4 unique vertices after dedup (very tiny or collinear buffers from low-intensity storms), the whole MultiPolygon call raises ValueError, killing the buffer-pipeline run.

Iterate parts ourselves; catch ValueError per part and pass the bad part through unchanged. Degenerate polygons have ~zero area and contribute ~zero to downstream exposure either way.

Also pass fix_winding=True to silence the FixWindingWarning that fires on every clockwise-wound polygon (most of them); the silenced behavior is what we want.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
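The per-part handling described in the last few commits might look roughly like the sketch below. To keep it runnable without the antimeridian package, the fixer is passed in as a callable; `fix_parts` and `toy_fixer` are hypothetical names, and `toy_fixer` only imitates the "raise ValueError on a degenerate part" behaviour rather than doing any dateline splitting.

```python
from shapely.geometry import MultiPolygon, Polygon
from shapely.validation import make_valid


def fix_parts(geom, fix_part):
    """Apply fix_part to every Polygon in geom, tolerating per-part failures.

    fix_part is a stand-in for a dateline fixer such as
    antimeridian.fix_polygon; it may return a Polygon or a MultiPolygon and
    may raise ValueError on degenerate input.
    """
    # Dispatch on geom_type so both Polygon and MultiPolygon inputs work.
    parts = list(geom.geoms) if geom.geom_type == "MultiPolygon" else [geom]
    fixed = []
    for part in parts:
        try:
            result = fix_part(part)
        except ValueError:
            result = part  # pass the degenerate part through unchanged
        if result.geom_type == "MultiPolygon":
            fixed.extend(result.geoms)
        else:
            fixed.append(result)
    out = fixed[0] if len(fixed) == 1 else MultiPolygon(fixed)
    # make_valid runs only as a fallback, after the dateline-aware fix.
    return out if out.is_valid else make_valid(out)


def toy_fixer(poly):
    """Toy fixer: rejects near-zero-area polygons, returns others as is."""
    if poly.area < 1e-6:
        raise ValueError("degenerate polygon")
    return poly


big = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
tiny = Polygon([(10, 10), (10.0001, 10), (10.0001, 10.0001), (10, 10.0001)])
result = fix_parts(MultiPolygon([big, tiny]), toy_fixer)
print(result.geom_type, len(result.geoms))
```

One bad part no longer aborts the whole geometry: the tiny polygon is carried through untouched while the rest is processed normally.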
Summary

- Adds `match_wsp_to_tracks(gdf_wsp, gdf_tracks)` to `utils/storm.py` to assign an `atcf_id` to each row in `storms.nhc_wsp_polygon` by matching against `storms.nhc_tracks_geo`
- … (`_filled_geom`) to handle the donut-shaped probability-band polygons
- … `atcf_id=None`

Test plan

- … `nhc_wsp_polygon` and `nhc_tracks_geo` from the dev DB and call `match_wsp_to_tracks`; verify all rows with tracks get a non-null `atcf_id`
- … `atcf_id`s