
Add match_wsp_to_tracks for WSP polygon-to-track matching#39

Open
t-downing wants to merge 15 commits into main from wsp-exp-calc

@t-downing

Summary

  • Adds match_wsp_to_tracks(gdf_wsp, gdf_tracks) to utils/storm.py to assign an atcf_id to each row in storms.nhc_wsp_polygon by matching against storms.nhc_tracks_geo
  • For issued_times with a single active storm: direct merge, no geometry work needed
  • For issued_times with multiple active storms: explodes MultiPolygon components and matches each to the track with the most forecast points inside. Containment is tested against hole-filled geometry (_filled_geom) so that the donut-shaped probability-band polygons are handled correctly
  • Rows with no corresponding tracks are returned with atcf_id=None
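
The three cases in the Summary can be sketched as a small dispatch step. This is a minimal illustration using plain Python containers rather than GeoDataFrames; `plan_matches`, the case labels, and the sample ids are hypothetical and not taken from utils/storm.py.

```python
from collections import defaultdict


def plan_matches(wsp_issued_times, track_rows):
    """Decide, per issued_time, which matching path applies.

    track_rows is an iterable of (issued_time, atcf_id) pairs.
    Returns {issued_time: (case, atcf_ids)} where case is one of
    "no_tracks", "single" or "multi".
    """
    storms = defaultdict(set)
    for issued_time, atcf_id in track_rows:
        storms[issued_time].add(atcf_id)

    plan = {}
    for issued_time in wsp_issued_times:
        ids = sorted(storms.get(issued_time, ()))
        if not ids:
            plan[issued_time] = ("no_tracks", [])  # atcf_id stays None
        elif len(ids) == 1:
            plan[issued_time] = ("single", ids)    # direct merge, no geometry
        else:
            plan[issued_time] = ("multi", ids)     # explode + spatial match
    return plan


# Made-up issued_times and storm ids, for illustration only.
tracks = [("t1", "STORM_A"), ("t2", "STORM_A"), ("t2", "STORM_B")]
plan = plan_matches(["t1", "t2", "t3"], tracks)
```

Only the "multi" case needs any geometry work, which is why the single-storm path can be a plain merge on issued_time.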

Test plan

  • Load nhc_wsp_polygon and nhc_tracks_geo from the dev DB and call match_wsp_to_tracks; verify all rows with tracks get a non-null atcf_id
  • For a known multi-storm issued_time, confirm the exploded polygon components are assigned to the correct atcf_ids
  • Confirm the returned GeoDataFrame has at least as many rows as the input (exploding MultiPolygons can only add rows, never remove them)

Adds three functions to utils/storm.py:
- _filled_geom: strips interior rings so containment tests work on donut-shaped probability-band polygons
- _best_atcf_for_polygon: scores atcf_id candidates by track-point containment, with centroid fallback
- match_wsp_to_tracks: matches storms.nhc_wsp_polygon rows to storms.nhc_tracks_geo by issued_time, handling single-storm (direct merge), multi-storm (explode + spatial), and no-track cases
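
As a rough illustration of why hole-filling matters for containment tests, the sketch below strips interior rings with shapely. The name `filled_geom` and the handling of non-polygon inputs are assumptions; the actual `_filled_geom` in utils/storm.py may differ.

```python
from shapely.geometry import MultiPolygon, Point, Polygon


def filled_geom(geom):
    """Return a copy of geom with all interior rings (holes) removed."""
    if geom.geom_type == "Polygon":
        return Polygon(geom.exterior)
    if geom.geom_type == "MultiPolygon":
        return MultiPolygon([Polygon(part.exterior) for part in geom.geoms])
    return geom  # other geometry types have no holes to fill


# An annular band: a 10x10 square with a 6x6 hole in the middle.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (8, 2), (8, 8), (2, 8)]
donut = Polygon(outer, [hole])
centre = Point(5, 5)

in_donut = donut.contains(centre)                # point falls in the hole
in_filled = filled_geom(donut).contains(centre)  # hole removed
```

A track point in the hole of a probability band is missed by the raw polygon but counted once the hole is filled.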
@t-downing t-downing changed the base branch from usa-wind-buffers to main April 30, 2026 22:48
t-downing and others added 7 commits May 14, 2026 13:41
match_wsp_to_tracks now matches a WSP polygon part to an atcf_id only
when that storm's track LineString (the polyline through its track
points) intersects the hole-filled polygon, at either the WSP's
issued_time or +3h (WSPs publish ~3h after the nominal advisory cycle).
No centroid-distance fallback — unmatched parts return atcf_id=None.

Why: track points are sparse (12–24h apart) while inner probability
bands can be narrow ribbons. Point-in-polygon matching was leaking
multi-part 60–80% bands to the wrong storm.
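
A toy case of the failure mode described above, with made-up coordinates: sparse track points all miss a narrow ribbon, while the LineString through them still crosses it.

```python
from shapely.geometry import LineString, Point, Polygon

# Sparse track points, far apart, standing in for 12-24h forecast steps.
track_points = [(0, 0), (10, 0), (20, 0)]
track_line = LineString(track_points)

# A narrow ribbon that the track crosses between two forecast points.
ribbon = Polygon([(4, -1), (6, -1), (6, 1), (4, 1)])

# Point-in-polygon scoring: no point lands inside the ribbon.
points_inside = sum(bool(ribbon.contains(Point(p))) for p in track_points)

# Line-intersection test: the polyline does pass through it.
line_hits = track_line.intersects(ribbon)
```

With point counting the ribbon scores zero for its own storm, so any other storm with even one point inside would win; the line test does not depend on where the forecast points happen to fall.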

Also adds pandera schemas for the downstream WSP tables defined in
ds-storms-pipeline: WSP_POLYGON_MATCHED_SCHEMA,
WSP_FCASTONLY_POLYGON_SCHEMA, WSP_EXPOSURE_SCHEMA,
WSP_FCASTONLY_EXPOSURE_SCHEMA. Adds a tqdm progress bar to
_load_nhc_wsp_archive for the long historical loads.
…acks

After the strict line-intersection pass, an unmatched polygon part now
falls back to checking whether it sits fully inside any already-matched
polygon at the same (issued_time, wind_threshold_kt). The smallest
qualifying container wins; its atcf_id is inherited.

Parts are processed in ascending percentage order within each
(issued_time, wind_threshold_kt) so the big outer bands get assigned via
line-intersection first and are available as containers for the small
inner bands that follow. _filled_geom is used for both sides of the
containment test so a part sitting in the donut hole of an annular band
of the same storm is treated as contained — important because NHC WSP
bands are nested annuli (90% in the hole of 70% in the hole of 50%...).
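
The containment fallback might look roughly like the sketch below. It assumes plain Polygon inputs and a list of (atcf_id, polygon) pairs already matched at the same (issued_time, wind_threshold_kt); `fill`, `inherit_atcf_id`, and the storm ids are hypothetical names, not the PR's code.

```python
from shapely.geometry import Polygon


def fill(geom):
    # Hole-filling stand-in for the PR's _filled_geom (Polygon inputs only).
    return Polygon(geom.exterior)


def inherit_atcf_id(part, matched):
    """Return the atcf_id of the smallest hole-filled polygon in `matched`
    that fully contains the hole-filled `part`, or None if there is none."""
    filled_part = fill(part)
    best_id, best_area = None, float("inf")
    for atcf_id, container in matched:
        filled_container = fill(container)
        if filled_container.contains(filled_part) and filled_container.area < best_area:
            best_id, best_area = atcf_id, filled_container.area
    return best_id


def ring(cx, cy, half):
    # Corner coordinates of an axis-aligned square centred on (cx, cy).
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]


# Made-up geometry: an annular band for STORM_A and a much larger
# polygon for STORM_B that also covers the same area.
band_a = Polygon(ring(0, 0, 5), [ring(0, 0, 3)])
wide_b = Polygon(ring(0, 0, 20))
matched = [("STORM_B", wide_b), ("STORM_A", band_a)]

inner_part = Polygon(ring(0, 0, 1))     # sits in the hole of band_a
far_part = Polygon(ring(100, 100, 1))   # inside nothing

inner_id = inherit_atcf_id(inner_part, matched)
far_id = inherit_atcf_id(far_part, matched)
```

Choosing the smallest container is what keeps an inner band attached to its own storm when a larger polygon from another storm also happens to cover it.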

New keyword arg `extra_containers`: a caller-supplied GeoDataFrame of
already-matched polygons that participate as containment donors only,
never re-matched. Enables surgical "fill-NULLs" rebuilds where only the
currently-unmatched parts of an existing matched table are re-processed.
…litting

calculate_wind_buffers_gdf finishes by reprojecting from a basin-specific
lon_wrap CRS back to EPSG:4326. It previously clipped the result naively
with `.intersection(box(-180, -90, 180, 90))`. Two problems:

1. For storms whose cone crosses the dateline, the reprojected polygon
   has vertices near both ±180 that shapely interprets as the long-way-
   around polygon — a wraparound shape covering the entire globe.
   `.intersection(world)` doesn't fix this; downstream `.intersects()`
   then false-matches every country on Earth.
2. Some inputs are also self-intersecting at the dateline, so the
   intersection itself throws `GEOSException: TopologyException: side
   location conflict`.

Replace with `antimeridian.fix_shape(make_valid(geom))`. make_valid
repairs the self-intersection; fix_shape splits a dateline-crossing
polygon into a MultiPolygon with parts on each side of ±180.

Adds `antimeridian>=0.4.0` to dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
antimeridian.fix_polygon only accepts Polygon; on a MultiPolygon
(which calculate_wind_buffers_gdf can emit when projecting back from
the lon_wrap CRS) it raises AttributeError. Dispatch on geom_type so
MultiPolygons go through fix_multi_polygon.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Running make_valid first on a dateline-crossing polygon (which shapely
misreads as self-intersecting at ±180) sliced the polygon along latitude
lines to "repair" the apparent self-intersection. The result was extra
MultiPolygon parts on latitude boundaries instead of just the two parts
on each side of ±180.

Reorder: antimeridian.fix_(multi_)polygon first — it understands the
dateline-crossing semantics and splits cleanly at ±180 — then
make_valid only as a fallback if the result still isn't valid.

Verified: EP102018 34kt buffer now stores as a 2-part MultiPolygon
(one west of dateline, one east), down from 4 parts with spurious
latitude-line splits.
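
The ordering described in this commit can be sketched as follows. To keep the example independent of the antimeridian package, the dateline fixer is passed in as a callable; `repair` and `fix_dateline` are hypothetical names, and the identity fixer below only stands in for the real call.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid


def repair(geom, fix_dateline):
    """Apply the dateline fixer first; call make_valid only if the
    fixed geometry is still invalid."""
    fixed = fix_dateline(geom)
    if fixed.is_valid:
        return fixed
    return make_valid(fixed)


def identity(geom):
    # Placeholder fixer for the demo; it does no dateline handling.
    return geom


bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # self-intersecting
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])  # already valid

repaired_bowtie = repair(bowtie, identity)
repaired_square = repair(square, identity)
```

A geometry that is valid after the first step is returned untouched, so make_valid never gets the chance to split it.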

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
antimeridian.fix_multi_polygon iterates parts and calls fix_polygon on
each — if one sub-polygon has <4 unique vertices after dedup (very tiny
or collinear buffers from low-intensity storms), the whole MultiPolygon
call raises ValueError, killing the buffer-pipeline run.

Iterate parts ourselves; catch ValueError per part and pass the bad
part through unchanged. Degenerate polygons have ~zero area and
contribute ~zero to downstream exposure either way.

Also pass fix_winding=True to silence the FixWindingWarning that fires
on every clockwise-wound polygon (most of them) — the silenced behavior
is what we want.
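
A minimal sketch of the per-part iteration, again with the fixer injected as a callable so the example runs without the antimeridian package. `fix_parts`, `fake_fixer`, and the area cutoff are hypothetical; in the PR the fixer role is played by antimeridian.fix_polygon.

```python
from shapely.geometry import MultiPolygon, Polygon


def fix_parts(multi, fix_part):
    """Apply fix_part to each Polygon in multi, keeping a part unchanged
    when fix_part raises ValueError on it."""
    out = []
    for part in multi.geoms:
        try:
            fixed = fix_part(part)
        except ValueError:
            out.append(part)  # degenerate part: pass through as-is
            continue
        # A fixer may split one Polygon into a MultiPolygon; flatten it.
        if fixed.geom_type == "MultiPolygon":
            out.extend(fixed.geoms)
        else:
            out.append(fixed)
    return MultiPolygon(out)


def fake_fixer(polygon):
    # Stand-in that rejects near-zero-area parts, mimicking a ValueError
    # raised on a degenerate polygon.
    if polygon.area < 1e-6:
        raise ValueError("degenerate polygon")
    return polygon


good = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
tiny = Polygon([(50, 50), (50.0001, 50), (50, 50.0001)])
result = fix_parts(MultiPolygon([good, tiny]), fake_fixer)
```

One bad part no longer aborts the whole geometry; it is simply carried through unchanged alongside the fixed parts.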

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>