Skip to content

DRAFT: Construct a spatialpandas.spatialindex.HilbertRtree for fast bounding box queries #1215

Closed
philipc2 wants to merge 8 commits into
mainfrom
bounds-rtree
Closed

DRAFT: Construct a spatialpandas.spatialindex.HilbertRtree for fast bounding box queries #1215
philipc2 wants to merge 8 commits into
mainfrom
bounds-rtree

Conversation

@philipc2
Copy link
Copy Markdown
Member

@philipc2 philipc2 commented Apr 18, 2025

Closes #XXX

Overview

  • Constructs a spatialpandas.spatialindex.HilbertRTree for doing spatial queries on the bounds of each face.
  • Chosen over the rtree package since the implementation in SpatialPandas is Numba compatible.

@philipc2 philipc2 self-assigned this Apr 18, 2025
@philipc2
Copy link
Copy Markdown
Member Author

@fluidnumerics-joe

I wonder if this would be of any use for your applications, since this should allow for similar functionality to the SpatialHash.

In addition to point containment tests, this would allow us to determine the intersections between two bounds, which is especially usefull if you want to quickly determine candidate faces between a source and destination grid.

@aaronzedwick

Since this is written with Numba, we should be able to integrate this into your Point in Polygon functionality to significantly improve performance.

@philipc2 philipc2 changed the title DRAFT: Construct spatialpandas.spatialindex.HilbertRtree for bounding box queries DRAFT: Construct a spatialpandas.spatialindex.HilbertRtree for fast bounding box queries Apr 18, 2025
@aaronzedwick
Copy link
Copy Markdown
Member

@fluidnumerics-joe

I wonder if this would be of any use for your applications, since this should allow for similar functionality to the SpatialHash.

In addition to point containment tests, this would allow us to determine the intersections between two bounds, which is especially usefull if you want to quickly determine candidate faces between a source and destination grid.

@aaronzedwick

Since this is written with Numba, we should be able to integrate this into your Point in Polygon functionality to significantly improve performance.

Awesome! I will check out the branch and see if I can integrate it with my function. This looks really helpful tho!

@aaronzedwick
Copy link
Copy Markdown
Member

How quickly can this be built on a larger file? I know that KDTree is slow, so I am interested to know how quick this one is compared to our other methods. Mind if I add a benchmark to the already created tree construction benchmarks, just to compare?

Comment thread uxarray/grid/grid.py
else:
setattr(self, var_name, grid_var.chunk())

def get_rtree(self, p: int = 10, page_size: int = 512, reconstruct: bool = False):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name this get_r_tree just to be consistent with how we name our other tree functions?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions.

@philipc2
Copy link
Copy Markdown
Member Author

How quickly can this be built on a larger file? I know that KDTree is slow, so I am interested to know how quick this one is compared to our other methods. Mind if I add a benchmark to the already created tree construction benchmarks, just to compare?

This is a good idea!

It takes about 400ms to construct it for a 30km MPAS grid (655,362 faces)

The kd-tree takes 170ms

@aaronzedwick
Copy link
Copy Markdown
Member

aaronzedwick commented Apr 19, 2025

How quickly can this be built on a larger file? I know that KDTree is slow, so I am interested to know how quick this one is compared to our other methods. Mind if I add a benchmark to the already created tree construction benchmarks, just to compare?

This is a good idea!

It takes about 400ms to construct it for a 30km MPAS grid (655,362 faces)

The kd-tree takes 170ms

I added some benchmarks:) Interesting results, I noticed the same, that the construction of the KDTree takes less time. However, the query time is faster for the RTree by a little bit compared to the other trees.

@aaronzedwick aaronzedwick added the run-benchmark Run ASV benchmark workflow label Apr 19, 2025
@github-actions
Copy link
Copy Markdown

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change Before [68d14f3] After [4b35596] Ratio Benchmark (Parameter)
- 767M 437M 0.57 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
failed 435M n/a face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
- 828±20ns 752±6ns 0.91 mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
- 663±20ns 555±10ns 0.84 mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
failed 253±0.9ms n/a mpas_ocean.ConstructTreeStructures.time_r_tree('120km')
failed 22.0±0.2ms n/a mpas_ocean.ConstructTreeStructures.time_r_tree('480km')
- 500M 390M 0.78 mpas_ocean.Integrate.peakmem_integrate('480km')
failed 86.6±1μs n/a mpas_ocean.QueryTreeStructures.time_ball_tree
failed 65.6±0.4μs n/a mpas_ocean.QueryTreeStructures.time_kd_tree
failed 72.3±5μs n/a mpas_ocean.QueryTreeStructures.time_r_tree

Benchmarks that have stayed the same:

Change Before [68d14f3] After [4b35596] Ratio Benchmark (Parameter)
435M 435M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
465M 467M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
19.9±0.1ms 19.7±0.1ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
7.35±0.04ms 7.34±0.04ms 1 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
43.6±0.2ms 43.3±0.1ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
3.65±0.1ms 3.69±0.1ms 1.01 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
7.81±0.01s 7.82±0.01s 1 import.Imports.timeraw_import_uxarray
757±10μs 754±3μs 1 mpas_ocean.CheckNorm.time_check_norm('120km')
497±3μs 512±3μs 1.03 mpas_ocean.CheckNorm.time_check_norm('480km')
657±6ms 649±7ms 0.99 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
42.6±1ms 41.3±0.5ms 0.97 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
5.68±0.05μs 5.68±0.1μs 1 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
5.74±0.06μs 5.63±0.06μs 0.98 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
5.08±0.03ms 5.06±0.01ms 1 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
3.81±0.02ms 3.80±0.02ms 1 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
3.49±0.01s 3.57±0.02s 1.02 mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
223±2ms 227±2ms 1.02 mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
314±20ns 290±1ns 0.92 mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
324±20ns 302±5ns 0.93 mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
449±3ms 452±2ms 1.01 mpas_ocean.CrossSections.time_const_lat('120km', 1)
231±1ms 231±2ms 1 mpas_ocean.CrossSections.time_const_lat('120km', 2)
119±2ms 119±1ms 1 mpas_ocean.CrossSections.time_const_lat('120km', 4)
377±3ms 375±3ms 1 mpas_ocean.CrossSections.time_const_lat('480km', 1)
192±1ms 189±0.6ms 0.98 mpas_ocean.CrossSections.time_const_lat('480km', 2)
97.6±1ms 96.6±0.3ms 0.99 mpas_ocean.CrossSections.time_const_lat('480km', 4)
129±4ms 127±0.9ms 0.98 mpas_ocean.DualMesh.time_dual_mesh_construction('120km')
9.71±0.3ms 9.29±0.04ms 0.96 mpas_ocean.DualMesh.time_dual_mesh_construction('480km')
1.63±0.01s 1.60±0.02s 0.98 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
1.24±0.01ms 1.20±0.01ms 0.97 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
128±1ms 125±1ms 0.97 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
4.98±0.1ms 4.92±0.05ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
394M 394M 1 mpas_ocean.Gradient.peakmem_gradient('120km')
376M 376M 1 mpas_ocean.Gradient.peakmem_gradient('480km')
2.70±0ms 2.70±0.01ms 1 mpas_ocean.Gradient.time_gradient('120km')
317±2μs 313±1μs 0.99 mpas_ocean.Gradient.time_gradient('480km')
212±0.9μs 212±3μs 1 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
118±1μs 119±1μs 1.01 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
404M 404M 1 mpas_ocean.Integrate.peakmem_integrate('120km')
146±0.5ms 145±0.7ms 0.99 mpas_ocean.Integrate.time_integrate('120km')
10.3±0.3ms 10.5±0.4ms 1.02 mpas_ocean.Integrate.time_integrate('480km')
348±2ms 340±2ms 0.98 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
345±2ms 342±2ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
352±7ms 341±0.8ms 0.97 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
22.4±0.2ms 22.2±0.1ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
23.0±0.5ms 22.5±0.2ms 0.97 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
22.7±0.3ms 22.0±0.1ms 0.97 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
4.53±0.03ms 4.60±0.05ms 1.01 mpas_ocean.PointInPolygon.time_face_search('120km')
4.64±0.04ms 4.65±0.06ms 1 mpas_ocean.PointInPolygon.time_face_search('480km')
55.6±0.3ms 55.2±0.2ms 0.99 mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
44.7±0.05ms 44.8±0.2ms 1 mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
355±1ms 354±0.7ms 1 mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
261±0.7ms 260±0.8ms 1 mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
24.7±0.09ms 24.5±0.3ms 0.99 mpas_ocean.ZonalAverage.time_zonal_average('120km')
4.74±0.01ms 4.51±0.06ms 0.95 mpas_ocean.ZonalAverage.time_zonal_average('480km')
376M 373M 0.99 quad_hexagon.QuadHexagon.peakmem_open_dataset
372M 372M 1 quad_hexagon.QuadHexagon.peakmem_open_grid
7.40±0.04ms 7.57±0.02ms 1.02 quad_hexagon.QuadHexagon.time_open_dataset
6.50±0.07ms 6.49±0.1ms 1 quad_hexagon.QuadHexagon.time_open_grid

@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label Apr 19, 2025
@philipc2
Copy link
Copy Markdown
Member Author

Closing this as not planned for now

@philipc2 philipc2 closed this Jul 10, 2025
@erogluorhan erogluorhan deleted the bounds-rtree branch September 26, 2025 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants