Memory explosion when disaggregating shapefiles with many regions #15

@irm-codebase

I ran into this while trying to disaggregate Mexico's rooftop PV potential on a decently sized laptop.
Attempting to disaggregate onto the raster results in very high memory consumption, which leads to crashes.

The files themselves are not that large (GeoTiff is ~30 MB, Geoparquet is ~40 MB).
What stands out is the number of regions in the shapefile (2000+).

I suspect this may be caused by rioxarray exploding its dimensionality to the number of shapes during processing.
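
To show why that would be enough to exhaust 16 GB of RAM, here is a rough back-of-the-envelope estimate. It is only a sketch: the function name is hypothetical, the raster dimensions are made-up placeholders (not measured from the MEX GeoTIFF), and it assumes a dense float64 array with one layer per shape, which has not been confirmed against the rioxarray or gregor code.

```python
def dense_mask_stack_bytes(n_shapes: int, ny: int, nx: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense (n_shapes, ny, nx) array of `itemsize`-byte elements."""
    return n_shapes * ny * nx * itemsize


GIB = 1024**3

# Hypothetical raster dimensions, NOT taken from the actual input file.
ny, nx = 5_000, 10_000

# One float64 layer versus one layer per shape (assumed worst case).
single = dense_mask_stack_bytes(1, ny, nx)
stacked = dense_mask_stack_bytes(2_000, ny, nx)

print(f"one layer:   {single / GIB:.2f} GiB")
print(f"2000 layers: {stacked / GIB:.1f} GiB")
```

Under these assumptions a single layer fits comfortably in memory, while a stack with one layer per shape would need hundreds of GiB, far beyond RAM plus swap on the machine below.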

Specs

  • OS: Fedora 42
  • RAM: 16 GB DDR5 SODIMM
  • Swap: 8 GB
  • CPU: i7 13k w/ 20 CPU threads

Reproduction

Input files

The input files can be found here:
https://surfdrive.surf.nl/files/index.php/s/Z4bLHF38tc87T6J

Script

# %%
import math

import geopandas as gpd
import gregor
import pandas as pd
import rioxarray as rxr
from matplotlib import pyplot as plt

case = "MEX"
year = 2023

# %%
shapes_df = gpd.read_parquet(f"downloads/{case}/{case}.parquet")
countries_df = shapes_df[["country_id", "geometry"]].dissolve("country_id").reset_index()
# .copy() so that adding a column later does not write to a slice of countries_df
case_df = countries_df[countries_df["country_id"] == case].copy()
case_df.plot()

# %%
area_potential = rxr.open_rasterio(
    f"downloads/{case}/{case}.tif",
    chunks={"x": 1024, "y": 1024},
).squeeze()
area_potential = area_potential.rio.write_crs("EPSG:4326")

# %%
# Decide on a maximum number of pixels in the final plot
max_pixels = 50_000_000  # tweak this to taste

# Compute needed coarsening factor
nx, ny = area_potential.sizes["x"], area_potential.sizes["y"]
factor = math.ceil(math.sqrt((nx * ny) / max_pixels))
pixel_count = (nx // factor) * (ny // factor)
print(
    f"Downsampling factor: {factor} (output will be ~{pixel_count} pixels)"
)

# Coarsen (block-average) the data
coarse = area_potential.coarsen(x=factor, y=factor, boundary="trim").mean()

# Set up plot
fig, ax = plt.subplots(figsize=(8, 6), layout="constrained")


# Plot full extent of the coarsened raster
case_df.to_crs(area_potential.rio.crs).geometry.boundary.plot(
    ax=ax, color="black", aspect=None, linewidth=0.3, alpha=0.2
)
coarse.plot.imshow(
    ax=ax,
    cmap="Oranges",
    vmax=500,
    add_colorbar=True,
    cbar_kwargs={"location": "bottom", "label": "Area potential for PV"},
    alpha=1
)
ax.set_aspect("equal")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title(f"{case} Rooftop PV Potential used for aggregation\n"
             f"(figure coarsened to ~{pixel_count:.1e} pixels)")

plt.show()

# %%
# WARNING: crash below!
case_df["to_disaggregate"] = 1282  # dummy value
aggregated_pv = gregor.disaggregate.disaggregate_polygon_to_raster(
    case_df, column="to_disaggregate", proxy=area_potential
)

Dependencies

❯ pixi list --explicit
Package                Version     Build               Size       Kind   Source
cartopy                0.24.0      py312hf9745cd_0     1.5 MiB    conda  https://conda.anaconda.org/conda-forge/
click                  8.2.1       pyh707e725_0        85.7 KiB   conda  https://conda.anaconda.org/conda-forge/
clio-tools             2025.03.03  pyhd8ed1ab_0        14.2 KiB   conda  https://conda.anaconda.org/conda-forge/
conda                  25.3.1      py312h7900ff3_0     1.1 MiB    conda  https://conda.anaconda.org/conda-forge/
contextily             1.6.2       pyhd8ed1ab_1        20.3 KiB   conda  https://conda.anaconda.org/conda-forge/
dask                   2025.5.1    pyhe01879c_1        11.1 KiB   conda  https://conda.anaconda.org/conda-forge/
gdal                   3.10.3      py312hf1b357c_11    1.7 MiB    conda  https://conda.anaconda.org/conda-forge/
geopandas              1.0.1       pyhd8ed1ab_3        7.4 KiB    conda  https://conda.anaconda.org/conda-forge/
gregor                 0.0.3.dev0                                 pypi   git+https://github.com/jnnr/gregor.git?rev=4d54d11#4d54d1167ebb78de553c0439374ab936c03923ad
ipdb                   0.13.13     pyhd8ed1ab_1        18.3 KiB   conda  https://conda.anaconda.org/conda-forge/
ipykernel              6.29.5      pyh3099207_0        116.3 KiB  conda  https://conda.anaconda.org/conda-forge/
libgdal-arrow-parquet  3.10.3      h8ae71d8_11         807.9 KiB  conda  https://conda.anaconda.org/conda-forge/
libgdal-core           3.10.3      hcac4edf_11         10.3 MiB   conda  https://conda.anaconda.org/conda-forge/
mypy                   1.15.0      py312h66e93f0_0     17.8 MiB   conda  https://conda.anaconda.org/conda-forge/
pandera-geopandas      0.24.0      hd8ed1ab_2          7.3 KiB    conda  https://conda.anaconda.org/conda-forge/
pandera-pandas         0.24.0      hd8ed1ab_2          7.3 KiB    conda  https://conda.anaconda.org/conda-forge/
powerplantmatching     0.7.1       pyhd8ed1ab_0        661.1 KiB  conda  https://conda.anaconda.org/conda-forge/
pyarrow                19.0.1      py312h7900ff3_0     24.7 KiB   conda  https://conda.anaconda.org/conda-forge/
pycountry              24.6.1      pyhd8ed1ab_0        3 MiB      conda  https://conda.anaconda.org/conda-forge/
pystac-client          0.8.6       pyhd8ed1ab_0        35 KiB     conda  https://conda.anaconda.org/conda-forge/
pytest                 8.3.5       pyhd8ed1ab_0        253.7 KiB  conda  https://conda.anaconda.org/conda-forge/
python                 3.12.9      h9e4cc4f_1_cpython  30.2 MiB   conda  https://conda.anaconda.org/conda-forge/
rasterio               1.4.3       py312h021bea1_1     7.6 MiB    conda  https://conda.anaconda.org/conda-forge/
richdem                2.3.0       py312h546fd74_12    5.1 MiB    conda  https://conda.anaconda.org/conda-forge/
ruff                   0.11.4      py312h286b59f_0     8.6 MiB    conda  https://conda.anaconda.org/conda-forge/
snakefmt               0.11.0      pyhdfd78af_0        31.2 KiB   conda  https://conda.anaconda.org/bioconda/
snakemake-minimal      9.1.9       pyhdfd78af_0        848.4 KiB  conda  https://conda.anaconda.org/bioconda/
