
POST /statistics causes excessive RAM usage due to hardcoded cover_scale=10 in get_coverage_array #1398

@rsemlal-murmuration


Problem description

While using TiTiler's POST /statistics endpoint on a 7.6 MB COG file, we noticed that TiTiler's memory consumption peaks at ~2 GB of RAM, which seems excessive to us.

Expected Output

We expect the call to consume, at worst, memory on the order of the uncompressed size of the COG.

Problem Analysis

After profiling with memray and some further investigation, we found that the memory peak occurs during the call to get_coverage_array in rio-tiler (call site).

When called without an explicit value for cover_scale, this function falls back to cover_scale=10, which we suspect is the factor behind the memory peak. Because the factor applies to both dimensions, memory grows with cover_scale squared (100x for the default).

Details

This means that, to estimate pixel coverage along the boundaries of the provided GeoJSON features, the implementation allocates an intermediate uint8 array of shape (h x cover_scale, w x cover_scale) and then immediately converts it to float32, with both arrays alive at the same time during .astype().

Here h x w is the pixel extent of the image cropped to the feature's bounding box at native resolution.
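The allocation pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration, not rio-tiler's actual code: the hard-coded rectangular mask is a hypothetical stand-in for rasterizing a GeoJSON feature at cover_scale times the native resolution, and coverage_from_supersampled is a hypothetical helper name.

```python
import numpy as np


def coverage_from_supersampled(mask_hi: np.ndarray, cover_scale: int) -> np.ndarray:
    """Average each cover_scale x cover_scale block of a 0/1 mask into a coverage fraction."""
    hs, ws = mask_hi.shape
    h, w = hs // cover_scale, ws // cover_scale
    # .astype() allocates a second, 4x larger float32 buffer while mask_hi is still alive.
    as_float = mask_hi.astype("float32")
    # Split rows and columns into blocks, then average within each block.
    return as_float.reshape(h, cover_scale, w, cover_scale).mean(axis=(1, 3))


h, w, s = 4, 6, 10
# Hypothetical feature: covers the left 2.5 native pixels of every row.
mask_hi = np.zeros((h * s, w * s), dtype="uint8")
mask_hi[:, :25] = 1

cov = coverage_from_supersampled(mask_hi, s)
print(mask_hi.nbytes, mask_hi.astype("float32").nbytes)  # 2400 9600
print(cov[0])  # first row ~ [1, 1, 0.5, 0, 0, 0]
```

Even for this tiny 4 x 6 window, the two intermediate buffers hold 12,000 bytes for 24 output pixels, and both sizes scale with h * w * cover_scale**2.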

In our case, this explains the ~2 GB memory peak:

  • COG native resolution: 2346 x 1633 px (the feature covers almost the full extent, so h=1633 and w=2346)
  • uint8 array: 1633 x 10 x 2346 x 10 x 1 byte ~= 383 MB
  • float32 array (alive at the same time): 4 x 383 MB ~= 1.53 GB
  • Total peak: ~1.9 GB, for a raster whose uncompressed size is ~15 MB
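The estimate above can be reproduced with a short calculation. This sketch only encodes the arithmetic from the bullets (one byte per sub-pixel for the uint8 mask, four for the float32 copy, both alive at once); coverage_peak_bytes is a hypothetical helper, and it ignores any other allocations TiTiler makes.

```python
def coverage_peak_bytes(h: int, w: int, cover_scale: int) -> tuple[int, int, int]:
    """Bytes held by the uint8 mask and its float32 copy while both are alive."""
    n_subpixels = (h * cover_scale) * (w * cover_scale)
    uint8_bytes = n_subpixels * 1    # rasterized coverage mask
    float32_bytes = n_subpixels * 4  # .astype("float32") copy
    return uint8_bytes, float32_bytes, uint8_bytes + float32_bytes


for s in (10, 1):
    u8, f32, total = coverage_peak_bytes(h=1633, w=2346, cover_scale=s)
    print(f"cover_scale={s:>2}: uint8 {u8 / 1e6:.1f} MB, float32 {f32 / 1e6:.1f} MB, peak {total / 1e6:.1f} MB")
```

With cover_scale=10 this gives a peak of about 1915.5 MB; with cover_scale=1 it drops to about 19.2 MB, the 100x difference expected from the squared factor.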

Why this is an issue

Obviously, there is a tradeoff here between memory consumption and edge-coverage accuracy.

For the /statistics endpoint, we would argue that the extra edge-coverage accuracy is not worth the memory peak, or at least that the adequate accuracy depends on the use case.
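To put a rough number on the accuracy side of the tradeoff, here is a minimal sketch under simplifying assumptions that do not come from this issue or from rio-tiler: a single edge pixel cut by a straight vertical boundary at a fraction f of its width, and a sub-pixel counted as covered when its centre lies inside the feature (the real rasterization rule may differ). edge_coverage_estimate is a hypothetical helper.

```python
def edge_coverage_estimate(fraction: float, cover_scale: int) -> float:
    """Estimate coverage of one pixel cut by a vertical edge at `fraction` of its width.

    The pixel is split into cover_scale sub-pixel columns; a column counts as
    covered when its centre lies left of the edge (assumed centre-sampling rule).
    """
    covered = sum(1 for j in range(cover_scale) if (j + 0.5) / cover_scale < fraction)
    return covered / cover_scale


true_fraction = 0.37  # hypothetical edge position within the pixel
for s in (1, 2, 5, 10):
    est = edge_coverage_estimate(true_fraction, s)
    # Under centre sampling the error is at most 1 / (2 * s), while memory grows with s**2.
    print(f"cover_scale={s:>2}  estimate={est:.2f}  error={abs(est - true_fraction):.2f}  relative memory={s * s}x")
```

Under these assumptions, cover_scale=1 reports 0 coverage for this edge pixel and cover_scale=10 reports 0.4, so the per-pixel error bound shrinks linearly with cover_scale while memory grows quadratically. Only boundary pixels are affected, which is why the acceptable setting plausibly depends on feature size and use case.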

Suggested change

  • Expose cover_scale as a query parameter on the POST /statistics endpoint.
  • Default to cover_scale=1 to reduce peak memory consumption; callers requiring higher precision can opt in explicitly.

Environment Information

  • Python: 3.11
  • titiler: 2.0.2
  • rio-tiler: 9.0.6
  • rasterio: 1.4.4

PS: Thanks for your work on TiTiler; it is a great tool!

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
