Problem description
While using the endpoint POST /statistics of TiTiler on a 7.6MB COG file, we noticed that the memory consumption of TiTiler peaks at ~2GB of RAM, which seems excessive to us.
Expected Output
We expect the call to consume, at worst, memory on the order of the uncompressed size of the COG.
Problem Analysis
Profiling with memray, plus some additional investigation, shows that the memory consumption peak occurs during the call to the get_coverage_array method in rio-tiler (call site).
When called without an explicit value for cover_scale, this function falls back to 10, which is the suspected multiplicative factor behind the peak. Because the factor applies to both dimensions, it multiplies the pixel count by 100.
Details
This means that, to assess pixel coverage along the boundaries of the provided GeoJSON features, the implementation allocates an intermediate uint8 array of shape (h x cover_scale, w x cover_scale), then immediately converts it to float32. Both arrays are live simultaneously during .astype().
Here h x w is the pixel extent of the image cropped to the feature's bounding box at native resolution.
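The allocation pattern described above can be reproduced with plain NumPy. This is a minimal sketch under stated assumptions, not rio-tiler's actual code: coverage_from_mask is a hypothetical helper, and the all-ones mask stands in for whatever rasterized mask the library builds.

```python
import numpy as np


def coverage_from_mask(mask: np.ndarray, h: int, w: int, cover_scale: int):
    """Hypothetical helper mirroring the pattern described in this issue.

    `mask` is a uint8 array of shape (h * cover_scale, w * cover_scale).
    Returns the per-pixel coverage fraction and the byte sizes of the two
    large intermediates that coexist while .astype() runs.
    """
    blocks = mask.reshape(h, cover_scale, w, cover_scale)  # view, no copy
    as_float = blocks.astype("float32")  # full copy, 4x the uint8 size
    coverage = as_float.sum(-1).sum(1) / cover_scale**2
    return coverage, mask.nbytes, as_float.nbytes


h, w, cover_scale = 16, 23, 10  # small stand-in for 1633 x 2346
mask = np.ones((h * cover_scale, w * cover_scale), dtype="uint8")
coverage, uint8_bytes, float32_bytes = coverage_from_mask(mask, h, w, cover_scale)

print(coverage.shape)              # (16, 23)
print(uint8_bytes, float32_bytes)  # 36800 147200
```

The final result is only h x w, but the two intermediates are each cover_scale squared times larger in element count, and the float32 copy is four times the size of the uint8 mask.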
In our case, this explains the ~2GB memory peak:
- COG native resolution: 2346 x 1633 px (feature covering almost the full extent => h=1633, w=2346)
- uint8 array: 1633 x 10 x 2346 x 10 x 1 byte ~= 383 MB
- float32 array (live at the same time): x 4 ~= 1.53 GB
- Total peak: ~1.9 GB, for a raster whose uncompressed size is ~15 MB
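The arithmetic above can be checked with a few lines of Python. The function name coverage_peak_bytes is ours, introduced only for illustration. It counts just the two intermediate arrays discussed here and ignores any other allocation made during the request.

```python
def coverage_peak_bytes(h: int, w: int, cover_scale: int = 10) -> dict:
    """Estimate the size of the two coverage intermediates, in bytes."""
    cells = (h * cover_scale) * (w * cover_scale)
    uint8_bytes = cells * 1    # 1 byte per cell
    float32_bytes = cells * 4  # 4 bytes per cell
    return {
        "uint8": uint8_bytes,
        "float32": float32_bytes,
        "peak": uint8_bytes + float32_bytes,  # both live during .astype()
    }


est = coverage_peak_bytes(1633, 2346, cover_scale=10)
for name, size in est.items():
    print(f"{name}: {size / 1e6:.0f} MB")
# uint8: 383 MB
# float32: 1532 MB
# peak: 1916 MB
```

With cover_scale=1 the same estimate drops to roughly 19 MB, which is in line with the uncompressed size of the raster.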
Why this is an issue
Obviously, there is a tradeoff here between memory consumption and edge-coverage accuracy.
For the /statistics endpoint, we would argue that the edge-coverage accuracy is not worth the memory peak, or at least that the appropriate accuracy is use-case specific.
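To make the accuracy side of the tradeoff concrete, here is a toy one-dimensional sketch. It is not how rio-tiler or rasterio rasterize geometries: toy_coverage is a hypothetical function, and the rule "a sub-cell is covered when its left edge lies inside the shape" is a simplifying assumption standing in for an all-touched rasterization.

```python
import numpy as np


def toy_coverage(edge: float, n_pixels: int, cover_scale: int) -> np.ndarray:
    """Coverage of a 1-D row of pixels by the half-line x < edge.

    A sub-cell counts as covered when its left edge lies inside the shape,
    a crude stand-in for an "all touched" rasterization rule.
    """
    # Integer sub-cell indices avoid floating-point noise in the comparison.
    left_edges = np.arange(n_pixels * cover_scale)
    covered = (left_edges < edge * cover_scale).astype("uint8")
    # Average the sub-cells belonging to each native pixel.
    return covered.reshape(n_pixels, cover_scale).mean(axis=1)


# The shape ends at x = 2.35, so the true coverage of pixel 2 is 0.35.
print(toy_coverage(2.35, 4, cover_scale=10).tolist())  # [1.0, 1.0, 0.4, 0.0]
print(toy_coverage(2.35, 4, cover_scale=1).tolist())   # [1.0, 1.0, 1.0, 0.0]
```

In this toy setup, only boundary pixels differ between the two settings. Interior pixels get a coverage of 1.0 either way, which is why the extra precision matters mostly for small features or features with long boundaries relative to their area.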
Suggested change
- Expose cover_scale as a query parameter on the POST /statistics endpoint.
- Default to cover_scale=1 to reduce peak memory consumption; callers requiring higher precision can opt in explicitly.
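From a caller's point of view, the proposal could look like the sketch below. Everything here is an assumption for illustration: the cover_scale query parameter does not exist on the endpoint today, the base URL and COG URL are placeholders, and statistics_url is a hypothetical client-side helper.

```python
from urllib.parse import urlencode


def statistics_url(base_url: str, cog_url: str, cover_scale: int = 1) -> str:
    """Build a POST /statistics URL with the proposed cover_scale parameter."""
    if cover_scale < 1:
        raise ValueError("cover_scale must be >= 1")
    # "url" (dataset location) and "cover_scale" are assumed parameter names.
    query = urlencode({"url": cog_url, "cover_scale": cover_scale})
    return f"{base_url}/statistics?{query}"


# Default: low memory footprint, binary edge coverage.
print(statistics_url("http://localhost:8000", "https://example.com/cog.tif"))

# Explicit opt-in to finer edge coverage at a higher memory cost.
print(statistics_url("http://localhost:8000", "https://example.com/cog.tif", cover_scale=10))
```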
Environment Information
- Python: 3.11
- titiler: 2.0.2
- rio-tiler: 9.0.6
- rasterio: 1.4.4
PS: Thanks for your work on TiTiler, it is a great tool!