POST `/statistics` causes excessive RAM usage due to hardcoded cover_scale=10 in get_coverage_array

#### Problem description

While calling the `POST /statistics` endpoint of TiTiler on a 7.6 MB COG file, we noticed that TiTiler's memory consumption peaks at ~2 GB of RAM, which seems excessive to us.

#### Expected Output

We expect the call to consume **at worst** something on the order of magnitude of the uncompressed size of the COG.

#### Problem Analysis

Profiling with memray, followed by some additional investigation, shows that the memory peak occurs during the call to the `get_coverage_array` method in `rio-tiler` ([call site](https://github.com/developmentseed/titiler/blob/main/src/titiler/core/titiler/core/factory.py#L535)).

When called without an explicit value for `cover_scale`, this method falls back to `10`, which we suspect is the multiplicative factor behind the memory peak.

<details>
<summary>Details</summary>

This means that, to assess pixel coverage along the boundaries of the provided GeoJSON features, the implementation allocates an intermediate `uint8` array of shape `(h x cover_scale, w x cover_scale)` and then immediately converts it to `float32`. Both arrays are live simultaneously during the `.astype()` call.

Here `h x w` is the pixel extent of the image cropped to the feature's bounding box at native resolution.

In our case, this explains the ~2 GB memory peak:

- COG native resolution: 2346 x 1633 px (feature covering almost the full extent => h=1633, w=2346)
- `uint8` array: 1633 x 10 x 2346 x 10 x 1 byte ~= 383 MB
- `float32` array (live at the same time): 383 MB x 4 ~= 1.53 GB
- Total peak: ~1.9 GB, for a raster whose uncompressed size is ~15 MB
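
The arithmetic above can be reproduced with a short sketch. It is not code from `rio-tiler`: the helper name `estimate_coverage_peak_bytes` is hypothetical, and it only models the two intermediate arrays described in this issue (a `uint8` array and its `float32` copy, both alive at once), ignoring any other allocations.

```python
# Bytes per element for the two dtypes named in the analysis.
DTYPE_BYTES = {"uint8": 1, "float32": 4}


def estimate_coverage_peak_bytes(height: int, width: int, cover_scale: int = 10) -> dict:
    """Estimate the size of the two intermediate coverage arrays.

    Assumption (taken from the analysis in this issue, not from the
    library source): a uint8 array of shape
    (height * cover_scale, width * cover_scale) and its float32 copy
    are both alive during the dtype conversion.
    """
    cells = (height * cover_scale) * (width * cover_scale)
    uint8_bytes = cells * DTYPE_BYTES["uint8"]
    float32_bytes = cells * DTYPE_BYTES["float32"]
    return {
        "uint8": uint8_bytes,
        "float32": float32_bytes,
        "peak": uint8_bytes + float32_bytes,
    }


if __name__ == "__main__":
    # h=1633, w=2346 are the dimensions reported above.
    for scale in (10, 1):
        estimate = estimate_coverage_peak_bytes(1633, 2346, cover_scale=scale)
        print(f"cover_scale={scale}: peak ~ {estimate['peak'] / 1e6:.1f} MB")
```

Under these assumptions, `cover_scale=10` gives a peak of about 1.9 GB, while `cover_scale=1` gives about 19 MB, which is on the same order of magnitude as the ~15 MB uncompressed raster.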

</details>


#### Why this is an issue

Obviously, there is a tradeoff here between memory consumption and edge-coverage accuracy.

For the `/statistics` endpoint, we would argue that the edge-coverage accuracy is not worth the memory peak, or at least that the appropriate accuracy is use-case specific.

#### Suggested change

- Expose `cover_scale` as a query parameter on the `POST /statistics` endpoint.
- Default to `cover_scale=1` to reduce peak memory consumption; callers requiring higher precision can opt in explicitly.

#### Environment Information

- Python: 3.11
- titiler: 2.0.2
- rio-tiler: 9.0.6
- rasterio: 1.4.4

PS: Thanks for your work on TiTiler, it is a great tool!
