bootstrap_matrix is too large even for Stage III-sized catalogs

Two of the core methods, the naive stacker and the point estimate histogram, use a bootstrap matrix to compute indices for bootstrap resampling. This matrix has shape `(n_gal, n_bootstrap)`, and a copy is sent to every MPI process.

The memory use for this is infeasible even for Stage III-sized catalogs. DES Y3 is 400M objects, and even for a modest bootstrap size of 20 this is 8B integers stored on every process. One improvement would be to use an MPI window to share a single matrix across the processes on a node, but even then it's still quite large, about 30GB if the indices are stored as 32-bit integers.
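To make the numbers concrete, here is a quick back-of-the-envelope calculation. The helper name `bootstrap_matrix_bytes` is just for illustration and is not part of the codebase; the 4-byte and 8-byte index sizes are assumptions about how the matrix might be stored.

```python
def bootstrap_matrix_bytes(n_gal, n_bootstrap, bytes_per_index=4):
    """Memory needed for one (n_gal, n_bootstrap) matrix of integer indices."""
    return n_gal * n_bootstrap * bytes_per_index


n_gal = 400_000_000   # roughly DES Y3, as quoted above
n_bootstrap = 20      # a modest number of bootstrap realizations

# Compare 32-bit and 64-bit index storage (both are assumptions here).
for label, size in (("int32", 4), ("int64", 8)):
    nbytes = bootstrap_matrix_bytes(n_gal, n_bootstrap, size)
    print(f"{label}: {nbytes / 1e9:.0f} GB per copy")
# int32: 32 GB per copy
# int64: 64 GB per copy
```

Indices up to 400M fit in a signed 32-bit integer, so the smaller figure is achievable, but it is still per copy.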

I can't immediately see a way to restructure the bootstrap calculation to be per-chunk.

The best option will depend on the numbers involved, especially the number of bootstraps. If you can fit one copy of the matrix on the node then sharing it would probably be easiest.
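For reference, a minimal sketch of the sharing option using an MPI shared-memory window through `mpi4py`. This is not the existing implementation: the function name `shared_bootstrap_matrix`, the `seed` argument, and the choice of `int32` are all assumptions made for illustration. It allocates one matrix per node, has the first rank on each node fill it, and gives every other rank on that node a NumPy view of the same memory.

```python
import numpy as np


def shared_bootstrap_matrix(n_gal, n_bootstrap, seed=0, dtype=np.int32):
    """Return (matrix, window): one node-shared bootstrap index matrix.

    Hypothetical helper, not part of the codebase.
    """
    # Imported here so the module can still be loaded without MPI.
    from mpi4py import MPI

    # Group together the ranks that can share memory (i.e. the same node).
    node_comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)

    itemsize = np.dtype(dtype).itemsize
    # Only the first rank on each node contributes memory to the window.
    nbytes = n_gal * n_bootstrap * itemsize if node_comm.rank == 0 else 0
    win = MPI.Win.Allocate_shared(nbytes, itemsize, comm=node_comm)

    # Every rank asks for the address of rank 0's segment and wraps it.
    buf, _ = win.Shared_query(0)
    matrix = np.ndarray(buffer=buf, dtype=dtype, shape=(n_gal, n_bootstrap))

    if node_comm.rank == 0:
        # Same seed on every node, so all nodes hold identical indices.
        rng = np.random.default_rng(seed)
        # Fill one column at a time to avoid a second full-size temporary.
        for i in range(n_bootstrap):
            matrix[:, i] = rng.integers(0, n_gal, size=n_gal, dtype=dtype)

    # Make sure the matrix is filled before anyone reads it.
    node_comm.Barrier()
    return matrix, win
```

The window is returned alongside the array because the memory is only valid while the window exists; the caller would need to call `win.Free()` once the matrix is no longer in use. This reduces the cost from one copy per process to one copy per node, which only helps if a single copy fits in node memory.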


bootstrap_matrix is too large even for Stage III-sized catalogs #259
