bootstrap_matrix is too large even for Stage III-sized catalogs #259

@joezuntz

Two of the core methods, the naive stacker and the point estimate histogram, use a bootstrap matrix to compute indices for bootstrap resampling. This matrix has shape (n_gal, n_bootstrap), and a copy is sent to every MPI process.

The memory use for this is infeasible even for Stage III-sized catalogs. DES Y3 is 400M objects, and even for a modest bootstrap size of 20, this is 8 billion integers stored on every process. One improvement would be to use an MPI window to share a single matrix across all the processes on a node, but even then it's still quite large, about 30GB (assuming 32-bit indices; 64-bit indices would double that).
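To make the arithmetic explicit, here is a small sketch of the size estimate. The function name `bootstrap_matrix_bytes` and the index widths are illustrative assumptions; the issue does not state which integer dtype the matrix actually uses.

```python
def bootstrap_matrix_bytes(n_gal, n_bootstrap, bytes_per_index=4):
    """Size in bytes of a dense (n_gal, n_bootstrap) integer index matrix."""
    return n_gal * n_bootstrap * bytes_per_index


n_gal = 400_000_000   # DES Y3 catalog size quoted above
n_bootstrap = 20      # "modest" number of bootstrap realizations

for width in (4, 8):  # assumed widths: 32-bit and 64-bit indices
    size = bootstrap_matrix_bytes(n_gal, n_bootstrap, width)
    print(f"{width * 8}-bit indices: {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

With 32-bit indices this comes to 32 GB (roughly 29.8 GiB), which matches the "about 30GB" figure; that is the cost per process when every process holds its own copy.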

I can't immediately see a way to restructure the bootstrap calculation to be per-chunk.

The best option will depend on the numbers involved, especially the number of bootstraps. If one copy of the matrix fits in memory on each node, then sharing it would probably be easiest.
