Two of the core methods, the naive stacker and the point estimate histogram, use a bootstrap matrix to compute indices for bootstrap resampling. This matrix has shape (n_gal, n_bootstrap), and a copy is sent to every MPI process.
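For illustration, here is a minimal sketch of what such a matrix might look like and how a naive stack could consume it. It assumes each column holds one bootstrap realization of n_gal indices drawn with replacement. The function names `make_bootstrap_matrix` and `naive_stack_bootstraps`, the `pdfs` array, and the int32 dtype are hypothetical and are not the actual implementation.

```python
import numpy as np

def make_bootstrap_matrix(n_gal, n_bootstrap, seed=0):
    # Hypothetical layout: column b holds n_gal galaxy indices drawn with
    # replacement for bootstrap realization b. Shape is (n_gal, n_bootstrap).
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_gal, size=(n_gal, n_bootstrap), dtype=np.int32)

def naive_stack_bootstraps(pdfs, boot):
    # pdfs: (n_gal, n_z) per-galaxy p(z) evaluated on a shared grid.
    # For each realization, average the resampled galaxies' p(z).
    # Returns an array of shape (n_bootstrap, n_z).
    return np.stack([pdfs[boot[:, b]].mean(axis=0) for b in range(boot.shape[1])])

# Toy example with 1000 galaxies, 50 redshift bins, 20 bootstraps
pdfs = np.random.default_rng(1).random((1000, 50))
boot = make_bootstrap_matrix(1000, 20)
stacks = naive_stack_bootstraps(pdfs, boot)
```

The full matrix holds n_gal × n_bootstrap entries, which is what makes it expensive at catalog scale.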
The memory needed to store this matrix is infeasible even for Stage III-sized catalogs. DES Y3 has 400M objects, so even a modest bootstrap size of 20 means 8B integers stored on every process. One improvement would be to use an MPI window to share a single matrix across all processes, but even then it is still quite large: about 30 GB with 4-byte integers, and twice that with 8-byte integers.
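The arithmetic behind these numbers, as a quick sketch. The integer sizes are assumptions, since the actual dtype of the matrix is not stated here.

```python
def bootstrap_matrix_bytes(n_gal, n_bootstrap, itemsize):
    # Size in bytes of one full (n_gal, n_bootstrap) index matrix
    return n_gal * n_bootstrap * itemsize

n_gal = 400_000_000   # DES Y3 object count
n_bootstrap = 20
for itemsize in (4, 8):  # assumed 32-bit or 64-bit indices
    gb = bootstrap_matrix_bytes(n_gal, n_bootstrap, itemsize) / 1e9
    print(f"{itemsize}-byte ints: {gb:.0f} GB per copy")
# 4-byte ints: 32 GB per copy
# 8-byte ints: 64 GB per copy
```

Every process that holds its own copy multiplies this figure by the number of processes.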
I can't immediately see a way to restructure the bootstrap calculation to be per-chunk.
The best option will depend on the numbers involved, especially the number of bootstraps. If one copy of the matrix fits in a node's memory, then sharing it across the processes on that node would probably be easiest.
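One way to make that decision concrete is a small helper like the sketch below. The function name, the 50% `headroom` fraction (memory reserved for the data chunks themselves), and the node sizes in the usage lines are all hypothetical; adjust them to the actual machine.

```python
def choose_bootstrap_storage(n_gal, n_bootstrap, ranks_per_node,
                             node_mem_bytes, itemsize=4, headroom=0.5):
    # Bytes for one full copy of the (n_gal, n_bootstrap) matrix
    one_copy = n_gal * n_bootstrap * itemsize
    # Hypothetical budget: only part of node memory may go to the matrix
    budget = node_mem_bytes * headroom
    if one_copy * ranks_per_node <= budget:
        return "copy per process"          # current approach is affordable
    if one_copy <= budget:
        return "one shared copy per node"  # e.g. via an MPI shared window
    return "does not fit; needs restructuring"

node_mem = 256 * 10**9  # hypothetical 256 GB node running 64 ranks
print(choose_bootstrap_storage(400_000_000, 20, 64, node_mem))
print(choose_bootstrap_storage(1_000_000, 20, 64, node_mem))
print(choose_bootstrap_storage(400_000_000, 200, 64, node_mem))
```

With these assumed numbers, DES Y3 with 20 bootstraps only fits as a shared copy, a small catalog fits per process, and 200 bootstraps does not fit either way.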