Merge pull request #700 from ochase10/seedfix

sbird · web-flow · commit c29f379b3b58 · 2025-11-07T14:20:37.000-08:00
This PR updates the code that generates log-normal mock catalogs.

These mocks are built by first drawing random numbers to create a density field that follows a given power spectrum. The randomness is needed because many different fields are consistent with the same power spectrum. Once you have a density field, each grid cell is Poisson sampled to determine how many sources, if any, it contains. This step is, of course, also random.

Making a mock catalog therefore involves two separate stages of randomness. First, the initial conditions determine the exact density field; second, the density field is randomly sampled to generate sources.
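The two stages can be illustrated with a toy NumPy sketch. This is not the nbodykit implementation: the field here is plain white noise rather than a field shaped by a power spectrum, and the function name and its `field_seed` / `sampling_seed` parameters are hypothetical names chosen for illustration.

```python
import numpy as np

def toy_lognormal_counts(field_seed, sampling_seed, nmesh=16, nbar=5.0):
    """Toy two-stage mock: a random density field, then Poisson sampling."""
    # Stage 1: the "initial conditions". A white-noise Gaussian field stands
    # in for a field drawn from a real power spectrum (assumption).
    field_rng = np.random.default_rng(field_seed)
    gauss = field_rng.normal(size=(nmesh, nmesh, nmesh))
    # Log-normal transform; subtracting var/2 keeps the mean density near 1.
    density = np.exp(gauss - gauss.var() / 2.0)

    # Stage 2: Poisson sample the expected number of sources in each cell,
    # using a random stream that is independent of the field's stream.
    sampling_rng = np.random.default_rng(sampling_seed)
    counts = sampling_rng.poisson(nbar * density)
    return density, counts

# Same initial conditions, two different samplings of the sources.
density_a, counts_a = toy_lognormal_counts(field_seed=42, sampling_seed=1)
density_b, counts_b = toy_lognormal_counts(field_seed=42, sampling_seed=2)
```

With the same `field_seed`, the density fields are identical while the source counts differ, which is the separation this PR is after.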

The code currently uses the same seed for both of these random processes (making the density field and sampling it). Now suppose I make a mock catalog and realize it is too small. If I use a new seed for the density field (with the same power spectrum), the two mocks will be incompatible at the field level, because the locations of the density peaks will be uncorrelated between them. Their individual power spectra would match, but if you combined the two mocks and computed a power spectrum you would not get the right answer.

However, in the current implementation, the only way to reuse the same initial conditions (the same density field) for a new mock is to also reuse the same Poisson sampling. In other words, each random density field allows exactly one random sample of sources. So if I ever want to increase the size of my mock, I have to regenerate all the sources I already have. This not only wastes compute, it also limits the size of the mocks I can make to the number density that fits in RAM. If I want anything denser, I simply cannot make it, because a given set of initial conditions always produces the same source catalog.
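As a rough illustration of why independent sampling seeds would help, the toy sketch below builds a denser mock in batches from one fixed density field. It relies on the fact that the sum of independent Poisson draws is itself Poisson with the summed mean. The white-noise field and all names here are assumptions for illustration, not nbodykit code.

```python
import numpy as np

nmesh = 16
nbar_per_batch = 2.0  # expected sources per cell in each batch (toy value)

# Fixed initial conditions: one density field shared by every batch.
field_rng = np.random.default_rng(42)
gauss = field_rng.normal(size=(nmesh, nmesh, nmesh))
density = np.exp(gauss - gauss.var() / 2.0)

# Each batch uses its own sampling seed, so only one batch has to be held
# in memory at a time; here the batches are simply accumulated per cell.
total_counts = np.zeros(density.shape, dtype=np.int64)
for sampling_seed in (1, 2, 3):
    sampling_rng = np.random.default_rng(sampling_seed)
    total_counts += sampling_rng.poisson(nbar_per_batch * density)
```

The accumulated counts trace the same density peaks as a single mock with three times the number density would, because every batch samples the same field.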

My update separates the seeds for these two random processes, so that each set of initial conditions can produce many possible source realizations. If only the standard `seed` is provided, or no seed is provided at all, the behavior is identical to before, which should preserve the functionality of all legacy code (I think). There is simply a new argument, `cosmo_seed`, which lets me fix the initial conditions (the random seed for the density field) while leaving the sampling seed free to vary. Or, at least, that is the intention.
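The fallback rules described above can be sketched without MPI as follows. `resolve_seeds` is a hypothetical helper written for illustration; in the actual change, the random seed is drawn with `numpy.random.randint` on rank 0 and broadcast to the other ranks.

```python
import random

def resolve_seeds(seed=None, cosmo_seed=None):
    """Return (seed, cosmo_seed) following the fallback rules in this PR."""
    if seed is None:
        # No sampling seed given: draw one at random (single-process
        # stand-in for the rank-0 draw plus broadcast).
        seed = random.randrange(0, 4294967295)
    if cosmo_seed is None:
        # Legacy behavior: the density field reuses the global seed.
        cosmo_seed = seed
    return seed, cosmo_seed

legacy = resolve_seeds(seed=7)                    # both stages share seed 7
separated = resolve_seeds(seed=7, cosmo_seed=3)   # field fixed, sampling free
```

Under these rules, a call such as `LogNormalCatalog(..., seed=7, cosmo_seed=3)` would fix the density field with `cosmo_seed` while `seed` controls the rest of the source generation.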
diff --git a/nbodykit/source/catalog/lognormal.py b/nbodykit/source/catalog/lognormal.py
@@ -29,6 +29,9 @@ class LogNormalCatalog(CatalogSource):
     seed : int, optional
         the global random seed; if set to ``None``, the seed will be set
         randomly
+    cosmo_seed : int, optional
+        the random seed used to construct the density field; if set to
+        ``None``, the value of ``seed`` is used
     cosmo : :class:`nbodykit.cosmology.core.Cosmology`, optional
         this must be supplied if ``Plin`` does not carry ``cosmo`` attribute
     redshift : float, optional
@@ -43,13 +46,13 @@ class LogNormalCatalog(CatalogSource):
     `Agrawal et al. 2017 <https://arxiv.org/abs/1706.09195>`_
     """
     def __repr__(self):
-        return "LogNormalCatalog(seed=%(seed)d, bias=%(bias)g)" %self.attrs
+        return "LogNormalCatalog(seed=%(seed)d, cosmo_seed=%(cosmo_seed)d, bias=%(bias)g)" %self.attrs
 
     logger = logging.getLogger("LogNormalCatalog")
 
     @CurrentMPIComm.enable
     def __init__(self, Plin, nbar, BoxSize, Nmesh, bias=2., seed=None,
-                    cosmo=None, redshift=None,
+                    cosmo_seed=None, cosmo=None, redshift=None,
                     unitary_amplitude=False, inverted_phase=False, comm=None):
 
         self.comm = comm
@@ -83,7 +86,14 @@ def __init__(self, Plin, nbar, BoxSize, Nmesh, bias=2., seed=None,
         if seed is None:
             if self.comm.rank == 0:
                 seed = numpy.random.randint(0, 4294967295)
+                if cosmo_seed is None:
+                    cosmo_seed = seed
+            cosmo_seed = self.comm.bcast(cosmo_seed)
             seed = self.comm.bcast(seed)
+        elif cosmo_seed is None:
+            cosmo_seed = seed
+
+        self.attrs['cosmo_seed'] = cosmo_seed
         self.attrs['seed'] = seed
 
         # make the actual source
@@ -141,7 +151,7 @@ def _makesource(self, BoxSize, Nmesh):
             self.logger.info("Growth Rate is %g" % f)
 
         # compute the linear overdensity and displacement fields
-        delta, disp = mockmaker.gaussian_real_fields(pm, self.Plin, self.attrs['seed'],
+        delta, disp = mockmaker.gaussian_real_fields(pm, self.Plin, self.attrs['cosmo_seed'],
                     unitary_amplitude=self.attrs['unitary_amplitude'],
                     inverted_phase=self.attrs['inverted_phase'],
                     compute_displacement=True,