nr-catalog-tools provides a stable, unified Python interface to three publicly available
NR binary black-hole (BBH) waveform catalogs, designed to serve a broad range of
gravitational-wave research:
| Catalog | Code | Naming convention |
|---|---|---|
| SXS | SpEC (Spectral Einstein Code) | SXS:BBH:0001, SXS:BBH:0002, … |
| RIT | LazEv | RIT:BBH:0001-n100-id3, RIT:eBBH:1843-n100-ecc, … |
| MAYA/GT | MayaKranc | GT0001, GT0355, … |
The package is designed to serve three overlapping communities:
LIGO-Virgo-KAGRA data analysis. Provides PyCBC-compatible waveform time series, physical
parameter dicts (get_parameters()), and consistent unit/epoch conventions across all backends,
so that injection studies, template banks, and parameter estimation workflows are not sensitive
to which NR catalog is used as the source.
Waveform modeling. Provides the data-loading, physical scaling, and frame-alignment
infrastructure needed for calibrating and validating EOB, phenomenological, and surrogate
models against NR. This includes Wigner D-matrix rotation, f_lower extraction at any epoch,
and apply_wigner_rotation_to_mode_dict() for rotating surrogate output into the NR frame.
Cross-catalog accuracy studies. Enables rigorous comparison of waveforms from independent
NR codes via noise-weighted mismatch minimization over source-frame ambiguities — rotations
nrcatalogtools/
├── __init__.py # Public exports: MayaCatalog, RITCatalog, SXSCatalog,
│ # WaveformModes, apply_wigner_rotation_to_mode_dict
├── catalog.py # Abstract base class CatalogABC + CatalogBase
├── rit.py # RITCatalog + RITCatalogHelper
├── sxs.py # SXSCatalog
├── maya.py # MayaCatalog
├── waveform/ # WaveformModes sub-package
│ ├── modes.py # WaveformModes class
│ ├── loaders.py # load_from_h5, load_from_targz
│ ├── matching.py # match_sphere_averaged, bms_maximized variant
│ └── units.py # waveform-level unit helpers
├── metadata.py # get_source_parameters_from_metadata()
├── lvc.py # Frame rotation helpers (check_interp_req,
│ # get_nr_to_lal_rotation_angles, get_ref_vals)
└── utils.py # Cache paths, download helpers, unit-conversion factors
CatalogABC (nrcatalogtools/catalog.py)
└── CatalogBase (nrcatalogtools/catalog.py)
├── RITCatalog (nrcatalogtools/rit.py)
├── SXSCatalog (nrcatalogtools/sxs.py)
└── MayaCatalog (nrcatalogtools/maya.py)
sxs.WaveformModes (from the sxs package)
└── WaveformModes (nrcatalogtools/waveform/modes.py)
CatalogBase inherits from CatalogABC and owns a plain dict simulation registry stored directly in self._simulations keyed by simulation name, removing the dependency on sxs.Catalog.
WaveformModes inherits from sxs.WaveformModes, which is itself an ndarray-like object storing complex mode data with shape (n_times, n_modes).
The shared interface that all three catalog classes implement. Key methods:
| Method | Description |
|---|---|
load(download=None, ...) |
Class method. Load catalog metadata (from cache or web). Returns catalog instance. |
get(sim_name, quantity='waveform') |
Load waveform or psi4 data for one simulation. Returns WaveformModes. |
get_metadata(sim_name) |
Return simulation metadata as a dict. |
get_parameters(sim_name, total_mass=1.0) |
Return PyCBC-compatible intrinsic parameter dict (masses, spins, f_lower). |
simulations_list |
List of all simulation names. |
simulations_dataframe |
Pandas DataFrame indexed by simulation name. |
waveform_filepath_from_simname(sim_name) |
Local cache path to the waveform HDF5 file. |
download_waveform_data(sim_name) |
Download waveform data to local cache. |
psi4_filepath_from_simname(sim_name) |
Local cache path to psi4 data. |
download_psi4_data(sim_name) |
Download psi4 data to local cache. |
The get() method in CatalogBase handles download-on-demand logic: it checks whether the local file exists, downloads if needed, then calls WaveformModes.load_from_h5(). SXSCatalog overrides get() entirely because SXS data access goes through the sxs package's own download and caching infrastructure, not through local HDF5 files.
Metadata source: Scraped from https://ccrgpages.rit.edu/~RITCatalog/Metadata/. Text files (.txt) with key = value lines. Cached as ~/.cache/RIT/metadata/metadata.csv.
Waveform files: HDF5 files, e.g. ExtrapStrain_RIT-BBH-0001-n100.h5, hosted at https://ccrgpages.rit.edu/~RITCatalog/Data/. The HDF5 format uses datasets named amp_l{ell}_m{em} and phase_l{ell}_m{em}, each with sub-datasets X (time) and Y (amplitude or phase).
Psi4 files: Tar-gzip archives, e.g. ExtrapPsi4_RIT-BBH-0001-n100-id3.tar.gz. Loaded via WaveformModes.load_from_targz().
Loading pattern:
ritcat = nrcat.RITCatalog.load(verbosity=0) # loads metadata.csv from cache
wfm = ritcat.get("RIT:BBH:0001-n100-id3", quantity="waveform")Key metadata column names (DataFrame columns use hyphens):
| Parameter | Column name |
|---|---|
| Mass ratio |
relaxed-mass-ratio-1-over-2 |
| Spin |
relaxed-chi1x |
| Spin |
relaxed-chi1y |
| Spin |
relaxed-chi1z |
| Spin |
relaxed-chi2x, relaxed-chi2y, relaxed-chi2z
|
| Eccentricity | eccentricity |
| Relaxation time (dimless M) | relaxed-time |
| Initial GW frequency (dimless) | freq-start-22 |
Note: The metadata.py parser normalises hyphenated keys to underscored keys (e.g. relaxed_mass_ratio_1_over_2) when building the PyCBC parameter dict.
Helper class: RITCatalogHelper handles all web-crawling, file naming, caching, and download logic. The naming conventions for RIT files are encoded in utils.rit_catalog_info.
Metadata source: Loaded via sxs.load("simulations", download=None) (sxs package ≥ 2024.0.0), which fetches the new simulations metadata from Zenodo/GitHub.
Waveform files: Managed entirely by the sxs package (Zenodo-backed). Accessed via sxs.load(sim_name, auto_supersede=True) which returns a Simulation object; .strain gives the sxs.WaveformModes.
Loading pattern:
sxscat = nrcat.SXSCatalog.load(download=False, verbosity=0)
wfm = sxscat.get("SXS:BBH:0001") # uses sxs.load() internallyKey metadata column names (underscores):
| Parameter | Column name |
|---|---|
| Mass ratio | reference_mass_ratio |
| Spin |
reference_dimensionless_spin1 (3-element list) |
| Spin |
reference_dimensionless_spin2 (3-element list) |
| Spin |
reference_chi1_mag |
| Spin |
reference_chi2_mag |
| Orbital frequency |
reference_orbital_frequency (3-vector) |
| Reference time (dimless) | reference_time |
| Relaxation time (dimless) | relaxation_time |
| Eccentricity | reference_eccentricity |
Important design note: _add_paths_to_metadata() in SXSCatalog uses lazy stub strings (empty strings) for all path columns (waveform_data_location, metadata_location, etc.) because resolving real paths requires calling sxs.load(sim_name) per simulation, which would trigger ~2000 downloads. Paths are resolved on demand via the sxs package when get() is called.
API difference: SXSCatalog.get() does not call WaveformModes.load_from_h5(); it calls sxs.load(sim_name, auto_supersede=True) to get the simulation object, then accesses .strain to get the sxs.WaveformModes, and wraps it into a nrcatalogtools.WaveformModes.
Metadata source: A pickle file (MAYAmetadata.pkl) downloaded from https://cgpstorage.ph.utexas.edu/. Cached locally at ~/.cache/MAYA/catalog.zip.
Waveform files: Loaded via the mayawaves package (maya_coalescence.Coalescence).
Loading pattern:
mayacat = nrcat.MayaCatalog.load(verbosity=0)
wfm = mayacat.get("GT0001", quantity="waveform")Key metadata column names:
| Parameter | Column name |
|---|---|
| Mass ratio |
q |
| Spin |
a1x, a1y, a1z
|
| Spin |
a2x, a2y, a2z
|
| Eccentricity | eccentricity |
| Orbital frequency |
Momega |
| GT simulation ID | GTID |
The central data object. Inherits from sxs.WaveformModes, which in turn is an ndarray subclass. The internal data array has shape (n_times, n_modes) with complex dtype.
Three construction paths:
-
From HDF5 (RIT/MAYA waveform files):
WaveformModes.load_from_h5(filepath, metadata=metadata_dict)
Reads
amp_l{ell}_m{em}+phase_l{ell}_m{em}datasets, interpolates all modes onto a common uniform time grid, builds complex data array. -
From tar.gz (RIT psi4 files):
WaveformModes.load_from_targz(filepath, metadata=metadata_dict)
Reads ASCII
.asc/.dat/.txtfiles inside the archive, each containing(time, real, imag)columns. -
From sxs.WaveformModes (SXS catalog):
WaveformModes(raw_obj.data, raw_obj.time, sim_metadata=..., **meta)
Wraps the sxs native object directly; used in
SXSCatalog.get().
| Property | Description |
|---|---|
time |
1-D array of dimensionless NR times (in units of |
data |
Complex 2-D array, shape (n_times, n_modes)
|
LM |
List of [ell, m] pairs for each column in data
|
ell_min, ell_max
|
Minimum and maximum |
filepath |
Path to the source data file |
sim_metadata |
The raw metadata dict (catalog-specific keys) |
metadata |
Alias for sim_metadata
|
peak_time_22 |
Dimensionless time of peak ` |
label |
LaTeX string summarizing |
label_nolatex |
Plain-text version of the same label |
Returns the raw NR data for mode (ell, em) as a 2-column real array [:, 0] = real part, [:, 1] = imaginary part, in dimensionless units. Uses sxs.WaveformModes.index(ell, em) to locate the column.
The primary method for retrieving a physically-scaled mode. Returns a PyCBC TimeSeries (complex) with:
-
Amplitude scaled by
$G M_\mathrm{tot} / (c^2 \cdot d)$ where$d$ is in Mpc -
Time resampled to
delta_t(see convention below) -
Epoch set so that
$t = 0$ coincides with the peak of the$(2,2)$ mode
delta_t convention:
-
delta_t > 1/128→ interpreted as dimensionless M units (NR native; e.g.0.5means$0.5,GM/c^3$ ) -
delta_t ≤ 1/128→ interpreted as physical seconds (e.g.1/4096for detector-band data)
The returned TimeSeries.delta_t is always in physical seconds.
Returns the instantaneous GW frequency of the (2,2) mode, in Hz, scaled to a 1 M☉ system. The argument t must be in dimensionless M units (same grid as self.time). To get physical Hz:
f_hz = wfm.f_lower_at_1Msun(t=t_dimless) / total_mass_msunIf t=None, returns the frequency at the start of the waveform.
Returns the sky-averaged strain TimeSeries; .real() gives .imag() gives
Polarization convention: The function returns conjugate(h) to align with LAL conventions, meaning real() = h_+ and imag() = +h_\times (note: LAL defines imag() = -h_\times; the conjugation effectively flips the sign).
Returns a slice of the waveform starting from the relaxation epoch. Reads t_relax from metadata (tries keys 'relaxed-time', 'relaxation_time', 'reference_time').
Returns the GW frequency (Hz) at the relaxation time for the given total mass.
Compute the sky-averaged match between two WaveformModes objects. Optimizes over time shift, coalescence phase, and source-frame rotation
Extended version of the above that additionally optimizes over BMS supertranslations l_max_alpha. Uses spin-weighted Gaunt coefficients from the scri package to compute mode mixing.
WaveformModes.rotated(R) — applies a Wigner D-matrix rotation. The standalone function apply_wigner_rotation_to_mode_dict(mode_dict, R, ell_max) applies the same transformation to a dict of PyCBC TimeSeries (as returned by gwsurrogate or pycbc.waveform.get_td_waveform_modes).
All internally stored NR data is in geometrized, mass-scaled dimensionless units:
| Quantity | Dimensionless unit |
|---|---|
| Time | |
| Amplitude ( |
|
Conversion factors (from utils.py):
# Physical time step for 1 sample at NR time step of 1 M
m_secs = total_mass * lal.MTSUN_SI # utils.time_to_physical(total_mass)
# Strain amplitude scaling factor
amp_scale = lal.G_SI * total_mass * lal.MSUN_SI / (lal.C_SI**2 * distance_mpc * 1e6 * lal.PC_SI)
# = utils.amp_to_physical(total_mass, distance_mpc)Cross-catalog validation (q=1, chi=0, M=60 M☉, d=100 Mpc):
All three catalogs agree to within ~0.75% on the peak amplitude of the
get_source_parameters_from_metadata(metadata, total_mass) converts catalog-specific metadata into a PyCBC-compatible parameter dict. The catalog is identified by the explicit catalog_type metadata key ("RIT", "MAYA", or "SXS"), which is injected dynamically during load:
catalog_type value |
Catalog | Description |
|---|---|---|
"RIT" |
RIT | reads relaxed-mass-ratio-1-over-2, relaxed-chi1x, …, freq-start-22 |
"MAYA" |
MAYA/GT | reads q, a1x/a1y/a1z, a2x/a2y/a2z, Momega |
"SXS" |
SXS | reads reference_mass_ratio, reference_dimensionless_spin1/2, reference_orbital_frequency |
Spin key output: All three paths write spin components as spin1x, spin1y, spin1z, spin2x, spin2y, spin2z — compatible with PyCBC's get_td_waveform_modes().
Initial frequency: Converted to physical Hz via:
f_lower [Hz] = f_dimless / (total_mass [M☉] × lal.MTSUN_SI)
For SXS, the magnitude of reference_orbital_frequency (a 3-vector) is used as
Controlled by the NR_CATALOG_CACHE environment variable (defaults to ~/.cache/):
~/.cache/
├── RIT/
│ ├── metadata/
│ │ ├── metadata.csv # aggregated DataFrame of all RIT sim metadata
│ │ └── RIT:BBH:0001-n100-id3_Metadata.txt # individual files
│ └── data/
│ ├── ExtrapStrain_RIT-BBH-0001-n100.h5
│ └── ExtrapPsi4_RIT-BBH-0001-n100-id3.tar.gz
├── MAYA/
│ ├── metadata/
│ └── data/
│ └── catalog.zip # zipped MAYAmetadata.pkl
└── SXS/
└── (managed entirely by the sxs package; typically ~/.cache/sxs/)
| Package | Role |
|---|---|
sxs (≥ 2024.0.0) |
SXS simulations access, base class (sxs.WaveformModes) |
mayawaves |
MAYA coalescence loading |
pycbc |
TimeSeries, match(), get_td_waveform_modes(), pnutils
|
lal / lalsimulation
|
Physical constants (MTSUN_SI, MSUN_SI, G_SI, C_SI, PC_SI) |
h5py |
HDF5 file reading (RIT waveform files) |
quaternionic |
Quaternion representation of |
spherical |
Wigner D-matrix computation |
scipy |
InterpolatedUnivariateSpline for mode resampling |
scri |
Spin-weighted Gaunt coefficients (optional; needed for BMS optimization) |
gwsurrogate |
Surrogate model evaluation (used in analysis scripts, not the package itself) |
import nrcatalogtools as nrcat
ritcat = nrcat.RITCatalog.load(verbosity=0)
sxscat = nrcat.SXSCatalog.load(download=False, verbosity=0)
mayacat = nrcat.MayaCatalog.load(verbosity=0)df_rit = ritcat.simulations_dataframe
df_sxs = sxscat.simulations_dataframe
df_maya = mayacat.simulations_dataframe
# RIT (hyphenated column names)
rit_q1 = df_rit[(df_rit["relaxed-mass-ratio-1-over-2"].astype(float) - 1.0).abs() < 0.02].index.tolist()
# SXS (underscore column names)
sxs_q1 = df_sxs[(df_sxs["reference_mass_ratio"].astype(float) - 1.0).abs() < 0.02].index.tolist()
# MAYA
maya_q1 = df_maya[(df_maya["q"].astype(float) - 1.0).abs() < 0.02].index.tolist()TOTAL_MASS = 60.0 # M_sun
DISTANCE = 100.0 # Mpc
DELTA_T = 1.0 / 4096 # seconds
# RIT / MAYA
wfm = ritcat.get("RIT:BBH:0001-n100-id3", quantity="waveform")
mode22 = wfm.get_mode(2, 2, total_mass=TOTAL_MASS, distance=DISTANCE, delta_t_seconds=DELTA_T)
# mode22 is a complex PyCBC TimeSeries; epoch set so peak is at t=0
# mode22.real() ≈ h_plus; mode22.imag() ≈ h_cross
# SXS (uses sxs.load internally)
wfm_sxs = sxscat.get("SXS:BBH:0001")
mode22_sxs = wfm_sxs.get_mode(2, 2, total_mass=TOTAL_MASS, distance=DISTANCE, delta_t=DELTA_T)params = ritcat.get_parameters("RIT:BBH:0001-n100-id3", total_mass=60.0)
# Returns: {'mass1': 30.0, 'mass2': 30.0, 'spin1x': 0.0, ..., 'f_lower': 23.4}
# Compatible with pycbc.waveform.get_td_waveform_modes(**params)# Frequency at the start of the NR data
f_start = wfm.f_lower_at_1Msun() / TOTAL_MASS # Hz
# Frequency at the relaxation time
t_relax_dimless = wfm.time[0] + metadata["relaxed-time"] # both in dimensionless M
f_relax = wfm.f_lower_at_1Msun(t=t_relax_dimless) / TOTAL_MASS # HzSXSCatalog._add_paths_to_metadata() sets all path columns to empty strings at catalog-load time, because resolving real on-disk paths requires calling sxs.load(sim_name) for every simulation which would trigger ~2000 network requests. Actual file access happens inside SXSCatalog.get() through the sxs package's own caching.
_filepath is extracted from w_attributes inside __new__ before passing to the parent sxs.WaveformModes constructor. This prevents class-level attribute sharing where loading a second simulation would silently overwrite the first object's _filepath.
sxs.WaveformModes.data may return a memoryview rather than a writable numpy array. All arithmetic operations (especially in-place *=) must wrap the result with np.array(..., dtype=complex) first. The relevant locations are: get_mode() (mode_data, h_mode_complex) and peak_time_22 (mode22_data).
The package requires sxs ≥ 2024.0.0. The older API sxs.load("SXS:BBH:0001/rhOverM") with catalog metadata from data.black-holes.org is no longer functional. The current API is:
sim = sxs.load("SXS:BBH:0001", auto_supersede=True)
strain = sim.strain # sxs.WaveformModesRIT metadata uses hyphens in the scraped registry (relaxed-chi1z). When calling get_source_parameters_from_metadata(), these are accessed directly using hyphens, consistent with the scraped format. MAYA uses short names (a1x, q). SXS uses long underscored names (reference_dimensionless_spin1).
The function signature is check_interp_req(h5_file=None, metadata=None, ref_time=None, ...). Call sites must pass ref_time as a keyword argument:
interp, avail_ref_time = check_interp_req(h5_file, ref_time=t_ref)Passing t_ref as a positional second argument binds it to metadata, causing the interpolation check to never fire.
get_td_waveform() returns conjugate(h) so that .real() gives .imag() gives