diff --git a/examples/ci_project-flagship.yaml b/examples/ci_project-flagship.yaml new file mode 100644 index 0000000..e557dc7 --- /dev/null +++ b/examples/ci_project-flagship.yaml @@ -0,0 +1,52 @@ +Project: + + Name: ci_test_flagship + + # Include other configuration files + Includes: + - tests/ci_project_library.yaml + + PathTemplates: {} + + CommonPaths: + root: /pscratch/sd/q/qhang/ + scratch_root: "{root}" + catalogs_dir: "{root}/Flagship" + project: ci_test_flagship + sim_version: v1 + + # Baseline configuraiton, included in others by default + Baseline: + catalog_tag: flagship + pipelines: ['all'] + file_aliases: # Set the training and test files + test: test_file_100k + train: train_file_100k + train_zCOSMOS: train_file_zCOSMOS_100k + wide: wide_file_full + deep: deep_file_full + spec: spec_file_full + + # These define the variant configurations for the various parts of the analysis + Flavors: + - Flavor: + name: train_cosmos + pipelines: ['pz', 'tomography'] + file_aliases: # Set the training and test files + test: test_file_100k + train: train_file_zCOSMOS_100k + - Flavor: + name: gpz_gl + pipelines: ['pz'] # only run the pz pipeline + pipeline_overrides: # Override specifics for particular pipelines + default: + kwargs: + algorithms: ['gpz'] # Only run gpz + inform: + inform_gpz: + gpz_method: GL + + # These are variables that we iterate over when running over entire catalogs + IterationVars: + healpix: + - 1 diff --git a/examples/rail_project_flagship_reducer_example.ipynb b/examples/rail_project_flagship_reducer_example.ipynb new file mode 100644 index 0000000..32448d5 --- /dev/null +++ b/examples/rail_project_flagship_reducer_example.ipynb @@ -0,0 +1,177 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4227b7d4-a861-4839-b4b7-52a37534a283", + "metadata": {}, + "source": [ + "# Demo for reducing the Euclid flagship simulations:\n", + "\n", + "The full simulations are available at https://cosmohub.pic.es/catalogs/353. If you use the catalog for any publication, please use the acknowledgements listed on this website.\n", + "\n", + "Notice that the catalog inherently has a cut for the Euclid h band at ${\\tt euclid\\_nisp\\_h} < 26.6$. The full catalog covers 1 octant of the sky but the example file only contain objects on 1 pixel (out of a total of 256), and subsampled to be 1% of the original size. Notice also there is a small region (150 < RA < 155 deg., 5 < DEC < 10 deg.) where there is no magnitude or flux cut, which has significantly higher number density. This small region *is* part of the example file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2d0ca66-8c50-410d-9a35-f844f6b518aa", + "metadata": {}, + "outputs": [], + "source": [ + "# import relevant packages\n", + "import os\n", + "from rail.projects import library\n", + "from rail.projects import RailProject\n", + "import pandas as pd\n", + "\n", + "check_dir = os.path.basename(os.path.abspath(os.curdir))\n", + "if check_dir == 'examples':\n", + " os.chdir('..')" + ] + }, + { + "cell_type": "markdown", + "id": "193228cc-9394-4807-b86d-34d54e2dab97", + "metadata": {}, + "source": [ + "We load the project configuration for flagship below. This config file is very similar to the RomanRubin example, but you notice a few difference. \n", + "Firstly, I changed the `CommonPaths` to the path to the flagship example catalog, which currently sits on my scratch. The subsequent file structure is similar to the RomanRubin case, so we can use the same library file, `ci_project_libraray.yaml`. I changed the `catalog_tag` under `Baseline` to `flagship` as well. Finally, I changed `IterationVars` to match the iteration pattern, in this case I only have one pixel, `1`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0302bd0-b62e-484d-8bd0-6fcdfa39e494", + "metadata": {}, + "outputs": [], + "source": [ + "# load the project configuration:\n", + "project_flagship = RailProject.load_config(\"examples/ci_project-flagship.yaml\")" + ] + }, + { + "cell_type": "markdown", + "id": "bba045f2-c38b-4f2d-b8dc-ac11ee460b9d", + "metadata": {}, + "source": [ + "We can check if we've found the catalog we specified:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b63ac86-0177-4226-8abf-77bd5b1eac63", + "metadata": {}, + "outputs": [], + "source": [ + "catalog_files_truth = project_flagship.get_catalog_files(\"truth\")\n", + "print(catalog_files_truth)" + ] + }, + { + "cell_type": "markdown", + "id": "8f3768c3-b700-4617-b2e4-9944c806bbef", + "metadata": {}, + "source": [ + "Yes, this is the correct directory. Now we can reduce the raw catalog to the correct format we want.\n", + "To do this, we simply changed the `reducer_class_name` to `flagship`. This is defined in the library file, and calls the `FlagshipReducer`. This reducer does the following things:\n", + "- Convert all fluxes (here, we use columns `[fluxtype]_el_model3_ext` which includes fluxes from both the continuum and the emission lines from the galaxy, and `[fluxtype]` denotes LSST $ugrizy$ and Euclid nisp y,j,h and Euclid vis) into magnitudes, and rename the columns to standard values.\n", + "- Compute the semi-major and minor axes of the galaxy in arcsec and save in the `major` and `minor` columns. Some intermediate quantites such as orientation angles are also saved.\n", + "- Apply a `gold` magnitude cut, also defined in the library, which corresponds to $i<25.5$. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1d64106c-eedb-4607-a4af-b34e4a0d18c3", + "metadata": {}, + "outputs": [], + "source": [ + "project_flagship.reduce_data(\n", + " catalog_template=\"truth\",\n", + " output_catalog_template=\"reduced\",\n", + " reducer_class_name=\"flagship\",\n", + " input_selection=\"\",\n", + " selection=\"gold\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c90dcf8d-8049-4f2a-b9a1-c6d7ac671236", + "metadata": {}, + "source": [ + "You can check the input and output catalogs, to see if the reducer has worked:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8250a69d-dab5-4b8c-8e31-3b2d84783ead", + "metadata": {}, + "outputs": [], + "source": [ + "f_truth=pd.read_parquet(\"/pscratch/sd/q/qhang//Flagship/ci_test_flagship_v1/1/part-0.pq\")\n", + "f_reduce=pd.read_parquet(\"/pscratch/sd/q/qhang//Flagship/ci_test_flagship_v1_gold/1/part-0.pq\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57a5615a-f388-4cd9-a607-abad6c9743dc", + "metadata": {}, + "outputs": [], + "source": [ + "f_truth" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa68bc7b-7f0f-4349-91c4-387730cb3e09", + "metadata": {}, + "outputs": [], + "source": [ + "f_reduce" + ] + }, + { + "cell_type": "markdown", + "id": "181c15af-1ce0-4ac0-99c5-1f2b9011fe2b", + "metadata": {}, + "source": [ + "From here on you can follow examples in `rail_project_example.ipynb` to run the rest of the pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3bacb742-35f5-4589-929f-e69450f290ed", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "rail_dev", + "language": "python", + "name": "rail_dev" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/src/rail/projects/reducer.py b/src/rail/projects/reducer.py index 982174c..fb2cd19 100644 --- a/src/rail/projects/reducer.py +++ b/src/rail/projects/reducer.py @@ -63,6 +63,29 @@ "y_cModelFluxErr", ] +COLUMNS_FLAGSHIP = [ + "galaxy_id", + "ra_mag_gal", # observed galaxy ra/dec with lensing displacement field applied [degrees] + "dec_mag_gal", + "lsst_u_el_model3_ext", # observed flux from the continuum + emission including internal attenuation in LSST bands + "lsst_g_el_model3_ext", + "lsst_r_el_model3_ext", + "lsst_i_el_model3_ext", + "lsst_z_el_model3_ext", + "lsst_y_el_model3_ext", + "euclid_nisp_h_el_model3_ext", # euclid bands (noiseless) + "euclid_nisp_j_el_model3_ext", + "euclid_nisp_y_el_model3_ext", + "euclid_vis_el_model3_ext", + "bulge_r50", # half light radius of the bulge [arcsec] + "disk_r50", # half light radius of the disk for an exponential profile (or Sersic profile with index n=1); disk_r50 = disk_scalelength * 1.678 [arcsec] + "bulge_fraction", # ratio of the flux in the bulge component to the total flux (often written B/T) + "gamma1", # shape contribution from lensing, not large but added for consistency + "gamma2", + "eps1_gal", # intrinsic galaxy ellipticity component + "eps2_gal", +] + PROJECTIONS_COM_CAM = [ { "ref_flux": pc.field("i_cModelFlux"), @@ -122,6 +145,83 @@ }, ] +PROJECTIONS_FLAGSHIP = [ + { + "mag_u_lsst": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_u_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_g_lsst": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_g_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_r_lsst": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_r_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_i_lsst": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_i_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_z_lsst": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_z_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_y_lsst": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_y_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_h_euclid_nisp": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("euclid_nisp_h_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_j_euclid_nisp": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("euclid_nisp_j_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_y_euclid_nisp": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("euclid_nisp_y_el_model3_ext"))), pc.scalar(48.6) + ), + "mag_vis_euclid": pc.subtract( + pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("euclid_vis_el_model3_ext"))), pc.scalar(48.6) + ), + "totalHalfLightRadiusArcsec": pc.add( + pc.multiply( + pc.field("disk_r50"), + pc.subtract(pc.scalar(1), pc.field("bulge_fraction")), + ), + pc.multiply( + pc.field("bulge_r50"), + pc.field("bulge_fraction"), + ), + ), + "_orientationAngle": pc.atan2( + pc.add(pc.field("eps2_gal"), pc.field("gamma2")), + pc.add(pc.field("eps1_gal"), pc.field("gamma1")) + ), + }, + { + "major": pc.divide( + pc.field("totalHalfLightRadiusArcsec"), + pc.sqrt( + pc.sqrt(pc.add(pc.power(pc.add(pc.field("eps1_gal"), pc.field("gamma1")), 2), + pc.power(pc.add(pc.field("eps2_gal"), pc.field("gamma2")), 2))) + ), + ), + "minor": pc.multiply( + pc.field("totalHalfLightRadiusArcsec"), + pc.sqrt( + pc.sqrt(pc.add(pc.power(pc.add(pc.field("eps1_gal"), pc.field("gamma1")), 2), + pc.power(pc.add(pc.field("eps2_gal"), pc.field("gamma2")), 2))) + ), + ), + "orientationAngle": pc.multiply( + pc.scalar(0.5), + pc.subtract( + pc.field("_orientationAngle"), + pc.multiply( + pc.floor( + pc.divide(pc.field("_orientationAngle"), pc.scalar(2 * math.pi)) + ), + pc.scalar(2 * math.pi), + ), + ), + ), + }, +] + class RailReducer(Configurable, DynamicClass): """Base class for reducing data catalogs @@ -250,6 +350,86 @@ def run( pq.write_table(table, output_catalog) +class FlagshipReducer(RailReducer): + """Class to reduce the 'flagship' simulation input files for pz analysis""" + + config_options: dict[str, StageParameter] = dict( + name=StageParameter(str, None, fmt="%s", required=True, msg="Reducer Name"), + cuts=StageParameter(dict, {}, fmt="%s", msg="Selections"), + ) + + def run( + self, + input_catalog: str, + output_catalog: str, + ) -> None: + # Try to do this right + try: + parsed_filter = parse_item(self.config.cuts) + predicate = pq.filters_to_expression(parsed_filter) + except Exception as msg: + # Fallback to old way. FIXME, deprecate this + if self.config.cuts: + if "maglim_i" in self.config.cuts: + predicate = pc.subtract(pc.multiply(pc.scalar(-2.5), pc.log10(pc.field("lsst_i_el_model3_ext"))), pc.scalar(48.6)) < self.config.cuts["maglim_i"][1] + else: + raise ValueError("No valid cut") from msg + else: # pragma: no cover + predicate = None + + dataset = ds.dataset( + input_catalog, + format="parquet", + ) + + scan_node = acero.Declaration( + "scan", + acero.ScanNodeOptions( + dataset, + columns=COLUMNS_FLAGSHIP, + filter=predicate, + ), + ) + + filter_node = acero.Declaration( + "filter", + acero.FilterNodeOptions( + predicate, + ), + ) + + column_projection = {k: pc.field(k) for k in COLUMNS_FLAGSHIP} + projection = column_projection + project_nodes = [] + for _projection in PROJECTIONS_FLAGSHIP: + for k, v in _projection.items(): + projection[k] = v + project_node = acero.Declaration( + "project", + acero.ProjectNodeOptions( + [v for k, v in projection.items()], + names=[k for k, v in projection.items()], + ), + ) + project_nodes.append(project_node) + + seq = [ + scan_node, + filter_node, + *project_nodes, + ] + plan = acero.Declaration.from_sequence(seq) + + # batches = plan.to_reader(use_threads=True) + table = plan.to_table(use_threads=True) + print(f"writing dataset to {output_catalog}") + + output_dir = os.path.dirname(output_catalog) + + os.makedirs(output_dir, exist_ok=True) + pq.write_table(table, output_catalog) + + class ComCamReducer(RailReducer): """Class to reduce the 'com_cam' input files for pz analysis""" diff --git a/tests/ci_algorithms.yaml b/tests/ci_algorithms.yaml index a4272ea..4cac99f 100644 --- a/tests/ci_algorithms.yaml +++ b/tests/ci_algorithms.yaml @@ -115,3 +115,8 @@ Reducers: name: roman_rubin Reduce: RomanRubinReducer Module: rail.projects.reducer + + - Reducer: + name: flagship + Reduce: FlagshipReducer + Module: rail.projects.reducer diff --git a/tests/ci_project_library.yaml b/tests/ci_project_library.yaml index 1898fe3..af832a2 100644 --- a/tests/ci_project_library.yaml +++ b/tests/ci_project_library.yaml @@ -192,6 +192,11 @@ Reducers: Reduce: RomanRubinReducer Module: rail.projects.reducer + - Reducer: + name: flagship + Reduce: FlagshipReducer + Module: rail.projects.reducer + # These describe the various data analysis pipelines Pipelines: