
deeppatel710/rubin-variability

rubin-variability

Photometric variability feature extraction pipeline for Vera Rubin / LSST research, using ZTF data (via ALeRCE) as a public proxy.

Science motivation

The Vera Rubin Observatory's LSST will observe ~37 billion objects over 10 years, generating ~10 million alerts per night. ZTF is the closest available public proxy: comparable g and r photometric bands, a similar cadence (roughly a 3-night revisit), and sky coverage that overlaps Rubin's in the equatorial region (ZTF observes from the northern hemisphere, Rubin from the southern). Pipelines built on ZTF should transfer to Rubin DP1 data with changes confined to the ingest layer.

Headline feature: inter-band lag

Most variability pipelines treat the g and r bands independently. This pipeline introduces inter-band lag: the time delay (in days) between g- and r-band variability, taken from the peak of their cross-correlation function and defined as positive when r lags behind g.

Physical interpretation:

  • g leads r by 1–10 days → AGN accretion disk reverberation. The disk is hotter (bluer) in the centre; a driving fluctuation propagates outward to cooler (redder) radii with a light-travel-time delay proportional to disk size.
  • lag ≈ 0 → stellar flares, where g and r brighten simultaneously.
  • r leads g (negative lag) → dust reverberation echoes, where the dust re-radiates at longer wavelengths with a delay.
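One common way to estimate such a lag is an interpolated cross-correlation (ICCF). The NumPy sketch below is a minimal illustration, not the pipeline's own code: for each trial lag it linearly interpolates the r light curve onto the shifted g epochs and records the Pearson correlation. The helper name iccf_lag and its max_lag, step, and min_overlap parameters are hypothetical; the sketch interpolates in one direction only (full ICCF implementations usually average both directions), and treating the peak correlation as ccf_peak is an assumption. Positive lags mean r lags behind g, matching the bullets above.

```python
import numpy as np


def iccf_lag(t_g, f_g, t_r, f_r, max_lag=20.0, step=0.25, min_overlap=5):
    """One-sided interpolated cross-correlation between g and r light curves.

    Returns (lag, peak_r); a positive lag means r lags behind g.
    Hypothetical helper for illustration, not the pipeline's implementation.
    """
    t_g, f_g = np.asarray(t_g, float), np.asarray(f_g, float)
    t_r, f_r = np.asarray(t_r, float), np.asarray(f_r, float)
    order = np.argsort(t_r)  # np.interp needs increasing sample times
    t_r, f_r = t_r[order], f_r[order]

    lags = np.arange(-max_lag, max_lag + step / 2, step)
    ccf = np.full(lags.shape, np.nan)
    for k, lag in enumerate(lags):
        # If r(t) = g(t - lag), then r evaluated at t_g + lag matches g at t_g.
        t_shift = t_g + lag
        inside = (t_shift >= t_r[0]) & (t_shift <= t_r[-1])  # no extrapolation
        if inside.sum() < min_overlap:
            continue
        g_sel = f_g[inside]
        r_sel = np.interp(t_shift[inside], t_r, f_r)
        if g_sel.std() == 0 or r_sel.std() == 0:
            continue  # correlation undefined for constant segments
        ccf[k] = np.corrcoef(g_sel, r_sel)[0, 1]

    if np.all(np.isnan(ccf)):
        return np.nan, np.nan
    best = np.nanargmax(ccf)
    return float(lags[best]), float(ccf[best])
```

For a synthetic pair where r is the g signal delayed by 4 days, the returned lag should land on or next to the 4.0-day grid point with a peak correlation close to 1. The lag resolution is limited by step, so real use would typically add peak centroiding and an uncertainty estimate.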

At survey scale, this feature has been largely unexplored. Combining inter_band_lag with color_slope (bluer-when-brighter is AGN-like) is intended to separate the physical variability mechanisms listed above.
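The color_slope side of that combination can be sketched in the same spirit. This hypothetical helper pairs each g epoch with the nearest r epoch (the 0.5-day matching window is an assumption, not a documented pipeline setting), then computes the Pearson correlation between g-r color and a brightness proxy, the negated mean magnitude, so that negative values mean bluer-when-brighter.

```python
import numpy as np


def color_slope(t_g, m_g, t_r, m_r, max_dt=0.5):
    """Pearson r between g-r color and brightness on matched g/r epochs.

    Each g epoch is paired with the nearest r epoch within max_dt days.
    Negative values mean the source is bluer when brighter.
    Hypothetical helper; the matching window is an assumption.
    """
    t_g, m_g = np.asarray(t_g, float), np.asarray(m_g, float)
    t_r, m_r = np.asarray(t_r, float), np.asarray(m_r, float)
    if len(t_r) < 2:
        return np.nan
    order = np.argsort(t_r)
    t_r, m_r = t_r[order], m_r[order]

    # Index of the nearest r epoch for every g epoch.
    right = np.clip(np.searchsorted(t_r, t_g), 1, len(t_r) - 1)
    left = right - 1
    nearest = np.where(np.abs(t_g - t_r[left]) <= np.abs(t_r[right] - t_g), left, right)
    matched = np.abs(t_r[nearest] - t_g) <= max_dt
    if matched.sum() < 3:
        return np.nan

    g, r = m_g[matched], m_r[nearest[matched]]
    color = g - r                 # smaller = bluer
    brightness = -0.5 * (g + r)   # larger = brighter (magnitudes are inverted)
    if color.std() == 0 or brightness.std() == 0:
        return np.nan
    return float(np.corrcoef(color, brightness)[0, 1])
```

Using the negated magnitude keeps the sign convention of the feature table: a source whose g band brightens more than its r band yields a negative value.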

Features extracted per object

Feature           Physical meaning
amplitude_g/r     Total flux swing: large for AGN/SNe, moderate for RRL, small for noise
stetson_j         Correlated cross-band variability: high for real astrophysical events
stetson_k_g/r     Shape of the variability distribution: Gaussian vs. burst-like
von_neumann_g/r   Temporal autocorrelation: low = smooth/correlated (trends, well-sampled periodic signals), high (≈2) = uncorrelated noise
ls_period_g/r     Best Lomb-Scargle period: useful for RRL, Cepheids, LPVs
ls_fap_g/r        False alarm probability of the Lomb-Scargle peak: quality flag for periodicity
sf_slope_g/r      Structure function slope: AGN ≈ 0.3, RRL steep; a random walk gives 0.5 for the rms structure function (1.0 if the mean squared difference is used)
color_slope       Pearson r of color vs. brightness: negative = bluer-when-brighter (AGN)
inter_band_lag    g-r cross-correlation lag in days: accretion disk size proxy
ccf_peak          CCF peak quality (0–1): use as a confidence weight on the lag
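As a rough guide to what several of these columns measure, the sketch below implements textbook forms of the von Neumann ratio, Stetson K, and the rms structure-function slope with NumPy. It is not taken from features.py: the function names, the log-spaced binning, and the absence of any photometric-noise correction in the structure function are assumptions.

```python
import numpy as np


def von_neumann_ratio(m):
    """Mean squared successive difference over the sample variance.

    Expects time-ordered magnitudes; ~2 for uncorrelated noise, near 0 for smooth curves.
    """
    m = np.asarray(m, float)
    return float(np.mean(np.diff(m) ** 2) / np.var(m, ddof=1))


def stetson_k(m, err):
    """Stetson K: ~0.798 (sqrt(2/pi)) for Gaussian scatter, lower for burst-like curves."""
    m, err = np.asarray(m, float), np.asarray(err, float)
    n = len(m)
    w = 1.0 / err ** 2
    mean = np.sum(w * m) / np.sum(w)  # inverse-variance weighted mean
    delta = np.sqrt(n / (n - 1)) * (m - mean) / err
    return float(np.mean(np.abs(delta)) / np.sqrt(np.mean(delta ** 2)))


def sf_slope(t, m, n_bins=12, tau_max=None, min_pairs=10):
    """Log-log slope of the rms structure function SF(tau) = sqrt(<(m_j - m_i)^2>).

    No photometric-noise correction is applied (a simplification).
    """
    t, m = np.asarray(t, float), np.asarray(m, float)
    i, j = np.triu_indices(len(t), k=1)  # all distinct pairs of epochs
    tau = np.abs(t[j] - t[i])
    dm2 = (m[j] - m[i]) ** 2
    tau_min = tau[tau > 0].min()
    tau_max = tau.max() if tau_max is None else tau_max
    keep = (tau >= tau_min) & (tau <= tau_max)
    tau, dm2 = tau[keep], dm2[keep]

    edges = np.logspace(np.log10(tau_min), np.log10(tau_max), n_bins + 1)
    # digitize puts tau == tau_max past the last edge; clip it into the last bin
    idx = np.clip(np.digitize(tau, edges) - 1, 0, n_bins - 1)
    log_tau, log_sf = [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.sum() >= min_pairs:
            log_tau.append(np.mean(np.log10(tau[sel])))
            log_sf.append(0.5 * np.log10(np.mean(dm2[sel])))
    if len(log_tau) < 2:
        return np.nan
    slope, _intercept = np.polyfit(log_tau, log_sf, 1)
    return float(slope)
```

Because a random walk's squared increments grow linearly with the time separation, sf_slope on a random walk should return a value near 0.5, which is why the rms and mean-squared conventions differ by a factor of two.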

Pipeline architecture

ALeRCE API (ZTF)
     │
  ingest.py          ← fetch light curves, ~400 objects × 4 classes
     │
 features.py         ← extract 30 features per object including inter_band_lag
     │
  cluster.py         ← UMAP (2D) + HDBSCAN, generate 3 diagnostic plots
     │
 pipeline.py         ← orchestrate, save features.parquet + top_candidates.csv

Outputs

File                           Description
outputs/features.parquet       Full feature catalog (one row per object)
outputs/top_candidates.csv     Top 5 objects per cluster ranked by |inter_band_lag|, among objects with ccf_peak > 0.5
outputs/cluster_plot.png       UMAP embedding coloured by cluster, marker shape by ALeRCE class
outputs/lag_histogram.png      Inter-band lag distribution per cluster
outputs/color_lag_scatter.png  color_slope vs. inter_band_lag (the key diagnostic plot)
outputs/summary.txt            Plain-text run summary
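The top_candidates.csv selection can be sketched with pandas. The helper name and the `cluster` column name are assumptions, and the sketch reads the rule as: keep rows with ccf_peak > 0.5, then take the five largest |inter_band_lag| values in each cluster. HDBSCAN labels noise points -1; the README does not say whether the pipeline drops them, so this sketch keeps them as their own group.

```python
import pandas as pd


def select_top_candidates(features, n_per_cluster=5, min_ccf=0.5, cluster_col="cluster"):
    """Top-n objects per cluster by |inter_band_lag| among rows with ccf_peak > min_ccf.

    Hypothetical helper: the `cluster` column name is an assumption.
    """
    good = features[features["ccf_peak"] > min_ccf].copy()
    good["abs_lag"] = good["inter_band_lag"].abs()
    # Sort once globally; groupby().head() then keeps the first n rows of each group.
    ranked = good.sort_values("abs_lag", ascending=False)
    top = ranked.groupby(cluster_col, sort=False).head(n_per_cluster)
    return (top.sort_values([cluster_col, "abs_lag"], ascending=[True, False])
               .drop(columns="abs_lag"))
```

Applied to the output catalog this would look like select_top_candidates(pd.read_parquet("outputs/features.parquet")); reading Parquet requires pyarrow or fastparquet to be installed.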

Usage

pip install -r requirements.txt
python scripts/run_pipeline.py

Adapting to Rubin DP1 via the Butler

Replace ingest.py's load_dataset() with:

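from lsst.daf.butler import Butler
import pandas as pd

def load_dataset_rubin(repo_path, collection):
    """Return {objectId: {"g": df, "r": df, "class": ...}}, matching load_dataset()."""
    butler = Butler(repo_path, collections=collection)
    # Collect all catalogs first: each ref typically covers a single visit (or
    # detector), so an object's epochs are spread across many refs.
    frames = []
    for ref in butler.registry.queryDatasets("source", ..., instrument="LSSTCam"):
        src = butler.get(ref)
        # Some storage classes return an astropy Table; normalise to pandas.
        frames.append(src.to_pandas() if hasattr(src, "to_pandas") else src)
    if not frames:
        return {}
    sources = pd.concat(frames, ignore_index=True)

    # Verify these column names against the DP1 schema; if only psfFlux (nJy)
    # is provided, convert first with mag = 31.4 - 2.5 * log10(flux).
    cols = {"midpointMjdTai": "mjd", "psfMag": "mag", "psfMagErr": "magerr"}
    dataset = {}
    for obj_id, obj in sources.groupby("objectId"):
        bands = {
            b: obj.loc[obj["band"] == b, list(cols)].rename(columns=cols)
                  .sort_values("mjd").reset_index(drop=True)
            for b in ("g", "r")
        }
        dataset[str(obj_id)] = {**bands, "class": "unknown"}
    return dataset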

Everything downstream (features, clustering, plots) is unchanged.
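Rubin catalogs generally report calibrated PSF photometry as fluxes in nanojanskys (for example psfFlux and psfFluxErr) rather than magnitudes. If the tables you query lack magnitude columns, a conversion step like the hypothetical helper below would be needed before the light curves reach the downstream code. It uses the AB definition m = -2.5 log10(f / 3631 Jy), which becomes m = 31.4 - 2.5 log10(f) for f in nJy, plus first-order error propagation.

```python
import numpy as np

AB_ZEROPOINT_NJY = 31.4  # m_AB = -2.5 log10(f / 3631 Jy), rewritten for f in nJy


def njy_to_abmag(flux_njy, flux_err_njy):
    """Convert fluxes and 1-sigma errors in nanojanskys to AB magnitudes.

    Non-positive fluxes have no defined magnitude and come back as NaN.
    """
    flux = np.asarray(flux_njy, float)
    err = np.asarray(flux_err_njy, float)
    positive = flux > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        mag = np.where(positive, AB_ZEROPOINT_NJY - 2.5 * np.log10(flux), np.nan)
        # First-order error propagation: sigma_m = (2.5 / ln 10) * sigma_f / f
        magerr = np.where(positive, 2.5 / np.log(10) * err / flux, np.nan)
    return mag, magerr
```

Faint difference-image or forced-photometry measurements can be negative, so dropping or flagging the resulting NaN epochs is a choice the ingest layer would have to make explicitly.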

Next steps

  1. Apply to Rubin DP1 AGN candidates (available via the Rubin Science Platform, RSP)
  2. Use top_candidates.csv to prioritise reverberation mapping follow-up
  3. Extend lag measurement to all band pairs (u, g, r, i, z, y) once Rubin data is available
  4. Train a supervised classifier on the feature set using ALeRCE labels as ground truth

About

Survey-scale photometric variability pipeline for ZTF/LSST with inter-band lag as a classification feature
