rubin-variability

Photometric variability feature extraction pipeline for Vera Rubin / LSST research, using ZTF data (via ALeRCE) as a public proxy.

Science motivation

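The Vera C. Rubin Observatory's LSST is expected to observe ~37 billion objects over 10 years and to generate up to ~10 million alerts per night. ZTF is a good publicly available proxy: it shares the g and r photometric bands (with slightly different filter curves), has a comparable cadence (3-night revisit), and overlaps part of the Rubin footprint. Pipelines built on ZTF should transfer to Rubin DP1 data by swapping out a single function in the ingest layer (see "Adapting to Rubin DP1 via the Butler").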

Headline feature: inter-band lag

Most variability pipelines treat g and r bands independently. This pipeline introduces inter-band lag — the time delay (in days) between g and r band variability peaks, measured via cross-correlation.

Physical interpretation:

g leads r by 1–10 days → AGN accretion disk reverberation. The disk is hotter (bluer) in the centre; a driving fluctuation propagates outward to cooler (redder) radii with a light-travel-time delay proportional to disk size.
lag ≈ 0 → stellar flares, where g and r brighten simultaneously.
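r leads g (negative lag) → attributed here to dust reverberation echoes, where the dust re-radiates at longer wavelengths with a delay. Because delayed re-emission at longer wavelengths would by itself make r lag g, confirm the sign convention of inter_band_lag before applying this interpretation.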

At survey scale, this feature has been largely unexplored. Combining inter_band_lag with color_slope (bluer-when-brighter = AGN-like) is intended to help separate physical variability mechanisms that single-band features tend to blend together.
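The lag measurement described above can be illustrated with a minimal interpolated cross-correlation sketch. This is not the code in features.py: the function names, the trial-lag grid (`max_lag`, `step`), and the `min_pairs` cut are assumptions made for illustration, and the sign convention (positive lag means g leads r) is inferred from the interpretation list above. It uses only the standard library so it runs anywhere.

```python
import bisect
import math

def interp(x, xs, ys):
    """Linearly interpolate (xs, ys) at x; xs must be sorted ascending."""
    if x < xs[0] or x > xs[-1]:
        return None  # outside the sampled range: never extrapolate
    i = bisect.bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def inter_band_lag(t_g, m_g, t_r, m_r, max_lag=20.0, step=0.5, min_pairs=5):
    """Return (lag, peak): the trial lag in days that maximises the correlation
    between g(t) and the r light curve interpolated at t + lag, and that
    maximum correlation. Positive lag means r lags g (assumed convention)."""
    best_lag, best_cc = None, float("-inf")
    n_steps = int(round(2 * max_lag / step))
    for k in range(n_steps + 1):
        lag = -max_lag + k * step
        g_vals, r_vals = [], []
        for t, m in zip(t_g, m_g):
            r_at = interp(t + lag, t_r, m_r)
            if r_at is not None:
                g_vals.append(m)
                r_vals.append(r_at)
        if len(g_vals) < min_pairs:
            continue  # too little overlap at this lag to trust the correlation
        cc = pearson(g_vals, r_vals)
        if cc > best_cc:
            best_lag, best_cc = lag, cc
    return best_lag, best_cc

# Synthetic check: r is a copy of g delayed by 3 days, sampled half a day later.
t_g = [float(t) for t in range(101)]
m_g = [math.sin(2 * math.pi * t / 30) for t in t_g]
t_r = [t + 0.5 for t in t_g]
m_r = [math.sin(2 * math.pi * (t - 3.0) / 30) for t in t_r]
lag, peak = inter_band_lag(t_g, m_g, t_r, m_r, max_lag=10.0, step=0.5)
```

Keeping the lag search window shorter than any suspected period avoids picking an alias peak, and the returned peak correlation can play the role of a confidence weight on the lag.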

Features extracted per object

Feature	Physical meaning
`amplitude_g/r`	Total flux swing — AGN/SNe large, RRL moderate, noise small
`stetson_j`	Correlated cross-band variability — high for real astrophysical events
`stetson_k_g/r`	Shape of variability distribution — Gaussian vs. burst-like
`von_neumann_g/r`	Temporal autocorrelation — low = periodic/smooth, high = stochastic
`ls_period_g/r`	Best Lomb-Scargle period — useful for RRL, Cepheids, LPV
`ls_fap_g/r`	False alarm probability — quality flag for periodicity
`sf_slope_g/r`	Structure function slope — AGN ≈ 0.3, RRL steep, random walk = 1.0
`color_slope`	Pearson r(color vs brightness) — negative = bluer-when-brighter (AGN)
`inter_band_lag`	g−r cross-correlation lag in days — accretion disk size proxy
`ccf_peak`	CCF peak quality (0–1) — use as confidence weight on lag
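As a rough guide to how some of the single-band features in the table are defined, here is a minimal sketch using textbook formulas. The function names are illustrative and the exact definitions in features.py may differ (for example, the amplitude may use robust percentiles rather than the full range, which is the assumption made here).

```python
import math

def amplitude(mags):
    """Full range of the light curve (assumed definition: max minus min)."""
    return max(mags) - min(mags)

def stetson_k(mags, errs):
    """Stetson K: mean |delta| over rms delta, where delta is the
    error-normalised residual about the inverse-variance weighted mean.
    K is about 0.798 for Gaussian scatter and 1.0 for a two-level signal."""
    n = len(mags)
    weights = [1.0 / e ** 2 for e in errs]
    mean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    scale = math.sqrt(n / (n - 1))
    delta = [scale * (m - mean) / e for m, e in zip(mags, errs)]
    mean_abs = sum(abs(d) for d in delta) / n
    rms = math.sqrt(sum(d * d for d in delta) / n)
    return mean_abs / rms

def von_neumann(mags):
    """von Neumann ratio: mean squared successive difference over the sample
    variance. Near 0 for smooth curves, about 2 for uncorrelated noise.
    Assumes mags is sorted by time and is not constant."""
    n = len(mags)
    mean = sum(mags) / n
    variance = sum((m - mean) ** 2 for m in mags) / (n - 1)
    msd = sum((b - a) ** 2 for a, b in zip(mags, mags[1:])) / (n - 1)
    return msd / variance

# Two toy light curves: a smooth ramp and a rapidly alternating signal.
smooth = [float(i) for i in range(10)]
alternating = [float(i % 2) for i in range(10)]
eta_smooth = von_neumann(smooth)
eta_alternating = von_neumann(alternating)
k_alternating = stetson_k(alternating, [0.1] * 10)
```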

Pipeline architecture

ALeRCE API (ZTF)
     │
  ingest.py          ← fetch light curves, ~400 objects × 4 classes
     │
 features.py         ← extract 30 features per object including inter_band_lag
     │
  cluster.py         ← UMAP (2D) + HDBSCAN, generate 3 diagnostic plots
     │
 pipeline.py         ← orchestrate, save features.parquet + top_candidates.csv

Outputs

File	Description
`outputs/features.parquet`	Full feature catalog (one row per object)
`outputs/top_candidates.csv`	Top 5 objects per cluster ranked by |lag|, restricted to ccf_peak > 0.5
`outputs/cluster_plot.png`	UMAP embedding coloured by cluster, shaped by ALeRCE class
`outputs/lag_histogram.png`	Inter-band lag distribution per cluster
`outputs/color_lag_scatter.png`	color_slope vs inter_band_lag (the key diagnostic plot)
`outputs/summary.txt`	Plain-text run summary
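The selection behind top_candidates.csv can be sketched as follows. This is an assumed reading of the rule in the table above (keep rows with ccf_peak above 0.5, then take the largest |lag| values in each cluster); the row layout and the function name are hypothetical, and the real pipeline may apply further cuts.

```python
from collections import defaultdict

def top_candidates(rows, per_cluster=5, min_ccf_peak=0.5):
    """Keep rows whose CCF peak passes the quality cut, then return the
    per_cluster rows with the largest |inter_band_lag| in each cluster."""
    by_cluster = defaultdict(list)
    for row in rows:
        if row["ccf_peak"] > min_ccf_peak:
            by_cluster[row["cluster"]].append(row)
    selected = []
    for cluster in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster],
                        key=lambda r: abs(r["inter_band_lag"]),
                        reverse=True)
        selected.extend(ranked[:per_cluster])
    return selected

# Hypothetical feature rows (object IDs and values are made up for illustration).
rows = [
    {"object_id": "a", "cluster": 0, "inter_band_lag": 4.0, "ccf_peak": 0.9},
    {"object_id": "b", "cluster": 0, "inter_band_lag": -7.5, "ccf_peak": 0.6},
    {"object_id": "c", "cluster": 0, "inter_band_lag": 9.0, "ccf_peak": 0.3},
    {"object_id": "d", "cluster": 1, "inter_band_lag": 0.5, "ccf_peak": 0.8},
]
picked = [r["object_id"] for r in top_candidates(rows, per_cluster=1)]
```

Object "c" has the largest lag but fails the ccf_peak cut, which is the point of using the peak as a confidence filter before ranking.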

Usage

pip install -r requirements.txt
python scripts/run_pipeline.py

Adapting to Rubin DP1 via the Butler

Replace ingest.py's load_dataset() with:

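from lsst.daf.butler import Butler
import pandas as pd

# Rubin column name -> name expected by the downstream feature code.
COLS = {"midpointMjdTai": "mjd", "psfMag": "mag", "psfMagErr": "magerr"}

def load_dataset_rubin(repo_path, collection):
    butler = Butler(repo_path, collections=collection)
    # The dataset type and column names are kept from the original sketch and
    # are assumptions: check them against the DP1 schema before use.
    # DP1 was taken with LSSTComCam, so that instrument is queried here.
    refs = butler.registry.queryDatasets("source", instrument="LSSTComCam")
    # Assumes butler.get() returns a pandas DataFrame. Concatenate all tables
    # first so that epochs of one object spread over several datasets are
    # merged instead of overwriting each other.
    src = pd.concat([butler.get(ref) for ref in refs], ignore_index=True)
    dataset = {}
    for obj_id, obj in src.groupby("objectId"):
        bands = {}
        for band in ("g", "r"):
            lc = obj.loc[obj["band"] == band, list(COLS)]
            bands[band] = lc.rename(columns=COLS).sort_values("mjd")
        dataset[str(obj_id)] = {**bands, "class": "unknown"}
    return dataset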

Everything downstream (features, clustering, plots) is unchanged.
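One practical caveat, offered as a hedged sketch: Rubin catalogs generally report fluxes in nanojansky rather than magnitudes, so the psfMag and psfMagErr columns assumed above may need to be derived first. The helper below (a hypothetical name, not part of this repository) converts one flux measurement to an AB magnitude and propagates the error to first order.

```python
import math

# AB zero point for fluxes in nJy: 2.5 * log10(3631 Jy / 1 nJy), about 31.4 mag.
AB_ZEROPOINT_NJY = 2.5 * math.log10(3631e9)

def njy_to_abmag(flux_njy, flux_err_njy):
    """Convert a flux in nJy to (AB magnitude, magnitude error).
    Returns (None, None) for non-positive fluxes, which have no magnitude."""
    if flux_njy <= 0:
        return None, None
    mag = -2.5 * math.log10(flux_njy) + AB_ZEROPOINT_NJY
    # First-order error propagation: d(mag) = (2.5 / ln 10) * d(flux) / flux.
    magerr = 2.5 / math.log(10) * flux_err_njy / flux_njy
    return mag, magerr

mag, magerr = njy_to_abmag(100.0, 1.0)
```

Non-positive fluxes are common in forced photometry of faint sources, so dropping them biases the light curve; working in flux space for those objects may be preferable.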

Suggested citations

Förster et al. 2021 — ALeRCE: Alert broker for ZTF (arXiv:2008.03311)
Bellm et al. 2019 — ZTF survey design (PASP 131, 018002)
McInnes et al. 2018 — UMAP (arXiv:1802.03426)
Campello et al. 2013 — HDBSCAN (PAKDD 2013)
Stetson 1996 — Variability indices J, K (PASP 108, 851)

Next steps

Apply to Rubin DP1 AGN candidates (available via RSP)
Use top_candidates.csv to prioritise reverberation mapping follow-up
Extend lag measurement to all band pairs (u, g, r, i, z, y) once Rubin data is available
Train a supervised classifier on the feature set using ALeRCE labels as ground truth
