Skip to content

add CalAdapt WRF download function for CMIP6 hourly met data#3

Closed
divine7022 wants to merge 19 commits into
developfrom
caladapt-wrf
Closed

add CalAdapt WRF download function for CMIP6 hourly met data#3
divine7022 wants to merge 19 commits into
developfrom
caladapt-wrf

Conversation

@divine7022
Copy link
Copy Markdown
Owner

@divine7022 divine7022 commented Apr 16, 2026

Description

adds download.CalAdaptWRF() to PEcAn.data.atmosphere new met driver that pulls hourly WRF dynamically downscaled CMIP6 projections from the Cal-Adapt Analytics Engine (WUS-D3 dataset, Rahimi et al. 2024). Data sits on public AWS S3, no auth needed

this is implemented as a part of CCMMF where we need future climate forcing at ~200 California sites for SIPNET runs under multiple GCMs and SSPs.

  • looked at how pecan handles met downloads and this follows the same pattern as CRUNCEP/GFDL, the download function does everything (fetch, extract, convert, write CF) in one shot, so we skip the met2CF and extract.nc stages. Main reason: WRF uses a Lambert Conformal grid and extract.nc / closest_xy assume lat-lon grids with NARR-style bounds, so they can't handle this projection. Adding CalAdaptWRF to skip from list in met.process.R was the cleanest path.

  • the tricky part is that pecan's papply calls download.CalAdaptWRF once per site, it never sees the full site list. Naively that means re-reading the same WRF grid from S3 for every site. The 45 km grid is small (~20-30 MB per var per year), so we cache the full grid as rds in tempdir() on the first site. Sites 2 through N just do
    readRDS() and extract their grid cell locally. For 200 sites x 8 vars x 20 years, that cuts S3 round trips from 32,000 to 160. The cache auto cleans when R exits

  • added caladapt_wrf column to pecan_standard_met_table

  • 9 output variables total:

    • direct (no conversion): air_temperature, air_pressure, shortwave, longwave, eastward_wind, northward_wind
    • converted: precipitation_flux (mm/hr -> kg/m2/s), specific_humidity (mixing ratio -> q/(1+q))
    • derived: wind_speed (sqrt(u10^2 + v10^2))
  • data coverage:

    • 8 GCMs under SSP3-7.0: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1
    • CESM2 also has SSP2-4.5 and SSP5-8.5
    • 1980-2100, hourly, 45 km (d01)
    • 3 models store precip as rainc + rainnc components instead of a single prec field; but caladaptR handles that transparently

implements -- ccmmf/organization#82 and ccmmf/organization#83

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Copy link
Copy Markdown

@dlebauer dlebauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A few comments / suggestions.

year_start <- paste0(year, "-01-01T00:00:00")
year_end <- paste0(year, "-12-31T23:00:00")

##fetch each variable, using session cache to avoid redundant S3 reads
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, writing the rds then loading it 10k times will be less efficient than using a format that is indexed like parquet or netcdf etc.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great point, ran the numbers before changing anything; tested ads vs netcdf (non-proxy and proxy) vs Parquet on WRF data from S3. proxy mode looks like the obvious win on paper, but the stars docs say it directly: "operations requiring data access automatically fetch the underlying data, defeating lazy evaluation benefits" st_extract is one of those, it materializes full grid every call, then pays the proxy overhead on top
the reason indexed formats don't help us here is that our access pattern is "load full grid, extract one cell, done" there's no partial read happening to optimize. rds deserialization is just faster than GDAL's netcdf parsing for stars objects at this size class. and bonus, rds preserves the units attribute on the stars object; netcdf write_stars and read_stars round trip strips it.
Keeping rds for now and open to revisiting if you want me to add an in memory env cache so the same (model, scenario, var, year) only hits disk once per R session

model = model,
scenario = scenario,
timescale = "1hr",
resolution = "d01",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can grid size be a function argument?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, exposed as resolution param, defaults to "d01".
heads-up: WUS-D3 publishes d01 (45 km), d02 (9 km), and d03 (3 km), all at hourly/daily/monthly. coverage varies by model+scenario+variable combo, so caladaptaer::cae_check_variables() is the right thing to run before launching production at d02 or d03

# precip: WRF hourly accumulation (mm) -> CF flux (kg/m2/s)
# 1 mm water = 1 kg/m2, divide by 3600s for hourly timestep
if ("prec" %in% names(dat.list)) {
dat.list[["prec"]] <- dat.list[["prec"]] / 3600
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set hour_to_second <- PEcAn.utils::ud_convert(1, 'h', 's') at top of file. Hard coded conversion factors are more likely to be in error, even when they seem straightforward it is better to be explicit.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looked into this. there are two things:

  1. "hardcoded factors are error-prone"

argument is real in general, agreed ud_convert is the cleaner pattern when there's any chance the unit could change
2. but for this specific code timescale = "1hr" is hardcoded right next to the conversion in the same function (lines apart). and verified the catalog too, WUS-D3 only publishes 1hr, day, mon for WRF activity, no 3hr table exists. so there's no realistic future state where the timescale changes without someone editing this exact block of code
the closest analog in download.NLDAS.R uses precipitation_flux / 3600 with same hardcoded value, so we're consistent with existing precedent

reverted my ud_convert commit and went back to / 3600. could be swap back if you would rather standardize on the named constant pattern across both functions

Comment thread modules/data.atmosphere/DESCRIPTION Outdated
testthat (>= 3.1.7),
withr
Remotes:
github::lebauerapproach/caladaptR,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need to make this public

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@divine7022 divine7022 closed this Apr 29, 2026
@divine7022
Copy link
Copy Markdown
Owner Author

divine7022 commented Apr 29, 2026

closing in favor of PR against PecanProject/pecan

dlebauer pushed a commit that referenced this pull request May 20, 2026
Pull develop into patch 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants