add CalAdapt WRF download function for CMIP6 hourly met data by divine7022 · Pull Request #3 · divine7022/pecan

divine7022 · 2026-04-16T02:19:33Z

Description

adds download.CalAdaptWRF() to PEcAn.data.atmosphere new met driver that pulls hourly WRF dynamically downscaled CMIP6 projections from the Cal-Adapt Analytics Engine (WUS-D3 dataset, Rahimi et al. 2024). Data sits on public AWS S3, no auth needed

this is implemented as a part of CCMMF where we need future climate forcing at ~200 California sites for SIPNET runs under multiple GCMs and SSPs.

looked at how pecan handles met downloads and this follows the same pattern as CRUNCEP/GFDL, the download function does everything (fetch, extract, convert, write CF) in one shot, so we skip the met2CF and extract.nc stages. Main reason: WRF uses a Lambert Conformal grid and extract.nc / closest_xy assume lat-lon grids with NARR-style bounds, so they can't handle this projection. Adding CalAdaptWRF to skip from list in met.process.R was the cleanest path.
the tricky part is that pecan's papply calls download.CalAdaptWRF once per site, it never sees the full site list. Naively that means re-reading the same WRF grid from S3 for every site. The 45 km grid is small (~20-30 MB per var per year), so we cache the full grid as rds in tempdir() on the first site. Sites 2 through N just do
readRDS() and extract their grid cell locally. For 200 sites x 8 vars x 20 years, that cuts S3 round trips from 32,000 to 160. The cache auto cleans when R exits
added caladapt_wrf column to pecan_standard_met_table
9 output variables total:
- direct (no conversion): air_temperature, air_pressure, shortwave, longwave, eastward_wind, northward_wind
- converted: precipitation_flux (mm/hr -> kg/m2/s), specific_humidity (mixing ratio -> q/(1+q))
- derived: wind_speed (sqrt(u10^2 + v10^2))
data coverage:
- 8 GCMs under SSP3-7.0: CESM2, CNRM-ESM2-1, EC-Earth3, EC-Earth3-Veg, FGOALS-g3, MPI-ESM1-2-HR, MIROC6, TaiESM1
- CESM2 also has SSP2-4.5 and SSP5-8.5
- 1980-2100, hourly, 45 km (d01)
- 3 models store precip as rainc + rainnc components instead of a single prec field; but caladaptR handles that transparently

implements -- ccmmf/organization#82 and ccmmf/organization#83

Motivation and Context

Review Time Estimate

Immediately
Within one week
When possible

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
My name is in the list of CITATION.cff
I agree that PEcAn Project may distribute my contribution under any or all of
- the same license as the existing code,
- and/or the BSD 3-clause license.
I have updated the CHANGELOG.md.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

dlebauer

Looks great! A few comments / suggestions.

dlebauer · 2026-04-16T05:23:14Z

+    year_start <- paste0(year, "-01-01T00:00:00")
+    year_end   <- paste0(year, "-12-31T23:00:00")
+
+    ##fetch each variable, using session cache to avoid redundant S3 reads


From what I understand, writing the rds then loading it 10k times will be less efficient than using a format that is indexed like parquet or netcdf etc.

great point, ran the numbers before changing anything; tested ads vs netcdf (non-proxy and proxy) vs Parquet on WRF data from S3. proxy mode looks like the obvious win on paper, but the stars docs say it directly: "operations requiring data access automatically fetch the underlying data, defeating lazy evaluation benefits" st_extract is one of those, it materializes full grid every call, then pays the proxy overhead on top
the reason indexed formats don't help us here is that our access pattern is "load full grid, extract one cell, done" there's no partial read happening to optimize. rds deserialization is just faster than GDAL's netcdf parsing for stars objects at this size class. and bonus, rds preserves the units attribute on the stars object; netcdf write_stars and read_stars round trip strips it.
Keeping rds for now and open to revisiting if you want me to add an in memory env cache so the same (model, scenario, var, year) only hits disk once per R session

dlebauer · 2026-04-16T05:24:08Z

+          model      = model,
+          scenario   = scenario,
+          timescale  = "1hr",
+          resolution = "d01",


can grid size be a function argument?

done, exposed as resolution param, defaults to "d01".
heads-up: WUS-D3 publishes d01 (45 km), d02 (9 km), and d03 (3 km), all at hourly/daily/monthly. coverage varies by model+scenario+variable combo, so caladaptaer::cae_check_variables() is the right thing to run before launching production at d02 or d03

dlebauer · 2026-04-16T05:26:59Z

+    # precip: WRF hourly accumulation (mm) -> CF flux (kg/m2/s)
+    # 1 mm water = 1 kg/m2, divide by 3600s for hourly timestep
+    if ("prec" %in% names(dat.list)) {
+      dat.list[["prec"]] <- dat.list[["prec"]] / 3600


set hour_to_second <- PEcAn.utils::ud_convert(1, 'h', 's') at top of file. Hard coded conversion factors are more likely to be in error, even when they seem straightforward it is better to be explicit.

looked into this. there are two things:

"hardcoded factors are error-prone"

argument is real in general, agreed ud_convert is the cleaner pattern when there's any chance the unit could change
2. but for this specific code timescale = "1hr" is hardcoded right next to the conversion in the same function (lines apart). and verified the catalog too, WUS-D3 only publishes 1hr, day, mon for WRF activity, no 3hr table exists. so there's no realistic future state where the timescale changes without someone editing this exact block of code
the closest analog in download.NLDAS.R uses precipitation_flux / 3600 with same hardcoded value, so we're consistent with existing precedent

reverted my ud_convert commit and went back to / 3600. could be swap back if you would rather standardize on the named constant pattern across both functions

dlebauer · 2026-04-16T05:28:49Z

    testthat (>= 3.1.7),
    withr
 Remotes:
+    github::lebauerapproach/caladaptR,


will need to make this public

This reverts commit 48742b1.

This reverts commit cab2ce8.

divine7022 · 2026-04-29T21:35:15Z

closing in favor of PR against PecanProject/pecan

Pull develop into patch 1

divine7022 added 9 commits April 15, 2026 21:17

add download.CalAdaptWRF for Cal-Adapt WRF met data

2eae265

add CalAdaptWRF registration xml

a7902f7

add caladapt_wrf column to pecan_standard_met_table

5e7cad0

add CalAdaptWRF to met.process skip list

a0d783e

add caladaptR and stars to Suggests

73cd4e1

add Cal-Adapt WRF section to met driver docs

b4a1ed2

add CalAdaptWRF entry to NEWS.md

ddc97bb

add CalAdaptWRF entry to CHANGELOG.md

e0f4357

update NAMESPACE and man pages for CalAdaptWRF

ab55816

This was referenced Apr 16, 2026

Write download_caladapt function ccmmf/organization#82

Closed

Write met2cf.caladapt ccmmf/organization#83

Open

dlebauer reviewed Apr 16, 2026

View reviewed changes

divine7022 added 10 commits April 28, 2026 22:37

swap caladaptR for caladaptaer in deps

c22bf37

point download fn at cae_fetch

4910644

regen Rd for caladaptaer rename

06dbbe4

bump NEWS to caladaptaer

3e3e7a7

update met docs for caladaptaer

4b327db

swap rds cache for netcdf with lazy read

48742b1

let callers pick the WRF grid resolution

1380bf0

use ud_convert for hour to seconds

cab2ce8

Revert "swap rds cache for netcdf with lazy read"

849fc7e

This reverts commit 48742b1.

Revert "use ud_convert for hour to seconds"

c5189a6

This reverts commit cab2ce8.

divine7022 closed this Apr 29, 2026

dlebauer pushed a commit that referenced this pull request May 20, 2026

Merge pull request #3 from chaneyfink/develop

3b3e951

Pull develop into patch 1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add CalAdapt WRF download function for CMIP6 hourly met data#3

add CalAdapt WRF download function for CMIP6 hourly met data#3
divine7022 wants to merge 19 commits into
developfrom
caladapt-wrf

divine7022 commented Apr 16, 2026 •

edited

Loading

Uh oh!

dlebauer left a comment

Uh oh!

dlebauer Apr 16, 2026

Uh oh!

divine7022 Apr 29, 2026

Uh oh!

dlebauer Apr 16, 2026

Uh oh!

divine7022 Apr 29, 2026

Uh oh!

dlebauer Apr 16, 2026

Uh oh!

divine7022 Apr 29, 2026

Uh oh!

dlebauer Apr 16, 2026

Uh oh!

divine7022 Apr 29, 2026

Uh oh!

divine7022 commented Apr 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

divine7022 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Review Time Estimate

Types of changes

Checklist:

Uh oh!

dlebauer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divine7022 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

divine7022 commented Apr 16, 2026 •

edited

Loading

divine7022 commented Apr 29, 2026 •

edited

Loading