feat: add ERA5 CDS ↔ CF variable name mapping utility #3922

Open
Akash-paluvai wants to merge 2 commits into PecanProject:develop from Akash-paluvai:GH-3605-era5-cf-varname-map

@Akash-paluvai
Contributor

Description

This PR introduces an internal ERA5 CDS to CF variable name translation utility and updates the AmeriFlux coverage workflow to use it.

Previously, CDS variable names (used for ERA5 download) and CF standard names (used in NetCDF files) were not aligned. This mismatch caused silent failures in downstream coalescing, where missing values were not filled even when ERA5 fallback was required.

This PR adds an explicit mapping layer and ensures both naming conventions are available and correctly used within the workflow.


Motivation and Context

Fixes silent failure in ERA5 fallback preparation (see #3605).

Previously:

  • Coverage checks returned only CDS variable names
  • Downstream coalescing expects CF standard names
  • Result: variables were not matched correctly and no filling occurred

This PR ensures:

  • CDS names are used for ERA5 data download
  • CF names are used for NetCDF variable matching during coalescing
  • Both naming conventions are consistently propagated through the pipeline
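
To make the mismatch concrete, here is a minimal base-R sketch. The variable names are illustrative assumptions (typical ERA5 CDS long names paired with CF standard names), not necessarily the exact set this PR maps, and `netcdf_vars` is a hypothetical stand-in for the variable names found in a NetCDF file.

```r
# Illustrative mapping only: CDS long names (names) -> CF standard names (values).
cds_to_cf <- c(
  "2m_temperature"      = "air_temperature",
  "surface_pressure"    = "air_pressure",
  "total_precipitation" = "precipitation_flux"
)

# Hypothetical: variables the coverage check flagged, reported as CDS names.
fill_vars_cds <- c("2m_temperature", "total_precipitation")

# Hypothetical: variable names present in the NetCDF file (CF standard names).
netcdf_vars <- c("air_temperature", "air_pressure", "precipitation_flux")

# Matching CDS names directly against the file finds nothing,
# so no variable would be filled and no error would be raised.
matched_before <- intersect(fill_vars_cds, netcdf_vars)

# Translating to CF names first yields the matches that coalescing needs.
fill_vars_cf <- unname(cds_to_cf[fill_vars_cds])
matched_after <- intersect(fill_vars_cf, netcdf_vars)
```

Here `matched_before` is empty while `matched_after` contains both CF names, which is the silent failure and its fix in miniature.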

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Member

@dlebauer dlebauer left a comment

@Akash-paluvai thank you for this contribution. It occurs to me that this would be a good chance to make a general mapping function; see comments.

paste(unknown, collapse = ", "),
"returning NA for those entries"
)
warning(msg, call. = FALSE)
Member

Only the PEcAn.logger call is required; it will pass the message to the appropriate function (warning in this case), and it handles the outer paste internally (the inner one with collapse = may need to stay).

#' included. Variables not listed here are not handled by this pipeline.
#'
#' @noRd
era5_cds_to_cf_varnames <- c(
Member

Please let me know if this doesn't make sense, but it seems like it would be simpler and more generally useful to

translate_met_varnames <- function(vars, from, to, table = pecan_standard_met_table) {
  lookup <- table |>
    dplyr::select(dplyr::all_of(c(from, to))) |>
    dplyr::filter(!is.na(.data[[from]]), .data[[from]] != "")

  result <- lookup[[to]]
  names(result) <- lookup[[from]]

  result[vars]
}

Then, if it is useful (not clear that it is, but it is simple enough), developers can create <from>_to_<to>_varnames functions as needed.
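
A usage sketch of this idea, written in base R so that it runs without dplyr. The toy table and its column names (`cf_standard_name`, `era5_cds`) are assumptions for illustration; the real `pecan_standard_met_table` may name its columns differently. Unlike the suggestion above, this variant drops the names from the result.

```r
# Toy stand-in for the lookup table; column names are assumptions.
toy_met_table <- data.frame(
  cf_standard_name = c("air_temperature", "air_pressure", "wind_speed"),
  era5_cds         = c("2m_temperature", "surface_pressure", NA),
  stringsAsFactors = FALSE
)

# Base-R variant of the suggested function: build a named lookup vector
# from two columns, dropping rows where the `from` name is missing or empty.
translate_met_varnames <- function(vars, from, to, table = toy_met_table) {
  keep <- !is.na(table[[from]]) & table[[from]] != ""
  lookup <- stats::setNames(table[[to]][keep], table[[from]][keep])
  unname(lookup[vars])  # names not found in `from` come back as NA
}

# A thin <from>_to_<to> wrapper of the kind described above.
cds_to_cf_varnames <- function(vars) {
  translate_met_varnames(vars, from = "era5_cds", to = "cf_standard_name")
}

cf_names <- cds_to_cf_varnames(c("2m_temperature", "not_a_variable"))
cds_name <- translate_met_varnames("air_pressure",
                                   from = "cf_standard_name", to = "era5_cds")
```

Because the direction is set only by `from` and `to`, the same table serves both translations, and adding a naming convention means adding a column, not a new mapping file.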

@Akash-paluvai
Contributor Author

> @Akash-paluvai thank you for this contribution. It occurs to me that this would be a good chance to make a general mapping function; see comments.

Thanks @dlebauer — I’ve updated the PR based on your suggestions. Moving this into pecan_standard_met_table makes the design much cleaner.

Changes made

  • Added an era5_cds column to pecan_standard_met_table for long-form CDS API variable names (distinct from the existing era5 GRIB short names).
  • Implemented a generic translate_met_varnames(vars, from, to, table) utility for translating between naming conventions using the table as the source of truth.
  • Rewrote cds_to_cf_varnames() and cf_to_cds_varnames() as thin wrappers over this function.
  • Removed the standalone mapping file (era5_cf_varname_map.R) to avoid duplication.
  • Updated check_met_coverage_for_fallback() to return both fill_vars_cds and fill_vars_cf.
  • Adjusted tests accordingly to validate translation behaviour and return structure.

Notes

  • Only the variables currently used in the fallback pipeline have non-NA values in era5_cds; others remain NA until verified.
  • Warning behaviour is handled via PEcAn.logger::logger.warn(), and tests now validate correct NA handling for unknown inputs.
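
The return structure and NA handling described above might look roughly like the sketch below. It is not the PR's code: `build_fill_vars()` and the two-entry lookup are hypothetical, and base `warning()` stands in for `PEcAn.logger::logger.warn()` so that the example runs without PEcAn installed.

```r
# Hypothetical CDS -> CF lookup; the PR derives this from pecan_standard_met_table.
cds_to_cf <- c(
  "2m_temperature"   = "air_temperature",
  "surface_pressure" = "air_pressure"
)

# Hypothetical helper: given CDS names that need filling, return both conventions.
build_fill_vars <- function(missing_cds) {
  cf <- unname(cds_to_cf[missing_cds])
  unknown <- missing_cds[is.na(cf)]
  if (length(unknown) > 0) {
    # Stand-in for PEcAn.logger::logger.warn()
    warning(
      "No CF name for: ", paste(unknown, collapse = ", "),
      "; returning NA for those entries",
      call. = FALSE
    )
  }
  list(fill_vars_cds = missing_cds, fill_vars_cf = cf)
}

# Muffle the warning so the example runs quietly.
res <- withCallingHandlers(
  build_fill_vars(c("2m_temperature", "not_a_variable")),
  warning = function(w) invokeRestart("muffleWarning")
)
```

Returning both vectors in parallel keeps the download step (CDS names) and the coalescing step (CF names) aligned by position, and an `NA` in `fill_vars_cf` marks an entry that could not be translated.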

@Akash-paluvai
Contributor Author

translate_met_varnames() is currently exported as a generic utility for translating between naming conventions using pecan_standard_met_table.

I’m happy to keep it internal instead if you’d prefer to limit the public API surface for now.

@Akash-paluvai Akash-paluvai requested a review from dlebauer April 20, 2026 09:10