1 change: 1 addition & 0 deletions modules/data.land/NAMESPACE
@@ -18,6 +18,7 @@ export(download_package_rm)
export(ens_veg_module)
export(eto_to_etc)
export(eto_to_etc_bism)
export(events_to_crop_cycle_starts)
export(extract.stringCode)
export(extract_FIA)
export(extract_NEON_veg)
52 changes: 52 additions & 0 deletions modules/data.land/R/events_to_crop_cycle_starts.R
@@ -0,0 +1,52 @@
#' Extract the first planting date of each crop cycle
#'
#' Reads a (JSON) management events file and finds the planting events at which
#' the site changes from one crop to another, ignoring repeat plantings of
#' the same crop.
#' These are the dates when single-PFT models will need to restart to update
#' their crop parameterization.
#'
#' TODO: For now this function requires each planting event to specify a
#' `crop` attribute, but note that this is not enforced by v0.1 of the PEcAn
#' events schema. The schema instead allows each site object to specify a
#' site-level `PFT` attribute that is implied constant over time.
#' As I write this, I think the schema may need to change to require that a
#' crop or PFT identifier be specified for every planting event.
#'
#' @param event_json path to an `events.json` file
#'
#' @return data frame with columns `site_id`, `date`, `crop`,
#' with one row per detected crop cycle.
#' @export
#' @author Chris Black
#'
#' @examples
#' # Not currently runnable because file does not list crop in planting events.
#' # Revisit after deciding if schema update is warranted.
#' \dontrun{
#' evts <- system.file(
#' "events_fixtures/events_site1_site2.json",
#' package = "PEcAn.data.land"
#' )
#' events_to_crop_cycle_starts(evts)
#' }
events_to_crop_cycle_starts <- function(event_json) {
jsonlite::read_json(event_json) |>
Member:
I think that it would be helpful to have helper functions that convert events.json to and from tables, e.g. events_json_to_table() and events_table_to_json(). Please either implement or convert this comment to an issue.

Member Author:
Where else do you expect to want to use these, and for what fraction of event usage? If we'll ~always want to process events in table format, then maybe they should be stored that way instead of unnesting from JSON all the time.

Member:
Each event type is already being generated in a "tidy" tabular format, so staying tabular is easier and more compact for us. But the crux of the problem comes when you have to interleave different event types chronologically, as each event type has different variables associated with it. We solved that in SIPNET by not having column headers -- you just have to know from the metadata the position of the different variables in each row. That remains an option here, but we'll lose the ability for the dataframe to play nicely with lots of R tools (e.g., the tidyverse). Other options are a wide format (all possible event column names, most irrelevant for most events) or a long format (e.g., datetime, site, variable, value), which will result in each event taking up multiple rows.
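A minimal sketch of the two layouts under discussion, using an invented table (`n_rate_kg_ha` is a hypothetical fertilization variable, chosen only to show how event-specific columns behave): the wide form carries an NA for every variable an event doesn't use, while the long form coerces everything to one type and spreads each event over multiple rows.

```r
library(tidyr)

# Hypothetical events table; column names are illustrative, not schema.
wide <- tibble::tibble(
  date = as.Date(c("2022-04-01", "2022-05-10")),
  event_type = c("planting", "fertilization"),
  crop = c("maize", NA),        # only meaningful for planting events
  n_rate_kg_ha = c(NA, 120)     # only meaningful for fertilization events
)

# Long form: one row per (event, variable); mixed types must be coerced.
long <- wide |>
  pivot_longer(
    cols = c(crop, n_rate_kg_ha),
    names_to = "variable",
    values_to = "value",
    values_transform = as.character,
    values_drop_na = TRUE
  )
long
```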

Member Author:
This sounds to me like we don't yet have a single target table format, so I think it's premature to try to write the helper for it in this PR. It seems very plausible that the format will be context-specific: here I unnested first to get a wide (and sparse!) table and it was fine, but in other cases we might want to filter the still-nested events by event type, crop, etc. so that we can unnest to a form with fewer NAs.

dplyr::bind_rows() |>
dplyr::mutate(events = purrr::map(.data$events, as.data.frame)) |>
tidyr::unnest(.data$events) |>
dplyr::mutate(date = as.Date(.data$date)) |>
find_crop_changes()
}

# helper for events_to_crop_cycle_starts,
# mostly to ease unit testing
find_crop_changes <- function(event_df) {
event_df |>
dplyr::filter(.data$event_type == "planting") |>
dplyr::arrange(.data$site_id, .data$date) |>
dplyr::mutate(crop_cycle_id = dplyr::consecutive_id(.data$site_id, .data$crop)) |>
dplyr::group_by(.data$site_id, .data$crop_cycle_id) |>
dplyr::slice_min(.data$date) |>
dplyr::select("site_id", "date", "crop")
Comment on lines +49 to +51
Member:
Two quick notes (from trying these changes in my restart work):

  1. This will return a grouped data frame because we never ungroup. I suggest either adding an ungroup() call to the end of the pipe or (my personal preference) replacing the group_by() with the `by` argument to slice_min() to group only that one operation:
    dplyr::slice_min(.data$date, by = c("site_id", "crop_cycle_id")) |>
  2. I think in #3836 (Update events schema: require crop id when planting, remove pft from site) we will start to require crop_code, right? I'm not sure what the relationship between crop and crop_code is, but assuming we'll be using crop_code instead of crop, this function will need to be adjusted accordingly. Since the two PRs are closely related, I might suggest sequencing this one after #3836 and future-proofing this to start using crop_code right away.
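The difference in point 1 can be seen on a toy table (a hedged sketch with invented values, not the package's data): `group_by()` persists past `slice_min()`, while the `by` argument (dplyr >= 1.1.0) scopes the grouping to that single call.

```r
library(dplyr)

df <- tibble::tibble(
  site_id       = c("a", "a", "b"),
  crop_cycle_id = c(1L, 1L, 2L),
  date          = as.Date(c("2016-06-01", "2016-01-01", "2016-03-01"))
)

# group_by() survives slice_min(), so later verbs silently keep
# operating within groups unless we ungroup():
g <- df |> group_by(site_id, crop_cycle_id) |> slice_min(date)
is_grouped_df(g)   # TRUE

# The `by` argument groups for this one operation only:
u <- df |> slice_min(date, by = c("site_id", "crop_cycle_id"))
is_grouped_df(u)   # FALSE
```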

Member Author:
  1. Good call!
  2. Yep, this is now waiting on schema finalization in #3836. I'll mark it as waiting.

}
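The crop-change detection hinges on dplyr::consecutive_id(), which numbers runs of identical values rather than distinct values. A small sketch of that behavior (crop names invented for illustration): repeat plantings of the same crop share a run id, while returning to an earlier crop starts a new run.

```r
library(dplyr)

# Unlike a plain group id, consecutive_id() gives the final "tomato"
# a NEW id because the run was interrupted by "potato".
crops <- c("tomato", "tomato", "potato", "tomato")
consecutive_id(crops)
#> [1] 1 1 2 3
```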
44 changes: 44 additions & 0 deletions modules/data.land/man/events_to_crop_cycle_starts.Rd

Some generated files are not rendered by default.

@@ -0,0 +1,49 @@
test_that("non-planting events are ignored", {
dat <- dplyr::tribble(
~site_id, ~date, ~event_type, ~crop,
"a", "2016-01-01", "planting", "almond",
"a", "2016-05-01", "irrigation", NA_character_,
"a", "2017-01-01", "planting", "almond",
"a", "2017-05-15", "fertilization", NA_character_,
)
res <- find_crop_changes(dat)
expect_equal(nrow(res), 1)
expect_equal(res$date, "2016-01-01")
expect_equal(res, find_crop_changes(dat[-c(2, 4), ]))
})

test_that("nonconsecutive runs of the same crop counted separately", {
dat <- dplyr::tribble(
~site_id, ~date, ~event_type, ~crop,
"b", "2016-03-01", "planting", "tomato",
"b", "2017-03-05", "planting", "tomato",
"b", "2018-04-15", "planting", "potato",
"b", "2018-08-01", "planting", "tomato",
)
res <- find_crop_changes(dat)
expect_equal(nrow(res), 3)
expect_equal(res$date, dat$date[c(1, 3, 4)])
})

test_that("sites are counted separately", {
dat <- dplyr::tribble(
~site_id, ~date, ~event_type, ~crop,
"a", "2016-03-01", "planting", "grape",
"b", "2016-03-01", "planting", "grape",
"c", "2023-03-01", "planting", "grape",
)
res <- find_crop_changes(dat)
expect_equal(nrow(res), 3)
expect_equal(res$date, dat$date)
expect_equal(res$site_id, dat$site_id)
})

test_that("reads from JSON", {
path <- system.file(
"events_fixtures/events_site1.json",
package = "PEcAn.data.land"
)
res <- events_to_crop_cycle_starts(path)
expect_equal(res$date, "2022-02-19")
expect_equal(res$crop, "EX1")
})