Function to parse restart dates out of event files #3828
```r
#' Extract the first planting date of each crop cycle
#'
#' Reads a (JSON) management events file and finds the planting events at which
#' the site changes from one crop to another, ignoring repeat plantings of
#' the same crop.
#' These are the dates when single-PFT models will need to restart to update
#' their crop parameterization.
#'
#' TODO: For now this function requires each planting event to specify a
#' `crop` attribute, but note that this is not enforced by v0.1 of the PEcAn
#' events schema. The schema instead allows each site object to specify a
#' site-level `PFT` attribute that is implied constant over time.
#' As I write this I think the schema may need to change to require that a
#' crop or PFT identifier be specified for every planting event.
#'
#' @param event_json path to an `events.json` file
#'
#' @return data frame with columns `site_id`, `date`, `crop`,
#'   with one row per detected crop cycle.
#' @export
#' @author Chris Black
#'
#' @examples
#' # Not currently runnable because file does not list crop in planting events.
#' # Revisit after deciding if schema update is warranted.
#' \dontrun{
#' evts <- system.file(
#'   "events_fixtures/events_site1_site2.json",
#'   package = "PEcAn.data.land"
#' )
#' events_to_crop_cycle_starts(evts)
#' }
events_to_crop_cycle_starts <- function(event_json) {
  jsonlite::read_json(event_json) |>
    dplyr::bind_rows() |>
    dplyr::mutate(events = purrr::map(.data$events, as.data.frame)) |>
    tidyr::unnest(.data$events) |>
    dplyr::mutate(date = as.Date(.data$date)) |>
    find_crop_changes()
}

# helper for events_to_crop_cycle_starts,
# mostly to ease unit testing
find_crop_changes <- function(event_df) {
  event_df |>
    dplyr::filter(.data$event_type == "planting") |>
    dplyr::arrange(.data$site_id, .data$date) |>
    dplyr::mutate(crop_cycle_id = dplyr::consecutive_id(.data$site_id, .data$crop)) |>
    dplyr::group_by(.data$site_id, .data$crop_cycle_id) |>
    dplyr::slice_min(.data$date) |>
    dplyr::select("site_id", "date", "crop")
}
```

Comment on lines +49 to +51

Member: Two quick notes (from trying these changes in my restart work):
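To make the helper's behavior concrete, here is a small illustrative call to `find_crop_changes()` on an already-flattened event table. The crop names and dates below are invented for this sketch; only the column names follow the code above.

```r
# Illustrative only: a hand-built event table in the shape that
# find_crop_changes() expects. Crop names and dates are made up.
dat <- dplyr::tribble(
  ~site_id, ~date,        ~event_type, ~crop,
  "a",      "2016-03-01", "planting",  "maize",   # starts crop cycle 1
  "a",      "2017-03-05", "planting",  "maize",   # repeat planting, ignored
  "a",      "2018-04-15", "planting",  "soybean"  # crop change: starts cycle 2
)
find_crop_changes(dat)
# keeps the first planting of each cycle: 2016-03-01 and 2018-04-15
```

Note that because the pipeline ends while still grouped, the result also carries the `crop_cycle_id` grouping column alongside the three selected ones.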
```r
test_that("non-planting events are ignored", {
  dat <- dplyr::tribble(
    ~site_id, ~date, ~event_type, ~crop,
    "a", "2016-01-01", "planting", "almond",
    "a", "2016-05-01", "irrigation", NA_character_,
    "a", "2017-01-01", "planting", "almond",
    "a", "2017-05-15", "fertilization", NA_character_,
  )
  res <- find_crop_changes(dat)
  expect_equal(nrow(res), 1)
  expect_equal(res$date, "2016-01-01")
  expect_equal(res, find_crop_changes(dat[-c(2, 4), ]))
})

test_that("nonconsecutive runs of the same crop counted separately", {
  dat <- dplyr::tribble(
    ~site_id, ~date, ~event_type, ~crop,
    "b", "2016-03-01", "planting", "tomato",
    "b", "2017-03-05", "planting", "tomato",
    "b", "2018-04-15", "planting", "potato",
    "b", "2018-08-01", "planting", "tomato",
  )
  res <- find_crop_changes(dat)
  expect_equal(nrow(res), 3)
  expect_equal(res$date, dat$date[c(1, 3, 4)])
})

test_that("sites are counted separately", {
  dat <- dplyr::tribble(
    ~site_id, ~date, ~event_type, ~crop,
    "a", "2016-03-01", "planting", "grape",
    "b", "2016-03-01", "planting", "grape",
    "c", "2023-03-01", "planting", "grape",
  )
  res <- find_crop_changes(dat)
  expect_equal(nrow(res), 3)
  expect_equal(res$date, dat$date)
  expect_equal(res$site_id, dat$site_id)
})

test_that("reads from JSON", {
  path <- system.file(
    "events_fixtures/events_site1.json",
    package = "PEcAn.data.land"
  )
  res <- events_to_crop_cycle_starts(path)
  expect_equal(res$date, "2022-02-19")
  expect_equal(res$crop, "EX1")
})
```
I think it would be helpful to have helper functions that convert events.json to and from tables, e.g. `events_json_to_table()` and `events_table_to_json()`. Please either implement or convert this comment to an issue.
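If the helper route is taken, the read direction could be little more than the first half of the pipeline already in this PR. A sketch, where the name `events_json_to_table()` is this comment's suggestion, not an existing PEcAn function:

```r
# Hypothetical helper, factored out of the existing pipeline in
# events_to_crop_cycle_starts(). Name and interface are a sketch only.
events_json_to_table <- function(event_json) {
  jsonlite::read_json(event_json) |>
    dplyr::bind_rows() |>
    dplyr::mutate(events = purrr::map(.data$events, as.data.frame)) |>
    tidyr::unnest(.data$events)
}
```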
Where else do you expect to want to use these, and for what fraction of event usage? If we'll ~always want to process events in table format, then maybe they should be stored that way instead of unnesting from JSON all the time.
Each event type is already being generated in a "tidy" tabular format, so staying tabular is easier and more compact for us. But the crux of the problem comes when you have to interleave different event types chronologically, as each event type has different variables associated with it. We solved that in SIPNET by not having column headers -- you just have to know from the metadata the position of the different variables in each row. That remains an option here, but we'll lose the ability for the data frame to play nice with lots of R tools (e.g., tidyverse). Other options are a wide format (all possible event column names, most irrelevant for most events) or a long format (e.g., datetime, site, variable, value), which will result in each event taking up multiple rows.
This sounds to me like we don't yet have a single target table format, so I think it's premature to try to write the helper for it in this PR. It seems very plausible that the format will be context-specific: here I unnested first to get a wide (and sparse!) table and it was fine, but in other cases we might want to filter the still-nested events by event type/crop/etc. so that we can unnest to a form with fewer NAs.
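For instance, the filter-before-unnest idea might look something like the sketch below. This is hypothetical code, not part of this PR, and it assumes events.json is a list of site objects each carrying an `events` array of event records (the same shape the pipeline above reads).

```r
# Hypothetical sketch: keep only one event type inside each site's nested
# event list before flattening, so the resulting wide table carries fewer
# all-NA columns than unnesting everything up front.
events_of_type_to_table <- function(event_json, type) {
  jsonlite::read_json(event_json) |>
    purrr::map(function(site) {
      site$events |>
        purrr::keep(function(e) identical(e$event_type, type)) |>
        dplyr::bind_rows() |>
        dplyr::mutate(site_id = site$site_id)
    }) |>
    dplyr::bind_rows()
}
```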