Skip to content

CIMIS irrigation workflow, v1#3821

Merged
mdietze merged 71 commits intoPecanProject:developfrom
ashiklom:cimis-et
Apr 24, 2026
Merged

CIMIS irrigation workflow, v1#3821
mdietze merged 71 commits intoPecanProject:developfrom
ashiklom:cimis-et

Conversation

@ashiklom
Copy link
Copy Markdown
Member

See #3762. This is a draft with significant stylistic and performance issues. See the qmd file for an end-to-end overview of the functionality.

I definitely want to refine this, but I wanted to open it as a progress marker and to make confirm I'm on the right track.

Copy link
Copy Markdown
Member

@mdietze mdietze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks to be on the right track to knock of #3762. Before moving on the #3763 and #3764 I think the main thing that remains is to scale up the calculations to the full set of CA crop parcels using the HLS phenology data extracted by @sarahkanee. That said, if we can sneak any bits of #3763 into here I'd prioritize better defaults for minWHC

Comment thread modules/data.land/data-raw/kc_coefficients.R Outdated
Comment thread modules/data.land/R/water_balance.R Outdated
Comment thread modules/data.land/R/water_balance.R Outdated
Comment thread modules/data.land/R/water_balance.R Outdated
Comment thread modules/data.land/R/water_balance.R Outdated
@ashiklom
Copy link
Copy Markdown
Member Author

ashiklom commented Feb 24, 2026

I made the mistake of initially developing this completely offline of PEcAn and, in doing so, completely missed #3765, which implements a sizeable chunk of this work (and does so with better code). Won't be that mistake again! So we should definitely hold on merging this until I refactor my work to use that code and eliminate my redundant implementations.

That said, the refactor is a good opportunity to bang out the two companion issues mentioned above, at least in first pass form, since they seem like only minor straightforward tweaks to the water balance logic.

Can you point me to where the HLS phenology data are? If so, I can integrate that at the same time as the other changes.

@ashiklom
Copy link
Copy Markdown
Member Author

I refactored this to use the BISM coefficients from #3765 . A nice side-effect is that the vignette now runs much faster, and the logic is simpler. I also vectorized the CHIRPS download code so we only read one file per year (rather than re-reading the same year's file separately for every day, like we were doing before) and updated some of the water balance docs.

I still need to clean up the deprecated CIMIS data (I'll do that as a rebase to avoid inflating the repo size).

I also need to make some of the changes to the water balance parameterization as suggested by @mdietze.

Finally, and most importantly, I still need to integrate the HLS-based phenology information.

@ashiklom ashiklom force-pushed the cimis-et branch 2 times, most recently from 62b12a5 to e40ee6b Compare March 1, 2026 15:51
Comment thread modules/data.land/R/water_balance.R Outdated
@ashiklom ashiklom self-assigned this Mar 5, 2026
@ashiklom ashiklom changed the title WIP: Draft CIMIS irrigation workflow CIMIS irrigation workflow, v1 Mar 5, 2026
@ashiklom ashiklom force-pushed the cimis-et branch 2 times, most recently from bd86959 to 2c8858f Compare March 5, 2026 21:36
@ashiklom
Copy link
Copy Markdown
Member Author

ashiklom commented Mar 6, 2026

OK, I think I'm ready to call this PR done. It seems like a lot, but I'm hoping the changes are relatively self-contained and straightforward to review.

My suggestion for reviewing this is to start with the three new scripts in inst that show the relevant functionality and how it all fits together. From there, you can inspect the implementation details of individual functions.

The demonstration scripts are as follows:

  • ssurgo-soil-inputs.R --- a quick test to confirm grabbing arbitrary data from the SSURGO API. This includes the new ssurgo_mukey_* functionality.
  • compare-cimis-openet.R --- a comparison of CIMIS ET and OpenET ET data, including a demonstration of how to use the functions.
  • CIMIS-event-files.R --- a complete workflow that
    • starts from design points and dates
    • gets ET from CIMIS
    • gets precip from CHIRPS
    • merges design points with LandIQ to get crop types
    • merges with BISM KC data to get Kc coefficients (to get actual ET rates)
    • merges with the new crop_whc table, which contains rooting depth and MAD (whc_min_frac) values for various crops
    • extracts precalculated available water content (AWC) from SSURGO for the sites, and combines with rooting depth information to get WHC for each site
    • calculates actual ET from CIMIS ETref by using the crop-specific Kc coefficients
    • joins everything together
    • performs water balance calculations
    • writes SIPNET event files based on the irrigation information

The most important science thing to check here is the calc_water_balance function. Per my comment above, the logic here is changed from the original prototype Python implementation such that, when you dip below the irrigation threshold (MAD), you irrigate back up to field capacity (not to MAD). This leads to larger and more episodic irrigation. From my (limited) research on agriculture, this seems closer to what farmers actually do, but you all should weigh in here.

Note also that I was unaware of #3643 when I wrote the SSURGO mukey functionality, so I inadvertently recreated a (possibly worse) version of soilDB::mukey.wcs. I'll remove my code as soon as I confirm that it does what I need.

EDIT: I tried out soilDB::mukey.wcs and poked around the soilDB docs. From a cursory look, the soilDB functions are all structured around querying bounding boxes, but I don't see support for querying individual points. If you pass a sparse list of coordinates to soilDB::mukey.wcs, it takes the bounding box of the polygon and returns the result, which is much more data than you need and will fail if the total AOI is greater than 10,100 km2 (e.g., if I want to sample 10 sites across CONUS, or even 10 sites spread out over all of California). I also don't see a way to overcome the spatial extent limit except splitting up requests by hand. My ssurgo_mukeys_bigbbox handles this directly by splitting up a large bounding box into smaller cells guaranteed to be less than the spatial limit and then querying those in parallel.

Copy link
Copy Markdown
Member

@mdietze mdietze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashiklom there's a LOT here so I had to skip parts. A few quick questions before we merge

  1. Has this workflow been run to produce outputs (e.g. on the BU server) and if so is that just for the anchor sites or statewide? Please share file paths on Slack as applicable.
  2. If this has only been run for anchor sites, is there anything preventing a scaling up to statewide or any improvements you'd want to make before doing so?
  3. In writing events, are you still aggregating irrigation needs up to a weekly timescale or does the switch to irrigating to FC alleviate this need?

Comment thread modules/data.land/R/water_balance.R Outdated
@ashiklom
Copy link
Copy Markdown
Member Author

  1. No, because the current outputs are fake and potentially somewhat nonsensical in the absence of real phenology/planting inputs (per your requests to integrate Sarah's files). All development of this has been local. If the core logic of the workflow looks sound, and once I have a pointer to real files that can be used for phenology and/or planting dates, I can try to scale it up on the BU HPC.

  2. Strictly speaking, no limitations. Any memory and compute bottlenecks could be alleviated by just naive parallelization.

  3. Weekly aggregation is still there. I have no idea why it was there originally (some SIPNET limitation/expectation?), so I can't comment on whether it's still appropriate. My intuition is that, with more episodic irrigation events under the new logic, it's no longer necessary. If so, I'm happy to remove it.

Comment thread modules/data.land/inst/CIMIS-event-files.R Outdated
Copy link
Copy Markdown
Member

@infotroph infotroph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I started a review ages ago and never clicked submit 🤦. See below

Comment thread modules/data.land/NAMESPACE Outdated
@infotroph
Copy link
Copy Markdown
Member

@ashiklom OK to merge this when checks pass?

@ashiklom
Copy link
Copy Markdown
Member Author

OK, now this is ready for merge (or at least review). This now accurately reflects the actual statewide irrigation workflow we are using for the current batch of event files for the CCMMF MVP.

A few major changes since last time:

  1. Similar to SIPNET workflow for restarting with events #3919, this puts the code in the workflows folder rather than abusing data.land/inst as it did before.
  2. The entire workflow is implemented using targets. I know we originally shied away from targets, but I've found this works really well here, especially for incrementally updating the pipeline and for scaling dynamically across the BU HPC.
    • Some of these functions should eventually be migrated into data.land, but that feels like a premature generalization since this is only one very specific workflow. Suggest punting that to post-MVP.
  3. The CIMIS and CHIRPS functions I had originally developed for data.land scaled terribly for the statewide analysis. I replaced these with offline workflows (in workflows/irrigation-statewide/preprocessing) that do the time- and compute-intensive spatial extractions as separate steps. These run much faster. At some point, we can think about whether/how to integrate these more tightly into data.land, data.remote, or somewhere else.
  4. There is an explicit implementation of flooded rice.
  5. Upland canopy irrigation can be capped per-even via a irrigation_max parameter (reasonable default is 150 mm ~= 0.6 in).

Note that the CI check will currently fail because data.land now imports >= 25 packages, which causes devtools::check to raise a NOTE like:

0 warnings found in modules/data.land.
3 notes found in modules/data.land.
Error: Please fix these and resubmit.
R check of modules/data.land returned new problems:
checking package dependencies ... NOTE: Imports includes 25 non-default packages.
Execution halted

The 3 packages I added are (1) arrow, which we actually want for reading parquet files; (2) httr2, which has been the modern alternative to httr for a while now (and which is used here for SSURGO queries); and (3) tibble, which we already implicitly depend on through dplyr. It's possible to artificially silence this NOTE by replacing a bunch of calls like tibble::tibble, tibble::as_tibble, tidyselect::one_of, etc. with dplyr:: (which re-rexports all of these functions)...but that seems disingenuous. The better solution is to add this note to our NOTE checker, but that may require intervention by @infotroph or other CI wizards.

Comment thread modules/data.land/tests/Rcheck_reference.log
Comment thread modules/data.land/DESCRIPTION Outdated
@mdietze mdietze enabled auto-merge April 24, 2026 14:47
@mdietze mdietze added this pull request to the merge queue Apr 24, 2026
Merged via the queue into PecanProject:develop with commit 3d44a06 Apr 24, 2026
19 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants