Refactor the baseline Validation Pipeline to align with GSoC Architecture #4017

Open
ayushman1210 wants to merge 14 commits into PecanProject:develop from ayushman1210:gsoc/phase2-architecture

@ayushman1210
Contributor

This issue tracks the follow-up changes required for the dataframe-first validation framework introduced in PR #3892, aligning it with the architecture outlined in my GSoC 2026 workplan.

Planned Changes

  1. Modular 4-Stage Pipeline: Refactor the monolithic run_benchmark() entry point into a flexible 4-stage pipeline: Validate -> Align -> Compute -> Plot.
  2. data.table Integration: Replace the existing naive alignment logic with high-performance data.table operations for spatial and temporal alignment, adopting patterns from PR #3934 (data prep and post bias correction for NEE and LE).
  3. BETYdb Decoupling: Ensure that the pipeline is completely independent of BETYdb IDs, relying solely on standard variable names and tabular data structures.
  4. Generalized Metrics: Expand the metric computations (RMSE, MAE, R², correlation) so that they operate cleanly on the aligned data frames, independent of legacy database structures.
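
To make the staged design concrete, here is a minimal base R sketch of how Validate, Align, Compute, and Plot could be split into separate functions. All function names and the column layout (site_id, time, variable, value) are assumptions made for illustration; they are not taken from this PR or from PR #3892. R² is computed here as 1 - SS_res / SS_tot, which is only one of several common definitions.

```r
# Hypothetical stage functions; names and column layout are assumptions,
# not the API introduced in the PR.

# Stage 1 (Validate): check that the expected columns are present.
validate_inputs <- function(df,
                            required = c("site_id", "time", "variable", "value")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  df
}

# Stage 2 (Align): inner join keeps only rows present in both tables.
align_pairs <- function(model, obs) {
  merge(model, obs,
        by = c("site_id", "time", "variable"),
        suffixes = c("_model", "_obs"))
}

# Stage 3 (Compute): metrics depend only on the aligned data frame.
compute_metrics <- function(aligned) {
  err <- aligned$value_model - aligned$value_obs
  ss_res <- sum(err^2)
  ss_tot <- sum((aligned$value_obs - mean(aligned$value_obs))^2)
  data.frame(
    n = nrow(aligned),
    rmse = sqrt(mean(err^2)),
    mae = mean(abs(err)),
    r2 = 1 - ss_res / ss_tot,
    correlation = cor(aligned$value_model, aligned$value_obs)
  )
}

# Stage 4 (Plot): observed vs. modeled scatter with a 1:1 reference line.
plot_comparison <- function(aligned) {
  plot(aligned$value_obs, aligned$value_model,
       xlab = "Observed", ylab = "Modeled")
  abline(0, 1, lty = 2)
  invisible(aligned)
}

# Toy data, invented for this example only.
model <- data.frame(site_id = "A", time = 1:4, variable = "NEE",
                    value = c(1, 2, 3, 4))
obs <- data.frame(site_id = "A", time = 1:4, variable = "NEE",
                  value = c(1, 2, 3, 6))

aligned <- align_pairs(validate_inputs(model), validate_inputs(obs))
metrics <- compute_metrics(aligned)
```

Because each stage takes and returns a plain data frame, any stage can be run or tested on its own, and nothing in the sketch refers to a BETYdb ID.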

Context

PR #3892 established the baseline dataframe-first concept. The changes tracked in this issue will evolve it into the generalized, high-performance toolkit planned for Phase 2 of my GSoC project.

@divine7022 self-requested a review on June 1, 2026, 17:42
@infotroph
Member

data.table Integration: Replace the existing naive alignment logic with high-performance data.table operations

Please convert this to a dplyr or base R approach. data.table is a good package that lots of people use happily, but it is not widely used in PEcAn and its syntax is confusing to folks who don't know it already.
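
As a rough illustration of the base R route, the sketch below aligns sub-daily observations with daily model output using only `aggregate()` and `merge()`. The data, the column names, and the choice of a daily mean are assumptions for the example and do not reflect the alignment code in this PR; the same join could also be written with `dplyr::inner_join()`.

```r
# Hypothetical inputs: sub-daily observations and daily model output.
obs <- data.frame(
  site_id = "A",
  time = as.POSIXct(c("2020-01-01 00:00", "2020-01-01 12:00",
                      "2020-01-02 00:00", "2020-01-02 12:00"), tz = "UTC"),
  value = c(1, 3, 5, 7)
)
model <- data.frame(
  site_id = "A",
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  value = c(2.5, 5.5, 9)
)

# Temporal alignment: collapse observations to daily means per site.
obs$date <- as.Date(obs$time, tz = "UTC")
obs_daily <- aggregate(value ~ site_id + date, data = obs, FUN = mean)

# Spatial and temporal alignment: inner join on site and date.
# Model days without observations (here 2020-01-03) are dropped.
aligned <- merge(model, obs_daily,
                 by = c("site_id", "date"),
                 suffixes = c("_model", "_obs"))
```

The result has one row per site and day with `value_model` and `value_obs` side by side, which is the shape a downstream metrics step would need.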
