docs: add benchmarks

atsyplenkov · atsyplenkov · commit 3aa22eff8d7b · 2025-06-27T13:58:05.000+12:00
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -26,4 +26,4 @@ Config/testthat/edition: 3
 URL: https://github.com/atsyplenkov/tidyhydro, https://atsyplenkov.github.io/tidyhydro/
 BugReports: https://github.com/atsyplenkov/tidyhydro/issues
 LazyData: true
-Config/Needs/website: bench, ggplot2, quarto
+Config/Needs/website: bench, ggplot2, quarto, lubridate, dplyr
diff --git a/NEWS.md b/NEWS.md
@@ -10,6 +10,9 @@
 -   Improved documentation by switching from `\url` to `\doi`
 -   Removed unicode characters α, β
 
+## Miscellaneous
+-   Created website with vignettes (https://atsyplenkov.github.io/tidyhydro)
+
 # tidyhydro 0.1.0
 
 ## New features
diff --git a/R/nse.R b/R/nse.R
@@ -1,4 +1,4 @@
-#' Nash-Sutcliffe efficiency (NSE)
+#' Nash-Sutcliffe Efficiency (NSE)
 #'
 #' @description
 #' Calculate the Nash-Sutcliffe efficiency (*Nash & Sutcliffe, 1970*).
@@ -8,8 +8,8 @@
 #' @details
 #' The Nash-Sutcliffe efficiency is a normalized statistic that determines
 #' the relative magnitude of the residual variance ("noise") compared to the
-#' measured data variance ("information"; *Nash and Sutcliffe, 1970*). 
-#' 
+#' measured data variance ("information"; *Nash and Sutcliffe, 1970*).
+#'
 #' The formula for NSE is:
 #'
 #' \deqn{
@@ -25,8 +25,8 @@
 #'   \item \eqn{obs} defines model observations at time step \eqn{i}
 #'   \item \eqn{\mu_{obs}} defines mean of model observations
 #' }
-#' 
-#' According to Moriasi et al. (2015) the metric interpretation can be 
+#'
+#' According to Moriasi et al. (2015) the metric interpretation can be
 #' as follows:
 #'
 #' - **Excellent**/**Very Good** -- `nse()` > 0.8
diff --git a/README.Rmd b/README.Rmd
@@ -73,7 +73,7 @@ pak::pak("atsyplenkov/tidyhydro")
 ```
 
 ## Benchmarking
-Since the package uses `Rcpp` in the background, it performs slightly faster than base R and other R packages. This is particularly noticeable with large datasets:
+Since the package uses `Rcpp` in the background, it performs slightly faster than base R and other R packages (see [benchmarks](https://atsyplenkov.github.io/tidyhydro/articles/benchmarks.html)). This is particularly noticeable with large datasets:
 ```{r benchmarking}
 set.seed(12234)
 x <- runif(10^6)
diff --git a/README.md b/README.md
@@ -4,9 +4,7 @@
 # tidyhydro
 
 <!-- badges: start -->
-
 <p align="center">
-
 <a href="https://github.com/atsyplenkov/tidyhydro/releases">
 <img src="https://img.shields.io/github/v/release/atsyplenkov/tidyhydro?style=flat&labelColor=1C2C2E&color=198ce7&logo=GitHub&logoColor=white"></a>
 <a href="https://cran.r-project.org/package=tidyhydro">
@@ -16,7 +14,6 @@
 <a href="https://github.com/atsyplenkov/tidyhydro/actions/workflows/check-r-pkg.yaml">
 <img src="https://img.shields.io/github/actions/workflow/status/atsyplenkov/tidyhydro/check-r-pkg.yaml?style=flat&labelColor=1C2C2E&color=256bc0&logo=GitHub%20Actions&logoColor=white"></a>
 </p>
-
 <!-- badges: end -->
 
 The `tidyhydro` package provides a set of commonly used metrics in
@@ -116,8 +113,9 @@ pak::pak("atsyplenkov/tidyhydro")
 ## Benchmarking
 
 Since the package uses `Rcpp` in the background, it performs slightly
-faster than base R and other R packages. This is particularly noticeable
-with large datasets:
+faster than base R and other R packages (see
+[benchmarks](https://atsyplenkov.github.io/tidyhydro/articles/benchmarks.html)).
+This is particularly noticeable with large datasets:
 
 ``` r
 set.seed(12234)
@@ -142,15 +140,15 @@ bench::mark(
 #> # A tibble: 3 × 6
 #>   expression   min median `itr/sec` mem_alloc `gc/sec`
 #>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
-#> 1 tidyhydro    1      1       30.3        NaN      NaN
-#> 2 hydroGOF    21.7   24.3      1          Inf      Inf
-#> 3 baseR       13.4   15.1      1.94       Inf      Inf
+#> 1 tidyhydro   1       1       29.2        NaN      NaN
+#> 2 hydroGOF   15.8    21.2      1          Inf      Inf
+#> 3 baseR       8.54   10.6      2.32       Inf      Inf
 ```
 
 ## See also
 
-- [`hydroGOF`](https://github.com/hzambran/hydroGOF) - Goodness-of-fit
-  functions for comparison of simulated and observed hydrological time
-  series.
-- [`yardstick`](https://github.com/tidymodels/yardstick/tree/main) -
-  tidy methods for models performance assessment.
+-   [`hydroGOF`](https://github.com/hzambran/hydroGOF) - Goodness-of-fit
+    functions for comparison of simulated and observed hydrological time
+    series.
+-   [`yardstick`](https://github.com/tidymodels/yardstick/tree/main) -
+    tidy methods for models performance assessment.
diff --git a/man/nse.Rd b/man/nse.Rd
diff --git a/vignettes/articles/benchmarks.qmd b/vignettes/articles/benchmarks.qmd
@@ -0,0 +1,70 @@
+---
+title: "Benchmarks"
+knitr:
+  opts_chunk:
+    collapse: true
+    comment: '#>'
+    message: false
+---
+
+Since `tidyhydro` uses C++ under the hood, it performs slightly faster than similar R packages (like `hydroGOF`). The difference is most noticeable on large datasets, where the number of observations $N$ exceeds 1000.
+
+```{r}
+#| label: setup
+library(tidyhydro)
+library(hydroGOF)
+```
+
+# Default dataset `avacha`
+
+```{r}
+# NSE
+bench::mark(
+  tidyhydro = nse_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::NSE(sim = avacha$sim, obs = avacha$obs),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# KGE
+bench::mark(
+  tidyhydro = kge_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::KGE(sim = avacha$sim, obs = avacha$obs, method = "2009"),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# KGE'
+bench::mark(
+  tidyhydro = kge2012_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::KGE(sim = avacha$sim, obs = avacha$obs, method = "2012"),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# pBIAS
+bench::mark(
+  tidyhydro = pbias_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::pbias(sim = avacha$sim, obs = avacha$obs, dec = 9),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# MSE
+bench::mark(
+  tidyhydro = mse_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::mse(sim = avacha$sim, obs = avacha$obs),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+```
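+
+For context, the NSE timed above can also be written in a few lines of base R. The sketch below uses a hypothetical helper, `nse_base()`, which is not part of `tidyhydro` or `hydroGOF`. It assumes the standard NSE definition (one minus the ratio of the residual sum of squares to the sum of squared deviations of the observations from their mean) and input vectors without missing values.
+
+```{r}
+# Hypothetical base R reference for NSE (not exported by tidyhydro).
+# Assumes `obs` and `sim` are numeric vectors of equal length without NAs.
+nse_base <- function(obs, sim) {
+  stopifnot(length(obs) == length(sim))
+  1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
+}
+
+obs <- c(1.2, 3.4, 2.8, 5.1, 4.0)
+
+# A perfect simulation scores exactly 1
+nse_base(obs, sim = obs)
+
+# Predicting the observed mean everywhere scores 0
+nse_base(obs, sim = rep(mean(obs), length(obs)))
+```
+
+A function like this could be passed to `bench::mark()` as an extra expression next to the two package calls, which is one way to obtain a base R row for comparison.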
diff --git a/vignettes/articles/tidyhydro.qmd b/vignettes/articles/tidyhydro.qmd
@@ -1,14 +1,19 @@
 ---
 title: "Getting started"
+format:
+  html:
+    toc: true
 knitr:
   opts_chunk:
     collapse: true
     comment: '#>'
+    message: false
 ---
 
 ```{r}
 #| label: setup
 #| include: false
+library(tidyhydro)
 library(ggplot2)
 theme_set(
   theme_minimal() +
@@ -18,8 +23,23 @@ theme_set(
 
 # Available metrics
 
-# Example data `avacha`
-The package comes with the mean daily water discharge values (`obs` in m^3/s) measured at the state gauging station Avacha River — Elizovo City (site No. 2090). Alongside with the measured water discharge, the mean water discharge in the last 24 hours derived from the GloFAS-ERA5 v4.0 reanalysis is provided (`sim`).
+In `tidyhydro` v`r packageVersion("tidyhydro")`, `r length(getNamespaceExports("tidyhydro"))/2` metrics are implemented.
+
+| Name | Abbr. | Function calls | Reference |
+|----------------|--------|--------|-------------|
+| Kling-Gupta Efficiency | $KGE$ | `kge`, `kge_vec` | <span style="font-size: 0.8em;">Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. (2009). *Journal of Hydrology*, 377(1–2), 80–91.</span> |
+| Modified Kling-Gupta Efficiency | $KGE'$ | `kge2012`, `kge2012_vec` | <span style="font-size: 0.8em;">Kling, H., Fuchs, M., & Paulin, M. (2012). *Journal of Hydrology*, 424–425, 264–277.</span> |
+| Nash-Sutcliffe Efficiency | $NSE$ | `nse`, `nse_vec` | <span style="font-size: 0.8em;">Nash, J. E., & Sutcliffe, J. V. (1970). *Journal of Hydrology*, 10(3), 282–290.</span> |
+| Mean Squared Error | $MSE$ | `mse`, `mse_vec` | <span style="font-size: 0.8em;">Clark, M. P., Vogel, R. M., Lamontagne, J. R., Mizukami, N., Knoben, W. J. M., Tang, G., Gharari, S., Freer, J. E., Whitfield, P. H., Shook, K. R., & Papalexiou, S. M. (2021). The Abuse of Popular Performance Metrics in Hydrologic Modeling. Water Resources Research, 57(9), e2020WR029001.</span> |
+| Percent BIAS | $pBIAS$ | `pbias`, `pbias_vec` | <span style="font-size: 0.8em;">Gupta, H. V., S. Sorooshian, and P. O. Yapo. (1999). Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrologic Eng. 4(2): 135-143 </span> |
+| PRediction Error Sum of Squares | $PRESS$ | `press`, `press_vec` | <span style="font-size: 0.8em;"> Rasmussen, P. P., Gray, J. R., Glysson, G. D. & Ziegler, A. C. Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data. in U.S. Geological Survey Techniques and Methods book 3, chap. C4 53 (2009)</span> |
+| Standard Factorial Error | $SFE$ | `sfe`, `sfe_vec` | <span style="font-size: 0.8em;"> Herschy, R.W. 1978: Accuracy. Chapter 10 In: Herschy, R.W. (ed.) Hydrometry - principles and practices. John Wiley and Sons, Chichester, 511 p.</span> |
+
+: Metrics currently implemented in `tidyhydro` v`r packageVersion("tidyhydro")`
+
+# `avacha` dataset
+
+The package includes the mean daily water discharge values (`obs` in m³/s) measured at the state gauging station Avacha River — Elizovo City (site No. 2090). Alongside the measured water discharge, the mean water discharge in the last 24 hours derived from the [GloFAS-ERA5 v4.0](https://confluence.ecmwf.int/display/CEMS/GloFAS+v4.0) reanalysis is provided (`sim`).
 
 ```{r}
 #| fig-cap: Avacha River - Elizovo City hydrograph
@@ -29,9 +49,43 @@ data(avacha)
 
 avacha |>
   ggplot(aes(x = date)) +
-  geom_line(aes(y = obs, color = "Measured")) +
-  geom_line(aes(y = sim, color = "Predicted")) +
-  scale_color_brewer(name = "", palette = "Set1") +
-  labs(x = "", y = "Water Discharge, m3/s")
+  geom_line(aes(y = obs, colour = "Measured")) +
+  geom_line(aes(y = sim, colour = "Predicted")) +
+  scale_colour_brewer(name = "", palette = "Set1") +
+  labs(x = "", y = "Water Discharge, m³/s")
+```
+
+# Example usage
+
+One can estimate the desired metrics using the `tidyverse` [syntax](https://style.tidyverse.org/). For example, to get the Nash-Sutcliffe Efficiency ($NSE$) or Modified Kling-Gupta Efficiency ($KGE'$) for the `avacha` dataset, one can run:
+
+```{r}
+nse(avacha, obs, sim)
+kge2012(avacha, obs, sim)
+```
 
+Alternatively, the `yardstick` helper `metric_set()` can bundle `tidyhydro` metrics together with other `yardstick` metrics, such as $R^2$:
+
+```{r}
+library(yardstick)
+hydro_metrics <- metric_set(kge, pbias, rsq)
+hydro_metrics(avacha, obs, sim)
 ```
+
+Such syntax is particularly useful when running a group analysis, for example, estimating model performance for different months:
+
+```{r}
+library(lubridate)
+library(dplyr)
+
+avacha |>
+  mutate(month = month(date)) |>
+  group_by(month) |>
+  nse(obs, sim)
+```
+
+Alternatively, one can still use the vectorised versions of the metrics, whose names end with the `_vec` suffix:
+
+```{r}
+nse_vec(truth = avacha$obs, estimate = avacha$sim)
+```
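+
+The vectorised functions can also be called inside `dplyr::summarise()`, which may be convenient when other summaries are computed in the same step. This is a minimal sketch, assuming the `avacha` columns `date`, `obs` and `sim` used above. The output column names `n_days` and `nse` are illustrative only.
+
+```{r}
+library(tidyhydro)
+library(dplyr)
+library(lubridate)
+
+# Monthly NSE next to the number of daily records in each month
+monthly <- avacha |>
+  mutate(month = month(date)) |>
+  group_by(month) |>
+  summarise(
+    n_days = n(),
+    nse = nse_vec(truth = obs, estimate = sim)
+  )
+
+monthly
+```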