Commit 3aa22ef

docs: add benchmarks
1 parent c572300 commit 3aa22ef

8 files changed

Lines changed: 152 additions & 27 deletions

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -26,4 +26,4 @@ Config/testthat/edition: 3
 URL: https://github.com/atsyplenkov/tidyhydro, https://atsyplenkov.github.io/tidyhydro/
 BugReports: https://github.com/atsyplenkov/tidyhydro/issues
 LazyData: true
-Config/Needs/website: bench, ggplot2, quarto
+Config/Needs/website: bench, ggplot2, quarto, lubridate, dplyr

NEWS.md

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@
 - Improved documentation by switching from `\url` to `\doi`
 - Removed unicode characters α, β

+## Miscellaneous
+- Created website with vignettes (https://atsyplenkov.github.io/tidyhydro)
+
 # tidyhydro 0.1.0

 ## New features

R/nse.R

Lines changed: 5 additions & 5 deletions
@@ -1,4 +1,4 @@
-#' Nash-Sutcliffe efficiency (NSE)
+#' Nash-Sutcliffe Efficiency (NSE)
 #'
 #' @description
 #' Calculate the Nash-Sutcliffe efficiency (*Nash & Sutcliffe, 1970*).
@@ -8,8 +8,8 @@
 #' @details
 #' The Nash-Sutcliffe efficiency is a normalized statistic that determines
 #' the relative magnitude of the residual variance ("noise") compared to the
-#' measured data variance ("information"; *Nash and Sutcliffe, 1970*).
-#'
+#' measured data variance ("information"; *Nash and Sutcliffe, 1970*).
+#'
 #' The formula for NSE is:
 #'
 #' \deqn{
@@ -25,8 +25,8 @@
 #' \item \eqn{obs} defines model observations at time step \eqn{i}
 #' \item \eqn{\mu_{obs}} defines mean of model observations
 #' }
-#'
-#' According to Moriasi et al. (2015) the metric interpretation can be
+#'
+#' According to Moriasi et al. (2015) the metric interpretation can be
 #' as follows:
 #'
 #' - **Excellent**/**Very Good** -- `nse()` > 0.8
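
The roxygen block in this hunk documents the standard NSE definition, $NSE = 1 - \sum_i (sim_i - obs_i)^2 / \sum_i (obs_i - \mu_{obs})^2$. As a rough illustration, it can be written in a few lines of base R. This is a minimal sketch that assumes two complete numeric vectors of equal length; `nse_base` is a hypothetical name used only here and is not a `tidyhydro` function, and the package's own C++ implementation may handle edge cases (such as missing values) differently.

```r
# Base-R sketch of the Nash-Sutcliffe efficiency.
# `nse_base` is a hypothetical helper for illustration, not part of tidyhydro.
nse_base <- function(obs, sim) {
  stopifnot(is.numeric(obs), is.numeric(sim), length(obs) == length(sim))
  residual_ss <- sum((sim - obs)^2)        # residual variance ("noise")
  total_ss <- sum((obs - mean(obs))^2)     # measured data variance ("information")
  1 - residual_ss / total_ss
}

# Toy vectors (made up for the example)
obs <- c(1, 2, 3, 4, 5)
sim <- c(1.1, 1.9, 3.2, 3.8, 5.1)
nse_base(obs, sim)
```

A perfect simulation gives 1, and a simulation that always predicts the observed mean gives 0, which is why values above 0.8 are read as very good.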

README.Rmd

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ pak::pak("atsyplenkov/tidyhydro")
 ```

 ## Benchmarking
-Since the package uses `Rcpp` in the background, it performs slightly faster than base R and other R packages. This is particularly noticeable with large datasets:
+Since the package uses `Rcpp` in the background, it performs slightly faster than base R and other R packages (see [benchmarks](https://atsyplenkov.github.io/tidyhydro/articles/benchmarks.html)). This is particularly noticeable with large datasets:
 ```{r benchmarking}
 set.seed(12234)
 x <- runif(10^6)

README.md

Lines changed: 11 additions & 13 deletions
@@ -4,9 +4,7 @@
 # tidyhydro

 <!-- badges: start -->
-
 <p align="center">
-
 <a href="https://github.com/atsyplenkov/tidyhydro/releases">
 <img src="https://img.shields.io/github/v/release/atsyplenkov/tidyhydro?style=flat&labelColor=1C2C2E&color=198ce7&logo=GitHub&logoColor=white"></a>
 <a href="https://cran.r-project.org/package=tidyhydro">
@@ -16,7 +14,6 @@
 <a href="https://github.com/atsyplenkov/tidyhydro/actions/workflows/check-r-pkg.yaml">
 <img src="https://img.shields.io/github/actions/workflow/status/atsyplenkov/tidyhydro/check-r-pkg.yaml?style=flat&labelColor=1C2C2E&color=256bc0&logo=GitHub%20Actions&logoColor=white"></a>
 </p>
-
 <!-- badges: end -->

 The `tidyhydro` package provides a set of commonly used metrics in
@@ -116,8 +113,9 @@ pak::pak("atsyplenkov/tidyhydro")
 ## Benchmarking

 Since the package uses `Rcpp` in the background, it performs slightly
-faster than base R and other R packages. This is particularly noticeable
-with large datasets:
+faster than base R and other R packages (see
+[benchmarks](https://atsyplenkov.github.io/tidyhydro/articles/benchmarks.html)).
+This is particularly noticeable with large datasets:

 ``` r
 set.seed(12234)
@@ -142,15 +140,15 @@ bench::mark(
 #> # A tibble: 3 × 6
 #> expression min median `itr/sec` mem_alloc `gc/sec`
 #> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 tidyhydro 1 1 30.3 NaN NaN
-#> 2 hydroGOF 21.7 24.3 1 Inf Inf
-#> 3 baseR 13.4 15.1 1.94 Inf Inf
+#> 1 tidyhydro 1 1 29.2 NaN NaN
+#> 2 hydroGOF 15.8 21.2 1 Inf Inf
+#> 3 baseR 8.54 10.6 2.32 Inf Inf
 ```

 ## See also

-- [`hydroGOF`](https://github.com/hzambran/hydroGOF) - Goodness-of-fit
-functions for comparison of simulated and observed hydrological time
-series.
-- [`yardstick`](https://github.com/tidymodels/yardstick/tree/main) -
-tidy methods for models performance assessment.
+- [`hydroGOF`](https://github.com/hzambran/hydroGOF) - Goodness-of-fit
+functions for comparison of simulated and observed hydrological time
+series.
+- [`yardstick`](https://github.com/tidymodels/yardstick/tree/main) -
+tidy methods for models performance assessment.
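
The benchmark table in this hunk appears to report relative rather than absolute values. Assuming the README calls `bench::mark()` with `relative = TRUE` (as the vignette benchmarks do; the call itself lies outside the hunk), each column is divided by its smallest entry, so the best expression shows 1. The sketch below illustrates that normalisation in base R. The absolute timings are made up for the example and chosen only so that the ratios match the new `median` column; they are not measurements.

```r
# Hypothetical absolute median timings in milliseconds (illustrative only,
# not measured values from the package).
median_ms <- c(tidyhydro = 2.0, hydroGOF = 42.4, baseR = 21.2)

# Relative timings: divide every entry by the fastest one,
# so the fastest expression is reported as 1.
relative_median <- median_ms / min(median_ms)
relative_median
```

Read this way, a `median` of 21.2 for `hydroGOF` means its median run took about 21 times longer than the `tidyhydro` call on the same input.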

man/nse.Rd

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

vignettes/articles/benchmarks.qmd

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+---
+title: "Benchmarks"
+knitr:
+  opts_chunk:
+    collapse: true
+    comment: '#>'
+    message: false
+---
+
+Since `tidyhydro` uses C++ under the hood, it runs faster than similar R packages (such as `hydroGOF`). The difference is most noticeable in large datasets with more than 1000 observations ($N > 1000$).
+
+```{r}
+#| label: setup
+library(tidyhydro)
+library(hydroGOF)
+```
+
+# Default dataset `avacha`
+
+```{r}
+# NSE
+bench::mark(
+  tidyhydro = nse_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::NSE(sim = avacha$sim, obs = avacha$obs),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# KGE
+bench::mark(
+  tidyhydro = kge_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::KGE(sim = avacha$sim, obs = avacha$obs, method = "2009"),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# KGE'
+bench::mark(
+  tidyhydro = kge2012_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::KGE(sim = avacha$sim, obs = avacha$obs, method = "2012"),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# pBIAS
+bench::mark(
+  tidyhydro = pbias_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::pbias(sim = avacha$sim, obs = avacha$obs, dec = 9),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+
+# MSE
+bench::mark(
+  tidyhydro = mse_vec(truth = avacha$obs, estimate = avacha$sim),
+  hydroGOF = hydroGOF::mse(sim = avacha$sim, obs = avacha$obs),
+  relative = TRUE,
+  check = TRUE,
+  iterations = 25L,
+  filter_gc = FALSE
+)
+```
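
The KGE and KGE' benchmarks above differ only in the `method` argument passed to `hydroGOF::KGE()`. As a reminder of what the two variants compute, here is a base-R sketch of the published definitions: the 2009 form uses the ratio of standard deviations as its variability term, while the 2012 form uses the ratio of coefficients of variation. `kge_base` is a hypothetical helper written for this note; it assumes complete numeric vectors with non-zero means and is not the implementation used by either package.

```r
# Base-R sketch of the Kling-Gupta efficiency (2009) and its modified form (2012).
# `kge_base` is a hypothetical helper for illustration only.
kge_base <- function(obs, sim, method = c("2009", "2012")) {
  method <- match.arg(method)
  r <- cor(sim, obs)                 # linear correlation
  beta <- mean(sim) / mean(obs)      # bias ratio
  variability <- if (method == "2009") {
    sd(sim) / sd(obs)                # alpha: ratio of standard deviations
  } else {
    (sd(sim) / mean(sim)) / (sd(obs) / mean(obs))  # gamma: ratio of CVs
  }
  1 - sqrt((r - 1)^2 + (variability - 1)^2 + (beta - 1)^2)
}

# Toy vectors (made up for the example): the simulation doubles every observation
obs <- c(2, 4, 6, 8, 10)
sim <- 2 * obs
kge_base(obs, sim, method = "2009")
kge_base(obs, sim, method = "2012")
```

In this toy case the two variants disagree: doubling every value doubles both the mean and the standard deviation, so the 2009 form penalises bias and variability, whereas the 2012 form sees an unchanged coefficient of variation and penalises bias only.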

vignettes/articles/tidyhydro.qmd

Lines changed: 60 additions & 6 deletions
@@ -1,14 +1,19 @@
 ---
 title: "Getting started"
+format:
+  html:
+    toc: true
 knitr:
   opts_chunk:
     collapse: true
     comment: '#>'
+    message: false
 ---

 ```{r}
 #| label: setup
 #| include: false
+library(tidyhydro)
 library(ggplot2)
 theme_set(
   theme_minimal() +
@@ -18,8 +23,23 @@ theme_set(

 # Available metrics

-# Example data `avacha`
-The package comes with the mean daily water discharge values (`obs` in m^3/s) measured at the state gauging station Avacha River — Elizovo City (site No. 2090). Alongside with the measured water discharge, the mean water discharge in the last 24 hours derived from the GloFAS-ERA5 v4.0 reanalysis is provided (`sim`).
+In `tidyhydro` v`r packageVersion("tidyhydro")`, `r length(getNamespaceExports("tidyhydro"))/2` metrics are implemented.
+
+| Name | Abbr. | Function calls | Reference |
+|----------------|--------|--------|-------------|
+| Kling-Gupta Efficiency | $KGE$ | `kge`, `kge_vec` | <span style="font-size: 0.8em;">Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. (2009). *Journal of Hydrology*, 377(1–2), 80–91.</span> |
+| Modified Kling-Gupta Efficiency | $KGE'$ | `kge2012`, `kge2012_vec` | <span style="font-size: 0.8em;">Kling, H., Fuchs, M., & Paulin, M. (2012). *Journal of Hydrology*, 424–425, 264–277.</span> |
+| Nash-Sutcliffe Efficiency | $NSE$ | `nse`, `nse_vec` | <span style="font-size: 0.8em;">Nash, J. E., & Sutcliffe, J. V. (1970). *Journal of Hydrology*, 10(3), 282–290.</span> |
+| Mean Squared Error | $MSE$ | `mse`, `mse_vec` | <span style="font-size: 0.8em;">Clark, M. P., Vogel, R. M., Lamontagne, J. R., Mizukami, N., Knoben, W. J. M., Tang, G., Gharari, S., Freer, J. E., Whitfield, P. H., Shook, K. R., & Papalexiou, S. M. (2021). The Abuse of Popular Performance Metrics in Hydrologic Modeling. Water Resources Research, 57(9), e2020WR029001.</span> |
+| Percent BIAS | $pBIAS$ | `pbias`, `pbias_vec` | <span style="font-size: 0.8em;">Gupta, H. V., S. Sorooshian, and P. O. Yapo. (1999). Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrologic Eng. 4(2): 135-143.</span> |
+| PRediction Error Sum of Squares | $PRESS$ | `press`, `press_vec` | <span style="font-size: 0.8em;">Rasmussen, P. P., Gray, J. R., Glysson, G. D. & Ziegler, A. C. Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data. in U.S. Geological Survey Techniques and Methods book 3, chap. C4 53 (2009)</span> |
+| Standard Factorial Error | $SFE$ | `sfe`, `sfe_vec` | <span style="font-size: 0.8em;">Herschy, R.W. 1978: Accuracy. Chapter 10 In: Herschy, R.W. (ed.) Hydrometry - principles and practices. John Wiley and Sons, Chichester, 511 p.</span> |
+
+: Metrics currently implemented in `tidyhydro` v`r packageVersion("tidyhydro")`
+
+# `avacha` dataset
+
+The package includes the mean daily water discharge values (`obs` in m³/s) measured at the state gauging station Avacha River — Elizovo City (site No. 2090). Alongside the measured water discharge, the mean water discharge in the last 24 hours derived from the [GloFAS-ERA5 v4.0](https://confluence.ecmwf.int/display/CEMS/GloFAS+v4.0) reanalysis is provided (`sim`).

 ```{r}
 #| fig-cap: Avacha River - Elizovo City hydrograph
@@ -29,9 +49,43 @@ data(avacha)

 avacha |>
   ggplot(aes(x = date)) +
-  geom_line(aes(y = obs, color = "Measured")) +
-  geom_line(aes(y = sim, color = "Predicted")) +
-  scale_color_brewer(name = "", palette = "Set1") +
-  labs(x = "", y = "Water Discharge, m3/s")
+  geom_line(aes(y = obs, colour = "Measured")) +
+  geom_line(aes(y = sim, colour = "Predicted")) +
+  scale_colour_brewer(name = "", palette = "Set1") +
+  labs(x = "", y = "Water Discharge, m³/s")
+```
+
+# Example usage
+
+The metrics can be computed with `tidyverse`-style [syntax](https://style.tidyverse.org/). For example, to get the Nash-Sutcliffe Efficiency ($NSE$) or the Modified Kling-Gupta Efficiency ($KGE'$) for the `avacha` dataset, run:
+
+```{r}
+nse(avacha, obs, sim)
+kge2012(avacha, obs, sim)
+```

+Alternatively, `yardstick::metric_set()` can combine `tidyhydro` metrics with other `yardstick` metrics, such as $R^2$ (`rsq`), into a single metric set:
+
+```{r}
+library(yardstick)
+hydro_metrics <- metric_set(kge, pbias, rsq)
+hydro_metrics(avacha, obs, sim)
 ```
+
+This syntax is particularly useful for grouped analysis, for example, estimating model performance for each month:
+
+```{r}
+library(lubridate)
+library(dplyr)
+
+avacha |>
+  mutate(month = month(date)) |>
+  group_by(month) |>
+  nse(obs, sim)
+```
+
+The vectorised versions of the metrics, which carry the `_vec` suffix, remain available:
+
+```{r}
+nse_vec(truth = avacha$obs, estimate = avacha$sim)
+```
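
The grouped example in this hunk relies on `dplyr::group_by()` dispatching the metric once per group. The same idea can be sketched without any packages: split the data by month and apply the metric to each piece. The data frame `toy` and the helper `nse_base` below are hypothetical stand-ins made up for this note (synthetic values, not the `avacha` dataset or the package function), so the numbers it prints carry no hydrological meaning.

```r
# Hypothetical base-R NSE helper (not the tidyhydro implementation)
nse_base <- function(obs, sim) {
  1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)
}

# Synthetic daily series standing in for a real discharge record
set.seed(42)
toy <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-03-31"), by = "day")
)
toy$obs <- runif(nrow(toy), min = 10, max = 100)
toy$sim <- toy$obs + rnorm(nrow(toy), sd = 5)

# Split by calendar month and compute one NSE value per group
by_month <- split(toy, format(toy$date, "%m"))
monthly_nse <- vapply(
  by_month,
  function(d) nse_base(d$obs, d$sim),
  numeric(1)
)
monthly_nse
```

The tidy interface does the same work in one pipeline and returns a tibble with one row per group, which is easier to join back to other results.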
