Change density default bw = "nrd0" to bw = "sj"#5854
Conversation
|
I'm not sure I think this is worth the breaking change to be honest... I probably miss a compelling example of when it would improve a visualisation |
|
I don't know enough about these bandwidth estimators to make a qualified judgement. However, this post shares some thoughts. library(ggplot2)
set.seed(0)
df <- data.frame(x = c(
runif(20, 9, 11),
runif(10, 19, 21),
runif(20, 29, 31),
runif(30, 39, 41)
))
ggplot(df, aes(x)) +
geom_density(bw = "nrd0") +
geom_rug() +
ggtitle("bw = 'nrd0'")Whereas ggplot(df, aes(x)) +
geom_density(bw = "sj") +
geom_rug() +
ggtitle("bw = 'sj'")Counter-argument might be that if you have some exponential data, df <- data.frame(x = rexp(1000))
ggplot(df, aes(x)) +
geom_density(bw = "nrd0", bounds = c(0, Inf)) +
geom_rug() +
ggtitle("bw = 'nrd0'")Whereas ggplot(df, aes(x)) +
geom_density(bw = "sj", bounds = c(0, Inf)) +
geom_rug() +
ggtitle("bw = 'sj'")Created on 2024-05-20 with reprex v2.1.0 It is probably a trade-off that I cannot motivate very well. |
|
I could perhaps dig up some examples if there is interest. I changed the ggdist default to the equivalent of |
|
For example set.seed(20240522)
data.frame(x = rlnorm(10000)) |>
ggplot() +
geom_density(aes(x, color = "sj"), bw = "sj", bounds = c(0, Inf), n = 1000) +
geom_density(aes(x, color = "nrd0"), bw = "nrd0", bounds = c(0, Inf), n = 1000) +
geom_function(fun = dlnorm, n = 1000, linetype = "22") +
coord_cartesian(xlim = c(0,5))It is worth pointing out that density(rep(1, 10), bw = "SJ")
## Error in bw.SJ(x, method = "ste") : sample is too sparse to find TDggdist handles this by falling back to |
|
Thanks @mjskay, I appreciate your thoughts on the matter. I tend to agree with you and the authors of |





This PR aims to fix #3825.
Briefly,
?densityrecommends the"sj"method over the"nrd0"default. This change propagates that recommendation to ggplot2's density calculations.Some details:
calc_bw()andprecompute_bw()were doing the exact same task, so I merged these.var(x) == 0, most of thestats::bw.*()functions throw an error, so we keepbw.nrd0()for these degenerate cases because it will not throw an error.