From 2e09c4ea3ae881836b8904e350b6b4b131d462c4 Mon Sep 17 00:00:00 2001
From: John Ehrlinger
Date: Wed, 10 Jun 2026 10:28:32 -0400
Subject: [PATCH] fix(vignettes): static PD surfaces + 96-dpi figures to cut install size

The regression and survival partial-dependence surfaces were interactive
plotly widgets; self-contained quarto inlined plotly.js (~3.5 MB) into each
vignette HTML, and figures rendered at retina 2x. Installed size was 17.1 MB
(doc 16.3 MB), well over CRAN's 5 MB guideline.

Replace both surfaces with static ggplot2 heat maps, set fig-format png /
fig-dpi 96 in all four vignettes, and drop the now-unused plotly Suggests.
Installed size drops to ~5.5 MB (doc 4.7 MB); source tarball 9.0 -> 3.7 MB.

R CMD check --as-cran (with manual, ggraph present): pending confirmation.

Co-Authored-By: Claude Opus 4.8
---
 DESCRIPTION                              |  1 -
 NEWS.md                                  |  5 +++
 vignettes/ggRandomForests-regression.qmd | 48 ++++++------------------
 vignettes/ggRandomForests-survival.qmd   | 45 +++++-----------------
 vignettes/ggRandomForests.qmd            |  2 +
 vignettes/varpro.qmd                     |  2 +
 6 files changed, 31 insertions(+), 72 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 0b19ec4c..8a25f12c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -44,7 +44,6 @@ Suggests:
     pkgdown,
     pkgload,
     knitr,
-    plotly,
     ggraph,
     callr
 VignetteBuilder: quarto
diff --git a/NEWS.md b/NEWS.md
index d982a256..ea002a0e 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -17,6 +17,11 @@ ggRandomForests v3.1.0
   prose deepened with the same framing; one-line code-comment fixes; fixed a
   stale `@return` in `gg_roc()` (documented a `yvar` column the function does
   not return). No user-facing behaviour change.
+* Vignettes: the regression and survival partial-dependence surfaces are
+  now rendered as static `ggplot2` heat maps instead of interactive
+  `plotly` widgets, and figures render at 96 dpi. This cuts the installed
+  size from ~17 MB to ~5 MB (the `plotly` library is no longer bundled into
+  the vignette HTML). `plotly` is dropped from `Suggests`.
 
 ggRandomForests v3.0.0
 ======================
diff --git a/vignettes/ggRandomForests-regression.qmd b/vignettes/ggRandomForests-regression.qmd
index 04d6808a..60639279 100644
--- a/vignettes/ggRandomForests-regression.qmd
+++ b/vignettes/ggRandomForests-regression.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
@@ -56,8 +58,8 @@ Boston Housing data set [@Harrison:1978; @Belsley:1980]:
 3. **Variable selection** --- VIMP and minimal depth via `max.subtree()`
 4. **Dependence plots** --- variable dependence and partial dependence via
    `gg_variable()` and `gg_partial_rfsrc()`
-5. **Variable interactions** --- conditioning plots and interactive 3-D partial
-   dependence surfaces with **plotly**
+5. **Variable interactions** --- conditioning plots and partial dependence
+   surfaces
 
 ```{r packages}
 library(ggplot2)
@@ -346,7 +348,7 @@ plot(gg_v, xvar = "rm", alpha = 0.5) +
 The `rm` effect is strongest in low-`lstat` tracts (bottom-left panels) and
 nearly flat in high-`lstat` tracts, confirming a meaningful interaction.
 
-# Interactive Partial Dependence Surface
+# Partial Dependence Surface
 
 To visualize the joint partial dependence of `medv` on `lstat` and `rm`, we
 compute partial dependence on a grid: 25 values of `rm`, each evaluated at 25
@@ -367,38 +369,12 @@ surface_list <- lapply(rm_grid, function(rm_val) {
 surface_df <- bind_rows(surface_list)
 ```
 
-```{r plotly-surface, fig.cap="Interactive partial dependence surface: median home value as a function of lstat and rm."}
-if (requireNamespace("plotly", quietly = TRUE)) {
-  library(plotly)
-
-  surface_wide <- surface_df |>
-    select(lstat = x, rm, medv = yhat) |>
-    arrange(rm, lstat)
-
-  lstat_vals <- sort(unique(surface_wide$lstat))
-  rm_vals <- sort(unique(surface_wide$rm))
-  z_matrix <- matrix(surface_wide$medv,
-                     nrow = length(rm_vals),
-                     ncol = length(lstat_vals),
-                     byrow = TRUE)
-
-  plot_ly(x = lstat_vals, y = rm_vals, z = z_matrix) |>
-    add_surface(colorscale = "Viridis", showscale = TRUE) |>
-    layout(
-      scene = list(
-        xaxis = list(title = "Lower Status (%)"),
-        yaxis = list(title = "Rooms per Dwelling"),
-        zaxis = list(title = "Median Value ($1000s)")
-      )
-    )
-} else {
-  message("Install the plotly package for interactive 3D surfaces.")
-  ggplot(surface_df, aes(x = x, y = rm, fill = yhat)) +
-    geom_tile() +
-    scale_fill_viridis_c(name = "Median Value") +
-    labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
-    theme_bw()
-}
+```{r pd-surface, fig.cap="Partial dependence surface: median home value as a function of lstat and rm. Fill colour is the predicted median value."}
+ggplot(surface_df, aes(x = x, y = rm, fill = yhat)) +
+  geom_tile() +
+  scale_fill_viridis_c(name = "Median Value\n($1000s)") +
+  labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
+  theme_bw()
 ```
 
 The surface confirms the strong interaction: home values are highest when `lstat`
@@ -419,7 +395,7 @@ We have walked a full random forest regression analysis with
   same shapes the raw-data EDA hinted at.
 - Partial dependence from `gg_partial_rfsrc()` gave the risk-adjusted version of
   those curves: concave for `lstat`, threshold-like for `rm`.
-- Conditioning plots and the interactive surface pulled out the `lstat`--`rm`
+- Conditioning plots and the partial dependence surface pulled out the `lstat`--`rm`
   interaction, with the room-size effect strongest in high-status tracts.
 
 Notice the pattern in all of this. Each `gg_*()` function returns a tidy
diff --git a/vignettes/ggRandomForests-survival.qmd b/vignettes/ggRandomForests-survival.qmd
index e0dc7a9b..a42f2bdd 100644
--- a/vignettes/ggRandomForests-survival.qmd
+++ b/vignettes/ggRandomForests-survival.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
@@ -68,8 +70,8 @@ primary biliary cirrhosis (PBC) data set [@fleming:1991]:
 3. **Variable selection** --- VIMP and minimal depth via `max.subtree()`
 4. **Dependence plots** --- variable dependence and partial dependence via
    `gg_variable()` and `gg_partial_rfsrc()`
-5. **Variable interactions** --- conditioning plots and interactive 3-D partial
-   dependence surfaces with **plotly**
+5. **Variable interactions** --- conditioning plots and partial dependence
+   surfaces
 
 ```{r packages}
 library(ggplot2)
@@ -452,7 +454,7 @@ survival.
 ::: {.callout-warning}
 **Known issue (draft):** `randomForestSRC::partial.rfsrc()` currently fails for
 survival forests in randomForestSRC ≥ 3.3. The partial dependence and
-interactive surface sections below will show an error until this upstream bug
+surface sections below will show an error until this upstream bug
 is resolved. All other sections of this vignette are fully functional.
 :::
 
@@ -540,7 +542,7 @@ plot(gg_v1, xvar = "bili", alpha = 0.5) +
 The effect of bilirubin attenuates at higher albumin levels, suggesting an
 interaction between these two liver function markers.
 
-# Interactive Partial Dependence Surfaces
+# Partial Dependence Surfaces
 
 For a richer view of the interaction between bilirubin and albumin, we construct
 a partial dependence surface. We compute partial dependence on a grid of 25
@@ -564,37 +566,10 @@ surface_list <- lapply(alb_grid, function(alb_val) {
 surface_df <- bind_rows(surface_list)
 ```
 
-```{r plotly-surface, error=TRUE, fig.cap="Interactive partial dependence surface: survival as a function of bilirubin and albumin."}
+```{r pd-surface, error=TRUE, fig.height=5, fig.cap="Partial dependence surface: survival at 1 year as a function of bilirubin and albumin. Fill colour is the predicted survival probability."}
 if (!exists("surface_df")) {
-  message("surface_df not available — skipping plotly surface (see surface-data chunk error above).")
-} else if (requireNamespace("plotly", quietly = TRUE)) {
-  # Reshape for surface
-  library(plotly)
-
-  surface_wide <- surface_df |>
-    select(bili = x, albumin, survival = yhat) |>
-    arrange(albumin, bili)
-
-  # Create matrix form
-  bili_vals <- sort(unique(surface_wide$bili))
-  alb_vals <- sort(unique(surface_wide$albumin))
-  z_matrix <- matrix(surface_wide$survival,
-                     nrow = length(alb_vals),
-                     ncol = length(bili_vals),
-                     byrow = TRUE)
-
-  plot_ly(x = bili_vals, y = alb_vals, z = z_matrix) |>
-    add_surface(colorscale = "Viridis", showscale = TRUE) |>
-    layout(
-      scene = list(
-        xaxis = list(title = "Bilirubin"),
-        yaxis = list(title = "Albumin"),
-        zaxis = list(title = "Survival")
-      )
-    )
+  message("surface_df not available --- skipping surface (see surface-data chunk error above).")
 } else {
-  message("Install the plotly package for interactive 3D surfaces.")
-  # Fallback: contour plot with ggplot2
   ggplot(surface_df, aes(x = x, y = albumin, fill = yhat)) +
     geom_tile() +
     scale_fill_viridis_c(name = "Survival") +
@@ -605,7 +580,7 @@ if (!exists("surface_df")) {
 
 The surface shows that survival is highest when bilirubin is low and albumin is
 high (upper-left corner), and drops steeply as bilirubin increases or albumin
-decreases. The non-planar shape of the surface --- particularly the steep
+decreases. The curvature of the surface --- particularly the steep
 gradient at low albumin and high bilirubin --- confirms the interaction detected
 in the conditional plots.
 
@@ -685,7 +660,7 @@ We have walked a full random survival forest analysis with
   where the gap between time horizons widens.
 - Partial dependence from `gg_partial_rfsrc()` gave the risk-adjusted version of
   those curves and backed the log-transforms used in the parametric model.
-- Conditioning plots and the interactive surface drew out the
+- Conditioning plots and the partial dependence surface drew out the
   bilirubin--albumin interaction.
 - `gg_brier()` measured how accurate the predictions actually were, both across
   time and as a single CRPS summary.
diff --git a/vignettes/ggRandomForests.qmd b/vignettes/ggRandomForests.qmd
index aa113531..cfac0415 100644
--- a/vignettes/ggRandomForests.qmd
+++ b/vignettes/ggRandomForests.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     html-math-method: mathjax
 editor:
diff --git a/vignettes/varpro.qmd b/vignettes/varpro.qmd
index e3fc0533..389dd194 100644
--- a/vignettes/varpro.qmd
+++ b/vignettes/varpro.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax