
Commit db71424

ehrlinger and claude authored
fix(vignettes): static PD surfaces + 96-dpi figures to cut install size (#110)
The regression and survival partial-dependence surfaces were interactive plotly widgets. Self-contained quarto inlined plotly.js (~3.5 MB) into each vignette HTML, and figures rendered at retina 2x. Installed size was 17.1 MB (doc 16.3 MB), well over CRAN's 5 MB guideline.

Replace both surfaces with static ggplot2 heat maps, set fig-format png / fig-dpi 96 in all four vignettes, and drop the now-unused plotly Suggests. Installed size drops to ~5.5 MB (doc 4.7 MB); source tarball 9.0 -> 3.7 MB.

R CMD check --as-cran (with manual, ggraph present): pending confirmation.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 8873932 commit db71424
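
The size figures in the commit message can be checked against a local install. A minimal sketch in base R, assuming the package is installed under the name `ggRandomForests`; `dir_size_mb()` is a hypothetical helper, not part of the package, and "MB" here means 1024^2 bytes, which may differ slightly from how other tools round:

```r
# Hypothetical helper (not part of ggRandomForests): total size in MB of
# all files under `path`, searched recursively.
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE)
  sum(file.size(files), na.rm = TRUE) / 1024^2
}

# Assumption: the package is installed locally. system.file() returns ""
# when it is not, in which case nothing is reported.
pkg_dir <- system.file(package = "ggRandomForests")
if (nzchar(pkg_dir)) {
  sub_dirs <- list.dirs(pkg_dir, recursive = FALSE)
  sizes <- vapply(sub_dirs, dir_size_mb, numeric(1))
  names(sizes) <- basename(sub_dirs)
  # Largest subdirectories first; `doc` holds the rendered vignettes.
  print(round(sort(sizes, decreasing = TRUE), 1))
  cat("total:", round(dir_size_mb(pkg_dir), 1), "MB\n")
}
```

Running this before and after the change should show most of the difference in the `doc` subdirectory, if the commit message's breakdown holds for your build.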

6 files changed

Lines changed: 31 additions & 72 deletions


DESCRIPTION

Lines changed: 0 additions & 1 deletion
@@ -44,7 +44,6 @@ Suggests:
     pkgdown,
     pkgload,
     knitr,
-    plotly,
     ggraph,
     callr
 VignetteBuilder: quarto

NEWS.md

Lines changed: 5 additions & 0 deletions
@@ -17,6 +17,11 @@ ggRandomForests v3.1.0
   prose deepened with the same framing; one-line code-comment fixes;
   fixed a stale `@return` in `gg_roc()` (documented a `yvar` column the
   function does not return). No user-facing behaviour change.
+* Vignettes: the regression and survival partial-dependence surfaces are
+  now rendered as static `ggplot2` heat maps instead of interactive
+  `plotly` widgets, and figures render at 96 dpi. This cuts the installed
+  size from ~17 MB to ~5 MB (the `plotly` library is no longer bundled into
+  the vignette HTML). `plotly` is dropped from `Suggests`.
 
 ggRandomForests v3.0.0
 ======================

vignettes/ggRandomForests-regression.qmd

Lines changed: 12 additions & 36 deletions
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
@@ -56,8 +58,8 @@ Boston Housing data set [@Harrison:1978; @Belsley:1980]:
 3. **Variable selection** --- VIMP and minimal depth via `max.subtree()`
 4. **Dependence plots** --- variable dependence and partial dependence via
    `gg_variable()` and `gg_partial_rfsrc()`
-5. **Variable interactions** --- conditioning plots and interactive 3-D partial
-   dependence surfaces with **plotly**
+5. **Variable interactions** --- conditioning plots and partial dependence
+   surfaces
 
 ```{r packages}
 library(ggplot2)
@@ -346,7 +348,7 @@ plot(gg_v, xvar = "rm", alpha = 0.5) +
 The `rm` effect is strongest in low-`lstat` tracts (bottom-left panels) and
 nearly flat in high-`lstat` tracts, confirming a meaningful interaction.
 
-# Interactive Partial Dependence Surface
+# Partial Dependence Surface
 
 To visualize the joint partial dependence of `medv` on `lstat` and `rm`, we
 compute partial dependence on a grid: 25 values of `rm`, each evaluated at 25
@@ -367,38 +369,12 @@ surface_list <- lapply(rm_grid, function(rm_val) {
 surface_df <- bind_rows(surface_list)
 ```
 
-```{r plotly-surface, fig.cap="Interactive partial dependence surface: median home value as a function of lstat and rm."}
-if (requireNamespace("plotly", quietly = TRUE)) {
-  library(plotly)
-
-  surface_wide <- surface_df |>
-    select(lstat = x, rm, medv = yhat) |>
-    arrange(rm, lstat)
-
-  lstat_vals <- sort(unique(surface_wide$lstat))
-  rm_vals <- sort(unique(surface_wide$rm))
-  z_matrix <- matrix(surface_wide$medv,
-                     nrow = length(rm_vals),
-                     ncol = length(lstat_vals),
-                     byrow = TRUE)
-
-  plot_ly(x = lstat_vals, y = rm_vals, z = z_matrix) |>
-    add_surface(colorscale = "Viridis", showscale = TRUE) |>
-    layout(
-      scene = list(
-        xaxis = list(title = "Lower Status (%)"),
-        yaxis = list(title = "Rooms per Dwelling"),
-        zaxis = list(title = "Median Value ($1000s)")
-      )
-    )
-} else {
-  message("Install the plotly package for interactive 3D surfaces.")
-  ggplot(surface_df, aes(x = x, y = rm, fill = yhat)) +
-    geom_tile() +
-    scale_fill_viridis_c(name = "Median Value") +
-    labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
-    theme_bw()
-}
+```{r pd-surface, fig.cap="Partial dependence surface: median home value as a function of lstat and rm. Fill colour is the predicted median value."}
+ggplot(surface_df, aes(x = x, y = rm, fill = yhat)) +
+  geom_tile() +
+  scale_fill_viridis_c(name = "Median Value\n($1000s)") +
+  labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
+  theme_bw()
 ```
 
 The surface confirms the strong interaction: home values are highest when `lstat`
@@ -419,7 +395,7 @@ We have walked a full random forest regression analysis with
   same shapes the raw-data EDA hinted at.
 - Partial dependence from `gg_partial_rfsrc()` gave the risk-adjusted version
   of those curves: concave for `lstat`, threshold-like for `rm`.
-- Conditioning plots and the interactive surface pulled out the `lstat`--`rm`
+- Conditioning plots and the partial dependence surface pulled out the `lstat`--`rm`
   interaction, with the room-size effect strongest in high-status tracts.
 
 Notice the pattern in all of this. Each `gg_*()` function returns a tidy
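
The static replacement can be tried without fitting a forest. A minimal sketch, assuming `ggplot2` is installed; `toy_surface`, its value ranges, and the `yhat` formula are synthetic stand-ins for `surface_df`, not output from `gg_partial_rfsrc()`:

```r
library(ggplot2)

# Synthetic stand-in for surface_df: a 25 x 25 grid with the same column
# names the vignette chunk uses (x = lstat, rm, yhat). The yhat formula is
# invented purely to give the tiles some shape.
toy_surface <- expand.grid(
  x  = seq(2, 35, length.out = 25),   # hypothetical lstat range
  rm = seq(4, 8.5, length.out = 25)   # hypothetical rm range
)
toy_surface$yhat <- 20 + 4 * (toy_surface$rm - 6) - 0.4 * toy_surface$x +
  0.05 * (toy_surface$rm - 6) * (35 - toy_surface$x)

# Same layers as the pd-surface chunk: one tile per grid cell, filled by yhat.
p <- ggplot(toy_surface, aes(x = x, y = rm, fill = yhat)) +
  geom_tile() +
  scale_fill_viridis_c(name = "Median Value\n($1000s)") +
  labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
  theme_bw()

p
```

Because the result is an ordinary raster figure, it adds no JavaScript to the rendered HTML, which is the property the commit relies on to shrink the vignettes.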

vignettes/ggRandomForests-survival.qmd

Lines changed: 10 additions & 35 deletions
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
@@ -68,8 +70,8 @@ primary biliary cirrhosis (PBC) data set [@fleming:1991]:
 3. **Variable selection** --- VIMP and minimal depth via `max.subtree()`
 4. **Dependence plots** --- variable dependence and partial dependence via
    `gg_variable()` and `gg_partial_rfsrc()`
-5. **Variable interactions** --- conditioning plots and interactive 3-D partial
-   dependence surfaces with **plotly**
+5. **Variable interactions** --- conditioning plots and partial dependence
+   surfaces
 
 ```{r packages}
 library(ggplot2)
@@ -452,7 +454,7 @@ survival.
 ::: {.callout-warning}
 **Known issue (draft):** `randomForestSRC::partial.rfsrc()` currently fails for
 survival forests in randomForestSRC ≥ 3.3. The partial dependence and
-interactive surface sections below will show an error until this upstream bug
+surface sections below will show an error until this upstream bug
 is resolved. All other sections of this vignette are fully functional.
 :::
 
@@ -540,7 +542,7 @@ plot(gg_v1, xvar = "bili", alpha = 0.5) +
 The effect of bilirubin attenuates at higher albumin levels, suggesting an
 interaction between these two liver function markers.
 
-# Interactive Partial Dependence Surfaces
+# Partial Dependence Surfaces
 
 For a richer view of the interaction between bilirubin and albumin, we construct
 a partial dependence surface. We compute partial dependence on a grid of 25
@@ -564,37 +566,10 @@ surface_list <- lapply(alb_grid, function(alb_val) {
 surface_df <- bind_rows(surface_list)
 ```
 
-```{r plotly-surface, error=TRUE, fig.cap="Interactive partial dependence surface: survival as a function of bilirubin and albumin."}
+```{r pd-surface, error=TRUE, fig.height=5, fig.cap="Partial dependence surface: survival at 1 year as a function of bilirubin and albumin. Fill colour is the predicted survival probability."}
 if (!exists("surface_df")) {
-  message("surface_df not available — skipping plotly surface (see surface-data chunk error above).")
-} else if (requireNamespace("plotly", quietly = TRUE)) {
-  # Reshape for surface
-  library(plotly)
-
-  surface_wide <- surface_df |>
-    select(bili = x, albumin, survival = yhat) |>
-    arrange(albumin, bili)
-
-  # Create matrix form
-  bili_vals <- sort(unique(surface_wide$bili))
-  alb_vals <- sort(unique(surface_wide$albumin))
-  z_matrix <- matrix(surface_wide$survival,
-                     nrow = length(alb_vals),
-                     ncol = length(bili_vals),
-                     byrow = TRUE)
-
-  plot_ly(x = bili_vals, y = alb_vals, z = z_matrix) |>
-    add_surface(colorscale = "Viridis", showscale = TRUE) |>
-    layout(
-      scene = list(
-        xaxis = list(title = "Bilirubin"),
-        yaxis = list(title = "Albumin"),
-        zaxis = list(title = "Survival")
-      )
-    )
+  message("surface_df not available --- skipping surface (see surface-data chunk error above).")
 } else {
-  message("Install the plotly package for interactive 3D surfaces.")
-  # Fallback: contour plot with ggplot2
   ggplot(surface_df, aes(x = x, y = albumin, fill = yhat)) +
     geom_tile() +
     scale_fill_viridis_c(name = "Survival") +
@@ -605,7 +580,7 @@ if (!exists("surface_df")) {
 
 The surface shows that survival is highest when bilirubin is low and albumin is
 high (upper-left corner), and drops steeply as bilirubin increases or albumin
-decreases. The non-planar shape of the surface --- particularly the steep
+decreases. The curvature of the surface --- particularly the steep
 gradient at low albumin and high bilirubin --- confirms the interaction detected
 in the conditional plots.
 
@@ -685,7 +660,7 @@ We have walked a full random survival forest analysis with
   where the gap between time horizons widens.
 - Partial dependence from `gg_partial_rfsrc()` gave the risk-adjusted version
   of those curves and backed the log-transforms used in the parametric model.
-- Conditioning plots and the interactive surface drew out the
+- Conditioning plots and the partial dependence surface drew out the
   bilirubin--albumin interaction.
 - `gg_brier()` measured how accurate the predictions actually were, both
   across time and as a single CRPS summary.

vignettes/ggRandomForests.qmd

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     html-math-method: mathjax
 editor:

vignettes/varpro.qmd

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
 date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
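
The `fig-dpi: 96` setting matters because pixel count grows with the square of the resolution. A rough sketch of the arithmetic, assuming a 7 x 5 inch figure (a hypothetical size; the vignettes' actual figure dimensions are not shown in this diff) and reading "retina 2x" in the commit message as twice the linear resolution of a 96-dpi baseline:

```r
# Hypothetical helper: pixel dimensions of a figure at a given dpi.
fig_pixels <- function(width_in, height_in, dpi) {
  c(width = width_in * dpi, height = height_in * dpi)
}

std    <- fig_pixels(7, 5, dpi = 96)      # 672 x 480 pixels
retina <- fig_pixels(7, 5, dpi = 96 * 2)  # 1344 x 960 pixels

# Doubling the linear resolution quadruples the pixel count.
prod(retina) / prod(std)  # 4
```

PNG compression means file size does not scale exactly with pixel count, so the factor of four is only an upper-level guide to the savings per figure.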
