Merged
1 change: 0 additions & 1 deletion DESCRIPTION
@@ -44,7 +44,6 @@ Suggests:
pkgdown,
pkgload,
knitr,
-plotly,
ggraph,
callr
VignetteBuilder: quarto
5 changes: 5 additions & 0 deletions NEWS.md
@@ -17,6 +17,11 @@ ggRandomForests v3.1.0
prose deepened with the same framing; one-line code-comment fixes;
fixed a stale `@return` in `gg_roc()` (documented a `yvar` column the
function does not return). No user-facing behaviour change.
+* Vignettes: the regression and survival partial-dependence surfaces are
+  now rendered as static `ggplot2` heat maps instead of interactive
+  `plotly` widgets, and figures render at 96 dpi. This cuts the installed
+  size from ~17 MB to ~5 MB (the `plotly` library is no longer bundled into
+  the vignette HTML). `plotly` is dropped from `Suggests`.

ggRandomForests v3.0.0
======================
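The installed-size figures in the NEWS entry above could be spot-checked locally with something like the following. This is a minimal sketch, not part of the PR: `dir_size_mb()` is a hypothetical helper, and the check assumes `ggRandomForests` is already installed in the active library.

```r
# Hypothetical helper: total on-disk size of all files under `path`, in MB.
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE)
  sum(file.size(files), na.rm = TRUE) / 1024^2
}

# find.package() raises an error when the package is not installed,
# so the lookup is wrapped in tryCatch() and the report is skipped.
pkg_dir <- tryCatch(find.package("ggRandomForests"),
                    error = function(e) NA_character_)

if (!is.na(pkg_dir)) {
  cat(sprintf("installed size: %.1f MB\n", dir_size_mb(pkg_dir)))
  # Built vignettes are installed under doc/ in the package directory.
  cat(sprintf("doc/ only:      %.1f MB\n",
              dir_size_mb(file.path(pkg_dir, "doc"))))
}
```

Comparing the `doc/` figure before and after this change would show how much of the reduction comes from the vignette HTML.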
48 changes: 12 additions & 36 deletions vignettes/ggRandomForests-regression.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
@@ -56,8 +58,8 @@ Boston Housing data set [@Harrison:1978; @Belsley:1980]:
3. **Variable selection** --- VIMP and minimal depth via `max.subtree()`
4. **Dependence plots** --- variable dependence and partial dependence via
`gg_variable()` and `gg_partial_rfsrc()`
-5. **Variable interactions** --- conditioning plots and interactive 3-D partial
-   dependence surfaces with **plotly**
+5. **Variable interactions** --- conditioning plots and partial dependence
+   surfaces

```{r packages}
library(ggplot2)
@@ -346,7 +348,7 @@ plot(gg_v, xvar = "rm", alpha = 0.5) +
The `rm` effect is strongest in low-`lstat` tracts (bottom-left panels) and
nearly flat in high-`lstat` tracts, confirming a meaningful interaction.

-# Interactive Partial Dependence Surface
+# Partial Dependence Surface

To visualize the joint partial dependence of `medv` on `lstat` and `rm`, we
compute partial dependence on a grid: 25 values of `rm`, each evaluated at 25
@@ -367,38 +369,12 @@ surface_list <- lapply(rm_grid, function(rm_val) {
surface_df <- bind_rows(surface_list)
```

-```{r plotly-surface, fig.cap="Interactive partial dependence surface: median home value as a function of lstat and rm."}
-if (requireNamespace("plotly", quietly = TRUE)) {
-  library(plotly)
-
-  surface_wide <- surface_df |>
-    select(lstat = x, rm, medv = yhat) |>
-    arrange(rm, lstat)
-
-  lstat_vals <- sort(unique(surface_wide$lstat))
-  rm_vals <- sort(unique(surface_wide$rm))
-  z_matrix <- matrix(surface_wide$medv,
-                     nrow = length(rm_vals),
-                     ncol = length(lstat_vals),
-                     byrow = TRUE)
-
-  plot_ly(x = lstat_vals, y = rm_vals, z = z_matrix) |>
-    add_surface(colorscale = "Viridis", showscale = TRUE) |>
-    layout(
-      scene = list(
-        xaxis = list(title = "Lower Status (%)"),
-        yaxis = list(title = "Rooms per Dwelling"),
-        zaxis = list(title = "Median Value ($1000s)")
-      )
-    )
-} else {
-  message("Install the plotly package for interactive 3D surfaces.")
-  ggplot(surface_df, aes(x = x, y = rm, fill = yhat)) +
-    geom_tile() +
-    scale_fill_viridis_c(name = "Median Value") +
-    labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
-    theme_bw()
-}
+```{r pd-surface, fig.cap="Partial dependence surface: median home value as a function of lstat and rm. Fill colour is the predicted median value."}
+ggplot(surface_df, aes(x = x, y = rm, fill = yhat)) +
+  geom_tile() +
+  scale_fill_viridis_c(name = "Median Value\n($1000s)") +
+  labs(x = "Lower Status (%)", y = "Rooms per Dwelling") +
+  theme_bw()
```

The surface confirms the strong interaction: home values are highest when `lstat`
@@ -419,7 +395,7 @@ We have walked a full random forest regression analysis with
same shapes the raw-data EDA hinted at.
- Partial dependence from `gg_partial_rfsrc()` gave the risk-adjusted version
of those curves: concave for `lstat`, threshold-like for `rm`.
-- Conditioning plots and the interactive surface pulled out the `lstat`--`rm`
+- Conditioning plots and the partial dependence surface pulled out the `lstat`--`rm`
interaction, with the room-size effect strongest in high-status tracts.

Notice the pattern in all of this. Each `gg_*()` function returns a tidy
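The static `pd-surface` chunk in the regression vignette expects `surface_df` in long format: one row per grid cell, with columns `x`, `rm` and `yhat`. Most of the chunk that builds it falls outside the hunk above, so here is a minimal sketch of the same layout using synthetic data. `toy_surface`, its ranges, and the `yhat` formula are invented for illustration and are not forest predictions.

```r
library(ggplot2)

# Synthetic stand-in for surface_df: 25 x 25 grid, one row per cell.
# The ranges and the yhat formula are made up for illustration only.
toy_surface <- expand.grid(
  x  = seq(2, 35, length.out = 25),    # plays the role of lstat
  rm = seq(4, 8.5, length.out = 25)    # plays the role of rm
)
toy_surface$yhat <- 20 + 3 * (toy_surface$rm - 6) - 0.4 * toy_surface$x

# Same layers as the static chunk: one tile per cell, filled by yhat.
p <- ggplot(toy_surface, aes(x = x, y = rm, fill = yhat)) +
  geom_tile() +
  scale_fill_viridis_c(name = "yhat (toy)") +
  labs(x = "x (toy lstat)", y = "rm (toy)") +
  theme_bw()

print(p)
```

Because `geom_tile()` works directly on the long data frame, the wide `z_matrix` reshaping that the removed `plotly` code needed is no longer required.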
45 changes: 10 additions & 35 deletions vignettes/ggRandomForests-survival.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax
@@ -68,8 +70,8 @@ primary biliary cirrhosis (PBC) data set [@fleming:1991]:
3. **Variable selection** --- VIMP and minimal depth via `max.subtree()`
4. **Dependence plots** --- variable dependence and partial dependence via
`gg_variable()` and `gg_partial_rfsrc()`
-5. **Variable interactions** --- conditioning plots and interactive 3-D partial
-   dependence surfaces with **plotly**
+5. **Variable interactions** --- conditioning plots and partial dependence
+   surfaces

```{r packages}
library(ggplot2)
@@ -452,7 +454,7 @@ survival.
::: {.callout-warning}
**Known issue (draft):** `randomForestSRC::partial.rfsrc()` currently fails for
survival forests in randomForestSRC ≥ 3.3. The partial dependence and
-interactive surface sections below will show an error until this upstream bug
+surface sections below will show an error until this upstream bug
is resolved. All other sections of this vignette are fully functional.
:::

@@ -540,7 +542,7 @@ plot(gg_v1, xvar = "bili", alpha = 0.5) +
The effect of bilirubin attenuates at higher albumin levels, suggesting an
interaction between these two liver function markers.

-# Interactive Partial Dependence Surfaces
+# Partial Dependence Surfaces

For a richer view of the interaction between bilirubin and albumin, we construct
a partial dependence surface. We compute partial dependence on a grid of 25
@@ -564,37 +566,10 @@ surface_list <- lapply(alb_grid, function(alb_val) {
surface_df <- bind_rows(surface_list)
```

-```{r plotly-surface, error=TRUE, fig.cap="Interactive partial dependence surface: survival as a function of bilirubin and albumin."}
+```{r pd-surface, error=TRUE, fig.height=5, fig.cap="Partial dependence surface: survival at 1 year as a function of bilirubin and albumin. Fill colour is the predicted survival probability."}
 if (!exists("surface_df")) {
-  message("surface_df not available — skipping plotly surface (see surface-data chunk error above).")
-} else if (requireNamespace("plotly", quietly = TRUE)) {
-  # Reshape for surface
-  library(plotly)
-
-  surface_wide <- surface_df |>
-    select(bili = x, albumin, survival = yhat) |>
-    arrange(albumin, bili)
-
-  # Create matrix form
-  bili_vals <- sort(unique(surface_wide$bili))
-  alb_vals <- sort(unique(surface_wide$albumin))
-  z_matrix <- matrix(surface_wide$survival,
-                     nrow = length(alb_vals),
-                     ncol = length(bili_vals),
-                     byrow = TRUE)
-
-  plot_ly(x = bili_vals, y = alb_vals, z = z_matrix) |>
-    add_surface(colorscale = "Viridis", showscale = TRUE) |>
-    layout(
-      scene = list(
-        xaxis = list(title = "Bilirubin"),
-        yaxis = list(title = "Albumin"),
-        zaxis = list(title = "Survival")
-      )
-    )
+  message("surface_df not available --- skipping surface (see surface-data chunk error above).")
 } else {
-  message("Install the plotly package for interactive 3D surfaces.")
-  # Fallback: contour plot with ggplot2
   ggplot(surface_df, aes(x = x, y = albumin, fill = yhat)) +
     geom_tile() +
     scale_fill_viridis_c(name = "Survival") +
@@ -605,7 +580,7 @@ if (!exists("surface_df")) {

The surface shows that survival is highest when bilirubin is low and albumin is
high (upper-left corner), and drops steeply as bilirubin increases or albumin
-decreases. The non-planar shape of the surface --- particularly the steep
+decreases. The curvature of the surface --- particularly the steep
gradient at low albumin and high bilirubin --- confirms the interaction detected
in the conditional plots.

@@ -685,7 +660,7 @@ We have walked a full random survival forest analysis with
where the gap between time horizons widens.
- Partial dependence from `gg_partial_rfsrc()` gave the risk-adjusted version
of those curves and backed the log-transforms used in the parametric model.
-- Conditioning plots and the interactive surface drew out the
+- Conditioning plots and the partial dependence surface drew out the
bilirubin--albumin interaction.
- `gg_brier()` measured how accurate the predictions actually were, both
across time and as a single CRPS summary.
2 changes: 2 additions & 0 deletions vignettes/ggRandomForests.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     html-math-method: mathjax
editor:
2 changes: 2 additions & 0 deletions vignettes/varpro.qmd
@@ -4,6 +4,8 @@ author: "John Ehrlinger"
date: today
 format:
   html:
+    fig-format: png
+    fig-dpi: 96
     toc: true
     toc-depth: 3
     html-math-method: mathjax