ehrlinger
diff --git a/CRAN-SUBMISSION b/CRAN-SUBMISSION
Lines changed: 3 additions & 3 deletions
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 2 additions & 3 deletions
diff --git a/NEWS.md b/NEWS.md
Lines changed: 36 additions & 1 deletion
diff --git a/R/calc_roc.R b/R/calc_roc.R
Lines changed: 4 additions & 3 deletions
diff --git a/R/gg_beta_varpro.R b/R/gg_beta_varpro.R
Lines changed: 27 additions & 16 deletions
diff --git a/R/gg_brier.R b/R/gg_brier.R
Lines changed: 30 additions & 17 deletions
diff --git a/R/gg_isopro.R b/R/gg_isopro.R
Lines changed: 4 additions & 4 deletions
diff --git a/R/gg_ivarpro.R b/R/gg_ivarpro.R
Lines changed: 23 additions & 11 deletions
@@ -1,3 +1,3 @@
-Version: 2.7.3
-Date: 2026-05-12 13:23:24 UTC
-SHA: dd8e66f248a91e943c1c6dd1ffc2356058ac652b
+Version: 3.1.0
+Date: 2026-06-11 15:26:24 UTC
+SHA: a7d805290e69ae517d04846bf13fae6a01062fce
@@ -1,8 +1,8 @@
 Package: ggRandomForests
 Type: Package
 Title: Visually Exploring Random Forests
-Version: 3.0.0.9001
-Date: 2026-05-29
+Version: 3.1.0.9000
+Date: 2026-06-11
 Authors@R: person("John", "Ehrlinger",
   role = c("aut", "cre"),
   email = "john.ehrlinger@gmail.com")
@@ -44,7 +44,6 @@ Suggests:
     pkgdown,
     pkgload,
     knitr,
-    plotly,
     ggraph,
     callr,
     randomForestRHF
 
@@ -1,8 +1,9 @@
 Package: ggRandomForests
-Version: 3.0.0.9001
+Version: 3.1.0.9000
 
 ggRandomForests v4.0.0 (development)
 ====================================
+* Development version 3.1.0.9000, opened after the v3.1.0 CRAN release.
 * `gg_auct()` / `plot.gg_auct()`: tidy wrapper and plot for time-varying
   AUC from `randomForestRHF::auct.rhf()` (RHF Phase 2). Returns a long
   frame `time / auc / se / lower / upper / marker` with an `iauc`
@@ -16,6 +17,40 @@ ggRandomForests v4.0.0 (development)
   `requireNamespace("randomForestRHF")`. No change for users who do not
   install it.
 
+ggRandomForests v3.1.0
+======================
+* Fix: `gg_vimp()` for single-outcome rfsrc forests now correctly flags
+  variables with non-positive VIMP in the `positive` column (affecting
+  plot coloring). The column was named `VIMP` (uppercase) in single-outcome
+  fits but the flag check accessed `$vimp` (lowercase), leaving `positive`
+  stuck at `TRUE` for all variables. Surfaced by the Copilot review on
+  PR #109.
+* Documentation pass. Deepened the varPro-family and rfsrc
+  importance/partial/survival help pages against the upstream
+  randomForestSRC and varPro documentation, and made the line between
+  `gg_vimp()` (permutation, Breiman-Cutler importance) and `gg_varpro()`
+  (varPro release-rule importance) explicit and cross-linked. Vignette
+  prose deepened with the same framing; one-line code-comment fixes;
+  fixed a stale `@return` in `gg_roc()` (documented a `yvar` column the
+  function does not return). No user-facing behaviour change.
+* Vignettes: the regression and survival partial-dependence surfaces are
+  now rendered as static `ggplot2` heat maps instead of interactive
+  `plotly` widgets, and figures render at 96 dpi. This cuts the installed
+  size from ~17 MB to ~5 MB (the `plotly` library is no longer bundled into
+  the vignette HTML). `plotly` is dropped from `Suggests`.
+* Check time: reduced the `R CMD check` vignette-rebuild and test timings to
+  bring the overall CRAN check comfortably under budget (CRAN flagged the
+  overall check time on the 3.1.0 submission). The regression and survival
+  vignettes use lighter forests (`ntree` 200 / 150, imputation `ntree` 100)
+  and coarser partial-dependence grids. The varpro vignette's three
+  `gg_partial_varpro()` calls and the Boston `beta.varpro()` fit (~34 s
+  combined) are precomputed offline by `vignettes/precompute_varpro.R` and
+  loaded from `vignettes/varpro_precomputed.rds`, with an automatic
+  live-computation fallback if the file is absent. The `gg_udependent()`
+  tests memoise the per-fit entropy matrix (`varPro::get.beta.entropy()`,
+  ~1.5 s and a pure function of the fit) instead of recomputing it once per
+  test. No user-facing behaviour change.
+
 ggRandomForests v3.0.0
 ======================
 * **Version jump to 3.0.0.** The varPro integration is a major scope
 
@@ -206,7 +206,7 @@ calc_roc <- function(object,
 }
 
 # Build the sensitivity/specificity table for a single class index k.
-# Plain lapply (not mclapply) — per-threshold work is a single table()
+# Plain lapply (not mclapply): per-threshold work is a single table()
 # + a few arithmetic ops (microseconds); fork overhead would dominate,
 # and the closure-scope fragility caused the earlier xtabs/Windows
 # failure. Returns a data.frame with columns sens, spec, pct.
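The following is a minimal base-R sketch of the per-threshold table described in the comment above. It is not the package's internal code. The helper `roc_table_sketch()`, its arguments, and the threshold grid are all hypothetical. The sketch assumes `prob` holds the forest's predicted probability for class k and `is_k` flags the observations whose true class is k. Each threshold costs a single `table()` call, so forking workers would cost more than the work itself.

```r
# Hypothetical helper: sensitivity/specificity for one class over a grid of
# probability thresholds, one table() call per threshold.
roc_table_sketch <- function(prob, is_k, thresholds = sort(unique(prob))) {
  obs <- factor(is_k, levels = c(FALSE, TRUE))
  rows <- lapply(thresholds, function(pct) {
    # Call an observation class k when its probability meets the threshold
    pred <- factor(prob >= pct, levels = c(FALSE, TRUE))
    tab <- table(pred, obs)
    tp <- tab["TRUE", "TRUE"]
    fn <- tab["FALSE", "TRUE"]
    tn <- tab["FALSE", "FALSE"]
    fp <- tab["TRUE", "FALSE"]
    data.frame(sens = tp / (tp + fn), spec = tn / (tn + fp), pct = pct)
  })
  do.call(rbind, rows)
}

# Toy inputs, illustrative only
prob <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
is_k <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
roc <- roc_table_sketch(prob, is_k)
```

As the threshold rises, sensitivity can only fall and specificity can only rise. This monotonicity is what the AUC step relies on after sorting.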
@@ -320,8 +320,9 @@ calc_auc <- function(x) {
   # Sort in decreasing specificity so FPR = 1-spec increases monotonically
   x <- x[order(x$spec, decreasing = TRUE), ]
 
-  # Δ(FPR) = -(Δspec)  — spec decreases, so (spec[i] - spec[i+1]) > 0
-  # Average height of trapezoid = (sens[i] + sens[i+1]) / 2
+  # Trapezoid area = sens_avg * Δspec, where Δspec = spec[i] - spec[i+1] >= 0
+  # (spec decreases left-to-right). Since FPR = 1 - spec, Δspec is also the
+  # FPR step FPR[i+1] - FPR[i], so the sum is the standard AUC = ∫ sens d(FPR).
   auc <- (x$sens + shift(x$sens)) / 2 * (x$spec - shift(x$spec)) # nolint: object_usage_linter
   sum(auc, na.rm = TRUE)
 }
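The trapezoid rule in `calc_auc()` can be checked by hand with a self-contained sketch. `lead1()` is a hypothetical stand-in for the package's `shift()`. Following the comment, it is assumed to return `x[i + 1]` at position `i` and pad the end with `NA`. `auc_trapezoid()` is likewise illustrative.

```r
# Hypothetical stand-in for shift(): a "lead" that pads the tail with NA,
# so that x - lead1(x) gives x[i] - x[i + 1].
lead1 <- function(x) c(x[-1], NA)

auc_trapezoid <- function(roc) {
  # Sort by decreasing specificity so FPR = 1 - spec increases.
  roc <- roc[order(roc$spec, decreasing = TRUE), ]
  # Average height times the step in spec (= step in FPR); last term is NA.
  area <- (roc$sens + lead1(roc$sens)) / 2 * (roc$spec - lead1(roc$spec))
  sum(area, na.rm = TRUE)
}

perfect  <- data.frame(sens = c(1, 1),      spec = c(1, 0))
diagonal <- data.frame(sens = c(0, 0.5, 1), spec = c(1, 0.5, 0))
auc_trapezoid(perfect)
auc_trapezoid(diagonal)
```

The sum only covers the FPR range spanned by the table's rows. A curve that omits the (0, 0) or (1, 1) corners will therefore under-count area.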
 
@@ -11,11 +11,22 @@
 #' `beta.varpro()` step once and reuse the result.
 #'
 #' @section What this is doing:
-#' For each rule (a tree-branch pair) in the forest, [varPro::beta.varpro()]
-#' fits a one-predictor lasso regression of the response on the released
-#' variable's values, restricted to the OOB observations inside the rule's
-#' region. The wrapper aggregates those per-rule coefficients into one
-#' number per variable.
+#' Think of the varPro release-rule mechanism as asking: "given a region of
+#' the feature space that the forest carved out, what changes when I remove
+#' the constraint on this one variable and let observations leave?" The
+#' standard importance answer (from [gg_varpro()]) measures that change as a
+#' z-scored contrast between local estimators: no synthetic data, no
+#' permutation. \code{beta.varpro()} asks the same question with a different
+#' ruler: for each rule (a tree-branch pair), it fits a one-predictor lasso
+#' regression of the response on the released variable's values, restricted
+#' to the OOB observations inside the rule's region. The wrapper aggregates
+#' those per-rule coefficients into one number per variable.
+#'
+#' The key distinction from [gg_vimp()] is that neither [gg_varpro()] nor
+#' \code{gg_beta_varpro()} touches the data synthetically: all contrasts
+#' are between real subsets defined by the forest's rules. [gg_vimp()]
+#' instead measures Breiman-Cutler permutation importance, permuting a
+#' variable's values and watching OOB error climb.
 #'
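With a single standardised predictor, the lasso has a closed form: the coefficient is the soft-thresholded covariance between predictor and response. This helps show what a one-predictor lasso does inside one rule. It is a generic illustration under the glmnet-style Gaussian objective, not varPro's code. `beta.varpro()` chooses its own penalty through `cv.glmnet`, and the helper names below are hypothetical.

```r
# Soft-thresholding operator: shrink z toward 0 by lambda, clipping at 0.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Lasso slope for one predictor, on the standardised-x scale. Objective:
#   (1 / (2 * n)) * sum((y - b0 - b * xs)^2) + lambda * abs(b)
# With xs centred and mean(xs^2) == 1, the minimiser is
#   b = soft_threshold(mean(xs * y), lambda).
one_predictor_lasso <- function(x, y, lambda) {
  xc <- x - mean(x)
  xs <- xc / sqrt(mean(xc^2))  # divide by the 1/n standard deviation
  soft_threshold(mean(xs * y), lambda)
}

set.seed(42)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
b_ols <- one_predictor_lasso(x, y, lambda = 0)    # no penalty: OLS slope
b_big <- one_predictor_lasso(x, y, lambda = 100)  # heavy penalty: exactly 0
```

A rule where the released variable has no local linear signal contributes an exact zero rather than a small noisy slope. That is one reason lasso coefficients aggregate more cleanly than raw slopes.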
 #' @section What `imp` actually is (pedantic, because the column name is misleading):
 #' The `imp` column on `beta.varpro()`'s `$results` is **not** a
@@ -63,7 +74,7 @@
 #'
 #' @section What you use this for:
 #' Picking variables when local effects matter more than aggregate
-#' split-strength contribution. Compare side-by-side with [gg_varpro()] —
+#' split-strength contribution. Compare side-by-side with [gg_varpro()]:
 #' a variable that scores high here but low in `gg_varpro` is one whose
 #' local linear effect inside many rules is real even though its
 #' release-rule contrast is modest.
@@ -92,7 +103,7 @@
 #' class.
 #'
 #' **Binary default**: `which_class = NULL` resolves to the *last*
-#' factor level of the response — the positive-class convention used
+#' factor level of the response, the positive-class convention used
 #' by `glm` and `gg_roc`. For a 30-day-mortality outcome with levels
 #' `c("no", "yes")`, that means the wrapper shows you `"yes"` (the
 #' event) by default.
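The binary default is positional, which the short sketch below makes concrete. `resolve_which_class()` is a hypothetical helper that restates the documented rule. It is not the wrapper's internal code.

```r
# Hypothetical helper restating the documented default: for a two-level
# response, which_class = NULL resolves to the last factor level.
resolve_which_class <- function(y, which_class = NULL) {
  lv <- levels(y)
  if (is.null(which_class) && length(lv) == 2L) {
    return(lv[length(lv)])
  }
  which_class  # user's choice, or NULL (all classes) for multiclass fits
}

y <- factor(c("no", "yes", "no"), levels = c("no", "yes"))
resolve_which_class(y)
```

Under this rule, `stats::relevel(y, ref = "yes")` would flip the default to `"no"`. It is therefore worth setting the factor levels deliberately before fitting.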
@@ -118,7 +129,7 @@
 #' @section Reproducibility:
 #' Byte-for-byte agreement between cached (`beta_fit = b`) and uncached
 #' (`beta_fit = NULL`) outputs requires that `b` was computed by
-#' `beta.varpro(object, ...)` on the same `object` — `set.seed()` alone is
+#' `beta.varpro(object, ...)` on the same `object`; `set.seed()` alone is
 #' not sufficient, because `beta.varpro`'s internal `cv.glmnet` fits can
 #' pick slightly different folds across separate calls. Reuse `beta_fit`
 #' when reproducibility matters.
@@ -132,15 +143,15 @@
 #' @param ... Forwarded to [varPro::beta.varpro()] when `beta_fit = NULL`;
 #'   ignored otherwise (with a warning). Documented forwardables: `use.cv`,
 #'   `use.1se`, `nfolds`, `maxit`, `thresh`, `max.rules.tree`, `max.tree`.
-#' @param cutoff Selection threshold on `beta_mean`. `NULL` (default) →
+#' @param cutoff Selection threshold on `beta_mean`. `NULL` (default) uses
 #'   `mean(beta_mean)` across released variables. Numeric scalar otherwise.
 #' @param beta_fit Optional pre-computed [varPro::beta.varpro()] result for
-#'   the same `object`. `NULL` (default) → the wrapper runs `beta.varpro()`
+#'   the same `object`. `NULL` (default) means the wrapper runs `beta.varpro()`
 #'   itself. When supplied, must be a `varpro`-class object whose `$results`
 #'   has columns `tree / branch / variable / n.oob / imp`.
 #' @param which_class For a classification fit, name of a single response
 #'   level to subset on. `NULL` (default) returns all classes (binary fits
-#'   resolve to the *last* factor level — the positive-class convention
+#'   resolve to the *last* factor level, the positive-class convention
 #'   used by `glm` and `gg_roc`). Ignored with a warning on regression
 #'   fits.
 #'
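Here is a sketch of the documented `cutoff` behaviour: `NULL` falls back to the mean of `beta_mean`, and otherwise a numeric scalar is used. `select_by_cutoff()` is hypothetical. The `>=` comparison is an assumption, since the text does not say whether ties at the cutoff are kept. The `beta_mean` values are made up for illustration.

```r
# Hypothetical helper; the >= comparison is an assumption.
select_by_cutoff <- function(beta_mean, cutoff = NULL) {
  if (is.null(cutoff)) cutoff <- mean(beta_mean)  # documented default
  stopifnot(is.numeric(cutoff), length(cutoff) == 1L)
  names(beta_mean)[beta_mean >= cutoff]
}

# Illustrative values only
beta_mean <- c(lstat = 0.42, rm = 0.35, age = 0.05, tax = 0.02)
select_by_cutoff(beta_mean)                # mean is 0.21
select_by_cutoff(beta_mean, cutoff = 0.04)
```

A mean-based default adapts to the scale of the coefficients. The trade-off is that a single dominant variable can pull the cutoff above every other variable.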
@@ -153,7 +164,7 @@
 #'   the same row order. `which_class` (or the binary default
 #'   last-factor-level) collapses the output to a single class.
 #'
-#' @seealso [gg_varpro()], [plot.gg_beta_varpro()], [varPro::beta.varpro()].
+#' @seealso [gg_varpro()], [gg_vimp()], [plot.gg_beta_varpro()], [varPro::beta.varpro()].
 #'
 #' @examples
 #' \donttest{
@@ -208,7 +219,7 @@ gg_beta_varpro.varpro <- function(object, ..., cutoff = NULL,
     which_class <- NULL
   }
 
-  # Capture use.cv from `...` here (NOT inside the internals — the dots
+  # Capture use.cv from `...` here (NOT inside the internals; the dots
   # don't pass through to the internal frame).
   dots_use_cv <- if (is.null(beta_fit)) isTRUE(list(...)$use.cv) else NA
 
@@ -372,7 +383,7 @@ gg_beta_varpro.varpro <- function(object, ..., cutoff = NULL,
   ord_names <- names(sort(beta_mean_total, decreasing = TRUE))
   lvl <- rev(ord_names)
 
-  # Per-class aggregation — long format
+  # Per-class aggregation: long format
   rows <- list()
   for (k in seq_len(n_classes)) {
     col <- imp_cols[k]
@@ -452,8 +463,8 @@ gg_beta_varpro.varpro <- function(object, ..., cutoff = NULL,
   class(base) <- c("gg_beta_varpro", "data.frame")
 
   # Build provenance with shape-stable cutoff:
-  # regr  → c("regr" = NA_real_)
-  # class → named NA_real_ vector, one entry per class level
+  # regr:  c("regr" = NA_real_)
+  # class: a named NA_real_ vector, one entry per class level
   if (fam == "class") {
     class_levels <- .class_levels_from_varpro(object)
     cutoff_empty <- stats::setNames(rep(NA_real_, length(class_levels)),
 
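The "shape-stable" empty cutoff can be sketched as a function whose output shape depends only on the family, never on the data. `empty_cutoff()` is a hypothetical helper. The naming of the class entries by level is an assumption, because the `setNames()` call above is cut off in this hunk.

```r
# Hypothetical helper: an all-NA cutoff whose names/length depend only on
# the family (one entry for regression, one per level for classification).
empty_cutoff <- function(fam, class_levels = NULL) {
  if (fam == "class") {
    stats::setNames(rep(NA_real_, length(class_levels)), class_levels)
  } else {
    c(regr = NA_real_)
  }
}

empty_cutoff("regr")
empty_cutoff("class", c("no", "yes"))
```

With a stable shape, downstream code can index `cutoff[["yes"]]` or `cutoff[["regr"]]` without first checking whether a cutoff was ever computed.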
@@ -16,24 +16,37 @@
 #'
 #' The Brier score asks a familiar question of any probabilistic forecast:
 #' how far did the predicted probability sit from what actually happened?
-#' For a survival forest the forecast is the predicted survival probability,
-#' and the score is computed at each event time, so the result is a curve
-#' rather than a single number -- lower is better, at every time.
+#' For a survival forest the forecast is the predicted survival probability
+#' at a given moment, and the "what happened" is whether the subject was
+#' still alive at that moment.  The score is computed at every event time,
+#' so you get a curve rather than a single number -- lower is better
+#' everywhere.  A perfect forest that predicts \code{0} for every subject
+#' who has died by that time and \code{1} for every subject still alive
+#' would score \code{0}; a forest that predicts \code{0.5} for everyone
+#' scores roughly \code{0.25} regardless of the true outcome, the usual
+#' "uninformative" benchmark.  A confidently wrong forecast can score as
+#' high as \code{1}, so \code{0.25} is a reference line, not a ceiling.
 #'
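The reference points above can be checked with an uncensored Brier score at a single time point. This sketch deliberately ignores censoring; the details section covers the weighting used for censored data. `brier_at()` is a hypothetical helper.

```r
# Uncensored Brier score at one time t: mean squared gap between predicted
# survival probability at t and the observed status at t (1 = alive).
brier_at <- function(surv_prob_t, alive_at_t) {
  mean((surv_prob_t - as.numeric(alive_at_t))^2)
}

alive <- c(TRUE, FALSE, TRUE, FALSE)
brier_at(as.numeric(alive), alive)  # perfect forecast
brier_at(rep(0.5, 4), alive)        # constant 0.5
brier_at(1 - alive, alive)          # confidently wrong
```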
-#' This function extracts that time-resolved Brier score for a survival
-#' forest grown with \code{randomForestSRC}, both overall and split by
-#' mortality-risk quartile. It also returns the continuous ranked
-#' probability score (CRPS), which is the Brier score integrated over time
-#' and divided by elapsed time -- a running average of the curve so far.
+#' This function extracts the time-resolved Brier score for a survival
+#' forest grown with \code{randomForestSRC}, both overall and broken down
+#' by mortality-risk quartile (lowest-risk to highest-risk subjects).  It
+#' also returns the continuous ranked probability score (CRPS) -- the Brier
+#' score integrated over time and divided by elapsed time, a running average
+#' that summarises prediction error up to each point on the time axis.
 #'
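The running CRPS can be sketched as a cumulative trapezoid integral of the Brier curve divided by elapsed time. Measuring elapsed time from the first time point is an assumption here, not a statement of how randomForestSRC anchors it. `running_crps()` is hypothetical.

```r
# Running CRPS: integral of the Brier curve from time[1] to time[j], divided
# by the elapsed time time[j] - time[1]. Undefined at the first point.
running_crps <- function(time, brier) {
  stopifnot(length(time) == length(brier), !is.unsorted(time))
  n <- length(time)
  seg <- diff(time) * (brier[-n] + brier[-1]) / 2  # trapezoid per interval
  cum_int <- c(0, cumsum(seg))
  elapsed <- time - time[1]
  ifelse(elapsed > 0, cum_int / elapsed, NA_real_)
}

running_crps(c(1, 2, 4, 7), rep(0.2, 4))  # flat curve: running mean stays 0.2
running_crps(c(1, 3, 5), c(1, 3, 5))      # linear curve: (t + 1) / 2
```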
-#' @details This wraps \code{\link[randomForestSRC]{get.brier.survival}} and
-#' rebuilds the quartile decomposition and running CRPS from the returned
-#' \code{brier.matx} and \code{mort} components, following the computation
-#' in the internal \code{plot.survival} function of \pkg{randomForestSRC}.
-#' Right-censored data make a plain Brier score biased, so the score uses
-#' inverse-probability-of-censoring weighting. The censoring distribution
-#' is estimated either by Kaplan-Meier (\code{cens.model = "km"}, the
-#' default) or by a separate censoring forest (\code{cens.model = "rfsrc"}).
+#' @details
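+#' Because subjects are right-censored, a plain Brier score is biased: a
+#' subject censored before time \eqn{t} has an unknown status at \eqn{t},
+#' so it can be scored neither as a death nor as a survivor.  The score
+#' here uses inverse-probability-of-censoring weighting (IPCW): each
+#' subject whose status at \eqn{t} is known is weighted by the inverse of
+#' the estimated probability of remaining uncensored that long, so the
+#' observed subjects stand in for the censored ones.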
+#' The censoring distribution is estimated either by Kaplan-Meier
+#' (\code{cens.model = "km"}, the default) or by a separate censoring
+#' forest (\code{cens.model = "rfsrc"}) when the censoring mechanism is
+#' itself covariate-dependent.
+#'
+#' Internally, this wraps \code{\link[randomForestSRC]{get.brier.survival}}
+#' and rebuilds the quartile decomposition and running CRPS from the returned
+#' \code{brier.matx} and \code{mort} components, following the approach in
+#' the internal \code{plot.survival} of \pkg{randomForestSRC}.
 #'
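The sketch below shows the textbook Graf et al. IPCW Brier score at one time point, with the censoring distribution G estimated by Kaplan-Meier as in `cens.model = "km"`. It is not randomForestSRC's implementation, and tie handling is simplified. `km_censor()` and `ipcw_brier()` are hypothetical helpers.

```r
# Kaplan-Meier estimate of the censoring survivor function G(u), treating
# censorings (status == 0) as the "events". Returns a step function.
km_censor <- function(time, status) {
  cens_times <- sort(unique(time[status == 0]))
  if (length(cens_times) == 0) return(function(u, left = FALSE) rep(1, length(u)))
  # Product-limit factor at each censoring time: 1 - censored / at risk
  factors <- vapply(cens_times, function(s) {
    1 - sum(time == s & status == 0) / sum(time >= s)
  }, numeric(1))
  surv <- cumprod(factors)
  function(u, left = FALSE) {
    # Count censoring times <= u, or < u when left = TRUE (gives G(u-))
    idx <- findInterval(u, cens_times, left.open = left)
    ifelse(idx == 0, 1, surv[pmax(idx, 1)])
  }
}

# IPCW Brier score at time t for predicted survival probabilities S_i(t).
ipcw_brier <- function(time, status, surv_prob_t, t) {
  G <- km_censor(time, status)
  died_by_t  <- time <= t & status == 1  # event observed by t
  alive_at_t <- time > t                 # known to be alive at t
  # Subjects censored at or before t get weight 0: status at t unknown
  w <- ifelse(died_by_t, 1 / G(time, left = TRUE),
              ifelse(alive_at_t, 1 / G(t), 0))
  resid2 <- ifelse(died_by_t, surv_prob_t^2, (1 - surv_prob_t)^2)
  mean(w * resid2)
}

time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)
ipcw_brier(time, status, rep(0.5, 5), t = 3.5)
```

Without censoring, every weight is 1 and the score reduces to the plain Brier score. With Kaplan-Meier weights and no tied times, the weights average to 1. This is why a constant 0.5 forecast still lands at about 0.25.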
 #' @param object A fitted \code{\link[randomForestSRC]{rfsrc}} survival
 #'   forest (\code{object$family == "surv"}).
@@ -143,7 +156,7 @@ gg_brier.rfsrc <- function(object,
   bs_quartile <- vapply(seq_len(4), function(k) {
     in_bin <- mort > mort_breaks[k] & mort <= mort_breaks[k + 1]
     if (!any(in_bin, na.rm = TRUE)) {
-      # Empty bin — can occur when mort has ties at a quantile boundary.
+      # Empty bin: can occur when mort has ties at a quantile boundary.
       return(rep(NA_real_, nrow(bs_df)))
     }
     colMeans(brier_matx[in_bin, , drop = FALSE], na.rm = TRUE)
 
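The quartile decomposition can be sketched on its own, including the empty-bin case that the comment mentions. Two details are assumptions. First, the breaks come from `stats::quantile()` with its default type. Second, the lowest break is pushed to `-Inf` so the minimum lands in bin 1 under the `(lower, upper]` rule. `brier_by_quartile()` is hypothetical.

```r
# Mean Brier curve per mortality quartile. brier_matx: subjects x times.
brier_by_quartile <- function(brier_matx, mort) {
  mort_breaks <- stats::quantile(mort, probs = seq(0, 1, 0.25), names = FALSE)
  mort_breaks[1] <- -Inf  # assumption: keep the minimum in the first bin
  vapply(seq_len(4), function(k) {
    in_bin <- mort > mort_breaks[k] & mort <= mort_breaks[k + 1]
    if (!any(in_bin, na.rm = TRUE)) {
      return(rep(NA_real_, ncol(brier_matx)))  # empty bin (tied breaks)
    }
    colMeans(brier_matx[in_bin, , drop = FALSE], na.rm = TRUE)
  }, numeric(ncol(brier_matx)))
}

brier_matx <- matrix(rep(1:8, 3), nrow = 8)  # 8 subjects, 3 time points
res  <- brier_by_quartile(brier_matx, mort = 1:8)
res2 <- brier_by_quartile(brier_matx, mort = c(1, 1, 1, 1, 1, 1, 1, 2))
```

In the second call, heavy ties make three of the breaks equal. Quartiles 2 and 3 then come back as all-`NA` columns instead of raising an error.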
@@ -20,7 +20,7 @@
 #' a typical observation sits in the dense middle of the feature cloud and
 #' takes many splits to isolate, while an unusual observation sits out
 #' near an edge and gets cut off after only a few. So \strong{the depth at
-#' which an observation is isolated is a proxy for how typical it is} —
+#' which an observation is isolated is a proxy for how typical it is}:
 #' shallow depth means anomalous, deep depth means ordinary. Average a
 #' single observation's depth across many trees and the noise washes out,
 #' leaving a stable per-observation rank.
@@ -68,7 +68,7 @@
 #'     against a fitted model and compare the test scores to the training
 #'     distribution.
 #' }
-#' The score is a \emph{rank}, not a probability of being an outlier — two
+#' The score is a \emph{rank}, not a probability of being an outlier: two
 #' observations with \code{howbad = 0.92} are both unusual, not "92\%
 #' likely to be anomalous". Pick a cutoff by looking at where the elbow
 #' rises; \code{\link{plot.gg_isopro}} can annotate either a score
@@ -86,7 +86,7 @@
 #' \code{howbad} (where \emph{higher} is more anomalous). The wrapper
 #' exposes both conventions so nothing is hidden:
 #' \itemize{
-#'   \item \code{case.depth} carries varPro's native polarity — \emph{lower
+#'   \item \code{case.depth} carries varPro's native polarity, \emph{lower
 #'     = more anomalous}. This is the unmodified output of
 #'     \code{predict(object, newdata, quantiles = FALSE)}. Use it to
 #'     cross-reference against raw varPro output.
@@ -128,7 +128,7 @@
 #'       order as the rows of the data passed to
 #'       \code{\link[varPro]{isopro}}.}
 #'     \item{case.depth}{Numeric; mean isolation depth across the forest.
-#'       Lower means the observation was isolated quickly — more
+#'       Lower means the observation was isolated sooner, i.e. it is more
 #'       anomalous.}
 #'     \item{howbad}{Numeric in \code{[0, 1]}; the \code{case.depth}
 #'       values pushed through their own empirical CDF and flipped so
 
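Two steps from this page fit in one sketch: averaging per-tree isolation depths into `case.depth`, then flipping that average into a higher-is-worse score. The depths are simulated, not produced by `varPro::isopro()`. Using `1 - ecdf(depth)(depth)` as "pushed through their own empirical CDF and flipped" is an assumption about the exact formula.

```r
# Simulated per-tree isolation depths: rows = observations, cols = trees.
set.seed(1)
n_obs <- 50
n_tree <- 200
depth <- matrix(rpois(n_obs * n_tree, lambda = 8), n_obs, n_tree)
depth[1, ] <- rpois(n_tree, lambda = 3)  # observation 1 is isolated early

case_depth <- rowMeans(depth)                        # lower = more anomalous
howbad <- 1 - stats::ecdf(case_depth)(case_depth)    # higher = more anomalous
```

Because `howbad` is a rank transform, it says where an observation sits relative to the others, not how likely it is to be an outlier.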
@@ -10,15 +10,27 @@
 #' `ivarpro()` call.
 #'
 #' @section What this is doing:
-#' `ivarpro()` walks the varPro forest's rules and, for each
-#' (observation, variable) pair, computes a scaled per-rule
-#' contribution to predicting that observation. Per-rule LOO removes
-#' the observation from its own rule before scoring. Per-region
-#' scaling (`scale = "local"`, default) standardises the contribution
-#' by the rule's local response standard deviation so values are
-#' comparable across rules of different size. Aggregating those
-#' per-rule scores into one number per (obs, variable) pair gives the
-#' `local_imp` cell.
+#' The varPro framework builds importance from release rules: for a given
+#' rule region, it compares a local estimator inside that region to what
+#' the estimator becomes after the constraint on the tested variable is
+#' removed ("released"). That contrast is summed over many rules and trees
+#' to get a global z-score: the quantity [gg_varpro()] shows. What
+#' `ivarpro()` adds is a per-observation view of the same mechanism.
+#'
+#' Concretely: `ivarpro()` walks the forest's rules and, for each
+#' (observation, variable) pair, computes a scaled per-rule contribution
+#' to predicting that observation. Per-rule LOO removes the observation
+#' from its own rule before scoring, so the contribution is not inflated
+#' by the observation having helped define the region. Per-region scaling
+#' (`scale = "local"`, default) standardises the contribution by the
+#' rule's local response standard deviation so values are comparable
+#' across rules of different size. Aggregating those per-rule scores into
+#' one number per (obs, variable) pair gives the `local_imp` cell.
+#'
+#' No permutation, no synthetic data: the contrast is always between real
+#' subsets of the observed data, defined by the forest's own rules. This
+#' is the same no-synthetic-features property that distinguishes
+#' [gg_varpro()] from [gg_vimp()]'s Breiman-Cutler permutation importance.
 #'
 #' @section What `local_imp` actually is (pedantic):
 #' `local_imp[i, v]` is the **scaled aggregated rule contribution** of
@@ -126,7 +138,7 @@
 #'   `mean(|local_imp|)` descending across all rows (the unified
 #'   ranking axis shared across facets / panels).
 #'
-#' @seealso [gg_varpro()], [gg_beta_varpro()], [varPro::ivarpro()].
+#' @seealso [gg_varpro()], [gg_vimp()], [gg_beta_varpro()], [varPro::ivarpro()].
 #'
 #' @examples
 #' \donttest{
@@ -415,7 +427,7 @@ gg_ivarpro.varpro <- function(object, ..., which_obs = NULL,
   }
 
   # Unified factor-level ordering across all (obs, class), REVERSED so the
-  # most-important variable lands at the TOP after coord_flip — shared
+  # most-important variable lands at the TOP after coord_flip; shared
   # across every class facet for alignment.
   agg <- tapply(abs(long$local_imp), long$variable, mean, na.rm = TRUE)
   ord_names <- names(sort(agg, decreasing = TRUE))
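The ordering step above can be run on toy data to confirm where variables land after `coord_flip()`. The `long` data frame here is made up, and only base R is used.

```r
# Toy long-format local importances; values are illustrative only.
long <- data.frame(
  variable  = c("lstat", "rm", "age", "lstat", "rm", "age"),
  local_imp = c(-0.9, 0.4, 0.1, 0.7, -0.5, 0.0)
)
agg <- tapply(abs(long$local_imp), long$variable, mean, na.rm = TRUE)
ord_names <- names(sort(agg, decreasing = TRUE))
# coord_flip() draws the first factor level at the bottom, so reverse the
# levels to put the most important variable at the top.
long$variable <- factor(long$variable, levels = rev(ord_names))
levels(long$variable)
```

Ranking on `mean(|local_imp|)` lets large negative and large positive contributions count equally toward a variable's position.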