ehrlinger · May 26, 2026
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: ggRandomForests
 Type: Package
 Title: Visually Exploring Random Forests
-Version: 2.7.3.9009
+Version: 2.7.3.9010
 Date: 2026-05-21
 Authors@R: person("John", "Ehrlinger",
   role = c("aut", "cre"),

diff --git a/NEWS.md b/NEWS.md
@@ -1,8 +1,34 @@
 Package: ggRandomForests
-Version: 2.7.3.9009
+Version: 2.7.3.9010
 
 ggRandomForests v2.8.0 (development) — continued
 =================================================
+* `gg_isopro()` gains a `newdata` argument so a fitted `varPro::isopro`
+  model can score new observations into the same tidy `gg_isopro` frame.
+  Internally the wrapper calls `predict.isopro()` twice: with
+  `quantiles = FALSE` to populate the `case.depth` column (varPro's native
+  polarity, lower = more anomalous) and with `quantiles = TRUE` to compute
+  `howbad = 1 - quantile` (the wrapper convention, higher = more anomalous).
+  Both polarities are visible in the returned data frame, and the
+  relationship is named in the roxygen. The `plot` / `print` / `summary` /
+  `autoplot` S3 companions work unchanged on the new tidy frame; to overlay
+  training and test scores, bind the two extractor calls with a `method`
+  label column and pass the result to `plot()`. Second of three Phase 4
+  sub-projects.
+* **Fix (gg_isopro training-path polarity).** Bug in the original
+  `gg_isopro` (PR #94): varPro's `$howbad` on an `isopro` fit uses
+  "lower = more anomalous" polarity (it is the quantile of `case.depth`),
+  but the wrapper's plot method and documentation both assume "higher =
+  more anomalous". Train scores and the new test-data scores were
+  anti-correlated until this PR's training-path flip
+  (`howbad = 1 - object$howbad`) brought them into agreement. The bug
+  surfaced because the test-data sanity check (training-as-newdata top-5
+  overlap) failed at 0/5 instead of 5/5 before the flip. Note: the two
+  vdiffr baselines recorded in PR #94 (`gg-isopro-default` and
+  `gg-isopro-threshold`) were recorded under the inverted polarity; they
+  are visually flipped relative to the new behaviour, but CI skips
+  snapshots (`VDIFFR_RUN_TESTS = false`), so no failure surfaces. Re-record
+  with `VDIFFR_RUN_TESTS = true` when convenient.
 * Documentation: pedagogical pass over the varPro wrappers
   (`gg_partial_varpro`, `gg_varpro`, `gg_udependent` and their `plot.*`
   methods). Each help page now has explicit "What X is doing", "What's

diff --git a/R/gg_isopro.R b/R/gg_isopro.R
@@ -63,17 +63,63 @@
 #'   \item check whether a held-out cohort sits inside the training
 #'     distribution before scoring with a model trained elsewhere;
 #'   \item give the analyst a ranked list of "look at these first" cases
-#'     for a manual review.
+#'     for a manual review;
+#'   \item score a held-out cohort or a fresh batch of incoming data
+#'     against a fitted model and compare the test scores to the training
+#'     distribution.
 #' }
 #' The score is a \emph{rank}, not a probability of being an outlier — two
 #' observations with \code{howbad = 0.92} are both unusual, not "92\%
 #' likely to be anomalous". Pick a cutoff by looking at where the elbow
 #' rises; \code{\link{plot.gg_isopro}} can annotate either a score
 #' (\code{threshold}) or a top-percent (\code{top_n_pct}) for you.
 #'
+#' @section Scoring new data:
+#' Pass a \code{data.frame} as \code{newdata} and the extractor calls
+#' \code{\link[varPro]{predict.isopro}} twice: once with
+#' \code{quantiles = FALSE} to get the raw mean case depth per row, and once
+#' with \code{quantiles = TRUE} to get the per-row quantile of that depth
+#' against the training-data depth distribution.
+#'
+#' varPro's \code{predict.isopro} returns quantiles where \emph{smaller is
+#' more anomalous}, which is the opposite polarity of the wrapper's
+#' \code{howbad} (where \emph{higher} is more anomalous). The wrapper
+#' exposes both conventions so nothing is hidden:
+#' \itemize{
+#'   \item \code{case.depth} carries varPro's native polarity — \emph{lower
+#'     = more anomalous}. This is the unmodified output of
+#'     \code{predict(object, newdata, quantiles = FALSE)}. Use it to
+#'     cross-reference against raw varPro output.
+#'   \item \code{howbad} is the flipped, wrapper-convention version. The
+#'     relationship is \code{howbad = 1 - predict(object, newdata, quantiles = TRUE)}.
+#' }
+#'
+#' To overlay training and test scores in one plot, bind the two extractor
+#' calls with a \code{method} label column (the same column
+#' \code{\link{plot.gg_isopro}} uses to colour rnd / unsupv / auto
+#' comparisons):
+#'
+#' \preformatted{
+#' gg_train <- gg_isopro(fit)
+#' gg_test  <- gg_isopro(fit, newdata = test_df)
+#' gg_both  <- rbind(cbind(gg_train, method = "train"),
+#'                   cbind(gg_test,  method = "test"))
+#' class(gg_both) <- c("gg_isopro", "data.frame")
+#' plot(gg_both)
+#' }
+#'
 #' @param object An \code{isopro} fit returned by
 #'   \code{\link[varPro]{isopro}}.
-#' @param ... Currently unused.
+#' @param ... Currently unused. Placed before \code{newdata} so that
+#'   \code{newdata} can only be matched by its exact name, preserving
+#'   backward compatibility with callers of the PR #94 signature
+#'   \code{gg_isopro(object, ...)}.
+#' @param newdata Optional \code{data.frame} of new observations to score
+#'   against the fit. Must be passed by name. When \code{NULL} (default)
+#'   the extractor returns the in-sample tidy frame from the fit's stored
+#'   \code{$case.depth} and \code{$howbad}. When supplied, each row is
+#'   scored via \code{\link[varPro]{predict.isopro}} and the same tidy
+#'   shape is returned for the test data.
 #'
 #' @return A \code{data.frame} of class \code{c("gg_isopro", "data.frame")},
 #'   one row per observation. Columns:
@@ -121,40 +167,76 @@
 #' }
 #'
 #' @export
-gg_isopro <- function(object, ...) {
+gg_isopro <- function(object, ..., newdata = NULL) {
   UseMethod("gg_isopro", object)
 }
 
 #' @export
-gg_isopro.isopro <- function(object, ...) {
+gg_isopro.isopro <- function(object, ..., newdata = NULL) {
   if (!inherits(object, "isopro")) {
     stop("gg_isopro expects a 'isopro' object from varPro::isopro().",
          call. = FALSE)
   }
 
-  howbad <- as.numeric(object$howbad)
-  depth  <- as.numeric(object$case.depth)
-  n      <- length(howbad)
+  ntree <- tryCatch(
+    as.integer(object$isoforest$ntree),
+    error = function(e) NA_integer_
+  )
+  ntree <- if (length(ntree) == 1L && !is.na(ntree)) ntree else NA_integer_
+
+  ## ---- Training path (newdata = NULL) ------------------------------------
+  if (is.null(newdata)) {
+    # varPro's $howbad uses "lower = more anomalous" polarity (it is the
+    # quantile of case.depth, low depth = anomalous). The wrapper convention
+    # is "higher = more anomalous", so flip the polarity here the same way
+    # the prediction path does (howbad = 1 - quantile).
+    howbad <- 1 - as.numeric(object$howbad)
+    depth  <- as.numeric(object$case.depth)
+    n      <- length(howbad)
+
+    gg_dta <- data.frame(
+      obs        = seq_len(n),
+      case.depth = depth,
+      howbad     = howbad
+    )
+    class(gg_dta) <- c("gg_isopro", class(gg_dta))
+    attr(gg_dta, "provenance") <- list(
+      source     = "varPro::isopro",
+      n          = n,
+      ntree      = ntree,
+      prediction = FALSE
+    )
+    return(invisible(gg_dta))
+  }
+
+  ## ---- Prediction path (newdata supplied) -------------------------------
+  if (!is.data.frame(newdata)) {
+    stop("newdata must be a data.frame.", call. = FALSE)
+  }
+
+  # Two calls to predict.isopro: raw depth and quantile-against-training.
+  # The wrapper polarity is "higher = more anomalous", so we flip the quantile:
+  #   howbad = 1 - predict(object, newdata, quantiles = TRUE)
+  # case.depth keeps varPro's native scale (lower = more anomalous), giving
+  # the user a varPro-polarity number for cross-reference.
+  depth <- as.numeric(stats::predict(object, newdata = newdata,
+                                     quantiles = FALSE))
+  q     <- as.numeric(stats::predict(object, newdata = newdata,
+                                     quantiles = TRUE))
+  howbad <- 1 - q
+  n      <- nrow(newdata)
 
   gg_dta <- data.frame(
     obs        = seq_len(n),
     case.depth = depth,
     howbad     = howbad
   )
-
   class(gg_dta) <- c("gg_isopro", class(gg_dta))
-
-  # isopro-specific provenance (the shared .gg_provenance helper only knows
-  # about rfsrc / randomForest objects, so build the list inline).
-  ntree <- tryCatch(
-    as.integer(object$isoforest$ntree),
-    error = function(e) NA_integer_
-  )
   attr(gg_dta, "provenance") <- list(
-    source = "varPro::isopro",
-    n      = n,
-    ntree  = if (length(ntree) == 1 && !is.na(ntree)) ntree else NA_integer_
+    source     = "varPro::isopro",
+    n          = n,
+    ntree      = ntree,
+    prediction = TRUE
   )
-
   invisible(gg_dta)
 }
diff --git a/dev/plans/2026-05-26-varpro-phase4-predict-isopro-design.md b/dev/plans/2026-05-26-varpro-phase4-predict-isopro-design.md
@@ -0,0 +1,146 @@
+# ggRandomForests v2.8.0 — varPro Phase 4: predict.isopro Wrapper Design
+
+**Date:** 2026-05-26
+**Author:** John Ehrlinger (design via Claude brainstorming)
+**Status:** Approved — ready for implementation planning
+**Sequencing:** Second of the Phase 4 sub-projects. Builds on PR #94 (gg_isopro for the in-sample case). `gg_beta_varpro` and `gg_ivarpro` come after. Lands as one PR before the v2.8.0 release candidate.
+
+---
+
+## Goal
+
+Let users score new observations against a fitted `varPro::isopro` model with the same tidy-data ergonomics as the in-sample `gg_isopro()` call: same return shape, same plot method, same threshold semantics.
+
+## Scope
+
+A single sub-project. Implemented as one new argument on the existing `gg_isopro()` extractor — no new exported function, no new plot method. Other Phase 4 functions (`gg_beta_varpro`, `gg_ivarpro`) are tracked separately.
+
+---
+
+## Architecture
+
+```
+varPro::isopro fit  ──┐
+                      ├──► gg_isopro(object, newdata = NULL)
+data.frame (newdata) ─┘                │
+                                       └── tidy data.frame
+                                           class: c("gg_isopro", "data.frame")
+                                           cols : obs, case.depth, howbad
+                                           attr : provenance (prediction flag)
+                                                  │
+                                            plot / print / summary / autoplot
+                                                  (unchanged from PR #94)
+```
+
+The two input paths produce the same return shape. The plot/print/summary methods do not care which path produced the object.
+
+---
+
+## Extractor signature
+
+```r
+gg_isopro(object, ..., newdata = NULL)
+```
+
+- **`object`** — an `isopro` fit from `varPro::isopro`. Method dispatch via `gg_isopro.isopro`.
+- **`newdata`** — `NULL` (default) or a `data.frame`. When `NULL`, returns the training-data tidy frame (PR #94 behaviour). When a `data.frame`, scores each row against the fit and returns the same tidy shape for the test data.
+- **`...`** — currently unused.
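+
+The implementation in `R/gg_isopro.R` uses `gg_isopro(object, ..., newdata = NULL)`, placing `...` ahead of `newdata` for backward compatibility with the PR #94 signature. The sketch below uses a hypothetical toy function (`toy_extractor`, not part of ggRandomForests or varPro) with the same formals to show what that ordering does: a positional second argument is captured by `...`, and because R matches formals that follow `...` only by exact name, an abbreviation such as `newd =` never reaches `newdata`.
+
+```r
+# Hypothetical toy function with the same formals as
+# gg_isopro.isopro(object, ..., newdata = NULL); it only reports how R
+# matched the arguments.
+toy_extractor <- function(object, ..., newdata = NULL) {
+  list(newdata = newdata, dots = list(...))
+}
+
+# Positional second argument: captured by `...`, newdata stays NULL.
+toy_extractor("fit", "positional")$newdata
+
+# Exact name: reaches newdata.
+toy_extractor("fit", newdata = "test_df")$newdata
+
+# Abbreviated name: no partial matching after `...`, so it lands in `...`.
+toy_extractor("fit", newd = "test_df")$dots
+```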
+
+## Internal flow when `newdata` is supplied
+
+1. Validate: `newdata` must be a `data.frame`. Otherwise `stop()` with `"newdata must be a data.frame."`.
+2. Call `predict(object, newdata = newdata, quantiles = FALSE)` → raw mean case-depth per row.
+3. Call `predict(object, newdata = newdata, quantiles = TRUE)` → quantile per row (smaller = more anomalous, per varPro's convention).
+4. **Flip polarity** for column consistency:
+   ```r
+   howbad <- 1 - quantile
+   ```
+   With the flip, `howbad` always means "higher = more anomalous", whether the row came from the training set or `newdata`. The plot method and any `threshold = ...` value the user picks from the training elbow apply unchanged.
+5. Assemble the tidy frame:
+   ```r
+   data.frame(obs = seq_len(nrow(newdata)),
+              case.depth = case_depth_vec,
+              howbad     = howbad_vec)
+   ```
+6. Set class `c("gg_isopro", "data.frame")` and attach a provenance attribute:
+   - `source = "varPro::isopro"`
+   - `n = nrow(newdata)`
+   - `ntree` (carried from `object$isoforest$ntree`)
+   - `prediction = TRUE` (new — distinguishes test-data extractor from training)
+
+## Plot / print / summary
+
+Unchanged. The new tidy frame has the same class and columns as the training case, so every S3 companion from PR #94 works as-is.
+
+## Overlay train + test (caller pattern)
+
+No new machinery in the package; the existing `method`-column auto-detect in `plot.gg_isopro` is overloaded:
+
+```r
+gg_train <- gg_isopro(fit)
+gg_test  <- gg_isopro(fit, newdata = test_df)
+gg_both  <- rbind(cbind(gg_train, method = "train"),
+                  cbind(gg_test,  method = "test"))
+class(gg_both) <- c("gg_isopro", "data.frame")
+plot(gg_both)
+```
+
+`method` is the existing special column used to colour-group rnd / unsupv / auto curves; reusing it for `train` / `test` works because the plot only cares about the column's existence, not its semantics. A `@section` in the `gg_isopro` roxygen documents this overload so it isn't a hidden trick.
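+
+The caller pattern can be exercised without a fitted model. In this sketch `mock_gg()` is a hypothetical helper (not part of the package) that builds frames shaped like `gg_isopro()` output from made-up scores. Because the bind passes through base `data.frame` methods, the last step re-sets the class so that `plot()` would dispatch to `plot.gg_isopro`. Note also that `obs` restarts at 1 for each source, so `method` is the only column that tells the two groups apart.
+
+```r
+# Illustrative only: mock_gg() is a hypothetical stand-in for gg_isopro()
+# output, so no varPro fit is needed to see the shape of the overlay frame.
+mock_gg <- function(howbad) {
+  out <- data.frame(obs        = seq_along(howbad),
+                    case.depth = 10 * (1 - howbad),  # made-up depths
+                    howbad     = howbad)
+  class(out) <- c("gg_isopro", class(out))
+  out
+}
+
+gg_train <- mock_gg(c(0.10, 0.50, 0.90))
+gg_test  <- mock_gg(c(0.20, 0.95))
+
+# Tag each source in the `method` column, then stack the rows.
+gg_both <- rbind(cbind(gg_train, method = "train"),
+                 cbind(gg_test,  method = "test"))
+
+# The bind goes through base data.frame methods; re-setting the class
+# guarantees plot() dispatches to plot.gg_isopro.
+class(gg_both) <- c("gg_isopro", "data.frame")
+
+table(gg_both$method)
+```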
+
+## Polarity: how the wrapper presents both conventions
+
+`varPro::predict.isopro(quantiles = TRUE)` returns quantiles where *smaller is more anomalous* (a row whose case depth sits in the lower tail of the training depth distribution). `gg_isopro`'s `howbad` is the opposite: *higher is more anomalous*. The wrapper is **not** trying to hide the conflict — it shows both polarities by keeping both columns:
+
+- `case.depth` carries the raw mean depth from `predict(quantiles = FALSE)`. **Lower = more anomalous.** This is varPro's native scale, exposed directly, with no transformation. A user who wants to cross-reference against `varPro::predict.isopro()` output can do it on this column.
+- `howbad` carries `1 - predict(quantiles = TRUE)`. **Higher = more anomalous.** This is the wrapper convention, and it matches the training-path `gg_isopro()`, which in this PR flips varPro's stored `$howbad` the same way (`howbad = 1 - object$howbad`). The plot method's elbow shape, the `threshold` annotation, and the `top_n_pct` quantile all assume this polarity.
+
+The roxygen must name this transformation explicitly. A user who reads only the `howbad` column should still come away understanding: (i) it isn't byte-identical to `predict.isopro(quantiles = TRUE)`, (ii) the relationship is `howbad = 1 - quantile`, and (iii) `case.depth` is the unmodified varPro number if you need the raw measure.
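+
+A small numeric sketch of the two polarities follows. It assumes, for illustration only, that the quantile reported by `predict.isopro(quantiles = TRUE)` behaves like the empirical CDF of the training case depths; varPro's actual computation may differ. The depths are simulated, not taken from a real fit.
+
+```r
+# Assumption for illustration: quantile = empirical CDF of training depths.
+set.seed(42)
+train_depth <- rnorm(200, mean = 10, sd = 2)  # simulated training depths
+test_depth  <- c(4, 10, 15)  # shallow (isolated early), typical, deep
+
+depth_cdf <- stats::ecdf(train_depth)
+q         <- depth_cdf(test_depth)  # varPro polarity: smaller = more anomalous
+howbad    <- 1 - q                  # wrapper polarity: higher = more anomalous
+
+data.frame(case.depth = test_depth, quantile = q, howbad = howbad)
+```
+
+Reading the rows top to bottom, the shallowest depth has the smallest quantile and therefore the largest `howbad`, which is the ordering the plot method and `threshold` assume.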
+
+## Validation
+
+- `newdata` is supplied but isn't a `data.frame` → `stop("newdata must be a data.frame.")`.
+- `nrow(newdata) == 0` → empty `gg_isopro` frame with the same columns; downstream plot handles zero rows by erroring with a clear ggplot message. No special-case in the extractor.
+- Unknown columns / NAs in `newdata` → pass through to `predict.isopro`; varPro decides.
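+
+The validation rules amount to one type check plus pass-through. A minimal sketch follows, using a hypothetical helper name (`check_newdata`) that the package does not export. The zero-row part assumes `predict.isopro` returns zero-length vectors for a zero-row `newdata`, in which case the tidy frame is simply empty.
+
+```r
+# Hypothetical helper: the only check is the type; zero rows, unknown
+# columns and NAs are left for predict.isopro to handle.
+check_newdata <- function(newdata) {
+  if (!is.data.frame(newdata)) {
+    stop("newdata must be a data.frame.", call. = FALSE)
+  }
+  invisible(newdata)
+}
+
+msg <- tryCatch(check_newdata("not a df"), error = conditionMessage)
+print(msg)
+
+# Zero-row case, assuming zero-length predictions: same columns, no rows.
+depth <- numeric(0)
+q     <- numeric(0)
+empty <- data.frame(obs        = seq_len(0L),
+                    case.depth = depth,
+                    howbad     = 1 - q)
+str(empty)
+```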
+
+## Tests (mirroring the Phase 1–4a coverage)
+
+1. **Shape**: `gg_isopro(fit, newdata = test_df)` returns `c("gg_isopro", "data.frame")` with columns `obs / case.depth / howbad`, `nrow == nrow(newdata)`.
+2. **Polarity flip**: synthetic check that `howbad` is in `[0, 1]` and corresponds to `1 - predict(..., quantiles = TRUE)` for the same rows.
+3. **Sanity check**: scoring the training set as newdata produces `howbad` values close to (but not necessarily identical to) the training-path `gg_isopro(fit)$howbad`, which is `1 - fit$howbad` (the raw `fit$howbad` uses varPro's lower-is-more-anomalous polarity). Tolerance is loose because varPro may use a slightly different code path for `predict` than for in-bag scoring; the test asserts the same range and that the top-5 most anomalous rows overlap.
+4. **Provenance**: returned object's provenance attribute has `prediction = TRUE` and `n == nrow(newdata)`.
+5. **Validation error**: `gg_isopro(fit, newdata = "not a df")` errors with `"newdata must be a data.frame"`.
+6. **Overlay smoke test**: rbind of train + test extractor outputs with a `method` label column plots without error; every patchwork sub-plot builds.
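+
+Test 3 and the NEWS entry both lean on a top-5 comparison between two score vectors. A minimal sketch of that check follows, using a hypothetical helper (`top_k_overlap`, not a package or testthat function) and made-up scores; it shows how leaving one side in the opposite polarity can drive the overlap to 0/5.
+
+```r
+# Hypothetical helper: how many of the k most anomalous rows (highest
+# howbad) do two score vectors agree on?
+top_k_overlap <- function(score_a, score_b, k = 5L) {
+  stopifnot(length(score_a) == length(score_b), k <= length(score_a))
+  top_a <- order(score_a, decreasing = TRUE)[seq_len(k)]
+  top_b <- order(score_b, decreasing = TRUE)[seq_len(k)]
+  length(intersect(top_a, top_b))
+}
+
+train_howbad <- seq_len(10) / 10  # ten made-up, distinct scores
+
+top_k_overlap(train_howbad, train_howbad)      # same polarity on both sides
+top_k_overlap(train_howbad, 1 - train_howbad)  # one side left unflipped
+```
+
+Applied to `gg_isopro(fit)$howbad` and `gg_isopro(fit, newdata = training_df)$howbad` (where `training_df` is a placeholder for the training predictors), the same helper expresses the top-5 sanity check.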
+
+## Snapshots
+
+One new `vdiffr::expect_doppelganger` inside the existing `VDIFFR_RUN_TESTS` guard: `gg-isopro-predict-overlay` — train + test bound with a `method` column, default `plot()`. Skip cleanly without the env var.
+
+## Documentation
+
+- Extend the existing `gg_isopro` roxygen with:
+  - A new `@param newdata` line in the terse register.
+  - A short `@section Scoring new data` block in the narrative register, written to make the polarity transformation explicit:
+    - What `newdata` does.
+    - The two `predict.isopro` calls and how their outputs map to the two columns (raw depth → `case.depth`, `1 - quantile` → `howbad`).
+    - One sentence naming the transformation in code form (e.g. "`howbad = 1 - predict(fit, newdata, quantiles = TRUE)`") so a user diffing against raw `predict()` output sees exactly where the difference comes from.
+    - The train/test overlay caller pattern.
+- Update the existing "What you use this for" section to mention the new-data use case (a held-out cohort, a production scoring scenario).
+
+## Files
+
+- **Modify**: `R/gg_isopro.R` (signature + new internal path), `tests/testthat/test_gg_isopro.R` (six new tests), `tests/testthat/test_snapshots.R` (one snapshot), `NEWS.md`, `DESCRIPTION` (version bump to the next available `2.7.3.900x` increment — `.9010` if PR #95 has landed by then, otherwise the implementer picks the next free slot above `.9008`).
+- **New**: none.
+
+## Acceptance criteria
+
+- `R CMD check --as-cran`: 0 errors / 0 warnings / 0 notes.
+- Full `devtools::test()`: 0 failures. New tests pass; gg_isopro coverage from PR #94 (43 expectations) still green.
+- Roxygen produced under markdown mode (PR #95 enables this; if #95 hasn't merged, write in Rd-style and document() will produce valid Rd either way).
+- One PR before the v2.8.0 release candidate.
+
+## Out of scope
+
+- A new function (e.g. `gg_isopro_predict()`) — rejected in favour of one optional argument.
+- An S3 `predict.gg_isopro()` method — rejected because gg_isopro is a data frame and doesn't carry the fit.
+- Generalising the `method`-column auto-detect to *any* grouping column. Today we reuse `method`; if real friction emerges, generalise in a later release.
+- Exposing `quantiles = FALSE` to the caller. Internally we call both; externally the user gets the unified columns.