Skip to content

Commit 6e64c20

Browse files
ehrlingerclaude
andauthored
varPro Phase 2: gg_varpro — honest boxplot + faithful + conditional (#85) (#85)
* docs: add varPro Phase 1 design spec (gg_partial_varpro) Captures the approved Phase 1 design for gg_partial_varpro: rename/deprecation shim, extended signature with scale + time args, survival A+C paths, provenance attribute, honest mortality @details roxygen block, and full file/test/pkgdown change inventory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add varPro Phase 1 implementation plan (gg_partial_varpro) 8-task plan: branch open, A-path extractor (TDD), varPro Imports promotion, deprecation shim, plot method with honest mortality label, autoplot/print/summary methods, C-path survival extension, vdiffr snapshots, and final CRAN gate + PR. All steps have complete code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add varPro Phase 2 (gg_varpro) design spec Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add varPro Phase 2 (gg_varpro) implementation plan Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: open v2.7.3.9003 dev cycle (Phase 2 gg_varpro) * feat: add gg_varpro extractor with honest box stats (TDD) Implements gg_varpro() which extracts per-tree importance from varpro objects, builds honest box statistics (q05/q15/q85/q95), and supports classification conditional importance. Adapts design to actual varPro::importance() API (returns data.frame, not list with imp.tree); reconstructs per-tree matrix from object$results using the same aggregation as the varPro workhorse. 22 tests pass, 0 lintr warnings, 0 new R CMD check errors/warnings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(gg_varpro): correct z-normalisation formula + add alignment test varPro's importance() uses z_j = mean(imp_ij) / sd_j (no sqrt(ntree)), so .varpro_imp_stats divides by sd_j only (not sd_j/sqrt(ntree)). Adds test verifying mean(z_ij) ≈ aggregate z_j within tolerance 0.5. Also bumps NEWS.md header to 2.7.3.9003. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gg_varpro): faithful/local.std coercion, nvar by median z, provenance - .validate_varpro_imp_inputs: add faithful param; coerce local.std=FALSE with message() when faithful=TRUE (raw scale required for mean-dot glyph) - provenance$local.std records resolved value after coercion - nvar truncation now sorts by median z (spec requirement), not aggregate z - Add test: faithful=TRUE + local.std=TRUE emits message, provenance=FALSE Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(gg_varpro): code quality fixes from reviewer - Fix row-order/factor-level mismatch: reorder imp_df & stats_df rows to match factor levels after nvar truncation - Add comment explaining modular decode stability in .build_varpro_imp_tree - Fix @param nvar: "median z" (not "aggregate z") - Fix @return $imp: "Summary data frame" (not "Long tidy") - Fix @Seealso: remove task-management language - Tighten z-alignment test tolerance 0.5 -> 0.1 - Add tests: nvar > p, local.std=FALSE raw-scale path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: add plot.gg_varpro — honest boxplot + faithful + conditional (TDD) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(plot.gg_varpro): jitter axis, conditional message, internal tags - geom_jitter: width=0.15, height=0 (jitters within categorical band, not along importance axis — coord_flip inverts width/height semantics) - Add message() when conditional plot ignores a non-default type argument - Add @Keywords internal to .plot_varpro_main, .plot_varpro_conditional, .varpro_imp_ylabel helpers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: add autoplot/print/summary S3 methods for gg_varpro (TDD) Implements autoplot.gg_varpro, print.gg_varpro, and summary.gg_varpro following the existing gg_partial_varpro pattern; 40 tests pass, 0 errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: add vdiffr snapshots for gg_varpro (default, faithful, conditional) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(plot.gg_varpro): replace NA/NaN conditional z with 0, update snapshot geom_col suppresses bars silently for NA y values and emits remove_missing warnings. Coerce non-finite z to 0 before plotting to eliminate 23 spurious warnings. Update gg-varpro-conditional.svg baseline to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update pkgdown and NEWS for gg_varpro (v2.7.3.9003) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: lint and brittle local.std test for CI (R 4.6.0 Linux) - Split @importFrom ggplot2 line in plot.gg_varpro.R (159 chars → 3 lines of ≤76 chars each) to satisfy the 120-char line_length_linter. - Replace brittle `local.std=FALSE produces raw-scale stats` test: the old test asserted medians differ between local.std=TRUE/FALSE, which fails when column SDs happen to be ≈ 1 (can occur on Linux R 4.6.0 with ntree=25). New mechanistic test uses faithful=TRUE to obtain the raw imp.tree matrix, then directly verifies: • local.std=FALSE: stats$median == median(imp.tree[, var]) • local.std=TRUE: stats$median == median(imp.tree[, var] / sd_j) This is platform-independent and tests the actual computation paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot review — scale consistency in plot.gg_varpro Copilot identified three related scale-consistency bugs and one doc error: 1. Remove local.std coercion when faithful=TRUE (gg_varpro.R) Previously faithful=TRUE silently forced local.std=FALSE, causing $stats to be on the raw scale while plot() defaulted to type='z' (mislabeled axis, cutoff line in wrong units). Removed the coercion; faithful=TRUE + local.std=TRUE now works correctly — both $stats and the overlay are on the z scale. 2. Auto-detect type from provenance (plot.gg_varpro.R) plot.gg_varpro() now infers type='z' when local.std=TRUE and type='raw' when local.std=FALSE, eliminating the axis/cutoff mismatch for callers who don't specify type explicitly. Explicit type that conflicts with local.std still raises an error (both directions: raw+local.std=TRUE and z+local.std=FALSE). 3. Make faithful overlay type-aware (plot.gg_varpro.R) The jitter/mean-dot overlay previously always used z-scale regardless of type, so box (raw) and points (z) were on different scales. Overlay now uses imp_raw/sd_j for type='z' and imp_raw for type='raw', matching the box stats in all cases. 4. Fix z-formula in plan doc (dev/plans/…) Architecture line still said z_ij = imp_ij * sqrt(ntree) / sd_j; corrected to z_ij = imp_ij / sd_j (consistent with implementation and tests). Tests: updated coercion test → valid-with-local.std=TRUE assertion; added type='z'+local.std=FALSE error test. Regenerated gg-varpro- faithful.svg snapshot (now z-scale, consistent with box). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: regenerate plot.gg_varpro.Rd to match missing-type signature Codoc check failed because the Rd file still declared `type = c("z", "raw")` while the code uses `type` with no default (auto-detected via missing(type) from provenance). Ran devtools::document() to sync the \usage line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent eba71c4 commit 6e64c20

26 files changed

Lines changed: 4816 additions & 2 deletions

DESCRIPTION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
Package: ggRandomForests
22
Type: Package
33
Title: Visually Exploring Random Forests
4-
Version: 2.7.3.9002
4+
Version: 2.7.3.9003
55
Date: 2026-05-20
66
Authors@R: person("John", "Ehrlinger",
77
role = c("aut", "cre"),

NAMESPACE

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ S3method(autoplot,gg_rfsrc)
1010
S3method(autoplot,gg_roc)
1111
S3method(autoplot,gg_survival)
1212
S3method(autoplot,gg_variable)
13+
S3method(autoplot,gg_varpro)
1314
S3method(autoplot,gg_vimp)
1415
S3method(calc_roc,randomForest)
1516
S3method(calc_roc,rfsrc)
@@ -38,6 +39,7 @@ S3method(plot,gg_rfsrc)
3839
S3method(plot,gg_roc)
3940
S3method(plot,gg_survival)
4041
S3method(plot,gg_variable)
42+
S3method(plot,gg_varpro)
4143
S3method(plot,gg_vimp)
4244
S3method(print,gg_brier)
4345
S3method(print,gg_error)
@@ -49,6 +51,7 @@ S3method(print,gg_rfsrc)
4951
S3method(print,gg_roc)
5052
S3method(print,gg_survival)
5153
S3method(print,gg_variable)
54+
S3method(print,gg_varpro)
5255
S3method(print,gg_vimp)
5356
S3method(print,summary.gg)
5457
S3method(summary,gg_brier)
@@ -61,6 +64,7 @@ S3method(summary,gg_rfsrc)
6164
S3method(summary,gg_roc)
6265
S3method(summary,gg_survival)
6366
S3method(summary,gg_variable)
67+
S3method(summary,gg_varpro)
6468
S3method(summary,gg_vimp)
6569
export(calc_auc)
6670
export(calc_roc)
@@ -74,6 +78,7 @@ export(gg_rfsrc)
7478
export(gg_roc)
7579
export(gg_survival)
7680
export(gg_variable)
81+
export(gg_varpro)
7782
export(gg_vimp)
7883
export(kaplan)
7984
export(nelson)
@@ -87,13 +92,21 @@ importFrom(dplyr,select)
8792
importFrom(ggplot2,.data)
8893
importFrom(ggplot2,aes)
8994
importFrom(ggplot2,autoplot)
95+
importFrom(ggplot2,coord_flip)
9096
importFrom(ggplot2,facet_wrap)
9197
importFrom(ggplot2,geom_boxplot)
98+
importFrom(ggplot2,geom_col)
99+
importFrom(ggplot2,geom_hline)
100+
importFrom(ggplot2,geom_jitter)
92101
importFrom(ggplot2,geom_line)
102+
importFrom(ggplot2,geom_point)
93103
importFrom(ggplot2,geom_ribbon)
104+
importFrom(ggplot2,geom_vline)
94105
importFrom(ggplot2,ggplot)
95106
importFrom(ggplot2,labs)
107+
importFrom(ggplot2,scale_fill_manual)
96108
importFrom(ggplot2,theme)
109+
importFrom(ggplot2,theme_minimal)
97110
importFrom(parallel,mclapply)
98111
importFrom(patchwork,wrap_plots)
99112
importFrom(randomForest,randomForest)
@@ -116,4 +129,5 @@ importFrom(tidyr,all_of)
116129
importFrom(tidyr,pivot_longer)
117130
importFrom(utils,head)
118131
importFrom(utils,tail)
132+
importFrom(varPro,importance)
119133
importFrom(varPro,partialpro)

NEWS.md

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,22 @@
11
Package: ggRandomForests
2-
Version: 2.7.3.9002
2+
Version: 2.7.3.9003
3+
4+
ggRandomForests v2.8.0 (development) — continued
5+
=================================================
6+
* **varPro variable importance: `gg_varpro()` (#85).**
7+
- `gg_varpro()` extracts per-tree importance scores from a fitted
8+
`varpro` object and renders an honest boxplot — hinges at the
9+
15th/85th percentile, whiskers at the 5th/95th — of the per-tree
10+
z-score distribution per variable. Variables whose aggregate
11+
z > `cutoff` (default 0.79) are colour-highlighted.
12+
- `faithful = TRUE` overlays individual per-tree z-scores as jittered
13+
semi-transparent points with a white-outlined mean dot, reproducing
14+
the distributional view from varPro's internal `bxp` output.
15+
- `conditional = TRUE` (classification forests only) extracts
16+
`$conditional.z` and renders class-conditional importance as a
17+
`facet_wrap(~class, nrow=1)` bar chart.
18+
- `local.std = FALSE` enables `plot(..., type = "raw")` to display
19+
raw per-tree importance instead of z-normalised values.
320

421
ggRandomForests v2.8.0 (development)
522
====================================

R/autoplot_methods.R

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ NULL
3636
#' \item{\code{gg_partial_rfsrc}}{Partial dependence (via \code{partial.rfsrc})}
3737
#' \item{\code{gg_partial_varpro}}{Partial dependence (via \code{varPro})}
3838
#' \item{\code{gg_partialpro}}{Partial dependence via \code{varPro} (deprecated alias)}
39+
#' \item{\code{gg_varpro}}{Variable importance from \code{varPro}}
3940
#' \item{\code{gg_roc}}{ROC curve}
4041
#' \item{\code{gg_survival}}{Survival / cumulative hazard curves}
4142
#' \item{\code{gg_brier}}{Time-resolved Brier score and CRPS}
@@ -122,3 +123,9 @@ autoplot.gg_survival <- function(object, ...) {
122123
autoplot.gg_brier <- function(object, ...) {
123124
plot(object, ...)
124125
}
126+
127+
#' @rdname autoplot.gg
128+
#' @export
129+
autoplot.gg_varpro <- function(object, ...) {
130+
plot(object, ...)
131+
}

R/gg_varpro.R

Lines changed: 269 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,269 @@
1+
##=============================================================================
2+
#' Variable importance data from a varPro model
3+
#'
4+
#' Extracts per-tree importance scores from a fitted \code{varpro} object,
5+
#' summarises them into an honest boxplot-ready data structure (hinges at
6+
#' 15th/85th percentile, whiskers at 5th/95th), and optionally retains
7+
#' class-conditional importance for classification forests.
8+
#'
9+
#' @param object A fitted \code{varpro} object (required).
10+
#' @param local.std Logical; default \code{TRUE}. When \code{TRUE} the
11+
#' per-tree importances are normalised to z-scale before computing box
12+
#' statistics. Set to \code{FALSE} to retain the raw importance scale
13+
#' (required for \code{type = "raw"} in \code{plot.gg_varpro}).
14+
#' @param cutoff Numeric z-score threshold for variable selection; default
15+
#' \code{0.79}. Variables with aggregate z > cutoff are flagged
16+
#' \code{selected = TRUE} in \code{$imp}.
17+
#' @param faithful Logical; default \code{FALSE}. When \code{TRUE},
18+
#' \code{$imp.tree} is retained so \code{plot.gg_varpro} can
19+
#' overlay per-tree jitter points in \code{plot.gg_varpro}.
20+
#' @param conditional Logical; default \code{FALSE}. When \code{TRUE}
21+
#' (classification forests only) extracts the \code{$conditional.z}
22+
#' matrix and stores it as \code{$conditional}.
23+
#' @param nvar Integer; retain only the top \code{nvar} variables (by
24+
#' median per-tree z) after applying the cutoff filter. \code{NULL} keeps all.
25+
#' @param ... Additional arguments passed to \code{varPro::importance()}.
26+
#'
27+
#' @return A named list of class \code{"gg_varpro"} with elements:
28+
#' \describe{
29+
#' \item{\code{$imp}}{Summary data frame: \code{variable} (factor with
30+
#' levels ordered by descending median per-tree z), \code{z} (aggregate
31+
#' z-score from \code{importance()}), \code{selected} (logical,
32+
#' \code{z > cutoff}).}
33+
#' \item{\code{$imp.tree}}{\code{NULL} when \code{faithful = FALSE};
34+
#' otherwise an ntree x p matrix of per-tree importance values.}
35+
#' \item{\code{$stats}}{Per-variable summary: \code{variable},
36+
#' \code{median}, \code{q05}, \code{q15}, \code{q85}, \code{q95}
37+
#' (on z-scale when \code{local.std = TRUE}, raw when \code{FALSE}),
38+
#' plus \code{mean} (raw importance mean, always stored).}
39+
#' \item{\code{$conditional}}{\code{NULL} when \code{conditional = FALSE};
40+
#' otherwise a data frame with columns \code{variable}, \code{class},
41+
#' \code{z} (one row per variable x class combination).}
42+
#' }
43+
#' A \code{"provenance"} attribute carries \code{family}, \code{local.std},
44+
#' \code{cutoff}, \code{faithful}, \code{conditional}, \code{xvar.names},
45+
#' and \code{n}.
46+
#'
47+
#' @seealso \code{\link{plot.gg_varpro}}, \code{\link{gg_vimp}}
48+
#'
49+
#' @examples
50+
#' \donttest{
51+
#' set.seed(42)
52+
#' vp <- varPro::varpro(mpg ~ ., data = mtcars, ntree = 50)
53+
#' gg <- gg_varpro(vp)
54+
#' print(gg)
55+
#' plot(gg)
56+
#' }
57+
#'
58+
#' @importFrom varPro importance
59+
#' @export
60+
gg_varpro <- function(object,
61+
local.std = TRUE,
62+
cutoff = 0.79,
63+
faithful = FALSE,
64+
conditional = FALSE,
65+
nvar = NULL,
66+
...) {
67+
68+
## ---- Validation + coercion ------------------------------------------------
69+
local.std <- .validate_varpro_imp_inputs(object, local.std, faithful, conditional)
70+
71+
## ---- Call importance (local.std=FALSE gives mean & std columns) -----------
72+
imp_out <- varPro::importance(object, local.std = FALSE, ...)
73+
74+
## ---- Build per-tree importance matrix from object$results -----------------
75+
imp_tree_mat <- .build_varpro_imp_tree(object)
76+
77+
## ---- Build tidy data structures ------------------------------------------
78+
dfs <- .build_varpro_imp_dfs(imp_out, imp_tree_mat, object$family,
79+
cutoff, nvar, faithful, local.std, conditional)
80+
81+
## ---- Assemble result ------------------------------------------------------
82+
result <- structure(
83+
list(
84+
imp = dfs$imp,
85+
imp.tree = dfs$imp_tree,
86+
stats = dfs$stats,
87+
conditional = dfs$conditional
88+
),
89+
class = c("gg_varpro", "list")
90+
)
91+
92+
attr(result, "provenance") <- list(
93+
family = object$family,
94+
local.std = local.std, # resolved value (may differ from arg when faithful=TRUE)
95+
cutoff = cutoff,
96+
faithful = faithful,
97+
conditional = conditional,
98+
xvar.names = object$xvar.names,
99+
n = nrow(object$x)
100+
)
101+
102+
result
103+
}
104+
105+
## ---- Internal helpers -------------------------------------------------------
106+
107+
#' @keywords internal
108+
.validate_varpro_imp_inputs <- function(object, local.std, faithful, conditional) {
109+
if (missing(object) || is.null(object)) {
110+
stop("'object' must be a fitted varpro object.", call. = FALSE)
111+
}
112+
if (!inherits(object, "varpro")) {
113+
stop("'object' must be a varpro fit (class \"varpro\").", call. = FALSE)
114+
}
115+
if (conditional && !identical(object$family, "class")) {
116+
stop("conditional=TRUE requires a classification forest ",
117+
"(object$family == \"class\").", call. = FALSE)
118+
}
119+
invisible(local.std)
120+
}
121+
122+
#' Build per-tree importance matrix from varpro object results
123+
#'
124+
#' Reconstructs the ntree x p per-tree importance matrix from
125+
#' \code{object$results}, replicating the aggregation performed internally
126+
#' by \code{varPro:::.importance.varpro.workhorse}.
127+
#' @keywords internal
128+
.build_varpro_imp_tree <- function(object) {
129+
dta <- object$results
130+
xvar.names <- object$xvar.names
131+
132+
dta$variable <- factor(xvar.names[dta$variable])
133+
xvarused.names <- levels(dta$variable)
134+
p <- length(xvarused.names)
135+
wt.vec <- rep(1, p)
136+
names(wt.vec) <- xvarused.names
137+
138+
trees <- sort(unique(dta$tree))
139+
ntree.used <- length(trees)
140+
tree.idx <- match(dta$tree, trees)
141+
var.idx <- match(as.character(dta$variable), xvarused.names)
142+
grp <- tree.idx + (var.idx - 1L) * ntree.used
143+
144+
w <- dta$n.oob
145+
numer <- dta$imp * w
146+
numer[is.na(numer)] <- 0
147+
numer.sum <- rowsum(numer, grp, reorder = FALSE)
148+
denom.sum <- rowsum(w, grp, reorder = FALSE)
149+
ratio <- as.numeric(numer.sum[, 1L] / denom.sum[, 1L])
150+
ratio[!is.finite(ratio)] <- 0
151+
152+
imp_tree <- matrix(0, nrow = ntree.used, ncol = p,
153+
dimnames = list(as.character(trees), xvarused.names))
154+
## g.rows are the grp keys (1..ntree.used*p); decode back to (tree, var) indices.
155+
## reorder=FALSE preserves insertion order but rownames() always give the
156+
## original grp key values, so the modular decode is stable regardless of order.
157+
g.rows <- as.integer(rownames(numer.sum))
158+
imp_tree[cbind(((g.rows - 1L) %% ntree.used) + 1L,
159+
((g.rows - 1L) %/% ntree.used) + 1L)] <- ratio
160+
sweep(imp_tree, 2L, wt.vec, `*`)
161+
}
162+
163+
#' Compute per-variable box statistics from a per-tree importance matrix
164+
#'
165+
#' When \code{local.std = TRUE} the columns of \code{mat} are standardised
166+
#' to unit variance before computing quantiles so that the display scale
167+
#' matches the aggregate z-score. The \code{mean} column always stores raw
168+
#' (unstandardised) column means.
169+
#' @keywords internal
170+
.varpro_imp_stats <- function(mat, local.std = TRUE) {
171+
if (local.std) {
172+
sd_j <- apply(mat, 2L, stats::sd, na.rm = TRUE)
173+
sd_j[sd_j < .Machine$double.eps] <- 1 # guard zero-variance columns
174+
# z_ij = imp_ij / sd_j so that mean(z_ij) == aggregate z_j from importance()
175+
display_mat <- sweep(mat, 2L, sd_j, FUN = "/")
176+
} else {
177+
display_mat <- mat
178+
}
179+
data.frame(
180+
variable = colnames(mat),
181+
median = apply(display_mat, 2L, stats::quantile, probs = 0.50,
182+
na.rm = TRUE),
183+
q05 = apply(display_mat, 2L, stats::quantile, probs = 0.05,
184+
na.rm = TRUE),
185+
q15 = apply(display_mat, 2L, stats::quantile, probs = 0.15,
186+
na.rm = TRUE),
187+
q85 = apply(display_mat, 2L, stats::quantile, probs = 0.85,
188+
na.rm = TRUE),
189+
q95 = apply(display_mat, 2L, stats::quantile, probs = 0.95,
190+
na.rm = TRUE),
191+
mean = colMeans(mat, na.rm = TRUE),
192+
stringsAsFactors = FALSE,
193+
row.names = NULL
194+
)
195+
}
196+
197+
#' Build the tidy importance data structures from importance() output
198+
#' @keywords internal
199+
.build_varpro_imp_dfs <- function(imp_out, imp_tree_mat, family, cutoff, nvar,
200+
faithful, local.std, conditional) {
201+
is_class <- identical(family, "class")
202+
203+
## Unpack importance() data frame — different shapes per family
204+
if (is_class && is.list(imp_out) && !is.data.frame(imp_out)) {
205+
imp_df_raw <- imp_out$unconditional
206+
cond_mat <- if (conditional) imp_out$conditional.z else NULL
207+
} else {
208+
imp_df_raw <- imp_out
209+
cond_mat <- NULL
210+
}
211+
212+
## Replace NaN z with 0 for sorting/selection
213+
z_vec <- imp_df_raw$z
214+
z_vec[!is.finite(z_vec)] <- 0
215+
216+
## $imp: one row per variable
217+
imp_df <- data.frame(
218+
variable = rownames(imp_df_raw),
219+
z = as.numeric(z_vec),
220+
selected = as.numeric(z_vec) > cutoff,
221+
stringsAsFactors = FALSE
222+
)
223+
224+
## Keep only columns present in both imp_df and imp_tree_mat
225+
keep_vars <- imp_df$variable[imp_df$variable %in% colnames(imp_tree_mat)]
226+
imp_tree_k <- imp_tree_mat[, keep_vars, drop = FALSE]
227+
228+
## $stats: box quantiles per variable (full set before nvar truncation)
229+
stats_df <- .varpro_imp_stats(imp_tree_k, local.std = local.std)
230+
231+
## Apply nvar truncation by median z (spec: "top-nvar by median z")
232+
if (!is.null(nvar)) {
233+
top_vars <- stats_df$variable[order(-stats_df$median)][
234+
seq_len(min(nvar, nrow(stats_df)))]
235+
imp_df <- imp_df[imp_df$variable %in% top_vars, , drop = FALSE]
236+
stats_df <- stats_df[stats_df$variable %in% top_vars, , drop = FALSE]
237+
}
238+
239+
## Order $imp and $stats rows + factor levels by descending median z
240+
var_order <- stats_df$variable[order(-stats_df$median)]
241+
imp_df$variable <- factor(imp_df$variable, levels = var_order)
242+
stats_df$variable <- factor(stats_df$variable, levels = var_order)
243+
## Reorder rows to match factor levels (avoids row-order / level mismatch)
244+
imp_df <- imp_df[order(imp_df$variable), , drop = FALSE]
245+
stats_df <- stats_df[order(stats_df$variable), , drop = FALSE]
246+
rownames(imp_df) <- NULL
247+
rownames(stats_df) <- NULL
248+
249+
## $conditional: tidy class-conditional z-scores
250+
cond_df <- NULL
251+
if (!is.null(cond_mat)) {
252+
cond_df <- data.frame(
253+
variable = rep(rownames(cond_mat), ncol(cond_mat)),
254+
class = rep(colnames(cond_mat), each = nrow(cond_mat)),
255+
z = as.vector(cond_mat),
256+
stringsAsFactors = FALSE
257+
)
258+
cond_df <- cond_df[cond_df$variable %in% keep_vars, , drop = FALSE]
259+
cond_df$variable <- factor(cond_df$variable,
260+
levels = levels(imp_df$variable))
261+
}
262+
263+
list(
264+
imp = imp_df,
265+
imp_tree = if (faithful) imp_tree_k else NULL,
266+
stats = stats_df,
267+
conditional = cond_df
268+
)
269+
}

0 commit comments

Comments
 (0)