|
1 | 1 | ##============================================================================= |
2 | 2 | #' Partial dependence data from a varPro model |
3 | 3 | #' |
4 | | -#' \code{varpro::partialpro} returns one list, with continuous and |
| 4 | +#' \code{varPro::partialpro} returns one list, with continuous and |
5 | 5 | #' categorical predictors mixed together. This function splits that list into |
6 | 6 | #' two tidy data frames, one for each kind, and resolves the y-axis label the |
7 | 7 | #' plot method will use. |
8 | 8 | #' |
9 | | -#' @param part_dta Partial plot data from \code{varpro::partialpro}. Each |
| 9 | +#' @section What partialpro is doing: |
| 10 | +#' A partial dependence curve answers the question, "if I hold a single |
| 11 | +#' variable at a grid of values and average out everything else, how does |
| 12 | +#' the model's prediction move?" That is the same question \code{rfsrc} |
| 13 | +#' partial dependence answers. What \code{varPro::partialpro} adds is two |
| 14 | +#' wrinkles that are worth understanding before you read the curves. |
| 15 | +#' |
| 16 | +#' First, \code{partialpro} screens its Unlimited Virtual Twins (UVT)
| 17 | +#' grid with an isolation forest so that unlikely combinations of the
| 18 | +#' focal variable with the rest of the data are filtered out. The
| 19 | +#' \code{rfsrc} version, by contrast, averages over the full marginal grid |
| 20 | +#' regardless of plausibility. So when a covariate is highly correlated |
| 21 | +#' with others, the two methods can disagree, and \code{partialpro}'s |
| 22 | +#' curve is the one restricted to the data manifold. |
| 23 | +#' |
| 24 | +#' Second, \code{partialpro} fits a local polynomial model to the |
| 25 | +#' predicted values rather than just plotting their mean. That gives |
| 26 | +#' three parallel curves per variable, stored as \code{yhat.par}, |
| 27 | +#' \code{yhat.nonpar}, and \code{yhat.causal}, which the plot method |
| 28 | +#' overlays so you can see whether a smooth parametric story and the |
| 29 | +#' raw forest predictions are telling you the same thing. |
| 30 | +#' |
| 31 | +#' Interpretation of the y-axis depends on the outcome (per |
| 32 | +#' \code{varPro::partialpro}): response scale for regression, log-odds of |
| 33 | +#' the target class for classification, and either ensemble mortality |
| 34 | +#' (default) or RMST (if the original \code{varpro} call set |
| 35 | +#' \code{rmst}) for survival. |
| 36 | +#' |
| 37 | +#' @section What's in the output: |
| 38 | +#' We split \code{partialpro}'s mixed list into two tidy data frames so |
| 39 | +#' the plot method does not have to. A variable with more than |
| 40 | +#' \code{cat_limit} distinct grid points goes into \code{$continuous}, |
| 41 | +#' one row per grid point with the column means of \code{yhat.par}, |
| 42 | +#' \code{yhat.nonpar}, and \code{yhat.causal} stored as |
| 43 | +#' \code{parametric}, \code{nonparametric}, and \code{causal}. A |
| 44 | +#' variable with \code{cat_limit} or fewer goes into \code{$categorical},
| 45 | +#' one row per observation per category level, carrying the same three |
| 46 | +#' columns unaveraged so the plot method can draw boxplots. Path C |
| 47 | +#' (\code{scale \%in\% c("surv","chf")}) takes a different route: we |
| 48 | +#' hand the underlying \code{rfsrc} forest to \code{gg_partial_rfsrc} so |
| 49 | +#' you get a survival-probability or cumulative-hazard curve on the |
| 50 | +#' usual rfsrc scale instead. |
| 51 | +#' |
| 52 | +#' @section What you use this for: |
| 53 | +#' \itemize{ |
| 54 | +#' \item read the marginal shape of a relationship the varpro model |
| 55 | +#' found important — monotone, threshold, U-shape, flat; |
| 56 | +#' \item compare the three partialpro estimators on the same variable |
| 57 | +#' and flag the ones where parametric and nonparametric disagree — |
| 58 | +#' those are the candidates for closer inspection; |
| 59 | +#' \item report a survival partial dependence on the probability or |
| 60 | +#' cumulative-hazard scale (\code{scale = "surv"} or \code{"chf"}) |
| 61 | +#' rather than the unbounded mortality scale. |
| 62 | +#' } |
| 63 | +#' A varpro partial dependence curve is a description of the model, not |
| 64 | +#' a causal effect. The \code{causal} column is varpro's local |
| 65 | +#' estimator, not a structural causal claim about the data-generating |
| 66 | +#' process. |
| 67 | +#' |
| 68 | +#' @param part_dta Partial plot data from \code{varPro::partialpro}. Each |
10 | 69 | #' element must contain \code{xvirtual}, \code{xorg}, \code{yhat.par}, |
11 | 70 | #' \code{yhat.nonpar}, and \code{yhat.causal}. Supply at least one of |
12 | 71 | #' \code{part_dta} or \code{object}. |
13 | 72 | #' @param object A fitted \code{varpro} object, the forest the partial data |
14 | 73 | #' came from. When supplied it provides the provenance metadata, and when |
15 | 74 | #' \code{part_dta} is \code{NULL} it is passed to |
16 | | -#' \code{varpro::partialpro(object)} for you. Required when |
| 75 | +#' \code{varPro::partialpro(object)} for you. Required when |
17 | 76 | #' \code{scale \%in\% c("surv","chf")}. |
18 | 77 | #' @param scale Character; sets the y-axis label and, for survival forests, |
19 | 78 | #' the output type. One of \code{"auto"} (default), \code{"mortality"}, |
|
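The split described under "What's in the output" can be sketched in base R. This is a minimal illustration, not the package's implementation: `mock_element()` and `split_partial()` are hypothetical helpers, the `name` and `x` columns are made-up names, `cat_limit = 10` is an illustrative value, and each `yhat.*` is assumed to be an observations-by-grid-points matrix.

```r
# Hypothetical stand-in for one element of a partialpro-style list.
# Assumption: each yhat.* is an (observations x grid points) matrix.
mock_element <- function(grid, n_obs = 5) {
  m <- matrix(rep(grid, each = n_obs), nrow = n_obs, ncol = length(grid))
  list(
    xvirtual    = grid,
    xorg        = rep(grid, length.out = n_obs),
    yhat.par    = m,
    yhat.nonpar = m + 0.1,
    yhat.causal = m - 0.1
  )
}

part_dta <- list(
  age = mock_element(seq(40, 80, by = 4)),  # 11 distinct grid points
  sex = mock_element(c(0, 1))               # 2 distinct grid points
)

# Hypothetical splitter: more than cat_limit grid points -> continuous.
split_partial <- function(part_dta, cat_limit = 10) {
  is_cont <- vapply(
    part_dta,
    function(el) length(unique(el$xvirtual)) > cat_limit,
    logical(1)
  )

  # Continuous: one row per grid point, averaged over observations.
  continuous <- do.call(rbind, lapply(names(part_dta)[is_cont], function(nm) {
    el <- part_dta[[nm]]
    data.frame(
      name          = nm,
      x             = el$xvirtual,
      parametric    = colMeans(el$yhat.par),
      nonparametric = colMeans(el$yhat.nonpar),
      causal        = colMeans(el$yhat.causal)
    )
  }))

  # Categorical: one row per observation per level, left unaveraged.
  categorical <- do.call(rbind, lapply(names(part_dta)[!is_cont], function(nm) {
    el <- part_dta[[nm]]
    n_obs <- nrow(el$yhat.par)
    data.frame(
      name          = nm,
      x             = rep(el$xvirtual, each = n_obs),
      parametric    = as.vector(el$yhat.par),   # column-major, matches x
      nonparametric = as.vector(el$yhat.nonpar),
      causal        = as.vector(el$yhat.causal)
    )
  }))

  list(continuous = continuous, categorical = categorical)
}

out <- split_partial(part_dta)
```

With the mock data, `age` lands in `out$continuous` and `sex` in `out$categorical`. Keeping the categorical rows unaveraged is what would let a plot method draw one boxplot per level.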