feat: gg_isopro — varPro Phase 4 anomaly-score wrapper#94
Merged
First of three Phase 4 sub-projects (isopro -> beta.varpro -> ivarpro).
Tidy-data wrapper + plot method for varPro::isopro anomaly scores.
Single fit per call (Phase 1-3 pattern), patchwork of elbow + density
with panel=c('both','elbow','density'); threshold/top_n_pct annotation
with threshold-wins precedence; ground-truth evaluation deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files
Coverage Diff
| | main | #94 | +/- |
|---|---|---|---|
| Coverage | 87.31% | 87.62% | +0.31% |
| Files | 38 | 40 | +2 |
| Lines | 3097 | 3193 | +96 |
| Hits | 2704 | 2798 | +94 |
| Misses | 393 | 395 | +2 |
… use it
A core goal of the v2.8.0 varPro integration is to make the package self-
teaching: a reader who has not used varPro before should learn what each
function is doing and what they would use it for, just from the help page.
The first pass on gg_isopro was correctly written in voice but too thin;
it described the mechanics of the wrapper without explaining the method.
Add to gg_isopro:
- "What isopro is doing" — isolation forests, geometric intuition for
why shallow depth means anomalous, what the three methods (rnd /
unsupv / auto) actually do and how they differ.
- "What's in the output" — case.depth vs howbad: raw depth and its
[0,1]-rescaled cousin, why both are kept.
- "What you use this for" — screening for data-entry errors, cohort-
distribution checks, ranked review lists. The score is a rank, not
a probability.
- Liu/Ting/Zhou 2008 reference.
Add to plot.gg_isopro:
- "Reading the elbow" — the bend is the cutoff; the plot is for seeing
where it is, not reading single scores.
- "Reading the density" — single mode + thin right tail is the
picture; bimodal means two populations.
- "Comparing methods" — agreement vs divergence across rnd/unsupv/auto
is the actual signal.
Voice-only expansion; no API or behavioural change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
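The "shallow depth means anomalous" intuition in the commit message above can be illustrated with a toy, one-dimensional isolation depth in base R. This is only a sketch of the general isolation-forest idea from Liu/Ting/Zhou 2008, not varPro's implementation; `iso_depth` and `mean_depth` are hypothetical helper names and the data are synthetic.

```r
# Toy isolation depth: how many random splits are needed before
# observation i is alone on its side of the split (1-D only).
iso_depth <- function(x, i, max_depth = 50L) {
  idx <- seq_along(x)
  depth <- 0L
  while (length(idx) > 1L && depth < max_depth) {
    rng <- range(x[idx])
    if (rng[1] == rng[2]) break              # identical values cannot be split
    split_at <- runif(1, rng[1], rng[2])     # random cut inside the current range
    # Keep only the side of the cut that still contains observation i.
    idx <- if (x[i] < split_at) idx[x[idx] < split_at] else idx[x[idx] >= split_at]
    depth <- depth + 1L
  }
  depth
}

set.seed(1)
x <- c(rnorm(200), 8)                        # 200 inliers plus one point far away

# Average the depth over many random "trees" (hypothetical helper).
mean_depth <- function(i, n_trees = 200L) {
  mean(replicate(n_trees, iso_depth(x, i)))
}

outlier_depth <- mean_depth(201L)                 # the point at 8
inlier_depth  <- mean_depth(which.min(abs(x)))    # a point near the centre
```

The far-away point tends to be separated after very few cuts, while a point in the dense centre needs many, which is why a low average depth is read as a sign of an anomaly.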
pkgdown's build_reference_index() requires every exported topic to be listed in _pkgdown.yml. Added an 'Anomaly Detection' section after Variable Importance so gg_isopro and plot.gg_isopro are indexed; this unblocks the pkgdown CI job on PR #94.
Pull request overview
Adds Phase 4 support for varPro::isopro anomaly-score workflows by introducing a tidy extractor (gg_isopro) plus plot/print/summary/autoplot S3 methods and accompanying tests/snapshots/docs.
Changes:
- Introduces gg_isopro() (S3 generic + isopro method) returning a tidy gg_isopro data frame with provenance.
- Adds plot.gg_isopro() to render ranked “elbow” and score density panels (patchwork composite or single-panel via panel=), including optional threshold annotation and method-aware colouring.
- Adds print.gg_isopro(), summary.gg_isopro(), autoplot.gg_isopro() plus testthat coverage and vdiffr snapshots; updates docs, pkgdown reference, and version/NEWS.
Reviewed changes
Copilot reviewed 13 out of 18 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/testthat/test_snapshots.R | Adds vdiffr snapshots for gg_isopro plots (guarded by env var + vdiffr availability). |
| tests/testthat/test_gg_isopro.R | New unit tests for extractor output, plot return shapes, threshold behavior, method grouping, and S3 companions. |
| R/summary_methods.R | Adds summary.gg_isopro() implementation. |
| R/print_methods.R | Adds print.gg_isopro() implementation. |
| R/plot.gg_isopro.R | New plot method + helpers for elbow/density panels and threshold resolution. |
| R/gg_isopro.R | New extractor generic + gg_isopro.isopro() implementation and documentation. |
| R/autoplot_methods.R | Adds autoplot.gg_isopro() method. |
| NEWS.md | Documents new feature and bumps dev version string. |
| NAMESPACE | Registers new S3 methods and exports gg_isopro. |
| man/summary.gg.Rd | Adds alias/usage for summary.gg_isopro. |
| man/print.gg.Rd | Adds alias/usage for print.gg_isopro. |
| man/plot.gg_isopro.Rd | New Rd for plot.gg_isopro. |
| man/gg_isopro.Rd | New Rd for gg_isopro. |
| man/autoplot.gg.Rd | Adds alias/usage for autoplot.gg_isopro. |
| dev/plans/2026-05-26-varpro-phase4-gg-isopro-plan.md | Adds implementation plan (internal dev artifact). |
| dev/plans/2026-05-26-varpro-phase4-gg-isopro-design.md | Adds design spec (internal dev artifact). |
| DESCRIPTION | Bumps package version to 2.7.3.9008. |
| _pkgdown.yml | Adds “Anomaly Detection” reference section for pkgdown site. |
Files not reviewed (5)
- man/autoplot.gg.Rd: Language not supported
- man/gg_isopro.Rd: Language not supported
- man/plot.gg_isopro.Rd: Language not supported
- man/print.gg.Rd: Language not supported
- man/summary.gg.Rd: Language not supported
- autoplot.gg_isopro uses plot() generic for S3 dispatch
- roxygen converted from markdown to Rd-style (\code{} / \link{})
- .resolve_isopro_threshold validates threshold in [0,1] and top_n_pct in (0,100)
- panel='both' uses patchwork::wrap_plots() for consistency
ehrlinger added a commit that referenced this pull request May 26, 2026
Add Roxygen: list(markdown = TRUE) to DESCRIPTION so devtools::document()
auto-converts backticks / [fn()] / [pkg::fn()] in source roxygen to
\code{} / \link{} / \link[pkg]{} in the generated Rd. Existing Rd-style
markup keeps working; both styles now coexist. Saves the manual
conversion work the Copilot review on PR #94 flagged.
Two source-roxygen edits needed to keep R CMD check clean under markdown:
- R/help.R: randomForest[SRC] -> randomForestSRC (markdown read [SRC]
as an unfinished link reference, producing a missing-link warning).
- R/gg_rfsrc.R::bootstrap_survival: 95\% -> 95% (markdown over-escaped
the backslash, producing a malformed Rd with shifted section order).
Regenerates all 31 Rd files. No functional or rendered-content change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…locomp lint
The validation Copilot asked for in the previous round pushed .resolve_isopro_threshold to cyclomatic complexity 38, well over the project's limit of 20.
Factor the per-argument checks into a small helper (.check_threshold_arg) parameterised by name/lo/hi/closure; .resolve_isopro_threshold now reads as the three-branch decision it actually is.
Same external behaviour; 43 tests still pass.
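A minimal sketch of the refactor described above, assuming "closure" refers to whether the interval is closed or open. The names `check_threshold_arg`, `resolve_threshold`, and `closed` are hypothetical stand-ins rather than the package's internals, and mapping `top_n_pct` to the `1 - top_n_pct / 100` quantile of the scores is likewise an assumption.

```r
# Hypothetical helper: validate one scalar argument against an interval.
# closed = TRUE checks [lo, hi]; closed = FALSE checks (lo, hi).
check_threshold_arg <- function(value, name, lo, hi, closed = TRUE) {
  ok <- is.numeric(value) && length(value) == 1L && !is.na(value)
  if (ok) {
    ok <- if (closed) {
      (value >= lo) && (value <= hi)
    } else {
      (value > lo) && (value < hi)
    }
  }
  if (!ok) {
    bounds <- if (closed) sprintf("[%g, %g]", lo, hi) else sprintf("(%g, %g)", lo, hi)
    stop(sprintf("%s must be a single number in %s", name, bounds), call. = FALSE)
  }
  invisible(value)
}

# Hypothetical resolver: three branches (threshold, top_n_pct, neither).
resolve_threshold <- function(scores, threshold = NULL, top_n_pct = NULL) {
  if (!is.null(threshold)) {
    check_threshold_arg(threshold, "threshold", 0, 1, closed = TRUE)
    if (!is.null(top_n_pct)) {
      message("Both threshold and top_n_pct supplied; using threshold.")
    }
    return(threshold)
  }
  if (!is.null(top_n_pct)) {
    check_threshold_arg(top_n_pct, "top_n_pct", 0, 100, closed = FALSE)
    # Assumed mapping: the top n percent starts at the (1 - n/100) quantile.
    return(unname(stats::quantile(scores, 1 - top_n_pct / 100)))
  }
  NULL
}

scores <- seq(0, 1, length.out = 101)
from_threshold <- resolve_threshold(scores, threshold = 0.8)
from_pct       <- resolve_threshold(scores, top_n_pct = 10)
both           <- suppressMessages(resolve_threshold(scores, threshold = 0.8, top_n_pct = 10))
```

Moving the range checks into one parameterised helper leaves the resolver with only the precedence logic, which is the readability gain the commit message describes.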
ehrlinger added a commit that referenced this pull request May 26, 2026
…pro, gg_udependent) (#95)
* docs(gg_partial_varpro): teach what varPro partialpro is doing
* docs(gg_varpro): teach what varpro variable priority is doing
* docs(gg_udependent): teach what cross-variable dependency is doing
* chore: open v2.7.3.9009 + NEWS for varPro pedagogical doc audit
* docs: enable roxygen2 markdown package-wide
  Add Roxygen: list(markdown = TRUE) to DESCRIPTION so devtools::document() auto-converts backticks / [fn()] / [pkg::fn()] in source roxygen to \code{} / \link{} / \link[pkg]{} in the generated Rd. Existing Rd-style markup keeps working; both styles now coexist. Saves the manual conversion work the Copilot review on PR #94 flagged.
  Two source-roxygen edits needed to keep R CMD check clean under markdown:
  - R/help.R: randomForest[SRC] -> randomForestSRC (markdown read [SRC] as an unfinished link reference, producing a missing-link warning).
  - R/gg_rfsrc.R::bootstrap_survival: 95\% -> 95% (markdown over-escaped the backslash, producing a malformed Rd with shifted section order).
  Regenerates all 31 Rd files. No functional or rendered-content change.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: address PR #95 Copilot review
  - plot.gg_udependent: clarify that truly isolated nodes are dropped by gg_udependent() before plotting; reword 'Isolated' as 'low-degree'.
  - gg_partial_varpro: fix varpro::partialpro -> varPro::partialpro (six instances) so \link{} renders correctly.
  - plot.gg_varpro: clarify the cutoff line lives in z-units on the default axis and in raw-importance units when type='raw'; the numeric is the same, the scale is not.
  - gg_varpro reference: complete the dangling 'arXiv 2409.' with the full arXiv:2409.09003 identifier and an https://arxiv.org link.
* docs: regenerate gg_isopro Rd under markdown mode post-rebase
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ehrlinger added a commit that referenced this pull request May 26, 2026
Address Copilot review on PR #96: placing newdata as the 2nd positional argument would change positional matching for any caller of the PR #94 signature gg_isopro(object, ...).
Moving newdata after ... means it can only be matched by name, so existing positional calls are unaffected.
All tests already pass newdata by name; no test changes needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
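The argument-matching rule behind this change can be checked with a small base R stand-in. `score` is a hypothetical function that only mimics the shape of the signature discussed above; it is not the package's `gg_isopro()`.

```r
# Hypothetical stand-in with the same argument order: object, ..., newdata.
score <- function(object, ..., newdata = NULL) {
  list(
    n_dots      = length(list(...)),   # how many arguments fell into ...
    has_newdata = !is.null(newdata)    # whether newdata was actually matched
  )
}

df <- data.frame(x = 1:3)

# A second positional argument is absorbed by ..., not by newdata.
by_position <- score("fit", df)

# Only an explicit name reaches newdata.
by_name <- score("fit", newdata = df)
```

Because formals that follow `...` are matched only by their full name, a caller written against the older `(object, ...)` signature sees no change in how its positional arguments are interpreted.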
ehrlinger added a commit that referenced this pull request May 26, 2026
…(+ training-path polarity fix) (#96)
* docs: design spec for varPro Phase 4 - predict.isopro wrapper
  Second sub-project of Phase 4 (gg_beta_varpro and gg_ivarpro come after). Adds a newdata argument to gg_isopro() so a fitted isopro model can score new observations into the same tidy gg_isopro frame. The polarity flip between varPro's predict.isopro (smaller = anomalous) and the package's howbad (higher = anomalous) is hidden inside the wrapper; the column is semantically the same whether you score training or test data. Train/test overlay reuses the existing method-column auto-detect in plot.gg_isopro, explicitly documented.
* docs: sharpen polarity language in predict.isopro spec
  After review discussion: rename the 'Polarity reminder' section to 'Polarity: how the wrapper presents both conventions' and rewrite it so it explicitly names that case.depth keeps varPro's native polarity while howbad carries the flipped version. Documentation section gains a concrete-code-form requirement so the implementer writes the transformation as 'howbad = 1 - predict(fit, newdata, quantiles=TRUE)' in the roxygen. Same design (Option A), clearer framing.
* docs: implementation plan for varPro Phase 4b predict.isopro wrapper
* chore: open v2.7.3.9010 dev cycle (varPro Phase 4b predict.isopro)
* feat(gg_isopro): newdata argument for predict.isopro scoring
* test(gg_isopro): newdata validation and polarity-flip sanity checks
  Adds three sanity tests for the predict.isopro path: newdata type validation, training-as-newdata top-5 ordering agreement, and the howbad = 1 - quantile relationship.
  The top-5 ordering test caught a real polarity bug in the training path: gg_isopro.isopro was returning howbad = object$howbad directly, but varPro's $howbad uses "lower = more anomalous" polarity (it is the quantile of case.depth, low depth = anomalous). The wrapper convention is "higher = more anomalous". Flip the training path the same way the prediction path does (1 - quantile) so train and test scores live on the same polarity.
  Also drop backticks from the newdata validation error so the regex match in the new tests is unambiguous.
* test(gg_isopro): train + test overlay via the method-column path
* docs(gg_isopro): document newdata arg and the polarity flip
* test: vdiffr snapshot for gg_isopro train+test overlay
* docs: NEWS entry for varPro Phase 4b predict.isopro wrapper + training-path polarity fix
* refactor(gg_isopro): move newdata after ... for back-compat
  Address Copilot review on PR #96: placing newdata as the 2nd positional argument would change positional matching for any caller of the PR #94 signature gg_isopro(object, ...). Moving newdata after ... means it can only be matched by name, so existing positional calls are unaffected. All tests already pass newdata by name; no test changes needed.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
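The polarity flip described in the commit message above can be sketched on synthetic depths in base R. This assumes the quantile in question behaves like the empirical CDF of the depth values; the actual scores come from varPro, and `case_depth`, `depth_quantile`, and the simulated numbers here are illustrative only.

```r
set.seed(42)

# Synthetic depths: 99 ordinary observations and one isolated very early.
case_depth <- c(rnorm(99, mean = 10, sd = 1), 2)

# Quantile of each depth within the sample: low value = shallow = anomalous.
depth_quantile <- stats::ecdf(case_depth)(case_depth)

# Flipped score, following the "higher = more anomalous" convention.
howbad <- 1 - depth_quantile

most_anomalous <- which.max(howbad)   # index of the highest flipped score
shallowest     <- which.min(case_depth)
```

After the flip, sorting by `howbad` in decreasing order and sorting by depth in increasing order pick out the same observations, which is the agreement the top-5 ordering test relies on.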
Summary
First of three Phase 4 sub-projects: a tidy-data wrapper and plot method for varPro::isopro isolation-forest anomaly scores.
- gg_isopro(fit) returns a data.frame (one row per observation) with columns obs, case.depth, howbad; class c("gg_isopro", "data.frame"); provenance attribute.
- plot.gg_isopro() returns a patchwork of a ranked elbow and a score density by default; panel = "elbow" or panel = "density" returns a single ggplot.
- threshold (score-space) or top_n_pct (quantile-space) annotates a reference line; both supplied → threshold wins with a message.
- dplyr::bind_rows() the outputs of three gg_isopro() calls with a method label; the plot auto-detects and colours.
- print / summary / autoplot companions.
Test plan
- devtools::test(): 14 new test_that blocks, 36 expectations, 0 failures
- devtools::check(args = "--as-cran"): 0 errors, 0 warnings, 0 notes
- VDIFFR_RUN_TESTS=true)
Spec: `dev/plans/2026-05-26-varpro-phase4-gg-isopro-design.md`
Plan: `dev/plans/2026-05-26-varpro-phase4-gg-isopro-plan.md`
Next Phase 4 sub-projects: `gg_beta_varpro`, then `gg_ivarpro` (and `predict.isopro` follow-up).
🤖 Generated with Claude Code
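As a rough base R illustration of the multi-method comparison mentioned in the summary, the sketch below stacks three synthetic score frames with a `method` label and ranks scores within each method, which is the ordering a ranked elbow panel would draw. The scores are random placeholders, `make_scores` and the `rank` column are hypothetical, and the package itself is described as using `dplyr::bind_rows()` plus its own plot method rather than this code.

```r
set.seed(7)

# Hypothetical generator for one method's scores (placeholder values only).
make_scores <- function(method, n = 50L) {
  data.frame(obs = seq_len(n), howbad = runif(n), method = method)
}

# Stack the three frames; base rbind stands in for dplyr::bind_rows().
stacked <- do.call(rbind, lapply(c("rnd", "unsupv", "auto"), make_scores))

# Rank within each method, highest howbad first (rank 1 = most anomalous).
stacked$rank <- ave(-stacked$howbad, stacked$method, FUN = rank)

# The top of each method's ranked list.
top_per_method <- stacked[stacked$rank == 1, c("method", "obs", "howbad")]
```

Keeping the method as a column rather than as separate objects is what lets a single plot call colour and compare the curves.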