
feat: gg_isopro — varPro Phase 4 anomaly-score wrapper#94

Merged
ehrlinger merged 14 commits into main from feat/varpro-phase4-gg-isopro on May 26, 2026

Conversation

@ehrlinger
Owner

Summary

First of three Phase 4 sub-projects: a tidy-data wrapper and plot method for varPro::isopro isolation-forest anomaly scores.

  • gg_isopro(fit) returns a data.frame (one row per observation) with columns obs, case.depth, howbad; class c("gg_isopro", "data.frame"); provenance attribute.
  • plot.gg_isopro() returns a patchwork of a ranked elbow and a score density by default; panel = "elbow" or panel = "density" returns a single ggplot. threshold (score-space) or top_n_pct (quantile-space) annotates a reference line; both supplied → threshold wins with a message.
  • Multi-method comparison: dplyr::bind_rows() the outputs of three gg_isopro() calls with a method label; the plot auto-detects and colours.
  • print / summary / autoplot companions.
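
A minimal sketch of the output shape described above, built from synthetic numbers rather than a real varPro::isopro fit. The values and the `frames` / `stacked` names are hypothetical; only the column names, the class vector, and the idea of stacking labelled frames come from the summary. Base `rbind()` stands in for `dplyr::bind_rows()` to keep the sketch dependency-free.

```r
# Synthetic stand-ins for three gg_isopro() results, one per method.
# No varPro fit is involved; depths and scores are random placeholders.
set.seed(1)
n <- 50
frames <- lapply(c("rnd", "unsupv", "auto"), function(m) {
  data.frame(
    obs        = seq_len(n),
    case.depth = runif(n, min = 1, max = 8),
    howbad     = runif(n),
    method     = m
  )
})

# Stack the three frames; the method column is what a plot could group on.
stacked <- do.call(rbind, frames)
class(stacked) <- c("gg_isopro", "data.frame")

table(stacked$method)
```

Because the class vector ends in "data.frame", ordinary data-frame tools keep working on the stacked object.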

Test plan

  • devtools::test() — 14 new test_that blocks, 36 expectations, 0 failures
  • devtools::check(args = "--as-cran") — 0 errors, 0 warnings, 0 notes
  • vdiffr snapshots added (skip cleanly without VDIFFR_RUN_TESTS=true)
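
The opt-in guard for the snapshot tests could look roughly like the sketch below. The helper name `snapshots_enabled()` is hypothetical; the environment variable name and the "skip unless vdiffr is available" behaviour are taken from this PR.

```r
# Hypothetical helper: TRUE only when the opt-in variable is exactly "true"
# and the vdiffr package is installed.
snapshots_enabled <- function() {
  identical(Sys.getenv("VDIFFR_RUN_TESTS"), "true") &&
    requireNamespace("vdiffr", quietly = TRUE)
}

# Inside a test file the helper would gate each snapshot, for example:
#   testthat::skip_if_not(snapshots_enabled(), "vdiffr snapshots disabled")

Sys.setenv(VDIFFR_RUN_TESTS = "false")
snapshots_enabled()
```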

Spec: `dev/plans/2026-05-26-varpro-phase4-gg-isopro-design.md`
Plan: `dev/plans/2026-05-26-varpro-phase4-gg-isopro-plan.md`

Next Phase 4 sub-projects: `gg_beta_varpro`, then `gg_ivarpro` (and `predict.isopro` follow-up).

🤖 Generated with Claude Code

ehrlinger and others added 10 commits May 26, 2026 12:26
First of three Phase 4 sub-projects (isopro -> beta.varpro -> ivarpro).
Tidy-data wrapper + plot method for varPro::isopro anomaly scores.
Single fit per call (Phase 1-3 pattern), patchwork of elbow + density
with panel=c('both','elbow','density'); threshold/top_n_pct annotation
with threshold-wins precedence; ground-truth evaluation deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 97.91667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.62%. Comparing base (af0405c) to head (a73baba).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/gg_isopro.R | 91.30% | 2 Missing ⚠️ |

```diff
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   87.31%   87.62%   +0.31%
==========================================
  Files          38       40       +2
  Lines        3097     3193      +96
==========================================
+ Hits         2704     2798      +94
- Misses        393      395       +2
```
| Files with missing lines | Coverage Δ |
|---|---|
| R/autoplot_methods.R | 86.66% <100.00%> (+0.95%) ⬆️ |
| R/plot.gg_isopro.R | 100.00% <100.00%> (ø) |
| R/print_methods.R | 89.79% <100.00%> (+0.21%) ⬆️ |
| R/summary_methods.R | 92.85% <100.00%> (+0.50%) ⬆️ |
| R/gg_isopro.R | 91.30% <91.30%> (ø) |

… use it

A core goal of the v2.8.0 varPro integration is to make the package self-
teaching: a reader who has not used varPro before should learn what each
function is doing and what they would use it for, just from the help page.
The first pass on gg_isopro had the right voice but was too thin: it
described the mechanics of the wrapper without explaining the method.

Add to gg_isopro:
  - "What isopro is doing" — isolation forests, geometric intuition for
    why shallow depth means anomalous, what the three methods (rnd /
    unsupv / auto) actually do and how they differ.
  - "What's in the output" — case.depth vs howbad: raw depth and its
    [0,1]-rescaled cousin, why both are kept.
  - "What you use this for" — screening for data-entry errors, cohort-
    distribution checks, ranked review lists. The score is a rank, not
    a probability.
  - Liu/Ting/Zhou 2008 reference.
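
The geometric intuition in the first item (shallow depth means anomalous) can be illustrated with a toy simulation. This is a simplified one-dimensional sketch of random isolation splitting, not varPro's isopro and not any of its three methods; `isolation_depth()` and `mean_depth()` are hypothetical names.

```r
# Number of random splits needed to isolate `target` within the values `x`.
isolation_depth <- function(x, target, depth = 0L) {
  rng <- range(x)
  # Stop when the target is alone or the remaining values are identical.
  if (length(x) <= 1L || rng[1] == rng[2]) return(depth)
  split <- runif(1, min = rng[1], max = rng[2])
  # Keep only the side of the split that contains the target.
  keep <- if (target < split) x[x < split] else x[x >= split]
  isolation_depth(keep, target, depth + 1L)
}

set.seed(42)
typical <- rnorm(63)
x <- c(typical, 10)  # 63 typical values plus one far outlier

# Average the depth over many random split sequences.
mean_depth <- function(target) {
  mean(replicate(200, isolation_depth(x, target)))
}

depth_outlier <- mean_depth(10)
depth_typical <- mean_depth(median(typical))
c(outlier = depth_outlier, typical = depth_typical)
```

The far point tends to be cut off within the first few splits, while a point in the middle of the cloud needs many more; that gap is what a depth-based anomaly score ranks on.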

Add to plot.gg_isopro:
  - "Reading the elbow" — the bend is the cutoff; the plot is for seeing
    where it is, not reading single scores.
  - "Reading the density" — single mode + thin right tail is the
    picture; bimodal means two populations.
  - "Comparing methods" — agreement vs divergence across rnd/unsupv/auto
    is the actual signal.
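
One way to put a number on "agreement vs divergence" is the overlap between the top-ranked observations of each method. The sketch below uses synthetic scores and hypothetical helpers (`top_k()`, `overlap()`); it assumes larger howbad means more anomalous, and the "auto" column is deliberately unrelated noise for contrast, not a statement about how that method behaves.

```r
# Synthetic howbad scores for three methods on the same 100 observations.
set.seed(7)
n <- 100
base_score <- runif(n)
scores <- data.frame(
  obs    = rep(seq_len(n), times = 3),
  howbad = c(
    base_score,
    pmin(1, pmax(0, base_score + rnorm(n, sd = 0.05))),  # close to the first
    runif(n)                                             # unrelated
  ),
  method = rep(c("rnd", "unsupv", "auto"), each = n)
)

# Observation ids ranked in the top k by howbad, per method.
top_k <- function(df, k = 10) {
  lapply(split(df, df$method), function(d) {
    d$obs[order(d$howbad, decreasing = TRUE)][seq_len(k)]
  })
}

# Fraction of top-k observations that two methods share.
overlap <- function(a, b) length(intersect(a, b)) / length(a)

tops <- top_k(scores)
overlap(tops$rnd, tops$unsupv)
overlap(tops$rnd, tops$auto)
```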

Voice-only expansion; no API or behavioural change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pkgdown's build_reference_index() requires every exported topic to be
listed in _pkgdown.yml. Added an 'Anomaly Detection' section after
Variable Importance so gg_isopro and plot.gg_isopro are indexed; this
unblocks the pkgdown CI job on PR #94.

Copilot AI left a comment

Pull request overview

Adds Phase 4 support for varPro::isopro anomaly-score workflows by introducing a tidy extractor (gg_isopro) plus plot/print/summary/autoplot S3 methods and accompanying tests/snapshots/docs.

Changes:

  • Introduces gg_isopro() (S3 generic + isopro method) returning a tidy gg_isopro data frame with provenance.
  • Adds plot.gg_isopro() to render ranked “elbow” and score density panels (patchwork composite or single-panel via panel=), including optional threshold annotation and method-aware colouring.
  • Adds print.gg_isopro(), summary.gg_isopro(), autoplot.gg_isopro() plus testthat coverage and vdiffr snapshots; updates docs, pkgdown reference, and version/NEWS.

Reviewed changes

Copilot reviewed 13 out of 18 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| tests/testthat/test_snapshots.R | Adds vdiffr snapshots for gg_isopro plots (guarded by env var + vdiffr availability). |
| tests/testthat/test_gg_isopro.R | New unit tests for extractor output, plot return shapes, threshold behavior, method grouping, and S3 companions. |
| R/summary_methods.R | Adds summary.gg_isopro() implementation. |
| R/print_methods.R | Adds print.gg_isopro() implementation. |
| R/plot.gg_isopro.R | New plot method + helpers for elbow/density panels and threshold resolution. |
| R/gg_isopro.R | New extractor generic + gg_isopro.isopro() implementation and documentation. |
| R/autoplot_methods.R | Adds autoplot.gg_isopro() method. |
| NEWS.md | Documents new feature and bumps dev version string. |
| NAMESPACE | Registers new S3 methods and exports gg_isopro. |
| man/summary.gg.Rd | Adds alias/usage for summary.gg_isopro. |
| man/print.gg.Rd | Adds alias/usage for print.gg_isopro. |
| man/plot.gg_isopro.Rd | New Rd for plot.gg_isopro. |
| man/gg_isopro.Rd | New Rd for gg_isopro. |
| man/autoplot.gg.Rd | Adds alias/usage for autoplot.gg_isopro. |
| dev/plans/2026-05-26-varpro-phase4-gg-isopro-plan.md | Adds implementation plan (internal dev artifact). |
| dev/plans/2026-05-26-varpro-phase4-gg-isopro-design.md | Adds design spec (internal dev artifact). |
| DESCRIPTION | Bumps package version to 2.7.3.9008. |
| _pkgdown.yml | Adds “Anomaly Detection” reference section for pkgdown site. |
Files not reviewed (5)
  • man/autoplot.gg.Rd: Language not supported
  • man/gg_isopro.Rd: Language not supported
  • man/plot.gg_isopro.Rd: Language not supported
  • man/print.gg.Rd: Language not supported
  • man/summary.gg.Rd: Language not supported

Comment thread R/autoplot_methods.R Outdated
Comment thread R/plot.gg_isopro.R Outdated
Comment thread R/plot.gg_isopro.R Outdated
Comment thread R/plot.gg_isopro.R Outdated
Comment thread R/gg_isopro.R Outdated
- autoplot.gg_isopro uses plot() generic for S3 dispatch
- roxygen converted from markdown to Rd-style (\code{} / \link{})
- .resolve_isopro_threshold validates threshold in [0,1] and top_n_pct in (0,100)
- panel='both' uses patchwork::wrap_plots() for consistency
ehrlinger added a commit that referenced this pull request May 26, 2026
Add Roxygen: list(markdown = TRUE) to DESCRIPTION so devtools::document()
auto-converts backticks / [fn()] / [pkg::fn()] in source roxygen to
\code{} / \link{} / \link[pkg]{} in the generated Rd. Existing Rd-style
markup keeps working; both styles now coexist. Saves the manual
conversion work the Copilot review on PR #94 flagged.

Two source-roxygen edits needed to keep R CMD check clean under markdown:
- R/help.R: randomForest[SRC] -> randomForestSRC (markdown read [SRC]
  as an unfinished link reference, producing a missing-link warning).
- R/gg_rfsrc.R::bootstrap_survival: 95\% -> 95% (markdown over-escaped
  the backslash, producing a malformed Rd with shifted section order).

Regenerates all 31 Rd files. No functional or rendered-content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…locomp lint

The validation Copilot asked for in the previous round pushed
.resolve_isopro_threshold to cyclomatic complexity 38, well over the
project's budget of 20. Factor the per-argument checks into a small
helper (.check_threshold_arg) parameterised by name/lo/hi/closure;
.resolve_isopro_threshold now reads as the three-branch decision it
actually is. Same external behaviour; 43 tests still pass.
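
A sketch of the refactor shape described above, under stated assumptions. The bodies of .check_threshold_arg and .resolve_isopro_threshold are not shown in this conversation, so `check_range_arg()` and `resolve_threshold()` below are hypothetical re-creations. The ranges ([0, 1] for threshold, (0, 100) for top_n_pct) and the threshold-wins rule come from this PR; reading top_n_pct as the upper quantile of howbad is an assumption.

```r
# Hypothetical range check; `closed` selects [lo, hi] versus (lo, hi).
check_range_arg <- function(x, name, lo, hi, closed = TRUE) {
  if (is.null(x)) return(invisible(NULL))
  is_scalar <- is.numeric(x) && length(x) == 1L && !is.na(x)
  in_range <- is_scalar &&
    (if (closed) x >= lo && x <= hi else x > lo && x < hi)
  if (!in_range) {
    bounds <- if (closed) sprintf("[%g, %g]", lo, hi) else sprintf("(%g, %g)", lo, hi)
    stop(sprintf("%s must be a single number in %s", name, bounds), call. = FALSE)
  }
  invisible(x)
}

# With validation factored out, the resolver is a three-branch decision.
resolve_threshold <- function(howbad, threshold = NULL, top_n_pct = NULL) {
  check_range_arg(threshold, "threshold", 0, 1, closed = TRUE)
  check_range_arg(top_n_pct, "top_n_pct", 0, 100, closed = FALSE)

  if (!is.null(threshold)) {
    if (!is.null(top_n_pct)) message("Both supplied; using threshold.")
    return(threshold)
  }
  if (!is.null(top_n_pct)) {
    # Assumed meaning: the score that cuts off the top N percent.
    return(unname(stats::quantile(howbad, probs = 1 - top_n_pct / 100)))
  }
  NULL
}

resolve_threshold(runif(20), threshold = 0.9, top_n_pct = 5)
```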
@ehrlinger ehrlinger merged commit 6f4d649 into main May 26, 2026
15 checks passed
ehrlinger added a commit that referenced this pull request May 26, 2026
…pro, gg_udependent) (#95)

* docs(gg_partial_varpro): teach what varPro partialpro is doing

* docs(gg_varpro): teach what varpro variable priority is doing

* docs(gg_udependent): teach what cross-variable dependency is doing

* chore: open v2.7.3.9009 + NEWS for varPro pedagogical doc audit

* docs: enable roxygen2 markdown package-wide

Add Roxygen: list(markdown = TRUE) to DESCRIPTION so devtools::document()
auto-converts backticks / [fn()] / [pkg::fn()] in source roxygen to
\code{} / \link{} / \link[pkg]{} in the generated Rd. Existing Rd-style
markup keeps working; both styles now coexist. Saves the manual
conversion work the Copilot review on PR #94 flagged.

Two source-roxygen edits needed to keep R CMD check clean under markdown:
- R/help.R: randomForest[SRC] -> randomForestSRC (markdown read [SRC]
  as an unfinished link reference, producing a missing-link warning).
- R/gg_rfsrc.R::bootstrap_survival: 95\% -> 95% (markdown over-escaped
  the backslash, producing a malformed Rd with shifted section order).

Regenerates all 31 Rd files. No functional or rendered-content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: address PR #95 Copilot review

- plot.gg_udependent: clarify that truly isolated nodes are dropped by
  gg_udependent() before plotting; reword 'Isolated' as 'low-degree'.
- gg_partial_varpro: fix varpro::partialpro -> varPro::partialpro (six
  instances) so \link{} renders correctly.
- plot.gg_varpro: clarify the cutoff line lives in z-units on the
  default axis and in raw-importance units when type='raw'; the
  numeric is the same, the scale is not.
- gg_varpro reference: complete the dangling 'arXiv 2409.' with the
  full arXiv:2409.09003 identifier and an https://arxiv.org link.

* docs: regenerate gg_isopro Rd under markdown mode post-rebase

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ehrlinger added a commit that referenced this pull request May 26, 2026
Address Copilot review on PR #96: placing newdata as the 2nd
positional argument would change positional matching for any
caller of the PR #94 signature gg_isopro(object, ...). Moving
newdata after ... means it can only be matched by name, so
existing positional calls are unaffected. All tests already pass
newdata by name; no test changes needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
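
The matching rule this commit relies on can be checked in a few lines of base R. `score()` is a hypothetical function with the same argument shape as the signature described above; it is not the package's gg_isopro().

```r
# Hypothetical function with the same argument layout: object, ..., newdata.
score <- function(object, ..., newdata = NULL) {
  list(
    n_dots      = length(list(...)),
    has_newdata = !is.null(newdata)
  )
}

# A second positional argument is absorbed by `...`, not by newdata.
positional <- score("fit", data.frame(x = 1))

# newdata is only filled when it is named in full.
by_name <- score("fit", newdata = data.frame(x = 1))

str(positional)
str(by_name)
```

Arguments that follow `...` in R are matched only by their exact name, so partial names such as `new =` would also fall into `...` rather than reach newdata.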
ehrlinger added a commit that referenced this pull request May 26, 2026
…(+ training-path polarity fix) (#96)

* docs: design spec for varPro Phase 4 — predict.isopro wrapper

Second sub-project of Phase 4 (gg_beta_varpro and gg_ivarpro come after).
Adds a newdata argument to gg_isopro() so a fitted isopro model can score
new observations into the same tidy gg_isopro frame. The polarity flip
between varPro's predict.isopro (smaller = anomalous) and the package's
howbad (higher = anomalous) is hidden inside the wrapper; the column is
semantically the same whether you score training or test data. Train/test
overlay reuses the existing method-column auto-detect in plot.gg_isopro,
explicitly documented.

* docs: sharpen polarity language in predict.isopro spec

After review discussion: rename the 'Polarity reminder' section to
'Polarity: how the wrapper presents both conventions' and rewrite it
so it explicitly names that case.depth keeps varPro's native polarity
while howbad carries the flipped version. Documentation section gains
a concrete-code-form requirement so the implementer writes the
transformation as 'howbad = 1 - predict(fit, newdata, quantiles=TRUE)'
in the roxygen. Same design (Option A), clearer framing.

* docs: implementation plan for varPro Phase 4b predict.isopro wrapper

* chore: open v2.7.3.9010 dev cycle (varPro Phase 4b predict.isopro)

* feat(gg_isopro): newdata argument for predict.isopro scoring

* test(gg_isopro): newdata validation and polarity-flip sanity checks

Adds three sanity tests for the predict.isopro path: newdata type
validation, training-as-newdata top-5 ordering agreement, and the
howbad = 1 - quantile relationship.

The top-5 ordering test caught a real polarity bug in the training
path: gg_isopro.isopro was returning howbad = object$howbad directly,
but varPro's $howbad uses "lower = more anomalous" polarity (it is the
quantile of case.depth, low depth = anomalous). The wrapper convention
is "higher = more anomalous". Flip the training path the same way the
prediction path does (1 - quantile) so train and test scores live on
the same polarity. Also drop backticks from the newdata validation
error so the regex match in the new tests is unambiguous.
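
The flip described above can be seen on synthetic depths. This sketch uses `ecdf()` as a stand-in for the quantile computation, which is an assumption; varPro's own code is not shown here.

```r
# Synthetic depths: 99 ordinary observations and one isolated very early.
set.seed(3)
case.depth <- c(runif(99, min = 5, max = 10), 1.2)

# Native polarity as described above: the quantile of depth,
# so a small value means shallow depth, i.e. anomalous.
depth_quantile <- ecdf(case.depth)(case.depth)

# Wrapper convention: flip so that a larger value means more anomalous.
howbad <- 1 - depth_quantile

which.min(depth_quantile)  # position of the shallowest observation
which.max(howbad)          # same position after the flip
```

Applying the same `1 - quantile` transformation on both the training and prediction paths is what keeps the two sets of scores on one polarity.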

* test(gg_isopro): train + test overlay via the method-column path

* docs(gg_isopro): document newdata arg and the polarity flip

* test: vdiffr snapshot for gg_isopro train+test overlay

* docs: NEWS entry for varPro Phase 4b predict.isopro wrapper + training-path polarity fix

* refactor(gg_isopro): move newdata after ... for back-compat

Address Copilot review on PR #96: placing newdata as the 2nd
positional argument would change positional matching for any
caller of the PR #94 signature gg_isopro(object, ...). Moving
newdata after ... means it can only be matched by name, so
existing positional calls are unaffected. All tests already pass
newdata by name; no test changes needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ehrlinger ehrlinger deleted the feat/varpro-phase4-gg-isopro branch May 28, 2026 14:33