Skip to content

fix: gg_variable.randomForest classification — yhat.* columns + smooth bugs (#87)#88

Merged
ehrlinger merged 11 commits into
mainfrom
fix/rf-87-gg-variable-classification
May 21, 2026
Merged

fix: gg_variable.randomForest classification — yhat.* columns + smooth bugs (#87)#88
ehrlinger merged 11 commits into
mainfrom
fix/rf-87-gg-variable-classification

Conversation

@ehrlinger
Copy link
Copy Markdown
Owner

Summary

  • `gg_variable.randomForest` for classification forests now stores per-class OOB vote fractions as `yhat.` columns (from `object$votes`), matching the rfsrc path. Previously a single `yhat` factor (class labels from `object$predicted`) was stored, preventing the multi-class pivot in `plot.gg_variable` from firing. Vote fractions are row-normalised to [0, 1] even when fit with `norm.votes = FALSE`.
  • Binary classification `smooth = TRUE`: added explicit `aes(x, y)` to `geom_smooth` so `layer_data()` no longer errors.
  • Multi-class numeric path: added missing `smooth = TRUE` block.
  • 6 new unit tests + 2 vdiffr snapshots.

Housekeeping

Closes #81, Closes #82 (both were fully fixed by PR #83 but auto-close never fired).

Test plan

  • `devtools::test()` — all tests pass, 0 failures
  • `devtools::check(args="--as-cran")` — 0 errors, 0 warnings, 0 notes

🤖 Generated with Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.11%. Comparing base (cf6229a) to head (99be00e).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
+ Coverage   86.39%   87.11%   +0.71%     
==========================================
  Files          38       38              
  Lines        3022     3043      +21     
==========================================
+ Hits         2611     2651      +40     
+ Misses        411      392      -19     
Files with missing lines Coverage Δ
R/gg_variable.R 85.41% <100.00%> (+2.85%) ⬆️
R/plot.gg_variable.R 82.13% <100.00%> (+5.94%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ehrlinger added a commit that referenced this pull request May 21, 2026
- design doc: add oob=TRUE to gg_roc new signature (was omitted)
- design doc: replace proposed calc_roc_one_vs_rest() with existing
  .rf_prob_matrix/.rf_one_class_roc/.rf_macro_average_roc helpers;
  document that output uses sens/spec/pct (package contract), not fpr/tpr
- plan #87: add norm.votes=FALSE row-normalization to votes extraction step
- plan #87: tighten patchwork test expectation (iris 4-predictor default
  must return patchwork, not patchwork||ggplot, to catch list regressions)
- plan #88: add pct to per_class column assertions (both occurrences)
- plan #88: correct rfsrc per_class rationale (argument is for API
  discoverability, not to prevent "unused argument" error — rfsrc uses ...)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ehrlinger added a commit that referenced this pull request May 21, 2026
…s (PR #87 & #88) (#87)

* docs: add randomForest finishing-work design spec (PR #87 + #88)

Covers two v2.8.0 completion PRs:
- PR #87: gg_variable.randomForest classification yhat fix + stale #81/#82 close
- PR #88: multi-class gg_roc with per_class=TRUE (issue #72, no CIs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add implementation plans for PR #87 (gg_variable classification) and PR #88 (per_class ROC)

* docs: address Copilot review on PR #87 — 6 plan/spec corrections

- design doc: add oob=TRUE to gg_roc new signature (was omitted)
- design doc: replace proposed calc_roc_one_vs_rest() with existing
  .rf_prob_matrix/.rf_one_class_roc/.rf_macro_average_roc helpers;
  document that output uses sens/spec/pct (package contract), not fpr/tpr
- plan #87: add norm.votes=FALSE row-normalization to votes extraction step
- plan #87: tighten patchwork test expectation (iris 4-predictor default
  must return patchwork, not patchwork||ggplot, to catch list regressions)
- plan #88: add pct to per_class column assertions (both occurrences)
- plan #88: correct rfsrc per_class rationale (argument is for API
  discoverability, not to prevent "unused argument" error — rfsrc uses ...)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ehrlinger and others added 10 commits May 21, 2026 12:45
…tion (PR #87)

Tests verify that gg_variable.randomForest classification forests produce:
- Per-class vote fraction columns (yhat.setosa, yhat.versicolor, yhat.virginica)
- No bare yhat column for multi-class prediction
- Valid vote fractions (0-1) that row-sum to 1
- Plottable output for multi-xvar case
- Layer_data access on single-xvar plots

All three tests currently FAIL, as expected (TDD red phase).
Implementation fix comes in T2.
…hat.* columns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… update stale oob comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#87)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…indices

Strip the yhat. prefix from column names when building the outcome column
in the multi-class pivot loop (line ~169), so facet labels show "setosa"/
"versicolor"/"virginica" instead of 1/2/3. Add a regression test that
verifies p@data$outcome contains class names after plotting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t review)

Plan #87 was corrected (via PR #87 Copilot review) to assert patchwork
specifically for the 4-predictor iris default-plot case, rather than the
loose patchwork||ggplot form that wouldn't catch a regression to a bare
list (issue #80).  Align the implementation test to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ehrlinger ehrlinger force-pushed the fix/rf-87-gg-variable-classification branch from d511d33 to bd27790 Compare May 21, 2026 16:52
@ehrlinger ehrlinger requested a review from Copilot May 21, 2026 18:15
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the randomForest classification path for variable-dependence extraction/plotting by switching to per-class vote fractions (yhat.<class>) and addressing smooth=TRUE issues in plot.gg_variable, with accompanying regression tests and snapshot coverage.

Changes:

  • gg_variable.randomForest() (classification) now emits per-class yhat.<class> columns derived from object$votes, normalized to probabilities.
  • plot.gg_variable() multi-class reshape now uses class names for the outcome facet column; classification smooth=TRUE now supplies explicit aes(x, y) (binary) and adds the missing smooth layer (multi-class numeric).
  • Adds new unit tests + vdiffr snapshots; bumps package version/date and documents the fixes in NEWS.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
R/gg_variable.R Adjusts randomForest classification extraction to store per-class vote fractions as yhat.<class> columns.
R/plot.gg_variable.R Fixes multi-class pivot outcome labels and ensures smooth=TRUE works in affected classification paths.
tests/testthat/test_gg_variable.R Adds tests validating yhat.<class> columns, normalization, patchwork return, and smooth/layer_data() behavior.
tests/testthat/test_snapshots.R Adds vdiffr snapshots for randomForest classification gg_variable plots.
NEWS.md Documents the gg_variable.randomForest + plot.gg_variable fixes and increments dev version.
DESCRIPTION Bumps package version and date.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread R/gg_variable.R Outdated
Comment thread R/plot.gg_variable.R Outdated
- gg_variable.randomForest: replace silent no-op oob override with an
  explicit warning when oob=FALSE is supplied, since in-bag class
  probabilities are not available via the randomForest API.
- plot.gg_variable: set outcome factor levels from gg_dta_y column order
  rather than factor() default (alphabetical), so multi-class facet panels
  follow the model's class ordering regardless of locale.
- Add two new tests: oob=FALSE warning, and outcome factor level order.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ehrlinger ehrlinger merged commit ae0b18e into main May 21, 2026
15 checks passed
ehrlinger added a commit that referenced this pull request May 22, 2026
…) (#91)

* chore: open v2.7.3.9006 dev increment (PR #88)

* feat: gg_roc.randomForest per_class=TRUE — per-class OvR ROC + named AUC

* test: gg_roc per_class binary no-op and which_outcome conflict tests

* feat: plot.gg_roc per_class=TRUE — overlay and facet panel paths

* feat: summary.gg_roc handles named AUC vector for per_class=TRUE

* test: add vdiffr snapshots for per_class ROC overlay and facet (PR #88)

* docs: NEWS for PR #88 per_class ROC + roxygen for per_class/panel args

- NEWS.md: feature entry + version line bump to 2.7.3.9006
- Document gg_roc(per_class=) and plot.gg_roc(panel=) — fixes codoc warnings
- Replace em-dashes in R/plot.gg_roc.R with ASCII (non-ASCII warning)
- .Rbuildignore: exclude stray .superpowers/ dir

R CMD check --as-cran: 0 errors, 0 warnings, 0 notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: extract .plot_gg_roc_per_class helper to satisfy cyclocomp lint

plot.gg_roc cyclomatic complexity hit 21 (lint budget 20) after the
per_class branch. Move per-class rendering into a private helper; the
method now delegates with a single return() call.

* fix: move plot.gg_roc panel arg after ... for positional back-compat (#91 review)

Copilot review: panel before ... is backward-incompatible — a positional
caller like plot(x, 1, FALSE) previously routed FALSE into ..., but with
panel as the 3rd formal it would bind to panel and error on match.arg.
Placing panel after ... makes it name-only; all test calls already use
panel = by name.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants