Thank you for your interest in contributing! This guide is written for R programmers who are new to package development. It covers everything from setting up your environment to opening a pull request.
- What you will need
- Getting the code
- Package structure
- The gg_* design pattern
- Making a change
- Writing tests
- Documentation standards
- Code style
- Running the full check suite
- Opening a pull request
- Getting help
| Tool | Why | Install |
|---|---|---|
| R >= 4.4.0 | Runtime | https://cloud.r-project.org |
| RStudio or Positron | IDE with R-aware tools | https://posit.co/downloads |
| Git | Version control | https://git-scm.com |
| Quarto | Builds the vignette | https://quarto.org/docs/get-started |
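
Once the tools in the table are installed, a quick check from the R console can confirm that R is recent enough and that Git and Quarto are visible on the PATH. This is only a minimal sketch: `check_dev_tools()` is a hypothetical helper, not part of ggRandomForests, and it does not try to detect the IDE.

```r
# Hypothetical helper (not part of the package): reports whether the
# command-line tools from the table are visible to the current R session.
check_dev_tools <- function(min_r = "4.4.0") {
  c(
    r_version = getRversion() >= min_r,
    git       = unname(nzchar(Sys.which("git"))),
    quarto    = unname(nzchar(Sys.which("quarto")))
  )
}

status <- check_dev_tools()
print(status)  # named logical vector; FALSE marks something to install or fix
```

A `FALSE` for `git` or `quarto` usually means the tool is either not installed or not on the PATH that R sees.
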
Install the development helper packages in R:
install.packages(c(
  "devtools",  # load, test, check, document in one place
  "testthat",  # testing framework
  "roxygen2",  # builds man/ pages from inline comments
  "lintr",     # style linter
  "covr",      # test coverage measurement
  "pkgdown"    # builds the website
))

Install the package dependencies (randomForestSRC >= 3.4.0 is required):
install.packages(c("randomForestSRC", "randomForest", "ggplot2",
                   "dplyr", "tidyr", "survival"))

# Fork the repo on GitHub first, then:
git clone https://github.com/YOUR-USERNAME/ggRandomForests.git
cd ggRandomForests
# Add the upstream remote so you can pull future changes
git remote add upstream https://github.com/ehrlinger/ggRandomForests.git

Open ggRandomForests.Rproj in RStudio/Positron. Then load the package in development mode, which makes all functions available without installing:
devtools::load_all() # shortcut: Ctrl+Shift+L

Confirm it works:
library(randomForestSRC)
rf <- rfsrc(Species ~ ., data = iris)
plot(gg_error(rf))

R/                        All source code, one file per function family
  gg_*.R                  Data extraction functions (return gg_* objects)
  plot.gg_*.R             S3 plot methods (return ggplot objects)
  help.R                  Package-level ?ggRandomForests documentation
  zzz.R                   .onAttach startup message
man/                      Auto-generated; never edit by hand
  *.Rd                    Built from Roxygen comments by devtools::document()
tests/
  testthat/
    test_*.R              One test file per R/ source file
vignettes/
  ggRandomForests.qmd     Main package vignette (Quarto)
DESCRIPTION               Package metadata, dependencies, version
NAMESPACE                 Exports/imports, auto-generated by roxygen2
NEWS.md                   Changelog; add an entry for every user-visible change
Key rule: the man/ and NAMESPACE files are always auto-generated. Run devtools::document() after any Roxygen change and commit the updated files.
Every feature in this package follows the same two-step pattern:
forest object
│
▼
gg_*(forest) ← R/gg_*.R — data extraction, returns a gg_* data.frame
│
▼
plot(gg_object) ← R/plot.gg_*.R — builds and returns a ggplot2 object
Why two steps? Keeping data and plotting separate means users can inspect, save, transform, or combine the intermediate data before plotting, and apply any ggplot2 layers they want on top of the returned object.
A gg_* object is just a data.frame with extra class attributes:
# Example: what gg_vimp returns
class(gg_vimp(rf))
# [1] "gg_vimp" "data.frame"

The extra class ("gg_vimp") lets R dispatch plot(gg_dta) to plot.gg_vimp automatically through R's S3 system.
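
To make the class mechanics concrete, here is a dependency-free sketch. The class name `gg_toy`, its columns, and the stand-in plot method are all made up for illustration; the package's real plot methods return ggplot objects rather than a message.

```r
# Toy stand-in for a gg_* object: a plain data.frame plus one extra class.
gg_dta <- data.frame(vars = c("x1", "x2"), vimp = c(0.4, 0.1))
class(gg_dta) <- c("gg_toy", class(gg_dta))

# Stand-in plot method: it only reports that it was called, so the
# sketch runs without ggplot2 or a fitted forest.
plot.gg_toy <- function(x, ...) {
  message("plot.gg_toy() called on ", nrow(x), " rows")
  invisible(x)
}

class(gg_dta)                   # "gg_toy" "data.frame"
inherits(gg_dta, "data.frame")  # TRUE, so ordinary data.frame tools still apply
gg_dta[order(gg_dta$vimp), ]    # inspect or reorder the data before plotting
plot(gg_dta)                    # S3 dispatch selects plot.gg_toy
```

Because the extra class sits in front of "data.frame", S3 tries `plot.gg_toy` first and falls back to data.frame behaviour for everything that has no `gg_toy` method.
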
Most gg_* functions support both randomForestSRC and randomForest objects. The pattern is:
# 1. Generic — dispatches based on class of `object`
gg_vimp <- function(object, ...) {
  UseMethod("gg_vimp", object)
}
# 2. rfsrc method
gg_vimp.rfsrc <- function(object, ...) { ... }
# 3. randomForest method
gg_vimp.randomForest <- function(object, ...) { ... }

Both methods should return an identically structured gg_* object so that plot.gg_vimp works for either.
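
One way to keep the two methods in step is to route both through a shared constructor. The sketch below is an assumption about how that could look, not the package's actual code: `gg_toy_vimp`, `new_gg_toy_vimp`, and the mock objects are hypothetical, and the mocks only mimic the class and a simplified `$importance` layout of real forests.

```r
# Hypothetical generic; dispatches on the class of `object`.
gg_toy_vimp <- function(object, ...) {
  UseMethod("gg_toy_vimp", object)
}

# Shared constructor so every method returns the same columns and classes.
new_gg_toy_vimp <- function(vars, vimp) {
  gg_dta <- data.frame(vars = vars, vimp = vimp, stringsAsFactors = FALSE)
  class(gg_dta) <- c("gg_toy_vimp", class(gg_dta))
  gg_dta
}

gg_toy_vimp.rfsrc <- function(object, ...) {
  # Assumed mock layout: a named numeric vector in $importance
  new_gg_toy_vimp(names(object$importance), unname(object$importance))
}

gg_toy_vimp.randomForest <- function(object, ...) {
  # Assumed mock layout: a one-column matrix with row names
  new_gg_toy_vimp(rownames(object$importance), unname(object$importance[, 1]))
}

# Mock objects that carry only what the methods above read.
mock_rfsrc <- structure(list(importance = c(x1 = 0.4, x2 = 0.1)), class = "rfsrc")
imp <- matrix(c(0.4, 0.1), ncol = 1,
              dimnames = list(c("x1", "x2"), "MeanDecreaseGini"))
mock_rf <- structure(list(importance = imp), class = "randomForest")

a <- gg_toy_vimp(mock_rfsrc)
b <- gg_toy_vimp(mock_rf)
identical(names(a), names(b))  # TRUE: same columns
identical(class(a), class(b))  # TRUE: same classes
```

With the structure fixed in one place, a single plot method can serve both forest types without branching on where the data came from.
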
Suppose you want to add gg_depth() to plot average tree depth. Here is the skeleton:
# R/gg_depth.R
#' Tree depth data object
#'
#' Extracts average depth statistics per tree from a random forest.
#'
#' @param object A fitted \code{\link[randomForestSRC]{rfsrc}} or
#' \code{\link[randomForest]{randomForest}} object.
#' @param ... Optional arguments passed to methods.
#'
#' @return A \code{gg_depth} \code{data.frame} with columns \code{ntree}
#' and \code{depth}.
#'
#' @seealso \code{\link{plot.gg_depth}}
#'
#' @examples
#' rf <- rfsrc(Species ~ ., data = iris)
#' plot(gg_depth(rf))
#'
#' @export
gg_depth <- function(object, ...) {
  UseMethod("gg_depth", object)
}
#' @export
gg_depth.rfsrc <- function(object, ...) {
  # ... extract depth data ...
  gg_dta <- data.frame(ntree = seq_len(object$ntree), depth = depths)
  class(gg_dta) <- c("gg_depth", class(gg_dta))
  invisible(gg_dta)
}

Then create R/plot.gg_depth.R following the same pattern as plot.gg_error.R.
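
As a rough idea of what such a file could contain, here is a hedged sketch of a plot method for the hypothetical `gg_depth` class. It does not reproduce plot.gg_error.R; the geom, the axis labels, and the stand-in data are assumptions chosen only to show the shape of a method that returns a ggplot object.

```r
library(ggplot2)

# Hypothetical sketch of R/plot.gg_depth.R; gg_depth is the made-up
# feature from the skeleton, with columns `ntree` and `depth`.
plot.gg_depth <- function(x, ...) {
  if (!inherits(x, "gg_depth")) {
    stop("plot.gg_depth() expects a gg_depth object")
  }
  ggplot(x, aes(x = .data$ntree, y = .data$depth)) +
    geom_line() +
    labs(x = "Tree", y = "Average depth")
}

# Stand-in data so the sketch runs without a fitted forest.
gg_dta <- data.frame(ntree = seq_len(5), depth = c(3.1, 2.8, 3.4, 3.0, 2.9))
class(gg_dta) <- c("gg_depth", class(gg_dta))

gg_plt <- plot(gg_dta)
inherits(gg_plt, "ggplot")
```

Returning the ggplot object, instead of printing it, lets callers add their own layers or themes afterwards, for example `gg_plt + theme_minimal()`.
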
Always work on a branch — never commit directly to main:
git checkout -b my-feature-name

The development cycle is:
devtools::load_all() # reload after editing source
devtools::test() # run tests
devtools::document() # rebuild man/ from Roxygen comments
devtools::check() # full R CMD check (slow; run before PR)

Tests live in tests/testthat/ and are named test_<source_file>.R to match the file they cover. The framework is testthat.
# tests/testthat/test_gg_depth.R
test_that("gg_depth returns correct class for rfsrc", {
  rf <- randomForestSRC::rfsrc(Species ~ ., data = iris, ntree = 50)
  gg_dta <- gg_depth(rf)
  expect_s3_class(gg_dta, "gg_depth")
  expect_s3_class(gg_dta, "data.frame")
  expect_true(all(c("ntree", "depth") %in% names(gg_dta)))
  expect_equal(nrow(gg_dta), rf$ntree)
})
test_that("plot.gg_depth returns a ggplot", {
  rf <- randomForestSRC::rfsrc(Species ~ ., data = iris, ntree = 50)
  gg_plt <- plot(gg_depth(rf))
  expect_s3_class(gg_plt, "ggplot")
})
test_that("gg_depth throws on wrong input", {
  expect_error(gg_depth("not a forest"))
})

- Keep forests small in tests: ntree = 50 is plenty and much faster than the default of 500 trees.
- Test the error path as well as the happy path (expect_error, expect_warning).
- Use expect_s3_class() rather than the older expect_is().
- Avoid set.seed() unless you are explicitly testing something random. randomForestSRC results are stochastic, and exact-value tests break across versions, so assert on structure (classes, column names, row counts) instead.
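
The tips above can be sketched as a pair of tests that assert on structure and plausible ranges instead of exact values. `toy_error()` is a hypothetical stand-in for a gg_* extractor, with `rnorm()` supplying the run-to-run variation that a real forest would produce.

```r
library(testthat)

# Hypothetical stand-in for a gg_* extractor; output varies between runs.
toy_error <- function(ntree = 50) {
  stopifnot(is.numeric(ntree), length(ntree) == 1)
  gg_dta <- data.frame(
    ntree = seq_len(ntree),
    error = abs(rnorm(ntree, mean = 0.2, sd = 0.05))
  )
  class(gg_dta) <- c("gg_toy_error", class(gg_dta))
  gg_dta
}

test_that("toy_error output has a stable structure", {
  gg_dta <- toy_error(ntree = 50)
  # Shape and range checks hold on every run, unlike exact values
  expect_s3_class(gg_dta, "gg_toy_error")
  expect_named(gg_dta, c("ntree", "error"))
  expect_equal(nrow(gg_dta), 50)
  expect_true(all(is.finite(gg_dta$error)))
  expect_true(all(gg_dta$error >= 0))
})

test_that("toy_error rejects a non-numeric ntree", {
  expect_error(toy_error(ntree = "fifty"))
})
```
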
Run tests for a single file during development:
testthat::test_file("tests/testthat/test_gg_depth.R")

Check coverage (aim for > 80%):
covr::package_coverage()

Documentation is written in Roxygen2 comments (lines starting with #') immediately above each function.
#' Short one-line title
#'
#' One or two paragraphs describing what the function does and why.
#'
#' @param arg1 Type and meaning. Include the default value if there is one.
#' @param arg2 ...
#'
#' @return Describe what is returned: the class, the columns in any
#' data.frame, and any class attributes set on the object.
#'
#' @seealso \code{\link{related_function}}
#'
#' @examples
#' # A runnable example — must complete in < 10 seconds for CRAN
#' rf <- rfsrc(Species ~ ., data = iris, ntree = 50)
#' plot(gg_something(rf))
#'
#' @export

- @param for every argument, including ... when the extras are meaningful.
- @return must describe the shape of the output, not just the class name.
- @seealso links to the paired plot.* function (or the gg_* function from a plot.* file).
- @examples must be runnable without error by R CMD check. Wrap slow examples in \donttest{}. Never wrap in \dontrun{} unless they literally cannot run on CRAN (network, credentials, etc.).
- Internal helpers (not exported) get @keywords internal instead of @export.
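
For the last point, an internal helper might be documented as in the sketch below. `check_forest()` is a hypothetical function, shown only to illustrate the tag placement; it is not part of the package.

```r
#' Validate that an object is a supported forest
#'
#' Hypothetical internal helper used to illustrate the tags.
#'
#' @param object Object to check.
#' @param classes Character vector of accepted classes. Defaults to
#'   \code{c("rfsrc", "randomForest")}.
#'
#' @return \code{object}, invisibly, if it inherits from one of
#'   \code{classes}; otherwise an error is raised.
#'
#' @keywords internal
check_forest <- function(object, classes = c("rfsrc", "randomForest")) {
  if (!inherits(object, classes)) {
    stop("Expected an object of class ",
         paste(classes, collapse = " or "), call. = FALSE)
  }
  invisible(object)
}

# Usage with a mock object that only mimics the class of a real forest
mock <- structure(list(), class = "rfsrc")
check_forest(mock)
```

The helper still gets full @param and @return documentation; only the export tag changes, so it stays out of NAMESPACE.
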
Rebuild the docs after any change:
devtools::document()

Then spot-check the result:
?gg_depth

Every user-visible change needs a bullet in NEWS.md under the appropriate version heading:
ggRandomForests v2.7.0
=====================
* Add `gg_depth()` to visualise average tree depth per forest (#42)

The package follows the tidyverse style guide. Key points:
| Rule | Good | Bad |
|---|---|---|
| Spacing around operators | x <- x + 1 | x<-x+1 |
| Spaces after commas | f(x, y) | f(x,y) |
| Indentation | 2 spaces | tabs |
| Object names | snake_case | camelCase, dotted.name |
| Boolean checks | !inherits(x, "foo") | inherits(x, "foo") == FALSE |
| Safe sequences | seq_len(n) | 1:n |
| Column references in aes() | .data$col or .data[[var]] | bare col or string "col" |
| dplyr column selection | dplyr::select(tidyr::all_of(vars)) | dplyr::select(vars) |
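
Two of these rules guard against real bugs and are not only about style. The short base R sketch below shows why the "Safe sequences" row matters when an input is empty, and how the preferred boolean check reads in practice.

```r
# With an empty input, 1:n counts down from 1 to 0; seq_len(n) is empty.
n <- 0
length(1:n)         # 2, so a for loop over 1:n would run twice
length(seq_len(n))  # 0, so a for loop over seq_len(n) is skipped

# Negated inherits() reads more directly than a comparison with FALSE.
x <- data.frame()
if (!inherits(x, "gg_vimp")) {
  message("x is not a gg_vimp object")
}
```
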
Check your code with lintr before opening a PR:
lintr::lint_package()

Common issues lintr flags:
- Lines > 120 characters.
- T/F instead of TRUE/FALSE.
- Trailing whitespace.
- 1:n instead of seq_len(n).
- inherits(x, "cls") == FALSE instead of !inherits(x, "cls").
Before opening a PR, run the same checks CI runs:
# Quick: just tests
devtools::test()
# Thorough: full R CMD check (builds vignette, checks examples, etc.)
devtools::check()

A clean check means:
0 errors ✔ | 0 warnings ✔ | 0 notes ✔
One note about the package size or installed path is acceptable. Errors or warnings must be fixed before a PR can be merged.
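To run the checks on additional platforms before pushing, you can use rhub. The checks run remotely on GitHub Actions, not on your machine, and need a one-time rhub::rhub_setup() in the repository: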
rhub::rhub_check()

- Commit your changes with a clear, present-tense message. Include the regenerated man/ and NAMESPACE files and the NEWS.md entry:
  git add R/gg_depth.R R/plot.gg_depth.R tests/testthat/test_gg_depth.R man/ NAMESPACE NEWS.md
  git commit -m "Add gg_depth() for average tree depth visualisation"
- Push to your fork:
  git push origin my-feature-name
- Open a PR on GitHub against the main branch of ehrlinger/ggRandomForests.
- PR description checklist (include in the description):
  - What problem does this solve or what feature does it add?
  - Which functions are new or changed?
  - Did you add or update tests?
  - Did you add a NEWS.md entry?
  - Does devtools::check() pass cleanly?
- CI will run automatically across macOS, Windows, and Linux on R release, devel, and oldrel-1. All checks must pass before merge.
Add gg_depth() for average tree depth ← new feature
Fix factor ordering in gg_partial categorical branch ← bug fix
Improve @return docs for gg_rfsrc ← documentation
Refactor bootstrap_survival to utils.R ← refactor
Avoid "WIP", "fix", or "update" with no context.
- Bug reports and feature requests: GitHub Issues. Search existing issues before filing a new one.
- Questions about usage: GitHub Discussions or post on Posit Community.
- randomForestSRC questions: the randomForestSRC documentation and its own GitHub issues.
When filing a bug, always include:
# Minimum reproducible example
library(ggRandomForests)
library(randomForestSRC)
rf <- rfsrc(Species ~ ., data = iris, ntree = 50)
# ... the code that triggers the error ...
sessionInfo() # paste this output into the issue

Thank you for helping improve ggRandomForests!