Hi,
We are training a large random forest model (the rf object is ~270 MB) on a large dataset (1,670,000 rows x 267 columns, object size 3.3 GB) and are hitting errors. The test machine has 96 CPUs and 354 GB of RAM.
Here is a repro.
library(treeshap)
library(ranger)
library(tidyverse)
# Generate a random training tibble comparable in size to our data (smaller, but it still triggers the crash)
m = matrix(nrow = 800000,ncol = 200,data = runif(n = 800000*200))
object.size(m)/1024^3 # 1.2 gb
trainM = m %>% as_tibble
srf <- ranger(V200 ~ ., data=trainM, num.trees = 5,verbose = TRUE)
object.size(srf)/1024^2 # 89.4 MB
rfu = treeshap::ranger.unify(srf, trainM)
The ranger.unify call then crashes R with this error:
# *** caught segfault ***
# address 0x55e43e173ed0, cause 'memory not mapped'
#
# Traceback:
# 1: new_covers(x, is_na, roots, yes, no, missing, is_leaf, feature, split, decision_type)
# 2: set_reference_dataset(ret, as.data.frame(data))
# 3: treeshap::ranger.unify(srf, trainM)
# An irrecoverable exception occurred. R is aborting now ...
# Segmentation fault (core dumped)
# R version 4.0.2 (2020-06-22)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 20.04 LTS
Any ideas as to what may be causing this issue? Is it a limitation of the current implementation of the package, or perhaps an issue related to our R environment?
Thanks.