
ranger.unify fails on large models #13

@yovizzle


Hi,

We are training a large random forest model (the `ranger` object is ~270 MB) on a large dataset (1,670,000 × 267, object size ~3.3 GB), and `ranger.unify()` fails on it. The machine we tested on has 96 CPUs and 354 GB of RAM.
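As a sanity check on the sizes quoted here and in the repro below, both match what a dense numeric matrix takes at 8 bytes per double. This assumes all columns are numeric and ignores tibble, attribute, and `ranger` overhead:

$$
\begin{aligned}
1{,}670{,}000 \times 267 \times 8\ \text{B} &= 3{,}567{,}120{,}000\ \text{B} \approx 3.32\ \text{GiB} \\
800{,}000 \times 200 \times 8\ \text{B} &= 1{,}280{,}000{,}000\ \text{B} \approx 1.19\ \text{GiB}
\end{aligned}
$$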

Here is a repro.

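```r
library(treeshap)
library(ranger)
library(tidyverse)  # for %>% and as_tibble()

# Generate a random training tibble of similar size to our data
n_rows <- 800000
n_cols <- 200
m <- matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
colnames(m) <- paste0("V", seq_len(n_cols))  # V1..V200; V200 is the response
object.size(m) / 1024^3  # ~1.2 GB

trainM <- m %>% as_tibble()
srf <- ranger(V200 ~ ., data = trainM, num.trees = 5, verbose = TRUE)
object.size(srf) / 1024^2  # 89.4 MB

# This call crashes R with a segfault
rfu <- treeshap::ranger.unify(srf, trainM)
```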

We then got this error:

```
 *** caught segfault ***
address 0x55e43e173ed0, cause 'memory not mapped'

Traceback:
 1: new_covers(x, is_na, roots, yes, no, missing, is_leaf, feature, split, decision_type)
 2: set_reference_dataset(ret, as.data.frame(data))
 3: treeshap::ranger.unify(srf, trainM)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)
```

Environment:

```
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
```

Any ideas as to what may be causing this issue? Is it a limitation of the current implementation of the package, or perhaps an issue related to our R environment?

Thanks.
