
RCTree cannot handle data that consists of only one unique value #88

@kongwilson

Description


I ran into an issue when a subset of my sample data points contains only ONE unique value, i.e. every point in the subset is identical. How should we handle this exception?

The error message indicates a NaN probability, caused by division by zero: when all points are identical, every dimension has a range (max minus min) of zero, so normalizing those ranges into cut probabilities computes 0/0. I tried replacing the probabilities with a uniform distribution, but that caused a subsequent issue: after a cut, the right side contains no points. I think this violates the principle of the RRCF algorithm. Is there a better way to resolve such cases?
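To illustrate the failure, here is a minimal pure-Python sketch of the cut-dimension step. It assumes, as in the RRCF cut procedure, that the cut dimension is drawn with probability proportional to each dimension's range (max minus min). `span_probabilities` and `split_at` are hypothetical helpers written for this example, not part of the rrcf API. The sketch shows both the 0/0 case and why a uniform fallback leaves the right side empty.

```python
def span_probabilities(points):
    """Cut-dimension probabilities proportional to each dimension's range.

    Returns None when every range is zero (all points identical), which is
    exactly where normalizing would compute 0/0 and produce NaN.
    """
    n_dims = len(points[0])
    spans = [max(p[d] for p in points) - min(p[d] for p in points)
             for d in range(n_dims)]
    total = sum(spans)
    if total == 0:
        return None
    return [s / total for s in spans]


def split_at(points, dim, value):
    """Send points with coordinate <= value to the left, the rest right."""
    left = [p for p in points if p[dim] <= value]
    right = [p for p in points if p[dim] > value]
    return left, right


same = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]
print(span_probabilities(same))   # None: no dimension can be cut
# A uniform fallback must still pick a cut value in [min, max] = [1.0, 1.0],
# so every point goes left and the right side is empty.
print(split_at(same, 0, 1.0))     # ([(1.0, 2.0), (1.0, 2.0), (1.0, 2.0)], [])

mixed = [(0.0, 2.0), (4.0, 2.0)]
print(span_probabilities(mixed))  # [1.0, 0.0]
```

Under this reading, a set of identical points cannot be cut at all, so treating it as a single leaf (or not building a tree from it) seems more consistent with RRCF than forcing a cut with a uniform distribution.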

```
  File "<ipython-input-2-b3a957a401e5>", line 139, in <listcomp>
    rrcf.RCTree(x[ix], index_labels=ix) for ix in ixs]
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 106, in __init__
    self._mktree(X, S, N, I, parent=self)
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 177, in _mktree
    S1, S2, branch = self._cut(X, S, parent=parent, side=side)
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 159, in _cut
    q = self.rng.choice(self.ndim, p=l)
  File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
```
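One possible workaround, sketched here rather than offered as a fix inside rrcf, is to check each sample for at least two distinct rows before building a tree. It assumes `x` is indexable by row and `ixs` is a list of index lists, as in the list comprehension shown in the traceback. `has_distinct_rows` and `usable_samples` are hypothetical helpers; with NumPy arrays, `np.unique(x[ix], axis=0).shape[0] > 1` is an equivalent check.

```python
def has_distinct_rows(rows):
    """Return True if the sample contains at least two different points."""
    return len({tuple(row) for row in rows}) > 1


def usable_samples(x, ixs):
    """Separate index sets that are safe to build a tree on from the rest.

    A sample whose rows are all identical is skipped, because no dimension
    would have a non-zero range to cut on.
    """
    kept, skipped = [], []
    for ix in ixs:
        rows = [x[i] for i in ix]
        if has_distinct_rows(rows):
            kept.append(ix)
        else:
            skipped.append(ix)
    return kept, skipped


x = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
ixs = [[0, 1], [0, 2]]
kept, skipped = usable_samples(x, ixs)
print(kept, skipped)  # [[0, 2]] [[0, 1]]
# With x as a NumPy array, build trees only from the usable samples:
# forest = [rrcf.RCTree(x[ix], index_labels=ix) for ix in kept]
```

This avoids the crash, but skipped samples contribute no tree, so redrawing them may be preferable if every tree in the forest is expected to score the same points.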
