Successive evaluations of metrics eating RAM #82

@patoorio

Description of the problem

Hi,

I am using HOI to evaluate several high-order metrics. I have to do this in a for loop over many sets of simulated data, including surrogate data. The problem is that memory usage grows with each successive evaluation, even when I try to release memory by deleting variables and calling the garbage collector.
The code under "Steps to reproduce" produces the following output when calculating surrogate O-information (process memory in MiB):

Get list of multiplets
surrogate 0  ready. Memory 399                             
surrogate 1  ready. Memory 408
surrogate 2  ready. Memory 426
surrogate 3  ready. Memory 437

(...)

surrogate 36  ready. Memory 739
surrogate 37  ready. Memory 747
surrogate 38  ready. Memory 754
surrogate 39  ready. Memory 764

and if I execute again within the same kernel, memory continues to increase:

Get list of multiplets
surrogate 0  ready. Memory 802                             
surrogate 1  ready. Memory 809
surrogate 2  ready. Memory 822
(...)
surrogate 37  ready. Memory 1106
surrogate 38  ready. Memory 1114
surrogate 39  ready. Memory 1121

The output above is with data of dimension 3. If I do the same with dimension-4 data, I get:

Get list of multiplets
surrogate 0  ready. Memory 6724
surrogate 1  ready. Memory 6747
(...)
surrogate 18  ready. Memory 6916
surrogate 19  ready. Memory 6925

And after a couple more runs, it escalates to:

(...)
surrogate 16  ready. Memory 7511
surrogate 17  ready. Memory 7517
surrogate 18  ready. Memory 7527
surrogate 19  ready. Memory 7534

I have seen this on Windows and Linux, on a desktop PC and on an HPC server. On the HPC server I run several instances in parallel, and after some hours they exhaust the full 256 GB of RAM.
I know I can work around this, for example by killing the process and starting a new Python kernel after some time, but I still think this is a dangerous memory leak that should be addressed in some way.
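
For reference, the restart workaround can be sketched as follows: each evaluation runs in a fresh Python interpreter, so whatever memory it accumulated is returned to the operating system when that interpreter exits. This is only a sketch under assumptions. `WORKER` and `run_isolated` are hypothetical names, and the `sum` call is a placeholder for the real computation (for example, the `Oinfo` fit on one surrogate); nothing here is part of HOI.

```python
import json
import subprocess
import sys

# Hypothetical worker script, run in a separate interpreter.
# It reads a JSON payload from stdin and prints a JSON result to stdout.
WORKER = """
import json, sys
values = json.load(sys.stdin)
# Placeholder for the real computation on one surrogate
print(json.dumps({"result": sum(values)}))
"""


def run_isolated(payload):
    """Run WORKER in a fresh interpreter; its memory is released when it exits."""
    done = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise if the worker fails
    )
    return json.loads(done.stdout)["result"]


# One short-lived process per evaluation
results = [run_isolated([s, s + 1, s + 2]) for s in range(3)]
print(results)
```

Starting an interpreter for every evaluation adds overhead (imports are repeated each time), so batching several surrogates per process may be a reasonable trade-off.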

Steps to reproduce

import numpy as np
import hoi
import os, psutil
import gc

proc=psutil.Process(os.getpid())

def randomShift(data,N=50,cols=True):
    """
    Random shifting of time series

    Parameters
    ----------
    data : 2D numpy array
        T x S numpy array (S x T if cols == False).
    N : int, optional
        Number of surrogates to generate. The default is 50.
    cols : boolean, optional
        If true, the first dimension of the array is time. The default is True.

    Returns
    -------
    outSeries : numpy array
        N x T x S.  (N x S x T if cols==False)
        N = number of surrogates, T = time points, S = number of series

    """
    if cols:
        data2=data.T
    else:
        data2=np.copy(data)
    D,L=data2.shape
    outSeries=[]
    for i in range(N):
        shifts=np.random.randint(1,L,D)
        serie=[np.r_[d[s:],d[:s]] for d,s in zip(data2,shifts)]
        outSeries.append(serie)
    outSeries=np.array(outSeries)
    if cols:
        outSeries=np.swapaxes(outSeries,1,2)
    return outSeries 
   
est = 'kernel'

x_t=hoi.simulation.simulate_hoi_gauss(n_samples=2000)

ent = hoi.core.get_entropy(method=est)
H = ent(x_t.T)
Oinf = hoi.metrics.Oinfo(x_t,verbose=True).fit(minsize=3, method=est).squeeze()
Sinfo = hoi.metrics.Sinfo(x_t,verbose=False).fit(minsize=3, method=est).squeeze()

TC = (Sinfo + Oinf) / 2
DTC = (Sinfo - Oinf) / 2


#%%
surrX=randomShift(x_t,N=40)
surrOinf = []

for s,serie in enumerate(surrX):
    OinfoScalc = hoi.metrics.Oinfo(serie,verbose=False)
    surrOinf.append(OinfoScalc.fit(minsize=3, method=est).squeeze())
    # Resident set size (RSS) of this process, in MiB
    print(f"surrogate {s}  ready. Memory",proc.memory_info().rss//1024//1024)
    
    del OinfoScalc
    gc.collect()

Expected results

More or less constant memory usage
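
One way to make "more or less constant" checkable, and to narrow down where the growth comes from, is to compare the process memory with what `tracemalloc` reports. The sketch below measures how much traced Python memory remains after repeated calls to a function. `traced_growth`, `leaky` and `clean` are hypothetical names used for illustration; in practice the function under test would wrap one metric evaluation. Note that `tracemalloc` only sees allocations made through Python's allocators, so if the traced figure stays flat while the process memory keeps growing, the growth is likely happening in native code. That is an assumption to verify, not a diagnosis.

```python
import gc
import tracemalloc


def traced_growth(func, n_calls=5):
    """Return the change in traced Python memory (bytes) over n_calls of func."""
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time allocations are not counted
        gc.collect()
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(n_calls):
            func()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins for the real computation
_kept = []


def leaky():
    _kept.append(bytearray(100_000))  # reference survives the call


def clean():
    bytearray(100_000)  # released as soon as the call returns


leak_bytes = traced_growth(leaky)
clean_bytes = traced_growth(clean)
print("leaky:", leak_bytes, "bytes; clean:", clean_bytes, "bytes")
```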

Actual results

(given above)

Additional information

n/a
