Quiet sleep/death during dist_fit.log_prob(target) in optimisation. #63

@StefanoDamato

Description

Hi @StatMixedML,

thank you for the nice package. It is a very valuable contribution, and I am planning to use it in my experiments.

While trying to run the following script, a toy example of a regression with a negative binomial likelihood in LightGBMLSS, I observe that the process silently hangs without returning any error message. It simply gets stuck during the optimisation, which is handled via hyper_opt, the Optuna-based hyperparameter optimisation method provided by the package.

The script is the following:

import numpy as np
import lightgbm as lgb

from lightgbmlss.model import LightGBMLSS
from lightgbmlss.distributions.NegativeBinomial import NegativeBinomial


SEED = 123
np.random.seed(SEED)

# Small synthetic count-regression dataset.
n = 120
x1 = np.random.normal(0.0, 1.0, size=n)
x2 = np.random.uniform(-1.0, 1.0, size=n)
x3 = np.random.normal(2.0, 0.5, size=n)
X = np.column_stack([x1, x2, x3])

beta0 = 0.4
beta = np.array([0.8, -0.5, 0.3])
eta = beta0 + X @ beta
lam = np.exp(eta)
y = np.random.poisson(lam)

# Train/validation split (validation labels are not needed for this repro).
X_train, y_train = X[:80], y[:80]
X_valid = X[80:100]

dtrain = lgb.Dataset(X_train, label=y_train)

# Model with a negative binomial response distribution.
lgblss = LightGBMLSS(NegativeBinomial())

# Hyperparameter search space in the format expected by hyper_opt.
param_dict = {
    "eta": ["float", {"low": 1e-5, "high": 1.0, "log": True}],
    "max_depth": ["int", {"low": 1, "high": 10, "log": False}],
    "num_leaves": ["int", {"low": 255, "high": 255, "log": False}],
    "min_data_in_leaf": ["int", {"low": 20, "high": 20, "log": False}],
    "min_gain_to_split": ["float", {"low": 1e-8, "high": 40.0, "log": False}],
    "min_sum_hessian_in_leaf": ["float", {"low": 1e-8, "high": 500.0, "log": True}],
    "subsample": ["float", {"low": 0.2, "high": 1.0, "log": False}],
    "feature_fraction": ["float", {"low": 0.2, "high": 1.0, "log": False}],
    "boosting": ["categorical", ["gbdt"]],
}

# The run gets stuck here; the Optuna progress bar never advances.
opt_param = lgblss.hyper_opt(
    param_dict,
    dtrain,
    num_boost_round=100,
    nfold=5,
    early_stopping_rounds=20,
    max_minutes=10,
    n_trials=30,
    silence=True,
    seed=SEED,
    hp_seed=SEED,
)

# Never reached because of the hang above.
_ = lgblss.predict(data=X_valid[:5], pred_type="parameters")

print("repro script completed")

From the terminal, I can see that the script gets stuck in this state: 0%| | 0/30 [00:00<?, ?it/s].
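
Since the process hangs without raising an exception, a way to see where it is stuck is to dump periodic tracebacks with the standard-library faulthandler module. A minimal sketch I can prepend to the script (the 30-second interval is an arbitrary choice of mine):

import faulthandler
import sys

# Dump the tracebacks of all threads to stderr every 30 seconds,
# so the hang location becomes visible even without an exception.
faulthandler.dump_traceback_later(30, repeat=True, file=sys.stderr)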

Stepping through with the VS Code debugger, I noticed that the very last lines I am able to execute are the following, from the file distribution_utils.py (lines 288-290):

            dist_fit = self.distribution(**dist_kwargs)
            if self.loss_fn == "nll":
                loss = -torch.nansum(dist_fit.log_prob(target))
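
To check whether the hang happens inside torch itself rather than in LightGBMLSS, I can call the same distribution directly. A minimal, self-contained sketch (the parameter values below are arbitrary placeholders, not the ones LightGBMLSS would fit):

import numpy as np
import torch
from torch.distributions import NegativeBinomial

# Synthetic counts, same flavour as the script above.
y = torch.tensor(np.random.poisson(3.0, size=80), dtype=torch.float32)

# Arbitrary placeholder parameters, just to exercise log_prob.
dist_fit = NegativeBinomial(total_count=torch.ones_like(y),
                            probs=torch.full_like(y, 0.5))
loss = -torch.nansum(dist_fit.log_prob(y))
print("log_prob returned, loss =", loss.item())

If this snippet also freezes, the problem would sit below LightGBMLSS, in torch itself.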

The conda environment I am using is created with the following .yml file:

name: ilocglob
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pip
  - icu
  - r-base
  - rpy2
  - pip:
    - torchvision
    - torchaudio
    - torch==2.2.0
    - gpytorch
    - numpy<2
    - scipy
    - pandas
    - matplotlib
    - seaborn>=0.13.0,<0.14.0
    - optuna>=3.5.0,<4.4
    - optuna-integration
    - ipykernel
    - accelerate
    - datasets
    - lightning
    - ujson
    - lightgbmlss

I believe this problem is related to issue #16.
I experienced a similar hang when simply training a model (without hyperparameter optimisation) as well.
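
One thing I intend to try, in case the freeze comes from the OpenMP runtimes that LightGBM and the pip-installed torch each bring along (this is only a guess on my side): pinning everything to a single thread before any other import.

import os

# Must be set before torch/lightgbm are imported for the first time.
os.environ["OMP_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)  # keep torch's intra-op pool single-threaded too

If the script then runs to completion, that would point to a threading conflict rather than to the NegativeBinomial implementation.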

Thanks in advance for your help. I’ll be happy to provide any additional information if needed.
