Quiet sleep/death during dist_fit.log_prob(target) in optimisation. #63

@StefanoDamato

Description

Hi @StatMixedML,

thank you for the nice package. It is a very valuable contribution, and I am planning to use it in my experiments.

While trying to run the following script, a toy example of a regression with a negative binomial likelihood in LightGBMLSS, I observe that the process silently hangs without returning any error message. It simply gets stuck during the optimisation, which is handled via hyper_opt, the Optuna-based hyperparameter optimisation method provided by the package.

The script is the following:

import numpy as np
import lightgbm as lgb

from lightgbmlss.model import LightGBMLSS
from lightgbmlss.distributions.NegativeBinomial import NegativeBinomial


SEED = 123
np.random.seed(SEED)

# Small synthetic count-regression dataset.
n = 120
x1 = np.random.normal(0.0, 1.0, size=n)
x2 = np.random.uniform(-1.0, 1.0, size=n)
x3 = np.random.normal(2.0, 0.5, size=n)
X = np.column_stack([x1, x2, x3])

beta0 = 0.4
beta = np.array([0.8, -0.5, 0.3])
eta = beta0 + X @ beta
lam = np.exp(eta)
y = np.random.poisson(lam)

# Train/validation split (validation labels are not needed for this repro).
X_train, y_train = X[:80], y[:80]
X_valid = X[80:100]

dtrain = lgb.Dataset(X_train, label=y_train)

# Model with a negative binomial response distribution.
lgblss = LightGBMLSS(NegativeBinomial())

# Hyperparameter search space in the format expected by hyper_opt.
param_dict = {
    "eta": ["float", {"low": 1e-5, "high": 1.0, "log": True}],
    "max_depth": ["int", {"low": 1, "high": 10, "log": False}],
    "num_leaves": ["int", {"low": 255, "high": 255, "log": False}],
    "min_data_in_leaf": ["int", {"low": 20, "high": 20, "log": False}],
    "min_gain_to_split": ["float", {"low": 1e-8, "high": 40.0, "log": False}],
    "min_sum_hessian_in_leaf": ["float", {"low": 1e-8, "high": 500.0, "log": True}],
    "subsample": ["float", {"low": 0.2, "high": 1.0, "log": False}],
    "feature_fraction": ["float", {"low": 0.2, "high": 1.0, "log": False}],
    "boosting": ["categorical", ["gbdt"]],
}

# The run gets stuck here; the Optuna progress bar never advances.
opt_param = lgblss.hyper_opt(
    param_dict,
    dtrain,
    num_boost_round=100,
    nfold=5,
    early_stopping_rounds=20,
    max_minutes=10,
    n_trials=30,
    silence=True,
    seed=SEED,
    hp_seed=SEED,
)

# Never reached because of the hang above.
_ = lgblss.predict(data=X_valid[:5], pred_type="parameters")

print("repro script completed")

From the terminal, I can see that the script gets stuck in this state: 0%| | 0/30 [00:00<?, ?it/s].
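
Since the process hangs without raising an exception, a way to see where it is stuck is to dump periodic tracebacks with the standard-library faulthandler module. A minimal sketch I can prepend to the script (the 30-second interval is an arbitrary choice of mine):

import faulthandler
import sys

# Dump the tracebacks of all threads to stderr every 30 seconds,
# so the hang location becomes visible even without an exception.
faulthandler.dump_traceback_later(30, repeat=True, file=sys.stderr)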

Stepping through with the VS Code debugger, I noticed that the very last lines I am able to execute are the following, from the file distribution_utils.py (lines 288-290):

            dist_fit = self.distribution(**dist_kwargs)
            if self.loss_fn == "nll":
                loss = -torch.nansum(dist_fit.log_prob(target))
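
To check whether the hang happens inside torch itself rather than in LightGBMLSS, I can call the same distribution directly. A minimal, self-contained sketch (the parameter values below are arbitrary placeholders, not the ones LightGBMLSS would fit):

import numpy as np
import torch
from torch.distributions import NegativeBinomial

# Synthetic counts, same flavour as the script above.
y = torch.tensor(np.random.poisson(3.0, size=80), dtype=torch.float32)

# Arbitrary placeholder parameters, just to exercise log_prob.
dist_fit = NegativeBinomial(total_count=torch.ones_like(y),
                            probs=torch.full_like(y, 0.5))
loss = -torch.nansum(dist_fit.log_prob(y))
print("log_prob returned, loss =", loss.item())

If this snippet also freezes, the problem would sit below LightGBMLSS, in torch itself.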

The conda environment I am using is created with the following .yml file:

name: ilocglob
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pip
  - icu
  - r-base
  - rpy2
  - pip:
    - torchvision
    - torchaudio
    - torch==2.2.0
    - gpytorch
    - numpy<2
    - scipy
    - pandas
    - matplotlib
    - seaborn>=0.13.0,<0.14.0
    - optuna>=3.5.0,<4.4
    - optuna-integration
    - ipykernel
    - accelerate
    - datasets
    - lightning
    - ujson
    - lightgbmlss

I believe this problem is related to issue #16.
I experienced a similar hang when simply training a model (without hyperparameter optimisation) as well.
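
One thing I intend to try, in case the freeze comes from the OpenMP runtimes that LightGBM and the pip-installed torch each bring along (this is only a guess on my side): pinning everything to a single thread before any other import.

import os

# Must be set before torch/lightgbm are imported for the first time.
os.environ["OMP_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)  # keep torch's intra-op pool single-threaded too

If the script then runs to completion, that would point to a threading conflict rather than to the NegativeBinomial implementation.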

Thanks in advance for your help. I’ll be happy to provide any additional information if needed.
