kl_loss = -0.5 * (1 + (logsigma - logsigma_prior) - (mu-mu_prior).pow(2)/(2*logsigma_prior).exp() - (2*logsigma).exp()/(2*logsigma_prior).exp()).mean() #.sum(-1).mean()
Hi!
First of all, thanks a lot for implementing the paper!
Issue: While reviewing the vib_kld_loss_lp implementation, I noticed a potential discrepancy in the KL divergence formulation between two diagonal Gaussians. I wanted to double-check whether this is intentional or an oversight.
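Current implementation: the log term is (logsigma - logsigma_prior), while the other terms use (2*logsigma).exp() as the variance, so logsigma appears to be the log standard deviation.
Shouldn't the log term be the difference of log variances, (logsigma_sq - logsigma_prior_sq), i.e. 2 * (logsigma - logsigma_prior)?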
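For reference, here is a minimal sketch of the check I have in mind. It assumes logsigma is the log standard deviation (as the (2*logsigma).exp() terms suggest) and uses plain Python scalars instead of tensors. The names kl_diag_gauss, kl_numeric, and log_var_factor are hypothetical and not part of the repository. It compares the closed-form KL for one dimension against a numerical integral of q(x) * log(q(x) / p(x)).

```python
import math


def kl_diag_gauss(mu, logsigma, mu_prior, logsigma_prior, log_var_factor=2.0):
    """Closed-form KL(N(mu, sigma^2) || N(mu_prior, sigma_prior^2)) for one dimension.

    Assumes logsigma is log(sigma), the log standard deviation.
    log_var_factor=2.0 uses log(sigma^2) - log(sigma_prior^2) for the log term;
    log_var_factor=1.0 reproduces the log term of the line quoted in this issue.
    """
    var = math.exp(2 * logsigma)
    var_prior = math.exp(2 * logsigma_prior)
    log_ratio = log_var_factor * (logsigma - logsigma_prior)
    return -0.5 * (1 + log_ratio - (mu - mu_prior) ** 2 / var_prior - var / var_prior)


def kl_numeric(mu, sigma, mu_prior, sigma_prior, n=20_000, width=12.0):
    """Midpoint-rule estimate of the integral of q(x) * log(q(x) / p(x))."""

    def log_pdf(x, m, s):
        return -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)

    lo = mu - width * sigma
    hi = mu + width * sigma
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        log_q = log_pdf(x, mu, sigma)
        log_p = log_pdf(x, mu_prior, sigma_prior)
        total += math.exp(log_q) * (log_q - log_p) * dx
    return total


if __name__ == "__main__":
    mu, logsigma = 0.3, -0.5
    mu_prior, logsigma_prior = -0.2, 0.4

    reference = kl_numeric(mu, math.exp(logsigma), mu_prior, math.exp(logsigma_prior))
    with_factor_2 = kl_diag_gauss(mu, logsigma, mu_prior, logsigma_prior, 2.0)
    with_factor_1 = kl_diag_gauss(mu, logsigma, mu_prior, logsigma_prior, 1.0)

    print("numerical integral:", reference)
    print("log term 2*(logsigma - logsigma_prior):", with_factor_2)
    print("log term (logsigma - logsigma_prior):", with_factor_1)
```

Under that assumption, only the variant with the factor of 2 should agree with the numerical integral. The two variants differ by 0.5 * (logsigma - logsigma_prior), so they coincide only when the posterior and prior scales are equal.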
Thanks!
vittle/vittle/model/language_model/vittle_llama.py
Line 273 in 1ea2e09