Description
Implement the Generalized normal distribution: https://en.wikipedia.org/wiki/Generalized_normal_distribution
Why is this useful?
It generalizes, via the shape parameter p, the normal distribution (p=2) and the double exponential (Laplace) distribution (p=1), and in the limit also the uniform distribution (p → ∞).
These correspond to the $L_p$ norms when used for regularization and in other settings.
For example,
```
y ~ normal(X * beta, sigma);
```
produces a fit that minimizes the $L_2$ norm of X*beta - y (a.k.a. least squares), whereas
```
y ~ double_exponential(X * beta, sigma);
```
produces a fit that minimizes the $L_1$ norm of X*beta - y (a.k.a. least absolute deviations).
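As a quick sanity check of this likelihood/norm correspondence, here is a minimal Python sketch (not Stan; the residual values are made up for illustration). It shows that ranking two candidate fits by normal or Laplace negative log-likelihood agrees with ranking them by the $L_2$ or $L_1$ norm of their residuals, so the minimizers coincide:

```python
import math

def nll_normal(residuals, sigma):
    # Negative log-likelihood of i.i.d. normal residuals; the residual-dependent
    # part is the squared L2 norm scaled by 1/(2*sigma^2).
    return sum(0.5 * math.log(2 * math.pi * sigma**2) + r**2 / (2 * sigma**2)
               for r in residuals)

def nll_laplace(residuals, b):
    # Negative log-likelihood of i.i.d. Laplace residuals; the residual-dependent
    # part is the L1 norm scaled by 1/b.
    return sum(math.log(2 * b) + abs(r) / b for r in residuals)

l2 = lambda rs: sum(r * r for r in rs)
l1 = lambda rs: sum(abs(r) for r in rs)

# Residuals of two hypothetical fits (illustrative numbers only).
r_a = [0.3, -1.2, 2.5]
r_b = [0.1, -0.9, 2.6]

# The likelihood ordering matches the corresponding norm ordering.
assert (nll_normal(r_a, 1.0) < nll_normal(r_b, 1.0)) == (l2(r_a) < l2(r_b))
assert (nll_laplace(r_a, 1.0) < nll_laplace(r_b, 1.0)) == (l1(r_a) < l1(r_b))
```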
Similarly, in the Bayesian interpretation of ridge and LASSO regression,
```
beta ~ normal(0, lambda);
y ~ normal(X * beta, sigma);
```
produces the $L_2$-regularized ridge estimate, and
```
beta ~ double_exponential(0, lambda);
y ~ normal(X * beta, sigma);
```
produces the $L_1$-regularized LASSO estimate.
Using the Generalized normal distribution would make it convenient to use an arbitrary $L_p$ norm as the optimization criterion, or even to find a suitable value of p by treating it as a parameter.
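To make the special cases above concrete, here is a minimal Python sketch of the log-density in the Wikipedia parameterization (location mu, scale alpha, shape p; the function names are illustrative, not a proposed Stan signature). At p = 2 it recovers the normal with sigma = alpha / sqrt(2), and at p = 1 the Laplace with scale alpha:

```python
import math

def gen_normal_lpdf(x, mu, alpha, p):
    # Log of f(x) = p / (2 * alpha * Gamma(1/p)) * exp(-(|x - mu| / alpha)**p),
    # the Wikipedia "version 1" density of the generalized normal distribution.
    return (math.log(p) - math.log(2 * alpha) - math.lgamma(1.0 / p)
            - (abs(x - mu) / alpha) ** p)

def normal_lpdf(x, mu, sigma):
    # Standard normal log-density for comparison.
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def laplace_lpdf(x, mu, b):
    # Laplace (double exponential) log-density for comparison.
    return -math.log(2 * b) - abs(x - mu) / b

# p = 2 matches the normal with sigma = alpha / sqrt(2):
assert abs(gen_normal_lpdf(0.7, 0.0, 1.3, 2.0)
           - normal_lpdf(0.7, 0.0, 1.3 / math.sqrt(2))) < 1e-12
# p = 1 matches the Laplace with b = alpha:
assert abs(gen_normal_lpdf(-0.4, 0.2, 0.9, 1.0)
           - laplace_lpdf(-0.4, 0.2, 0.9)) < 1e-12
```

In a Stan implementation the same three-term normalizer (log p, -log 2*alpha, -lgamma(1/p)) and the (|x - mu| / alpha)^p kernel would carry over directly.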