Implementation of the multinomial generalized linear model (softmax link) #3319

Open

jachymb wants to merge 7 commits into stan-dev:develop from jachymb:jachym_multinomial2

Conversation

@jachymb jachymb commented May 10, 2026

Summary

This is an implementation of multinomial_logit_glm_lpmf, including an OpenCL variant. It is a solution to #3149.

It computes, in an efficient vectorized way, exactly what one would expect:

$\mathrm{multinomialLogitGLM} (y|x, \alpha, \beta) = \prod_n \mathrm{multinomial}(y_n \mid \mathrm{softmax}(x_n \beta + \alpha_n))$
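On the log scale this expands to the following (writing $\eta_n = x_n \beta + \alpha_n$; the log-sum-exp form below is just the standard way of writing the log-softmax, not necessarily the literal implementation):

$\log \mathrm{multinomialLogitGLM}(y \mid x, \alpha, \beta) = \sum_n \left[ \log \frac{\left(\sum_k y_{nk}\right)!}{\prod_k y_{nk}!} + \sum_k y_{nk} \left( \eta_{nk} - \log \sum_j e^{\eta_{nj}} \right) \right]$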

I tried to closely follow the patterns in related functions such as binomial_logit_glm_lpmf and categorical_logit_glm_lpmf, both of which this generalizes.

The typing:

  • the observations y are a 2D array of non-negative integers (rows = instances, columns = outcome classes)
  • accordingly, the predictors x are a matrix (rows = instances, columns = features)
  • the coefficients beta are a matrix (rows = features, columns = outcome classes)
  • the intercept alpha may be either a matrix (same shape as y) or a row_vector (length = outcome classes). This may seem a little irregular at first, but it makes sense on reflection: other GLMs accept either a vector or a real for the intercept, and in the multivariate case the (univariate) vector corresponds to the matrix, while the scalar that gets copied for each instance is more naturally represented here as a row_vector rather than a column vector. (It can be seen as an extra row of beta corresponding to a constant feature; it also matches the shape of y, and in consequence the implementation algebra is more natural this way.) This is arguably a little inconsistent with categorical_logit_glm_lpmf, so it may be up for debate, but then categorical is not implemented as multivariate even though it also uses a beta matrix.
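To make the shapes and the computation concrete, here is a minimal NumPy/SciPy reference sketch of what the function evaluates (my own paraphrase for illustration, not the PR's C++ code; the function name and docstring shapes mirror the typing above):

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def multinomial_logit_glm_lpmf(y, x, alpha, beta):
    """Reference sketch: sum_n log multinomial(y_n | softmax(x_n @ beta + alpha_n)).

    Shapes (matching the typing above):
      y:     (N, K) non-negative integer counts
      x:     (N, F) predictors
      beta:  (F, K) coefficients
      alpha: (K,) row vector broadcast over instances, or (N, K) matrix
    """
    logits = x @ beta + alpha  # (N, K); broadcasting handles both alpha shapes
    # Row-wise log-softmax via log-sum-exp for numerical stability.
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    # Log multinomial coefficient per instance: log((sum_k y_nk)! / prod_k y_nk!).
    norm = gammaln(y.sum(axis=1) + 1) - gammaln(y + 1).sum(axis=1)
    return float((norm + (y * log_probs).sum(axis=1)).sum())
```

This also illustrates why the row_vector intercept is natural: adding it to `x @ beta` is a plain broadcast over the instance dimension.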

I chose to use "logit" in the name instead of "softmax" (even though it really is softmax) to be consistent with existing functions like multinomial_logit.

I am new to this, so if there are any improvements to be made, I will gladly listen. I tried to use an efficient, vectorized approach and to reuse existing functions where applicable. Note, however, that this uses its own softmax implementation: efficient vectorization requires it, and as far as I understand, the existing library implementation cannot be used the way it is needed here. If my other PR #3313 is approved, this could be made a few LOC shorter with a call to that.

As a personal note, this is a function I want available in Stan for my actual work. :)

Tests

Standard tests for the newly introduced functions, covering both the prim and cl variants.

Side Effects

None.

Release notes

Add multinomial_logit_glm_lpmf, including OpenCL support.

AI use disclosure

I used Claude Code with Sonnet to help with the work, but I critically reviewed and edited every single line, striving to match the general code quality and patterns within the library.

