Skip to content

Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features#4

Draft
PABannier wants to merge 23 commits into
mainfrom
gram_solver
Draft

Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features#4
PABannier wants to merge 23 commits into
mainfrom
gram_solver

Conversation

@PABannier

@PABannier PABannier commented Apr 20, 2022

Copy link
Copy Markdown
Collaborator

The goal of this PR is to write a CD (BCD) solver when n_samples >> n_features.
Such configurations are solved much faster by pre-computing a Gram matrix XtX and updating the gradient (rather than the residuals) at every CD cycle.

A quick experiment with 1e6 samples and 300 features:

########
Lasso
########

Celer: 5.43s
Gram: 1.89s

###########
Group Lasso
###########

Celer: 43.41s
Gram: 4.23s

@mathurinm mathurinm marked this pull request as draft April 21, 2022 06:20
@mathurinm mathurinm changed the title Faster CD solver when n_samples >> n_features Implement Gram-based CD/BCD solver when n_samples >> n_features Apr 21, 2022
@mathurinm

Copy link
Copy Markdown
Collaborator

@PABannier thanks a lot ! I tried it and it really shines on the data by @sehoff

Caveat: the stopping criteria are not the same. Monitoring the primal decrease is much looser than checking the duality gap. I adjusted the tolerance manually to obtain similar results

@sehoff

sehoff commented Apr 22, 2022

Copy link
Copy Markdown

Thanks a lot! I tried it out (on slightly different data than I provided), and can confirm sizable speedups. I use warm-starts for celer, so I guess the figures are even biased towards celer.

Note: in the current setting of the gram-solver: res = gram_group_lasso(X, y, a, groups=grps, tol=1e-10, max_iter=10_000, check_freq=10) with very small alphas (alpha_max/1000), the solver does not convergence after max_iter.

Do you have any suggestion on a reasonable decrease in the tolerance? Because for tol=1e-9 it reaches convergence after ~3500 iterations.
speed_comparison_celer_gram
.

Comment thread skglm/solvers/gram.py Outdated
@mathurinm

Copy link
Copy Markdown
Collaborator

@sehoff hard to tell without the data but consider increasing max_iter, probably the solver is not very far from convergence.You can run it in verbose mode to see if you stop far from convergence or not.

It's easy to add warm start to the gram solvers, it should be done shortly. Beware in your comparison that the stopping criteria are not the same.

@PABannier

PABannier commented Apr 24, 2022

Copy link
Copy Markdown
Collaborator Author

Quick update:

  • Updated stopping criterion to duality gap
  • Gram CD
  • Gram BCD
  • Gram CD - FISTA
  • Gram BCD - FISTA

Comment thread skglm/solvers/gram.py
Comment thread skglm/solvers/gram.py
Comment thread skglm/solvers/gram.py Outdated
Comment thread skglm/solvers/gram.py
Comment thread skglm/solvers/gram.py Outdated
@mathurinm mathurinm changed the title Implement Gram-based CD/BCD solver when n_samples >> n_features Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features Apr 25, 2022
@mathurinm

mathurinm commented Apr 25, 2022

Copy link
Copy Markdown
Collaborator

@sehoff support for weights (inifnite ones too) is here. Let us know how it works !

@sehoff

sehoff commented Apr 26, 2022

Copy link
Copy Markdown

@sehoff hard to tell without the data but consider increasing max_iter, probably the solver is not very far from convergence.You can run it in verbose mode to see if you stop far from convergence or not.

It's easy to add warm start to the gram solvers, it should be done shortly. Beware in your comparison that the stopping criteria are not the same.

Increasing max_iter works for me, thank you !

@PABannier

Copy link
Copy Markdown
Collaborator Author

@sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the gram_fista_group_lasso function!

@sehoff

sehoff commented Apr 26, 2022

Copy link
Copy Markdown

@sehoff support for weights (inifnite ones too) is here. Let us know how it works !
@sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the gram_fista_group_lasso function!

Here are the results of my comparisons, where I used the data I provided in the dropbox, and a tol=1e-8 in all solvers. Furthermore, the results I show refer to Case 3.1., i.e., with one weight set to infinity. As a side remark, this highlights that all three solvers can handle infinite weights!
Bottomline: especially for (very) small alphas the Gram-based solvers outperform the one in celer significantly. So depending on the grid one eventually searches, each solver has its advantages.
Comparison 1:
speed_comparison_celer_gram
Comparison 2: Note: I leave out celer here, because for alpha_max/1000 it takes quite long for convergence.
speed_comparison_bcd_fista

@PABannier

Copy link
Copy Markdown
Collaborator Author

@sehoff indeed, Celer is particularly efficient in settings where n_features >> n_samples. It implements a working set strategy, that is particularly useful in high-regularization regime (where there are few active features). In your first figure, for low level of regularization, since your design matrix has way more samples than features, Celer working set strategy is less useful.

Besides, for your data, the Gram solver has a cheaper update since X.T @ X is of size (n_features, n_features) (X.T @ X being a useful quantity at every gradient update).

@PABannier

Copy link
Copy Markdown
Collaborator Author

@sehoff Thanks for the plots. They are very insightful for us. By the way, if you ever find yourself in need for an automated way of benchmarking optimization routines, look at https://github.com/benchopt/benchopt

@PABannier

Copy link
Copy Markdown
Collaborator Author

@mathurinm do we want to keep this PR open? FISTA and Gram-solver have been merged

@mathurinm

Copy link
Copy Markdown
Collaborator

This one supports groups while the merged PR doesn't, so it does not hurt to leave it open IMO

@mathurinm

Copy link
Copy Markdown
Collaborator

note: this is a PR from the skglm repo, instead it should be done from a branch of your fork @PABannier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants