Commit 74f04aa — first commit (root commit, 0 parents)

88 files changed: +11171 −0 lines changed

.DS_Store

10 KB
Binary file not shown.

.codel/codel.ini

Whitespace-only changes.

LICENSE.txt

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
MIT License

Copyright (c) [2021] [Iain Carmichael]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Readme.md

Lines changed: 137 additions & 0 deletions

@@ -0,0 +1,137 @@
# Yet another generalized linear model package

`ya_glm` aims to give you a fast, easy to use, and flexible package for fitting a wide variety of penalized *generalized linear models* (GLMs). Existing packages (e.g. [sklearn](https://scikit-learn.org/stable/), [lightning](https://github.com/scikit-learn-contrib/lightning), [statsmodels](https://www.statsmodels.org/), [glmnet](https://glmnet.stanford.edu/articles/glmnet.html), [pyglmnet](https://github.com/glm-tools/pyglmnet)) accomplish the first two of these goals, but are not easy to customize and support a limited number of GLM + penalty combinations.

**Beware**: this is a preliminary release of the package; the documentation and testing may leave you wanting, and the code may be subject to breaking changes in the near future.

# Installation

`ya_glm` can be installed via GitHub:

```
git clone https://github.com/idc9/ya_glm.git
cd ya_glm
python setup.py install
```
To use the backend from [andersoncd](https://github.com/mathurinm/andersoncd) you have to install their package manually; see their GitHub page.
# Example

```python
import numpy as np

from sklearn.datasets import make_regression

from ya_glm.backends.fista.LinearRegression import Lasso, LassoCV,\
    RidgeCV, LassoENetCV, GroupLassoENetCV, FcpLLACV

# sample some linear regression data
X, y = make_regression(n_samples=100, n_features=20)

# fit the Lasso penalized linear regression model we all know and love
est = Lasso(pen_val=1).fit(X, y)

# tune the Lasso penalty using cross-validation
# just as in sklearn.linear_model.LassoCV we use a
# path algorithm to make this fast and set the tuning
# parameter sequence with a sensible default
est_cv = LassoCV(cv_select_rule='1se').fit(X, y)

# or you could have picked a different penalty!
# est_cv = RidgeCV().fit(X, y)
# est_cv = LassoENetCV().fit(X, y)

# we support user specified groups!
groups = [np.arange(10), np.arange(10, 20)]
est_groups = GroupLassoENetCV(groups).fit(X, y)

# folded concave penalty, tuned with cross-validation
# and initialized from the LassoCV solution
# see (Fan et al. 2014) for details
est_concave_cv = FcpLLACV(init=est_cv,
                          pen_func='scad').fit(X, y)
```
See the [docs/](docs/) folder for additional examples in Jupyter notebooks.

# Currently supported features

We currently support the following loss functions:
- linear regression
- logistic regression
- linear regression with multiple responses

and the following penalties:

- Lasso (optionally with weights)
- Group Lasso with user specified groups
- Elastic net
- Ridge
- Tikhonov
- Nuclear norm
- Row Lasso (i.e. the L1-to-L2 norm)
- Folded concave penalties (FCP) such as SCAD
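Several of the penalties above are handled in proximal algorithms via their proximal operators. As a rough illustration (this is my own NumPy sketch, not code from `ya_glm`; the function names are made up), the Lasso corresponds to entrywise soft-thresholding and the group/row Lasso to blockwise soft-thresholding:

```python
import numpy as np

def prox_lasso(x, t):
    # prox of t * ||x||_1: entrywise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_group_lasso(x, t, groups):
    # prox of t * sum_g ||x_g||_2: shrink each group's norm by t,
    # zeroing the whole group when its norm falls below t
    out = x.copy()
    for g in groups:
        nrm = np.linalg.norm(x[g])
        out[g] = 0.0 if nrm <= t else (1 - t / nrm) * x[g]
    return out
```

The group prox zeroes out entire groups at once, which is exactly why the group Lasso produces groupwise sparsity.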
The FCP penalties are fit by applying the *local linear approximation* (LLA) algorithm to a "good enough" initializer such as the Lasso fit. See (Fan et al., 2014) for details.
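The LLA idea can be sketched in a few lines of NumPy: each iteration re-weights a Lasso problem using the concave penalty's derivative evaluated at the current coefficients, so large coefficients become essentially unpenalized. This is a minimal least-squares sketch under my own naming (it is not `ya_glm`'s implementation, which plugs in a generic weighted-Lasso solver):

```python
import numpy as np

def scad_deriv(t, pen_val, a=3.7):
    # derivative of the SCAD penalty evaluated at |t|
    t = np.abs(t)
    return np.where(t <= pen_val, pen_val,
                    np.maximum(a * pen_val - t, 0.0) / (a - 1))

def weighted_lasso_ls(X, y, weights, n_steps=500):
    # proximal gradient for 1/(2n) ||y - X b||^2 + sum_j w_j |b_j|
    n, d = X.shape
    step = 1.0 / (np.linalg.norm(X, ord=2) ** 2 / n)  # 1 / Lipschitz const
    b = np.zeros(d)
    for _ in range(n_steps):
        z = b - step * X.T @ (X @ b - y) / n
        b = np.sign(z) * np.maximum(np.abs(z) - step * weights, 0.0)
    return b

def lla(X, y, init_coef, pen_val, n_steps=3):
    # local linear approximation: repeatedly solve a re-weighted Lasso
    coef = init_coef
    for _ in range(n_steps):
        weights = scad_deriv(coef, pen_val)
        coef = weighted_lasso_ls(X, y, weights)
    return coef
```

Since `scad_deriv` is zero beyond `a * pen_val`, coefficients that the initializer already estimates as large are left unshrunk, which is the source of the oracle properties discussed in Fan et al.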
We also provide built-in cross-validation (CV) for each of these penalties. For Lasso, Ridge, and ElasticNet the CV methods use faster path algorithms (as in sklearn.linear_model.LassoCV). Our CV functions allow custom metrics and custom selection rules, such as the '1se' rule from the glmnet package.
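The '1se' rule picks, among all penalty values whose mean CV error lies within one standard error of the best mean error, the most heavily regularized one. A minimal sketch of that selection logic (my own function name, not `ya_glm`'s API):

```python
import numpy as np

def select_1se(pen_vals, cv_errors):
    """Largest penalty whose mean CV error is within one standard
    error of the minimum mean error.

    pen_vals: (n_pen,) penalty values
    cv_errors: (n_pen, n_folds) per-fold CV errors
    """
    pen_vals = np.asarray(pen_vals)
    means = cv_errors.mean(axis=1)
    ses = cv_errors.std(axis=1, ddof=1) / np.sqrt(cv_errors.shape[1])
    best = means.argmin()
    cutoff = means[best] + ses[best]           # "within 1 SE" threshold
    candidates = np.where(means <= cutoff)[0]  # all competitive penalties
    return pen_vals[candidates[pen_vals[candidates].argmax()]]
```

Compared to simply taking the minimizer, the 1se rule trades a statistically indistinguishable amount of CV error for a sparser, more regularized model.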
We aim to add additional loss functions including Poisson, multinomial, gamma, Huber, and Cox regression.
# What we provide on the backend

- Cross-validation support for
    - path algorithms with parallelization over folds
    - custom evaluation metrics
    - custom CV selection rules (e.g. the '1se' rule)
    - automatically generated tuning parameter sequences from the training data for both Lasso and concave penalties; this requires computing the largest reasonable penalty value for different combinations of loss + penalty
    - see e.g. `ya_glm.GlmCV`, `ya_glm.fcp.GlmFcpCV`

- A flexible and reasonably fast FISTA algorithm (Beck and Teboulle, 2009) for GLM loss + non-smooth penalty problems
    - see `ya_glm.opt`
    - this module is inspired by [pyunlocbox](https://github.com/epfl-lts2/pyunlocbox) and [lightning](https://github.com/scikit-learn-contrib/lightning)

- Support for concave penalties such as SCAD
    - fit using the LLA algorithm (Zou and Li, 2008; Fan et al., 2014)
    - see `ya_glm.lla`
    - the LLA algorithm only needs you to provide a solver for GLM + weighted Lasso problems

- Support for customization
    - straightforward to swap in your favorite GLM solver
        - see `ya_glm.backends.celer` for an example
    - cross-validation tuning
    - monitor various metrics along the cross-validation path
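For reference, the FISTA scheme cited above (Beck and Teboulle, 2009) is a short algorithm: a proximal gradient step taken from an extrapolated point, with a momentum update. A generic NumPy sketch, specialized to the Lasso (my own function names, not the `ya_glm.opt` API):

```python
import numpy as np

def fista(grad, prox, x0, step, n_steps=200):
    # FISTA: accelerated proximal gradient (Beck & Teboulle, 2009)
    x = x0.copy()
    z = x0.copy()   # extrapolated point
    t = 1.0         # momentum scalar
    for _ in range(n_steps):
        x_new = prox(z - step * grad(z), step)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

def lasso_fista(X, y, pen, n_steps=200):
    # lasso on least squares: 1/(2n) ||y - X b||^2 + pen * ||b||_1
    n = X.shape[0]
    grad = lambda b: X.T @ (X @ b - y) / n
    prox = lambda b, s: np.sign(b) * np.maximum(np.abs(b) - s * pen, 0.0)
    step = 1.0 / (np.linalg.norm(X, ord=2) ** 2 / n)  # 1 / Lipschitz const
    return fista(grad, prox, np.zeros(X.shape[1]), step, n_steps)
```

Because `fista` only touches the problem through `grad` and `prox`, swapping in a different GLM loss or penalty means swapping those two callables, which is the flexibility the bullet above describes.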
# Help and Support

Additional documentation, examples, and code revisions are coming soon.
For questions, issues, or feature requests please reach out to Iain:
idc9@cornell.edu.

## Contributing

We welcome contributions to make this a stronger package: data examples,
bug fixes, spelling errors, new features, etc.

# References

Zou, H. and Li, R., 2008. [One-step sparse estimates in nonconcave penalized likelihood models](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2759727/). Annals of Statistics, 36(4), p.1509.

Beck, A. and Teboulle, M., 2009. [A fast iterative shrinkage-thresholding algorithm for linear inverse problems](https://epubs.siam.org/doi/pdf/10.1137/080716542?casa_token=cjyK5OxcbSoAAAAA:lQOp0YAVKIOv2-vgGUd_YrnZC9VhbgWvZgj4UPbgfw8I7NV44K82vbIu0oz2-xAACBz9k0Lclw). SIAM Journal on Imaging Sciences, 2(1), pp.183-202.

Fan, J., Xue, L. and Zou, H., 2014. [Strong oracle optimality of folded concave penalized estimation](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295817/). Annals of Statistics, 42(3), p.819.

docs/.DS_Store

6 KB
Binary file not shown.
Lines changed: 158 additions & 0 deletions

@@ -0,0 +1,158 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `ya_glm` package is written to facilitate adding new optimization solvers. To add your favorite solver you first have to wrap your function so it is compatible with the `ya_glm` code. In particular, you have to provide a `solve_glm` function and a `solve_glm_path` function, e.g. see `ya_glm.backends.fista.glm_solver`.\n",
    "\n",
    "To see how `solve_glm` is called, take a look at the `_compute_fit` function in `ya_glm.Glm.GlmENet`.\n",
    "\n",
    "Once you have implemented `solve_glm` and `solve_glm_path`, take a look at `ya_glm.backends.fista.LinearRegression` to see how to put all the pieces together."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# solve_glm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def solve_glm(X, y,\n",
    "              loss_func='linear_regression',\n",
    "              fit_intercept=True,\n",
    "              lasso_pen=None,\n",
    "              lasso_weights=None,\n",
    "              L2_pen=None,\n",
    "              L2_weights=None,\n",
    "              **kws):\n",
    "    \"\"\"\n",
    "    Fits an L1 and/or L2 penalized GLM problem\n",
    "\n",
    "    min_{coef, intercept} 1/n sum_{i=1}^n L(y_i, x_i^T coef + intercept)\n",
    "        + lasso_pen * sum_{j=1}^d lasso_weights_j |coef_j|\n",
    "        + 0.5 * L2_pen * sum_{j=1}^d L2_weights_j |coef_j|^2\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    X: array-like, shape (n_samples, n_features)\n",
    "        The training covariate data.\n",
    "\n",
    "    y: array-like, shape (n_samples, )\n",
    "        The training response data.\n",
    "\n",
    "    loss_func: str\n",
    "        Which GLM loss function to use.\n",
    "\n",
    "    fit_intercept: bool\n",
    "        Whether or not to fit an intercept.\n",
    "\n",
    "    lasso_pen: None, float\n",
    "        (Optional) The L1 penalty parameter value.\n",
    "\n",
    "    lasso_weights: None, array-like, shape (n_features, )\n",
    "        (Optional) The L1 penalty feature weights.\n",
    "\n",
    "    L2_pen: None, float\n",
    "        (Optional) The L2 penalty parameter value.\n",
    "\n",
    "    L2_weights: None, array-like, shape (n_features, )\n",
    "        (Optional) The L2 penalty feature weights.\n",
    "\n",
    "    **kws:\n",
    "        Additional keyword arguments for the optimization algorithm, e.g. the number of steps.\n",
    "\n",
    "    Output\n",
    "    ------\n",
    "    coef, intercept, opt_data\n",
    "\n",
    "    coef: array-like, shape (n_features, )\n",
    "        The estimated coefficient.\n",
    "\n",
    "    intercept: None or float\n",
    "        The estimated intercept -- if one was requested.\n",
    "\n",
    "    opt_data: dict\n",
    "        The optimization history output.\n",
    "    \"\"\"\n",
    "\n",
    "    # YOUR ALGORITHM GOES HERE\n",
    "    coef = None\n",
    "    intercept = None\n",
    "    opt_data = {}\n",
    "\n",
    "    return coef, intercept, opt_data\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# solve_glm_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "def solve_glm_path(lasso_pen_seq=None, L2_pen_seq=None, **kws):\n",
    "    \"\"\"\n",
    "    Fits a GLM along a tuning parameter path.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    lasso_pen_seq: None, list\n",
    "        The (optional) L1 penalty parameter sequence.\n",
    "\n",
    "    L2_pen_seq: None, list\n",
    "        The (optional) L2 penalty parameter sequence.\n",
    "\n",
    "    **kws:\n",
    "        Any keyword argument to solve_glm.\n",
    "\n",
    "    Output\n",
    "    ------\n",
    "    TODO: decide on this!\n",
    "    \"\"\"\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:repro_lap_reg] *",
   "language": "python",
   "name": "conda-env-repro_lap_reg-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
