Commit b2ce059

Merge pull request #28 from LinearBoost/v0.2.0
SEFRBoost added, security bug fixes
2 parents 7a217e8 + 715ea89 commit b2ce059

5 files changed

Lines changed: 903 additions & 8 deletions

README.md

Lines changed: 30 additions & 3 deletions
@@ -1,11 +1,15 @@
 # LinearBoost Classifier
 
-![Latest Release](https://img.shields.io/badge/release-v0.1.9-green)
+![Latest Release](https://img.shields.io/badge/release-v0.2.0-green)
 [![PyPI Version](https://img.shields.io/pypi/v/linearboost)](https://pypi.org/project/linearboost/)
 ![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)
 [![PyPI Downloads](https://static.pepy.tech/badge/linearboost)](https://pepy.tech/projects/linearboost)
 
-## 🧪 Quickstart Demo
+**Current release: v0.2.0.** This version adds **SEFRBoost**—gradient boosting with linear SEFR splits at tree nodes—via `SEFRBoostClassifier` and `SEFRBoostRegressor` (`from linearboost import …` or `linearboost.sefr_boost`). **LinearBoost security issues have been updated** in this release; upgrade from earlier versions to stay patched.
+
+## 🧪 Quickstart demos
+
+### LinearBoost
 
 Want to see how LinearBoost works in practice?
 
@@ -16,6 +20,10 @@ This Jupyter notebook shows how to:
 - Train `LinearBoostClassifier`
 - Evaluate using F1 score and cross-validation
 
+### SEFRBoost
+
+- **[`optuna_sefrboost_demo.ipynb`](notebooks/optuna_sefrboost_demo.ipynb)** — Hyperparameter search for **`SEFRBoostClassifier`** with **[Optuna](https://optuna.org/)** on sklearn’s Breast Cancer Wisconsin data: default baseline, then **5-fold stratified CV** optimizing F1 (install `optuna` in addition to `linearboost` and `scikit-learn`).
+
 LinearBoost is a fast and accurate classification algorithm built to enhance the performance of the linear classifier SEFR. It combines efficiency and accuracy, delivering state-of-the-art F1 scores and classification performance.
 
 In benchmarks across seven well-known datasets, LinearBoost:
@@ -32,11 +40,30 @@ Key Features:
 
 ---
 
+## 🚀 New in Version 0.2.0
+
+### SEFRBoost
+
+SEFRBoost provides binary classification and regression with shallow trees whose internal nodes use **SEFR** hyperplane splits on pseudo-residuals—an oblique-split GBDT-style alternative that lives alongside `LinearBoostClassifier`.
+
+```python
+from linearboost import SEFRBoostClassifier, SEFRBoostRegressor
+# or: from linearboost.sefr_boost import SEFRBoostClassifier, SEFRBoostRegressor
+```
+
+Hands-on examples: [`notebooks/optuna_sefrboost_demo.ipynb`](notebooks/optuna_sefrboost_demo.ipynb) (Optuna tuning).
+
+### Security updates
+
+LinearBoost-related security issues are addressed in v0.2.0. Upgrade from older releases.
+
+---
+
 ## 🚀 New in Version 0.1.9
 
 ### Security Updates
 
-Version 0.1.9 includes security updates. We recommend upgrading from earlier versions to stay current.
+Version 0.1.9 included security updates. For the latest fixes, use **v0.2.0** or newer.
 
 ---
 
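The README text in the hunk above describes SEFRBoost nodes as SEFR hyperplane splits fitted on pseudo-residuals. The `sefr_boost` module itself is not shown in this commit view, so what follows is only a minimal pure-Python sketch of the idea, not the library's code. It assumes the commonly described SEFR formulation (per-feature weights from class means on non-negative features, plus a bias weighted by class sizes), squared-error pseudo-residuals, and the residual sign as the binary label. The names `fit_sefr_split` and `split_side` are hypothetical.

```python
from typing import List, Sequence, Tuple

EPS = 1e-7  # illustrative guard against division by zero (assumed value)


def project(row: Sequence[float], weights: Sequence[float]) -> float:
    """Dot product of one sample with the hyperplane weights."""
    return sum(w * v for w, v in zip(weights, row))


def fit_sefr_split(
    X: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[List[float], float]:
    """Fit a SEFR-style hyperplane; labels must contain both 0 and 1."""
    pos = [row for row, lab in zip(X, labels) if lab == 1]
    neg = [row for row, lab in zip(X, labels) if lab == 0]

    weights = []
    for j in range(len(X[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        # Normalized difference of class means for feature j.
        weights.append((mean_pos - mean_neg) / (mean_pos + mean_neg + EPS))

    mean_pos_score = sum(project(r, weights) for r in pos) / len(pos)
    mean_neg_score = sum(project(r, weights) for r in neg) / len(neg)
    # Threshold between the class mean scores, weighted by the opposite class size.
    bias = -(len(neg) * mean_pos_score + len(pos) * mean_neg_score) / len(X)
    return weights, bias


def split_side(row: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Return 1 if the sample falls on the positive side of the hyperplane."""
    return 1 if project(row, weights) + bias > 0 else 0


# Toy regression data: the target rises with feature 0 and falls with feature 1.
X = [[1.0, 5.0], [2.0, 4.0], [5.0, 1.0], [4.0, 2.0]]
y = [1.0, 1.2, 3.0, 3.2]

# Squared-error pseudo-residuals against a constant initial prediction (the mean).
initial = sum(y) / len(y)
residuals = [target - initial for target in y]
labels = [1 if r > 0 else 0 for r in residuals]

weights, bias = fit_sefr_split(X, labels)
sides = [split_side(row, weights, bias) for row in X]
print(sides)  # [0, 0, 1, 1]
```

The split uses both features at once, which is what makes it oblique. An axis-aligned tree would have to pick a single feature and threshold at each node.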

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7fb27b941602401d91542211134fc71a",
+   "metadata": {},
+   "source": [
+    "# Optuna + `SEFRBoostClassifier`\n",
+    "\n",
+    "Tune [`SEFRBoostClassifier`](https://github.com/LinearBoost/linearboost-classifier) (gradient boosting with SEFR oblique splits) using [Optuna](https://optuna.org/) on sklearn’s **Breast Cancer Wisconsin** dataset (binary).\n",
+    "\n",
+    "**Install (if needed):** `pip install linearboost optuna scikit-learn` — or install this repo editable: `pip install -e .` from the repository root."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "acae54e37e7d407bbb7b55eff062a284",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "\n",
+    "import numpy as np\n",
+    "import optuna\n",
+    "from sklearn.datasets import load_breast_cancer\n",
+    "from sklearn.metrics import f1_score, roc_auc_score\n",
+    "from sklearn.model_selection import StratifiedKFold, train_test_split\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "\n",
+    "from linearboost import SEFRBoostClassifier\n",
+    "\n",
+    "warnings.filterwarnings(\"ignore\")\n",
+    "optuna.logging.set_verbosity(optuna.logging.WARNING)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a63283cbaf04dbcab1f6479b197f3a8",
+   "metadata": {},
+   "source": [
+    "## 1. Load data and train / test split\n",
+    "\n",
+    "`SEFRBoostClassifier` expects **dense numeric** input; we use `StandardScaler` in a pipeline."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8dd0d8092fe74a7c96281538738b07e2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X, y = load_breast_cancer(return_X_y=True)\n",
+    "X_train, X_test, y_train, y_test = train_test_split(\n",
+    "    X, y, test_size=0.25, stratify=y, random_state=42\n",
+    ")\n",
+    "print(\"Train:\", X_train.shape, \"Test:\", X_test.shape, \"Classes:\", np.unique(y))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "72eea5119410473aa328ad9291626812",
+   "metadata": {},
+   "source": [
+    "## 2. Quick baseline (default hyperparameters)\n",
+    "\n",
+    "`Pipeline(StandardScaler → SEFRBoostClassifier)`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8edb47106e1a46a883d545849b8ab81b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "baseline = Pipeline(\n",
+    "    [\n",
+    "        (\"scale\", StandardScaler()),\n",
+    "        (\"clf\", SEFRBoostClassifier(n_estimators=50, random_state=42)),\n",
+    "    ]\n",
+    ")\n",
+    "baseline.fit(X_train, y_train)\n",
+    "y_pred = baseline.predict(X_test)\n",
+    "y_proba = baseline.predict_proba(X_test)[:, 1]\n",
+    "print(\"Baseline F1 (weighted):\", f1_score(y_test, y_pred, average=\"weighted\"))\n",
+    "print(\"Baseline ROC-AUC:\", roc_auc_score(y_test, y_proba))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "10185d26023b46108eb7d9f57d49d2b3",
+   "metadata": {},
+   "source": [
+    "## 3. Optuna: maximize cross-validated F1\n",
+    "\n",
+    "Objective: suggest tree size, learning rate, depth, leaf constraints, and subsample; evaluate with **5-fold stratified CV** on the training set only (fast enough for local runs)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8763a12b2bbd4a93a75aff182afb95dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)\n",
+    "\n",
+    "\n",
+    "def objective(trial: optuna.Trial) -> float:\n",
+    "    params = {\n",
+    "        \"n_estimators\": trial.suggest_int(\"n_estimators\", 20, 150),\n",
+    "        \"learning_rate\": trial.suggest_float(\"learning_rate\", 0.02, 0.3, log=True),\n",
+    "        \"max_depth\": trial.suggest_int(\"max_depth\", 2, 6),\n",
+    "        \"min_samples_leaf\": trial.suggest_int(\"min_samples_leaf\", 5, 40),\n",
+    "        \"min_samples_split\": trial.suggest_int(\"min_samples_split\", 10, 80),\n",
+    "        \"subsample\": trial.suggest_float(\"subsample\", 0.6, 1.0),\n",
+    "        \"random_state\": 42,\n",
+    "    }\n",
+    "    pipe = Pipeline(\n",
+    "        [\n",
+    "            (\"scale\", StandardScaler()),\n",
+    "            (\"clf\", SEFRBoostClassifier(**params)),\n",
+    "        ]\n",
+    "    )\n",
+    "    scores = []\n",
+    "    for train_idx, val_idx in cv.split(X_train, y_train):\n",
+    "        pipe.fit(X_train[train_idx], y_train[train_idx])\n",
+    "        pred = pipe.predict(X_train[val_idx])\n",
+    "        scores.append(f1_score(y_train[val_idx], pred, average=\"weighted\"))\n",
+    "    return float(np.mean(scores))\n",
+    "\n",
+    "\n",
+    "study = optuna.create_study(direction=\"maximize\")\n",
+    "study.optimize(objective, n_trials=30, show_progress_bar=True)\n",
+    "print(\"Best trial:\", study.best_trial.number, \"F1 (CV mean):\", study.best_value)\n",
+    "print(\"Best params:\", study.best_params)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7623eae2785240b9bd12b16a66d81610",
+   "metadata": {},
+   "source": [
+    "## 4. Fit tuned model on full training set and evaluate on held-out test"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7cdc8c89c7104fffa095e18ddfef8986",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "best = study.best_params.copy()\n",
+    "best[\"random_state\"] = 42\n",
+    "tuned = Pipeline(\n",
+    "    [\n",
+    "        (\"scale\", StandardScaler()),\n",
+    "        (\"clf\", SEFRBoostClassifier(**best)),\n",
+    "    ]\n",
+    ")\n",
+    "tuned.fit(X_train, y_train)\n",
+    "y_pred_t = tuned.predict(X_test)\n",
+    "y_proba_t = tuned.predict_proba(X_test)[:, 1]\n",
+    "print(\"Tuned F1 (weighted):\", f1_score(y_test, y_pred_t, average=\"weighted\"))\n",
+    "print(\"Tuned ROC-AUC:\", roc_auc_score(y_test, y_proba_t))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
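The notebook added above scores every fold with `f1_score(..., average="weighted")` and returns the mean over folds as the Optuna objective. As a rough illustration of what that metric computes, here is a dependency-free sketch: per-class F1, weighted by each class's share of the true labels. The function name `weighted_f1` is hypothetical, and undefined precision is treated as 0, which is an assumption made for this sketch.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = true_pos / n_pred if n_pred else 0.0
        recall = true_pos / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += f1 * n_true / total
    return score


# Two folds of made-up predictions; the objective would average the fold scores.
folds = [
    ([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]),
    ([0, 1, 1, 0], [0, 1, 1, 0]),
]
fold_scores = [weighted_f1(t, p) for t, p in folds]
cv_mean = sum(fold_scores) / len(fold_scores)
print(fold_scores, cv_mean)
```

In the first fold, class 0 has F1 = 0.5 with support 2 and class 1 has F1 = 2/3 with support 3, so the weighted score is 0.5 · 0.4 + (2/3) · 0.6 = 0.6.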

src/linearboost/__init__.py

Lines changed: 4 additions & 1 deletion
@@ -1,9 +1,12 @@
-__version__ = "0.1.9"
+__version__ = "0.2.0"
 
 from .linear_boost import LinearBoostClassifier
 from .sefr import SEFR
+from .sefr_boost import SEFRBoostClassifier, SEFRBoostRegressor
 
 __all__ = [
     "LinearBoostClassifier",
     "SEFR",
+    "SEFRBoostClassifier",
+    "SEFRBoostRegressor",
 ]
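Since this hunk bumps `__version__` to `"0.2.0"` and the README asks users to upgrade, a small check like the one below could tell whether an installed copy predates the release. It is a sketch only: `parse_version` and `needs_upgrade` are hypothetical helpers, and they assume plain dotted-integer version strings with no pre-release suffixes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version like '0.2.0' into (0, 2, 0). Plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed: str, minimum: str = "0.2.0") -> bool:
    """True when the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


print(needs_upgrade("0.1.9"))   # True
print(needs_upgrade("0.2.0"))   # False
print(needs_upgrade("0.10.0"))  # False: compared as integers, not as strings
```

With the package installed, the check would be called as `needs_upgrade(linearboost.__version__)`. Comparing integer tuples rather than raw strings avoids ranking `"0.10.0"` below `"0.2.0"`.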
