Commit 7795389

joblib fix
1 parent a6c3421 commit 7795389

70 files changed

Lines changed: 5424 additions & 8583 deletions

.coveragerc

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-[run]
-branch = True
-source = skrebate
-include = */skrebate/*
+[run]
+branch = True
+source = skrebate
+include = */skrebate/*

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -70,3 +70,8 @@ testing.ipynb
 
 *.prof
 /demo_scikitrebate.ipynb
+
+*.DS_Store
+.idea/
+
+analysis_pipeline/skrebatewip

.landscape.yaml

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
-doc-warnings: yes
-
-ignore-patterns:
-- __init__.py
-
+doc-warnings: yes
+
+ignore-patterns:
+- __init__.py
+

.project

Lines changed: 0 additions & 17 deletions
This file was deleted.

.pydevproject

Lines changed: 0 additions & 5 deletions
This file was deleted.

.settings/org.eclipse.core.resources.prefs

Lines changed: 0 additions & 10 deletions
This file was deleted.

.travis.yml

Lines changed: 18 additions & 20 deletions
@@ -1,20 +1,18 @@
-language: python
-python:
-- "3.5"
-virtualenv:
-system_site_packages: true
-env:
-matrix:
-# let's start simple:
-- PYTHON_VERSION="3.5" LATEST="true"
-- PYTHON_VERSION="3.5" COVERAGE="true" LATEST="true"
-- PYTHON_VERSION="3.5" LATEST="true"
-install: source ./ci/.travis_install.sh
-script: bash ./ci/.travis_test.sh
-after_success:
-# Ignore coveralls failures as the coveralls server is not very reliable
-# but we don't want travis to report a failure in the github UI just
-# because the coverage report failed to be published.
-- if [[ "$COVERAGE" == "true" ]]; then coveralls || echo "failed"; fi
-cache: apt
-sudo: false
+language: python
+virtualenv:
+system_site_packages: true
+env:
+matrix:
+# let's start simple:
+- PYTHON_VERSION="2.7" LATEST="true"
+- PYTHON_VERSION="3.6" COVERAGE="true" LATEST="true"
+- PYTHON_VERSION="3.6" LATEST="true"
+install: source ./ci/.travis_install.sh
+script: bash ./ci/.travis_test.sh
+after_success:
+# Ignore coveralls failures as the coveralls server is not very reliable
+# but we don't want travis to report a failure in the github UI just
+# because the coverage report failed to be published.
+- if [[ "$COVERAGE" == "true" ]]; then coveralls || echo "failed"; fi
+cache: apt
+sudo: false

LICENSE

Lines changed: 21 additions & 21 deletions
@@ -1,21 +1,21 @@
-The MIT License (MIT)
-
-Copyright (c) 2016 Randal S. Olson and Ryan J. Urbanowicz
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+The MIT License (MIT)
+
+Copyright (c) 2016 Randal S. Olson and Ryan J. Urbanowicz
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

MANIFEST.in

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-include LICENSE
-include README.md
+include LICENSE
+include README.md

README.md

Lines changed: 120 additions & 120 deletions
@@ -1,120 +1,120 @@
-Master status: [![Master Build Status](https://travis-ci.org/EpistasisLab/scikit-rebate.svg?branch=master)](https://travis-ci.org/EpistasisLab/scikit-rebate)
-[![Master Code Health](https://landscape.io/github/EpistasisLab/scikit-rebate/master/landscape.svg?style=flat)](https://landscape.io/github/EpistasisLab/scikit-rebate/master)
-[![Master Coverage Status](https://coveralls.io/repos/github/EpistasisLab/scikit-rebate/badge.svg?branch=master&service=github)](https://coveralls.io/github/EpistasisLab/scikit-rebate?branch=master)
-
-Development status: [![Development Build Status](https://travis-ci.org/EpistasisLab/scikit-rebate.svg?branch=development)](https://travis-ci.org/EpistasisLab/scikit-rebate)
-[![Development Code Health](https://landscape.io/github/EpistasisLab/scikit-rebate/development/landscape.svg?style=flat)](https://landscape.io/github/EpistasisLab/scikit-rebate/development)
-[![Development Coverage Status](https://coveralls.io/repos/github/EpistasisLab/scikit-rebate/badge.svg?branch=development&service=github)](https://coveralls.io/github/EpistasisLab/scikit-rebate?branch=development)
-
-Package information: ![Python 2.7](https://img.shields.io/badge/python-2.7-blue.svg)
-![Python 3.5](https://img.shields.io/badge/python-3.6-blue.svg)
-![License](https://img.shields.io/badge/license-MIT%20License-blue.svg)
-[![PyPI version](https://badge.fury.io/py/skrebate.svg)](https://badge.fury.io/py/skrebate)
-
-# scikit-rebate
-This package includes a scikit-learn-compatible Python implementation of ReBATE, a suite of [Relief-based feature selection algorithms](https://en.wikipedia.org/wiki/Relief_(feature_selection)) for Machine Learning. These Relief-Based algorithms (RBAs) are designed for feature weighting/selection as part of a machine learning pipeline (supervised learning). Presently this includes the following core RBAs: ReliefF, SURF, SURF\*, MultiSURF\*, and MultiSURF. Additionally, an implementation of the iterative TuRF mechanism and VLSRelief is included. **It is still under active development** and we encourage you to check back on this repository regularly for updates.
-
-These algorithms offer a computationally efficient way to perform feature selection that is sensitive to feature interactions as well as simple univariate associations, unlike most currently available filter-based feature selection methods. The main benefit of Relief algorithms is that they identify feature interactions without having to exhaustively check every pairwise interaction, thus taking significantly less time than exhaustive pairwise search.
-
-Certain algorithms require user specified run parameters (e.g. ReliefF requires the user to specify some 'k' number of nearest neighbors).
-
-Relief algorithms are commonly applied to genetic analyses, where epistasis (i.e., feature interactions) is common. However, the algorithms implemented in this package can be applied to almost any supervised classification data set and supports:
-
-* Feature sets that are discrete/categorical, continuous-valued or a mix of both
-
-* Data with missing values
-
-* Binary endpoints (i.e., classification)
-
-* Multi-class endpoints (i.e., classification)
-
-* Continuous endpoints (i.e., regression)
-
-Built into this code, is a strategy to 'automatically' detect from the loaded data, these relevant characteristics.
-
-Of our two initial ReBATE software releases, this scikit-learn compatible version primarily focuses on ease of incorporation into a scikit learn analysis pipeline.
-This code is most appropriate for scikit-learn users, Windows operating system users, beginners, or those looking for the most recent ReBATE developments.
-
-An alternative 'stand-alone' version of [ReBATE](https://github.com/EpistasisLab/ReBATE) is also available that focuses on improving run-time with the use of Cython for optimization. This implementation also outputs feature names and associated feature scores as a text file by default.
-
-## License
-
-Please see the [repository license](https://github.com/EpistasisLab/scikit-rebate/blob/master/LICENSE) for the licensing and usage information for scikit-rebate.
-
-Generally, we have licensed scikit-rebate to make it as widely usable as possible.
-
-## Installation
-
-scikit-rebate is built on top of the following existing Python packages:
-
-* NumPy
-
-* SciPy
-
-* scikit-learn
-
-All of the necessary Python packages can be installed via the [Anaconda Python distribution](https://www.continuum.io/downloads), which we strongly recommend that you use. We also strongly recommend that you use Python 3 over Python 2 if you're given the choice.
-
-NumPy, SciPy, and scikit-learn can be installed in Anaconda via the command:
-
-```
-conda install numpy scipy scikit-learn
-```
-
-Once the prerequisites are installed, you should be able to install scikit-rebate with a pip command:
-
-```
-pip install skrebate
-```
-
-Please [file a new issue](https://github.com/EpistasisLab/scikit-rebate/issues/new) if you run into installation problems.
-
-## Usage
-
-We have designed the Relief algorithms to be integrated directly into scikit-learn machine learning workflows. For example, the ReliefF algorithm can be used as a feature selection step in a scikit-learn pipeline as follows.
-
-```python
-import pandas as pd
-import numpy as np
-from sklearn.pipeline import make_pipeline
-from skrebate import ReliefF
-from sklearn.ensemble import RandomForestClassifier
-from sklearn.model_selection import cross_val_score
-
-genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
-'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz',
-sep='\t', compression='gzip')
-
-features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values
-
-clf = make_pipeline(ReliefF(n_features_to_select=2, n_neighbors=100),
-RandomForestClassifier(n_estimators=100))
-
-print(np.mean(cross_val_score(clf, features, labels)))
->>> 0.795
-```
-
-For more information on the Relief algorithms available in this package and how to use them, please refer to our [usage documentation](https://EpistasisLab.github.io/scikit-rebate/using/).
-
-## Contributing to scikit-rebate
-
-We welcome you to [check the existing issues](https://github.com/EpistasisLab/scikit-rebate/issues/) for bugs or enhancements to work on. If you have an idea for an extension to scikit-rebate, please [file a new issue](https://github.com/EpistasisLab/scikit-rebate/issues/new) so we can discuss it.
-
-Please refer to our [contribution guidelines](https://EpistasisLab.github.io/scikit-rebate/contributing/) prior to working on a new feature or bug fix.
-
-## Citing scikit-rebate
-
-If you use scikit-rebate in a scientific publication, please consider citing the following paper:
-
-Ryan J. Urbanowicz, Randal S. Olson, Peter Schmitt, Melissa Meeker, Jason H. Moore (2017). [Benchmarking Relief-Based Feature Selection Methods](https://arxiv.org/abs/1711.08477). *arXiv preprint*, under review.
-
-BibTeX entry:
-
-```bibtex
-@misc{Urbanowicz2017Benchmarking,
-author = {Urbanowicz, Ryan J. and Olson, Randal S. and Schmitt, Peter and Meeker, Melissa and Moore, Jason H.},
-title = {Benchmarking Relief-Based Feature Selection Methods},
-year = {2017},
-howpublished = {arXiv e-print. https://arxiv.org/abs/1711.08477},
-}
-```
+Master status: [![Master Build Status](https://travis-ci.org/EpistasisLab/scikit-rebate.svg?branch=master)](https://travis-ci.org/EpistasisLab/scikit-rebate)
+[![Master Code Health](https://landscape.io/github/EpistasisLab/scikit-rebate/master/landscape.svg?style=flat)](https://landscape.io/github/EpistasisLab/scikit-rebate/master)
+[![Master Coverage Status](https://coveralls.io/repos/github/EpistasisLab/scikit-rebate/badge.svg?branch=master&service=github)](https://coveralls.io/github/EpistasisLab/scikit-rebate?branch=master)
+
+Development status: [![Development Build Status](https://travis-ci.org/EpistasisLab/scikit-rebate.svg?branch=development)](https://travis-ci.org/EpistasisLab/scikit-rebate)
+[![Development Code Health](https://landscape.io/github/EpistasisLab/scikit-rebate/development/landscape.svg?style=flat)](https://landscape.io/github/EpistasisLab/scikit-rebate/development)
+[![Development Coverage Status](https://coveralls.io/repos/github/EpistasisLab/scikit-rebate/badge.svg?branch=development&service=github)](https://coveralls.io/github/EpistasisLab/scikit-rebate?branch=development)
+
+Package information: ![Python 2.7](https://img.shields.io/badge/python-2.7-blue.svg)
+![Python 3.5](https://img.shields.io/badge/python-3.6-blue.svg)
+![License](https://img.shields.io/badge/license-MIT%20License-blue.svg)
+[![PyPI version](https://badge.fury.io/py/skrebate.svg)](https://badge.fury.io/py/skrebate)
+
+# scikit-rebate
+This package includes a scikit-learn-compatible Python implementation of ReBATE, a suite of [Relief-based feature selection algorithms](https://en.wikipedia.org/wiki/Relief_(feature_selection)) for Machine Learning. These Relief-Based algorithms (RBAs) are designed for feature weighting/selection as part of a machine learning pipeline (supervised learning). Presently this includes the following core RBAs: ReliefF, SURF, SURF\*, MultiSURF\*, and MultiSURF. Additionally, an implementation of the iterative TuRF mechanism and VLSRelief is included. **It is still under active development** and we encourage you to check back on this repository regularly for updates.
+
+These algorithms offer a computationally efficient way to perform feature selection that is sensitive to feature interactions as well as simple univariate associations, unlike most currently available filter-based feature selection methods. The main benefit of Relief algorithms is that they identify feature interactions without having to exhaustively check every pairwise interaction, thus taking significantly less time than exhaustive pairwise search.
+
+Certain algorithms require user specified run parameters (e.g. ReliefF requires the user to specify some 'k' number of nearest neighbors).
+
+Relief algorithms are commonly applied to genetic analyses, where epistasis (i.e., feature interactions) is common. However, the algorithms implemented in this package can be applied to almost any supervised classification data set and supports:
+
+* Feature sets that are discrete/categorical, continuous-valued or a mix of both
+
+* Data with missing values
+
+* Binary endpoints (i.e., classification)
+
+* Multi-class endpoints (i.e., classification)
+
+* Continuous endpoints (i.e., regression)
+
+Built into this code, is a strategy to 'automatically' detect from the loaded data, these relevant characteristics.
+
+Of our two initial ReBATE software releases, this scikit-learn compatible version primarily focuses on ease of incorporation into a scikit learn analysis pipeline.
+This code is most appropriate for scikit-learn users, Windows operating system users, beginners, or those looking for the most recent ReBATE developments.
+
+An alternative 'stand-alone' version of [ReBATE](https://github.com/EpistasisLab/ReBATE) is also available that focuses on improving run-time with the use of Cython for optimization. This implementation also outputs feature names and associated feature scores as a text file by default.
+
+## License
+
+Please see the [repository license](https://github.com/EpistasisLab/scikit-rebate/blob/master/LICENSE) for the licensing and usage information for scikit-rebate.
+
+Generally, we have licensed scikit-rebate to make it as widely usable as possible.
+
+## Installation
+
+scikit-rebate is built on top of the following existing Python packages:
+
+* NumPy
+
+* SciPy
+
+* scikit-learn
+
+All of the necessary Python packages can be installed via the [Anaconda Python distribution](https://www.continuum.io/downloads), which we strongly recommend that you use. We also strongly recommend that you use Python 3 over Python 2 if you're given the choice.
+
+NumPy, SciPy, and scikit-learn can be installed in Anaconda via the command:
+
+```
+conda install numpy scipy scikit-learn
+```
+
+Once the prerequisites are installed, you should be able to install scikit-rebate with a pip command:
+
+```
+pip install skrebate
+```
+
+Please [file a new issue](https://github.com/EpistasisLab/scikit-rebate/issues/new) if you run into installation problems.
+
+## Usage
+
+We have designed the Relief algorithms to be integrated directly into scikit-learn machine learning workflows. For example, the ReliefF algorithm can be used as a feature selection step in a scikit-learn pipeline as follows.
+
+```python
+import pandas as pd
+import numpy as np
+from sklearn.pipeline import make_pipeline
+from skrebate import ReliefF
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import cross_val_score
+
+genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-rebate/raw/master/data/'
+'GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz',
+sep='\t', compression='gzip')
+
+features, labels = genetic_data.drop('class', axis=1).values, genetic_data['class'].values
+
+clf = make_pipeline(ReliefF(n_features_to_select=2, n_neighbors=100),
+RandomForestClassifier(n_estimators=100))
+
+print(np.mean(cross_val_score(clf, features, labels)))
+>>> 0.795
+```
+
+For more information on the Relief algorithms available in this package and how to use them, please refer to our [usage documentation](https://EpistasisLab.github.io/scikit-rebate/using/).
+
+## Contributing to scikit-rebate
+
+We welcome you to [check the existing issues](https://github.com/EpistasisLab/scikit-rebate/issues/) for bugs or enhancements to work on. If you have an idea for an extension to scikit-rebate, please [file a new issue](https://github.com/EpistasisLab/scikit-rebate/issues/new) so we can discuss it.
+
+Please refer to our [contribution guidelines](https://EpistasisLab.github.io/scikit-rebate/contributing/) prior to working on a new feature or bug fix.
+
+## Citing scikit-rebate
+
+If you use scikit-rebate in a scientific publication, please consider citing the following paper:
+
+Ryan J. Urbanowicz, Randal S. Olson, Peter Schmitt, Melissa Meeker, Jason H. Moore (2017). [Benchmarking Relief-Based Feature Selection Methods](https://arxiv.org/abs/1711.08477). *arXiv preprint*, under review.
+
+BibTeX entry:
+
+```bibtex
+@misc{Urbanowicz2017Benchmarking,
+author = {Urbanowicz, Ryan J. and Olson, Randal S. and Schmitt, Peter and Meeker, Melissa and Moore, Jason H.},
+title = {Benchmarking Relief-Based Feature Selection Methods},
+year = {2017},
+howpublished = {arXiv e-print. https://arxiv.org/abs/1711.08477},
+}
+```
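
Note on the README.md text in this diff: it states that Relief algorithms pick up feature interactions without exhaustively checking every pairwise interaction. A minimal sketch of that idea follows. It assumes the original single-neighbour Relief update rule, not skrebate's own ReliefF implementation, and the `relief_weights` helper and the toy XOR data are hypothetical names introduced only for illustration.

```python
from itertools import product


def relief_weights(rows, labels):
    """Toy single-neighbour Relief for 0/1 features (illustrative only)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance; for 0/1 features this counts mismatches.
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (row, label) in enumerate(zip(rows, labels)):
        others = [j for j in range(len(rows)) if j != i]
        hits = [j for j in others if labels[j] == label]
        misses = [j for j in others if labels[j] != label]
        nearest_hit = rows[min(hits, key=lambda j: distance(row, rows[j]))]
        nearest_miss = rows[min(misses, key=lambda j: distance(row, rows[j]))]
        for f in range(n_features):
            # Penalise features that differ from the nearest same-class row,
            # reward features that differ from the nearest other-class row.
            weights[f] -= abs(row[f] - nearest_hit[f]) / len(rows)
            weights[f] += abs(row[f] - nearest_miss[f]) / len(rows)
    return weights


# Hypothetical toy data: class = f0 XOR f1, while f2 is pure noise.
rows = [list(bits) for bits in product([0, 1], repeat=3)]
labels = [r[0] ^ r[1] for r in rows]
weights = relief_weights(rows, labels)
print(weights)
```

Neither f0 nor f1 is associated with the class on its own, yet both score above the noise feature f2, and no pair of features is ever tested jointly. Ties between equally near neighbours are broken by row order in this sketch, so how the weight is split between f0 and f1 depends on that ordering.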
