docs/source/citation.md
Lines changed: 73 additions & 14 deletions
@@ -1,10 +1,10 @@
1
1
# Citing STREAMLINE
2
2
3
-
If you use STREAMLINE in a scientific publication, please consider citing the following paper:
3
+
If you use STREAMLINE in a scientific publication, please consider citing the following paper and noting the *release* used in your manuscript (e.g., the Beta 0.2.4 release was used in the publication below):
4
4
5
-
Urbanowicz, Ryan, et al. "STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison." Genetic Programming Theory and Practice XIX. Singapore: Springer Nature Singapore, 2023. 201-231.
5
+
[Urbanowicz, Ryan, et al. "STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison." Genetic Programming Theory and Practice XIX. Singapore: Springer Nature Singapore, 2023. 201-231.](https://link.springer.com/chapter/10.1007/978-981-19-8460-0_9)
6
6
7
-
BibTeX entry:
7
+
BibTeX Citation:
8
8
```
9
9
@incollection{urbanowicz2023streamline,
10
10
title={STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison},
@@ -16,7 +16,7 @@ BibTeX entry:
16
16
}
17
17
```
18
18
19
-
If you wish to cite the STREAMLINE codebase, please use the following (indicating the release used in the link: for example, v0.2.5-beta:
19
+
If you wish to cite the STREAMLINE codebase instead, please use the following (indicating the release used in the link, for example, v0.2.5-beta):
20
20
```
21
21
@misc{streamline2022,
22
22
author = {Urbanowicz, Ryan and Zhang, Robert},
@@ -27,11 +27,67 @@ If you wish to cite the STREAMLINE codebase, please use the following (indicatin
This section provides citations to publications applying STREAMLINE in recent research.
32
+
33
+
* [Exploring Automated Machine Learning for Cognitive Outcome Prediction from Multimodal Brain Imaging using STREAMLINE](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283099/)
34
+
```
35
+
@article{wang2023exploring,
36
+
title={Exploring Automated Machine Learning for Cognitive Outcome Prediction from Multimodal Brain Imaging using STREAMLINE},
37
+
author={Wang, Xinkai and Feng, Yanbo and Tong, Boning and Bao, Jingxuan and Ritchie, Marylyn D and Saykin, Andrew J and Moore, Jason H and Urbanowicz, Ryan and Shen, Li},
38
+
journal={AMIA Summits on Translational Science Proceedings},
39
+
volume={2023},
40
+
pages={544},
41
+
year={2023},
42
+
publisher={American Medical Informatics Association}
43
+
}
44
+
```
45
+
46
+
* [Comparing Amyloid Imaging Normalization Strategies for Alzheimer’s Disease Classification using an Automated Machine Learning Pipeline](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283108/)
47
+
```
48
+
@article{tong2023comparing,
49
+
title={Comparing Amyloid Imaging Normalization Strategies for Alzheimer’s Disease Classification using an Automated Machine Learning Pipeline},
50
+
author={Tong, Boning and Risacher, Shannon L and Bao, Jingxuan and Feng, Yanbo and Wang, Xinkai and Ritchie, Marylyn D and Moore, Jason H and Urbanowicz, Ryan and Saykin, Andrew J and Shen, Li},
51
+
journal={AMIA Summits on Translational Science Proceedings},
52
+
volume={2023},
53
+
pages={525},
54
+
year={2023},
55
+
publisher={American Medical Informatics Association}
56
+
}
57
+
```
58
+
59
+
* [Toward Predicting 30-Day Readmission Among Oncology Patients: Identifying Timely and Actionable Risk Factors](https://ascopubs.org/doi/abs/10.1200/CCI.22.00097)
60
+
```
61
+
@article{hwang2023toward,
62
+
title={Toward Predicting 30-Day Readmission Among Oncology Patients: Identifying Timely and Actionable Risk Factors},
63
+
author={Hwang, Sy and Urbanowicz, Ryan and Lynch, Selah and Vernon, Tawnya and Bresz, Kellie and Giraldo, Carolina and Kennedy, Erin and Leabhart, Max and Bleacher, Troy and Ripchinski, Michael R and others},
64
+
journal={JCO Clinical Cancer Informatics},
65
+
volume={7},
66
+
pages={e2200097},
67
+
year={2023},
68
+
publisher={Wolters Kluwer Health}
69
+
}
70
+
```
71
+
72
+
* [Identifying Barriers to Post-Acute Care Referral and Characterizing Negative Patient Preferences Among Hospitalized Older Adults Using Natural Language Processing](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148308/)
73
+
```
74
+
@inproceedings{kennedy2022identifying,
75
+
title={Identifying Barriers to Post-Acute Care Referral and Characterizing Negative Patient Preferences Among Hospitalized Older Adults Using Natural Language Processing},
76
+
author={Kennedy, Erin E and Davoudi, Anahita and Hwang, Sy and Freda, Philip J and Urbanowicz, Ryan and Bowles, Kathryn H and Mowery, Danielle L},
77
+
booktitle={AMIA Annual Symposium Proceedings},
78
+
volume={2022},
79
+
pages={606},
80
+
year={2022},
81
+
organization={American Medical Informatics Association}
82
+
}
83
+
```
84
+
85
+
## Other STREAMLINE Related Research
31
86
In developing STREAMLINE we integrated a number of methods and lessons learned from our lab's previous research. We briefly summarize and provide citations for each.
32
87
33
88
### A rigorous ML pipeline for binary classification
34
-
A preprint describing an early version of what would become STREAMLINE applied to pancreatic cancer.
89
+
A [preprint](https://arxiv.org/abs/2008.12829) describing an early version of what would become STREAMLINE applied to pancreatic cancer.
90
+
35
91
```
36
92
@article{urbanowicz2020rigorous,
37
93
title={A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments},
@@ -41,7 +97,7 @@ A preprint describing an early version of what would become STREAMLINE applied t
41
97
}
42
98
```
43
99
44
-
The STREAMLINE preprint.
100
+
The STREAMLINE [preprint](https://arxiv.org/abs/2206.12002).
45
101
```
46
102
@article{urbanowicz2022streamline,
47
103
title={STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison},
@@ -52,7 +108,7 @@ The STREAMLINE preprint.
52
108
```
53
109
54
110
### Relief-based feature importance estimation
55
-
One of the two feature importance algorithms used by STREAMLINE is MultiSURF, a Relief-based filter feature importance algorithm that can prioritize features involved in either univariate associations or multivariate interactions with the outcome. We believe that it is important to have at least one 'interaction-sensitive' feature importance algorithm involved in feature selection so that relevant features involved in complex associations are not filtered out prior to modeling. The paper below is an introduction and review of Relief-based algorithms.
111
+
One of the two feature importance algorithms used by STREAMLINE is MultiSURF, a Relief-based filter feature importance algorithm that can prioritize features involved in either univariate associations or multivariate interactions with the outcome. We believe that it is important to have at least one 'interaction-sensitive' feature importance algorithm involved in feature selection so that relevant features involved in complex associations are not filtered out prior to modeling. The [paper below](https://www.sciencedirect.com/science/article/pii/S1532046418301400) is an introduction and review of Relief-based algorithms.
56
112
```
57
113
@article{urbanowicz2018relief,
58
114
title={Relief-based feature selection: Introduction and review},
@@ -64,7 +120,7 @@ One of the two feature importance algorithms used by STREAMLINE is MultiSURF, a
64
120
publisher={Elsevier}
65
121
}
66
122
```
67
-
This next published research paper compared a number of Relief-based algorithms and found that MultiSURF performed best overall among those evaluated. This second paper also introduced 'ReBATE', a scikit-learn package of Relief-based feature importance/selection algorithms (used by STREAMLINE).
123
+
This [next published research paper](https://www.sciencedirect.com/science/article/pii/S1532046418301412) compared a number of Relief-based algorithms and found that MultiSURF performed best overall among those evaluated. This second paper also introduced 'ReBATE', a scikit-learn package of Relief-based feature importance/selection algorithms (used by STREAMLINE).
68
124
```
69
125
@article{urbanowicz2018benchmarking,
70
126
title={Benchmarking relief-based feature selection methods for bioinformatics data mining},
@@ -78,7 +134,7 @@ This next published research paper compared a number of Relief-based algorithms
78
134
```
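To illustrate why an 'interaction-sensitive' importance score matters, the following is a minimal sketch of basic Relief on a toy XOR dataset. It is not MultiSURF and not the ReBATE implementation. The function names, the use of Hamming distance, and the averaging over tied nearest neighbors are all assumptions made for this illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where two binary feature vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def relief_scores(X, y):
    """Basic Relief: reward features that differ on the nearest misses
    (other class) and penalize features that differ on the nearest hits
    (same class). Tied nearest neighbors are averaged (an assumption)."""
    n, p = len(X), len(X[0])
    scores = [0.0] * p
    for i in range(n):
        for want_hit, sign in ((True, -1.0), (False, 1.0)):
            # Candidate neighbors: same class for hits, other class for misses.
            pool = [j for j in range(n)
                    if j != i and (y[j] == y[i]) == want_hit]
            d_min = min(hamming(X[i], X[j]) for j in pool)
            nearest = [j for j in pool if hamming(X[i], X[j]) == d_min]
            for f in range(p):
                diff = sum(X[i][f] != X[j][f] for j in nearest) / len(nearest)
                scores[f] += sign * diff / n
    return scores


def univariate_gap(X, y, f):
    """Absolute difference in mean outcome between feature f = 1 and f = 0."""
    on = [y[i] for i in range(len(X)) if X[i][f] == 1]
    off = [y[i] for i in range(len(X)) if X[i][f] == 0]
    return abs(sum(on) / len(on) - sum(off) / len(off))


# Toy data: the outcome is x0 XOR x1; x2 is irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [row[0] ^ row[1] for row in X]

scores = relief_scores(X, y)
gaps = [univariate_gap(X, y, f) for f in range(3)]
print("Relief scores:", scores)
print("Univariate gaps:", gaps)
```

On this toy data the two XOR features receive a higher Relief score than the irrelevant feature, while the univariate gap is zero for all three features, so a purely univariate filter could not distinguish them.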
79
135
80
136
### Collective feature selection
81
-
Following feature importance estimation, STREAMLINE adopts an ensemble approach to determining which features to select. The utility of this kind of 'collective' feature selection was introduced in the next publication.
137
+
Following feature importance estimation, STREAMLINE adopts an ensemble approach to determining which features to select. The utility of this kind of 'collective' feature selection was introduced in the [next publication](https://link.springer.com/article/10.1186/s13040-018-0168-6).
82
138
```
83
139
@article{verma2018collective,
84
140
title={Collective feature selection to identify crucial epistatic variants},
@@ -93,7 +149,7 @@ Following feature importance estimation, STREAMLINE adopts an ensemble approach
93
149
```
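As a rough illustration of the 'collective' idea, the sketch below keeps any feature ranked in the top `top_k` by at least one importance algorithm. The union rule, the function name `collective_select`, the feature names, and the example scores are hypothetical; the selection rule actually applied by STREAMLINE may differ.

```python
def collective_select(rankings, top_k):
    """Return the union of the top_k features from each ranking.

    rankings: list of dicts mapping feature name -> importance score.
    This union rule is an illustrative assumption, not a confirmed
    description of STREAMLINE's behavior.
    """
    selected = set()
    for scores in rankings:
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:top_k])
    return sorted(selected)


# Hypothetical scores from a univariate method and a Relief-based method.
univariate_scores = {"age": 0.30, "snp1": 0.01, "snp2": 0.02, "noise": 0.00}
relief_based_scores = {"age": 0.10, "snp1": 0.45, "snp2": 0.40, "noise": -0.05}

kept = collective_select([univariate_scores, relief_based_scores], top_k=2)
print(kept)
```

In this made-up example, `snp1` would be dropped by the univariate ranking alone but is retained because the Relief-based ranking places it in its top two.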
94
150
95
151
### Learning classifier systems
96
-
STREAMLINE currently incorporates 15 ML classification modeling algorithms. Our own research has closely followed a subfield of evolutionary algorithms that discover a set of rules that collectively constitute a trained model. The appeal of such 'rule-based machine learning algorithms' (e.g. learning classifier systems) is that they can model complex associations while also offering human-interpretable models. In the first paper below we introduced 'ExSTraCS', a learning classifier system geared towards bioinformatics data analysis. ExSTraCS was the first ML algorithm demonstrated to be able to tackle the long-standing 135-bit multiplexer problem directly, largely due to its ability to use prior feature importance estimates from a Relief algorithm to guide the evolutionary rule search.
152
+
STREAMLINE currently incorporates 15 ML classification modeling algorithms. Our own research has closely followed a subfield of evolutionary algorithms that discover a set of rules that collectively constitute a trained model. The appeal of such 'rule-based machine learning algorithms' (e.g. learning classifier systems) is that they can model complex associations while also offering human-interpretable models. In the [first paper below](https://link.springer.com/article/10.1007/s12065-015-0128-8) we introduced 'ExSTraCS', a learning classifier system geared towards bioinformatics data analysis. ExSTraCS was the first ML algorithm demonstrated to be able to tackle the long-standing 135-bit multiplexer problem directly, largely due to its ability to use prior feature importance estimates from a Relief algorithm to guide the evolutionary rule search.
97
153
```
98
154
@article{urbanowicz2015exstracs,
99
155
title={ExSTraCS 2.0: description and evaluation of a scalable learning classifier system},
@@ -106,7 +162,8 @@ STREAMLINE currently incorporates 15 ML classification modeling algorithms that
106
162
publisher={Springer}
107
163
}
108
164
```
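For readers unfamiliar with the multiplexer benchmark mentioned above, here is a minimal sketch of the standard problem definition: the first `k` bits form an address that selects one of the following `2**k` data bits as the class. With `k = 7` this gives 7 + 128 = 135 bits. The function names are illustrative and are not part of ExSTraCS or STREAMLINE.

```python
def multiplexer_size(k):
    """Total number of bits for k address bits: k + 2**k."""
    return k + 2 ** k


def multiplexer(bits, k):
    """Return the data bit selected by the k-bit address prefix."""
    if len(bits) != multiplexer_size(k):
        raise ValueError("expected %d bits" % multiplexer_size(k))
    # Interpret the first k bits as a binary number (most significant first).
    address = int("".join(str(b) for b in bits[:k]), 2)
    return bits[k + address]


# 6-bit example: address bits 1,0 -> address 2 -> third data bit.
six_bit = [1, 0, 0, 0, 1, 0]
six_bit_class = multiplexer(six_bit, k=2)

# 135-bit example: address 0000101 -> address 5 -> sixth data bit.
big = [0] * multiplexer_size(7)
big[:7] = [0, 0, 0, 0, 1, 0, 1]
big[7 + 5] = 1
big_class = multiplexer(big, k=7)

print(multiplexer_size(7), six_bit_class, big_class)
```

Only the address bits and the single data bit they point to determine the class for a given instance, which is why no feature is informative on its own and the problem is a common test of interaction detection.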
109
-
In the next published pre-print we introduced a scikit-learn implementation of ExSTraCS (used by STREAMLINE) as well as a pipeline (LCS-DIVE) to take ExSTraCS output and characterize different patterns of association between features and outcome. Future work will demonstrate how STREAMLINE can be linked with LCS-DIVE to better understand the relationship between features and outcome captured by rule-based modeling.
165
+
166
+
In the [next published pre-print](https://arxiv.org/abs/2104.12844) we introduced a scikit-learn implementation of ExSTraCS (used by STREAMLINE) as well as a pipeline (LCS-DIVE) to take ExSTraCS output and characterize different patterns of association between features and outcome. Future work will demonstrate how STREAMLINE can be linked with LCS-DIVE to better understand the relationship between features and outcome captured by rule-based modeling.
110
167
```
111
168
@article{zhang2021lcs,
112
169
title={LCS-DIVE: An Automated Rule-based Machine Learning Visualization Pipeline for Characterizing Complex Associations in Classification},
@@ -115,7 +172,8 @@ In the next published pre-print we introduced a scikit-learn implementation of E
115
172
year={2021}
116
173
}
117
174
```
118
-
In the next publication we introduced the first scikit-learn compatible implementation of an LCS algorithm. Specifically, this paper implemented eLCS, an educational learning classifier system. This eLCS algorithm is a direct descendant of the UCS algorithm.
175
+
176
+
In the [next publication](https://dl.acm.org/doi/abs/10.1145/3377929.3398097) we introduced the first scikit-learn compatible implementation of an LCS algorithm. Specifically, this paper implemented eLCS, an educational learning classifier system. This eLCS algorithm is a direct descendant of the UCS algorithm.
@@ -125,7 +183,8 @@ In the next publication we introduced the first scikit-learn compatible implemen
125
183
year={2020}
126
184
}
127
185
```
128
-
eLCS was originally developed as a very simple supervised learning LCS implementation primarily as an educational resource pairing with the following published textbook.
186
+
187
+
eLCS was originally developed as a very simple supervised learning LCS implementation primarily as an educational resource pairing with the following [published textbook](https://books.google.com/books?hl=en&lr=&id=C6QxDwAAQBAJ&oi=fnd&pg=PR5&dq=Introduction+to+learning+classifier+systems&ots=pTcnuuYQPE&sig=wNgZmWkcne9m3LQgDzuBu30uQ1Y#v=onepage&q=Introduction%20to%20learning%20classifier%20systems&f=false).
129
188
```
130
189
@book{urbanowicz2017introduction,
131
190
title={Introduction to learning classifier systems},