Commit f3268ed

Merge pull request #281 from ncsa/develop
Citations and JOSS paper
2 parents 71c775d + 47efa7a commit f3268ed

5 files changed

Lines changed: 513 additions & 2 deletions

.github/workflows/paper-pdf.yml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+name: Paper PDF
+on:
+  push:
+    paths:
+      - paper.md
+      - paper.bib
+      - .github/workflows/paper-pdf.yml
+
+jobs:
+  paper:
+    runs-on: ubuntu-latest
+    name: Paper Draft
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Open Journals PDF Generator
+        uses: openjournals/openjournals-draft-action@v.1.0
+        with:
+          journal: joss
+          # This is the path to the paper within the repo
+          paper-path: paper.md
+      - name: Upload
+        uses: actions/upload-artifact@v4
+        with:
+          name: paper
+          # This is the output path where Pandoc will write the compiled
+          # PDF (should be the same directory as the input paper.md)
+          path: paper.pdf
+
+      - name: Debug list files
+        run: ls -R
+        # Double-check to find the actual path of the generated PDF
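
The final workflow step lists every file with `ls -R` so the PDF path can be confirmed by eye. As a narrower alternative, the minimal Python sketch below searches a directory tree for the PDF by name. The helper name `find_pdfs` and its defaults are hypothetical and not part of this commit; the only thing taken from the workflow is the expected file name `paper.pdf`.

```python
from pathlib import Path


def find_pdfs(root: str = ".", name: str = "paper.pdf") -> list[Path]:
    """Return every regular file called `name` under `root`, sorted by path."""
    # rglob walks the tree recursively; is_file() skips directories that
    # happen to share the name.
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__":
    # Print each match, one per line, relative to the current directory.
    for path in find_pdfs():
        print(path)
```

If this prints a path other than `paper.pdf` at the repository root, the `path:` value of the Upload step would need to be adjusted to match.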

CITATION.cff

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+cff-version: "1.2.0"
+authors:
+- family-names: Allen
+  given-names: Joshua M.
+  orcid: "https://orcid.org/0009-0008-0002-5239"
+- email: krg3@uic.edu
+  family-names: Gandhi
+  given-names: Keshav R.
+  orcid: "https://orcid.org/0009-0000-1718-1862"
+- family-names: Alhazmy
+  given-names: Raghid
+  orcid: "https://orcid.org/0009-0005-6614-7050"
+- family-names: Wasnik
+  given-names: Yash
+  orcid: "https://orcid.org/0009-0006-0108-7445"
+- email: cfliege2@illinois.edu
+  family-names: Fliege
+  given-names: Christina E.
+  orcid: "https://orcid.org/0000-0001-8085-779X"
+contact:
+- email: krg3@uic.edu
+  family-names: Gandhi
+  given-names: Keshav R.
+  orcid: "https://orcid.org/0009-0000-1718-1862"
+- email: cfliege2@illinois.edu
+  family-names: Fliege
+  given-names: Christina E.
+  orcid: "https://orcid.org/0000-0001-8085-779X"
+doi: 10.5281/zenodo.19662441
+message: If you use this software, please cite both the original publication of NEAT in PLOS One (2016) as well as our
+  article from the Journal of Open Source Software (2026). You can find both citations at the top of README.md, with
+  DOIs provided that link to the original papers.
+preferred-citation:
+  authors:
+  - family-names: Allen
+    given-names: Joshua M.
+    orcid: "https://orcid.org/0009-0008-0002-5239"
+  - email: krg3@uic.edu
+    family-names: Gandhi
+    given-names: Keshav R.
+    orcid: "https://orcid.org/0009-0000-1718-1862"
+  - family-names: Alhazmy
+    given-names: Raghid
+    orcid: "https://orcid.org/0009-0005-6614-7050"
+  - family-names: Wasnik
+    given-names: Yash
+    orcid: "https://orcid.org/0009-0006-0108-7445"
+  - email: cfliege2@illinois.edu
+    family-names: Fliege
+    given-names: Christina E.
+    orcid: "https://orcid.org/0000-0001-8085-779X"
+  date-published: 2026-05-04
+  doi: 10.21105/joss.09056
+  issn: 2475-9066
+  issue: 121
+  journal: Journal of Open Source Software
+  publisher:
+    name: Open Journals
+  start: 9056
+  title: "Enhancing short-read sequencing simulation: Updates to NEAT"
+  type: article
+  url: "https://joss.theoj.org/papers/10.21105/joss.09056"
+  volume: 11
+title: "Enhancing short-read sequencing simulation: Updates to NEAT"

README.md

Lines changed: 6 additions & 2 deletions
@@ -16,9 +16,13 @@ Developing and validating bioinformatics pipelines depends on access to genomic
 
 NEAT is a fine-grained read simulator that simulates real-looking data using models learned from specific datasets. It was originally designed to simulate short reads and is adaptable to different machines, with custom error models and the capability to handle single-base substitutions, indel errors, and other types of mutations. Unlike simulators that rely solely on fixed error profiles, NEAT can learn empirical mutation and sequencing models from real datasets and use these models to generate realistic sequencing data, providing outputs in several common file formats (e.g., FASTQ, BAM, and VCF). There are several supporting utilities for generating models used for simulation and for comparing the outputs of alignment and variant calling to the golden BAM and golden VCF produced by NEAT.
 
-To cite this work, please use:
+To cite this work, please use both of the following:
 
-> Stephens, Z. D., Hudson, M. E., Mainzer, L. S., Taschuk, M., Weber, M. R., & Iyer, R. K. (2016). Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. _PLOS ONE_, _11_(11), e0167047. https://doi.org/10.1371/journal.pone.0167047
+1. > Stephens, Z. D., Hudson, M. E., Mainzer, L. S., Taschuk, M., Weber, M. R., & Iyer, R. K. (2016). Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. _PLOS ONE_, _11_(11), e0167047. https://doi.org/10.1371/journal.pone.0167047
+
+2. > Allen, J. M., Gandhi, K. R., Alhazmy, R., Wasnik, Y., & Fliege, C. E. (2026). Enhancing next-generation sequencing simulation: Updates to NEAT. _Journal of Open Source Software_, _11_(121), 9056. https://doi.org/10.21105/joss.09056
+
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.09056/status.svg)](https://doi.org/10.21105/joss.09056)
 
 ## Table of Contents
 

paper.bib

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
+@article{Stephens:2016,
+  author = {Stephens, Z. D. and Hudson, M. E. and Mainzer, L. S. and Taschuk, M. and Weber, M. R. and Iyer, R. K.},
+  title = {Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models},
+  journal = {PLOS ONE},
+  year = {2016},
+  volume = {11},
+  number = {11},
+  pages = {e0167047},
+  doi = {10.1371/journal.pone.0167047}
+}
+
+@article{Benjamini:2012,
+  author = {Benjamini, Y. and Speed, T. P.},
+  title = {Summarizing and correcting the {GC} content bias in high-throughput sequencing},
+  journal = {Nucleic Acids Research},
+  year = {2012},
+  volume = {40},
+  number = {10},
+  pages = {e72},
+  doi = {10.1093/nar/gks001}
+}
+
+@article{Ross:2013,
+  author = {Ross, M. G. and Russ, C. and Costello, M. and Hollinger, A. and Lennon, N. J. and Hegarty, R. and Nusbaum, C. and Jaffe, D. B.},
+  title = {Characterizing and measuring bias in sequence data},
+  journal = {Genome Biology},
+  year = {2013},
+  volume = {14},
+  number = {5},
+  pages = {R51},
+  doi = {10.1186/gb-2013-14-5-r51}
+}
+
+@article{Escalona:2016,
+  author = {Escalona, M. and Rocha, S. and Posada, D.},
+  title = {A comparison of tools for the simulation of genomic next-generation sequencing data},
+  journal = {Nature Reviews Genetics},
+  year = {2016},
+  volume = {17},
+  number = {8},
+  pages = {459--469},
+  doi = {10.1038/nrg.2016.57}
+}
+
+@article{Zhao:2017,
+  author = {Zhao, M. and Liu, D. and Qu, H.},
+  title = {Systematic review of next-generation sequencing simulators: computational tools, features and perspectives},
+  journal = {Briefings in Functional Genomics},
+  year = {2017},
+  volume = {16},
+  number = {3},
+  pages = {121--128},
+  doi = {10.1093/bfgp/elw012}
+}
+
+@article{Alosaimi:2020,
+  author = {Alosaimi, Shatha and Bandiang, Armand and {van Biljon}, Noelle and Awany, Denis and Thami, Prisca K. and Tchamga, Milaine S. S. and Kiran, Anmol and Messaoud, Olfa and Hassan, Radia Ismaeel Mohammed and Mugo, Jacquiline and Ahmed, Azza and Bope, Christian D. and Allali, Imane and Mazandu, Gaston K. and Mulder, Nicola J. and Chimusa, Emile R.},
+  title = {A broad survey of {DNA} sequence data simulation tools},
+  journal = {Briefings in Functional Genomics},
+  year = {2020},
+  volume = {19},
+  number = {1},
+  pages = {49--59},
+  doi = {10.1093/bfgp/elz033}
+}
+
+@article{Milhaven:2023,
+  author = {Milhaven, M. and Pfeifer, S. P.},
+  title = {Performance evaluation of six popular short-read simulators},
+  journal = {Heredity},
+  year = {2023},
+  volume = {130},
+  pages = {55--63},
+  doi = {10.1038/s41437-022-00577-3}
+}
+
+@article{Schmeing:2021,
+  author = {Schmeing, S. and Robinson, M. D.},
+  title = {{ReSeq} simulates realistic {Illumina} high-throughput sequencing data},
+  journal = {Genome Biology},
+  year = {2021},
+  volume = {22},
+  pages = {67},
+  doi = {10.1186/s13059-021-02265-7}
+}
+
+@article{Huang:2012,
+  author = {Huang, W. and Li, L. and Myers, J. R. and Marth, G. T.},
+  title = {{ART}: a next-generation sequencing read simulator},
+  journal = {Bioinformatics},
+  year = {2012},
+  volume = {28},
+  number = {4},
+  pages = {593--594},
+  doi = {10.1093/bioinformatics/btr708}
+}
+
+@article{Caboche:2014,
+  author = {Caboche, S. and Audebert, C. and Lemoine, Y. and Hot, D.},
+  title = {Comparison of mapping algorithms used in high-throughput sequencing: application to {Ion Torrent} data},
+  journal = {BMC Genomics},
+  year = {2014},
+  volume = {15},
+  pages = {264},
+  doi = {10.1186/1471-2164-15-264}
+}
+
+@misc{Homer:2010,
+  author = {Homer, N.},
+  title = {{DWGSIM}: Whole Genome Simulator for Next-Generation Sequencing},
+  year = {2010},
+  howpublished = {GitHub repository},
+  url = {https://github.com/nh13/DWGSIM}
+}
+
+@article{McElroy:2012,
+  author = {McElroy, K. E. and Luciani, F. and Thomas, T.},
+  title = {{GemSIM}: general, error-model based simulator of next-generation sequencing data},
+  journal = {BMC Genomics},
+  year = {2012},
+  volume = {13},
+  pages = {74},
+  doi = {10.1186/1471-2164-13-74}
+}
+
+@article{Gourle:2019,
+  author = {Gourl{\'e}, H. and Karlsson-Lindsj{\"o}, O. and Hayer, J. and Bongcam-Rudloff, E.},
+  title = {Simulating {Illumina} metagenomic data with {InSilicoSeq}},
+  journal = {Bioinformatics},
+  year = {2019},
+  volume = {35},
+  number = {3},
+  pages = {521--522},
+  doi = {10.1093/bioinformatics/bty630}
+}
+
+@misc{Holtgrewe:2010,
+  author = {Holtgrewe, M.},
+  title = {Mason -- {A} Read Simulator for Second Generation Sequencing Data},
+  year = {2010},
+  howpublished = {Technical Report TR-B-10-06},
+  institution = {Freie Universit{\"a}t Berlin, Fachbereich Mathematik und Informatik},
+  number = {B-10-06},
+  url = {https://publications.imp.fu-berlin.de/962/2/mason201009.pdf}
+}
+
+@article{Hu:2012,
+  author = {Hu, Xuesong and Yuan, Jianying and Shi, Yujian and Lu, Jianliang and Liu, Binghang and Li, Zhenyu and Chen, Yanxiang and Mu, Desheng and Zhang, Hao and Li, Nan and Yue, Zhen and Bai, Fan and Li, Heng and Fan, Wei},
+  title = {{pIRS}: Profile-based {Illumina} paired-end reads simulator},
+  journal = {Bioinformatics},
+  year = {2012},
+  volume = {28},
+  number = {11},
+  pages = {1533--1535},
+  doi = {10.1093/bioinformatics/bts187}
+}
+
+@article{Pattnaik:2014,
+  author = {Pattnaik, S. and Gupta, S. and Rao, A. A. and Panda, B.},
+  title = {{SInC}: an accurate and fast error-model based simulator for {SNPs}, {Indels} and {CNVs} coupled with a read generator for short-read sequence data},
+  journal = {BMC Bioinformatics},
+  year = {2014},
+  volume = {15},
+  pages = {40},
+  doi = {10.1186/1471-2105-15-40}
+}
+
+@misc{Li:2011,
+  author = {Li, Heng},
+  title = {{wgsim}-{Read} simulator for next generation sequencing},
+  year = {2011},
+  howpublished = {GitHub repository},
+  url = {https://github.com/lh3/wgsim}
+}
+
+@article{Rhie:2023,
+  author = {Rhie, Arang and Nurk, Sergey and Cechova, Monika and Hoyt, Savannah J. and Taylor, Dylan J. and Altemose, Nicolas and Hook, Paul W. and Koren, Sergey and Rautiainen, Mikko and Alexandrov, Ivan A. and Allen, Jamie and Asri, Mobin and Bzikadze, Andrey V. and Chen, Nae-Chyun and Chin, Chen-Shan and Diekhans, Mark and Flicek, Paul and Formenti, Giulio and Fungtammasan, Arkarachai and Giron, Carlos Garcia and Garrison, Erik and Gershman, Ariel and Gerton, Jennifer L. and Grady, Patrick G. S. and Guarracino, Andrea and Haggerty, Leanne and Halabian, Reza and Hansen, Nancy F. and Harris, Robert and Hartley, Gabrielle A. and Harvey, William T. and Haukness, Marina and Heinz, Jakob and Hourlier, Thibaut and Hubley, Robert M. and Hunt, Sarah E. and Hwang, Stephen and Jain, Miten and Kesharwani, Rupesh K. and Lewis, Alexandra P. and Li, Heng and Logsdon, Glennis A. and Lucas, Julian K. and Makalowski, Wojciech and Markovic, Christopher and Martin, Fergal J. and {Mc Cartney}, Ann M. and McCoy, Rajiv C. and McDaniel, Jennifer and McNulty, Brandy M. and Medvedev, Paul and Mikheenko, Alla and Munson, Katherine M. and Murphy, Terence D. and Olsen, Hugh E. and Olson, Nathan D. and Paulin, Luis F. and Porubsky, David and Potapova, Tamara and Ryabov, Fedor and Salzberg, Steven L. and Sauria, Michael E. G. and Sedlazeck, Fritz J. and Shafin, Kishwar and Shepelev, Valery A. and Shumate, Alaina and Storer, Jessica M. and Surapaneni, Likhitha and {Taravella Oill}, Angela M. and Thibaud-Nissen, Fran\c{c}oise and Timp, Winston and Tomaszkiewicz, Marta and Vollger, Mitchell R. and Walenz, Brian P. and Watwood, Allison C. and Weissensteiner, Matthias H. and Wenger, Aaron M. and Wilson, Melissa A. and Zarate, Samantha and Zhu, Yiming and Zook, Justin M. and Eichler, Evan E. and O'Neill, Rachel J. and Schatz, Michael C. and Miga, Karen H. and Makova, Kateryna D. and Phillippy, Adam M.},
+  title = {The complete sequence of a human {Y} chromosome},
+  journal = {Nature},
+  year = {2023},
+  volume = {621},
+  number = {7978},
+  pages = {344--354},
+  doi = {10.1038/s41586-023-06457-y}
+}
+
+@article{Lefouili:2022,
+  author = {Lefouili, M. and Nam, K.},
+  title = {The evaluation of {Bcftools} mpileup and {GATK} {HaplotypeCaller} for variant calling in non-human species},
+  journal = {Scientific Reports},
+  year = {2022},
+  volume = {12},
+  pages = {11331},
+  doi = {10.1038/s41598-022-15563-2}
+}
+
+@article{Zhao:2020,
+  author = {Zhao, S. and Agafonov, O. and Azab, A. and Stokowy, T. and Hovig, E.},
+  title = {Accuracy and efficiency of germline variant calling pipelines for human genome data},
+  journal = {Scientific Reports},
+  year = {2020},
+  volume = {10},
+  pages = {20222},
+  doi = {10.1038/s41598-020-77218-4}
+}
+
+@article{Ahmed:2019,
+  author = {Ahmed, A. E. and Heldenbrand, J. and Asmann, Y. and Fadlelmola, F. M. and Katz, D. S. and Kendig, K. and Kendzior, M. C. and Li, T. and Ren, Y. and Rodriguez, E. and Weber, M. R. and Wozniak, J. M. and Zermeno, J. and Mainzer, L. S.},
+  title = {Managing genomic variant calling workflows with {Swift/T}},
+  journal = {PLOS ONE},
+  year = {2019},
+  volume = {14},
+  number = {7},
+  pages = {e0211608},
+  doi = {10.1371/journal.pone.0211608}
+}
+
+@article{Kendig:2019,
+  author = {Kendig, Katherine I. and Baheti, Saurabh and Bockol, Matthew A. and Drucker, Travis M. and Hart, Steven N. and Heldenbrand, Jacob R. and Hernaez, Mikel and Hudson, Matthew E. and Kalmbach, Michael T. and Klee, Eric W. and Mattson, Nathan R. and Ross, Christian A. and Taschuk, Morgan and Wieben, Eric D. and Wiepert, Mathieu and Wildman, Derek E. and Mainzer, Liudmila S.},
+  title = {Sentieon {DNASeq} Variant Calling Workflow Demonstrates Strong Computational Performance and Accuracy},
+  journal = {Frontiers in Genetics},
+  year = {2019},
+  volume = {10},
+  pages = {736},
+  doi = {10.3389/fgene.2019.00736}
+}
+
+@article{RuizSchultz:2021,
+  author = {{Ruiz-Schultz}, Nicole and Sant, David and Norcross, Stevie and Dansithong, Warunee and Hart, Kim and Asay, Bryce and Little, Jordan and Chung, Krystal and Oakeson, Kelly F. and Young, Erin L. and Eilbeck, Karen and Rohrwasser, Andreas},
+  title = {Methods and feasibility study for exome sequencing as a universal second-tier test in newborn screening},
+  journal = {Genetics in Medicine},
+  year = {2021},
+  volume = {23},
+  number = {4},
+  pages = {767--776},
+  doi = {10.1038/s41436-020-01058-w}
+}
+
+@article{Jandrasits:2019,
+  author = {Jandrasits, C. and Kr{\"o}ger, S. and Haas, W. and Renard, B. Y.},
+  title = {Computational pan-genome mapping and pairwise {SNP}-distance improve detection of {\textit{Mycobacterium tuberculosis}} transmission clusters},
+  journal = {PLOS Computational Biology},
+  year = {2019},
+  volume = {15},
+  number = {12},
+  pages = {e1007527},
+  doi = {10.1371/journal.pcbi.1007527}
+}
+
+@article{Shah:2021,
+  author = {Shah, R. N. and Ruthenburg, A. J.},
+  title = {Sequence deeper without sequencing more: Bayesian resolution of ambiguously mapped reads},
+  journal = {PLOS Computational Biology},
+  year = {2021},
+  volume = {17},
+  number = {4},
+  pages = {e1008926},
+  doi = {10.1371/journal.pcbi.1008926}
+}
+
+@article{Delhomme:2020,
+  author = {Delhomme, Tiffany M. and Avogbe, Patrice H. and Gabriel, Aur{\'e}lie A. G. and Alcala, Nicolas and Leblay, Noemie and Voegele, Catherine and Vall{\'e}e, Maxime and Chopard, Priscilia and Chabrier, Am{\'e}lie and Abedi-Ardekani, Behnoush and Gaborieau, Val{\'e}rie and Holcatova, Ivana and Janout, Vladimir and Foretov{\'a}, Lenka and Milosavljevic, Sasa and Zaridze, David and Mukeriya, Anush and Brambilla, Elisabeth and Brennan, Paul and Scelo, Ghislaine and Fernandez-Cuesta, Lynnette and Byrnes, Graham and Calvez-Kelm, Florence L. and McKay, James D. and Foll, Matthieu},
+  title = {Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data},
+  journal = {NAR Genomics and Bioinformatics},
+  year = {2020},
+  volume = {2},
+  number = {2},
+  pages = {lqaa021},
+  doi = {10.1093/nargab/lqaa021}
+}
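
Because `paper.bib` uses `Author:Year` citation keys (two entries share the surname Zhao, distinguished only by year), a quick duplicate-key check can be useful before the paper is compiled. The sketch below is a minimal, regex-based illustration using only the Python standard library; the names `ENTRY_RE`, `citation_keys` and `duplicate_keys` are hypothetical and not part of this commit, and the pattern assumes each entry begins on its own line in the form `@type{key,`.

```python
import re
from collections import Counter

# Matches the opening of a BibTeX entry such as "@article{Stephens:2016,"
# and captures the entry type and the citation key.
ENTRY_RE = re.compile(r"^[ \t]*@(\w+)[ \t]*\{[ \t]*([^,\s]+)[ \t]*,", re.MULTILINE)


def citation_keys(bib_text: str) -> list[str]:
    """Return all citation keys in the order they appear."""
    return [match.group(2) for match in ENTRY_RE.finditer(bib_text)]


def duplicate_keys(bib_text: str) -> list[str]:
    """Return the citation keys that occur more than once, sorted."""
    counts = Counter(citation_keys(bib_text))
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Read the bibliography from the working directory and report duplicates.
    with open("paper.bib", encoding="utf-8") as handle:
        text = handle.read()
    print(duplicate_keys(text))
```

An empty list means every key is unique. This is not a full BibTeX parser: it does not validate braces or fields, so it complements rather than replaces the Pandoc build in the workflow.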
