Commit aba49df

updated documentation
1 parent 452a2eb commit aba49df

3 files changed

Lines changed: 35 additions & 65 deletions

File tree

ChangeLog.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,9 @@
11
# NEAT has a new home
22
NEAT is now a part of the NCSA GitHub and active development will continue here. Please direct issues, comments, and requests to the NCSA issue tracker. Submit pull requests here instead of the old repo.
33

4+
# NEAT v4.3.1
5+
- Updated the parallel module to integrate it more fluidly into the code. We also revised the options section to streamline the process and to allow options objects to be copied for parallel runs.
6+
47
# NEAT v4.3
58
- Added a parallelization module to run NEAT in parallel. We expect this to speed up times. Please let us know if it works for you!
69

README.md

Lines changed: 31 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,14 @@
1-
# The NEAT Project v4.3
2-
Welcome to the NEAT project, the NExt-generation sequencing Analysis Toolkit, version 4.2. This beta release of NEAT 4.0 includes several fixes and a little bit of restructuring. There is still lots of work to be done. See the [ChangeLog](ChangeLog.md) for notes. We may add that in as a feature in the future, if users call for it. We also removed GC bias for now. It severely complicated implementation, and had very few noticeable effects. After discussing with some people at the Illinois Institute for Genomic Biology, it sounded like GC bias may be a bit of a non-factor with improved chemistries. NEAT 4.0 represents the direction we would like to move the code, but unfortunately we ran into several issues in production, notably the very long processing times, that make it unviable for general use. If you would like to try NEAT 4.0, please do! If you run into issues, please post them on our issues page.
1+
# The NEAT Project v4.3.1
2+
Welcome to the NEAT project, the NExt-generation sequencing Analysis Toolkit, version 4.3.1. This release includes several fixes and a little bit of restructuring, including a parallel process for running NEAT's read-simulator. Our tests show much improved performance. If the logs seem excessive, try `--log-level ERROR` to reduce the log output. See the [ChangeLog](ChangeLog.md) for notes. NEAT 4.3.1 is the official release of NEAT 4.0. It represents a lot of hard work from several contributors at NCSA and beyond. With the addition of parallel processing, we feel that the code is ready for production, and future releases will focus on compatibility, bug fixes, and testing. For the time being, future releases will be numbered 4.3.X.
33

44
# NEAT v4.3
5-
If you would like to try our newest features in NEAT, we have now added a parallelization module that will allow you to run NEAT in a parrallel process that will split your chromosome up by contig or by blocks of sequence. This code still may have bugs, for which we apologize, but the more people who try it out, the more we can improve the software. If you need worry-free operation, then please try NEAT 3.4.
5+
NEAT 4.3.1 serves as the officially 'complete' version of NEAT 4.3, implementing parallelization. To add parallelization to your run, simply add the `threads` parameter to your configuration and run read-simulator as normal. NEAT will take care of the rest. You can customize the parameters in your configuration file as needed.
66

7-
# NEAT 3.4 - Stable
8-
NEAT 3.4 under "releases" is the stable version of NEAT, most closely following the original NEAT genReads 2.0. NEAT 4.0 ran into several production problems, including very slow runtimes on larger genomes, so we have decided to switch back to NEAT 3.4 as the default release while we try to improve NEAT 4.0. If you are cloning the repo, you can checkout tag 3.4 `git checkout 3.4` within the NEAT repo. We are also working on redeveloping NEAT in Rust, a memory and thread safe language that will lend itself well to the way NEAT works, check that out here: https://github.com/ncsa/rusty-neat
9-
10-
Stay tuned over the coming weeks for exciting updates to NEAT, and learn how to [contribute](CONTRIBUTING.md) yourself. If you'd like to use some of our code, no problem! Just review the [license](LICENSE.md), first.
7+
We have completed major revisions on NEAT since 3.4 and consider NEAT 4.3.1 to be a stable release. We will consider new features and pull requests. Please include justification for major changes. See [contribute](CONTRIBUTING.md) for more information. If you'd like to use some of our code in your own, no problem! Just review the [license](LICENSE.md), first.
118

129
NEAT's read-simulator is a fine-grained read simulator. It simulates real-looking data using models learned from specific datasets. There are several supporting utilities for generating models used for simulation and for comparing the outputs of alignment and variant calling to the golden BAM and golden VCF produced by NEAT.
1310

14-
This is release v4.2 of the software. While it has been tested, it does represent a shift in the software with the introduction of a configuration file. For a stable release using the old command line interface, please see: [NEAT 3.0](https://github.com/ncsa/NEAT/releases/tag/3.3) (or check out older tagged releases)
11+
We've deprecated most of NEAT's command-line interface options, opting to simplify things with configuration files. If you require the CLI for legacy purposes, NEAT 3.4 was our last release with a full command-line interface. Please convert your CLI commands to the corresponding YAML configuration for future runs.
1512

1613
To cite this work, please use:
1714

@@ -62,30 +59,33 @@ use the poetry module in build a wheel file, which can then be pip installed. Yo
6259
commands from within the NEAT directory.
6360

6461
```
65-
> conda env create -f environment.yml -n neat
66-
> conda activate neat
67-
> poetry build
68-
> pip install dist/neat*whl
62+
$ conda env create -f environment.yml -n neat
63+
$ conda activate neat
64+
$ poetry build
65+
$ pip install dist/neat*whl
6966
```
7067

68+
This allows you to run NEAT as a command line tool directly:
69+
`neat --help`
70+
7171
Alternatively, if you wish to work with NEAT in the development environment, you can use poetry install within
7272
the NEAT repo, after creating the conda environment:
7373
```
74-
> conda env create -f environment.yml -n neat
75-
> conda activate neat
76-
> poetry install
74+
$ conda env create -f environment.yml -n neat
75+
$ conda activate neat
76+
$ poetry install
7777
```
7878

7979
Notes: If any packages are struggling to resolve, check the channels and try to manually pip install the package to see if that helps (but note that NEAT is not tested on the pip versions.)
8080

8181
Test your install by running:
8282
```
83-
> neat --help
83+
$ neat --help
8484
```
8585

8686
You can also try running it using the python command directly:
8787
```
88-
> python -m neat --help
88+
$ python -m neat --help
8989
```
9090

9191
## Usage
@@ -138,6 +138,11 @@ The default is given:
138138
`mutation_bed`: full path to a list of regions with a column describing the mutation rate of that region, as a float with values between 0 and 0.3. The mutation rate must be in the third column as, e.g., mut_rate=0.00.
139139
`rng_seed`: Manually enter a seed for the random number generator. Used for repeating runs. _Must be an integer._
140140
`min_mutations`: Set the minimum number of mutations that NEAT should add, per contig. _Default is 0._ We recommend setting this to at least one for small chromosomes, so NEAT will produce at least one mutation per contig.
141+
`threads`: Number of threads to use. A value greater than 1 activates parallel mode, which performs part of the calculations in parallel and then recombines the results into the desired output files.
142+
`parallel_mode`: 'size' or 'contig'. Whether to divide the reference into fixed-size blocks ('size') or only by contig ('contig'). _Default is 'contig'._ If the default does not perform well, try 'size'; varying the `parallel_block_size` parameter may also help.
143+
`parallel_block_size`: Size of each block when `parallel_mode` is 'size'. _Default is 500,000._
144+
`cleanup_splits`: If running more than one simulation on the same input FASTA, you can reuse the splits files. By default, this is set to False, and splits files will be deleted at the end of the run.
145+
`reuse_splits`: If set to True and splits files already exist in the output folder, NEAT will reuse those splits.
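
The parallel options above can be collected into a config fragment. The following is a minimal sketch, not NEAT's own code: `parallel_options` and `to_yaml` are hypothetical helpers written for illustration, and the checks only reflect what the descriptions above state (two allowed values for `parallel_mode`, a block size of 500,000 unless overridden).

```python
# Hypothetical helpers for illustration only; they are not part of NEAT.

def parallel_options(threads, parallel_mode="contig", parallel_block_size=500_000,
                     cleanup_splits=False, reuse_splits=False):
    """Return the parallel-related config keys as a dict, with basic sanity checks."""
    if not isinstance(threads, int) or threads < 1:
        raise ValueError("threads must be a positive integer")
    if parallel_mode not in ("contig", "size"):
        raise ValueError("parallel_mode must be 'contig' or 'size'")
    if parallel_block_size < 1:
        raise ValueError("parallel_block_size must be positive")
    return {
        "threads": threads,
        "parallel_mode": parallel_mode,
        "parallel_block_size": parallel_block_size,
        "cleanup_splits": cleanup_splits,
        "reuse_splits": reuse_splits,
    }


def to_yaml(options):
    """Render a flat dict as simple 'key: value' lines for a YAML config file."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


if __name__ == "__main__":
    # Four threads, splitting the reference into fixed-size blocks.
    print(to_yaml(parallel_options(threads=4, parallel_mode="size")))
```

The printed lines can be pasted into the same configuration file that is passed to `neat read-simulator` with `-c`.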
141146

142147
The command line options for NEAT are as follows:
143148

@@ -155,7 +160,8 @@ read-simulator command line options
155160
| Option | Description |
156161
|---------------------|-------------------------------------|
157162
| -c VALUE, --config VALUE | The VALUE should be the name of the config file to use for this run |
158-
| -o OUTPUT, --output OUTPUT | The path, including filename prefix, to use to write the output files |
163+
| -o OUTPUT_DIR, --output_dir OUTPUT_DIR | The path to the directory to write the output files |
164+
| -p PREFIX, --prefix PREFIX | The prefix for file names |
159165

160166
## Functionality
161167

@@ -199,7 +205,7 @@ fragment_st_dev: 30
199205
200206
neat read-simulator \
201207
-c neat_config.yml \
202-
-o /home/me/simulated_reads
208+
-o /home/me/simulated_reads/
203209
```
204210

205211
### Targeted region simulation
@@ -218,7 +224,7 @@ target_bed: hg19_exome.bed
218224
219225
neat read-simulator \
220226
-c neat_config.yml \
221-
-o /home/me/simulated_reads
227+
-o /home/me/simulated_reads/
222228
223229
```
224230

@@ -239,7 +245,7 @@ mutation_rate: 0
239245
240246
neat read-simulator \
241247
-c neat_config.yml \
242-
-o /home/me/simulated_reads
248+
-o /home/me/simulated_reads/
243249
```
244250

245251
### Single end reads
@@ -254,7 +260,8 @@ produce_vcf: True
254260
255261
neat read-simulator \
256262
-c neat_config.yml \
257-
-o /home/me/simulated_reads
263+
-o /home/me/simulated_reads/ \
264+
-p 126_frags
258265
```
259266

260267
### Large single end reads
@@ -278,48 +285,8 @@ Several scripts are distributed with gen_reads that are used to generate the mod
278285

279286
## neat parallel
280287

281-
Runs NEAT’s read simulator across a split reference (by contig or by fixed chunk size), in parallel, and stitches the outputs into final FASTQ/BAM/VCF.
282-
283-
### Commands:
284-
285-
Minimal: all settings come from a single YAML config
286-
```
287-
neat parallel -c /path/to/config.yml
288-
```
289-
290-
Override or supplement a few options on the CLI
291-
```
292-
neat parallel -c /path/to/config.yml \
293-
--outdir run1 --by size --size 500000 --jobs 8
294-
```
295-
296-
neat parallel reads the same config you use for neat read-simulator and also looks for these parallelization keys at the top level:
297-
298-
```
299-
# required unless you pass --outdir on the CLI
300-
outdir: /absolute/or/relative/path/for/this_run
301-
302-
# stitched outputs live under outdir; relative values are resolved under outdir
303-
final_prefix: stitched/final # default if omitted: stitched/final
304-
305-
# how to split the reference (size recommended)
306-
by: contig # values: contig | size
307-
size: 1000000 # used only when by: size
308-
309-
# parallel execution
310-
jobs: 8 # default: CPU count
311-
312-
# how to invoke the simulator
313-
neat_cmd: neat read-simulator # default
314-
315-
# external tool for stitching BAMs
316-
samtools: samtools # default, must be on PATH
317-
318-
# organization
319-
cleanup_splits: false # delete outdir/splits after stitch
320-
reuse_splits: false # reuse existing splits if present
321-
```
322-
288+
Runs `neat read-simulator` across a split reference (by contig or by fixed chunk size), in parallel, and stitches the outputs into final FASTQ/BAM/VCF.
289+
To activate parallelism, set `threads` to a number greater than 1. By default, NEAT parallelizes across contigs, which should work well if you have many small contigs. If you have only a few large contigs, or contigs of very uneven size, try splitting by block instead (`parallel_mode: size`). The default block size of 500,000 gives good results on a variety of datasets, but you can tune `parallel_block_size` to your situation.
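
To see why block mode can help with uneven contigs, here is a small sketch of the two splitting strategies. It is an illustration under assumed conventions (half-open, 0-based intervals), not NEAT's actual splitting code, and the function names and contig lengths are made up for the example.

```python
# Illustrative sketch only; NEAT's real splitting logic may differ.

def split_by_contig(contig_lengths):
    """One work unit per contig, covering the whole contig."""
    return [(name, 0, length) for name, length in contig_lengths.items()]


def split_by_size(contig_lengths, block_size=500_000):
    """Cut each contig into consecutive blocks of at most block_size positions."""
    units = []
    for name, length in contig_lengths.items():
        for start in range(0, length, block_size):
            units.append((name, start, min(start + block_size, length)))
    return units


if __name__ == "__main__":
    # Made-up reference: one large contig and one very small one.
    reference = {"chr1": 1_200_000, "chrM": 16_569}
    for unit in split_by_size(reference):
        print(unit)
```

With two threads and this made-up reference, contig mode leaves one thread handling almost all of the work, while size mode produces four units whose largest is 500,000 positions, so the work can be spread more evenly.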
323290

324291
## neat model-fraglen
325292

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[tool.poetry]
22
name = "neat"
3-
version = "4.0"
3+
version = "4.3.1"
44
description = "NGS Simulation toolkit"
55
authors = ["Joshua Allen <jallen17@illinois.edu>"]
66
license = "BSD 3-Clause License"
