-*[Note on Sensitive Patient Data](#note-on-sensitive-patient-data)
+*[Note on Sensitive Patient Data](#note-on-sensitive-patient-data)
## Requirements (the most up-to-date requirements are found in the environment.yml file)

* Some version of Anaconda to set up the environment
* Python == 3.10.*
* poetry == 1.3.*
* biopython == 1.79
+* samtools == 1.20
* pkginfo
* matplotlib
* numpy
@@ -103,6 +104,8 @@ A config file is required. The config is a yml file specifying the input paramet
description of the potential inputs in the config file. See NEAT/config_template/template_neat_config.yml for a
template config file to copy and use for your runs.

+To run the simulator in parallel with the same config file and significantly speed up runtime, please see the [Parallelization](#parallelization) section.
+
reference: full path to a fasta file to generate reads from
read_len: The length of the reads for the fastq (if using). Integer value, default 101.
coverage: desired coverage value. Float or int, default = 10
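As a rough guide to how `read_len` and `coverage` interact, the usual relation is reads ≈ coverage × reference length / read length. The sketch below applies that textbook formula with the defaults listed above. The function name `estimate_read_count` is hypothetical, and NEAT's own calculation may differ in details such as paired-end fragment handling or rounding.

```python
import math


def estimate_read_count(reference_length: int, coverage: float = 10, read_len: int = 101) -> int:
    """Hypothetical helper: approximate number of single-end reads for a target coverage."""
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    # Total bases to sequence divided by bases per read, rounded up.
    return math.ceil(coverage * reference_length / read_len)


# Example: a 1.01 Mb reference at the default coverage (10) and read length (101).
n_reads = estimate_read_count(1_010_000)
```

Doubling `coverage` or halving `read_len` roughly doubles the number of reads, which is one reason runtime grows quickly on large references.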
@@ -283,6 +286,51 @@ neat read-simulator \
# Utilities
Several scripts are distributed with gen_reads that are used to generate the models used for simulation.

+## neat parallel
+
+Runs NEAT’s read simulator across a split reference (by contig or by fixed chunk size), in parallel, and stitches the outputs into final FASTQ/BAM/VCF.
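To make the two split modes concrete, here is a minimal Python sketch of how a reference could be divided by contig or into fixed-size chunks. The function name `plan_splits` and the half-open `(contig, start, end)` tuples are illustrative assumptions, not NEAT's actual implementation or on-disk layout.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (contig name, start, end), 0-based, half-open


def plan_splits(contig_lengths: Dict[str, int], by: str = "contig", size: int = 1_000_000) -> List[Region]:
    """Hypothetical helper: list the regions each parallel job would simulate."""
    if by not in ("contig", "size"):
        raise ValueError(f"by must be 'contig' or 'size', got {by!r}")
    if by == "size" and size <= 0:
        raise ValueError("size must be a positive integer")

    regions: List[Region] = []
    for name, length in contig_lengths.items():
        if by == "contig":
            # One job per contig, regardless of its length.
            regions.append((name, 0, length))
        else:
            # Fixed-size chunks; the last chunk of a contig may be shorter.
            for start in range(0, length, size):
                regions.append((name, start, min(start + size, length)))
    return regions


lengths = {"chr1": 1_200_000, "chr2": 400_000}
by_contig = plan_splits(lengths, by="contig")
by_size = plan_splits(lengths, by="size", size=500_000)
```

In this toy example, splitting by contig gives two uneven jobs, while splitting by size gives four jobs of at most 500,000 bases each, so work tends to spread more evenly across workers.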
+
+### Commands:
+
+Minimal: all settings come from a single YAML config
+```
+neat parallel -c /path/to/config.yml
+```
+
+Override or supplement a few options on the CLI
+```
+neat parallel -c /path/to/config.yml \
+  --outdir run1 --by size --size 500000 --jobs 8
+```
+
+neat parallel reads the same config you use for neat read-simulator and also looks for these parallelization keys at the top level:
+
+```
+# required unless you pass --outdir on the CLI
+outdir: /absolute/or/relative/path/for/this_run
+
+# stitched outputs live under outdir; relative values are resolved under outdir
+final_prefix: stitched/final # default if omitted: stitched/final
+
+# how to split the reference (size recommended)
+by: contig # values: contig | size
+size: 1000000 # used only when by: size
+
+# parallel execution
+jobs: 8 # default: CPU count
+
+# how to invoke the simulator
+neat_cmd: neat read-simulator # default
+
+# external tool for stitching BAMs
+samtools: samtools # default, must be on PATH
+
+# organization
+cleanup_splits: false # delete outdir/splits after stitch
+reuse_splits: false # reuse existing splits if present
+```
+
+
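The way these keys combine with CLI options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `resolve_settings` and `DEFAULTS` are hypothetical names, the defaults are the ones given in the comments above, CLI values are assumed to take precedence over the config, and an absolute `final_prefix` is assumed to be kept as-is. NEAT's real option handling may differ.

```python
import os
from pathlib import Path
from typing import Any, Dict, Optional

# Defaults taken from the config comments; None means "decide at run time".
DEFAULTS: Dict[str, Any] = {
    "final_prefix": "stitched/final",
    "jobs": None,
    "neat_cmd": "neat read-simulator",
    "samtools": "samtools",
}


def resolve_settings(config: Dict[str, Any], cli: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge config keys with CLI overrides (CLI assumed to win)."""
    settings = {**DEFAULTS, **config}
    # Only CLI options that were actually given (not None) override the config.
    settings.update({k: v for k, v in (cli or {}).items() if v is not None})

    if not settings.get("outdir"):
        raise ValueError("outdir is required: set it in the config or pass --outdir")
    if settings.get("jobs") is None:
        settings["jobs"] = os.cpu_count() or 1

    outdir = Path(settings["outdir"])
    prefix = Path(settings["final_prefix"])
    # Relative prefixes are resolved under outdir; absolute ones are kept (assumption).
    settings["final_prefix"] = prefix if prefix.is_absolute() else outdir / prefix
    settings["outdir"] = outdir
    return settings


settings = resolve_settings(
    {"by": "contig", "jobs": 4},
    cli={"outdir": "run1", "by": "size", "size": 500000, "jobs": 8},
)
```

Here the stitched outputs would be written under `run1/stitched/final`, and the CLI values for `by`, `size`, and `jobs` replace the ones from the config.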
## neat model-fraglen

Computes empirical fragment length distribution from sample data.
@@ -344,17 +392,6 @@ neat model-seq-err \
Please note that -i2 can be used in place of -i to produce paired data.

-## neat plot_mutation_model
-
-Performs plotting and comparison of mutation models generated from genMutModel.py (Not yet implemented in NEAT 4.0).
ICGC's "Access Controlled Data" documentation can be found at <a href = https://docs.icgc.org/portal/access/ target="_blank">https://docs.icgc.org/portal/access/</a>. To have access to controlled germline data, a DACO must be submitted. Open tier data can be obtained without a DACO, but germline alleles that do not match the reference genome are masked and replaced with the reference allele. Controlled data includes unmasked germline alleles.
+ICGC's "Access Controlled Data" documentation can be found at <a href = https://docs.icgc.org/portal/access/ target="_blank">https://docs.icgc.org/portal/access/</a>. To have access to controlled germline data, a DACO must be submitted. Open tier data can be obtained without a DACO, but germline alleles that do not match the reference genome are masked and replaced with the reference allele. Controlled data includes unmasked germline alleles.