* Add --exome mode
- new --exome specific options in ntroot, rules in
ntroot_run_pipeline.smk
- Updated README.md
- Added exome tests to demo
* Update help page
* Add log message for --exome and --masked
* Update README.md
* Update README.md
* Adjust cutoff parameter setting for exome mode
- In case we ever want the option to set --solid, allow this in smk
- In practice, cutoff is always set, so this will not happen with the
current ntroot driver script
  --reads READS  Prefix of input reads file(s) for detecting SNVs. All files in the working directory with the specified prefix will be used. (fastq, fasta, gz, bz, zip)
  --genome GENOME [GENOME ...]
      Genome assembly file(s) for detecting SNVs compared to --reference
  -l L  input IVC VCF file with annotated variants (e.g., 1000GP_integrated_snv_v2a_27022019.GRCh38.phased_gt1.vcf.gz, clinvar.vcf, etc.)
+ --exome  Input reads for detecting SNVs are from whole exome sequencing. If provided, must also specify either --exome_bed or --masked. --cutoff 2 is implied unless otherwise specified.
  -k K  k-mer size
  --tile TILE  Tile size for ancestry fraction inference (bp) [default=5000000]
  --lai  Output ancestry predictions per tile in a separate output file
  -t T  Number of threads [default=4]
  -z Z  Minimum contig length [default=100]
  -j J  controls size of k-mer subset. When checking subset of k-mers, check every jth k-mer [default=3]
+ --cutoff CUTOFF  Minimum coverage of k-mers in ntEdit Bloom filter. Solid k-mers are used if set to 0 [0]
  -Y Y  Ratio of number of k-mers in the k subset that should be present to accept an edit (higher=stringent) [default=0.55]
  --custom_vcf CUSTOM_VCF
      Input VCF for computing ancestry. When specified, ntRoot will skip the ntEdit step, and predict ancestry from the provided VCF.
+ --masked  Exome Mode (--exome) only: Indicates that the reference genome provided with --reference has all NON-targeted exonic regions hard-masked.
+ --exome_bed EXOME_BED
+     Exome Mode (--exome) only: BED file of exome targeted regions.
  --strip_info  When using --custom_vcf, strip the existing INFO field from the input VCF.
  -v, --verbose  Verbose mode [default=False]
  -V, --version  show program's version number and exit
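The interaction between the new options in the help text above (`--exome` needs either `--exome_bed` or `--masked`, and implies `--cutoff 2` unless a cutoff is given) can be sketched with `argparse`. This is an illustrative sketch only, not the code in the ntroot driver script: the function names `build_parser` and `parse_exome_args` are hypothetical, and accepting both `--exome_bed` and `--masked` together is an assumption, since the help text does not say how that case is handled.

```python
import argparse


def build_parser():
    """Build a parser with only the exome-related options (hypothetical subset)."""
    parser = argparse.ArgumentParser(prog="ntroot-sketch")
    parser.add_argument("--exome", action="store_true",
                        help="Input reads are from whole exome sequencing")
    parser.add_argument("--masked", action="store_true",
                        help="Reference has non-targeted regions hard-masked")
    parser.add_argument("--exome_bed",
                        help="BED file of exome targeted regions")
    # Default of None lets us tell "not specified" apart from an explicit value.
    parser.add_argument("--cutoff", type=int, default=None,
                        help="Minimum k-mer coverage; 0 means solid k-mers")
    return parser


def parse_exome_args(argv):
    """Parse argv and apply the constraints described in the help text."""
    parser = build_parser()
    args = parser.parse_args(argv)

    # --exome must come with a masked reference or a BED file to mask with.
    if args.exome and not (args.masked or args.exome_bed):
        parser.error("--exome requires either --exome_bed or --masked")

    # --cutoff 2 is implied in exome mode unless the user set a cutoff;
    # otherwise fall back to the documented default of 0.
    if args.cutoff is None:
        args.cutoff = 2 if args.exome else 0
    return args
```

For example, `parse_exome_args(["--exome", "--masked"])` yields a cutoff of 2, while passing `--exome` alone exits with a usage error.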
@@ -117,6 +122,7 @@ GRCh38.fa.gz
  readme
  </pre>

+ **Running ntRoot with whole genome sequencing reads or genome assemblies**
If your input reads are from whole exome sequencing, the regions of your reference genome that are NOT targeted exonic regions should be hard-masked (converted to Ns):
ntRoot can perform the masking automatically if you do not already have a masked reference file. In that case, provide a BED file with all the targeted regions, and ntRoot will use bedtools to mask the reference regions that are NOT targeted regions:
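To make the idea of hard-masking concrete, here is a minimal pure-Python sketch of what "mask everything that is NOT a targeted region" means for a single sequence. It is not how ntRoot does it (the text above says ntRoot uses bedtools); the helper names `parse_bed` and `mask_outside` are hypothetical, and the sketch assumes standard BED coordinates (0-based start, end-exclusive).

```python
def parse_bed(lines):
    """Collect (start, end) intervals per chromosome from BED lines."""
    regions = {}
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def mask_outside(seq, intervals):
    """Return seq with every base outside the given intervals replaced by N."""
    masked = ["N"] * len(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        start = max(start, 0)
        end = min(end, len(seq))
        if start < end:
            masked[start:end] = seq[start:end]
    return "".join(masked)


if __name__ == "__main__":
    targets = parse_bed(["chr1\t2\t5\n", "chr1\t8\t10\n"])
    print(mask_outside("ACGTACGTAC", targets["chr1"]))  # NNGTANNNAC
```

Only the bases inside the two targeted intervals survive; everything else becomes `N`, which is the state `--masked` expects the reference to be in.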