docs/output.md: 60 additions & 8 deletions
@@ -8,12 +8,16 @@ The directories listed below will be created in the results directory after the

 ## Pipeline overview

-The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
+The pipeline is built using [Nextflow](https://www.nextflow.io/) and can generate output files from the following steps:

 - [Seqtk](#seqtk) - Subsample a specific number of reads per sample
 - [FastQC](#fastqc) - Raw read QC
 - [SeqFu Stats](#seqfu_stats) - Statistics for FASTA or FASTQ files
 - [FastQ Screen](#fastqscreen) - Mapping against a set of references for basic contamination QC
+- [BWA-MEM2_INDEX](#bwamem2_index) - Create BWA-MEM2 index of a chosen reference genome OR use pre-built index
+- [BWA-MEM2_MEM](#bwamem2_mem) - Mapping reads against a chosen reference genome
+- [Samtools index](#samtools-index) - Index BAM files with Samtools
+- [Picard collect multiple metrics](#picard-collect-multiple-metrics) - Combine BAM and BAI outputs for Picard
 - [Picard collecthsmetrics](#picard-collecthsmetrics) - Collect alignment QC metrics of hybrid-selection data
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
@@ -43,14 +47,27 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

 [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
-`*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
+
+</details>
+
+[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences, and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics.
+
 ### FastQ Screen

 <details markdown="1">
 <summary>Output files</summary>

 - `fastqscreen/`
   - `*_screen.html`: Interactive graphical report.
-  - `*_screen.pdf`: Static graphical report.
+  - `*_screen.png`: Static graphical report.
   - `*_screen.txt`: Text-based report.

 </details>
@@ -68,18 +85,53 @@ See `assets/example_fastq_screen_references.csv` for example.

 The `.csv` is provided as a pipeline parameter `fastq_screen_references` and is used to construct a `FastQ Screen` configuration file within the context of the process work directory in order to properly mount the references.
-`*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
+Generates the full set of bwamem2 indexes:

-</details>
+- `bwamem2_index/`
+  - `*.fa`
+  - `*.fa.amb`
+  - `*.fa.ann`
+  - `*.fa.bwt`
+  - `*.fa.pac`

-[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences, and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics.
+### BWAMEM2_MEM
+
+[BWA-mem2](https://github.com/bwa-mem2/bwa-mem2) is the next version of bwa-mem, a tool for mapping sequences with low divergence against a reference genome, with increased processing speed (~1.3-3.1x). Aligned reads are then optionally filtered and coordinate-sorted using [samtools](#samtools-index).
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `bwamem2/`
+  - `*.bam`: The original BAM file containing read alignments to the reference genome.
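The FastQ Screen context line in the last hunk says the `fastq_screen_references` CSV is turned into a `FastQ Screen` configuration file. A minimal sketch of that conversion is shown below. It is not the pipeline's own implementation: the column names `name`, `dir` and `basename` and the helper `build_fastq_screen_conf` are hypothetical, and the staging of references into the process work directory is not modelled. The only part taken from FastQ Screen itself is the tab-separated `DATABASE <name> <index prefix>` line format.

```python
import csv
import io
from pathlib import PurePosixPath


def build_fastq_screen_conf(csv_text: str) -> str:
    """Render FastQ Screen DATABASE lines from a references CSV.

    Assumption: the CSV has the hypothetical columns `name`, `dir`
    and `basename`; the real pipeline's column names may differ.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The index prefix is the reference directory joined with the index basename.
        index_prefix = PurePosixPath(row["dir"].strip()) / row["basename"].strip()
        lines.append(f"DATABASE\t{row['name'].strip()}\t{index_prefix}")
    return "\n".join(lines) + "\n"


# Hypothetical example input, not taken from the pipeline's assets.
example_csv = "name,dir,basename\nEcoli,/refs/ecoli,genome\nPhiX,/refs/phix,phix\n"
fastq_screen_conf = build_fastq_screen_conf(example_csv)
print(fastq_screen_conf, end="")
```

Each CSV row becomes one `DATABASE` entry, so adding a reference to the screen only requires adding a row to the CSV.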