Commit a59ace5

Merge pull request #152 from FranBonath/output_changes
update output.md
2 parents f5841ec + 6da2acd commit a59ace5

1 file changed

Lines changed: 60 additions & 8 deletions

docs/output.md

@@ -8,12 +8,16 @@ The directories listed below will be created in the results directory after the
88

99
## Pipeline overview
1010

11-
The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
11+
The pipeline is built using [Nextflow](https://www.nextflow.io/) and can generate output files from the following steps:
1212

1313
- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
1414
- [FastQC](#fastqc) - Raw read QC
1515
- [SeqFu Stats](#seqfu_stats) - Statistics for FASTA or FASTQ files
1616
- [FastQ Screen](#fastqscreen) - Mapping against a set of references for basic contamination QC
17+
- [BWA-MEM2_INDEX](#bwamem2_index) - Create BWA-MEM2 index of a chosen reference genome OR use pre-built index
18+
- [BWA-MEM2_MEM](#bwamem2_mem) - Mapping reads against a chosen reference genome
19+
- [Samtools index](#samtools-index) - Index BAM files with Samtools
20+
- [Picard collect multiple metrics](#picard-collect-multiple-metrics) - Combine BAM and BAI outputs for Picard
1721
- [Picard collecthsmetrics](#picard-collecthsmetrics) - Collect alignment QC metrics of hybrid-selection data
1822
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
1923
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
@@ -43,14 +47,27 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
4347

4448
[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
4549

50+
### SeqFu Stats
51+
52+
<details markdown="1">
53+
<summary>Output files</summary>
54+
55+
- `seqfu_stats/`
56+
- `*.tsv`: Tab-separated file containing quality metrics.
57+
- `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
58+
59+
</details>
60+
61+
[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences, and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics.
62+
4663
### FastQ Screen
4764

4865
<details markdown="1">
4966
<summary>Output files</summary>
5067

5168
- `fastqscreen/`
5269
- `*_screen.html`: Interactive graphical report.
53-
- `*_screen.pdf`: Static graphical report.
70+
- `*_screen.png`: Static graphical report.
5471
- `*_screen.txt` : Text-based report.
5572

5673
</details>
@@ -68,18 +85,53 @@ See `assets/example_fastq_screen_references.csv` for example.
6885

6986
The `.csv` is provided as a pipeline parameter `fastq_screen_references` and is used to construct a `FastQ Screen` configuration file within the context of the process work directory in order to properly mount the references.
7087

71-
### SeqFu Stats
88+
### BWAMEM2_INDEX
7289

7390
<details markdown="1">
7491
<summary>Output files</summary>
7592

76-
- `seqfu/`
77-
- `*.tsv`: Tab-separated file containing quality metrics.
78-
- `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
93+
Generates the full set of bwamem2 indexes:
7994

80-
</details>
95+
- `bwamem2_index/`
96+
- `*.fa`
97+
- `*.fa.amb`
98+
- `*.fa.ann`
99+
- `*.fa.bwt`
100+
- `*.fa.pac`
81101

82-
[SeqFu](https://telatin.github.io/seqfu2/) is general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics statistics.
102+
### BWAMEM2_MEM
103+
104+
[BWA-mem2](https://github.com/bwa-mem2/bwa-mem2) is the next version of bwa-mem, a tool for mapping sequences with low divergence against a reference genome, with increased processing speed (~1.3-3.1x). Aligned reads may then be filtered and coordinate-sorted using [samtools](#samtools-index).
105+
106+
<details markdown="1">
107+
<summary>Output files</summary>
108+
109+
- `bwamem2/`
110+
- `*.bam`: The original BAM file containing read alignments to the reference genome.
111+
- `*.bam.bai`: BAM index files
112+
113+
### Samtools index
114+
115+
<details markdown="1">
116+
<summary>Output files</summary>
117+
118+
- `samtools_faidex`
119+
- `*.fa.fai`
120+
- `*.fa.fai`
121+
122+
### Picard collect multiple metrics
123+
124+
<details markdown="1">
125+
<summary>Output files</summary>
126+
127+
- `picard_collectmultiplemetrics`
128+
- `*.CollectMultipleMetrics.alignment_summary_metrics`
129+
- `*.CollectMultipleMetrics.base_distribution_by_cycle_metrics`
130+
- `*.CollectMultipleMetrics.base_distribution_by_cycle.pdf`
131+
- `*.CollectMultipleMetrics.quality_by_cycle_metrics`
132+
- `*.CollectMultipleMetrics.quality_by_cycle.pdf`
133+
- `*.CollectMultipleMetrics.quality_distribution.pdf`
134+
- `*.CollectMultipleMetrics.read_length_histogram.pdf`
83135

84136
### Picard CollectHSmetrics
85137
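
The output listing touched by this diff can be spot-checked against a finished run. The following is a minimal sketch, not part of the pipeline: it assumes the results directory contains the subdirectories named in the diff (`seqfu_stats/`, `fastqscreen/`, `bwamem2/`), and the helper name `find_missing_outputs` and the chosen subset of patterns are hypothetical.

```python
from pathlib import Path

# Directory names and file patterns copied from the output listing in this diff.
# The selection is illustrative, not a complete list of pipeline outputs.
EXPECTED_OUTPUTS = {
    "seqfu_stats": ["*.tsv", "*_mqc.txt"],
    "fastqscreen": ["*_screen.html", "*_screen.png", "*_screen.txt"],
    "bwamem2": ["*.bam", "*.bam.bai"],
}


def find_missing_outputs(results_dir, expected=EXPECTED_OUTPUTS):
    """Return 'subdir/pattern' strings for which no matching file exists."""
    root = Path(results_dir)
    missing = []
    for subdir, patterns in expected.items():
        directory = root / subdir
        for pattern in patterns:
            # A missing directory counts as having no matches for any pattern.
            has_match = directory.is_dir() and any(directory.glob(pattern))
            if not has_match:
                missing.append(f"{subdir}/{pattern}")
    return missing


if __name__ == "__main__":
    # "results" is a placeholder path; point it at the pipeline's output directory.
    for entry in find_missing_outputs("results"):
        print(f"no files matching {entry}")
```

Steps that are skipped in a given run will naturally show up as missing, so the result is a prompt for inspection rather than a pass/fail check.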
