- [#132](https://github.com/nf-core/seqinspector/pull/132) Added a bwamem2 index parameter for faster output
- [#135](https://github.com/nf-core/seqinspector/pull/135) Added an index section to MultiQC reports to facilitate report navigation (#125)
- [#151](https://github.com/nf-core/seqinspector/pull/151) Added a prepare_genome subworkflow to handle bwamem2 indexing
- [#156](https://github.com/nf-core/seqinspector/pull/156) Added relative sample_size and a warning when a sample has fewer reads than the desired sample_size.
- [#158](https://github.com/nf-core/seqinspector/pull/158) Moved picard_collectmultiplemetrics to the subworkflow QC_BAM
- [#159](https://github.com/nf-core/seqinspector/pull/159) Added a subworkflow QC_BAM including picard_collecthsmetrics for alignment QC of hybrid-selection data
- [#162](https://github.com/nf-core/seqinspector/pull/162) Added tests for the prepare_genome subworkflow
### `Fixed`
- [#107](https://github.com/nf-core/seqinspector/pull/107) Put SeqFu-stats section reports together
- [#112](https://github.com/nf-core/seqinspector/pull/112) Made the fastq_screen_references value use parentDir
- [#94](https://github.com/nf-core/seqinspector/issues/94) Went through and validated test data
- [#162](https://github.com/nf-core/seqinspector/pull/162) Fixed bugs in qc_bam and prepare_genome subworkflows and added tests
- [#163](https://github.com/nf-core/seqinspector/pull/163) Made fastqscreen use subsampled data if available
|`Subsampling`|[`Seqtk`](https://github.com/lh3/seqtk)| Global subsampling of reads. Only performs subsampling if `--sample_size` parameter is given. |[RNA, DNA, synthetic]|[N/A]| no |
|`Indexing, Mapping`|[`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2)| Align reads to reference |[RNA, DNA]|[N/A]| yes |
|`Indexing`|[`SAMtools`](http://github.com/samtools)| Index aligned BAM files, create FASTA index |[DNA]|[N/A]| yes |
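The subsampling step above only runs when `--sample_size` is given, and the changelog mentions a warning when a sample has fewer reads than requested. The bash sketch below illustrates that decision. It is a hypothetical helper, not the pipeline's actual module: the function names `count_reads` and `plan_subsampling`, the seed `-s100`, and the convention that a decimal value means a relative (fractional) sample size are all assumptions. It prints the `seqtk sample` command it would run instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not part of nf-core/seqinspector: decide how a FASTQ
# file would be subsampled for a given sample size.

# Count reads in a plain or gzipped FASTQ file (four lines per record).
count_reads() {
    local n_lines
    # gzip -dcf decompresses gzipped input and passes plain text through unchanged
    n_lines=$(gzip -dcf -- "$1" | wc -l)
    echo $(( n_lines / 4 ))
}

# Print the seqtk command that would be run, or "skip" when no sample size is set.
plan_subsampling() {
    local fastq=$1
    local sample_size=${2:-}
    if [[ -z "$sample_size" ]]; then
        echo "skip"   # no --sample_size: reads are used as-is
        return 0
    fi
    if [[ "$sample_size" != *.* ]]; then
        # Absolute sample size: warn if the sample has fewer reads than requested
        local n_reads
        n_reads=$(count_reads "$fastq")
        if (( n_reads < sample_size )); then
            echo "WARNING: $fastq has $n_reads reads, fewer than sample_size=$sample_size" >&2
        fi
    fi
    # Assumption: a decimal value (e.g. 0.1) is passed through as a fraction,
    # which `seqtk sample` interprets as the proportion of reads to keep.
    echo "seqtk sample -s100 $fastq $sample_size"
}
```

For example, `plan_subsampling sample1.fastq.gz 10000` prints `seqtk sample -s100 sample1.fastq.gz 10000`, and also writes a warning to stderr if the file holds fewer than 10000 reads.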
docs/output.md
## Pipeline overview
The pipeline is built using [Nextflow](https://www.nextflow.io/) and can generate output files from the following steps:
- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
- [FastQC](#fastqc) - Raw read QC
- [SeqFu Stats](#seqfu_stats) - Statistics for FASTA or FASTQ files
- [FastQ Screen](#fastqscreen) - Mapping against a set of references for basic contamination QC
- [BWA-MEM2_INDEX](#bwamem2_index) - Create a BWA-MEM2 index of a chosen reference genome or use a pre-built index
- [BWA-MEM2_MEM](#bwamem2_mem) - Mapping reads against a chosen reference genome
- [Samtools index](#samtools-index) - Index BAM files with Samtools
- [Picard collect multiple metrics](#picard-collect-multiple-metrics) - Collect multiple classes of alignment QC metrics from BAM files with Picard
- [Picard collecthsmetrics](#picard-collecthsmetrics) - Collect alignment QC metrics of hybrid-selection data
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
- `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
</details>
[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences, and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality statistics.
### FastQ Screen
<details markdown="1">
<summary>Output files</summary>
- `fastqscreen/`
  - `*_screen.html`: Interactive graphical report.
  - `*_screen.png`: Static graphical report.
  - `*_screen.txt`: Text-based report.
</details>
The `.csv` is provided as a pipeline parameter `fastq_screen_references` and is used to construct a `FastQ Screen` configuration file within the context of the process work directory in order to properly mount the references.
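As a rough illustration of that conversion, the bash sketch below turns a references CSV into `DATABASE` lines in the layout FastQ Screen's configuration file uses: the keyword `DATABASE`, a label, and the index basename, separated by tabs. The CSV layout assumed here (a header row, then `name,index_basename`) and the function name `csv_to_fastq_screen_conf` are hypothetical; see `assets/example_fastq_screen_references.csv` for the columns the pipeline actually expects.

```bash
# Hypothetical sketch: convert a references CSV (header row, then
# name,index_basename) into FastQ Screen DATABASE lines.
csv_to_fastq_screen_conf() {
    local csv=$1
    # Skip the header, strip Windows line endings, and emit one
    # tab-separated DATABASE line per row that has both columns filled.
    awk -F',' 'NR > 1 {
        sub(/\r$/, "")
        if ($1 != "" && $2 != "") printf "DATABASE\t%s\t%s\n", $1, $2
    }' "$csv"
}
```

Redirecting the output, e.g. `csv_to_fastq_screen_conf refs.csv > fastq_screen.conf`, gives a minimal configuration that could be passed to `fastq_screen --conf`; aligner paths and other settings are left at their defaults in this sketch.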
Generates the full set of bwamem2 indexes:
- `bwamem2_index/`
  - `*.fa`
  - `*.fa.amb`
  - `*.fa.ann`
  - `*.fa.bwt`
  - `*.fa.pac`
### BWAMEM2_MEM
[BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2) is the next version of BWA-MEM, used to map low-divergence sequences against a reference genome with increased processing speed (~1.3-3.1x). Aligned reads may then be filtered and coordinate-sorted using [samtools](#samtools-index).
<details markdown="1">
<summary>Output files</summary>
- `bwamem2/`
  - `*.bam`: The original BAM file containing read alignments to the reference genome.
- `*.coverage_metrics`: Tab-separated file containing quality metrics for hybrid-selection data.
</details>
[Picard CollectHsMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard) collects metrics from aligned SAM/BAM files that are specific to sequencing datasets generated through hybrid selection (mostly used to capture exon-specific sequences for targeted sequencing).
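Picard writes its metrics as a tab-separated table directly after a `## METRICS CLASS` line. The bash/awk sketch below pulls one named column, such as `MEAN_TARGET_COVERAGE` or `FOLD_ENRICHMENT`, from the first data row of such a file, for example a `*.coverage_metrics` output. The function name `picard_metric` is hypothetical, and the sketch assumes the first data row is the one of interest (a single bait set).

```bash
# Hypothetical helper: print the value of one column from the first data row
# of a Picard metrics file (e.g. CollectHsMetrics output).
picard_metric() {
    local metrics_file=$1
    local column=$2
    awk -F'\t' -v col="$column" '
        /^## METRICS CLASS/ { in_metrics = 1; next }   # table starts after this marker
        in_metrics && !have_header {                    # first line: column names
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            have_header = 1
            next
        }
        in_metrics {                                    # first data row
            if (idx) print $idx
            exit (idx ? 0 : 1)                          # non-zero if the column is missing
        }
    ' "$metrics_file"
}
```

For instance, `picard_metric sample.coverage_metrics MEAN_TARGET_COVERAGE` would print the mean target coverage recorded for that sample, and the function returns a non-zero status if the column name is not present.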
docs/usage.md
```bash
NXF_OPTS='-Xms1g -Xmx4g'
```
## Hybrid-selection QC metrics
The pipeline supports collecting hybrid-selection (HS) QC metrics. Use `--run_picard_collecthsmetrics true` to run the QC tool [Picard CollectHsMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard); it is not run by default.