
Commit 3ff0f46

Merge branch 'dev' into seqtk_sample_size_warning
2 parents 267213c + a59ace5 commit 3ff0f46

37 files changed

Lines changed: 3290 additions & 88 deletions

.nf-core.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,4 +1,5 @@
 lint:
+  multiqc_config: false
   files_exist:
     - tests/default.nf.test
   files_unchanged:
```

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -29,8 +29,12 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#127](https://github.com/nf-core/seqinspector/pull/127) Added alignment tools - bwamem2 - index and mem
 - [#128](https://github.com/nf-core/seqinspector/pull/128) Added Picard tools - Collect Multiple Mterics to collect QC metrics
 - [#132](https://github.com/nf-core/seqinspector/pull/132) Added a bwamem2 index params for faster output
+- [#135](https://github.com/nf-core/seqinspector/pull/135) Added index section to MultiQC reports to facilitate report navigation (#125)
 - [#151](https://github.com/nf-core/seqinspector/pull/151) Added a prepare_genome subworkflow to handle bwamem2 indexing
 - [#156](https://github.com/nf-core/seqinspector/pull/156) Added relative sample_size and warning when a sample has less reads than desired sample_size.
+- [#158](https://github.com/nf-core/seqinspector/pull/158) Moved picard_collectmultiplemetrics to the subworkflow QC_BAM
+- [#159](https://github.com/nf-core/seqinspector/pull/159) Added a subworkflow QC_BAM including picard_collecthsmetrics for alignment QC of hybrid-selection data
+- [#162](https://github.com/nf-core/seqinspector/pull/162) Add tests for prepare_genome subworkflow

 ### `Fixed`

@@ -39,6 +43,8 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#107](https://github.com/nf-core/seqinspector/pull/107) Put SeqFU-stats section reports together
 - [#112](https://github.com/nf-core/seqinspector/pull/112) Making fastq_screen_references value to use parentDir
 - [#94] (https://github.com/nf-core/seqinspector/issues/94) Go through and validate test data
+- [#162](https://github.com/nf-core/seqinspector/pull/162) Fix bugs in qc_bam and prepare_genome subworkflows and add tests
+- [#163](https://github.com/nf-core/seqinspector/pull/163) Run fastqscreen with subsampled data if available

 ### `Dependencies`

```
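Entry #156 adds a relative `sample_size` and a warning for samples that have fewer reads than requested. The pipeline implements this in Nextflow; the Python sketch below only illustrates the decision logic under two stated assumptions: a value between 0 and 1 is read as a fraction of each sample's reads (the same convention `seqtk sample` uses for its size argument), and a sample that is too small keeps all of its reads after the warning. The function name `reads_to_sample` is hypothetical.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("sample_size_sketch")


def reads_to_sample(total_reads: int, sample_size: float) -> int:
    """Resolve sample_size into an absolute number of reads to keep.

    Hypothetical helper. Assumption: 0 < sample_size < 1 means a fraction
    of the sample's reads; sample_size >= 1 means an absolute read count.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if sample_size < 1:
        # Relative mode: keep a fraction of this sample's reads.
        return int(total_reads * sample_size)
    requested = int(sample_size)
    if total_reads < requested:
        # Too few reads: warn and keep everything instead of failing.
        log.warning(
            "Sample has %d reads, fewer than the requested sample_size of %d; "
            "all reads will be kept.",
            total_reads,
            requested,
        )
        return total_reads
    return requested


if __name__ == "__main__":
    print(reads_to_sample(1_000_000, 0.25))  # relative: a quarter of the reads
    print(reads_to_sample(50_000, 100_000))  # absolute, triggers the warning
```

Returning the available reads rather than raising is a design choice for the sketch: one small sample then does not stop the whole run, while the warning still surfaces the shortfall.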
CITATIONS.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -38,6 +38,8 @@

 - [Picard Tools](https://broadinstitute.github.io/picard/)

+> Broad Institute, “Picard Toolkit.” 2019. GitHub Repository. https://broadinstitute.github.io/picard/
+
 ## Software packaging/containerisation tools

 - [Anaconda](https://anaconda.com)
```

README.md

Lines changed: 11 additions & 10 deletions
```diff
@@ -37,16 +37,17 @@

 <!-- TODO: add a search tool that accepts a tree for `Compatibility with Data`. -->

-| Tool Type | Tool Name | Tool Description | Compatibility with Data | Dependencies | Default tool |
-| ------------------- | ----------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------- | ------------------------------- | ------------ |
-| `Subsampling` | [`Seqtk`](https://github.com/lh3/seqtk) | Global subsampling of reads. Only performs subsampling if `--sample_size` parameter is given. | [RNA, DNA, synthetic] | [N/A] | no |
-| `Indexing, Mapping` | [`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2) | Align reads to reference | [RNA, DNA] | [N/A] | yes |
-| `Indexing` | [`SAMtools`](http://github.com/samtools) | Index aligned BAM files, create FASTA index | [DNA] | [N/A] | yes |
-| `QC` | [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) | Read QC | [RNA, DNA] | [N/A] | yes |
-| `QC` | [`FastqScreen`](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) | Basic contamination detection | [RNA, DNA] | [N/A] | yes |
-| `QC` | [`SeqFu Stats`](https://github.com/telatin/seqfu2) | Sequence statistics | [RNA, DNA] | [N/A] | yes |
-| `QC` | [`Picard collect multiple metrics`](https://broadinstitute.github.io/picard/picard-metric-definitions.html) | Collect multiple QC metrics | [RNA, DNA] | [Bwamem2, SAMtools, `--genome`] | yes |
-| `Reporting` | [`MultiQC`](http://multiqc.info/) | Present QC for raw reads | [RNA, DNA, synthetic] | [N/A] | yes |
+| Tool Type | Tool Name | Tool Description | Compatibility with Data | Dependencies | Default tool |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------- | ------------ |
+| `Subsampling` | [`Seqtk`](https://github.com/lh3/seqtk) | Global subsampling of reads. Only performs subsampling if `--sample_size` parameter is given. | [RNA, DNA, synthetic] | [N/A] | no |
+| `Indexing, Mapping` | [`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2) | Align reads to reference | [RNA, DNA] | [N/A] | yes |
+| `Indexing` | [`SAMtools`](http://github.com/samtools) | Index aligned BAM files, create FASTA index | [DNA] | [N/A] | yes |
+| `QC` | [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) | Read QC | [RNA, DNA] | [N/A] | yes |
+| `QC` | [`FastqScreen`](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) | Basic contamination detection | [RNA, DNA] | [N/A] | yes |
+| `QC` | [`SeqFu Stats`](https://github.com/telatin/seqfu2) | Sequence statistics | [RNA, DNA] | [N/A] | yes |
+| `QC` | [`Picard collect multiple metrics`](https://broadinstitute.github.io/picard/picard-metric-definitions.html) | Collect multiple QC metrics | [RNA, DNA] | [Bwamem2, SAMtools, `--genome`] | yes |
+| `QC` | [`Picard_collecthsmetrics`](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard) | Collect alignment QC metrics of hybrid-selection data. | [RNA, DNA] | [Bwamem2, SAMtools, `--fasta`, `--run_picard_collecths_metrics`, `--bait_intervals`, `--target_intervals` (`--ref_dict`)] | no |
+| `Reporting` | [`MultiQC`](http://multiqc.info/) | Present QC for raw reads | [RNA, DNA, synthetic] | [N/A] | yes |

 <picture>
 <source media="(prefers-color-scheme: dark)" srcset="docs/images/seqinspector_tubemap_V1.0_dark.png">
```
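The "Default tool" column now has two opt-in rows: Seqtk (only with `--sample_size`) and the new Picard hybrid-selection row (only with its run flag, spelled `--run_picard_collecths_metrics` in this table and `--run_picard_collecthsmetrics` in docs/usage.md). A minimal Python sketch of that selection rule, for illustration only: `selected_tools` and `run_hs_metrics` are hypothetical names, and the sketch ignores the Dependencies column (for example, the reference inputs Bwamem2 and Picard need).

```python
from typing import List, Optional

# Rows of the README table: (tool name, runs by default?).
TOOLS = [
    ("Seqtk", False),
    ("Bwamem2", True),
    ("SAMtools", True),
    ("FastQC", True),
    ("FastqScreen", True),
    ("SeqFu Stats", True),
    ("Picard collect multiple metrics", True),
    ("Picard_collecthsmetrics", False),
    ("MultiQC", True),
]


def selected_tools(sample_size: Optional[float] = None,
                   run_hs_metrics: bool = False) -> List[str]:
    """Return the tools that would run for the given options (illustrative only)."""
    opt_in = {
        "Seqtk": sample_size is not None,            # only with --sample_size
        "Picard_collecthsmetrics": run_hs_metrics,   # only with the HS run flag
    }
    # A tool runs if it is a default tool or its opt-in condition holds.
    return [name for name, default in TOOLS if default or opt_in.get(name, False)]


if __name__ == "__main__":
    print(selected_tools())
    print(selected_tools(sample_size=0.1, run_hs_metrics=True))
```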

assets/multiqc_config.yml

Lines changed: 3 additions & 1 deletion
```diff
@@ -3,9 +3,11 @@ report_comment: >
   analysis pipeline. For information about how to interpret these results, please see the
   <a href="https://nf-co.re/seqinspector/dev/docs/output" target="_blank">documentation</a>.
 report_section_order:
+  "nf-core-seqinspector-index":
+    order: -999
   "nf-core-seqinspector-methods-description":
     order: -1000
-  software_versions:
+  multiqc_software_versions:
     order: -1001
   "nf-core-seqinspector-summary":
     order: -1002
```
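With these `report_section_order` values, the new index section (`-999`) sorts just above the methods description (`-1000`), followed by the software versions and the workflow summary. A minimal Python sketch of that ordering, assuming MultiQC places sections with a higher `order` value higher in the report; sections without an explicit `order` (the tool modules) are not modelled, and `section_order` is a hypothetical helper.

```python
from typing import Dict, List

# The report_section_order entries after this commit.
REPORT_SECTION_ORDER: Dict[str, Dict[str, int]] = {
    "nf-core-seqinspector-index": {"order": -999},
    "nf-core-seqinspector-methods-description": {"order": -1000},
    "multiqc_software_versions": {"order": -1001},
    "nf-core-seqinspector-summary": {"order": -1002},
}


def section_order(config: Dict[str, Dict[str, int]]) -> List[str]:
    """Section ids from top to bottom, assuming a higher `order` means higher up."""
    return sorted(config, key=lambda section: config[section]["order"], reverse=True)


if __name__ == "__main__":
    for section in section_order(REPORT_SECTION_ORDER):
        print(section)
```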

conf/modules.config

Lines changed: 17 additions & 0 deletions
```diff
@@ -43,6 +43,14 @@ process {
         ]
     }

+    withName: 'PICARD_CREATESEQUENCEDICTIONARY' {
+        publishDir = [
+            path: { "${params.outdir}/picard_createsequencedictionary" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
     withName: 'BWAMEM2_MEM' {
         publishDir = [
             path: { "${params.outdir}/bwamem2_mem" },
@@ -59,6 +67,15 @@ process {
         ]
     }

+    withName: 'PICARD_COLLECTHSMETRICS' {
+        publishDir = [
+            path: { "${params.outdir}/picard_collecthsmetrics" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+        ext.args = {"--TMP_DIR ."}
+    }
+
     withName: 'SAMTOOLS_FAIDX' {
         publishDir = [
             path: { "${params.outdir}/samtools_faidx" },
```
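Both new `withName` blocks publish into a per-process folder under `params.outdir` and share the same `saveAs` closure, which keeps `versions.yml` out of those folders. The Python sketch below mirrors that rule to make its effect explicit; it is an illustration, not how Nextflow implements `publishDir`, and `published_path` and the example file name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Process name -> results subdirectory, copied from the new withName blocks.
PUBLISH_SUBDIRS = {
    "PICARD_CREATESEQUENCEDICTIONARY": "picard_createsequencedictionary",
    "PICARD_COLLECTHSMETRICS": "picard_collecthsmetrics",
}


def save_as(filename: str) -> Optional[str]:
    """Mirror of { filename -> filename.equals('versions.yml') ? null : filename }."""
    return None if filename == "versions.yml" else filename


def published_path(outdir: str, process: str, filename: str) -> Optional[str]:
    """Where a task output would be published, or None if saveAs drops it."""
    name = save_as(filename)
    if name is None:
        return None  # Nextflow does not publish a file when saveAs returns null
    return str(PurePosixPath(outdir) / PUBLISH_SUBDIRS[process] / name)


if __name__ == "__main__":
    print(published_path("results", "PICARD_COLLECTHSMETRICS", "sample1.coverage_metrics"))
    print(published_path("results", "PICARD_COLLECTHSMETRICS", "versions.yml"))
```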

docs/output.md

Lines changed: 72 additions & 7 deletions
```diff
@@ -8,12 +8,17 @@ The directories listed below will be created in the results directory after the

 ## Pipeline overview

-The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
+The pipeline is built using [Nextflow](https://www.nextflow.io/) and can generate output files from the following steps:

 - [Seqtk](#seqtk) - Subsample a specific number of reads per sample
 - [FastQC](#fastqc) - Raw read QC
 - [SeqFu Stats](#seqfu_stats) - Statistics for FASTA or FASTQ files
 - [FastQ Screen](#fastqscreen) - Mapping against a set of references for basic contamination QC
+- [BWA-MEM2_INDEX](#bwamem2_index) - Create BWA-MEM2 index of a chosen reference genome OR use pre-built index
+- [BWA-MEM2_MEM](#bwamem2_mem) - Mapping reads against a chosen reference genome
+- [Samtools index](#samtools-index) - Index BAM files with Samtools
+- [Picard collect multiple metrics](#picard-collect-multiple-metrics) - Combine BAM and BAI outputs for Picard
+- [Picard collecthsmetrics](#picard-collecthsmetrics) - Collect alignment QC metrics of hybrid-selection data
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -42,14 +47,27 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

 [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

+### SeqFu Stats
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `seqfu_stats/`
+  - `*.tsv`: Tab-separated file containing quality metrics.
+  - `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
+
+</details>
+
+[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics statistics.
+
 ### FastQ Screen

 <details markdown="1">
 <summary>Output files</summary>

 - `fastqscreen/`
   - `*_screen.html`: Interactive graphical report.
-  - `*_screen.pdf`: Static graphical report.
+  - `*_screen.png`: Static graphical report.
   - `*_screen.txt` : Text-based report.

 </details>
@@ -67,18 +85,65 @@ See `assets/example_fastq_screen_references.csv` for example.

 The `.csv` is provided as a pipeline parameter `fastq_screen_references` and is used to construct a `FastQ Screen` configuration file within the context of the process work directory in order to properly mount the references.

-### SeqFu Stats
+### BWAMEM2_INDEX

 <details markdown="1">
 <summary>Output files</summary>

-- `seqfu/`
-  - `*.tsv`: Tab-separated file containing quality metrics.
-  - `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
+Generates the full set of bwamem2 indexes:
+
+- `bwamem2_index/`
+  - `*.fa`
+  - `*.fa.amb`
+  - `*.fa.ann`
+  - `*.fa.bwt`
+  - `*.fa.pac`
+
+### BWAMEM2_MEM
+
+[BWA-mem2](https://github.com/bwa-mem2/bwa-mem2) is the next version of bwa-mem, a tool for mapping sequences with low divergence against a reference genome, with increased processing speed (~1.3-3.1x). Aligned reads are then potentially filtered and coordinate-sorted using [samtools](#samtools-index).
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `bwamem2/`
+  - `*.bam`: The original BAM file containing read alignments to the reference genome.
+  - `*.bam.bai`: BAM index files
+
+### Samtools index
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `samtools_faidex`
+  - `*.fa.fai`
+  - `*.fa.fai`
+
+### Picard collect multiple metrics
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `picard_collectmultiplemetrics`
+  - `*.CollectMultipleMetrics.alignment_summary_metrics`
+  - `*.CollectMultipleMetrics.base_distribution_by_cycle_metrics`
+  - `*.CollectMultipleMetrics.base_distribution_by_cycle.pdf`
+  - `*.CollectMultipleMetrics.quality_by_cycle_metrics`
+  - `*.CollectMultipleMetrics.quality_by_cycle.pdf`
+  - `*.CollectMultipleMetrics.quality_distribution.pdf`
+  - `*.CollectMultipleMetrics.read_length_histogram.pdf`
+
+### Picard CollectHSmetrics
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `picard_collecthsmetrics/`
+  - `*.coverage_metrics`: Tab-separated file containing quality metrics for hybrid-selection data.

 </details>

-[SeqFu](https://telatin.github.io/seqfu2/) is general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics statistics.
+[Picard_collecthsmetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard) is a tool to collect metrics from alignment SAM/BAM files that are specific to sequencing datasets generated through hybrid selection (mostly used to capture exon-specific sequences for targeted sequencing).

 ### MultiQC

```
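The `*.coverage_metrics` files documented above come from Picard. The Python sketch below assumes they use Picard's standard metrics layout: `#` comment lines, then a `## METRICS CLASS` line followed by a tab-separated header row and data rows. It reads the first table and recomputes the on/near-bait share of aligned bases, (ON_BAIT_BASES + NEAR_BAIT_BASES) / (ON_BAIT_BASES + NEAR_BAIT_BASES + OFF_BAIT_BASES), which corresponds to Picard's PCT_SELECTED_BASES. Picard already reports that column, so the recomputation is only a cross-check. The helper names and the toy numbers are made up.

```python
from typing import Dict, List


def parse_picard_metrics(text: str) -> List[Dict[str, str]]:
    """Return the rows of the first '## METRICS CLASS' table in a Picard metrics file."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS") and i + 1 < len(lines):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip() or row.startswith("#"):
                    break  # a blank line or a new '##' block ends the table
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' section found")


def pct_selected_bases(metrics: Dict[str, str]) -> float:
    """Share of aligned bases on or near a bait: (ON + NEAR) / (ON + NEAR + OFF)."""
    on = float(metrics["ON_BAIT_BASES"])
    near = float(metrics["NEAR_BAIT_BASES"])
    off = float(metrics["OFF_BAIT_BASES"])
    return (on + near) / (on + near + off)


# Toy input with made-up numbers, laid out like a Picard metrics file.
EXAMPLE = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# CollectHsMetrics (toy example)",
    "",
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics",
    "BAIT_SET\tON_BAIT_BASES\tNEAR_BAIT_BASES\tOFF_BAIT_BASES",
    "toy_baits\t700\t100\t200",
    "",
])

if __name__ == "__main__":
    row = parse_picard_metrics(EXAMPLE)[0]
    print(row["BAIT_SET"], pct_selected_bases(row))
```

The same parser should also read the `*.CollectMultipleMetrics.*_metrics` text files, assuming they follow the same layout.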
docs/usage.md

Lines changed: 4 additions & 0 deletions
````diff
@@ -227,3 +227,7 @@ We recommend adding the following line to your environment to limit this (typica
 ```bash
 NXF_OPTS='-Xms1g -Xmx4g'
 ```
+
+## Hybrid-selection QC metrics
+
+The pipeline supports hybrid-selection (HS) QC metrics collection. Use `--run_picard_collecthsmetrics true` to run the QC tool [picard CollectHSmetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard). This tool is not run by default.
````
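Enabling the HS step only makes sense when its inputs are present. Going by the README's Dependencies column, that means a FASTA plus bait and target interval files, with `--ref_dict` in parentheses. The Python sketch below reads that as "optional, presumably generated by PICARD_CREATESEQUENCEDICTIONARY when missing", which is an assumption. `check_hs_params` is a hypothetical pre-flight check, not the pipeline's own validation.

```python
from typing import List, Mapping


def check_hs_params(params: Mapping[str, object]) -> List[str]:
    """Return problems that would block the hybrid-selection QC step.

    Hypothetical check. Parameter names follow the docs; ref_dict is treated
    as optional because it is assumed to be generated when missing.
    """
    if not params.get("run_picard_collecthsmetrics"):
        return []  # the step is off by default, so nothing extra is required
    return [
        f"--{name} is required when --run_picard_collecthsmetrics is true"
        for name in ("fasta", "bait_intervals", "target_intervals")
        if not params.get(name)
    ]


if __name__ == "__main__":
    print(check_hs_params({
        "run_picard_collecthsmetrics": True,
        "fasta": "genome.fa",
        "bait_intervals": "baits.interval_list",
    }))
```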

modules.json

Lines changed: 10 additions & 0 deletions
```diff
@@ -42,11 +42,21 @@
         "git_sha": "82a79183037a403ad1b6714e5dbcff25500efaf6",
         "installed_by": ["modules"]
     },
+    "picard/collecthsmetrics": {
+        "branch": "master",
+        "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726",
+        "installed_by": ["modules"]
+    },
     "picard/collectmultiplemetrics": {
         "branch": "master",
         "git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255",
         "installed_by": ["modules"]
     },
+    "picard/createsequencedictionary": {
+        "branch": "master",
+        "git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255",
+        "installed_by": ["modules"]
+    },
     "samtools/faidx": {
         "branch": "master",
         "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726",
```

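Each `modules.json` entry pins a module to an nf-core/modules commit (`git_sha`). The two new Picard modules are pinned to different commits, each shared with an existing module. The Python sketch below groups modules by pinned SHA. It walks the JSON recursively instead of assuming the exact nesting of the file, and the wrapper keys in `EXAMPLE` are illustrative only. The module entries and SHAs are taken from the diff.

```python
import json
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def iter_modules(node: object) -> Iterator[Tuple[str, str]]:
    """Yield (name, git_sha) for every nested entry that records a git_sha."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict) and "git_sha" in value:
                yield key, value["git_sha"]
            else:
                yield from iter_modules(value)


def group_by_sha(modules_json: str) -> Dict[str, List[str]]:
    """Group installed modules by the nf-core/modules commit they are pinned to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, sha in iter_modules(json.loads(modules_json)):
        groups[sha].append(name)
    return dict(groups)


# Entries from the diff, nested under assumed wrapper keys for illustration.
EXAMPLE = json.dumps({"modules": {"nf-core": {
    "picard/collecthsmetrics": {"branch": "master", "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726"},
    "picard/collectmultiplemetrics": {"branch": "master", "git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255"},
    "picard/createsequencedictionary": {"branch": "master", "git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255"},
    "samtools/faidx": {"branch": "master", "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726"},
}}})

if __name__ == "__main__":
    for sha, names in group_by_sha(EXAMPLE).items():
        print(sha[:7], names)
```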
modules/nf-core/picard/collecthsmetrics/environment.yml

Lines changed: 8 additions & 0 deletions