nf-core
diff --git a/.nf-core.yml b/.nf-core.yml
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 6 additions & 0 deletions
diff --git a/CITATIONS.md b/CITATIONS.md
Lines changed: 2 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 11 additions & 10 deletions
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
Lines changed: 3 additions & 1 deletion
diff --git a/conf/modules.config b/conf/modules.config
Lines changed: 17 additions & 0 deletions
diff --git a/docs/output.md b/docs/output.md
Lines changed: 72 additions & 7 deletions
diff --git a/docs/usage.md b/docs/usage.md
Lines changed: 4 additions & 0 deletions
diff --git a/modules.json b/modules.json
Lines changed: 10 additions & 0 deletions
diff --git a/modules/nf-core/picard/collecthsmetrics/environment.yml b/modules/nf-core/picard/collecthsmetrics/environment.yml
Lines changed: 8 additions & 0 deletions
@@ -1,4 +1,5 @@
 lint:
+  multiqc_config: false
   files_exist:
     - tests/default.nf.test
   files_unchanged:
 
@@ -29,8 +29,12 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#127](https://github.com/nf-core/seqinspector/pull/127) Added alignment tools - bwamem2 - index and mem
 - [#128](https://github.com/nf-core/seqinspector/pull/128) Added Picard tools - Collect Multiple Metrics to collect QC metrics
 - [#132](https://github.com/nf-core/seqinspector/pull/132) Added a bwamem2 index params for faster output
+- [#135](https://github.com/nf-core/seqinspector/pull/135) Added index section to MultiQC reports to facilitate report navigation (#125)
 - [#151](https://github.com/nf-core/seqinspector/pull/151) Added a prepare_genome subworkflow to handle bwamem2 indexing
 - [#156](https://github.com/nf-core/seqinspector/pull/156) Added relative sample_size and warning when a sample has less reads than desired sample_size.
+- [#158](https://github.com/nf-core/seqinspector/pull/158) Moved picard_collectmultiplemetrics to the subworkflow QC_BAM
+- [#159](https://github.com/nf-core/seqinspector/pull/159) Added a subworkflow QC_BAM including picard_collecthsmetrics for alignment QC of hybrid-selection data
+- [#162](https://github.com/nf-core/seqinspector/pull/162) Add tests for prepare_genome subworkflow
 
 ### `Fixed`
 
@@ -39,6 +43,8 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#107](https://github.com/nf-core/seqinspector/pull/107) Put SeqFU-stats section reports together
 - [#112](https://github.com/nf-core/seqinspector/pull/112) Making fastq_screen_references value to use parentDir
 - [#94] (https://github.com/nf-core/seqinspector/issues/94) Go through and validate test data
+- [#162](https://github.com/nf-core/seqinspector/pull/162) Fix bugs in qc_bam and prepare_genome subworkflows and add tests
+- [#163](https://github.com/nf-core/seqinspector/pull/163) Run fastqscreen with subsampled data if available
 
 ### `Dependencies`
 
 
@@ -38,6 +38,8 @@
 
 - [Picard Tools](https://broadinstitute.github.io/picard/)
 
+> Broad Institute, “Picard Toolkit.” 2019. GitHub Repository. https://broadinstitute.github.io/picard/
+
 ## Software packaging/containerisation tools
 
 - [Anaconda](https://anaconda.com)
 
@@ -37,16 +37,17 @@
 
 <!-- TODO: add a search tool that accepts a tree for `Compatibility with Data`. -->
 
-| Tool Type           | Tool Name                                                                                                   | Tool Description                                                                              | Compatibility with Data | Dependencies                    | Default tool |
-| ------------------- | ----------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------- | ------------------------------- | ------------ |
-| `Subsampling`       | [`Seqtk`](https://github.com/lh3/seqtk)                                                                     | Global subsampling of reads. Only performs subsampling if `--sample_size` parameter is given. | [RNA, DNA, synthetic]   | [N/A]                           | no           |
-| `Indexing, Mapping` | [`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2)                                                           | Align reads to reference                                                                      | [RNA, DNA]              | [N/A]                           | yes          |
-| `Indexing`          | [`SAMtools`](http://github.com/samtools)                                                                    | Index aligned BAM files, create FASTA index                                                   | [DNA]                   | [N/A]                           | yes          |
-| `QC`                | [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)                                      | Read QC                                                                                       | [RNA, DNA]              | [N/A]                           | yes          |
-| `QC`                | [`FastqScreen`](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)                           | Basic contamination detection                                                                 | [RNA, DNA]              | [N/A]                           | yes          |
-| `QC`                | [`SeqFu Stats`](https://github.com/telatin/seqfu2)                                                          | Sequence statistics                                                                           | [RNA, DNA]              | [N/A]                           | yes          |
-| `QC`                | [`Picard collect multiple metrics`](https://broadinstitute.github.io/picard/picard-metric-definitions.html) | Collect multiple QC metrics                                                                   | [RNA, DNA]              | [Bwamem2, SAMtools, `--genome`] | yes          |
-| `Reporting`         | [`MultiQC`](http://multiqc.info/)                                                                           | Present QC for raw reads                                                                      | [RNA, DNA, synthetic]   | [N/A]                           | yes          |
+| Tool Type           | Tool Name                                                                                                           | Tool Description                                                                              | Compatibility with Data | Dependencies                                                                                                              | Default tool |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------- | ------------ |
+| `Subsampling`       | [`Seqtk`](https://github.com/lh3/seqtk)                                                                             | Global subsampling of reads. Only performs subsampling if `--sample_size` parameter is given. | [RNA, DNA, synthetic]   | [N/A]                                                                                                                     | no           |
+| `Indexing, Mapping` | [`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2)                                                                   | Align reads to reference                                                                      | [RNA, DNA]              | [N/A]                                                                                                                     | yes          |
+| `Indexing`          | [`SAMtools`](http://github.com/samtools)                                                                            | Index aligned BAM files, create FASTA index                                                   | [DNA]                   | [N/A]                                                                                                                     | yes          |
+| `QC`                | [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)                                              | Read QC                                                                                       | [RNA, DNA]              | [N/A]                                                                                                                     | yes          |
+| `QC`                | [`FastqScreen`](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)                                   | Basic contamination detection                                                                 | [RNA, DNA]              | [N/A]                                                                                                                     | yes          |
+| `QC`                | [`SeqFu Stats`](https://github.com/telatin/seqfu2)                                                                  | Sequence statistics                                                                           | [RNA, DNA]              | [N/A]                                                                                                                     | yes          |
+| `QC`                | [`Picard collect multiple metrics`](https://broadinstitute.github.io/picard/picard-metric-definitions.html)         | Collect multiple QC metrics                                                                   | [RNA, DNA]              | [Bwamem2, SAMtools, `--genome`]                                                                                           | yes          |
+| `QC`                | [`Picard_collecthsmetrics`](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard) | Collect alignment QC metrics of hybrid-selection data.                                        | [RNA, DNA]              | [Bwamem2, SAMtools, `--fasta`, `--run_picard_collecthsmetrics`, `--bait_intervals`, `--target_intervals` (`--ref_dict`)]  | no           |
+| `Reporting`         | [`MultiQC`](http://multiqc.info/)                                                                                   | Present QC for raw reads                                                                      | [RNA, DNA, synthetic]   | [N/A]                                                                                                                     | yes          |
 
 <picture>
   <source media="(prefers-color-scheme: dark)" srcset="docs/images/seqinspector_tubemap_V1.0_dark.png">
 
@@ -3,9 +3,11 @@ report_comment: >
   analysis pipeline. For information about how to interpret these results, please see the
   <a href="https://nf-co.re/seqinspector/dev/docs/output" target="_blank">documentation</a>.
 report_section_order:
+  "nf-core-seqinspector-index":
+    order: -999
   "nf-core-seqinspector-methods-description":
     order: -1000
-  software_versions:
+  multiqc_software_versions:
     order: -1001
   "nf-core-seqinspector-summary":
     order: -1002
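
The new `nf-core-seqinspector-index` entry relies on how MultiQC interprets `order` values. A minimal Python sketch of that ordering follows, assuming that MultiQC places sections with a higher `order` earlier in the report. The helper name `display_order` is hypothetical and is not part of MultiQC; the section ids and values are copied from the hunk above.

```python
# Section ids and order values copied from assets/multiqc_config.yml in this diff.
report_section_order = {
    "nf-core-seqinspector-index": -999,
    "nf-core-seqinspector-methods-description": -1000,
    "multiqc_software_versions": -1001,
    "nf-core-seqinspector-summary": -1002,
}


def display_order(section_order):
    """Return section ids sorted so that higher `order` values come first.

    Assumption: this mirrors MultiQC's rule that a larger `order`
    moves a section towards the top of the report.
    """
    return sorted(section_order, key=section_order.get, reverse=True)


for position, section in enumerate(display_order(report_section_order), start=1):
    print(position, section)
```

Under that assumption, the index section (`-999`) lands directly above the methods description, software versions and workflow summary, which stay grouped at the end of the report.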
 
@@ -43,6 +43,14 @@ process {
         ]
     }
 
+    withName: 'PICARD_CREATESEQUENCEDICTIONARY' {
+        publishDir = [
+            path: { "${params.outdir}/picard_createsequencedictionary" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
     withName: 'BWAMEM2_MEM' {
         publishDir = [
             path: { "${params.outdir}/bwamem2_mem" },
@@ -59,6 +67,15 @@ process {
         ]
     }
 
+    withName: 'PICARD_COLLECTHSMETRICS' {
+        publishDir = [
+            path: { "${params.outdir}/picard_collecthsmetrics" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+        ext.args = {"--TMP_DIR ."}
+    }
+
     withName: 'SAMTOOLS_FAIDX' {
         publishDir = [
             path: { "${params.outdir}/samtools_faidx" },
 
@@ -8,12 +8,17 @@ The directories listed below will be created in the results directory after the
 
 ## Pipeline overview
 
-The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
+The pipeline is built using [Nextflow](https://www.nextflow.io/) and can generate output files from the following steps:
 
 - [Seqtk](#seqtk) - Subsample a specific number of reads per sample
 - [FastQC](#fastqc) - Raw read QC
 - [SeqFu Stats](#seqfu_stats) - Statistics for FASTA or FASTQ files
 - [FastQ Screen](#fastqscreen) - Mapping against a set of references for basic contamination QC
+- [BWA-MEM2_INDEX](#bwamem2_index) - Create BWA-MEM2 index of a chosen reference genome OR use pre-built index
+- [BWA-MEM2_MEM](#bwamem2_mem) - Mapping reads against a chosen reference genome
+- [Samtools index](#samtools-index) - Index BAM files with Samtools
+- [Picard collect multiple metrics](#picard-collect-multiple-metrics) - Collect multiple QC metrics from aligned BAM files
+- [Picard collecthsmetrics](#picard-collecthsmetrics) - Collect alignment QC metrics of hybrid-selection data
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
@@ -42,14 +47,27 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 
 [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
 
+### SeqFu Stats
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `seqfu_stats/`
+  - `*.tsv`: Tab-separated file containing quality metrics.
+  - `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
+
+</details>
+
+[SeqFu](https://telatin.github.io/seqfu2/) is a general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. It includes functions to interleave and de-interleave FASTQ files, to rename sequences, and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics.
+
 ### FastQ Screen
 
 <details markdown="1">
 <summary>Output files</summary>
 
 - `fastqscreen/`
   - `*_screen.html`: Interactive graphical report.
-  - `*_screen.pdf`: Static graphical report.
+  - `*_screen.png`: Static graphical report.
   - `*_screen.txt` : Text-based report.
 
 </details>
@@ -67,18 +85,65 @@ See `assets/example_fastq_screen_references.csv` for example.
 
 The `.csv` is provided as a pipeline parameter `fastq_screen_references` and is used to construct a `FastQ Screen` configuration file within the context of the process work directory in order to properly mount the references.
 
-### SeqFu Stats
+### BWAMEM2_INDEX
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `seqfu/`
-  - `*.tsv`: Tab-separated file containing quality metrics.
-  - `*_mqc.txt`: File containing the same quality metrics as the TSV file, ready to be read by MultiQC.
+Generates the full set of bwamem2 indexes:
+
+- `bwamem2_index/`
+  - `*.fa`
+  - `*.fa.amb`
+  - `*.fa.ann`
+  - `*.fa.bwt`
+  - `*.fa.pac`
+
+</details>
+
+### BWAMEM2_MEM
+
+[BWA-mem2](https://github.com/bwa-mem2/bwa-mem2) is the next version of bwa-mem, a tool for mapping sequences with low divergence against a reference genome, with increased processing speed (~1.3-3.1x). Aligned reads are then potentially filtered and coordinate-sorted using [samtools](#samtools-index).
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `bwamem2_mem/`
+  - `*.bam`: The original BAM file containing read alignments to the reference genome.
+  - `*.bam.bai`: BAM index files.
+
+</details>
+
+### Samtools index
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `samtools_faidx/`
+  - `*.fa.fai`
+
+</details>
+
+### Picard collect multiple metrics
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `picard_collectmultiplemetrics/`
+  - `*.CollectMultipleMetrics.alignment_summary_metrics`
+  - `*.CollectMultipleMetrics.base_distribution_by_cycle_metrics`
+  - `*.CollectMultipleMetrics.base_distribution_by_cycle.pdf`
+  - `*.CollectMultipleMetrics.quality_by_cycle_metrics`
+  - `*.CollectMultipleMetrics.quality_by_cycle.pdf`
+  - `*.CollectMultipleMetrics.quality_distribution.pdf`
+  - `*.CollectMultipleMetrics.read_length_histogram.pdf`
+
+</details>
+
+### Picard CollectHSmetrics
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `picard_collecthsmetrics/`
+  - `*.coverage_metrics`: Tab-separated file containing quality metrics for hybrid-selection data.
 
 </details>
 
-[SeqFu](https://telatin.github.io/seqfu2/) is general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files. Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths. In this pipeline, the `seqfu stats` module is used to produce general quality metrics statistics.
+[Picard_collecthsmetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard) collects metrics from alignment SAM/BAM files that are specific to sequence datasets generated through hybrid selection (mostly used to capture exon-specific sequences for targeted sequencing).
 
 ### MultiQC
 
 
@@ -227,3 +227,7 @@ We recommend adding the following line to your environment to limit this (typica
 ```bash
 NXF_OPTS='-Xms1g -Xmx4g'
 ```
+
+## Hybrid-selection QC metrics
+
+The pipeline supports hybrid-selection (HS) QC metrics collection. Use `--run_picard_collecthsmetrics true` to run the QC tool [Picard CollectHsMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360036856051-CollectHsMetrics-Picard). This tool is not run by default.
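
When the tool is enabled, the `*.coverage_metrics` file published under `picard_collecthsmetrics/` should follow the usual Picard metrics layout: `#` comment lines, a `## METRICS CLASS` line, a tab-separated header row, one or more data rows, and a blank line before any histogram. The Python sketch below reads that table outside of MultiQC. The function name `parse_picard_metrics` and all sample values are hypothetical; only the column names (`BAIT_SET`, `MEAN_TARGET_COVERAGE`, `PCT_SELECTED_BASES`) are real CollectHsMetrics fields.

```python
def parse_picard_metrics(text):
    """Parse the first metrics table of a Picard metrics file into a list of dicts.

    Assumes the standard layout: a '## METRICS CLASS' line, then a
    tab-separated header row, then data rows terminated by a blank line.
    """
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("## METRICS CLASS"):
            break
    else:
        raise ValueError("no '## METRICS CLASS' section found")

    header = next(lines, "").split("\t")
    rows = []
    for line in lines:
        if not line.strip():
            break  # a blank line ends the metrics table
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Made-up example content, shortened to three of the many HsMetrics columns.
SAMPLE = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# CollectHsMetrics (command line omitted)",
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics",
    "BAIT_SET\tMEAN_TARGET_COVERAGE\tPCT_SELECTED_BASES",
    "baits\t95.2\t0.81",
    "",
    "## HISTOGRAM\tjava.lang.Integer",
    "coverage_or_base_quality\thigh_quality_coverage_count",
    "0\t12",
])

for row in parse_picard_metrics(SAMPLE):
    print(row["BAIT_SET"], row["MEAN_TARGET_COVERAGE"], row["PCT_SELECTED_BASES"])
```

Values are kept as strings so that empty Picard fields survive unchanged; convert individual columns with `float()` where a numeric comparison is needed.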
@@ -42,11 +42,21 @@
                         "git_sha": "82a79183037a403ad1b6714e5dbcff25500efaf6",
                         "installed_by": ["modules"]
                     },
+                    "picard/collecthsmetrics": {
+                        "branch": "master",
+                        "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726",
+                        "installed_by": ["modules"]
+                    },
                     "picard/collectmultiplemetrics": {
                         "branch": "master",
                         "git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255",
                         "installed_by": ["modules"]
                     },
+                    "picard/createsequencedictionary": {
+                        "branch": "master",
+                        "git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255",
+                        "installed_by": ["modules"]
+                    },
                     "samtools/faidx": {
                         "branch": "master",
                         "git_sha": "e753770db613ce014b3c4bc94f6cba443427b726",