Bug: CollectHsMetrics only runs for one sample regardless of input count
Description
When --run_picard_collecthsmetrics true is set, PICARD_COLLECTHSMETRICS only ever runs for one of the input samples — no matter how many FASTQ pairs / BAMs are in the run. The other samples are silently skipped with no error and no warning. The Nextflow progress UI confirms only one task is ever instantiated:
[xx/xxxxxx] NFCORE_SEQINSPECTOR:SEQINSPECTOR:QC_BAM:PICARD_COLLECTHSMETRICS (sampleX) [100%] 1 of 1
PICARD_COLLECTMULTIPLEMETRICS, which sits in the same subworkflow but does not consume the interval channels, runs correctly for all samples.
Expected behaviour
PICARD_COLLECTHSMETRICS should run once per input BAM, just like PICARD_COLLECTMULTIPLEMETRICS does.
Actual behaviour
Only one task is created. Which sample "wins" is non-deterministic: it is whichever one the Nextflow scheduler happens to pull off the BAM queue first.
Root cause
In workflows/seqinspector.nf the bait and target interval channels are constructed with .collect():
ch_bait_intervals = bait_intervals ? channel.fromPath(bait_intervals).collect() : channel.empty()
ch_target_intervals = target_intervals ? channel.fromPath(target_intervals).collect() : channel.empty()
channel.fromPath(...).collect() produces a queue channel that emits once (one list of paths). In subworkflows/local/qc_bam/main.nf these channels are joined to the BAM channel via .combine(...):
ch_hsmetrics_in = ch_bam_bai
    .combine(ch_bait_intervals)
    .combine(ch_target_intervals)
When .combine() is applied to two queue channels, the right-hand channel is consumed once. With a single emission on the right, the first BAM consumes that emission and the channel ends — the remaining N-1 BAMs find nothing to combine with, so the cartesian product collapses to one tuple.
PICARD_COLLECTMULTIPLEMETRICS is unaffected because it does not combine with the interval channels.
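Since the explanation above hinges on how .combine() treats the output of .collect() versus .first(), that behaviour can be checked in isolation, outside the pipeline. The script below is a minimal sketch for doing so and is not taken from nf-core/seqinspector: the file name combine_check.nf, the sample ids, and the interval path are all hypothetical, and the BAM tuples are plain strings standing in for ch_bam_bai.

```nextflow
// combine_check.nf (hypothetical file name; not part of nf-core/seqinspector)
workflow {
    // Stand-in for ch_bam_bai: three [meta, bam, bai] tuples built from
    // placeholder strings, so no real BAM files are needed.
    ch_bam_bai = channel.of(
        [[id: 'sampleA'], 'sampleA.bam', 'sampleA.bam.bai'],
        [[id: 'sampleB'], 'sampleB.bam', 'sampleB.bam.bai'],
        [[id: 'sampleC'], 'sampleC.bam', 'sampleC.bam.bai']
    )

    // Placeholder path (assumption): replace with a real BED or interval_list file.
    def intervals = '/path/to/baits.bed'

    // Pattern currently used in workflows/seqinspector.nf
    ch_intervals_collect = channel.fromPath(intervals).collect()

    // Pattern from the proposed fix
    ch_intervals_first = channel.fromPath(intervals).first()

    // Print every tuple each variant emits, tagged by variant
    ch_bam_bai
        .combine(ch_intervals_collect)
        .view { row -> "collect: ${row}" }

    ch_bam_bai
        .combine(ch_intervals_first)
        .view { row -> "first: ${row}" }
}
```

Run it with nextflow run combine_check.nf on the same Nextflow version as the pipeline and count the lines printed for each tag. If the diagnosis above holds, the collect variant should print a single tuple and the first variant one tuple per sample. Any other result would suggest the cause lies elsewhere in QC_BAM.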
Reproduction
- Provide a samplesheet with at least two samples / FASTQ pairs.
- Set --run_picard_collecthsmetrics true and supply --bait_intervals and --target_intervals (any valid BED or interval_list works).
- Run the pipeline.
- Observe results/picard_collecthsmetrics/: only one *.CollectHsMetrics.coverage_metrics file is produced.
Minimal reproducer config
input: samplesheet.csv # >=2 rows
genome: GRCh38
fasta: /path/to/Homo_sapiens_assembly38.fasta
dict: /path/to/Homo_sapiens_assembly38.dict
bwamem2: /path/to/BWAmem2Index
run_picard_collecthsmetrics: true
bait_intervals: /path/to/wgs_calling_regions.bed
target_intervals: /path/to/wgs_calling_regions.bed
Evidence from a real run
- 4 input FASTQ pairs → 4 BAMs produced by BWAMEM2_MEM
- PICARD_COLLECTMULTIPLEMETRICS → 4 tasks ✅
- PICARD_COLLECTHSMETRICS → 1 task ❌
- .nextflow.log shows only a single PICARD_COLLECTHSMETRICS cache/submission entry across the whole run.
Proposed fix
Use .first() (which produces a value channel that broadcasts to every consumer) instead of .collect():
--- a/workflows/seqinspector.nf
+++ b/workflows/seqinspector.nf
@@ -184,8 +184,8 @@
if (!("picard_collectmultiplemetrics" in skip_tools)) {
- ch_bait_intervals = bait_intervals ? channel.fromPath(bait_intervals).collect() : channel.empty()
- ch_target_intervals = target_intervals ? channel.fromPath(target_intervals).collect() : channel.empty()
+ ch_bait_intervals = bait_intervals ? channel.fromPath(bait_intervals).first() : channel.empty()
+ ch_target_intervals = target_intervals ? channel.fromPath(target_intervals).first() : channel.empty()
QC_BAM(
ch_bwamem2_mem,
Verified locally: applying this patch produces the expected N tasks for N input BAMs, all four *.CollectHsMetrics.coverage_metrics files appear under results/picard_collecthsmetrics/, and MultiQC's HsMetrics section now lists every sample.
Environment
- nf-core/seqinspector: master (commit at time of report; please replace with nextflow info nf-core/seqinspector output)
- Nextflow: 25.10.4
- Profile: singularity
- Picard: 3.4.0 (from the pipeline-pinned container)
- Executor: SLURM
Related code