Commit 02477d1

Merge pull request #156 from nkongenelly/seqtk_sample_size_warning
Seqtk add relative sample sets and warning
2 parents f1a3b8c + 705c56b commit 02477d1

6 files changed

Lines changed: 2017 additions & 28 deletions

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -31,18 +31,19 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#132](https://github.com/nf-core/seqinspector/pull/132) Added a bwamem2 index params for faster output
 - [#135](https://github.com/nf-core/seqinspector/pull/135) Added index section to MultiQC reports to facilitate report navigation (#125)
 - [#151](https://github.com/nf-core/seqinspector/pull/151) Added a prepare_genome subworkflow to handle bwamem2 indexing
+- [#156](https://github.com/nf-core/seqinspector/pull/156) Added relative sample_size and warning when a sample has less reads than desired sample_size.
 - [#158](https://github.com/nf-core/seqinspector/pull/158) Moved picard_collectmultiplemetrics to the subworkflow QC_BAM
 - [#159](https://github.com/nf-core/seqinspector/pull/159) Added a subworkflow QC_BAM including picard_collecthsmetrics for alignment QC of hybrid-selection data
 - [#162](https://github.com/nf-core/seqinspector/pull/162) Add tests for prepare_genome subworkflow
 
 ### `Fixed`
 
 - [#71](https://github.com/nf-core/seqinspector/pull/71) FASTQSCREEN does not fail when multiple reads are provided.
+- [#94](https://github.com/nf-core/seqinspector/issues/94) Go through and validate test data
 - [#99](https://github.com/nf-core/seqinspector/pull/99) Fix group reports for paired reads
 - [#107](https://github.com/nf-core/seqinspector/pull/107) Put SeqFU-stats section reports together
 - [#112](https://github.com/nf-core/seqinspector/pull/112) Making fastq_screen_references value to use parentDir
 - [#121](https://github.com/nf-core/seqinspector/pull/121) Cleanup sample naming for MultiQC report (#105)
-- [#94] (https://github.com/nf-core/seqinspector/issues/94) Go through and validate test data
 - [#162](https://github.com/nf-core/seqinspector/pull/162) Fix bugs in qc_bam and prepare_genome subworkflows and add tests
 - [#163](https://github.com/nf-core/seqinspector/pull/163) Run fastqscreen with subsampled data if available
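
Reviewer note on the #156 entry: the sketch below illustrates, in Python rather than the pipeline's Nextflow code, one way the relative/absolute `sample_size` behaviour described in this commit could be interpreted. It is not taken from the pipeline. The helper name `resolve_sample_size`, the rule that values strictly between 0 and 1 are fractions, the rounding down, and the treatment of 0 as "no subsampling" (as the removed help text described) are all assumptions. Only the `Requested sample_size (N)` message prefix comes from the test in this commit; the rest of the warning text is hypothetical.

```python
import math
import warnings


def resolve_sample_size(requested, total_reads):
    """Return how many reads to keep for one sample (hypothetical helper)."""
    if requested < 0:
        raise ValueError("sample_size must be non-negative")
    if requested == 0:
        # Assumption: 0 (the schema default) means no subsampling.
        return total_reads
    if requested < 1:
        # Assumption: values strictly between 0 and 1 are fractions of the
        # sample's reads, rounded down to a whole read count.
        return math.floor(requested * total_reads)
    # Assumption: values >= 1 are absolute read counts.
    if requested > total_reads:
        # Only the "Requested sample_size (N)" prefix appears in the commit's
        # test; the remainder of this message is made up for illustration.
        warnings.warn(
            f"Requested sample_size ({int(requested)}) exceeds the "
            f"{total_reads} available reads; keeping all reads"
        )
        return total_reads
    return int(requested)
```

With made-up read counts, `resolve_sample_size(0.25, 200)` keeps a quarter of the reads, while `resolve_sample_size(120, 100)` emits the warning and keeps every read instead of failing, which mirrors the "pipeline will still run" wording in the schema help text.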

docs/usage.md

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ genome: 'GRCh37'
 
 You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
 
-Optionally, the `sample_size` parameter allows you to subset a random number of reads to be analysed. Note that it refers to an absolute number.
+Optionally, the `sample_size` parameter allows you to subset a random number of reads to be analysed. Both absolute numbers (e.g 100) and relative numbers (e.g 0.25) can be specified.
 
 ```bash
 nextflow run nf-core/seqinspector --input ./samplesheet.csv --outdir ./results --sample_size 1000000 -profile docker

nextflow_schema.json

Lines changed: 3 additions & 3 deletions
@@ -24,9 +24,9 @@
     "fa_icon": "fas fa-file-csv"
 },
 "sample_size": {
-    "type": "integer",
-    "description": "Take this number of reads as a subset.",
-    "help_text": "Choose the size of the subset or 0, if no subsampling shall be performed. Note that it refers to an absolute number.",
+    "type": "number",
+    "description": "Take a subset of reads for analysis.",
+    "help_text": "Subset can be used as a fraction of reads (ex/ 0.20) or an absolute number of reads per sample (integer). Pipeline will still run if a sample has less reads than selected subset value.",
     "default": 0
 },
 "outdir": {

tests/NovaSeq6000.main_subsample.nf.test

Lines changed: 74 additions & 1 deletion
@@ -1,6 +1,6 @@
 nextflow_pipeline {
 
-    name "Test Workflow main.nf on NovaSeq6000 data sample size 90"
+    name "Test Workflow main.nf on NovaSeq6000 data with different sample sizes"
     script "../main.nf"
     tag "seqinspector"
     tag "PIPELINE"
@@ -38,4 +38,77 @@ nextflow_pipeline {
             )
         }
     }
+
+    test("NovaSeq6000 data test relative sample size") {
+
+        when {
+            config "./NovaSeq6000.main_subsample.nf.test.config"
+            params {
+                outdir = "$outputDir"
+                sample_size = 0.9
+            }
+        }
+
+        then {
+            // stable_name: All files + folders in ${params.outdir}/ with a stable name
+            def stable_name = getAllFilesFromDir(
+                params.outdir,
+                relative: true, includeDir: true, ignore: ['pipeline_info/*.{html,json,txt}']
+            )
+            // stable_path: All files in ${params.outdir}/ with stable content
+            def stable_path = getAllFilesFromDir(
+                params.outdir,
+                ignoreFile: 'tests/.nftignore'
+            )
+            assertAll(
+                { assert workflow.success},
+                { assert snapshot(
+                    // pipeline versions.yml file for multiqc from which Nextflow version is removed because we tests pipelines on multiple Nextflow versions
+                    removeNextflowVersion("$outputDir/pipeline_info/nf_core_seqinspector_software_mqc_versions.yml"),
+                    // All stable path names, with a relative path
+                    stable_name,
+                    // All files with stable contents
+                    stable_path
+                ).match() }
+            )
+        }
+    }
+
+    test("NovaSeq6000 data test sample size exceeds available reads") {
+        tag "warning"
+
+        when {
+            config "./NovaSeq6000.main_subsample.nf.test.config"
+            params {
+                outdir = "$outputDir"
+                sample_size = 120
+            }
+        }
+
+        then {
+            // stable_name: All files + folders in ${params.outdir}/ with a stable name
+            def stable_name = getAllFilesFromDir(
+                params.outdir,
+                relative: true, includeDir: true, ignore: ['pipeline_info/*.{html,json,txt}']
+            )
+            // stable_path: All files in ${params.outdir}/ with stable content
+            def stable_path = getAllFilesFromDir(
+                params.outdir,
+                ignoreFile: 'tests/.nftignore'
+            )
+            assert workflow.success
+            assertAll(
+                { assert snapshot(
+                    // pipeline versions.yml file for multiqc from which Nextflow version is removed because we tests pipelines on multiple Nextflow versions
+                    removeNextflowVersion("$outputDir/pipeline_info/nf_core_seqinspector_software_mqc_versions.yml"),
+                    // All stable path names, with a relative path
+                    stable_name,
+                    // All files with stable contents
+                    stable_path,
+                    // get all messages containing Requested sample_size (120)
+                    filterNextflowOutput(workflow.stdout + workflow.stderr, ignore: ['Downloading plugin'], include:['Requested sample_size (120)'])
+                ).match() }
+            )
+        }
+    }
 }
