Commit 277d34e

pinin4fjords and claude authored
Add dotseq/dotseq (#11742)
* Add dotseq/dotseq module

DOTSeq is a Bioconductor package for detecting differential ORF usage (DOU) and ORF-level differential translation efficiency (DTE) from Ribo-seq with matched RNA-seq. Module wraps DOTSeqDataSetsFromFeatureCounts + DOTSeq() + getContrasts() and emits per-ORF TSVs for the DOU and DTE interaction contrasts plus the serialised DOTSeqDataSets object.

Pre-requisites (in flight):
- Bioconda recipe: bioconda/bioconda-recipes#65677
- Test data: nf-core/test-datasets#2072

* Use Wave community container for bioconductor-dotseq 1.0.0

Bioconda recipe (bioconda/bioconda-recipes#65677) merged; biocontainer image is not yet built so swap the placeholder quay.io/depot URLs for a Wave community container built from the now-merged bioconda package. Also widen the singularity guard to include 'apptainer' and add the versions topic block in meta.yml (via nf-core modules lint --fix).

* Add native plotDOT outputs, simplify template, tidyverse syntax

- Restructure the R template around optparse + readr + dplyr + purrr + ggplot2; drop the homemade parse_args / read_delim_flexible helpers in favour of the standard package idioms and native pipe.
- Output set is now what DOTSeq itself emits natively: per-ORF DTE contrasts (translation.dotseq.results.tsv), DOU contrasts (dou.dotseq.results.tsv), optional dou_strategy / dte_strategy per-condition Ribo-vs-RNA contrasts, plus the four plotDOT() PNGs (volcano / composite / venn / heatmap) and a DTE p-value distribution histogram drawn directly from DOTSeq's padj column.
- Container picks up r-eulerr + r-ggsignif (required for plotDOT venn) and explicit r-ggplot2 so the histogram has a stable ggplot version.
- plotDOT() default of force_new_device=TRUE was killing our png() device on each call; pass FALSE so the PNGs land where Nextflow expects them.

* Simplify R template helpers, add heatmap sorf_type fallback

- Drop the homemade read_delim_flexible() and write_results_tsv() wrappers in favour of read_tsv() / read_csv() / write_tsv() directly. The earlier to_orf_tibble() conditional is also gone now that we know getContrasts() always returns a frame with orf_id as a column (per the DOTSeq source in posthoc.R + main.R).
- plotDOT(heatmap) requires gene-paired mORF + sorf entries; try uORF first (the package default) and fall back to dORF when no significant gene has both. tryCatch in safe_plot_dot still makes either a no-op when neither succeeds.

* Address code-review feedback: stub block, validation hardening, plot fallback robustness

- Add stub: block to main.nf matching the proteus/readproteingroups precedent.
- Read sample sheet with read_delim() picking comma/tab from the file extension so the meta.yml-advertised TSV variant actually works.
- Refuse to clobber an existing canonical column (e.g. an existing 'condition' column when --contrast_variable=treatment is supplied).
- Dedupe multi-lane sample sheets and validate that both Ribo and RNA strategies are present (DOTSeq's interaction design is unestimable otherwise).
- Add an is_set() predicate that catches NULL / empty stringent + required options before the tri-state switch silently returns NULL.
- safe_plot_dot now unlinks the partially-written PNG on plotDOT error and returns success so the heatmap fallback (uORF then dORF) keys off whether the first call actually drew, not file.exists() of a stale handle.
- getContrasts(type='interaction') errors propagate (headline outputs); type='strategy' stays tryCatch'd because absence is legitimate.
- Cache getDOU(d) / getDTE(d) once and share across contrasts + plotDOT.
- Drop redundant file.exists() walk - Nextflow's path staging already guarantees the inputs exist.
- Expand the test to assert volcano / composite / venn plot emission and add a -stub test.

* TEMPORARY: point test at the pending test-datasets PR fork branch

Lets CI verify the module is actually green; revert this commit once nf-core/test-datasets#2072 merges and the canonical modules-branch URL resolves.

* refactor(dotseq/dotseq): take a count-matrix shape for consumer parity

Aligns the module's input contract with deltate / anota2seq so that consumers can dispatch between the three ORF-DTE methods without maintaining a separate prep step for dotseq. The four featureCounts/GTF/BED inputs collapse to a per-ORF count matrix (orf_id + sample columns) plus a per-ORF annotation TSV (orf_id + gene_id + optional orf_type/coords). The R template now calls DOTSeqDataSetsFromSummarizeOverlaps() and builds the required GRanges in-process from the annotation TSV; the model fit, contrast tables, and plotDOT outputs are unchanged.

Test fixtures updated alongside in nf-core/test-datasets#2072 (commit 8c9b27c).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dotseq/dotseq): synthesize `replicate` column when absent

DOTSeq's parse_condition_table() requires a `replicate` column for stable ordering of samples within strategy+condition. Pipeline samplesheets often have a `pair` column (or none at all), so the R template now treats the column as optional: when present it is renamed to `replicate` as before; when absent the template assigns a per-(strategy, condition) row counter so the model fit is unaffected. This matches how anota2seq/deltate consume the same samplesheets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dotseq/dotseq): support running a single module

Dropping a module from --modules left DOTSeq()'s skipped slot unfitted (a bare DESeqDataSet for DTE), and getContrasts() has no method for it, so a DOU-only run crashed when extracting the DTE interaction table. Gate interaction and strategy contrast extraction on the selected modules, and write each module's interaction table only when that module ran. Mark the translation and dou outputs optional to match, and add a DOU-only regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
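The `replicate` fallback described in the commit message can be illustrated with a small sketch. The module's real template is written in R and is not shown here; this Python version only mirrors the stated logic (rename the column when present, otherwise number rows within each strategy + condition group). The function name `add_replicate` and the dict-per-row representation are hypothetical.

```python
from collections import defaultdict


def add_replicate(rows, replicate_col=None):
    """Return copies of the sample-sheet rows that all carry a 'replicate' key.

    rows: list of dicts with at least 'strategy' and 'condition' keys.
    replicate_col: name of an existing column (e.g. 'pair') to rename, or None.
    """
    counters = defaultdict(int)  # one running counter per (strategy, condition)
    out = []
    for row in rows:
        new = dict(row)
        if replicate_col and replicate_col in row:
            # Column supplied: rename it to the canonical 'replicate'.
            new["replicate"] = new.pop(replicate_col)
        else:
            # Column absent: assign 1, 2, 3... within each group.
            key = (row["strategy"], row["condition"])
            counters[key] += 1
            new["replicate"] = counters[key]
        out.append(new)
    return out
```

Because the counter is keyed on the (strategy, condition) pair, the numbering depends only on row order within each group, which is the property the commit message relies on for stable sample ordering.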
1 parent bbd6423 commit 277d34e

8 files changed

Lines changed: 913 additions & 0 deletions

environment.yml
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+  - conda-forge
+  - bioconda
+dependencies:
+  - bioconda::bioconductor-dotseq=1.0.0
+  - conda-forge::r-dplyr=1.2.1
+  - conda-forge::r-eulerr=7.1.0
+  - conda-forge::r-ggplot2=4.0.3
+  - conda-forge::r-ggrepel=0.9.8
+  - conda-forge::r-ggsignif=0.6.4
+  - conda-forge::r-optparse=1.8.2
+  - conda-forge::r-purrr=1.2.2
+  - conda-forge::r-readr=2.2.0
+  - conda-forge::r-tibble=3.3.1
+  - conda-forge::r-tidyr=1.3.2
main.nf
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+process DOTSEQ_DOTSEQ {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine in ['singularity', 'apptainer'] && !task.ext.singularity_pull_docker_container ?
+        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/12/12667d472e9ae0f1602041dc018ba6bde294e6190e67999d71b65e7a2df7ea1f/data' :
+        'community.wave.seqera.io/library/bioconductor-dotseq_r-dplyr_r-eulerr_r-ggplot2_pruned:6c8a9ebdec36c958' }"
+
+    input:
+    tuple val(meta), val(contrast_variable), val(reference), val(target)
+    tuple val(meta2), path(samplesheet), path(counts), path(annotation)
+
+    output:
+    tuple val(meta), path("*.translation.dotseq.results.tsv") , emit: translation , optional: true
+    tuple val(meta), path("*.dou.dotseq.results.tsv") , emit: dou , optional: true
+    tuple val(meta), path("*.dou_strategy.dotseq.results.tsv") , emit: dou_strategy , optional: true
+    tuple val(meta), path("*.dte_strategy.dotseq.results.tsv") , emit: dte_strategy , optional: true
+    tuple val(meta), path("*.volcano.png") , emit: volcano_plot , optional: true
+    tuple val(meta), path("*.composite.png") , emit: composite_plot, optional: true
+    tuple val(meta), path("*.venn.png") , emit: venn_plot , optional: true
+    tuple val(meta), path("*.heatmap.png") , emit: heatmap_plot , optional: true
+    tuple val(meta), path("*.interaction_p_distribution.png") , emit: interaction_p_distribution_plot, optional: true
+    tuple val(meta), path("*.DOTSeqDataSets.rds") , emit: rdata
+    tuple val(meta), path("*.R_sessionInfo.log") , emit: session_info
+    path "versions.yml" , emit: versions, topic: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    template 'dotseq.R'
+
+    stub:
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    touch ${prefix}.translation.dotseq.results.tsv
+    touch ${prefix}.dou.dotseq.results.tsv
+    touch ${prefix}.dou_strategy.dotseq.results.tsv
+    touch ${prefix}.dte_strategy.dotseq.results.tsv
+    touch ${prefix}.volcano.png
+    touch ${prefix}.composite.png
+    touch ${prefix}.venn.png
+    touch ${prefix}.heatmap.png
+    touch ${prefix}.interaction_p_distribution.png
+    touch ${prefix}.DOTSeqDataSets.rds
+    touch ${prefix}.R_sessionInfo.log
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        bioconductor-dotseq: \$(Rscript -e "cat(as.character(packageVersion('DOTSeq')))")
+        r-optparse: \$(Rscript -e "cat(as.character(packageVersion('optparse')))")
+        r-readr: \$(Rscript -e "cat(as.character(packageVersion('readr')))")
+        r-dplyr: \$(Rscript -e "cat(as.character(packageVersion('dplyr')))")
+    END_VERSIONS
+    """
+}
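The `container` directive in the process above is a single ternary that is easy to misread. As a rough illustration only (Nextflow evaluates the real expression in Groovy), the same decision can be written out in Python. The function name `select_container` and its parameter names are hypothetical; the two image references are copied from the diff.

```python
# Image references copied verbatim from the container directive in main.nf.
SINGULARITY_IMAGE = (
    "https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/12/"
    "12667d472e9ae0f1602041dc018ba6bde294e6190e67999d71b65e7a2df7ea1f/data"
)
DOCKER_IMAGE = (
    "community.wave.seqera.io/library/"
    "bioconductor-dotseq_r-dplyr_r-eulerr_r-ggplot2_pruned:6c8a9ebdec36c958"
)


def select_container(engine, pull_docker_container=False):
    """Mirror the ternary: the HTTPS image is used only for singularity or
    apptainer, and only when the pull-docker override is not set."""
    if engine in ("singularity", "apptainer") and not pull_docker_container:
        return SINGULARITY_IMAGE
    return DOCKER_IMAGE
```

Every other engine, and any run that sets `task.ext.singularity_pull_docker_container`, falls through to the Wave registry image.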
meta.yml
Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json
+name: "dotseq_dotseq"
+description: |
+  Detect differential ORF usage (DOU) and ORF-level differential
+  translation efficiency (DTE) from Ribo-seq with matched RNA-seq using
+  DOTSeq. Wraps DOTSeqDataSetsFromSummarizeOverlaps() + DOTSeq() +
+  getContrasts() and emits the package's native contrast tables plus
+  plotDOT() visualisations.
+keywords:
+  - riboseq
+  - rnaseq
+  - translation
+  - differential
+  - orf
+tools:
+  - "dotseq":
+      description: "Differential ORF Translation analysis for Ribo-seq with matched RNA-seq"
+      homepage: "https://bioconductor.org/packages/release/bioc/html/DOTSeq.html"
+      documentation: "https://bioconductor.org/packages/release/bioc/vignettes/DOTSeq/inst/doc/DOTSeq.html"
+      tool_dev_url: "https://github.com/compgenom/DOTSeq"
+      licence:
+        - "MIT"
+      identifier: ""
+
+input:
+  - - meta:
+        type: map
+        description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+    - contrast_variable:
+        type: string
+        description: Sample-sheet column that holds the experimental condition (mapped to DOTSeq's `condition` internally).
+    - reference:
+        type: string
+        description: Value of the contrast_variable to use as reference (baseline).
+    - target:
+        type: string
+        description: Value of the contrast_variable to use as target (non-reference).
+  - - meta2:
+        type: map
+        description: Groovy map containing study-wide metadata
+    - samplesheet:
+        type: file
+        description: |
+          CSV or TSV sample sheet with `run`, `strategy`, `replicate`, and
+          `condition` columns (defaults; can be overridden via
+          task.ext.args). Both Ribo-seq and RNA-seq samples are required:
+          DOTSeq's design is `~ condition * strategy` and the interaction
+          term is unestimable without both strategies.
+        ontologies:
+          - edam: "http://edamontology.org/format_3752"
+          - edam: "http://edamontology.org/format_3475"
+    - counts:
+        type: file
+        description: |
+          Per-ORF count matrix. First column is the ORF identifier (default
+          `orf_id`, override via `--orf_id_col`); remaining columns are
+          sample IDs that must match the `run` values in the sample sheet.
+          Both Ribo-seq and RNA-seq sample columns belong in this single
+          matrix; the sample sheet's `strategy` column distinguishes them.
+        ontologies:
+          - edam: "http://edamontology.org/format_3475"
+    - annotation:
+        type: file
+        description: |
+          Per-ORF annotation table (one row per ORF). Required columns:
+          `orf_id` (matches the count matrix) and `gene_id` (parent gene;
+          DOTSeq's DOU model groups child ORFs by gene). Optional columns:
+          `orf_type` (mORF / uORF / dORF; used by plotDOT()'s heatmap and
+          defaults to mORF when absent) and `chrom`, `start`, `end`,
+          `strand` (used only for downstream inspection - dummy ranges
+          are generated when absent because DOTSeq's fit does not depend
+          on genomic coordinates). Column names can be overridden via
+          task.ext.args (`--gene_id_col`, `--orf_type_col`, etc.).
+        ontologies:
+          - edam: "http://edamontology.org/format_3475"
+
+output:
+  translation:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.translation.dotseq.results.tsv":
+          type: file
+          description: |
+            Per-ORF differential translation efficiency: DOTSeq's DTE
+            interaction-term results (DESeq2 + ashr shrinkage). Emitted only
+            when the DTE module is selected (the default).
+          pattern: ".translation.dotseq.results.tsv"
+          ontologies:
+            - edam: "http://edamontology.org/format_3475"
+  dou:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.dou.dotseq.results.tsv":
+          type: file
+          description: |
+            DOTSeq Differential ORF Usage results (beta-binomial GLM
+            modelling Ribo / RNA proportion changes within each gene,
+            shrunk with ashr). DOTSeq-unique. Emitted only when the DOU
+            module is selected (the default).
+          pattern: ".dou.dotseq.results.tsv"
+          ontologies:
+            - edam: "http://edamontology.org/format_3475"
+  dou_strategy:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.dou_strategy.dotseq.results.tsv":
+          type: file
+          description: DOU strategy contrasts (Ribo vs RNA effect per condition), when present.
+          pattern: ".dou_strategy.dotseq.results.tsv"
+          ontologies:
+            - edam: "http://edamontology.org/format_3475"
+  dte_strategy:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.dte_strategy.dotseq.results.tsv":
+          type: file
+          description: DTE strategy contrasts (Ribo vs RNA effect per condition), when present.
+          pattern: ".dte_strategy.dotseq.results.tsv"
+          ontologies:
+            - edam: "http://edamontology.org/format_3475"
+  volcano_plot:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.volcano.png":
+          type: file
+          description: DOTSeq plotDOT() volcano (DOU + DTE significance).
+          pattern: ".volcano.png"
+          ontologies:
+            - edam: "http://edamontology.org/format_3603"
+  composite_plot:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.composite.png":
+          type: file
+          description: DOTSeq plotDOT() composite scatter (DOU vs DTE effect sizes).
+          pattern: ".composite.png"
+          ontologies:
+            - edam: "http://edamontology.org/format_3603"
+  venn_plot:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.venn.png":
+          type: file
+          description: DOTSeq plotDOT() Venn diagram of DOU vs DTE significant ORFs.
+          pattern: ".venn.png"
+          ontologies:
+            - edam: "http://edamontology.org/format_3603"
+  heatmap_plot:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.heatmap.png":
+          type: file
+          description: DOTSeq plotDOT() heatmap of DOU across top genes.
+          pattern: ".heatmap.png"
+          ontologies:
+            - edam: "http://edamontology.org/format_3603"
+  interaction_p_distribution_plot:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.interaction_p_distribution.png":
+          type: file
+          description: Histogram of DOTSeq's DTE adjusted p-values.
+          pattern: ".interaction_p_distribution.png"
+          ontologies:
+            - edam: "http://edamontology.org/format_3603"
+  rdata:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.DOTSeqDataSets.rds":
+          type: file
+          description: Serialised DOTSeqDataSets object containing DOU + DTE fits
+          pattern: ".DOTSeqDataSets.rds"
+          ontologies: []
+  session_info:
+    - - meta:
+          type: map
+          description: Groovy Map containing contrast information. e.g. [ id:'treatment_vs_control' ]
+      - "*.R_sessionInfo.log":
+          type: file
+          description: dump of R sessionInfo()
+          pattern: "*.log"
+          ontologies:
+            - edam: "http://edamontology.org/data_1678"
+  versions:
+    - versions.yml:
+        type: file
+        description: File containing software versions
+        pattern: "versions.yml"
+        ontologies:
+          - edam: "http://edamontology.org/format_3750"
+topics:
+  versions:
+    - versions.yml:
+        type: string
+        description: The name of the process
+authors:
+  - "@pinin4fjords"
+maintainers:
+  - "@pinin4fjords"
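The input contract documented in meta.yml (count-matrix sample columns must match the sample sheet's `run` values, and both sequencing strategies must be present) can be checked before launching the module. The following is a minimal Python sketch of such a pre-flight check, not the validation the R template actually performs. The function name `validate_inputs` is hypothetical, and because the diff does not state the exact labels used in the `strategy` column, the check only counts distinct values instead of assuming particular ones.

```python
import csv
import io


def validate_inputs(samplesheet_text, counts_header, orf_id_col="orf_id", delimiter=","):
    """Return a list of human-readable problems; an empty list means OK.

    samplesheet_text: sample sheet contents with 'run' and 'strategy' columns.
    counts_header: list of column names from the per-ORF count matrix.
    """
    rows = list(csv.DictReader(io.StringIO(samplesheet_text), delimiter=delimiter))
    problems = []

    if orf_id_col not in counts_header:
        problems.append(f"count matrix lacks the ORF id column '{orf_id_col}'")

    # Every run named in the sample sheet needs a matching count column.
    sample_cols = {c for c in counts_header if c != orf_id_col}
    missing = sorted({r["run"] for r in rows} - sample_cols)
    if missing:
        problems.append(f"runs missing from count matrix: {missing}")

    # The condition * strategy interaction needs at least two strategies.
    if len({r["strategy"] for r in rows}) < 2:
        problems.append("sample sheet must contain both Ribo-seq and RNA-seq samples")

    return problems
```

For a tab-separated sample sheet, pass `delimiter="\t"`.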
