Commit bbd6423

pinin4fjords and claude authored
feat(rpbp/estimateorfbayesfactors): add module (#11960)
* feat(rpbp/estimateorfbayesfactors): add module [skip ci]

* ci: trigger tests

* test(rpbp/estimateorfbayesfactors): fix test-data path and assert stable output

Point inputs at params.modules_testdata_base_path (nf-core/test-datasets, modules branch) instead of a deleted personal-fork branch that 404s and fails CI on every profile.

Replace the filename-only assertion with one that snapshots the row count, header, and an md5 of the columns that are byte-identical across the conda and container toolchains (BED structure, ORF identity, chi_square_p, the profile count sums). The six MCMC estimate columns (p_translated_*, p_background_*, bayes_factor_*) differ between toolchains and are excluded, so this guards against empty/truncated/structurally-wrong output while staying green on every CI profile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(rpbp/estimateorfbayesfactors): note why only stable columns are snapshotted

Explain in the test that the MCMC estimate columns are excluded because their floats differ between the conda and container toolchains, so a future reader does not replace the assertion with a full content snapshot and break the conda CI shard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(tests): add sanitizeOutput to test snapshots [skip ci]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(rpbp/estimateorfbayesfactors): regenerate stub snapshot for sanitizeOutput

The stub snapshot still carried the numeric-index channel keys; the test asserts sanitizeOutput(process.out), which omits them. Regenerated so the two agree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(rpbp/estimateorfbayesfactors): move descriptor into default prefix per nf-core convention

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ad585e7 commit bbd6423

5 files changed

Lines changed: 266 additions & 0 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::rpbp=4.0.1
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
process RPBP_ESTIMATEORFBAYESFACTORS {
    tag "$meta.id"
    label 'process_high'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine in ['singularity', 'apptainer'] && !task.ext.singularity_pull_docker_container ?
        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/14/146c3f15abf184a5ec13531d2a040ba7b9235c1091723aa37c7a119817411367/data' :
        'community.wave.seqera.io/library/rpbp:4.0.1--71297b462026e13b' }"

    input:
    tuple val(meta), path(profiles)
    tuple val(meta2), path(orfs_genomic_bed)

    output:
    tuple val(meta), path("${prefix}.bed.gz"), emit: bayes_factors
    tuple val("${task.process}"), val('rpbp'), eval('python -c "import rpbp; print(rpbp.__version__)"'), emit: versions_rpbp, topic: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}.bayes-factors"
    """
    RPBP_MODELS_BASE=\$(python3 -c "import os, inspect, rpbp; print(os.path.join(os.path.dirname(inspect.getfile(rpbp)), 'models'))")
    TRANSLATED=\$(ls "\$RPBP_MODELS_BASE"/translated/*.stan)
    UNTRANSLATED=\$(ls "\$RPBP_MODELS_BASE"/untranslated/*.stan)

    estimate-orf-bayes-factors \\
        ${profiles} \\
        ${orfs_genomic_bed} \\
        ${prefix}.bed.gz \\
        --translated-models \$TRANSLATED \\
        --untranslated-models \$UNTRANSLATED \\
        --num-cpus ${task.cpus} \\
        ${args}
    """

    stub:
    prefix = task.ext.prefix ?: "${meta.id}.bayes-factors"
    """
    echo "" | gzip > ${prefix}.bed.gz
    """
}
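
The script block above resolves the Stan model files from the installed rpbp package before calling estimate-orf-bayes-factors. The following is a minimal Python sketch of that lookup, assuming the package keeps its models under `models/translated` and `models/untranslated` as the shell globbing implies. `find_stan_models` is a hypothetical helper written for illustration and is not part of rpbp.

```python
import glob
import os


def find_stan_models(models_base):
    """Return sorted .stan paths for each model family under models_base.

    Hypothetical helper: mirrors the `ls .../translated/*.stan` and
    `ls .../untranslated/*.stan` lines in the script block.
    """
    found = {}
    for family in ("translated", "untranslated"):
        pattern = os.path.join(models_base, family, "*.stan")
        # sorted() gives a stable order regardless of filesystem listing order
        found[family] = sorted(glob.glob(pattern))
    return found
```

In the module, `models_base` corresponds to `os.path.join(os.path.dirname(inspect.getfile(rpbp)), 'models')`. An empty list for either family would mean the package layout differs from what the module expects.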
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
name: "rpbp_estimateorfbayesfactors"
description: |
  Score every candidate ORF for evidence of active translation. For
  each ORF, Rp-Bp fits two competing Bayesian models to its per-codon
  P-site count vector: a "translated" model that expects P-site density
  to concentrate at codon-start positions (the in-frame signal a
  translating ribosome produces), and an "untranslated" / noise model
  for the same data. The Bayes factor (ratio of marginal likelihoods)
  quantifies how much the data favour the translated hypothesis.

  Emits a BED-style table with one row per ORF carrying genomic
  coordinates plus the mean and variance of the log Bayes factor across
  MCMC samples. Downstream, `rpbp/selectfinalpredictionset` applies
  Bayes-factor, length and overlap rules to this table to produce the
  final filtered prediction set.

  Uses the Stan models bundled inside the rpbp Python package.
keywords:
  - rpbp
  - orf
  - bayes
  - translation
  - riboseq
tools:
  - "rpbp":
      description: "Rp-Bp - Bayesian inference of ribosome profiling data for identifying translated open reading frames"
      homepage: "https://github.com/dieterich-lab/rp-bp"
      documentation: "https://rp-bp.readthedocs.io"
      tool_dev_url: "https://github.com/dieterich-lab/rp-bp"
      doi: "10.1093/nar/gkw1350"
      licence:
        - "MIT"
      identifier: ""
input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information, e.g. `[ id:'sample1' ]`.
    - profiles:
        type: file
        description: Per-ORF P-site profile matrix from `rpbp/extractorfprofiles`.
        pattern: "*.profiles.mtx.gz"
        ontologies: []
  - - meta2:
        type: map
        description: |
          Groovy Map identifying the reference (e.g. `[ id:'reference' ]`).
    - orfs_genomic_bed:
        type: file
        description: Per-ORF genomic BED from `rpbp/preparegenome`.
        pattern: "*.orfs-genomic.annotated.bed.gz"
        ontologies: []
output:
  bayes_factors:
    - - meta:
          type: map
          description: Groovy Map inherited from input meta.
      - "${prefix}.bed.gz":
          type: file
          description: Per-ORF translation Bayes factors (BED).
          pattern: "*.bed.gz"
          ontologies: []
  versions_rpbp:
    - - ${task.process}:
          type: string
          description: The name of the process
      - rpbp:
          type: string
          description: The name of the tool
      - python -c "import rpbp; print(rpbp.__version__)":
          type: eval
          description: The expression to obtain the version of the tool
topics:
  versions:
    - - ${task.process}:
          type: string
          description: The name of the process
      - rpbp:
          type: string
          description: The name of the tool
      - python -c "import rpbp; print(rpbp.__version__)":
          type: eval
          description: The expression to obtain the version of the tool
authors:
  - "@pinin4fjords"
maintainers:
  - "@pinin4fjords"
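
The module description defines the Bayes factor as a ratio of marginal likelihoods. The sketch below illustrates that definition on a toy problem only: a sequence of n coin flips with k heads, comparing a model with an unknown bias (uniform prior) against a fixed fair coin. These two models are stand-ins chosen because their marginal likelihoods have closed forms; they are not the Stan models Rp-Bp fits, and all function names are hypothetical.

```python
import math


def log_marginal_fixed(k, n, p=0.5):
    """Log-likelihood of one specific sequence with k heads in n flips, p fixed."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def log_marginal_uniform(k, n):
    """Log marginal likelihood of the same sequence with p ~ Uniform(0, 1).

    Integrating p^k (1 - p)^(n - k) over [0, 1] gives the Beta function
    B(k + 1, n - k + 1) = k! (n - k)! / (n + 1)!.
    """
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


def log_bayes_factor(k, n):
    """Log Bayes factor of the unknown-bias model over the fair-coin model."""
    return log_marginal_uniform(k, n) - log_marginal_fixed(k, n)
```

A positive log Bayes factor means the data favour the numerator model, a negative one the denominator model. For ten heads in ten flips the ratio is 1024/11, while five heads in ten flips gives a ratio below one, favouring the fair coin.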
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
nextflow_process {

    name "Test Process RPBP_ESTIMATEORFBAYESFACTORS"
    script "../main.nf"
    process "RPBP_ESTIMATEORFBAYESFACTORS"

    tag "modules"
    tag "modules_nfcore"
    tag "rpbp"
    tag "rpbp/estimateorfbayesfactors"

    test("homo_sapiens chr20 - estimate orf bayes factors") {

        when {
            process {
                """
                input[0] = Channel.of([
                    [ id:'test', single_end:true, strandedness:'forward' ],
                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/rpbp/SRX11780888_chr20.profiles.mtx.gz", checkIfExists: true)
                ])
                input[1] = [
                    [ id:'reference' ],
                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/rpbp/reference.orfs-genomic.annotated.bed.gz", checkIfExists: true)
                ]
                """
            }
        }

        then {
            def bayes_factors = file(process.out.bayes_factors[0][1])
            def rows = path(process.out.bayes_factors[0][1]).linesGzip
            def header = rows[0].split("\t")
            // These columns are MCMC estimates whose floats differ between the conda and
            // container toolchains; exclude them and snapshot only the deterministic columns.
            def mcmc_cols = ["#p_translated_mean", "#p_translated_var", "#p_background_mean", "#p_background_var", "#bayes_factor_mean", "#bayes_factor_var"]
            def stable_idx = (0..<header.size()).findAll { !(header[it] in mcmc_cols) }
            def stable = rows.collect { line -> def fields = line.split("\t"); stable_idx.collect { fields[it] }.join("\t") }.join("\n")
            def stable_md5 = java.security.MessageDigest.getInstance("MD5").digest(stable.bytes).encodeHex().toString()
            assertAll(
                { assert process.success },
                { assert snapshot(
                    bayes_factors.name,
                    rows.size(),
                    rows[0],
                    stable_md5,
                    process.out.findAll { key, val -> key.startsWith('versions') }
                ).match() }
            )
        }
    }

    test("homo_sapiens chr20 - estimate orf bayes factors - stub") {

        options '-stub'

        when {
            process {
                """
                input[0] = Channel.of([
                    [ id:'test', single_end:true, strandedness:'forward' ],
                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/rpbp/SRX11780888_chr20.profiles.mtx.gz", checkIfExists: true)
                ])
                input[1] = [
                    [ id:'reference' ],
                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/rpbp/reference.orfs-genomic.annotated.bed.gz", checkIfExists: true)
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(sanitizeOutput(process.out)).match() }
            )
        }
    }
}
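
The first test above hashes only the columns that are deterministic across toolchains. The same idea can be sketched in Python for checking an output file outside nf-test. This is an illustration under assumptions: `stable_md5` is a hypothetical helper, and because line and field splitting may differ in edge cases from the Groovy code, it is not guaranteed to reproduce the digest stored in the snapshot.

```python
import gzip
import hashlib

# Column names taken from the test's mcmc_cols list.
MCMC_COLS = frozenset([
    "#p_translated_mean", "#p_translated_var",
    "#p_background_mean", "#p_background_var",
    "#bayes_factor_mean", "#bayes_factor_var",
])


def stable_md5(gz_bytes, excluded=MCMC_COLS):
    """MD5 over all rows (header included) after dropping the excluded columns."""
    lines = gzip.decompress(gz_bytes).decode().splitlines()
    header = lines[0].split("\t")
    # Indices of the columns that are kept, resolved once from the header row
    keep = [i for i, name in enumerate(header) if name not in excluded]
    stable = "\n".join(
        "\t".join(line.split("\t")[i] for i in keep) for line in lines
    )
    return hashlib.md5(stable.encode()).hexdigest()
```

Two outputs that differ only in the MCMC estimate columns produce the same digest, while a change to any coordinate, ORF identifier or count column changes it.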
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
{
    "homo_sapiens chr20 - estimate orf bayes factors": {
        "content": [
            "test.bayes-factors.bed.gz",
            8219,
            "#seqname\t#start\t#end\t#id\t#score\t#strand\t#thick_start\t#thick_end\t#color\t#num_exons\t#exon_lengths\t#exon_genomic_relative_starts\t#orf_num\t#orf_len\t#p_translated_mean\t#p_translated_var\t#p_background_mean\t#p_background_var\t#bayes_factor_mean\t#bayes_factor_var\t#chi_square_p\t#x_1_sum\t#x_2_sum\t#x_3_sum\t#profile_sum",
            "1a17faf5a5771d0f306b3c9591d7f9b1",
            {
                "versions_rpbp": [
                    [
                        "RPBP_ESTIMATEORFBAYESFACTORS",
                        "rpbp",
                        "4.0.1"
                    ]
                ]
            }
        ],
        "timestamp": "2026-06-10T17:00:18.24957747",
        "meta": {
            "nf-test": "0.9.5",
            "nextflow": "26.04.3"
        }
    },
    "homo_sapiens chr20 - estimate orf bayes factors - stub": {
        "content": [
            {
                "bayes_factors": [
                    [
                        {
                            "id": "test",
                            "single_end": true,
                            "strandedness": "forward"
                        },
                        "test.bayes-factors.bed.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
                    ]
                ],
                "versions_rpbp": [
                    [
                        "RPBP_ESTIMATEORFBAYESFACTORS",
                        "rpbp",
                        "4.0.1"
                    ]
                ]
            }
        ],
        "timestamp": "2026-06-11T11:51:10.205143393",
        "meta": {
            "nf-test": "0.9.5",
            "nextflow": "26.04.3"
        }
    }
}
