
Commit 6386634

citations, output.md and reference bibliography texts updated
1 parent 85de710 commit 6386634

6 files changed

Lines changed: 73 additions & 24 deletions


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://
 - [#52](https://github.com/nf-core/proteinannotator/pull/52) - Add option to turn off InterProScan for testing
 - [#51](https://github.com/nf-core/proteinannotator/pull/51) - Update to nf-core/tools v3.3.1
 - [#47](https://github.com/nf-core/proteinannotator/pull/47) - Update metromap with more tools added from [May 2025 Hackathon](https://nf-co.re/events/2025/hackathon-boston)
-- [#43](https://github.com/nf-core/proteinannotator/pull/44) - Add [mTM-Align](https://nf-co.re/modules/mtmalign_align/) and [MMseqs2 Search](https://nf-co.re/modules/mmseqs_search/) modules
+<!-- - [#43](https://github.com/nf-core/proteinannotator/pull/44) - Add [mTM-Align](https://nf-co.re/modules/mtmalign_align/) and [MMseqs2 Search](https://nf-co.re/modules/mmseqs_search/) modules -->
 - [#42](https://github.com/nf-core/proteinannotator/pull/42) - Updated to `nf-test` on GitHub Actions and in the `PULL_REQUEST_TEMPLATE.md`
 - [#13](https://github.com/nf-core/proteinannotator/pull/13) - Add nf-core seqkit/stats module
 - [#9](https://github.com/nf-core/proteinannotator/pull/9) - Add [InterProScan](https://interproscan-docs.readthedocs.io/) module

CITATIONS.md

Lines changed: 12 additions & 4 deletions
@@ -10,21 +10,29 @@
 
 ## Pipeline tools
 
-- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+- [SeqFu](https://pubmed.ncbi.nlm.nih.gov/34066939/)
 
-> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
+> Telatin A, Fariselli P, Birolo G. SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering. 2021 May 7;8(5):59. doi: 10.3390/bioengineering8050059. PubMed PMID: 34066939; PubMed Central PMCID: PMC8148589.
+
+- [SeqKit](https://pubmed.ncbi.nlm.nih.gov/38898985/)
+
+> Shen W, Sipos B, Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta. 2024 Apr 5:e191. doi: 10.1002/imt2.191. PubMed PMID: 38898985; PubMed Central PMCID: PMC11183193.
 
 - [InterProScan](https://academic.oup.com/bioinformatics/article/17/9/847/206564)
 
 > Zdobnov, Evgeni M., and Rolf Apweiler. “InterProScan – an Integration Platform for the Signature-Recognition Methods in InterPro.” Bioinformatics 17, no. 9 (September 1, 2001): 847–48. https://doi.org/10.1093/bioinformatics/17.9.847.
 
-- [MMseqs2](https://www.nature.com/articles/nbt.3988)
+<!-- - [MMseqs2](https://www.nature.com/articles/nbt.3988)
 
 > Zdobnov, Evgeni M., and Rolf Apweiler. “InterProScan – an Integration Platform for the Signature-Recognition Methods in InterPro.” Bioinformatics 17, no. 9 (September 1, 2001): 847–48. https://doi.org/10.1093/bioinformatics/17.9.847.
 
 - [mTM-align](https://academic.oup.com/bioinformatics/article/34/10/1719/4769500)
 
-> Dong, Runze, Zhenling Peng, Yang Zhang, and Jianyi Yang. “mTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment.” Bioinformatics 34, no. 10 (May 15, 2018): 1719–25. https://doi.org/10.1093/bioinformatics/btx828.
+> Dong, Runze, Zhenling Peng, Yang Zhang, and Jianyi Yang. “mTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment.” Bioinformatics 34, no. 10 (May 15, 2018): 1719–25. https://doi.org/10.1093/bioinformatics/btx828. -->
+
+- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+
+> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
 
 ## Software packaging/containerisation tools
 
assets/methods_description_template.yml

Lines changed: 0 additions & 1 deletion
@@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
 section_name: "nf-core/proteinannotator Methods Description"
 section_href: "https://github.com/nf-core/proteinannotator"
 plot_type: "html"
-## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
 ## You inject any metadata in the Nextflow '${workflow}' object
 data: |
   <h4>Methods</h4>

docs/output.md

Lines changed: 40 additions & 1 deletion
@@ -12,12 +12,51 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
+- [Quality check and preprocessing](#quality-check-and-preprocessing)
+  - [SeqFu](#seqfu) for input amino acid sequences quality check (QC)
+  - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)
+
 - [Functional Annotation](#functional-annotation) Annotate proteins with functional domains
   - [InterProScan](#Interproscan) - Search the InterPro database for functional domains
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
-- [SeqKit stats](#seqkit_stats) - Simple statistics for protein FASTA files
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
+### Quality check and preprocessing
+
+#### SeqFu
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `qc/`
+  - `<samplename>/`
+    - `<samplename>_before.tsv`: Statistics for the input amino acid sequences before preprocessing
+    - `<samplename>_before_mqc.txt`: Statistics for the input amino acid sequences in MultiQC-ready format before preprocessing
+    - `<samplename>_after.tsv`: (optional) Statistics for the input amino acid sequences after preprocessing
+    - `<samplename>_after_mqc.txt`: (optional) Statistics for the input amino acid sequences in MultiQC-ready format after preprocessing
+    - `<samplename>.log`: (optional) Output file with count of duplicate sequences that were found and removed
+
+</details>
+
+The `seqfu` module is used for statistics generation of input amino acid sequences, both before and after preprocessing.
+
+[SeqFu](https://github.com/telatin/seqfu2) is a cross-platform compiled suite of tools to manipulate and inspect `FASTA` and `FASTQ` files.
+
+#### SeqKit
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `qc/`
+  - `<samplename>/`
+    - `<samplename>.<suffix>`: Updated preprocessed input fasta file
+
+</details>
+
+The `seqkit` module is used for initial preprocessing (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) of the input amino acid sequences.
+
+[SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
+
 ### Functional Annotation
 
 #### InterProScan
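The SeqKit preprocessing steps listed in `docs/output.md` (gap removal, upper-casing, validation, length filtering, replacing `/`, and removing duplicate sequences) can be illustrated with a minimal Python sketch. It is not SeqKit's implementation and does not reflect the pipeline's configuration: the gap symbols, the accepted amino-acid alphabet, the minimum length, and replacing `/` with `_` in headers are all assumptions chosen for illustration, and `preprocess` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

# All constants below are illustrative assumptions, not pipeline defaults.
GAP_CHARS = "-."                                # assumed gap symbols to strip
VALID_AA = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")   # assumed accepted alphabet
MIN_LEN = 10                                    # hypothetical minimum length

Record = Tuple[str, str]  # (header, sequence)


def preprocess(records: Iterable[Record], min_len: int = MIN_LEN) -> Tuple[List[Record], int]:
    """Clean protein records; return the kept records and the number of duplicates removed."""
    strip_gaps = str.maketrans("", "", GAP_CHARS)
    seen = set()
    kept: List[Record] = []
    duplicates = 0
    for header, seq in records:
        seq = seq.translate(strip_gaps).upper()   # gap removal, then upper case
        if not seq or not set(seq) <= VALID_AA:   # validation against the alphabet
            continue
        if len(seq) < min_len:                    # length filter
            continue
        if seq in seen:                           # exact-sequence de-duplication
            duplicates += 1
            continue
        seen.add(seq)
        kept.append((header.replace("/", "_"), seq))  # replace '/' in the header
    return kept, duplicates
```

For example, `preprocess([("sp/1", "mk-lv.a"), ("sp2", "MKLVA")], min_len=3)` keeps one record, `("sp_1", "MKLVA")`, and reports one duplicate. The sketch keeps the first occurrence of each sequence, which is one reasonable policy for de-duplication but not necessarily the one the pipeline uses.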

subworkflows/local/utils_nfcore_proteinannotator_pipeline/main.nf

Lines changed: 19 additions & 16 deletions
@@ -174,30 +174,33 @@ def validateInputSamplesheet(input) {
 // Generate methods description for MultiQC
 //
 def toolCitationText() {
+
+    def quality_check_text = [
+        "Amino acid sequence statistics were generated with SeqFu (Telatin et al. 2021).",
+        params.skip_preprocessing ? "" : "Input sequences were preprocessed with SeqKit (gap trimming, length filtering, validation, duplicate removal) (Shen et al. 2024)."
+    ].join(' ').trim()
+
+    def postprocessing_text = "Run statistics were reported using MultiQC (Ewels et al. 2016)."
+
     def citation_text = [
-        "Tools used in the workflow included:",
-        "Nextflow (Di Tommaso et al. 2017),",
-        "nf-core (Ewels et al. 2020),",
-        "Bioconda (Grüning et al. 2018),",
-        "BioContainers (da Veiga Leprevost et al. 2017),",
-        "MultiQC (Ewels et al. 2016),",
-        "SeqKit (Shen 2016),",
-        "Anaconda (Anaconda Software Distribution 2016),",
-        "Docker (Merkel 2014),",
-        "Singularity (Kurtzer et al. 2017)",
-        ".",
+        quality_check_text,
+        postprocessing_text
     ].join(' ').trim()
 
     return citation_text
 }
 
 def toolBibliographyText() {
+    def quality_check_text = [
+        '<li>Telatin, A., Fariselli, P., & Birolo, G. (2021). SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering, 8(5), 59. doi: <a href="https://doi.org/10.3390/bioengineering8050059">10.3390/bioengineering8050059</a></li>',
+        params.skip_preprocessing ? '' : '<li>Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. Imeta, 3(3), e191. doi: <a href="https://doi.org/10.1002/imt2.191">10.1002/imt2.191</a></li>'
+    ].join(' ').trim()
+
+    def postprocessing_text = '<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. doi: <a href="https://doi.org/10.1093/bioinformatics/btw354">10.1093/bioinformatics/btw354</a></li>'
+
     def reference_text = [
-        "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>",
-        "<li>Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 11(10): e0163962. doi: https://doi.org/10.1371/journal.pone.0163962</li>",
-        "<li>Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.</li>",
-        "<li>Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: https://10.5555/2600239.2600241.</li>",
-        "<li>Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: https://10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.</li>",
+        quality_check_text,
+        postprocessing_text
     ].join(' ').trim()
 
     return reference_text
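The new `toolCitationText()` builds its sentence from a list that may contain an empty string (when `params.skip_preprocessing` is set) and relies on `.join(' ').trim()` to tidy the result. The Python sketch below is a hypothetical port of that pattern (the function names are illustrative, not part of the pipeline; the sentences are copied from the diff). It shows why `trim()` suffices when the optional element is last, and how filtering out empty strings first would also cope with an optional element in the middle of the list.

```python
from typing import List


def join_then_trim(parts: List[str]) -> str:
    # Same idea as Groovy's [..].join(' ').trim(): an empty part leaves an extra
    # space, and strip()/trim() only removes whitespace at the ends of the string.
    return " ".join(parts).strip()


def join_non_empty(parts: List[str]) -> str:
    # Alternative: drop empty strings before joining, so no double spaces remain.
    return " ".join(p for p in parts if p)


def tool_citation_text(skip_preprocessing: bool) -> str:
    # Hypothetical Python port of toolCitationText() from the diff above.
    quality_check_text = join_then_trim([
        "Amino acid sequence statistics were generated with SeqFu (Telatin et al. 2021).",
        ("" if skip_preprocessing else
         "Input sequences were preprocessed with SeqKit (gap trimming, length filtering, "
         "validation, duplicate removal) (Shen et al. 2024)."),
    ])
    postprocessing_text = "Run statistics were reported using MultiQC (Ewels et al. 2016)."
    return join_then_trim([quality_check_text, postprocessing_text])
```

With `skip_preprocessing=True` the SeqKit sentence is dropped and the result is just the SeqFu and MultiQC sentences separated by one space. In Groovy, the no-argument `findAll()` keeps only elements that are true under Groovy truth, so `parts.findAll().join(' ')` would roughly correspond to `join_non_empty`.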

workflows/proteinannotator.nf

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ workflow PROTEINANNOTATOR {
         meta, _fasta, updated_fasta ->
             [ meta, updated_fasta ]
     }
-    ch_samplesheet_updated.view()
+
     FUNCTIONAL_ANNOTATION( ch_samplesheet_updated )
     ch_versions = ch_versions.mix( FUNCTIONAL_ANNOTATION.out.versions.first() )
 