CHANGELOG.md: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://
 - [#52](https://github.com/nf-core/proteinannotator/pull/52) - Add option to turn off InterProScan for testing
 - [#51](https://github.com/nf-core/proteinannotator/pull/51) - Update to nf-core/tools v3.3.1
 - [#47](https://github.com/nf-core/proteinannotator/pull/47) - Update metromap with more tools added from [May 2025 Hackathon](https://nf-co.re/events/2025/hackathon-boston)
-- [#43](https://github.com/nf-core/proteinannotator/pull/44) - Add [mTM-Align](https://nf-co.re/modules/mtmalign_align/) and [MMseqs2 Search](https://nf-co.re/modules/mmseqs_search/) modules
+<!-- - [#43](https://github.com/nf-core/proteinannotator/pull/44) - Add [mTM-Align](https://nf-co.re/modules/mtmalign_align/) and [MMseqs2 Search](https://nf-co.re/modules/mmseqs_search/) modules-->
 - [#42](https://github.com/nf-core/proteinannotator/pull/42) - Updated to `nf-test` on GitHub Actions and in the `PULL_REQUEST_TEMPLATE.md`
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
+> Telatin A, Fariselli P, Birolo G. SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering. 2021 May 7;8(5):59. doi: 10.3390/bioengineering8050059. PubMed PMID: 34066939; PubMed Central PMCID: PMC8148589.
> Shen W, Sipos B, Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta. 2024 Apr 5:e191. doi: 10.1002/imt2.191. PubMed PMID: 38898985; PubMed Central PMCID: PMC11183193.
> Zdobnov, Evgeni M., and Rolf Apweiler. “InterProScan – an Integration Platform for the Signature-Recognition Methods in InterPro.” Bioinformatics 17, no. 9 (September 1, 2001): 847–48. https://doi.org/10.1093/bioinformatics/17.9.847.
> Zdobnov, Evgeni M., and Rolf Apweiler. “InterProScan – an Integration Platform for the Signature-Recognition Methods in InterPro.” Bioinformatics 17, no. 9 (September 1, 2001): 847–48. https://doi.org/10.1093/bioinformatics/17.9.847.
> Dong, Runze, Zhenling Peng, Yang Zhang, and Jianyi Yang. “mTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment.” Bioinformatics 34, no. 10 (May 15, 2018): 1719–25. https://doi.org/10.1093/bioinformatics/btx828.
+> Dong, Runze, Zhenling Peng, Yang Zhang, and Jianyi Yang. “mTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment.” Bioinformatics 34, no. 10 (May 15, 2018): 1719–25. https://doi.org/10.1093/bioinformatics/btx828. -->
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
docs/output.md: 40 additions & 1 deletion
@@ -12,12 +12,51 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
+- [Quality check and preprocessing](#quality-check-and-preprocessing)
+  - [SeqFu](#seqfu) for input amino acid sequences quality check (QC)
+  - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)
+
 - [Functional Annotation](#functional-annotation) Annotate proteins with functional domains
   - [InterProScan](#Interproscan) - Search the InterPro database for functional domains
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
-- [SeqKit stats](#seqkit_stats) - Simple statistics for protein FASTA files
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
+### Quality check and preprocessing
+
+#### SeqFu
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `qc/`
+  - `<samplename>/`
+    - `<samplename>_before.tsv`: Statistics for the input amino acid sequences before preprocessing
+    - `<samplename>_before_mqc.txt`: Statistics for the input amino acid sequences in MultiQC-ready format before preprocessing
+    - `<samplename>_after.tsv`: (optional) Statistics for the input amino acid sequences after preprocessing
+    - `<samplename>_after_mqc.txt`: (optional) Statistics for the input amino acid sequences in MultiQC-ready format after preprocessing
+    - `<samplename>.log`: (optional) Output file with count of duplicate sequences that were found and removed
+
+</details>
+
+The `seqfu` module is used for statistics generation of input amino acid sequences, both before and after preprocessing.
+
+[SeqFu](https://github.com/telatin/seqfu2) is a cross-platform compiled suite of tools to manipulate and inspect `FASTA` and `FASTQ` files.
+
+#### SeqKit
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `qc/`
+  - `<samplename>/`
+    - `<samplename>.<suffix>`: Updated preprocessed input fasta file
+
+</details>
+
+The `seqkit` module is used for initial preprocessing (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) of the input amino acid sequences.
+
+[SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
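The preprocessing steps listed for the `seqkit` module can be illustrated with a small plain-Python sketch. This is not SeqKit's implementation and not the pipeline's code; it only mirrors the order of operations described in the documentation. The names `preprocess`, `VALID_RESIDUES`, `min_length`, and `replacement` are hypothetical, and several behaviours are assumptions: that `/` is replaced in sequence identifiers, that duplicates are detected by identical sequence, and that an invalid residue raises an error.

```python
import re

# Assumed alphabet: the 20 standard residues plus common ambiguity and
# rare codes and the stop symbol '*'. The pipeline may accept a different set.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO*")


def preprocess(records, min_length=1, replacement="_"):
    """Clean (identifier, sequence) pairs in the order the docs list the steps."""
    seen = set()
    cleaned = []
    for identifier, sequence in records:
        # Gap removal (also strips whitespace) and conversion to upper case.
        sequence = re.sub(r"[-.\s]", "", sequence).upper()
        # Validation: reject residues outside the assumed alphabet.
        invalid = set(sequence) - VALID_RESIDUES
        if invalid:
            raise ValueError(f"{identifier}: invalid residues {sorted(invalid)}")
        # Length filter.
        if len(sequence) < min_length:
            continue
        # Replace special characters such as '/' (assumed to apply to identifiers).
        identifier = identifier.replace("/", replacement)
        # Duplicate removal, keyed on the cleaned sequence (an assumption).
        if sequence in seen:
            continue
        seen.add(sequence)
        cleaned.append((identifier, sequence))
    return cleaned


records = [("sp/P1", "mk-tay\n"), ("P2", "MKTAY"), ("P3", "MK")]
print(preprocess(records, min_length=3))  # prints [('sp_P1', 'MKTAY')]
```

In the example, `P2` is dropped as a duplicate of the cleaned `sp/P1` sequence and `P3` is dropped by the length filter, which is the kind of difference the optional `_after` statistics would be expected to reflect.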
"Amino acid sequence statistics were generated with SeqFu (Telatin et al. 2021).",
+        params.skip_preprocessing ? "" : "Input sequences were preprocessed with SeqKit (gap trimming, length filtering, validation, duplicate removal) (Shen et al. 2024)."
+    ].join('').trim()
+
+    def postprocessing_text = "Run statistics were reported using MultiQC (Ewels et al. 2016)."
+
     def citation_text = [
-        "Tools used in the workflow included:",
-        "Nextflow (Di Tommaso et al. 2017),",
-        "nf-core (Ewels et al. 2020),",
-        "Bioconda (Grüning et al. 2018),",
-        "BioContainers (da Veiga Leprevost et al. 2017),",
-        "MultiQC (Ewels et al. 2016),",
-        "SeqKit (Shen 2016),",
-        "Anaconda (Anaconda Software Distribution 2016),",
-        "Docker (Merkel 2014),",
-        "Singularity (Kurtzer et al. 2017)",
-        ".",
+        quality_check_text,
+        postprocessing_text
     ].join('').trim()
 
     return citation_text
 }
 
 def toolBibliographyText() {
+    def quality_check_text = [
+        '<li>Telatin, A., Fariselli, P., & Birolo, G. (2021). SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering, 8(5), 59. doi: <a href="https://doi.org/10.3390/bioengineering8050059">10.3390/bioengineering8050059</a></li>',
+        params.skip_preprocessing ? '' : '<li>Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. Imeta, 3(3), e191. doi: <a href="https://doi.org/10.1002/imt2.191">10.1002/imt2.191</a></li>'
+    ].join('').trim()
+
+    def postprocessing_text = '<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. doi: <a href="https://doi.org/10.1093/bioinformatics/btw354">10.1093/bioinformatics/btw354</a></li>'
+
     def reference_text = [
-        "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>",
-        "<li>Shen W, Le S, Li Y, Hu F (2016) SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 11(10): e0163962. doi: https://doi.org/10.1371/journal.pone.0163962</li>",
"<li>Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: https://10.5555/2600239.2600241.</li>",
-        "<li>Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: https://10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.</li>",
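The citation helpers in the Groovy changes build their text from a list of fragments, substitute an empty string when `params.skip_preprocessing` is set, then join and trim the result. A rough Python analogue of that pattern is sketched below. The function name `tool_citation_text` and its boolean argument are hypothetical stand-ins, and joining with a single space is an assumption made here so that consecutive sentences stay separated; the separator used by the pipeline itself should be checked against the actual source.

```python
def tool_citation_text(skip_preprocessing: bool) -> str:
    """Assemble the methods sentence, omitting SeqKit when preprocessing is skipped."""
    quality_check_fragments = [
        "Amino acid sequence statistics were generated with SeqFu (Telatin et al. 2021).",
        # An empty fragment stands in for the SeqKit sentence when it does not apply.
        "" if skip_preprocessing else (
            "Input sequences were preprocessed with SeqKit (gap trimming, length "
            "filtering, validation, duplicate removal) (Shen et al. 2024)."
        ),
    ]
    # Joining with a space leaves a trailing space if the last fragment is empty;
    # strip() removes it, which is the role trim() plays in the Groovy code.
    quality_check_text = " ".join(quality_check_fragments).strip()
    postprocessing_text = "Run statistics were reported using MultiQC (Ewels et al. 2016)."
    return " ".join([quality_check_text, postprocessing_text]).strip()


print(tool_citation_text(skip_preprocessing=True))
print(tool_citation_text(skip_preprocessing=False))
```

Keeping the conditional fragment inside the list, rather than branching around the whole block, means the SeqFu and MultiQC sentences are written once and only the SeqKit sentence varies with the parameter.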