16 changes: 9 additions & 7 deletions CHANGELOG.md
@@ -9,13 +9,15 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://

### `Added`

- [[PR #52](https://github.com/nf-core/proteinannotator/pull/52)] Add option to turn off InterProScan for testing
- [[PR #51](https://github.com/nf-core/proteinannotator/pull/51)] Update to nf-core/tools v3.3.1
- [[PR #47](https://github.com/nf-core/proteinannotator/pull/47)] Update metromap with more tools added from [May 2025 Hackathon](https://nf-co.re/events/2025/hackathon-boston)
- [[PR #43](https://github.com/nf-core/proteinannotator/pull/44)] Add [mTM-Align](https://nf-co.re/modules/mtmalign_align/) and [MMseqs2 Search](https://nf-co.re/modules/mmseqs_search/) modules
- [[PR #42](https://github.com/nf-core/proteinannotator/pull/42)] Updated to `nf-test` on GitHub Actions and in the `PULL_REQUEST_TEMPLATE.md`
- [[PR #13](https://github.com/nf-core/proteinannotator/pull/13)] Add nf-core seqkit/stats module
- [[PR #9](https://github.com/nf-core/proteinannotator/pull/9)] Add [InterProScan](https://interproscan-docs.readthedocs.io/) module
- [#59](https://github.com/nf-core/proteinannotator/pull/59) - Added nf-core QC and pre-processing subworkflow for amino acid sequences `FAA_SEQFU_SEQKIT`. (by @vagkaratzas)
- [#57](https://github.com/nf-core/proteinannotator/pull/57) - nf-core tools template update to 3.5.1. (by @vagkaratzas)
- [#52](https://github.com/nf-core/proteinannotator/pull/52) - Add option to turn off InterProScan for testing
- [#51](https://github.com/nf-core/proteinannotator/pull/51) - Update to nf-core/tools v3.3.1
- [#47](https://github.com/nf-core/proteinannotator/pull/47) - Update metromap with more tools added from [May 2025 Hackathon](https://nf-co.re/events/2025/hackathon-boston)
<!-- - [#43](https://github.com/nf-core/proteinannotator/pull/44) - Add [mTM-Align](https://nf-co.re/modules/mtmalign_align/) and [MMseqs2 Search](https://nf-co.re/modules/mmseqs_search/) modules -->
- [#42](https://github.com/nf-core/proteinannotator/pull/42) - Updated to `nf-test` on GitHub Actions and in the `PULL_REQUEST_TEMPLATE.md`
- [#13](https://github.com/nf-core/proteinannotator/pull/13) - Add nf-core seqkit/stats module
- [#9](https://github.com/nf-core/proteinannotator/pull/9) - Add [InterProScan](https://interproscan-docs.readthedocs.io/) module

### `Fixed`

16 changes: 12 additions & 4 deletions CITATIONS.md
@@ -10,21 +10,29 @@

## Pipeline tools

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
- [SeqFu](https://pubmed.ncbi.nlm.nih.gov/34066939/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Telatin A, Fariselli P, Birolo G. SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering. 2021 May 7;8(5):59. doi: 10.3390/bioengineering8050059. PubMed PMID: 34066939; PubMed Central PMCID: PMC8148589.

- [SeqKit](https://pubmed.ncbi.nlm.nih.gov/38898985/)

> Shen W, Sipos B, Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta. 2024 Apr 5:e191. doi: 10.1002/imt2.191. PubMed PMID: 38898985; PubMed Central PMCID: PMC11183193.

- [InterProScan](https://academic.oup.com/bioinformatics/article/17/9/847/206564)

> Zdobnov, Evgeni M., and Rolf Apweiler. “InterProScan – an Integration Platform for the Signature-Recognition Methods in InterPro.” Bioinformatics 17, no. 9 (September 1, 2001): 847–48. https://doi.org/10.1093/bioinformatics/17.9.847.

- [MMseqs2](https://www.nature.com/articles/nbt.3988)
<!-- - [MMseqs2](https://www.nature.com/articles/nbt.3988)

> Steinegger, Martin, and Johannes Söding. “MMseqs2 Enables Sensitive Protein Sequence Searching for the Analysis of Massive Data Sets.” Nature Biotechnology 35, no. 11 (November 2017): 1026–28. https://doi.org/10.1038/nbt.3988.

- [mTM-align](https://academic.oup.com/bioinformatics/article/34/10/1719/4769500)

> Dong, Runze, Zhenling Peng, Yang Zhang, and Jianyi Yang. “mTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment.” Bioinformatics 34, no. 10 (May 15, 2018): 1719–25. https://doi.org/10.1093/bioinformatics/btx828.
> Dong, Runze, Zhenling Peng, Yang Zhang, and Jianyi Yang. “mTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment.” Bioinformatics 34, no. 10 (May 15, 2018): 1719–25. https://doi.org/10.1093/bioinformatics/btx828. -->

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

1 change: 0 additions & 1 deletion assets/methods_description_template.yml
@@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "nf-core/proteinannotator Methods Description"
section_href: "https://github.com/nf-core/proteinannotator"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You can inject any metadata from the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
59 changes: 56 additions & 3 deletions conf/modules.config
@@ -18,6 +18,62 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FAA_SEQFU_SEQKIT:SEQFU_STATS_BEFORE' {
        ext.prefix = { "${meta.id}_before" }
        publishDir = [
            path: { "${params.outdir}/qc/${meta.id}/" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FAA_SEQFU_SEQKIT:SEQFU_STATS_AFTER' {
        ext.prefix = { "${meta.id}_after" }
        publishDir = [
            path: { "${params.outdir}/qc/${meta.id}/" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FAA_SEQFU_SEQKIT:SEQKIT_SEQ' {
        ext.args = [
            "--remove-gaps",
            "--upper-case",
            "--validate-seq",
            "--min-len ${params.min_seq_length}",
            "--max-len ${params.max_seq_length}"
        ].join(' ').trim()
        ext.prefix = "intermediate_seqkit_seq"
        publishDir = [
            path: { "${params.outdir}/qc/${meta.id}/" },
            mode: params.publish_dir_mode,
            enabled: false,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FAA_SEQFU_SEQKIT:SEQKIT_REPLACE' {
        ext.args = '-p "/" -r "_"'
        ext.suffix = "fasta"
        ext.prefix = "intermediate_seqkit_replace"
        publishDir = [
            path: { "${params.outdir}/qc/${meta.id}/" },
            mode: params.publish_dir_mode,
            enabled: false,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FAA_SEQFU_SEQKIT:SEQKIT_RMDUP' {
        ext.args = { params.remove_duplicates_on_sequence ? "--by-seq" : '' }
        publishDir = [
            path: { "${params.outdir}/qc/${meta.id}/" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

withName: 'MULTIQC' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
@@ -27,7 +83,4 @@ process {
]
}

withName: SEQKIT_STATS {
ext.args = ' ' // turn off --all default argument
}
}
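
The new `conf/modules.config` selectors do three things: assemble a fixed argument string for `SEQKIT_SEQ`, toggle `--by-seq` for `SEQKIT_RMDUP`, and keep `versions.yml` out of the published results. The following is a minimal Python sketch of that logic only, not code from the pipeline. The parameter names (`min_seq_length`, `max_seq_length`, `remove_duplicates_on_sequence`) are taken from the config above; the function names and the example values are hypothetical.

```python
def seqkit_seq_args(min_seq_length: int, max_seq_length: int) -> str:
    """Mirror the list-join that builds ext.args for SEQKIT_SEQ."""
    return " ".join([
        "--remove-gaps",
        "--upper-case",
        "--validate-seq",
        f"--min-len {min_seq_length}",
        f"--max-len {max_seq_length}",
    ]).strip()


def seqkit_rmdup_args(remove_duplicates_on_sequence: bool) -> str:
    """Mirror the ext.args closure for SEQKIT_RMDUP."""
    return "--by-seq" if remove_duplicates_on_sequence else ""


def save_as(filename: str):
    """Mirror the saveAs closure: returning None means 'do not publish'."""
    return None if filename == "versions.yml" else filename


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(seqkit_seq_args(30, 5000))
    print(repr(seqkit_rmdup_args(False)))
    print(save_as("versions.yml"), save_as("sample_before.tsv"))
```

Note that `SEQKIT_SEQ` and `SEQKIT_REPLACE` additionally set `enabled: false` in `publishDir`, so their intermediate files are not published regardless of what `saveAs` returns.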
41 changes: 40 additions & 1 deletion docs/output.md
@@ -12,12 +12,51 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [Quality check and preprocessing](#quality-check-and-preprocessing)
- [SeqFu](#seqfu) for quality checking (QC) of the input amino acid sequences
- [SeqKit](#seqkit) for preprocessing the input amino acid sequences (i.e., gap removal, conversion to upper case, validation, length filtering, replacement of special characters such as `/`, and removal of duplicate sequences)

- [Functional Annotation](#functional-annotation) - Annotate proteins with functional domains
- [InterProScan](#interproscan) - Search the InterPro database for functional domains
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [SeqKit stats](#seqkit_stats) - Simple statistics for protein FASTA files
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### Quality check and preprocessing

#### SeqFu

<details markdown="1">
<summary>Output files</summary>

- `qc/`
- `<samplename>/`
- `<samplename>_before.tsv`: Statistics for the input amino acid sequences before preprocessing
- `<samplename>_before_mqc.txt`: Statistics for the input amino acid sequences in MultiQC-ready format before preprocessing
- `<samplename>_after.tsv`: (optional) Statistics for the input amino acid sequences after preprocessing
- `<samplename>_after_mqc.txt`: (optional) Statistics for the input amino acid sequences in MultiQC-ready format after preprocessing
- `<samplename>.log`: (optional) Output file with count of duplicate sequences that were found and removed

</details>

The `seqfu` module generates statistics for the input amino acid sequences, both before and after preprocessing.

[SeqFu](https://github.com/telatin/seqfu2) is a cross-platform compiled suite of tools to manipulate and inspect `FASTA` and `FASTQ` files.
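
To make the "before" and "after" statistics more concrete, here is a minimal Python sketch of the kind of per-file summary such a step produces. It is an illustration only and does not reproduce SeqFu's implementation or its exact output columns; the function name `sequence_stats` and the choice of fields (count, total length, average, N50, minimum, maximum) are assumptions made for this example.

```python
def sequence_stats(lengths):
    """Summarise a collection of sequence lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    # N50: walking from the longest sequence to the shortest, the length
    # at which the running sum first reaches half of the total length.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "count": len(ordered),
        "total": total,
        "avg": total / len(ordered),
        "n50": n50,
        "min": ordered[-1],
        "max": ordered[0],
    }


if __name__ == "__main__":
    print(sequence_stats([10, 20, 30, 40]))
```

Comparing the summary computed on the raw input with the one computed on the preprocessed file shows at a glance how many sequences the length filter and duplicate removal discarded.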

#### SeqKit

<details markdown="1">
<summary>Output files</summary>

- `qc/`
- `<samplename>/`
- `<samplename>.<suffix>`: Preprocessed input FASTA file

</details>

The `seqkit` module performs the initial preprocessing of the input amino acid sequences (i.e., gap removal, conversion to upper case, validation, length filtering, replacement of special characters such as `/`, and removal of duplicate sequences).

[SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
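
The preprocessing steps listed above can be sketched in plain Python to show what each one does to a record. This is a simplified illustration and not what the pipeline runs (the pipeline calls SeqKit). The order of the steps is assumed from the order of the SeqKit modules in `conf/modules.config`; the function name `preprocess`, the set of gap characters, and the accepted residue alphabet are assumptions for this example.

```python
import re

# Assumption: accept any letter plus '*' (stop) as a valid residue code.
VALID_RESIDUES = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*")


def preprocess(records, min_len, max_len, by_seq=True):
    """Clean (seq_id, sequence) pairs; returns a list of cleaned pairs."""
    seen = set()
    cleaned = []
    for seq_id, seq in records:
        # Gap removal and conversion to upper case.
        seq = re.sub(r"[-.\s]", "", seq).upper()

        # Validation: fail on any character outside the assumed alphabet.
        invalid = set(seq) - VALID_RESIDUES
        if invalid:
            raise ValueError(f"invalid residues in {seq_id!r}: {sorted(invalid)}")

        # Length filtering.
        if not (min_len <= len(seq) <= max_len):
            continue

        # Replace the special character '/' in the identifier.
        seq_id = seq_id.replace("/", "_")

        # Duplicate removal, keyed on the sequence or on the identifier.
        key = seq if by_seq else seq_id
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((seq_id, seq))
    return cleaned


if __name__ == "__main__":
    demo = [("sp/P1", "mk-tay"), ("sp/P2", "MKTAY"), ("sp/P3", "MK")]
    print(preprocess(demo, min_len=3, max_len=8))
```

With `by_seq=True` the second record is dropped because its cleaned sequence matches the first; with `by_seq=False` both are kept because their identifiers differ.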

### Functional Annotation

#### InterProScan
3 changes: 2 additions & 1 deletion main.nf
@@ -38,7 +38,8 @@ workflow NFCORE_PROTEINANNOTATOR {
// WORKFLOW: Run pipeline
//
PROTEINANNOTATOR (
samplesheet
samplesheet,
params.skip_preprocessing
)
emit:
multiqc_report = PROTEINANNOTATOR.out.multiqc_report // channel: /path/to/multiqc_report.html
25 changes: 25 additions & 0 deletions modules.json
@@ -20,6 +20,26 @@
"git_sha": "af27af1be706e6a2bb8fe454175b0cdf77f47b49",
"installed_by": ["modules"]
},
"seqfu/stats": {
"branch": "master",
"git_sha": "e753770db613ce014b3c4bc94f6cba443427b726",
"installed_by": ["faa_seqfu_seqkit"]
},
"seqkit/replace": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
"installed_by": ["faa_seqfu_seqkit"]
},
"seqkit/rmdup": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
"installed_by": ["faa_seqfu_seqkit"]
},
"seqkit/seq": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
"installed_by": ["faa_seqfu_seqkit"]
},
"seqkit/stats": {
"branch": "master",
"git_sha": "81880787133db07d9b4c1febd152c090eb8325dc",
@@ -34,6 +54,11 @@
},
"subworkflows": {
"nf-core": {
"faa_seqfu_seqkit": {
"branch": "master",
"git_sha": "15c0a7968179d3b717a9973a1c4f25beb8a9aa2b",
"installed_by": ["subworkflows"]
},
"utils_nextflow_pipeline": {
"branch": "master",
"git_sha": "05954dab2ff481bcb999f24455da29a5828af08d",
7 changes: 7 additions & 0 deletions modules/nf-core/seqfu/stats/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

52 changes: 52 additions & 0 deletions modules/nf-core/seqfu/stats/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

67 changes: 67 additions & 0 deletions modules/nf-core/seqfu/stats/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
