Merged
2 changes: 1 addition & 1 deletion .nf-core.yml
@@ -11,7 +11,7 @@ lint:
nf_core_version: 3.5.1
repository_type: pipeline
template:
author: Olga Botvinnik
author: Olga Botvinnik, Evangelos Karatzas
description: Generation of sequence-level annotations for amino acid sequences
version: 1.0.0
force: true
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0.0 - Yellow Saiga - [2026/02/04]
## v1.0.0 - Yellow Saiga - [2026/02/09]

Initial release of nf-core/proteinannotator, created with the [nf-core](https://nf-co.re/) template.

8 changes: 5 additions & 3 deletions README.md
@@ -21,7 +21,7 @@

## Introduction

**nf-core/proteinannotator** is a bioinformatics pipeline that computes statistics on input protein FASTA files and identifies protein annotations such as conserved domains, predicted functions, and secondary structure features based on sequence data.
**nf-core/proteinannotator** is a bioinformatics pipeline that computes statistics for protein FASTA inputs and produces protein annotations based on predicted sequence features, including conserved domains, functions, and secondary structure.

<p>
<picture>
@@ -82,11 +82,13 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/proteinannotator was originally written by Olga Botvinnik.
nf-core/proteinannotator was originally written by Olga Botvinnik and Evangelos Karatzas.

We thank the following people for their extensive assistance in the development of this pipeline:

- [Evangelos Karatzas](https://github.com/vagkaratzas)
- [Michael L Heuer](https://github.com/heuermh)
- [Edmund Miller](https://github.com/edmundmiller)
- [Eric Wei](https://github.com/eweizy)
- [Martin Beracochea](https://github.com/mberacochea)

## Contributions and Support
5 changes: 0 additions & 5 deletions docs/output.md
@@ -13,18 +13,13 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Quality control and preprocessing](#quality-control-and-preprocessing)
- [SeqFu](#seqfu) for input amino acid sequences quality control (QC)
- [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)

- [Database download](#database-download) Optionally download selected databases for annotation.
- [aria2](#aria2) - To optionally download the Pfam, FunFam, and/or InterProScan databases through the pipeline.

- [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
- [hmmer](#hmmer) - To optionally match the input sequence to known Pfam and/or FunFam domains through `hmmer/hmmsearch`

- [Functional annotation](#functional-annotation) Annotate proteins with functional domains
- [InterProScan](#Interproscan) - Search the InterProScan database for functional domains

- [s4pred](#s4pred) - Predict secondary structures of sequences, producing amino acid level probabilities of forming an α-helix, a β-strand or a coil.

- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

4 changes: 2 additions & 2 deletions docs/usage.md
@@ -80,10 +80,10 @@ You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-c

### InterProScan

[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow without `--skip_interproscan` will download and unzip the InterPro database. The database will then be saved in the output directory `<output_directory>/downloaded_dbs/interproscan_db/`.
[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow without `--skip_interproscan` will download and unzip the InterPro database. The database will then be saved in the output directory `<output_directory>/downloaded_dbs/interproscan_db/`. We recommend keeping a copy of this directory for future reuse in case the results folder is deleted.

:::note
The huge database download (5.5GB) can take up to 4 hours depending on the bandwidth.
The large database download (5.5GB) can take up to 4 hours depending on the bandwidth.
:::

A local version of the database can be supplied to the pipeline by passing the InterProScan database directory to `--interproscan_db <path/to/downloaded-untarred-interproscan_db-dir/>`. The directory can be created by running (e.g. for database version 5.72-103.0):
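The recommendation in this hunk to keep a copy of `<output_directory>/downloaded_dbs/interproscan_db/` could be scripted along the following lines. This is only a sketch and is not the database-creation command that the truncated context line refers to. The function name `backup_interproscan_db` and the `backup_root` argument are hypothetical; only the `downloaded_dbs/interproscan_db/` path comes from the documentation text.

```python
import shutil
from pathlib import Path


def backup_interproscan_db(output_directory: str, backup_root: str) -> Path:
    """Copy the downloaded InterProScan database out of the results folder.

    Hypothetical helper: the pipeline itself does not provide this function.
    """
    # Location described in docs/usage.md, relative to the pipeline output directory.
    source = Path(output_directory) / "downloaded_dbs" / "interproscan_db"
    if not source.is_dir():
        raise FileNotFoundError(f"No InterProScan database found at {source}")

    destination = Path(backup_root) / "interproscan_db"
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run over an existing backup.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination
```

The returned directory is the kind of path that the documentation says can then be passed to `--interproscan_db` on later runs.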
8 changes: 4 additions & 4 deletions nextflow_schema.json
@@ -220,14 +220,14 @@
"default": 30,
"fa_icon": "fas fa-ruler-horizontal",
"description": "The minimum allowed sequence length",
"help_text": "Specify the minimum length of amino acid sequences that go into clustering."
"help_text": "Specify the minimum length of amino acid sequences that go into clustering. Modifies the --min-len parameter of seqkit seq."
},
"max_seq_length": {
"type": "integer",
"default": 5000,
"fa_icon": "fas fa-ruler-horizontal",
"description": "The maximum allowed sequence length",
"help_text": "Specify the maximum length of amino acid sequences that go into clustering."
"help_text": "Specify the maximum length of amino acid sequences that go into clustering. Modifies the --max-len parameter of seqkit seq"
},
"remove_duplicates_on_sequence": {
"type": "boolean",
@@ -279,7 +279,7 @@
"hmmsearch_evalue_cutoff": {
"type": "number",
"default": 0.001,
"description": "hmmsearch e-value cutoff threshold for reported results"
"description": "hmmsearch e-value cutoff threshold for reported results. Modifies the -E parameter of hmmsearch."
}
}
},
@@ -339,7 +339,7 @@
"s4pred_outfmt": {
"type": "string",
"default": "ss2",
"description": "Choose the output format (i.e., 'ss2', 'fas', 'horiz') for the s4pred per amino acid probability predictions (i.e., α-helix, β-strand, coil).",
"description": "Choose the output format (i.e., 'ss2', 'fas', 'horiz') for the s4pred per amino acid probability predictions (i.e., α-helix, β-strand, coil). Modifies the --outfmt parameter of s4pred run_model.",
"help_text": "ss2 is the default and it corresponds to the PSIPRED vertical format (PSIPRED VFORMAT). The fas output returns the sequence FASTA file with the predicted secondary structure concatenated on a second line. The horiz option outputs the results in the PSIPRED horizontal format (PSIPRED HFORMAT).",
"enum": ["ss2", "fas", "horiz"]
}
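As a rough illustration of the parameter-to-flag mapping that the updated schema descriptions spell out, the sketch below builds argument lists in Python. The function name `build_tool_args` and the dictionary layout are hypothetical and are not taken from the pipeline, which is written in Nextflow and may wire these options differently. Only the flag names (`--min-len`, `--max-len`, `-E`, `--outfmt`), the defaults, and the `enum` values come from the schema shown in this diff.

```python
from typing import Dict, List

# Allowed values copied from the "enum" of s4pred_outfmt in nextflow_schema.json.
S4PRED_OUTFMT_CHOICES = ("ss2", "fas", "horiz")


def build_tool_args(params: Dict[str, object]) -> Dict[str, List[str]]:
    """Translate pipeline parameters into the tool flags named in the schema.

    Hypothetical helper for illustration only; missing keys fall back to the
    schema defaults (30, 5000, 0.001, "ss2").
    """
    outfmt = str(params.get("s4pred_outfmt", "ss2"))
    if outfmt not in S4PRED_OUTFMT_CHOICES:
        raise ValueError(f"s4pred_outfmt must be one of {S4PRED_OUTFMT_CHOICES}")

    return {
        # min_seq_length / max_seq_length -> seqkit seq --min-len / --max-len
        "seqkit seq": [
            "--min-len", str(params.get("min_seq_length", 30)),
            "--max-len", str(params.get("max_seq_length", 5000)),
        ],
        # hmmsearch_evalue_cutoff -> hmmsearch -E
        "hmmsearch": ["-E", str(params.get("hmmsearch_evalue_cutoff", 0.001))],
        # s4pred_outfmt -> s4pred run_model --outfmt
        "s4pred run_model": ["--outfmt", outfmt],
    }
```

Calling `build_tool_args({})` returns the schema defaults, for example `['--min-len', '30', '--max-len', '5000']` for `seqkit seq`, while an unsupported `s4pred_outfmt` value raises `ValueError`.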
18 changes: 10 additions & 8 deletions ro-crate-metadata.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions subworkflows/local/domain_annotation/meta.yml
@@ -17,6 +17,7 @@ input:
type: file
description: |
Amino acid fasta file containing amino acid sequences for annotation
Structure: [ val(meta), [ path(fasta) ] ]
- skip_pfam:
type: boolean
description: |