
Commit bf1bee7

Merge pull request #74 from vagkaratzas/changes-second-review
second reviewer comments resolve
2 parents 26f4d3d + 903d955 commit bf1bee7

8 files changed

Lines changed: 24 additions & 24 deletions


.nf-core.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ lint:
 nf_core_version: 3.5.1
 repository_type: pipeline
 template:
-author: Olga Botvinnik
+author: Olga Botvinnik, Evangelos Karatzas
 description: Generation of sequence-level annotations for amino acid sequences
 version: 1.0.0
 force: true

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v1.0.0 - Yellow Saiga - [2026/02/04]
+## v1.0.0 - Yellow Saiga - [2026/02/09]
 
 Initial release of nf-core/proteinannotator, created with the [nf-core](https://nf-co.re/) template.
 

README.md

Lines changed: 5 additions & 3 deletions
@@ -21,7 +21,7 @@
 
 ## Introduction
 
-**nf-core/proteinannotator** is a bioinformatics pipeline that computes statistics on input protein FASTA files and identifies protein annotations such as conserved domains, predicted functions, and secondary structure features based on sequence data.
+**nf-core/proteinannotator** is a bioinformatics pipeline that computes statistics for protein FASTA inputs and produces protein annotations based on predicted sequence features, including conserved domains, functions, and secondary structure.
 
 <p>
 <picture>
@@ -82,11 +82,13 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/proteinannotator was originally written by Olga Botvinnik.
+nf-core/proteinannotator was originally written by Olga Botvinnik and Evangelos Karatzas.
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-- [Evangelos Karatzas](https://github.com/vagkaratzas)
+- [Michael L Heuer](https://github.com/heuermh)
+- [Edmund Miller](https://github.com/edmundmiller)
+- [Eric Wei](https://github.com/eweizy)
 - [Martin Beracochea](https://github.com/mberacochea)
 
 ## Contributions and Support

docs/output.md

Lines changed: 0 additions & 5 deletions
@@ -13,18 +13,13 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [Quality control and preprocessing](#quality-control-and-preprocessing)
 - [SeqFu](#seqfu) for input amino acid sequences quality control (QC)
 - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)
-
 - [Database download](#database-download) Optionally download selected databases for annotation.
 - [aria2](#aria2) - To optionally download the Pfam, FunFam, and/or InterProScan databases through the pipeline.
-
 - [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
 - [hmmer](#hmmer) - To optionally match the input sequence to known Pfam and/or FunFam domains through `hmmer/hmmsearch`
-
 - [Functional annotation](#functional-annotation) Annotate proteins with functional domains
 - [InterProScan](#Interproscan) - Search the InterProScan database for functional domains
-
 - [s4pred](#s4pred) - Predict secondary structures of sequences, producing amino acid level probabilities of forming an α-helix, a β-strand or a coil.
-
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 

docs/usage.md

Lines changed: 2 additions & 2 deletions
@@ -80,10 +80,10 @@ You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-c
 
 ### InterProScan
 
-[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow without `--skip_interproscan` will download and unzip the InterPro database. The database will then be saved in the output directory `<output_directory>/downloaded_dbs/interproscan_db/`.
+[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow without `--skip_interproscan` will download and unzip the InterPro database. The database will then be saved in the output directory `<output_directory>/downloaded_dbs/interproscan_db/`. We recommend keeping a copy of this directory for future reuse in case the results folder is deleted.
 
 :::note
-The huge database download (5.5GB) can take up to 4 hours depending on the bandwidth.
+The large database download (5.5GB) can take up to 4 hours depending on the bandwidth.
 :::
 
 A local version of the database can be supplied to the pipeline by passing the InterProScan database directory to `--interproscan_db <path/to/downloaded-untarred-interproscan_db-dir/>`. The directory can be created by running (e.g. for database version 5.72-103.0):

nextflow_schema.json

Lines changed: 4 additions & 4 deletions
@@ -220,14 +220,14 @@
 "default": 30,
 "fa_icon": "fas fa-ruler-horizontal",
 "description": "The minimum allowed sequence length",
-"help_text": "Specify the minimum length of amino acid sequences that go into clustering."
+"help_text": "Specify the minimum length of amino acid sequences that go into clustering. Modifies the --min-len parameter of seqkit seq."
 },
 "max_seq_length": {
 "type": "integer",
 "default": 5000,
 "fa_icon": "fas fa-ruler-horizontal",
 "description": "The maximum allowed sequence length",
-"help_text": "Specify the maximum length of amino acid sequences that go into clustering."
+"help_text": "Specify the maximum length of amino acid sequences that go into clustering. Modifies the --max-len parameter of seqkit seq"
 },
 "remove_duplicates_on_sequence": {
 "type": "boolean",
@@ -279,7 +279,7 @@
 "hmmsearch_evalue_cutoff": {
 "type": "number",
 "default": 0.001,
-"description": "hmmsearch e-value cutoff threshold for reported results"
+"description": "hmmsearch e-value cutoff threshold for reported results. Modifies the -E parameter of hmmsearch."
 }
 }
 },
@@ -339,7 +339,7 @@
 "s4pred_outfmt": {
 "type": "string",
 "default": "ss2",
-"description": "Choose the output format (i.e., 'ss2', 'fas', 'horiz') for the s4pred per amino acid probability predictions (i.e., α-helix, β-strand, coil).",
+"description": "Choose the output format (i.e., 'ss2', 'fas', 'horiz') for the s4pred per amino acid probability predictions (i.e., α-helix, β-strand, coil). Modifies the --outfmt parameter of s4pred run_model.",
 "help_text": "ss2 is the default and it corresponds to the PSIPRED vertical format (PSIPRED VFORMAT). The fas output returns the sequence FASTA file with the predicted secondary structure concatenated on a second line. The horiz option outputs the results in the PSIPRED horizontal format (PSIPRED HFORMAT).",
 "enum": ["ss2", "fas", "horiz"]
 }
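
The updated schema texts in this file name the tool flag that each pipeline parameter modifies. As a rough illustration only (this is not code from the pipeline), that mapping can be sketched in Python. The helper `tool_args` and the tables `DEFAULTS` and `FLAG_MAP` are hypothetical names made up for this sketch, and the key `min_seq_length` is an assumption, since the hunk shows only the body of the minimum-length parameter and not its name. The flags and default values are the ones quoted in the schema text.

```python
from typing import Dict, List, Optional, Tuple, Union

ParamValue = Union[int, float, str]

# Defaults as quoted in the nextflow_schema.json hunks.
# ASSUMPTION: "min_seq_length" is a guessed key; the diff shows only
# the body of that parameter, not its name.
DEFAULTS: Dict[str, ParamValue] = {
    "min_seq_length": 30,
    "max_seq_length": 5000,
    "hmmsearch_evalue_cutoff": 0.001,
    "s4pred_outfmt": "ss2",
}

# Pipeline parameter -> (tool, flag), following the new description texts.
FLAG_MAP: Dict[str, Tuple[str, str]] = {
    "min_seq_length": ("seqkit seq", "--min-len"),
    "max_seq_length": ("seqkit seq", "--max-len"),
    "hmmsearch_evalue_cutoff": ("hmmsearch", "-E"),
    "s4pred_outfmt": ("s4pred run_model", "--outfmt"),
}


def tool_args(params: Optional[Dict[str, ParamValue]] = None) -> Dict[str, List[str]]:
    """Group the flag/value pairs by tool, with user values overriding defaults."""
    merged = {**DEFAULTS, **(params or {})}
    args: Dict[str, List[str]] = {}
    for name, (tool, flag) in FLAG_MAP.items():
        args.setdefault(tool, []).extend([flag, str(merged[name])])
    return args


if __name__ == "__main__":
    # Print the flags each tool would receive for a stricter e-value cutoff.
    for tool, flags in tool_args({"hmmsearch_evalue_cutoff": 1e-5}).items():
        print(tool, " ".join(flags))
```

The sketch only assembles flag lists; how the pipeline actually passes these values to each module is not shown in this commit.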

ro-crate-metadata.json

Lines changed: 10 additions & 8 deletions
Large diffs are not rendered by default.

subworkflows/local/domain_annotation/meta.yml

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ input:
 type: file
 description: |
 Amino acid fasta file containing amino acid sequences for annotation
+Structure: [ val(meta), [ path(fasta) ] ]
 - skip_pfam:
 type: boolean
 description: |
