Commit 7c36646

Merge branch 'dev' into nf-core-template-merge-4.0.2
2 parents 5cabc5a + 96885a2 commit 7c36646

102 files changed

Lines changed: 6930 additions & 273 deletions

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 2 additions & 2 deletions
@@ -18,8 +18,8 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/prot
 - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/proteinannotator/tree/master/docs/CONTRIBUTING.md)
 - [ ] If necessary, also make a PR on the nf-core/proteinannotator _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
 - [ ] Make sure your code lints (`nf-core pipelines lint`).
-- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
-- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
+- [ ] Ensure the test suite passes (e.g. `nf-test test */local --profile=~test,docker` for all new local tests).
+- [ ] Check for unexpected warnings in debug mode (`nf-test test */local --profile=~test,docker,debug`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.
 - [ ] Output Documentation in `docs/output.md` is updated.
 - [ ] `CHANGELOG.md` is updated.

.github/actions/nf-test/action.yml

Lines changed: 12 additions & 0 deletions
@@ -19,6 +19,18 @@ inputs:
 runs:
   using: "composite"
   steps:
+    - name: Print Modules Folder Tree
+      uses: jaywcjlove/github-action-folder-tree@main
+      with:
+        exclude: "node_modules|dist|.git|.husky"
+        path: ./modules
+        depth: 10
+    - name: Print Subworkflows Folder Tree
+      uses: jaywcjlove/github-action-folder-tree@main
+      with:
+        exclude: "node_modules|dist|.git|.husky"
+        path: ./subworkflows
+        depth: 10
     - name: Setup Nextflow
       uses: nf-core/setup-nextflow@b4ec1bc7c16a94435159de94a05253542fddf6ef # v3
       with:

.github/workflows/awsfulltest.yml

Lines changed: 1 addition & 4 deletions
@@ -20,13 +20,10 @@ jobs:
       - name: Set revision variable
         id: revision
         run: |
-          echo "revision=${{ (github.event_name == 'workflow_dispatch' || github.event_name == 'release') && github.sha || 'dev' }}" >> "$GITHUB_OUTPUT"
+          echo "revision={%- raw -%}${{ (github.event_name == 'workflow_dispatch' || github.event_name == 'release') && github.sha || 'dev' }}" >> "$GITHUB_OUTPUT"
 
       - name: Launch workflow via Seqera Platform
         uses: seqeralabs/action-tower-launch@51565b514bff1827cf34620de25d0055759f1fc9 # v2
-        # TODO nf-core: You can customise AWS full pipeline tests as required
-        # Add full size test data (but still relatively small datasets for few samples)
-        # on the `test_full.config` test runs with only one set of parameters
         with:
           workspace_id: ${{ vars.TOWER_WORKSPACE_ID }}
           access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
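
The `revision` expression in the hunk above uses the `A && B || C` idiom, which GitHub Actions evaluates much like Python's `and`/`or` (both return one of their operands). As a minimal sketch of that selection logic only, with the function name `resolve_revision` being a hypothetical helper and the `{%- raw -%}` prefix on the `+` line left aside:

```python
def resolve_revision(event_name: str, sha: str) -> str:
    """Pick the revision to launch, mirroring the `A && B || C` expression.

    Hypothetical helper for illustration; it is not part of the workflow.
    """
    # Manual dispatches and releases run the exact commit that triggered them.
    is_pinned_event = event_name in ("workflow_dispatch", "release")
    # `and`/`or` return operands, so an empty sha also falls back to "dev".
    return (is_pinned_event and sha) or "dev"


if __name__ == "__main__":
    print(resolve_revision("release", "7c36646"))
    print(resolve_revision("push", "7c36646"))
```

Under this reading, any other event (for example a push) launches the `dev` revision rather than the triggering commit.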

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -8,3 +8,8 @@ testing*
 *.pyc
 null/
 .lineage/
+# Nextflow nf-tests output
+.nf-test.log
+.nf-test/tests
+.nf-test-*.nf
+.nf-test/*
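
To get a feel for which paths the new ignore entries cover, the sketch below approximates them with Python's `fnmatch`. This is an assumption-laden simplification: real gitignore matching has more rules (negation, anchoring, `**`), and `is_ignored` is a hypothetical helper, not something Git or nf-test provides.

```python
from fnmatch import fnmatchcase

# The four patterns added in this commit.
NEW_PATTERNS = [".nf-test.log", ".nf-test/tests", ".nf-test-*.nf", ".nf-test/*"]


def is_ignored(path: str) -> bool:
    """Rough approximation of gitignore matching for the patterns above.

    Patterns without a slash are compared to each path component (any depth);
    patterns with a slash are compared to each leading prefix of the path,
    so everything below an ignored directory counts as ignored too.
    """
    parts = path.split("/")
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        name = parts[i - 1]
        for pattern in NEW_PATTERNS:
            target = prefix if "/" in pattern else name
            if fnmatchcase(target, pattern):
                return True
    return False


if __name__ == "__main__":
    for candidate in [".nf-test.log", ".nf-test/tests/abc", "tests/default.nf.test"]:
        print(candidate, is_ignored(candidate))
```

In this approximation `.nf-test/*` already covers `.nf-test/tests`, so the two entries appear to overlap.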

.nf-core.yml

Lines changed: 1 addition & 1 deletion
@@ -21,4 +21,4 @@ template:
   skip_features:
     - fastqc
     - igenomes
-  version: 1.1.0dev
+  version: 1.1.0dev

.prettierignore

Lines changed: 2 additions & 0 deletions
@@ -12,3 +12,5 @@ bin/
 ro-crate-metadata.json
 modules/nf-core/
 subworkflows/nf-core/
+**/Makefile
+Makefile

CHANGELOG.md

Lines changed: 24 additions & 5 deletions
@@ -5,12 +5,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## v1.1.0dev - [date]
 
-Initial release of nf-core/proteinannotator, created with the [nf-core](https://nf-co.re/) template.
-
 ### `Added`
 
-### `Fixed`
+- [#90](https://github.com/nf-core/proteinannotator/pull/90) - Added the option to download and use the latest `metagRoot` HMM library (or use path to an existing one) for domain annotation. (by @angelphanth)
+- [#87](https://github.com/nf-core/proteinannotator/pull/87) - Added the option to download and use the latest `NMPFams` HMM library (or use path to an existing one) for domain annotation. (by @npechl)
+- [#85](https://github.com/nf-core/proteinannotator/pull/85) - Added zenodo doi in `nextflow.config`. (by @vagkaratzas)
+
+### `Changed`
 
-### `Dependencies`
+- [#85](https://github.com/nf-core/proteinannotator/pull/85) - `test_full.config` input samplesheet path is now set properly. (by @vagkaratzas)
+
+## v1.0.0 - Yellow Saiga - [2026/02/09]
+
+Initial release of nf-core/proteinannotator, created with the [nf-core](https://nf-co.re/) template.
 
-### `Deprecated`
+- [#68](https://github.com/nf-core/proteinannotator/pull/68) - Using the `ARIA2` and `UNTAR` nf-core modules to download and decompress the InterProScan database. (by @vagkaratzas)
+- [#67](https://github.com/nf-core/proteinannotator/pull/67) - Swapped to the updated, non-buggy, nf-core version of `INTERPROSCAN`. (by @vagkaratzas)
+- [#65](https://github.com/nf-core/proteinannotator/pull/65) - Converted the pipeline schematic to nf-core metromap. (by @vagkaratzas)
+- [#62](https://github.com/nf-core/proteinannotator/pull/62) - Added the option to download and use the latest FunFam HMM library (or use path to an existing one) for domain annotation. (by @vagkaratzas)
+- [#61](https://github.com/nf-core/proteinannotator/pull/61) - Added nf-core modules `ARIA2` and `HMMER_HMMSEARCH` to download latest Pfam HMM library (or use path to existing one) and match domains to input sequences. (by @vagkaratzas)
+- [#60](https://github.com/nf-core/proteinannotator/pull/60) - Added nf-core module `S4PRED_RUNMODEL` for secondary structure prediction (i.e., α-helix, a β-strand or a coil). (by @vagkaratzas)
+- [#59](https://github.com/nf-core/proteinannotator/pull/59) - Added nf-core qc and pre-processing subworkflow for amino acid sequences `FAA_SEQFU_SEQKIT`. (by @vagkaratzas)
+- [#57](https://github.com/nf-core/proteinannotator/pull/57) - nf-core tools template update to 3.5.1. (by @vagkaratzas)
+- [#52](https://github.com/nf-core/proteinannotator/pull/52) - Add option to turn off InterProScan for testing. (by @edmundmiller, @olgabot)
+- [#51](https://github.com/nf-core/proteinannotator/pull/51) - Update to nf-core/tools v3.3.1. (by @olgabot)
+- [#47](https://github.com/nf-core/proteinannotator/pull/47) - Update metromap with more tools added from [May 2025 Hackathon](https://nf-co.re/events/2025/hackathon-boston). (by @olgabot)
+- [#42](https://github.com/nf-core/proteinannotator/pull/42) - Updated to `nf-test` on GitHub Actions and in the `PULL_REQUEST_TEMPLATE.md`. (by @olgabot)
+- [#13](https://github.com/nf-core/proteinannotator/pull/13) - Add nf-core seqkit/stats module. (by @olgabot, @heuermh)
+- [#9](https://github.com/nf-core/proteinannotator/pull/9) - Added [InterProScan](https://interproscan-docs.readthedocs.io/) module - local version. (by @olgabot, @heuermh, @eweizy)

CITATIONS.md

Lines changed: 20 additions & 0 deletions
@@ -10,6 +10,26 @@
 
 ## Pipeline tools
 
+- [SeqFu](https://pubmed.ncbi.nlm.nih.gov/34066939/)
+
+  > Telatin A, Fariselli P, Birolo G. SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering. 2021 May 7;8(5):59. doi: 10.3390/bioengineering8050059. PubMed PMID: 34066939; PubMed Central PMCID: PMC8148589.
+
+- [SeqKit](https://pubmed.ncbi.nlm.nih.gov/38898985/)
+
+  > Shen W, Sipos B, Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta. 2024 Apr 5:e191. doi: 10.1002/imt2.191. PubMed PMID: 38898985; PubMed Central PMCID: PMC11183193.
+
+- [hmmer](https://pubmed.ncbi.nlm.nih.gov/29905871/)
+
+  > Eddy SR. Accelerated profile HMM searches. PLoS computational biology. 2011 Oct 20;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. PubMed PMID: 22039361; PubMed Central PMCID: PMC3197634.
+
+- [InterProScan](https://academic.oup.com/bioinformatics/article/17/9/847/206564)
+
+  > Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn, A. F, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014 May 1;30(9):1236-40. doi: 10.1093/bioinformatics/btu031. PubMed PMID: 24451626; PubMed Central PMCID: PMC3998142.
+
+- [s4pred](https://pubmed.ncbi.nlm.nih.gov/34213528/)
+
+  > Moffat L, Jones DT. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics. 2021 Nov 1;37(21):3744-51. doi: 10.1093/bioinformatics/btab491. PubMed PMID: 34213528; PubMed Central PMCID: PMC8570780.
+
 - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
 
   > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

README.md

Lines changed: 30 additions & 26 deletions
@@ -7,7 +7,7 @@
 
 [![Open in GitHub Codespaces](https://img.shields.io/badge/Open_In_GitHub_Codespaces-black?labelColor=grey&logo=github)](https://github.com/codespaces/new/nf-core/proteinannotator)
 [![GitHub Actions CI Status](https://github.com/nf-core/proteinannotator/actions/workflows/nf-test.yml/badge.svg)](https://github.com/nf-core/proteinannotator/actions/workflows/nf-test.yml)
-[![GitHub Actions Linting Status](https://github.com/nf-core/proteinannotator/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/proteinannotator/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/proteinannotator/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
+[![GitHub Actions Linting Status](https://github.com/nf-core/proteinannotator/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/proteinannotator/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/proteinannotator/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.18547735-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.18547735)
 [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
 
 [![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.10.4-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)
@@ -21,43 +21,47 @@
 
 ## Introduction
 
-**nf-core/proteinannotator** is a bioinformatics pipeline that ...
+**nf-core/proteinannotator** is a bioinformatics pipeline that computes statistics for protein FASTA inputs and produces protein annotations based on predicted sequence features, including conserved domains, functions, and secondary structure.
 
-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+<p>
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="docs/images/proteinannotator_metromap_dark.png">
+<img alt="nf-core/proteinannotator" src="docs/images/proteinannotator_metromap_light.png">
+</picture>
+</p>
+
+### Check quality and pre-process
+
+Generate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))
+
+### Annotate sequences
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-workflows use the "tube map" design for that. See https://nf-co.re/docs/community/brand/workflow-schematics#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases
+   such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/), [FunFam](https://download.cathdb.info/cath/releases/all-releases/), and [NMPFams and metagRoot](https://pavlopoulos-lab.org/envofams/databases/hmmer/)
+2. Functional annotation:
+   - ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
+3. Predict secondary structure compositional features such as α-helices, β-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))
+4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/get_started/environment_setup/overview) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/get_started/run-your-first-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-Explain what rows and columns represent. For instance (please edit as appropriate):
-
 First, prepare a samplesheet with your input data that looks as follows:
 
 `samplesheet.csv`:
 
 ```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+id,fasta
+species1,species1_proteins.fasta
+species2,species2_proteins.fasta
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-
--->
+Each row represents a FASTA file of proteins from a single species.
 
 Now, you can run the pipeline using:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/proteinannotator \
 -profile <docker/singularity/.../institute> \
@@ -78,11 +82,14 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/proteinannotator was originally written by Olga Botvinnik, Evangelos Karatzas.
+nf-core/proteinannotator was originally written by Olga Botvinnik and Evangelos Karatzas.
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Michael L Heuer](https://github.com/heuermh)
+- [Edmund Miller](https://github.com/edmundmiller)
+- [Eric Wei](https://github.com/eweizy)
+- [Martin Beracochea](https://github.com/mberacochea)
 
 ## Contributions and Support
 
@@ -92,10 +99,7 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
-<!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
-
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
+If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.18547735](https://doi.org/10.5281/zenodo.18547735)
 
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
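
The README change replaces the FASTQ samplesheet with a two-column `id,fasta` layout, one protein FASTA per row. A minimal sketch of a pre-flight check for that layout might look like the following. The function name `read_samplesheet` and the rules it enforces (non-empty fields, unique `id` values) are assumptions for illustration; the pipeline's own validation is not shown in this diff.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "fasta")


def read_samplesheet(text: str) -> list[dict[str, str]]:
    """Parse samplesheet text and return one {"id", "fasta"} dict per row.

    Hypothetical helper: the checks below are illustrative assumptions,
    not the pipeline's actual schema.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    rows: list[dict[str, str]] = []
    seen_ids: set[str] = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        sample_id = (row["id"] or "").strip()
        fasta = (row["fasta"] or "").strip()
        if not sample_id or not fasta:
            raise ValueError(f"line {line_no}: 'id' and 'fasta' must be non-empty")
        if sample_id in seen_ids:
            raise ValueError(f"line {line_no}: duplicate id '{sample_id}'")
        seen_ids.add(sample_id)
        rows.append({"id": sample_id, "fasta": fasta})
    return rows


if __name__ == "__main__":
    example = "id,fasta\nspecies1,species1_proteins.fasta\nspecies2,species2_proteins.fasta\n"
    for entry in read_samplesheet(example):
        print(entry["id"], entry["fasta"])
```

A samplesheet still using the old `sample,fastq_1,fastq_2` header would be rejected by this sketch at the header check.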

assets/methods_description_template.yml

Lines changed: 0 additions & 1 deletion
@@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
 section_name: "nf-core/proteinannotator Methods Description"
 section_href: "https://github.com/nf-core/proteinannotator"
 plot_type: "html"
-## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
 ## You inject any metadata in the Nextflow '${workflow}' object
 data: |
   <h4>Methods</h4>
