22 | 22 | "@id": "./", |
23 | 23 | "@type": "Dataset", |
24 | 24 | "creativeWorkStatus": "InProgress", |
25 | | - "datePublished": "2026-01-26T10:59:12+00:00", |
26 | | - "description": "<h1>\n <picture>\n <source media=\"(prefers-color-scheme: dark)\" srcset=\"docs/images/nf-core-proteinannotator_logo_dark.png\">\n <img alt=\"nf-core/proteinannotator\" src=\"docs/images/nf-core-proteinannotator_logo_light.png\">\n </picture>\n</h1>\n\n[](https://github.com/codespaces/new/nf-core/proteinannotator)\n[](https://github.com/nf-core/proteinannotator/actions/workflows/nf-test.yml)\n[](https://github.com/nf-core/proteinannotator/actions/workflows/linting.yml)[](https://nf-co.re/proteinannotator/results)[](https://doi.org/10.5281/zenodo.XXXXXXX)\n[](https://www.nf-test.com)\n\n[](https://www.nextflow.io/)\n[](https://github.com/nf-core/tools/releases/tag/3.5.1)\n[](https://docs.conda.io/en/latest/)\n[](https://www.docker.com/)\n[](https://sylabs.io/docs/)\n[](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/proteinannotator)\n\n[](https://nfcore.slack.com/channels/proteinannotator)[](https://bsky.app/profile/nf-co.re)[](https://mstdn.science/@nf_core)[](https://www.youtube.com/c/nf-core)\n\n## Introduction\n\n**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies\nprotein annotations such as conserved domains, functions and secondary structure features, based on their sequence data.\n\n<p>\n <picture>\n <source media=\"(prefers-color-scheme: dark)\" srcset=\"docs/images/proteinannotator_metromap_dark.png\">\n <img alt=\"Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek\" src=\"docs/images/proteinannotator_metromap_light.png\">\n </picture>\n</p>\n\n### Check quality and pre-process\n\nGenerate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))\n\n### Annotate sequences\n\n1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases\n such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)\n2. Functional annotation:\n - ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.\n3. Predict secondary structure compositional features such as \u03b1-helices, \u03b2-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))\n4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\n`samplesheet.csv`:\n\n```csv\nid,fasta\nspecies1,species1_proteins.fasta\nspecies2,species2_proteins.fasta\n```\n\nEach row represents a fasta file of proteins from a single species.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run nf-core/proteinannotator \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\nFor more details and further functionality, please refer to the [usage documentation](https://nf-co.re/proteinannotator/usage) and the [parameter documentation](https://nf-co.re/proteinannotator/parameters).\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/proteinannotator/results) tab on the nf-core website pipeline page.\nFor more details about the output files and reports, please refer to the\n[output documentation](https://nf-co.re/proteinannotator/output).\n\n## Credits\n\nnf-core/proteinannotator was originally written by Olga Botvinnik.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- [Evangelos Karatzas](https://github.com/vagkaratzas)\n- [Martin Beracochea](https://github.com/mberacochea)\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\nFor further information or help, don't hesitate to get in touch on the [Slack `#proteinannotator` channel](https://nfcore.slack.com/channels/proteinannotator) (you can join with [this invite](https://nf-co.re/join/slack)).\n\n## Citations\n\n<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->\n<!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nYou can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", |
| 25 | + "datePublished": "2026-01-26T11:02:55+00:00", |
| 26 | + "description": "<h1>\n <picture>\n <source media=\"(prefers-color-scheme: dark)\" srcset=\"docs/images/nf-core-proteinannotator_logo_dark.png\">\n <img alt=\"nf-core/proteinannotator\" src=\"docs/images/nf-core-proteinannotator_logo_light.png\">\n </picture>\n</h1>\n\n[](https://github.com/codespaces/new/nf-core/proteinannotator)\n[](https://github.com/nf-core/proteinannotator/actions/workflows/nf-test.yml)\n[](https://github.com/nf-core/proteinannotator/actions/workflows/linting.yml)[](https://nf-co.re/proteinannotator/results)[](https://doi.org/10.5281/zenodo.XXXXXXX)\n[](https://www.nf-test.com)\n\n[](https://www.nextflow.io/)\n[](https://github.com/nf-core/tools/releases/tag/3.5.1)\n[](https://docs.conda.io/en/latest/)\n[](https://www.docker.com/)\n[](https://sylabs.io/docs/)\n[](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/proteinannotator)\n\n[](https://nfcore.slack.com/channels/proteinannotator)[](https://bsky.app/profile/nf-co.re)[](https://mstdn.science/@nf_core)[](https://www.youtube.com/c/nf-core)\n\n## Introduction\n\n**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies\nprotein annotations such as conserved domains, functions and secondary structure features, based on their sequence data.\n\n<p>\n <picture>\n <source media=\"(prefers-color-scheme: dark)\" srcset=\"docs/images/proteinannotator_metromap_dark.png\">\n <img alt=\"Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek\" src=\"docs/images/proteinannotator_metromap_light.png\">\n </picture>\n</p>\n\n### Check quality and pre-process\n\nGenerate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))\n\n### Annotate sequences\n\n1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases\n such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)\n2. Functional annotation:\n - ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.\n3. Predict secondary structure compositional features such as \u03b1-helices, \u03b2-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))\n4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\n`samplesheet.csv`:\n\n```csv\nid,fasta\nspecies1,species1_proteins.fasta\nspecies2,species2_proteins.fasta\n```\n\nEach row represents a fasta file of proteins from a single species.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run nf-core/proteinannotator \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\nFor more details and further functionality, please refer to the [usage documentation](https://nf-co.re/proteinannotator/usage) and the [parameter documentation](https://nf-co.re/proteinannotator/parameters).\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/proteinannotator/results) tab on the nf-core website pipeline page.\nFor more details about the output files and reports, please refer to the\n[output documentation](https://nf-co.re/proteinannotator/output).\n\n## Credits\n\nnf-core/proteinannotator was originally written by Olga Botvinnik.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- [Evangelos Karatzas](https://github.com/vagkaratzas)\n- [Martin Beracochea](https://github.com/mberacochea)\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\nFor further information or help, don't hesitate to get in touch on the [Slack `#proteinannotator` channel](https://nfcore.slack.com/channels/proteinannotator) (you can join with [this invite](https://nf-co.re/join/slack)).\n\n## Citations\n\n<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->\n<!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nYou can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", |
27 | 27 | "hasPart": [ |
28 | 28 | { |
29 | 29 | "@id": "main.nf" |
|
102 | 102 | }, |
103 | 103 | "mentions": [ |
104 | 104 | { |
105 | | - "@id": "#a5905431-da26-41f5-864f-9f8c411d51f6" |
| 105 | + "@id": "#87053b52-5b88-415f-a7f6-46735e3173cc" |
106 | 106 | } |
107 | 107 | ], |
108 | 108 | "name": "nf-core/proteinannotator" |
|
135 | 135 | } |
136 | 136 | ], |
137 | 137 | "dateCreated": "", |
138 | | - "dateModified": "2026-01-26T10:59:12Z", |
| 138 | + "dateModified": "2026-01-26T11:02:55Z", |
139 | 139 | "dct:conformsTo": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/", |
140 | 140 | "keywords": [ |
141 | 141 | "nf-core", |
|
179 | 179 | "version": "!>=25.10.0" |
180 | 180 | }, |
181 | 181 | { |
182 | | - "@id": "#a5905431-da26-41f5-864f-9f8c411d51f6", |
| 182 | + "@id": "#87053b52-5b88-415f-a7f6-46735e3173cc", |
183 | 183 | "@type": "TestSuite", |
184 | 184 | "instance": [ |
185 | 185 | { |
186 | | - "@id": "#79fab9de-6d3f-4d3a-b4ce-c0bd58f4709a" |
| 186 | + "@id": "#323dbd90-e7fd-4f62-82e0-a6c201e7a3d7" |
187 | 187 | } |
188 | 188 | ], |
189 | 189 | "mainEntity": { |
|
192 | 192 | "name": "Test suite for nf-core/proteinannotator" |
193 | 193 | }, |
194 | 194 | { |
195 | | - "@id": "#79fab9de-6d3f-4d3a-b4ce-c0bd58f4709a", |
| 195 | + "@id": "#323dbd90-e7fd-4f62-82e0-a6c201e7a3d7", |
196 | 196 | "@type": "TestInstance", |
197 | 197 | "name": "GitHub Actions workflow for testing nf-core/proteinannotator", |
198 | 198 | "resource": "repos/nf-core/proteinannotator/actions/workflows/nf-test.yml", |
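
Most of the churn in this diff comes from values that are regenerated on every run: the `datePublished` and `dateModified` timestamps, and the random UUID-style `@id` values of the `TestSuite` and `TestInstance` nodes (plus the `mentions` entry that points at the suite). The following is a minimal Python sketch, assuming those are the only volatile fields, for checking whether two versions of `ro-crate-metadata.json` differ in anything else. `normalize`, `crates_equivalent`, and `crate_files_equivalent` are hypothetical helpers, not part of nf-core tools or any RO-Crate library.

```python
import json
import re

# Assumption: these keys are regenerated on every run (as seen in this diff).
VOLATILE_KEYS = {"datePublished", "dateModified"}

# Matches random identifiers of the form "#xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
UUID_ID = re.compile(
    r"^#[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def normalize(node, id_map):
    """Return a copy of node with volatile timestamps masked and UUID ids renamed."""
    if isinstance(node, dict):
        return {
            key: "<timestamp>" if key in VOLATILE_KEYS else normalize(value, id_map)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [normalize(item, id_map) for item in node]
    if isinstance(node, str) and UUID_ID.match(node):
        # The first UUID seen becomes "#uuid-0", the next "#uuid-1", and so on,
        # so references between nodes stay consistent after renaming.
        return id_map.setdefault(node, f"#uuid-{len(id_map)}")
    return node


def crates_equivalent(old, new):
    """True if two parsed crates differ only in the volatile fields above."""
    return normalize(old, {}) == normalize(new, {})


def crate_files_equivalent(old_path, new_path):
    """Load two ro-crate-metadata.json files and compare them."""
    with open(old_path) as old_file, open(new_path) as new_file:
        return crates_equivalent(json.load(old_file), json.load(new_file))
```

For example, `crate_files_equivalent("old/ro-crate-metadata.json", "ro-crate-metadata.json")` (hypothetical paths) would return `True` for a regeneration that only touched timestamps and test-suite IDs. Placeholders are assigned in traversal order, so two crates compare equal only if their UUID identifiers occupy the same positions and reference each other consistently. Masking `datePublished` also hides a genuine change of publication date, so the key set is worth revisiting for release crates.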
|