Commit b859941

Merge pull request #65 from nf-core/update-metromap
Update metromap and readme
2 parents cd6f617 + e72cb1b commit b859941

10 files changed

Lines changed: 43 additions & 32 deletions

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 0 additions & 1 deletion
@@ -24,4 +24,3 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/prot
 - [ ] Output Documentation in `docs/output.md` is updated.
 - [ ] `CHANGELOG.md` is updated.
 - [ ] `README.md` is updated (including new tool citations and authors/contributors).
-- [ ] `docs/images/proteinannotator-metromap.light.excalidraw.png` and `docs/images/proteinannotator-metromap.light.excalidraw.png` (edit the light version only, then export and turn on dark mode) are both updated (use the excalidraw [website](https://excalidraw.com/) or [VS Code](https://marketplace.visualstudio.com/items?itemName=pomdtr.excalidraw-editor) plugin to edit)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://

 ### `Added`

+- [#65](https://github.com/nf-core/proteinannotator/pull/65) - Converted the pipeline schematic to nf-core metromap. (by @vagkaratzas)
 - [#62](https://github.com/nf-core/proteinannotator/pull/62) - Added the option to download and use the latest FunFam HMM library (or use path to an existing one) for domain annotation. (by @vagkaratzas)
 - [#61](https://github.com/nf-core/proteinannotator/pull/61) - Added nf-core modules `ARIA2` and `HMMER_HMMSEARCH` to download latest Pfam HMM library (or use path to existing one) and match domains to input sequences. (by @vagkaratzas)
 - [#60](https://github.com/nf-core/proteinannotator/pull/60) - Added nf-core module `S4PRED_RUNMODEL` for secondary structure prediction (i.e., α-helix, a β-strand or a coil). (by @vagkaratzas)

README.md

Lines changed: 19 additions & 24 deletions
@@ -21,30 +21,28 @@

 ## Introduction

-**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies the function of proteins based on their sequence data, using state-of-the-art protein annotation tools such as [InterProScan](https://interproscan-docs.readthedocs.io/).
+**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies
+protein annotations such as conserved domains, functions and secondary structure features, based on their sequence data.

-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+<p>
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="docs/images/proteinannotator_metromap_dark.png">
+<img alt="Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek" src="docs/images/proteinannotator_metromap_light.png">
+</picture>
+</p>

-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
+### Check quality and pre-process

-workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+Generate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))

-1. Run ([`seqkit stats`](https://bioinf.shenwei.me/seqkit/usage/#stats)) to summarize input protein fasta files
-2. Functional Annotation:
-1. ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
-3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+### Annotate sequences

-<h1>
-<picture>
-<source media="(prefers-color-scheme: dark)" srcset="docs/images/proteinannotator-metromap.dark.excalidraw.png">
-<img alt="Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek" src="docs/images/proteinannotator-metromap.light.excalidraw.png">
-</picture>
-</h1>
+1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases
+such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)
+2. Functional annotation:
+- ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
+3. Predict secondary structure compositional features such as α-helices, β-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))
+4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))

 ## Usage

@@ -65,8 +63,6 @@ Each row represents a fasta file of proteins from a single species.

 Now, you can run the pipeline using:

-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/proteinannotator \
 -profile <docker/singularity/.../institute> \
@@ -91,7 +87,8 @@ nf-core/proteinannotator was originally written by Olga Botvinnik.

 We thank the following people for their extensive assistance in the development of this pipeline:

-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Evangelos Karatzas](https://github.com/vagkaratzas)
+- [Martin Beracochea](https://github.com/mberacochea)

 ## Contributions and Support

@@ -104,8 +101,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

 You can cite the `nf-core` publication as follows:
-300 KB
Binary file not shown.
-300 KB
Binary file not shown.
234 KB

docs/images/proteinannotator_metromap_dark.svg

Lines changed: 4 additions & 0 deletions
227 KB

docs/images/proteinannotator_metromap_light.svg

Lines changed: 4 additions & 0 deletions

ro-crate-metadata.json

Lines changed: 15 additions & 7 deletions
Large diffs are not rendered by default.
