Commit e72cb1b

readme update
1 parent 2082d66 commit e72cb1b

2 files changed

Lines changed: 26 additions & 31 deletions

README.md

Lines changed: 19 additions & 24 deletions

```diff
@@ -21,30 +21,28 @@
 
 ## Introduction
 
-**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies the function of proteins based on their sequence data, using state-of-the-art protein annotation tools such as [InterProScan](https://interproscan-docs.readthedocs.io/).
+**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies
+protein annotations such as conserved domains, functions and secondary structure features, based on their sequence data.
 
-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
-
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-
-workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
-
-1. Run ([`seqkit stats`](https://bioinf.shenwei.me/seqkit/usage/#stats)) to summarize input protein fasta files
-2. Functional Annotation:
-   1. ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
-3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
-
-<h1>
+<p>
 <picture>
 <source media="(prefers-color-scheme: dark)" srcset="docs/images/proteinannotator_metromap_dark.png">
 <img alt="Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek" src="docs/images/proteinannotator_metromap_light.png">
 </picture>
-</h1>
+</p>
+
+### Check quality and pre-process
+
+Generate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))
+
+### Annotate sequences
+
+1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases
+   such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)
+2. Functional annotation:
+   - ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
+3. Predict secondary structure compositional features such as α-helices, β-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))
+4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))
 
 ## Usage
 
```
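
The added README text says input sequences are first summarized with SeqFu (the removed text used `seqkit stats`). As a rough, dependency-free illustration of what such a summary contains, the hypothetical `fasta_stats` function below prints values analogous to the `num_seqs`, `sum_len`, `min_len`, `avg_len` and `max_len` columns of `seqkit stats`. It is a sketch that assumes an uncompressed FASTA file and ignores whitespace inside sequence lines; it is not the pipeline's implementation.

```bash
# Hypothetical helper (illustration only, not the pipeline's code):
# prints num_seqs, sum_len, min_len, avg_len and max_len for a FASTA file.
#   $1  input FASTA file (assumed uncompressed)
fasta_stats() {
  awk '
    # Fold the length of the record that just ended into the running totals.
    function close_record() {
      if (in_record) {
        n++
        sum += len
        if (n == 1 || len < min) min = len
        if (len > max) max = len
      }
      len = 0
    }
    /^>/ { close_record(); in_record = 1; next }
    # Sequence line: ignore whitespace and carriage returns, count residues.
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END {
      close_record()
      avg = (n > 0) ? sum / n : 0
      printf "num_seqs\tsum_len\tmin_len\tavg_len\tmax_len\n"
      printf "%d\t%d\t%d\t%.1f\t%d\n", n, sum, min, avg, max
    }
  ' "$1"
}
```

For example, `fasta_stats proteins.fasta` prints a header row followed by one row of values. The dedicated tools report more fields and are what the pipeline actually runs.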
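
In the pipeline, the pre-processing operations listed under "Check quality and pre-process" are performed by SeqKit. The hypothetical `clean_fasta` function below only illustrates what each operation does to a record, using plain awk. The 10-residue default minimum length, treating `-` and `.` as gap characters, accepting only the letters A to Z and `*` as valid residues, and replacing `/` with `_` in header lines are all assumptions, not the pipeline's settings.

```bash
# Hypothetical helper (illustration only, not the pipeline's code):
# approximates the listed pre-processing steps for a protein FASTA file.
#   $1  input FASTA file
#   $2  minimum sequence length after cleaning (assumed default: 10)
clean_fasta() {
  local min_len="${2:-10}"
  awk -v min_len="$min_len" '
    # Print the buffered record only if it is valid, long enough and not a duplicate.
    function emit_record() {
      if (header != "" && seq ~ /^[A-Z*]+$/ && length(seq) >= min_len && !(seq in seen)) {
        seen[seq] = 1
        print header
        print seq
      }
      header = ""
      seq = ""
    }
    /^>/ {
      emit_record()
      header = $0
      gsub(/\//, "_", header)        # replace "/" in the header line
      next
    }
    {
      line = toupper($0)             # convert residues to upper case
      gsub(/[-. \t\r]/, "", line)    # remove gap characters and stray whitespace
      seq = seq line
    }
    END { emit_record() }
  ' "$1"
}
```

Usage: `clean_fasta proteins.fasta 50 > proteins.clean.fasta` keeps the first copy of each distinct sequence that is at least 50 residues long after cleaning. SeqKit offers comparable operations through `seqkit seq` (`--remove-gaps`, `--upper-case`, `--validate-seq`, `--min-len`) and `seqkit rmdup --by-seq`; how the pipeline invokes them is not shown in this commit.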
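
The s4pred step predicts a secondary-structure state for each residue. Assuming those predictions have been collapsed into one tab-separated line per protein, holding an ID and a string of `H` (helix), `E` (strand) and `C` (coil) characters, a hypothetical `ss_composition` function could turn them into simple per-protein compositional features: the fraction of residues in each state. Both the input layout and the choice of features are assumptions; this commit does not specify which features the pipeline reports.

```bash
# Hypothetical helper (illustration only, not the pipeline's code):
# input is one "<id><TAB><H/E/C string>" line per protein (assumed layout);
# output is the id followed by the helix, strand and coil fractions.
#   $1  input file
ss_composition() {
  awk -F '\t' '
    {
      n = length($2)
      if (n == 0) next                 # skip proteins without a prediction
      s = $2
      h = gsub(/H/, "", s)             # gsub returns the number of matches removed
      e = gsub(/E/, "", s)
      c = gsub(/C/, "", s)
      printf "%s\t%.2f\t%.2f\t%.2f\n", $1, h / n, e / n, c / n
    }
  ' "$1"
}
```

Counting with `gsub` on a copy of the field keeps the original string intact and avoids a per-character loop.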

````diff
@@ -65,8 +63,6 @@ Each row represents a fasta file of proteins from a single species.
 
 Now, you can run the pipeline using:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/proteinannotator \
 -profile <docker/singularity/.../institute> \
````

```diff
@@ -91,7 +87,8 @@ nf-core/proteinannotator was originally written by Olga Botvinnik.
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Evangelos Karatzas](https://github.com/vagkaratzas)
+- [Martin Beracochea](https://github.com/mberacochea)
 
 ## Contributions and Support
 
```

```diff
@@ -104,8 +101,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
 You can cite the `nf-core` publication as follows:
```
