README.md
Lines changed: 19 additions & 24 deletions
@@ -21,30 +21,28 @@

 ## Introduction

-**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies the function of proteins based on their sequence data, using state-of-the-art protein annotation tools such as [InterProScan](https://interproscan-docs.readthedocs.io/).
+**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies
+protein annotations such as conserved domains, functions and secondary structure features, based on their sequence data.

-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
-
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-
-workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
-
-1. Run ([`seqkit stats`](https://bioinf.shenwei.me/seqkit/usage/#stats)) to summarize input protein fasta files
-2. Functional Annotation:
-1. ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
-3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
 <img alt="Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek" src="docs/images/proteinannotator_metromap_light.png">
 </picture>
-</h1>
+</p>
+
+### Check quality and pre-process
+
+Generate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))
+
+### Annotate sequences
+
+1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases
+such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)
+2. Functional annotation:
+- ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
+3. Predict secondary structure compositional features such as α-helices, β-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))
+4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))

 ## Usage

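The pre-processing listed under "Check quality and pre-process" in the hunk above (gap removal, upper-casing, validation, length filtering, replacing `/`, and duplicate removal) can be illustrated with a small standard-library Python sketch. This is not how the pipeline does it: the README states that SeqKit performs these steps. The function names `parse_fasta` and `preprocess`, the `min_len` default, the allowed-residue pattern, and the choice to replace `/` in headers with `_` are all assumptions made for illustration.

```python
import re

# Assumption: a valid protein sequence consists of upper-case letters and '*'.
VALID_SEQ = re.compile(r"[A-Z*]+")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def preprocess(records, min_len=30):
    """Apply the README's pre-processing steps to (header, sequence) pairs.

    min_len is a hypothetical threshold, not a pipeline default.
    """
    seen = set()
    for header, seq in records:
        # Gap removal and conversion to upper case.
        seq = seq.replace("-", "").replace(".", "").upper()
        # Validation: drop sequences with unexpected characters (or no residues).
        if not VALID_SEQ.fullmatch(seq):
            continue
        # Filter by length.
        if len(seq) < min_len:
            continue
        # Assumption: special characters such as '/' are replaced in the header.
        header = header.replace("/", "_")
        # Remove duplicate sequences, keeping the first occurrence.
        if seq in seen:
            continue
        seen.add(seq)
        yield header, seq
```

For example, `list(preprocess(parse_fasta(text), min_len=3))` returns the cleaned records for FASTA text held in `text`. Keeping the first occurrence of each duplicated sequence is a design choice of this sketch only.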
@@ -65,8 +63,6 @@ Each row represents a fasta file of proteins from a single species.

 Now, you can run the pipeline using:

-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/proteinannotator \
 -profile <docker/singularity/.../institute> \
@@ -91,7 +87,8 @@ nf-core/proteinannotator was originally written by Olga Botvinnik.

 We thank the following people for their extensive assistance in the development of this pipeline:

-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
@@ -104,8 +101,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

 You can cite the `nf-core` publication as follows: