.github/PULL_REQUEST_TEMPLATE.md (0 additions, 1 deletion)
@@ -24,4 +24,3 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/prot
 - [ ] Output Documentation in `docs/output.md` is updated.
 - [ ] `CHANGELOG.md` is updated.
 - [ ] `README.md` is updated (including new tool citations and authors/contributors).
-- [ ] `docs/images/proteinannotator-metromap.light.excalidraw.png` and `docs/images/proteinannotator-metromap.light.excalidraw.png` (edit the light version only, then export and turn on dark mode) are both updated (use the excalidraw [website](https://excalidraw.com/) or [VS Code](https://marketplace.visualstudio.com/items?itemName=pomdtr.excalidraw-editor) plugin to edit)
CHANGELOG.md (1 addition, 0 deletions)
@@ -9,6 +9,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://
 ### `Added`

+- [#65](https://github.com/nf-core/proteinannotator/pull/65) - Converted the pipeline schematic to nf-core metromap. (by @vagkaratzas)
 - [#62](https://github.com/nf-core/proteinannotator/pull/62) - Added the option to download and use the latest FunFam HMM library (or use path to an existing one) for domain annotation. (by @vagkaratzas)
 - [#61](https://github.com/nf-core/proteinannotator/pull/61) - Added nf-core modules `ARIA2` and `HMMER_HMMSEARCH` to download latest Pfam HMM library (or use path to existing one) and match domains to input sequences. (by @vagkaratzas)
 - [#60](https://github.com/nf-core/proteinannotator/pull/60) - Added nf-core module `S4PRED_RUNMODEL` for secondary structure prediction (i.e., α-helix, a β-strand or a coil). (by @vagkaratzas)
README.md (19 additions, 24 deletions)
@@ -21,30 +21,28 @@
 ## Introduction

-**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies the function of proteins based on their sequence data, using state-of-the-art protein annotation tools such as [InterProScan](https://interproscan-docs.readthedocs.io/).
+**nf-core/proteinannotator** is a bioinformatics pipeline that runs statistics of input protein fasta files and identifies
+protein annotations such as conserved domains, functions and secondary structure features, based on their sequence data.

-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
+<img alt="Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek" src="docs/images/proteinannotator_metromap_light.png">
+</picture>
+</p>

-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
+### Check quality and pre-process

-workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+Generate input amino acid sequence statistics with ([`SeqFu`](https://github.com/telatin/seqfu2/)) and pre-process them (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences) with ([`SeqKit`](https://github.com/shenwei356/seqkit/))

-1. Run ([`seqkit stats`](https://bioinf.shenwei.me/seqkit/usage/#stats)) to summarize input protein fasta files
-2. Functional Annotation:
-   1. ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
-3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
-<img alt="Protein annotator metromap. Protein fasta files are summarized with `seqkit stats`, then functionally annotated with InterProScan, DIAMOND-blastp, UniFire, and Kmerseek" src="docs/images/proteinannotator-metromap.light.excalidraw.png">
-</picture>
-</h1>
+1. Conserved domain annotation with ([`hmmer`](https://github.com/EddyRivasLab/hmmer/)) against databases
+   such as [Pfam](https://ftp.ebi.ac.uk/pub/databases/Pfam/) and [FunFam](https://download.cathdb.info/cath/releases/all-releases/)
+2. Functional annotation:
+   - ([`InterProScan`](https://interproscan-docs.readthedocs.io/en/v5/)) a software tool used to analyze protein sequences by scanning them against the signatures of protein families, domains, and sites in the [InterPro](https://www.ebi.ac.uk/interpro/) database, helping to identify their functional characteristics.
+3. Predict secondary structure compositional features such as α-helices, β-strands and coils with ([`s4pred`](https://github.com/psipred/s4pred))
+4. Present QC stats for input sequences before and after initial pre-processing with ([`MultiQC`](http://multiqc.info/))

 ## Usage
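The pre-processing steps named in the new "Check quality and pre-process" paragraph above (gap removal, upper-casing, validation, length filtering, replacing special characters such as `/`, and duplicate removal) could be sketched roughly as follows. This is only an illustration in plain Python, not how the pipeline does it: the README states that the pipeline uses SeqKit for this. The function name `preprocess`, the `min_len`/`max_len` parameters, the accepted residue alphabet, the gap characters, and the choice of `_` as the replacement for `/` are all assumptions made for the example.

```python
import re

# Assumed residue alphabet: the 20 standard amino acids plus common
# ambiguity/extension codes and the stop symbol. Hypothetical choice.
VALID_PROTEIN = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBXZUOJ*]+")


def preprocess(records, min_len=1, max_len=None):
    """Clean (header, sequence) pairs; returns a list of kept pairs.

    Steps, in order: remove gaps, upper-case, validate, filter by
    length, replace '/' in headers, drop duplicate sequences.
    """
    seen = set()
    cleaned = []
    for header, seq in records:
        # Gap removal (whitespace, '-' and '.' assumed to be gaps) + upper case
        seq = re.sub(r"[\s\-.]", "", seq).upper()
        # Validation: skip empty sequences or ones with unexpected symbols
        if not VALID_PROTEIN.fullmatch(seq):
            continue
        # Length filter
        if len(seq) < min_len or (max_len is not None and len(seq) > max_len):
            continue
        # Replace special characters in the header ('/' -> '_', an assumption)
        header = header.replace("/", "_")
        # Duplicate removal, keyed on the cleaned sequence
        if seq in seen:
            continue
        seen.add(seq)
        cleaned.append((header, seq))
    return cleaned


if __name__ == "__main__":
    example = [("sp|P1/a", "mk-tay"), ("dup", "MKTAY"), ("bad", "MK1TA")]
    print(preprocess(example, min_len=3))
```

Keying duplicate removal on the cleaned sequence (rather than the header) means two records that differ only in case or gaps collapse into one, which is one plausible reading of "remove duplicate sequences".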
@@ -65,8 +63,6 @@ Each row represents a fasta file of proteins from a single species.
 Now, you can run the pipeline using:

-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/proteinannotator \
 -profile <docker/singularity/.../institute> \
@@ -91,7 +87,8 @@ nf-core/proteinannotator was originally written by Olga Botvinnik.
 We thank the following people for their extensive assistance in the development of this pipeline:

-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
@@ -104,8 +101,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/proteinannotator for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

 You can cite the `nf-core` publication as follows: