CHANGELOG.md (1 addition & 0 deletions)
@@ -9,6 +9,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://

 ### `Added`

+- [#68](https://github.com/nf-core/proteinannotator/pull/68) - Using the `ARIA2` and `UNTAR` nf-core modules to download and decompress the InterProScan database. (by @vagkaratzas)
 - [#67](https://github.com/nf-core/proteinannotator/pull/67) - Swapped to the updated, non-buggy, nf-core version of `INTERPROSCAN`. (by @vagkaratzas)
 - [#65](https://github.com/nf-core/proteinannotator/pull/65) - Converted the pipeline schematic to nf-core metromap. (by @vagkaratzas)
 - [#62](https://github.com/nf-core/proteinannotator/pull/62) - Added the option to download and use the latest FunFam HMM library (or use path to an existing one) for domain annotation. (by @vagkaratzas)
docs/output.md (13 additions & 11 deletions)
@@ -14,12 +14,14 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [SeqFu](#seqfu) for input amino acid sequences quality check (QC)
 - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)

+- [Database download](#database-download) Optionally download selected databases for annotation.
+- [aria2](#aria2) - To optionally download the Pfam, FunFam, and/or InterProScan databases through the pipeline.
+
 - [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
-- [aria2](#aria2) - To optionally download the latest Pfam and/or FunFam databases through the pipeline.
 - [hmmer](#hmmer) - To optionally match the input sequence to known Pfam and/or FunFam domains through `hmmer/hmmsearch`

 - [Functional annotation](#functional-annotation) Annotate proteins with functional domains
-- [InterProScan](#Interproscan) - Search the InterPro database for functional domains
+- [InterProScan](#Interproscan) - Search the InterProScan database for functional domains

 - [s4pred](#s4pred) - Predict secondary structures of sequences, producing per amino acid probabilities of being an α-helix, a β-strand or a coil.

@@ -62,23 +64,28 @@ The `seqkit` module is used for initial preprocessing (i.e., gap removal, conver

 [SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.

-### Domain annotation
+### Database download

 #### aria2

 <details markdown="1">
 <summary>Output files</summary>

 - `downloaded_dbs/`
+- `interproscan_db/`: (optional) uncompressed archive data from the downloaded InterProScan database
+- `*/`: (optional) one directory for each of the member databases of InterProScan
 - `Pfam-A*.hmm.gz`: (optional) The latest full, or a minimal test, Pfam-A HMM database that can be downloaded through the pipeline.
+- `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided url
 - `funfam-hmm3-v4_3_0*.lib.gz`: (optional) The latest (v4_3_0) full, or a minimal test, FunFam HMM database that can be downloaded through the pipeline.

 </details>

-If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`) for each domain annotation database is set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).
+If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`, `skip_interproscan`) for each annotation database is set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`, `interproscan_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).

 [aria2](https://github.com/aria2/aria2/) is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.

+### Domain annotation
+
 #### hmmer

 <details markdown="1">
@@ -103,10 +110,6 @@ Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `
 <details markdown="1">
 <summary>Output files</summary>

-- `downloaded_dbs/`
-- `data/`: (optional) uncompressed archive data from the downloaded InterProScan database
-- `*/`: (optional) one directory for each of the member databases of InterProScan
-- `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided url
 - `functional_annotation/`
 - `interproscan/`
 - `<samplename>/`
@@ -117,9 +120,8 @@ Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `

 </details>

-[InterProScan](https://interproscan-docs.readthedocs.io/en/v5/#) is a protein annotation tool that searches [InterPro](http://www.ebi.ac.uk/interpro/), a database which integrates predictive information about protein function from a number of member resources, giving an overview of the families that a protein belongs to and the domains and sites it contains.
-For `nf-core/proteinannotator`, the default database applications that are used to functionally annotate sequences include
-Hamap, PANTHER, PIRSF, TIGRFAM and sfld. The main `nextflow.config` contains a [url]("https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz") parameter (`--interproscan_db_url`) for the full version of the InterProScan database. If, instead, a local database is provided via the `--interproscan_db` parameter, then the download is skipped.
+[InterProScan](https://interproscan-docs.readthedocs.io/en/v5/#) is a protein annotation tool that searches [InterPro](http://www.ebi.ac.uk/interpro/), a database which integrates predictive information about protein function from a number of member resources, giving an overview of the families that a protein belongs to and the domains and sites it contains. The default database applications that are used to functionally annotate sequences include
+Hamap, PANTHER, PIRSF, TIGRFAM and sfld, and are set through the `--interproscan_applications` parameter.

 See also [InterProScan output documentation](https://interproscan-docs.readthedocs.io/en/v5/), where most of these examples are taken from.
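
The updated `docs/output.md` paragraph on `skip_*` flags and `*_db` paths describes three conditions under which a database is not (re)downloaded. The sketch below restates that rule in Python as an aid to reviewing the wording. It is not the pipeline's Nextflow implementation: the function name `should_download`, the `already_downloaded` argument (standing in for a resumed run with a cached download), and the example parameter values are all assumptions made for illustration.

```python
from typing import Optional


def should_download(skip: bool, db_path: Optional[str], already_downloaded: bool = False) -> bool:
    """Return True only if none of the documented conditions suppresses the download."""
    if skip:                    # skip_* flag set to true
        return False
    if db_path is not None:     # *_db path supplied by the user (not null)
        return False
    if already_downloaded:      # resumed run after a successful download (assumed flag)
        return False
    return True


# Hypothetical parameter values, for illustration only.
params = {
    "skip_pfam": False, "pfam_db": None,
    "skip_funfam": True, "funfam_db": None,
    "skip_interproscan": False, "interproscan_db": "/data/interproscan_db",
}

# One decision per database named in the documentation paragraph.
decisions = {
    name: should_download(params[f"skip_{name}"], params[f"{name}_db"])
    for name in ("pfam", "funfam", "interproscan")
}
print(decisions)
```

With these example values only Pfam would be downloaded: FunFam is skipped by its flag, and InterProScan is skipped because a local path is given.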