Commit 9c476b2

aria2 and untar for ips db
1 parent f8fdee9 commit 9c476b2

11 files changed

Lines changed: 436 additions & 77 deletions

conf/modules.config

Lines changed: 9 additions & 1 deletion
@@ -110,7 +110,15 @@ process {
         ]
     }
 
-    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:INTERPROSCAN_DATABASE' {
+    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:ARIA2' {
+        publishDir = [
+            path: { "${params.outdir}/downloaded_dbs/" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:UNTAR' {
         publishDir = [
             path: { "${params.outdir}/downloaded_dbs/" },
             mode: params.publish_dir_mode,
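
Note on the `saveAs` closure used in the `publishDir` block above: in Nextflow, a `saveAs` closure that returns `null` causes that file not to be published, so `versions.yml` is kept out of `downloaded_dbs/`. A minimal standalone Groovy sketch of that behaviour follows; the second filename is only an illustrative example, not something this commit guarantees.

```groovy
// Same closure as in the publishDir block, extracted for illustration only.
def saveAs = { filename -> filename.equals('versions.yml') ? null : filename }

// versions.yml maps to null, which Nextflow treats as "do not publish".
assert saveAs('versions.yml') == null

// Any other name is returned unchanged (example filename is hypothetical).
assert saveAs('Pfam-A.hmm.gz') == 'Pfam-A.hmm.gz'
```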

docs/output.md

Lines changed: 13 additions & 11 deletions
@@ -14,12 +14,14 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [SeqFu](#seqfu) for input amino acid sequences quality check (QC)
 - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)
 
+- [Database download](#database-download) Optionally download selected databases for annotation.
+  - [aria2](#aria2) - To optionally download the Pfam, FunFam, and/or InterProScan databases through the pipeline.
+
 - [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
-  - [aria2](#aria2) - To optionally download the latest Pfam and/or FunFam databases through the pipeline.
   - [hmmer](#hmmer) - To optionally match the input sequence to known Pfam and/or FunFam domains through `hmmer/hmmsearch`
 
 - [Functional annotation](#functional-annotation) Annotate proteins with functional domains
-  - [InterProScan](#Interproscan) - Search the InterPro database for functional domains
+  - [InterProScan](#Interproscan) - Search the InterProScan database for functional domains
 
 - [s4pred](#s4pred) - Predict secondary structures of sequences, producing per amino acid probabilities of being an α-helix, a β-strand or a coil.
 
@@ -62,23 +64,28 @@ The `seqkit` module is used for initial preprocessing (i.e., gap removal, conver
 
 [SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
 
-### Domain annotation
+### Database download
 
 #### aria2
 
 <details markdown="1">
 <summary>Output files</summary>
 
 - `downloaded_dbs/`
+  - `interproscan_db/`: (optional) uncompressed archive data from the downloaded InterProScan database
+    - `*/`: (optional) one directory for each of the member databases of InterProScan
   - `Pfam-A*.hmm.gz`: (optional) The latest full, or a minimal test, Pfam-A HMM database that can be downloaded through the pipeline.
+  - `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided url
   - `funfam-hmm3-v4_3_0*.lib.gz`: (optional) The latest (v4_3_0) full, or a minimal test, FunFam HMM database that can be downloaded through the pipeline.
 
 </details>
 
-If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`) for each domain annotation database is set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).
+If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`, `skip_interproscan`) for each annotation database is set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`, `interproscan_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).
 
 [aria2](https://github.com/aria2/aria2/) is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.
 
+### Domain annotation
+
 #### hmmer
 
 <details markdown="1">
@@ -103,10 +110,6 @@ Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `
 <details markdown="1">
 <summary>Output files</summary>
 
-- `downloaded_dbs/`
-  - `data/`: (optional) uncompressed archive data from the downloaded InterProScan database
-    - `*/`: (optional) one directory for each of the member databases of InterProScan
-  - `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided url
 - `functional_annotation/`
   - `interproscan/`
     - `<samplename>/`
@@ -117,9 +120,8 @@ Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `
 
 </details>
 
-[InterProScan](https://interproscan-docs.readthedocs.io/en/v5/#) is a protein annotation tool that searches [InterPro](http://www.ebi.ac.uk/interpro/), a database which integrates predictive information about protein function from a number of member resources, giving an overview of the families that a protein belongs to and the domains and sites it contains.
-For `nf-core/proteinannotator`, the default database applications that are used to functionally annotate sequences include
-Hamap, PANTHER, PIRSF, TIGRFAM and sfld. The main `nextflow.config` contains a [url]("https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz") parameter (`--interproscan_db_url`) for the full version of the InterProScan database. If, instead, a local database is provided via the `--interproscan_db` parameter, then the download is skipped.
+[InterProScan](https://interproscan-docs.readthedocs.io/en/v5/#) is a protein annotation tool that searches [InterPro](http://www.ebi.ac.uk/interpro/), a database which integrates predictive information about protein function from a number of member resources, giving an overview of the families that a protein belongs to and the domains and sites it contains. The default database applications that are used to functionally annotate sequences include
+Hamap, PANTHER, PIRSF, TIGRFAM and sfld, and are set through the `--interproscan_applications` parameter.
 
 See also [InterProScan output documentation](https://interproscan-docs.readthedocs.io/en/v5/), where most of these examples are taken from.
 
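
Note on the download rules described in the `docs/output.md` changes above: a database is not (re)downloaded when its `skip_*` flag is `true` or its `*_db` path is set. A custom Nextflow config along the following lines could express that. This is a sketch only: the parameter names are the ones mentioned in the documentation text of this commit, while the file name and all paths are placeholders.

```groovy
// custom.config (hypothetical file name), passed to the run with `-c custom.config`
params {
    // Skip flag set to true: the Pfam database is not downloaded
    skip_pfam       = true

    // A non-null *_db path: the FunFam download is bypassed (placeholder path)
    funfam_db       = '/path/to/funfam_db'

    // Same idea for InterProScan: use a local copy instead of downloading (placeholder path)
    interproscan_db = '/path/to/interproscan_db'
}
```

Leaving a `*_db` parameter at `null` and its `skip_*` flag unset would, per the same text, let the pipeline download that database into `downloaded_dbs/`.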

modules.json

Lines changed: 5 additions & 0 deletions
@@ -54,6 +54,11 @@
                         "branch": "master",
                         "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
                         "installed_by": ["modules"]
+                    },
+                    "untar": {
+                        "branch": "master",
+                        "git_sha": "447f7bc0fa41dfc2400c8cad4c0291880dc060cf",
+                        "installed_by": ["modules"]
                     }
                 }
             },

modules/local/interproscan/database/main.nf

Lines changed: 0 additions & 35 deletions
This file was deleted.

modules/nf-core/untar/environment.yml

Lines changed: 12 additions & 0 deletions
Some generated files are not rendered by default.

modules/nf-core/untar/main.nf

Lines changed: 75 additions & 0 deletions
Some generated files are not rendered by default.

modules/nf-core/untar/meta.yml

Lines changed: 73 additions & 0 deletions
Some generated files are not rendered by default.

modules/nf-core/untar/tests/main.nf.test

Lines changed: 97 additions & 0 deletions
Some generated files are not rendered by default.
