Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://

 ### `Added`

+- [#68](https://github.com/nf-core/proteinannotator/pull/68) - Using the `ARIA2` and `UNTAR` nf-core modules to download and decompress the InterProScan database. (by @vagkaratzas)
 - [#67](https://github.com/nf-core/proteinannotator/pull/67) - Swapped to the updated, non-buggy, nf-core version of `INTERPROSCAN`. (by @vagkaratzas)
 - [#65](https://github.com/nf-core/proteinannotator/pull/65) - Converted the pipeline schematic to nf-core metromap. (by @vagkaratzas)
 - [#62](https://github.com/nf-core/proteinannotator/pull/62) - Added the option to download and use the latest FunFam HMM library (or use path to an existing one) for domain annotation. (by @vagkaratzas)
10 changes: 9 additions & 1 deletion conf/modules.config
@@ -110,7 +110,15 @@ process {
         ]
     }

-    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:INTERPROSCAN_DATABASE' {
+    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:ARIA2' {
+        publishDir = [
+            path: { "${params.outdir}/downloaded_dbs/" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:UNTAR' {
         publishDir = [
             path: { "${params.outdir}/downloaded_dbs/" },
             mode: params.publish_dir_mode,
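The new `publishDir` blocks copy whatever `ARIA2` and `UNTAR` produce into `${params.outdir}/downloaded_dbs/`. As a minimal sketch, assuming a user-supplied custom config passed with Nextflow's `-c` option (whose settings take precedence over the pipeline's own `nextflow.config`), the same fully qualified selectors could redirect the downloads to a persistent location so later runs can point the `*_db` parameters at them. The target directory is a hypothetical placeholder, and `mode: 'copy'` is chosen so the published files outlive the work directory.

```groovy
// Hypothetical custom config, e.g. `nextflow run nf-core/proteinannotator -c persist_dbs.config ...`
// The target directory is a placeholder, not a pipeline default.
process {
    // Raw downloads (e.g. Pfam/FunFam HMM libraries, InterProScan archive)
    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:ARIA2' {
        publishDir = [
            path: { "/shared/databases/proteinannotator/" },
            // copy rather than symlink, so the files survive work-directory cleanup
            mode: 'copy',
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
    // Extracted InterProScan data
    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:FUNCTIONAL_ANNOTATION:UNTAR' {
        publishDir = [
            path: { "/shared/databases/proteinannotator/" },
            mode: 'copy',
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
}
```

Reusing the exact selector strings from `conf/modules.config` (rather than a regex) keeps the override unambiguous when the two configs are merged.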
24 changes: 13 additions & 11 deletions docs/output.md
@@ -14,12 +14,14 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [SeqFu](#seqfu) for input amino acid sequences quality check (QC)
   - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)

+- [Database download](#database-download) Optionally download selected databases for annotation.
+  - [aria2](#aria2) - To optionally download the Pfam, FunFam, and/or InterProScan databases through the pipeline.
+
 - [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
-  - [aria2](#aria2) - To optionally download the latest Pfam and/or FunFam databases through the pipeline.
   - [hmmer](#hmmer) - To optionally match the input sequence to known Pfam and/or FunFam domains through `hmmer/hmmsearch`

 - [Functional annotation](#functional-annotation) Annotate proteins with functional domains
-  - [InterProScan](#Interproscan) - Search the InterPro database for functional domains
+  - [InterProScan](#Interproscan) - Search the InterProScan database for functional domains

 - [s4pred](#s4pred) - Predict secondary structures of sequences, producing per amino acid probabilities of being an α-helix, a β-strand or a coil.

@@ -62,23 +64,28 @@ The `seqkit` module is used for initial preprocessing (i.e., gap removal, conver

 [SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.

-### Domain annotation
+### Database download

 #### aria2

 <details markdown="1">
 <summary>Output files</summary>

 - `downloaded_dbs/`
+  - `interproscan_db/`: (optional) uncompressed archive data from the downloaded InterProScan database
+    - `*/`: (optional) one directory for each of the member databases of InterProScan
   - `Pfam-A*.hmm.gz`: (optional) The latest full, or a minimal test, Pfam-A HMM database that can be downloaded through the pipeline.
+  - `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided URL
   - `funfam-hmm3-v4_3_0*.lib.gz`: (optional) The latest (v4_3_0) full, or a minimal test, FunFam HMM database that can be downloaded through the pipeline.

 </details>

-If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`) for each domain annotation database is set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).
+If the `skip_*` flags (e.g., `skip_pfam`, `skip_funfam`, `skip_interproscan`) for the annotation databases are set to `true`, or the `*_db` parameter paths (e.g., `pfam_db`, `funfam_db`, `interproscan_db`) are set (i.e., not `null`), or the run is resumed after a successful database download, then the respective database will not be (re)downloaded. The full database links can be found in the main `nextflow.config` file, while minimal test versions can be found in the `test` and `test_full` profiles (i.e., `conf/test.config`, `conf/test_full.config`).

 [aria2](https://github.com/aria2/aria2/) is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.

+### Domain annotation
+
 #### hmmer

 <details markdown="1">
@@ -103,10 +110,6 @@ Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `
 <details markdown="1">
 <summary>Output files</summary>

-- `downloaded_dbs/`
-  - `data/`: (optional) uncompressed archive data from the downloaded InterProScan database
-    - `*/`: (optional) one directory for each of the member databases of InterProScan
-  - `interproscan_test.tar.gz`: (optional) the downloaded InterProScan archive of member databases according to the optional user-provided url
 - `functional_annotation/`
   - `interproscan/`
     - `<samplename>/`
@@ -117,9 +120,8 @@ Each of the `domain_annotation/` subfolders (e.g., `pfam`, `funfam`) contain a `

 </details>

-[InterProScan](https://interproscan-docs.readthedocs.io/en/v5/#) is a protein annotation tool that searches [InterPro](http://www.ebi.ac.uk/interpro/), a database which integrates predictive information about protein function from a number of member resources, giving an overview of the families that a protein belongs to and the domains and sites it contains.
-For `nf-core/proteinannotator`, the default database applications that are used to functionally annotate sequences include
-Hamap, PANTHER, PIRSF, TIGRFAM and sfld. The main `nextflow.config` contains a [url]("https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz") parameter (`--interproscan_db_url`) for the full version of the InterProScan database. If, instead, a local database is provided via the `--interproscan_db` parameter, then the download is skipped.
+[InterProScan](https://interproscan-docs.readthedocs.io/en/v5/#) is a protein annotation tool that searches [InterPro](http://www.ebi.ac.uk/interpro/), a database which integrates predictive information about protein function from a number of member resources, giving an overview of the families that a protein belongs to and the domains and sites it contains. The default database applications that are used to functionally annotate sequences include
+Hamap, PANTHER, PIRSF, TIGRFAM and sfld, and are set through the `--interproscan_applications` parameter.

 See also [InterProScan output documentation](https://interproscan-docs.readthedocs.io/en/v5/), where most of these examples are taken from.

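The download rule in the updated `aria2` section (a non-null `*_db` path or a `skip_*` flag set to `true` means no (re)download) can be exercised from a custom config. A minimal sketch, assuming the databases were already fetched by an earlier run; every path is a placeholder, and whether `interproscan_db` expects the extracted directory or the archive is an assumption here, so check the pipeline's parameter documentation before relying on it.

```groovy
// Hypothetical custom config (e.g. `-c local_dbs.config`); all paths are placeholders.
params {
    // A non-null *_db path means the corresponding database is not (re)downloaded
    pfam_db         = '/data/dbs/Pfam-A.hmm.gz'
    interproscan_db = '/data/dbs/interproscan_db/'

    // Setting a skip_* flag to true also means FunFam is not downloaded
    skip_funfam     = true
}
```

The same values can also be passed on the command line, e.g. `--pfam_db /data/dbs/Pfam-A.hmm.gz --skip_funfam`.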
5 changes: 5 additions & 0 deletions modules.json
@@ -54,6 +54,11 @@
             "branch": "master",
             "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
             "installed_by": ["modules"]
+          },
+          "untar": {
+            "branch": "master",
+            "git_sha": "447f7bc0fa41dfc2400c8cad4c0291880dc060cf",
+            "installed_by": ["modules"]
           }
         }
       },
35 changes: 0 additions & 35 deletions modules/local/interproscan/database/main.nf

This file was deleted.
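With the local `INTERPROSCAN_DATABASE` module removed, the InterProScan archive is fetched by `ARIA2` and unpacked by `UNTAR`, as the CHANGELOG entry for #68 states. A hedged sketch of pinning that download to a specific release and narrowing the applications: the URL is the full-database link quoted in the previous `docs/output.md` text, and the comma-separated format for `interproscan_applications` is an assumption modelled on InterProScan's own `--applications` option.

```groovy
// Hypothetical custom config; values are illustrative only.
params {
    // Archive fetched by ARIA2 and extracted by UNTAR (skipped when interproscan_db is set)
    interproscan_db_url       = 'https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/interproscan-5.72-103.0-64-bit.tar.gz'

    // Assumed comma-separated subset of the default applications
    interproscan_applications = 'Hamap,PANTHER,PIRSF'
}
```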

12 changes: 12 additions & 0 deletions modules/nf-core/untar/environment.yml


75 changes: 75 additions & 0 deletions modules/nf-core/untar/main.nf


73 changes: 73 additions & 0 deletions modules/nf-core/untar/meta.yml

