Commit 3da49a9

Merge pull request #61 from nf-core/add-hmmsearch
Add hmmsearch
2 parents 76e1a3e + 9217108 commit 3da49a9

25 files changed

Lines changed: 887 additions & 12 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ Initial release of nf-core/proteinannotator, created with the [nf-core](https://
 
 ### `Added`
 
+- [#61](https://github.com/nf-core/proteinannotator/pull/61) - Added nf-core modules `ARIA2` and `HMMER_HMMSEARCH` to download latest Pfam HMM library (or use path to existing one) and match domains to input sequences. (by @vagkaratzas)
 - [#60](https://github.com/nf-core/proteinannotator/pull/60) - Added nf-core module `S4PRED_RUNMODEL` for secondary structure prediction (i.e., α-helix, a β-strand or a coil). (by @vagkaratzas)
 - [#59](https://github.com/nf-core/proteinannotator/pull/59) - Added nf-core qc and pre-processing subworkflow for amino acid sequences `FAA_SEQFU_SEQKIT`. (by @vagkaratzas)
 - [#57](https://github.com/nf-core/proteinannotator/pull/57) - nf-core tools template update to 3.5.1. (by @vagkaratzas)

CITATIONS.md

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@
 
 > Shen W, Sipos B, Zhao L. SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta. 2024 Apr 5:e191. doi: 10.1002/imt2.191. PubMed PMID: 38898985; PubMed Central PMCID: PMC11183193.
 
+- [hmmer](https://pubmed.ncbi.nlm.nih.gov/22039361/)
+
+  > Eddy SR. Accelerated profile HMM searches. PLoS computational biology. 2011 Oct 20;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. PubMed PMID: 22039361; PubMed Central PMCID: PMC3197634.
+
 - [InterProScan](https://academic.oup.com/bioinformatics/article/17/9/847/206564)
 
 > Zdobnov, Evgeni M., and Rolf Apweiler. “InterProScan – an Integration Platform for the Signature-Recognition Methods in InterPro.” Bioinformatics 17, no. 9 (September 1, 2001): 847–48. https://doi.org/10.1093/bioinformatics/17.9.847.

conf/modules.config

Lines changed: 18 additions & 0 deletions
@@ -74,6 +74,24 @@ process {
         ]
     }
 
+    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:DOMAIN_ANNOTATION:ARIA2' {
+        publishDir = [
+            path: { "${params.outdir}/downloaded_dbs/" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
+    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:DOMAIN_ANNOTATION:HMMER_HMMSEARCH' {
+        ext.args = { "-E ${params.hmmsearch_evalue_cutoff}" }
+        publishDir = [
+            path: { "${params.outdir}/domain_annotation/pfam/" },
+            mode: params.publish_dir_mode,
+            pattern: "*.domtbl.gz",
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+    }
+
     withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:S4PRED_RUNMODEL' {
         ext.prefix = { params.s4pred_outfmt }
         ext.args = { "--outfmt ${params.s4pred_outfmt}" }
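
Review note on the `HMMER_HMMSEARCH` block in this hunk: `ext.args` is a closure, so `params.hmmsearch_evalue_cutoff` is read when the task is prepared, and the `saveAs` closure returns `null` for `versions.yml` so that file is not published. The following is a minimal sketch of how a user config might extend those arguments. The file name `custom.config` is hypothetical, and the sketch assumes the module forwards `ext.args` to the `hmmsearch` command line, as nf-core modules conventionally do. `--domE` is hmmsearch's per-domain E-value reporting threshold.

```groovy
// custom.config (hypothetical file name)
process {
    withName: 'NFCORE_PROTEINANNOTATOR:PROTEINANNOTATOR:DOMAIN_ANNOTATION:HMMER_HMMSEARCH' {
        // Keep the pipeline's per-sequence cutoff (-E) and apply the same
        // value as a per-domain reporting cutoff (--domE).
        // Reusing one parameter for both flags is an illustrative choice,
        // not something this commit does.
        ext.args = {
            "-E ${params.hmmsearch_evalue_cutoff} --domE ${params.hmmsearch_evalue_cutoff}"
        }
    }
}
```

Such a file would be supplied with Nextflow's `-c` option. Because the override only sets `ext.args`, the `publishDir` settings from `conf/modules.config` are expected to stay in effect.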

conf/test.config

Lines changed: 3 additions & 4 deletions
@@ -23,10 +23,9 @@ params {
     config_profile_description = 'Minimal test dataset to check pipeline function'
 
     // Input data
-    // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    // From: https://github.com/nf-core/proteinfold/blob/1.1.1/conf/test.config
-    // Example: https://github.com/nf-core/test-datasets/blob/proteinfold/testdata/samplesheet/v1.2/samplesheet.csv
     input = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
+    // Domain annotation
+    pfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/pfam/Pfam-A_test.hmm.gz'
+    // Functional annotation
     skip_interproscan = true
 }

conf/test_full.config

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ params {
     config_profile_description = 'Full test dataset to check pipeline function'
 
     // Input data for full size test
-    // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
     input = params.pipelines_testdata_base_path + 'proteinannotator/samplesheet/snap25-isoforms.csv'
+    // Domain annotation
+    pfam_latest_link = params.pipelines_testdata_base_path + 'proteinannotator/testdata/pfam/Pfam-A_test.hmm.gz'
 }

docs/output.md

Lines changed: 35 additions & 2 deletions
@@ -16,7 +16,11 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [SeqFu](#seqfu) for input amino acid sequences quality check (QC)
   - [SeqKit](#seqkit) for preprocessing input amino acid sequences (i.e., gap removal, convert to upper case, validate, filter by length, replace special characters such as `/`, and remove duplicate sequences)
 
-- [Functional Annotation](#functional-annotation) Annotate proteins with functional domains
+- [Domain annotation](#domain-annotation) Annotate proteins with domains from established repositories.
+  - [aria2](#aria2) - To optionally download the latest Pfam database through the pipeline.
+  - [hmmer](#hmmer) - To optionally match the input sequence to known Pfam domains through `hmmer/hmmsearch`
+
+- [Functional annotation](#functional-annotation) Annotate proteins with functional domains
   - [InterProScan](#Interproscan) - Search the InterPro database for functional domains
 
 - [s4pred](#s4pred) - Predict secondary structures of sequences, producing per amino acid probabilities of being an α-helix, a β-strand or a coil.
@@ -60,7 +64,36 @@ The `seqkit` module is used for initial preprocessing (i.e., gap removal, conver
 
 [SeqKit](https://github.com/shenwei356/seqkit) is a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
 
-### Functional Annotation
+### Domain annotation
+
+#### aria2
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `downloaded_dbs/`
+  - `Pfam-A*.hmm.gz`: (optional) The latest full, or a minimal test, Pfam-A HMM database that can be downloaded through the pipeline.
+
+</details>
+
+[aria2](https://github.com/aria2/aria2/) is a lightweight multi-protocol & multi-source, cross platform download utility operated in command-line. It supports HTTP/HTTPS, FTP, SFTP, BitTorrent and Metalink.
+
+#### hmmer
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `domain_annotation/`
+  - `pfam/`
+    - `<samplename>.domtbl.gz`: `hmmer/hmmsearch` results along with parameter information.
+
+</details>
+
+The `domain_annotation/pfam` folder contains a `.domtbl.gz` annotation file per input sample.
+
+[hmmer](https://github.com/EddyRivasLab/hmmer) searches sequence databases for homologs and makes sequence alignments using profile hidden Markov models (profile HMMs).
+
+### Functional annotation
 
 #### InterProScan
 

main.nf

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,9 @@ workflow NFCORE_PROTEINANNOTATOR {
     PROTEINANNOTATOR (
         samplesheet,
         params.skip_preprocessing,
+        params.skip_pfam,
+        params.pfam_latest_link,
+        params.pfam_db,
         params.skip_s4pred
     )
     emit:
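
Review note: the three new workflow arguments come straight from pipeline parameters. The following is a hedged sketch of a params block for a run that reuses a local Pfam library instead of downloading one. All values are placeholders rather than pipeline defaults. The assumption that a set `pfam_db` takes the place of the `pfam_latest_link` download is inferred from the changelog wording ("or use path to existing one") and is not confirmed by this diff.

```groovy
// Hypothetical user config; every value below is a placeholder.
params {
    // Run the Pfam domain annotation step.
    skip_pfam = false

    // Path to an existing Pfam-A HMM library (placeholder path).
    // Assumption: when this is set, no download via pfam_latest_link is needed.
    pfam_db = '/path/to/Pfam-A.hmm.gz'

    // Illustrative E-value passed to `hmmsearch -E` through conf/modules.config.
    hmmsearch_evalue_cutoff = 0.001
}
```

The test profiles in this commit take the other route: they leave `pfam_db` unset and point `pfam_latest_link` at a small test HMM file.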

modules.json

Lines changed: 10 additions & 0 deletions
@@ -5,6 +5,16 @@
     "https://github.com/nf-core/modules.git": {
       "modules": {
         "nf-core": {
+          "aria2": {
+            "branch": "master",
+            "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
+            "installed_by": ["modules"]
+          },
+          "hmmer/hmmsearch": {
+            "branch": "master",
+            "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
+            "installed_by": ["modules"]
+          },
           "mmseqs/search": {
             "branch": "master",
             "git_sha": "81880787133db07d9b4c1febd152c090eb8325dc",

modules/nf-core/aria2/environment.yml

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default.

modules/nf-core/aria2/main.nf

Lines changed: 47 additions & 0 deletions
Some generated files are not rendered by default.
