- Feature to pass multiple datasets of paired-end reads.
+- Scripts that generate the target definition now use the accession number instead of the GI number. Additional scripts have been added to facilitate creating and modifying customized databases.
+- Include updated README_CLARK.txt
+- New download scripts `download_data_newest.sh` and `download_data_release.sh`
README.md: 45 additions & 12 deletions
@@ -1,6 +1,10 @@
+# CuCLARK
+
 ABOUT
 -----
-CuCLARK is a metagenomic classifier for CUDA-enabled GPUs, based on CLARK (http://clark.cs.ucr.edu/).
+CuCLARK is a metagenomic classifier for CUDA-enabled GPUs, based on CLARK (http://clark.cs.ucr.edu/).
+For implementation details and a speed comparison, see the corresponding paper [Accelerating metagenomic read classification on CUDA-enabled GPUs](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1434-6). CuCLARK [v1.0](https://github.com/Funatiq/cuclark/releases/tag/v1.0) was used in the paper and has since been updated (see `CHANGELOG.md` for details).
+
 
 The program comes in two variants: CuCLARK and CuCLARK-l.
 CuCLARK is designed for workstations which can provide enough RAM to fit large databases
@@ -78,7 +82,7 @@ details on these scripts.
 
 SOFTWARE & SYSTEM REQUIREMENTS
 -----
-1) C++ COMPILER VERSION
+1) C++ COMPILER VERSION
 The main requirement is a 64-bit operating system (Linux or Mac) and GNU GCC
 version 4.4 or higher to compile. Multi-threading is provided by the OpenMP
 libraries. If these libraries are not installed, CuCLARK will run in single-threaded
@@ -113,13 +117,30 @@ INSTALLATION
 Copy the whole "CuCLARK" folder to hard disk and execute the installation script (`./install.sh`).
 The installer builds binaries (CuCLARK and CuCLARK-l, in the subfolder "exe").
 
+SCRIPTS
+-----
 The main folder also contains several scripts.
 In particular:
 -`set_targets.sh` and `classify_metagenome.sh`: They allow you to classify your metagenomes
 against one or more databases (downloaded from NCBI or available locally on your disk).
 See section "CLASSIFICATION OF METAGENOMIC SAMPLES" for details.
+
 -`download_data.sh`, `download_taxondata.sh` and `make_metadata.sh` are called by `set_targets.sh` to download a specific database and taxonomy tree data from NCBI, and to associate the genomes of the database with the corresponding taxa, respectively. Although it is possible to use these scripts on their own, we recommend simply using `set_targets.sh` to carry out all necessary steps.
 
+-`download_data.sh` downloads bacterial, viral or human genomes from NCBI, like the original CLARK.
+
+-`download_data.sh` can be replaced with `download_data_newest.sh` or `download_data_release.sh`
+to download the newest NCBI RefSeq genomes or the genomes of the latest NCBI RefSeq release. These scripts can download any database included in RefSeq, such as archaea, bacteria, fungi, etc.
+
+-`clean.sh`: This script permanently deletes all data (generated and
+downloaded) in the database directory defined by `set_targets.sh`.
+
+-`resetCustomDB.sh`: It resets the targets definition with the sequences (newly
+added/modified) of the customized database. Any call of this script must be
+followed by a run of `set_targets.sh`.
+
+-`updateTaxonomy.sh`: Downloads the latest taxonomy data (taxonomy id, accession numbers, etc.) from the NCBI website.
+
 
 
 Following is a version of CLARK's usage guide adjusted to CuCLARK's needs.
@@ -137,7 +158,7 @@ Definitions of parameters:
 `-k <kmerSize>`, k-mer length: integer, >= 2 and <= 32.
 The default value for this parameter is 31, except for CuCLARK-l (it is 27).
 
-`-T <fileTargets>`, The targets definition is written in fileTargets: filename.
+`-T <fileTargets>`, The targets definition is written in fileTargets: filename.
 This is a two-column file (separated by space, comma or tab), such that, for each line:
 column 1: the filename of a reference sequence
 column 2: the target ID (taxon name, or taxonomy ID, ...) of the reference sequence
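
To make the two-column layout of the targets definition concrete, here is a minimal Python sketch of a reader for such a file. It is not CuCLARK's own parser: the function name `parse_targets` and the sample file names and IDs are hypothetical, and the sketch assumes that neither column itself contains a space, comma or tab and that a run of separators counts as one.

```python
import re

# Columns may be separated by a space, a comma or a tab (per the -T description).
# Treating a run of separators as a single one is an assumption of this sketch.
_SEPARATOR = re.compile(r"[ ,\t]+")


def parse_targets(lines):
    """Return (reference_filename, target_id) pairs from targets-definition lines."""
    pairs = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skipping blank lines is a convenience, not documented behavior
        fields = _SEPARATOR.split(line)
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical example content; the file names and target IDs are made up.
example = [
    "genomes/ref_a.fna 1234\n",
    "genomes/ref_b.fna,5678\n",
    "genomes/ref_c.fna\t1234\n",
]
targets = parse_targets(example)
```

Several reference sequences may share one target ID, as `ref_a.fna` and `ref_c.fna` do in the example, which is why the result is a list of pairs rather than a dictionary keyed by target.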
@@ -148,15 +169,15 @@ Definitions of parameters:
 The default value is 0. For example, for 1 (or, 2), the program will discard any
 discriminative k-mer that appears only once (or, less than twice).
 
-`-D <directoryDB/>`, Directory of the database : pathname.
+`-D <directoryDB/>`, Directory of the database : pathname.
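
The `-k` range given in the parameter definitions (>= 2 and <= 32) is consistent with packing each base into 2 bits, so that a k-mer of up to 32 bases fits into one 64-bit word. Whether CuCLARK stores k-mers exactly this way is an assumption here; the Python sketch below only illustrates the arithmetic, and `encode_kmer` and `kmers` are hypothetical helper names.

```python
# 2-bit code per base; the particular assignment is an arbitrary choice for this sketch.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
MIN_K, MAX_K = 2, 32  # range documented for the -k parameter


def check_k(k):
    """Raise ValueError unless k lies in the documented range."""
    if not MIN_K <= k <= MAX_K:
        raise ValueError(f"k must be between {MIN_K} and {MAX_K}, got {k}")


def encode_kmer(kmer):
    """Pack a k-mer over A/C/G/T into an integer, 2 bits per base."""
    check_k(len(kmer))
    value = 0
    for base in kmer:
        # A base outside A/C/G/T (for example N) raises KeyError here.
        value = (value << 2) | BASE_BITS[base]
    return value


def kmers(read, k=31):
    """Return all overlapping k-mers of a read, left to right."""
    check_k(k)
    return [read[start:start + k] for start in range(len(read) - k + 1)]


# The largest allowed k-mer still fits into 64 bits.
assert encode_kmer("T" * 32) == 2**64 - 1
```

A read shorter than `k` simply yields no k-mers in this sketch.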