Commit 6a45669

Add organism info
1 parent 998be51 commit 6a45669

1 file changed

Lines changed: 109 additions & 0 deletions

README.md

@@ -221,6 +221,115 @@ Users can override the default mapping with `ambiguous_aminoacid_map_override`.
<br></br>

## Organisms

This project uses a dataset of 164 distinct organisms for training and fine-tuning. The full dataset, including genomic sequences and codon adaptation indices, is available on [Zenodo (DOI: 10.5281/zenodo.12509224)](https://zenodo.org/records/12509224).

### Summary by Kingdom

| Kingdom   | Number of Organisms |
| :-------- | :------------------ |
| Bacteria  | 142                 |
| Archaea   | 11                  |
| Plantae   | 5                   |
| Animalia  | 5                   |
| Fungi     | 1                   |
| **Total** | **164**             |

### Detailed Breakdown by Clade

#### Animalia

| Class          | Number of Organisms | Organism Names                 |
| :------------- | :------------------ | :----------------------------- |
| Mammalia       | 2                   | *Homo sapiens*, *Mus musculus* |
| Insecta        | 1                   | *Drosophila melanogaster*      |
| Actinopterygii | 1                   | *Danio rerio* (zebrafish)      |
| Chromadorea    | 1                   | *Caenorhabditis elegans*       |

#### Plantae

The list includes two land plants and a green alga. The chloroplasts of *Nicotiana tabacum* and *Chlamydomonas reinhardtii* are treated as separate entries in the dataset.

| Phylum       | Class         | Number of Organisms | Organism Names                                                               |
| :----------- | :------------ | :------------------ | :--------------------------------------------------------------------------- |
| Tracheophyta | Magnoliopsida | 3                   | *Arabidopsis thaliana*, *Nicotiana tabacum*, *Nicotiana tabacum* chloroplast |
| Chlorophyta  | Chlorophyceae | 2                   | *Chlamydomonas reinhardtii*, *Chlamydomonas reinhardtii* chloroplast         |

#### Fungi

| Phylum     | Class           | Number of Organisms | Organism Name              |
| :--------- | :-------------- | :------------------ | :------------------------- |
| Ascomycota | Saccharomycetes | 1                   | *Saccharomyces cerevisiae* |

#### Archaea

The archaeal organisms in this dataset are all extremophiles.

| Phylum        | Class        | Number of Organisms | Genera                       |
| :------------ | :----------- | :------------------ | :--------------------------- |
| Euryarchaeota | Thermococci  | 10                  | *Pyrococcus*, *Thermococcus* |
| Crenarchaeota | Thermoprotei | 1                   | *Saccharolobus*              |

#### Bacteria

Bacteria make up 142 of the 164 organisms in the dataset, and all but one belong to the phylum Pseudomonadota (formerly Proteobacteria).

| Phylum         | Number of Organisms | Notable Genera |
| :------------- | :------------------ | :------------- |
| Pseudomonadota | 141                 | *Escherichia*, *Salmonella*, *Klebsiella*, *Pseudomonas*, *Yersinia*, *Serratia*, *Enterobacter*, *Proteus*, etc. |
| Bacillota      | 1                   | *Bacillus*     |

### Full Organism List

The model supports the following 164 organisms. When using the tool, an organism can be referenced either by its name or by its ID (0-163).

| | | | |
| :-- | :-- | :-- | :-- |
| Arabidopsis thaliana | Enterobacter hormaechei | Klebsiella variicola | Proteus penneri |
| Atlantibacter hermannii | Enterobacter kobei | Kosakonia cowanii | Proteus terrae subsp. cibarius |
| Bacillus subtilis | Enterobacter ludwigii | Kosakonia radicincitans | Proteus vulgaris |
| Brenneria goodwinii | Enterobacter mori | Leclercia adecarboxylata | Providencia alcalifaciens |
| Buchnera aphidicola (Schizaphis graminum) | Enterobacter quasiroggenkampii | Lelliottia amnigena | Providencia heimbachae |
| Caenorhabditis elegans | Enterobacter roggenkampii | Lonsdalea populi | Providencia rettgeri |
| Candidatus Erwinia haradaeae | Enterobacter sichuanensis | Moellerella wisconsensis | Providencia rustigianii |
| Candidatus Hamiltonella defensa 5AT (Acyrthosiphon pisum) | Erwinia amylovora CFBP1430 | Morganella morganii | Providencia stuartii |
| Chlamydomonas reinhardtii | Erwinia persicina | Mus musculus | Providencia thailandensis |
| Chlamydomonas reinhardtii chloroplast | Escherichia albertii | Nicotiana tabacum | Pseudomonas putida |
| Citrobacter amalonaticus | Escherichia coli general | Nicotiana tabacum chloroplast | Pyrococcus furiosus |
| Citrobacter braakii | Escherichia coli O157-H7 str. Sakai | Obesumbacterium proteus | Pyrococcus horikoshii |
| Citrobacter cronae | Escherichia coli str. K-12 substr. MG1655 | Pantoea agglomerans | Pyrococcus yayanosii |
| Citrobacter europaeus | Escherichia fergusonii | Pantoea allii | Rahnella aquatilis CIP 78.65 = ATCC 33071 |
| Citrobacter farmeri | Escherichia marmotae | Pantoea ananatis PA13 | Raoultella ornithinolytica |
| Citrobacter freundii | Escherichia ruysiae | Pantoea dispersa | Raoultella planticola |
| Citrobacter koseri ATCC BAA-895 | Ewingella americana | Pantoea stewartii | Raoultella terrigena |
| Citrobacter portucalensis | Hafnia alvei | Pantoea vagans | Rosenbergiella epipactidis |
| Citrobacter werkmanii | Hafnia paralvei | Pectobacterium aroidearum | Rouxiella badensis |
| Citrobacter youngae | Homo sapiens | Pectobacterium atrosepticum | Saccharomyces cerevisiae |
| Cronobacter dublinensis subsp. dublinensis LMG 23823 | Kalamiella piersonii | Pectobacterium brasiliense | Saccharolobus solfataricus |
| Cronobacter malonaticus LMG 23826 | Klebsiella aerogenes | Pectobacterium carotovorum | Salmonella bongori N268-08 |
| Cronobacter sakazakii | Klebsiella grimontii | Pectobacterium odoriferum | Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 |
| Cronobacter turicensis | Klebsiella michiganensis | Pectobacterium parmentieri | Serratia bockelmannii |
| Danio rerio | Klebsiella oxytoca | Pectobacterium polaris | Serratia entomophila |
| Dickeya dadantii 3937 | Klebsiella pasteurii | Pectobacterium versatile | Serratia ficaria |
| Dickeya dianthicola | Klebsiella pneumoniae subsp. pneumoniae HS11286 | Photorhabdus laumondii subsp. laumondii TTO1 | Serratia fonticola |
| Dickeya fangzhongdai | Klebsiella quasipneumoniae | Plesiomonas shigelloides | Serratia grimesii |
| Dickeya solani | Klebsiella quasivariicola | Pluralibacter gergoviae | Serratia liquefaciens |
| Dickeya zeae | Thermoccoccus kodakarensis | Proteus faecis | Serratia marcescens |
| Drosophila melanogaster | Thermococcus barophilus MPT | Proteus mirabilis HI4320 | Serratia nevei |
| Edwardsiella anguillarum ET080813 | Thermococcus chitonophagus | Yersinia aldovae 670-83 | Serratia plymuthica AS9 |
| Edwardsiella ictaluri | Thermococcus gammatolerans | Yersinia aleksiciae | Serratia proteamaculans |
| Edwardsiella piscicida | Thermococcus litoralis | Yersinia alsatica | Serratia quinivorans |
| Edwardsiella tarda | Thermococcus onnurineus | Yersinia enterocolitica | Serratia rubidaea |
| Enterobacter asburiae | Thermococcus sibiricus | Yersinia frederiksenii ATCC 33641 | Serratia ureilytica |
| Enterobacter bugandensis | Xenorhabdus bovienii str. feltiae Florida | Yersinia intermedia | Shigella boydii |
| Enterobacter cancerogenus | Yersinia kristensenii | Yersinia massiliensis CCUG 53443 | Shigella dysenteriae |
| Enterobacter chengduensis | Yersinia mollaretii ATCC 43969 | Yersinia pestis A1122 | Shigella flexneri 2a str. 301 |
| Enterobacter cloacae | Yersinia proxima | Yersinia pseudotuberculosis IP 32953 | Shigella sonnei |
| Yersinia rochesterensis | Yersinia rohdei | Yersinia ruckeri | Yokenella regensburgei |

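
Since an organism can be given either as a name or as an ID, a caller may want to normalize both forms to a single integer before passing it on. The following is a minimal sketch of that idea, not the tool's own implementation: `resolve_organism` and `EXAMPLE_IDS` are hypothetical names, and the IDs in the example mapping are placeholders that need not match the ones the tool assigns.

```python
from typing import Mapping, Union


def resolve_organism(organism: Union[str, int], name_to_id: Mapping[str, int]) -> int:
    """Return the integer ID for an organism given as a name or as an ID."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(organism, bool):
        raise TypeError("organism must be a name (str) or an ID (int)")
    if isinstance(organism, int):
        # An ID is valid only if some organism in the mapping carries it.
        if organism not in name_to_id.values():
            raise ValueError(f"unknown organism ID: {organism}")
        return organism
    if isinstance(organism, str):
        # Names are matched exactly, including strain suffixes.
        if organism not in name_to_id:
            raise ValueError(f"unknown organism name: {organism!r}")
        return name_to_id[organism]
    raise TypeError("organism must be a name (str) or an ID (int)")


# Placeholder mapping for illustration only: the names come from the list
# above, but the IDs are made up and are NOT the tool's real assignment.
EXAMPLE_IDS = {"Homo sapiens": 0, "Mus musculus": 1, "Danio rerio": 2}

print(resolve_organism("Mus musculus", EXAMPLE_IDS))  # 1
print(resolve_organism(2, EXAMPLE_IDS))               # 2
```

Matching names exactly keeps the lookup unambiguous for entries that differ only by strain, such as the three *Escherichia coli* entries in the list.
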
<br><br>

## Star History
<p align="center">
