@@ -221,6 +221,115 @@ Users can override the default mapping with `ambiguous_aminoacid_map_override`.
221221<br></br>
222222
223223
224+ ## Organisms
225+
226+ This project uses a dataset of 164 distinct organisms for training and fine-tuning. The full dataset, including genomic sequences and codon adaptation indices, is available at [Zenodo (DOI: 10.5281/zenodo.12509224)](https://zenodo.org/records/12509224).
227+
228+ ### Summary by Kingdom
229+
230+ | Kingdom | Number of Organisms |
231+ | :------- | :------------------ |
232+ | Bacteria | 142 |
233+ | Archaea | 11 |
234+ | Plantae | 5 |
235+ | Animalia | 5 |
236+ | Fungi | 1 |
237+ | **Total** | **164** |
238+
239+ ### Detailed Breakdown by Clade
240+
241+ #### Animalia
242+
243+ | Class | Number of Organisms | Organism Names |
244+ | :------------- | :------------------ | :----------------------------------------------- |
245+ | Mammalia | 2 | *Homo sapiens*, *Mus musculus* |
246+ | Insecta | 1 | *Drosophila melanogaster* |
247+ | Actinopterygii | 1 | *Danio rerio* (zebrafish) |
248+ | Chromadorea | 1 | *Caenorhabditis elegans* |
249+
250+ #### Plantae
251+
252+ The list includes two plant species and a green alga, plus the chloroplast genomes of *Nicotiana tabacum* and *Chlamydomonas reinhardtii*, which are treated as separate entries in the dataset.
253+
254+ | Phylum | Class | Number of Organisms | Organism Names |
255+ | :----------- | :------------ | :------------------ | :---------------------------------------------------------------------------- |
256+ | Tracheophyta | Magnoliopsida | 3 | *Arabidopsis thaliana*, *Nicotiana tabacum*, *Nicotiana tabacum chloroplast* |
257+ | Chlorophyta | Chlorophyceae | 2 | *Chlamydomonas reinhardtii*, *Chlamydomonas reinhardtii chloroplast* |
258+
259+ #### Fungi
260+
261+ | Phylum | Class | Number of Organisms | Organism Name |
262+ | :--------- | :-------------- | :------------------ | :------------------------- |
263+ | Ascomycota | Saccharomycetes | 1 | *Saccharomyces cerevisiae* |
264+
265+ #### Archaea
266+
267+ The archaeal organisms in this dataset are all extremophiles.
268+
269+ | Phylum | Class | Number of Organisms | Genera |
270+ | :------------ | :----------- | :------------------ | :--------------------------- |
271+ | Euryarchaeota | Thermococci | 10 | *Pyrococcus*, *Thermococcus* |
272+ | Crenarchaeota | Thermoprotei | 1 | *Saccharolobus* |
273+
274+ #### Bacteria
275+
276+ Most organisms in the dataset are bacteria (142 of 164), and all but one of them belong to the phylum Pseudomonadota (formerly Proteobacteria).
277+
278+ | Phylum | Number of Organisms | Notable Genera |
279+ | :------------- | :------------------ | :------------- |
280+ | Pseudomonadota | 141 | *Escherichia*, *Salmonella*, *Klebsiella*, *Pseudomonas*, *Yersinia*, *Serratia*, *Enterobacter*, *Proteus*, etc. |
281+ | Bacillota | 1 | *Bacillus* |
282+
283+ ### Full Organism List
284+
285+ The model supports the following 164 organisms. Organisms can be referenced by their name or by their corresponding ID (0-163) when using the tool.
286+
287+ | | | | |
288+ | :-- | :-- | :-- | :-- |
289+ | Arabidopsis thaliana | Enterobacter hormaechei | Klebsiella variicola | Proteus penneri |
290+ | Atlantibacter hermannii | Enterobacter kobei | Kosakonia cowanii | Proteus terrae subsp. cibarius |
291+ | Bacillus subtilis | Enterobacter ludwigii | Kosakonia radicincitans | Proteus vulgaris |
292+ | Brenneria goodwinii | Enterobacter mori | Leclercia adecarboxylata | Providencia alcalifaciens |
293+ | Buchnera aphidicola (Schizaphis graminum) | Enterobacter quasiroggenkampii | Lelliottia amnigena | Providencia heimbachae |
294+ | Caenorhabditis elegans | Enterobacter roggenkampii | Lonsdalea populi | Providencia rettgeri |
295+ | Candidatus Erwinia haradaeae | Enterobacter sichuanensis | Moellerella wisconsensis | Providencia rustigianii |
296+ | Candidatus Hamiltonella defensa 5AT (Acyrthosiphon pisum) | Erwinia amylovora CFBP1430 | Morganella morganii | Providencia stuartii |
297+ | Chlamydomonas reinhardtii | Erwinia persicina | Mus musculus | Providencia thailandensis |
298+ | Chlamydomonas reinhardtii chloroplast | Escherichia albertii | Nicotiana tabacum | Pseudomonas putida |
299+ | Citrobacter amalonaticus | Escherichia coli general | Nicotiana tabacum chloroplast | Pyrococcus furiosus |
300+ | Citrobacter braakii | Escherichia coli O157-H7 str. Sakai | Obesumbacterium proteus | Pyrococcus horikoshii |
301+ | Citrobacter cronae | Escherichia coli str. K-12 substr. MG1655 | Pantoea agglomerans | Pyrococcus yayanosii |
302+ | Citrobacter europaeus | Escherichia fergusonii | Pantoea allii | Rahnella aquatilis CIP 78.65 = ATCC 33071 |
303+ | Citrobacter farmeri | Escherichia marmotae | Pantoea ananatis PA13 | Raoultella ornithinolytica |
304+ | Citrobacter freundii | Escherichia ruysiae | Pantoea dispersa | Raoultella planticola |
305+ | Citrobacter koseri ATCC BAA-895 | Ewingella americana | Pantoea stewartii | Raoultella terrigena |
306+ | Citrobacter portucalensis | Hafnia alvei | Pantoea vagans | Rosenbergiella epipactidis |
307+ | Citrobacter werkmanii | Hafnia paralvei | Pectobacterium aroidearum | Rouxiella badensis |
308+ | Citrobacter youngae | Homo sapiens | Pectobacterium atrosepticum | Saccharomyces cerevisiae |
309+ | Cronobacter dublinensis subsp. dublinensis LMG 23823 | Kalamiella piersonii | Pectobacterium brasiliense | Saccharolobus solfataricus |
310+ | Cronobacter malonaticus LMG 23826 | Klebsiella aerogenes | Pectobacterium carotovorum | Salmonella bongori N268-08 |
311+ | Cronobacter sakazakii | Klebsiella grimontii | Pectobacterium odoriferum | Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 |
312+ | Cronobacter turicensis | Klebsiella michiganensis | Pectobacterium parmentieri | Serratia bockelmannii |
313+ | Danio rerio | Klebsiella oxytoca | Pectobacterium polaris | Serratia entomophila |
314+ | Dickeya dadantii 3937 | Klebsiella pasteurii | Pectobacterium versatile | Serratia ficaria |
315+ | Dickeya dianthicola | Klebsiella pneumoniae subsp. pneumoniae HS11286 | Photorhabdus laumondii subsp. laumondii TTO1 | Serratia fonticola |
316+ | Dickeya fangzhongdai | Klebsiella quasipneumoniae | Plesiomonas shigelloides | Serratia grimesii |
317+ | Dickeya solani | Klebsiella quasivariicola | Pluralibacter gergoviae | Serratia liquefaciens |
318+ | Dickeya zeae | Thermoccoccus kodakarensis | Proteus faecis | Serratia marcescens |
319+ | Drosophila melanogaster | Thermococcus barophilus MPT | Proteus mirabilis HI4320 | Serratia nevei |
320+ | Edwardsiella anguillarum ET080813 | Thermococcus chitonophagus | Yersinia aldovae 670-83 | Serratia plymuthica AS9 |
321+ | Edwardsiella ictaluri | Thermococcus gammatolerans | Yersinia aleksiciae | Serratia proteamaculans |
322+ | Edwardsiella piscicida | Thermococcus litoralis | Yersinia alsatica | Serratia quinivorans |
323+ | Edwardsiella tarda | Thermococcus onnurineus | Yersinia enterocolitica | Serratia rubidaea |
324+ | Enterobacter asburiae | Thermococcus sibiricus | Yersinia frederiksenii ATCC 33641 | Serratia ureilytica |
325+ | Enterobacter bugandensis | Xenorhabdus bovienii str. feltiae Florida | Yersinia intermedia | Shigella boydii |
326+ | Enterobacter cancerogenus | Yersinia kristensenii | Yersinia massiliensis CCUG 53443 | Shigella dysenteriae |
327+ | Enterobacter chengduensis | Yersinia mollaretii ATCC 43969 | Yersinia pestis A1122 | Shigella flexneri 2a str. 301 |
328+ | Enterobacter cloacae | Yersinia proxima | Yersinia pseudotuberculosis IP 32953 | Shigella sonnei |
329+ | Yersinia rochesterensis | Yersinia rohdei | Yersinia ruckeri | Yokenella regensburgei |
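
The name-or-ID referencing described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: `resolve_organism` and `EXAMPLE_ORGANISM_IDS` are hypothetical names, and the ID values in the example table are placeholders, since this section does not list which ID belongs to which organism.

```python
from typing import Dict, Union

# Hypothetical excerpt of a name-to-ID table. The ID values are placeholders
# for illustration only; the real assignments are not given in this section.
EXAMPLE_ORGANISM_IDS: Dict[str, int] = {
    "Arabidopsis thaliana": 0,
    "Bacillus subtilis": 1,
    "Homo sapiens": 2,
}


def resolve_organism(
    organism: Union[str, int],
    table: Dict[str, int],
    total: int = 164,
) -> int:
    """Return an integer organism ID, given either a name or an ID."""
    if isinstance(organism, int):
        # IDs are assumed to span 0 .. total - 1 (0-163 for 164 organisms).
        if not 0 <= organism < total:
            raise ValueError(f"Organism ID must be in 0-{total - 1}, got {organism}")
        return organism
    if organism not in table:
        raise ValueError(f"Unknown organism name: {organism!r}")
    return table[organism]


# Both forms resolve to an integer ID.
by_name = resolve_organism("Homo sapiens", EXAMPLE_ORGANISM_IDS)
by_id = resolve_organism(163, EXAMPLE_ORGANISM_IDS)
```

Name matching in this sketch is exact and case-sensitive, so strain-level entries such as `Escherichia coli general` would have to be spelled exactly as they appear in the table above.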
330+
331+ <br><br>
332+
224333## Star History
225334<p align="center">
226335