| 1 | +# Example dataset for Rabies virus |
| 2 | + |
| 3 | +This is a minimal example dataset for *Lyssavirus rabies*. |
| 4 | + |
| 5 | +# Overview |
| 6 | + |
| 7 | +The phylogenetic tree was generated from all complete *Lyssavirus rabies* genomes obtained from [NCBI Nucleotide](https://ftp.ncbi.nlm.nih.gov/genomes/Viruses/AllNuclMetadata/) (accessed 2025/05/01) with > 95% unambiguous nucleotides and a length < 15,000 bp. Clade, subclade, and other associated metadata were taken from the original rabies Nextstrain build and [RABV-GLUE's metadata](https://github.com/giffordlabcvr/RABV-GLUE/blob/master/tabular/reference-set-data.tsv). Clade- and subclade-specific mutations [were extracted for all monophyletic major and minor lineages](https://github.com/theiagen/utilities/pull/21); previously assigned clades that were not monophyletic in the phylogeny were disregarded. Complete genome assemblies were aligned with `mafft v7.525` using default parameters, and the phylogeny was reconstructed using `iqtree v2.4.0` with 1000 ultrafast bootstrap replicates and GTR+F+I+R7, the evolutionary model selected by the *ModelFinder* module. The root, a node internal to the *L. rabies* phylogeny, was chosen by reconstructing an additional phylogeny with the same methodology and with *L. australis* (NC_003243.1) and *L. gannoruwa* (KU244266.2) included as known outgroups ([Banyard & Fooks, 2021](https://www.sciencedirect.com/science/article/abs/pii/B9780128096338209369)). |
| 8 | + |
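The inclusion thresholds stated in the Overview (more than 95% unambiguous nucleotides and fewer than 15,000 bp) could be applied with a sketch like the one below. This is not the code used to build the dataset: the function names `read_fasta`, `unambiguous_fraction`, and `passes_filter` are hypothetical, and the sketch assumes that only `A`, `C`, `G`, `T`, and `U` count as unambiguous.

```python
# Assumption: only these characters are treated as unambiguous nucleotides.
UNAMBIGUOUS = set("ACGTU")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or any iterable of lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unambiguous_fraction(seq):
    """Return the fraction of characters in seq that are unambiguous nucleotides."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(base in UNAMBIGUOUS for base in seq) / len(seq)


def passes_filter(seq, min_unambiguous=0.95, max_length=15_000):
    """Apply both thresholds as strict inequalities, as stated in the Overview."""
    return len(seq) < max_length and unambiguous_fraction(seq) > min_unambiguous
```

Under these assumptions, `[name for name, seq in read_fasta(handle) if passes_filter(seq)]` would list the records of an open multifasta `handle` that satisfy both thresholds.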
| 9 | +# Metadata acquisition |
| 10 | + |
| 11 | +Clade classification metadata were acquired from [RABV-GLUE's metadata](https://github.com/giffordlabcvr/RABV-GLUE/blob/master/tabular/reference-set-data.tsv) (referenced [2025/06/01](https://github.com/giffordlabcvr/RABV-GLUE/blob/357613e78c397e10499e77bbd6f2b5aeeb9d10e6/tabular/reference-set-data.tsv#L4)). Sample, submitter, location, and host metadata were retrieved from NCBI by piping `datasets` output into `dataformat`: |
| 12 | + |
| 13 | +```bash |
| 14 | +datasets summary virus genome accession \ |
| 15 | +  --inputfile <ACCESSIONS> \ |
| 16 | +  --as-json-lines | \ |
| 17 | +dataformat tsv virus-genome \ |
| 18 | +  --fields accession,geo-region,host-common-name,submitter-names,isolate-collection-date |
| 19 | +``` |
| 20 | + |
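Combining the two metadata sources described above amounts to a join on accession. The sketch below is one way this might be done and is not taken from the original workflow: `strip_version` and `join_on_accession` are hypothetical helpers, the default column names `Accession` and `accession` are assumptions that should be checked against the actual table headers, and matching accessions without their version suffix is likewise an assumption about how the two tables line up.

```python
import csv


def strip_version(accession):
    """Drop a trailing version suffix, e.g. 'NC_001542.1' -> 'NC_001542'."""
    return accession.split(".", 1)[0]


def join_on_accession(ncbi_tsv, clade_tsv, ncbi_key="Accession", clade_key="accession"):
    """Left-join clade rows onto NCBI rows, matching accessions without versions.

    Both inputs are open tab-separated files with a header row.
    The column names passed as ncbi_key and clade_key are assumptions.
    """
    clades = {
        strip_version(row[clade_key]): row
        for row in csv.DictReader(clade_tsv, delimiter="\t")
    }
    merged = []
    for row in csv.DictReader(ncbi_tsv, delimiter="\t"):
        extra = clades.get(strip_version(row[ncbi_key]), {})
        # Keep every NCBI column; add clade columns except the duplicate key.
        merged.append({**{k: v for k, v in extra.items() if k != clade_key}, **row})
    return merged
```

A left join keeps accessions that have no clade classification, so they remain visible for the MRCA-based assignment rather than being silently dropped.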
| 21 | +# Included clades |
| 22 | + |
| 23 | +We conservatively assigned each clade to all accessions that descend from the most recent common ancestor (MRCA) of the accessions classified to that clade in the metadata. Subclades that were not monophyletic or that lacked uniquely defining mutations were removed. The table below lists every clade and subclade considered; removed subclades are struck through, with the reason given under Exclusion Criteria: |
| 24 | + |
| 25 | +| Clade | Subclade | Exclusion Criteria | |
| 26 | +|-------|----------|-------------------| |
| 27 | +| Africa-2 | - | - | |
| 28 | +| Africa-3 | - | - | |
| 29 | +| Arctic | - | - | |
| 30 | +| Arctic | A | - | |
| 31 | +| Arctic | AL1a | - | |
| 32 | +| Arctic | AL1b | - | |
| 33 | +| Arctic | AL2 | - | |
| 34 | +| Arctic | <s>AL3</s> | Mutations are not unique | |
| 35 | +| Asian | - | - | |
| 36 | +| Asian | SEA1a | - | |
| 37 | +| Asian | SEA1b | - | |
| 38 | +| Asian | SEA2a | - | |
| 39 | +| Asian | SEA2b | - | |
| 40 | +| Asian | SEA3 | - | |
| 41 | +| Asian | SEA4 | - | |
| 42 | +| Asian | SEA5 | - | |
| 43 | +| Bats | - | - | |
| 44 | +| Bats | <s>AP</s> | Mutations are not unique | |
| 45 | +| Bats | DR | - | |
| 46 | +| Bats | <s>EF-E1</s> | Mutations are not unique | |
| 47 | +| Bats | <s>EF-E2</s> | Mutations are not unique | |
| 48 | +| Bats | EF-W1 | - | |
| 49 | +| Bats | EF-W2 | - | |
| 50 | +| Bats | <s>LB1</s> | Mutations are not unique | |
| 51 | +| Bats | <s>LB2</s> | Mutations are not unique | |
| 52 | +| Bats | <s>LC</s> | Mutations are not unique | |
| 53 | +| Bats | LI | - | |
| 54 | +| Bats | <s>LN</s> | Mutations are not unique | |
| 55 | +| Bats | <s>LS</s> | Mutations are not unique | |
| 56 | +| Bats | <s>LX</s> | Mutations are not unique | |
| 57 | +| Bats | MYsp | - | |
| 58 | +| Bats | <s>MYu</s> | Mutations are not unique | |
| 59 | +| Bats | <s>PH</s> | Not monophyletic | |
| 60 | +| Bats | PS | - | |
| 61 | +| Bats | TB1 | - | |
| 62 | +| Bats | TB2 | - | |
| 63 | +| Cosmopolitan | - | - | |
| 64 | +| Cosmopolitan | AF1a | - | |
| 65 | +| Cosmopolitan | AF1b | - | |
| 66 | +| Cosmopolitan | AF1c | - | |
| 67 | +| Cosmopolitan | AF4 | - | |
| 68 | +| Cosmopolitan | AM1 | - | |
| 69 | +| Cosmopolitan | AM2a | - | |
| 70 | +| Cosmopolitan | AM2b | - | |
| 71 | +| Cosmopolitan | AM3a | - | |
| 72 | +| Cosmopolitan | AM3b | - | |
| 73 | +| Cosmopolitan | AM4 | - | |
| 74 | +| Cosmopolitan | CA1 | - | |
| 75 | +| Cosmopolitan | CA2 | - | |
| 76 | +| Cosmopolitan | CA3 | - | |
| 77 | +| Cosmopolitan | CE | - | |
| 78 | +| Cosmopolitan | EE | - | |
| 79 | +| Cosmopolitan | ME1a | - | |
| 80 | +| Cosmopolitan | ME1b | - | |
| 81 | +| Cosmopolitan | ME2 | - | |
| 82 | +| Cosmopolitan | NEE | - | |
| 83 | +| Cosmopolitan | Vac | - | |
| 84 | +| Cosmopolitan | <s>Vac2</s> | Mutations are not unique | |
| 85 | +| Cosmopolitan | WE | - | |
| 86 | +| Cosmopolitan | <s>YUGCOW</s> | Mutations are not unique | |
| 87 | +| Cosmopolitan | <s>YUGFOX</s> | Mutations are not unique | |
| 88 | +| Indian-Sub | - | - | |
| 89 | +| RAC-SK | - | - | |
| 90 | + |
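The MRCA-based assignment and the monophyly criterion described before the table can be illustrated on a toy tree. The sketch below is not the code used for this dataset: the tree is stored as a hypothetical child-to-parent mapping, all function and leaf names are invented for illustration, and monophyly is interpreted here as "no accession classified to a different clade falls under the MRCA", which is an assumption about the criterion.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(leaves, parent):
    """Return the most recent common ancestor of the given leaves."""
    leaves = list(leaves)
    common = set(ancestors(leaves[0], parent))
    for leaf in leaves[1:]:
        common &= set(ancestors(leaf, parent))
    # Walking up from a leaf, the first shared node is the deepest one.
    return next(n for n in ancestors(leaves[0], parent) if n in common)


def assign_clade(classified, parent, all_leaves):
    """Return every leaf that descends from the MRCA of the classified leaves."""
    node = mrca(classified, parent)
    return {leaf for leaf in all_leaves if node in ancestors(leaf, parent)}


def is_monophyletic(classified, other_classified, parent, all_leaves):
    """True if no leaf classified to another clade falls under this clade's MRCA."""
    return not (assign_clade(classified, parent, all_leaves) & set(other_classified))


# Toy tree with hypothetical names: root -> (X, Y), X -> (a, b), Y -> (c, d).
parent = {"a": "X", "b": "X", "c": "Y", "d": "Y", "X": "root", "Y": "root"}
leaves = {"a", "b", "c", "d"}
```

In this toy tree, classifying only `a` and `b` yields a monophyletic group under `X`, whereas a group classified as `a` and `c` has its MRCA at the root and therefore captures leaves belonging to other groups.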
| 91 | + |
| 92 | +# Phylogenetic tree reconstruction |
| 93 | + |
| 94 | +The tree was generated from the multifasta containing the sequences described above using the following commands: |
| 95 | + |
| 96 | +Align the sequences to the reference sequence (NC_001542.1): |
| 97 | +```bash |
| 98 | +augur align -s rabv.fna --nthreads 8 --output rabv.mafft.fna --reference-name NC_001542.1 --debug |
| 99 | +``` |
| 100 | + |
| 101 | +Build the tree with 1000 ultrafast bootstrap replicates: |
| 102 | +```bash |
| 103 | +iqtree -s rabv.mafft.fna -B 1000 -m GTR+F+I+R7 |
| 104 | +``` |
| 105 | + |
| 106 | +Root the tree on the MRCA of the identified outgroup sequences: |
| 107 | +```python |
| 108 | +from ete3 import Tree |
| 109 | + |
| 110 | +t = Tree('rabv.mafft.fna.contree')  # consensus tree written by IQ-TREE |
| 111 | +mrca = t.get_common_ancestor(['OU524413.1', 'JQ685954.1'])  # MRCA of the outgroup sequences |
| 112 | +t.set_outgroup(mrca)  # reroot the tree on that node |
| 113 | + |
| 114 | +with open('rabv.rooted.newick', 'w') as out: |
| 115 | +    out.write(t.write()) |
| 116 | +``` |