
Genome build #2

Open
jameshadfield wants to merge 2 commits into hodcroftlab:master from jameshadfield:genome



jameshadfield commented May 18, 2026

The included strains were taken from a part of the tree that, from looking at the segment trees,
I'm confident doesn't have reassortment.

The genome annotation file was generated by concatenating the S, M & L segments
from the Nextclade GenBank references in https://github.com/nextstrain/andv
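Concatenating the references amounts to shifting every M feature by the length of S, and every L feature by the lengths of S plus M. A minimal sketch in Python, assuming S, M, L order and 0-based half-open coordinates. `Feature` and `concatenate_annotations` are hypothetical helpers, not code from the andv repository, and the lengths and coordinates are toy values. Parsing the actual GenBank files (e.g. with Biopython) is left out to keep it dependency-free.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # 0-based, inclusive, relative to its own segment
    end: int    # 0-based, exclusive


def concatenate_annotations(segments):
    """Shift per-segment features into coordinates of the concatenated genome.

    `segments` is an ordered list of (segment_name, segment_length, features),
    e.g. S, then M, then L. Returns (genome_length, shifted_features, offsets).
    """
    offset = 0
    shifted = []
    offsets = {}
    for seg_name, seg_len, features in segments:
        offsets[seg_name] = offset
        for f in features:
            if not (0 <= f.start < f.end <= seg_len):
                raise ValueError(f"{f.name} lies outside segment {seg_name}")
            # All bases of the earlier segments precede this one in the concatenation.
            shifted.append(Feature(f.name, f.start + offset, f.end + offset))
        offset += seg_len
    return offset, shifted, offsets


# Toy lengths and coordinates (NOT the real Andes virus values).
toy_segments = [
    ("S", 100, [Feature("N", 10, 70)]),
    ("M", 200, [Feature("GPC", 5, 190)]),
    ("L", 300, [Feature("RdRp", 20, 290)]),
]
total, shifted, offsets = concatenate_annotations(toy_segments)
print(total, offsets)
for f in shifted:
    print(f.name, f.start, f.end)
```

Keeping the per-segment offsets around is useful later, since any boundary-aware step (such as masking segment ends) needs the same numbers.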

While the clock signal didn't look terrible, the timetree inference placed the common ancestor (CA) of the Hondius outbreak back in 2004, so I've removed the timetree.
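Clock signal is commonly judged with a root-to-tip regression: divergence is regressed on sampling date, the slope estimates the rate, and the x-intercept gives a rough date for the root. A minimal pure-Python sketch with toy data; the function name is hypothetical, and this is only the simple diagnostic regression, not the likelihood-based timetree inference that TreeTime performs.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of divergence ~ rate * (date - root_date).

    `dates` are decimal-year sampling dates and `divergences` the matching
    root-to-tip distances (substitutions/site). Returns
    (rate, root_date, r_squared); root_date is the x-intercept, i.e. where
    the fitted line extrapolates back to zero divergence.
    """
    n = len(dates)
    if n != len(divergences) or n < 3:
        raise ValueError("need at least three tips with matching dates")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    syy = sum((y - mean_y) ** 2 for y in divergences)
    if sxx == 0:
        raise ValueError("all tips share one date: no temporal signal to fit")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("non-positive slope: no clock-like signal")
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate
    r_squared = sxy ** 2 / (sxx * syy)
    return rate, root_date, r_squared


# Toy data (not from this build): a perfect clock of 1e-4 subs/site/year.
rate, root_date, r2 = root_to_tip_regression(
    [2000.0, 2010.0, 2020.0], [0.001, 0.002, 0.003]
)
print(f"rate={rate:.2e} root={root_date:.1f} R^2={r2:.3f}")
```

Because the root date is `mean_date - mean_divergence / rate`, a shallow slope fitted over a short sampling window pushes the estimate far back in time. This is one reason a root date well before the sampled cases deserves suspicion when there are only a few tips.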

I used a new snakefile, but these rules can be added to the main snakefile if preferred. Run via:

snakemake --cores 1 --snakefile genome.snakefile -fp

TODO

  • There are a lot of spurious mutations at segment boundaries that aren't being correctly masked. I'll add a script to mask these blocks for the tree/refine steps. This should also improve the temporal signal.
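The masking described in the TODO comes down to blanking a fixed-width block at both ends of every segment in the concatenated alignment. A minimal sketch, assuming each sequence is aligned to the concatenated S+M+L reference (so alignment column equals reference position) and that `N` is an acceptable mask character. `boundary_mask_intervals` and `mask_sequence` are hypothetical names, and the default `window=50` mirrors the 50 bp figure given in the snakefile comment.

```python
def boundary_mask_intervals(segment_lengths, window=50):
    """0-based half-open (start, end) intervals covering the first and last
    `window` bases of each segment in the concatenated genome."""
    intervals = []
    offset = 0
    for length in segment_lengths:
        w = min(window, length)
        intervals.append((offset, offset + w))                    # segment start
        intervals.append((offset + length - w, offset + length))  # segment end
        offset += length
    # Merge touching/overlapping blocks, so each internal junction becomes one block.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_sequence(seq, intervals, mask_char="N"):
    """Replace every base inside the given intervals with `mask_char`."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


# Toy example: two 10-nt segments, 3-nt window.
toy_intervals = boundary_mask_intervals([10, 10], window=3)
toy_masked = mask_sequence("A" * 20, toy_intervals)
print(toy_intervals)
print(toy_masked)
```

The intervals use the same 0-based half-open convention as BED, so in a real pipeline they could be written out as a BED-style mask file rather than applied in Python.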

jameshadfield marked this pull request as draft May 19, 2026 00:37
The included strains are taken from a part of the tree that I'm confident doesn't
have reassortment (from looking at the segment trees).

The genome annotation file was generated by concatenating the S, M & L segments
from the Nextclade GenBank references in https://github.com/nextstrain/andv

We mask the terminal 50 bp of each segment for phylogenetic reconstruction.
The clock signal didn't look terrible, but the timetree inference
placed the CA of the Hondius outbreak back in 2004.
jameshadfield marked this pull request as ready for review May 20, 2026 19:24
jameshadfield (Author)

My main concern is the rooting of the Hondius outbreak subtree. The (putative?) index case (PP_006WDKH, South Africa, 2026-04-26) has 2 unique mutations relative to the others, and in our rooting this places it as a descendant rather than at the root. One of them is ~52 nt from a segment boundary, just outside the 50 bp mask, so maybe it should be masked too, leaving only a G8979 mutation. There are too few strains for timetree analysis, but setting PP_006WDKH as the root lines up better with the metadata.
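Whether a mutation ~52 nt from a boundary gets masked depends only on the window width and on how the distance is counted. A minimal sketch, assuming 1-based positions in the S+M+L concatenation and a terminal 50 bp mask that covers sites at distance 0 to 49 from a segment end. Under this convention a site 52 nt in stays unmasked unless the window grows past 52. The helper names and segment lengths are hypothetical, and this convention is not necessarily the one used in genome.snakefile.

```python
import bisect
from itertools import accumulate


def distance_to_boundary(pos, segment_lengths):
    """Distance from 1-based genome position `pos` to the nearest end of the
    segment containing it (0 = first or last base of that segment)."""
    ends = list(accumulate(segment_lengths))  # 1-based last position of each segment
    if not 1 <= pos <= ends[-1]:
        raise ValueError("position outside the concatenated genome")
    i = bisect.bisect_left(ends, pos)  # first segment whose end is >= pos
    seg_start = ends[i - 1] + 1 if i > 0 else 1
    return min(pos - seg_start, ends[i] - pos)


def is_masked(pos, segment_lengths, window=50):
    """True if `pos` falls in the terminal `window` bases of its segment."""
    return distance_to_boundary(pos, segment_lengths) < window


# Toy genome (hypothetical lengths): a site 52 nt into the second segment.
toy_lengths = [100, 200, 300]
print(distance_to_boundary(153, toy_lengths))
print(is_masked(153, toy_lengths, window=50), is_masked(153, toy_lengths, window=60))
```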

