Commit 6f58bbd

Merge branch 'nextstrain:master' into master
2 parents 051f3eb + c9eea98 commit 6f58bbd

15 files changed

Lines changed: 727 additions & 20 deletions

data/nextstrain/mpox/all-clades/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,8 @@
+## Unreleased
+
+- Clade Ia and clade Ib are now distinguished
+- Sequences shared via Genbank since 2024 have been added
+
 ## 2024-04-19T07:50:39Z

 - New hMPXV-1 lineages B.1.21, B.1.22, and C.1.1 are now included in the dataset. For more information on these lineages, see the [hMPXV-1 lineage definitions PR](https://github.com/mpxv-lineages/lineage-designation/pull/37)
@@ -16,3 +21,4 @@ Some genes have been renamed and one has been added. The new annotation is based
 - The gene previously named `NBT03_gp174` is now called `OPG016`
 - The gene previously named `NBT03_gp175` is now called `OPG015_dup`
 - Gene `OPG166` has been added
+

data/nextstrain/mpox/all-clades/README.md

Lines changed: 11 additions & 11 deletions
@@ -1,24 +1,24 @@
 # Nextclade dataset for "Mpox virus (All Clades)"

-| Key | Value |
-| ---------------------- | --------------------------------------------------------------------------------------------------------------------- |
-| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
-| data source | Genbank |
-| workflow | [github.com/nextstrain/mpox/nextclade](https://github.com/nextstrain/mpox/nextclade) |
-| nextclade dataset path | nextstrain/mpox/all-clades |
-| annotation | [NC_063383.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_063383) |
-| clade definitions | [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation) |
-| related datasets | Mpox virus (Clade IIb): `nextstrain/mpox/clade-iib`<br> Mpox virus (Lineage B.1) `nextstrain/mpox/lineage-b.1` |
+| Key | Value |
+| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
+| data source | Genbank |
+| workflow | [github.com/nextstrain/mpox/nextclade](https://github.com/nextstrain/mpox/nextclade) |
+| nextclade dataset path | nextstrain/mpox/all-clades |
+| annotation | [NC_063383.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_063383) |
+| clade definitions | [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation) |
+| related datasets | Mpox virus (Clade IIb): `nextstrain/mpox/clade-iib`<br>Mpox virus (Lineage B.1) `nextstrain/mpox/lineage-b.1`<br>Mpox virus (Clade I): `nextstrain/mpox/clade-i` |

 ## Scope of this dataset

-This dataset is for Mpox viruses of all clades (I, IIa and IIb). For a focused analysis of sequences from clade IIb, you may want to use the more specific dataset: "Clade IIb" (`nextstrain/mpox/clade-iib`). For an even more focused analysis of 2022-2023 outbreak sequences (lineage B.1 and sublineages), you may want to use the even more specific dataset: "Lineage B.1" (`nextstrain/mpox/lineage-b.1`).
+This dataset is for Mpox viruses of all clades (Ia, Ib, IIa and IIb). For a focused analysis of sequences from clade IIb, you may want to use the more specific dataset: "Clade IIb" (`nextstrain/mpox/clade-iib`). For an even more focused analysis of 2022-2023 outbreak sequences (lineage B.1 and sublineages), you may want to use the even more specific dataset: "Lineage B.1" (`nextstrain/mpox/lineage-b.1`). For clade I sequences, you may want to use the dataset "Clade I" (`nextstrain/mpox/clade-i`).

 ## Reference sequence and reference tree

 The reference used in this dataset is the clade IIb NCBI refseq `NC_063383.1` (Isolate `MPXV-M5312_HM12_Rivers`).

-Sequences for the reference tree come from NCBI/Genbank and are downsampled to around 500 sequences from the diversity of clades, lineages, countries and collection dates.
+Sequences for the reference tree come from NCBI/Genbank and are downsampled to around 900 sequences from the diversity of clades, lineages, countries and collection dates.

 ## Further reading

data/nextstrain/mpox/all-clades/pathogen.json

Lines changed: 7 additions & 4 deletions
@@ -173,10 +173,10 @@
 "weightUnlabeledSubstitutions": 1
 },
 "snpClusters": {
-"clusterCutOff": 10,
-"enabled": false,
+"clusterCutOff": 5,
+"enabled": true,
 "scoreWeight": 10,
-"windowSize": 100
+"windowSize": 300
 },
 "stopCodons": {
 "enabled": true,
@@ -209,5 +209,8 @@
 "shortcuts": [
 "MPXV",
 "nextstrain/mpox"
-]
+],
+"version": {
+"tag": "unreleased"
+}
 }
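
The `snpClusters` change above enables the rule and moves it from "more than 10 mutations in 100 bases" to "more than 5 mutations in 300 bases". The following is a minimal sketch of how a sliding-window rule of this kind could be evaluated. It is an illustration, not Nextclade's implementation: the function names, the exclusive threshold, the merging of overlapping windows, and the `clusters * scoreWeight` score are all assumptions, and the example positions are hypothetical.

```python
from typing import List


def find_snp_clusters(
    positions: List[int],
    window_size: int = 300,
    cluster_cut_off: int = 5,
) -> List[List[int]]:
    """Group mutation positions into clusters (illustrative only).

    A window is flagged when it holds MORE than `cluster_cut_off` mutations
    spanning fewer than `window_size` bases (exclusive threshold: assumption).
    Overlapping flagged windows are merged into a single cluster.
    """
    pos = sorted(set(positions))
    clusters: List[List[int]] = []
    left = 0
    for right, p in enumerate(pos):
        # Advance the left edge until the window spans < window_size bases.
        while p - pos[left] >= window_size:
            left += 1
        if right - left + 1 > cluster_cut_off:
            window = pos[left:right + 1]
            if clusters and window[0] <= clusters[-1][-1]:
                # Overlaps the previous cluster: append only the new positions.
                last = clusters[-1][-1]
                clusters[-1].extend(q for q in window if q > last)
            else:
                clusters.append(window)
    return clusters


def snp_cluster_score(n_clusters: int, score_weight: float = 10) -> float:
    """Assumed scoring: each cluster contributes `score_weight` points."""
    return n_clusters * score_weight


# Hypothetical mutation positions: six within 150 bases, two isolated.
example = [100, 120, 150, 180, 200, 250, 5000, 9000]
new_settings = find_snp_clusters(example, window_size=300, cluster_cut_off=5)
old_settings = find_snp_clusters(example, window_size=100, cluster_cut_off=10)
print(new_settings, snp_cluster_score(len(new_settings)))
print(old_settings, snp_cluster_score(len(old_settings)))
```

Under these assumptions, the new thresholds flag the dense run of six positions as one cluster, while the old thresholds would not have flagged it even if the rule had been enabled.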

data/nextstrain/mpox/all-clades/reference.fasta

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

data/nextstrain/mpox/all-clades/sequences.fasta

Lines changed: 4 additions & 0 deletions
Large diffs are not rendered by default.

data/nextstrain/mpox/all-clades/tree.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

data_output/index.json

Lines changed: 10 additions & 3 deletions
@@ -1720,7 +1720,7 @@
 "treeJson": "tree.json"
 },
 "capabilities": {
-"clades": 5,
+"clades": 7,
 "customClades": {
 "outbreak": 1,
 "lineage": 33
@@ -1730,10 +1730,18 @@
 "missingData",
 "mixedSites",
 "privateMutations",
+"snpClusters",
 "stopCodons"
 ]
 },
 "versions": [
+{
+"tag": "unreleased",
+"compatibility": {
+"cli": "3.0.0-alpha.0",
+"web": "3.0.0-alpha.0"
+}
+},
 {
 "updatedAt": "2024-04-19T07:50:39Z",
 "tag": "2024-04-19--07-50-39Z",
@@ -1752,8 +1760,7 @@
 }
 ],
 "version": {
-"updatedAt": "2024-04-19T07:50:39Z",
-"tag": "2024-04-19--07-50-39Z",
+"tag": "unreleased",
 "compatibility": {
 "cli": "3.0.0-alpha.0",
 "web": "3.0.0-alpha.0"
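
In the `index.json` change above, the new `unreleased` entry carries a `tag` and `compatibility` but no `updatedAt`, unlike the released entry. A consumer that wants the most recent released version could rely on that difference, as in the sketch below. The helper name `latest_released` is hypothetical, treating a missing `updatedAt` as "not yet released" is an assumption drawn only from this diff, and the oldest entry in the sample data uses a tag format inferred from the other tag.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Timestamp layout as it appears in the diff, e.g. "2024-04-19T07:50:39Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def latest_released(versions: List[Dict]) -> Optional[Dict]:
    """Return the entry with the newest `updatedAt`, skipping entries without one."""
    released = [v for v in versions if "updatedAt" in v]
    if not released:
        return None
    return max(
        released,
        key=lambda v: datetime.strptime(v["updatedAt"], TIMESTAMP_FORMAT),
    )


versions = [
    # Entry added in this commit: no `updatedAt` field.
    {
        "tag": "unreleased",
        "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"},
    },
    # Released entry shown in the diff context.
    {"updatedAt": "2024-04-19T07:50:39Z", "tag": "2024-04-19--07-50-39Z"},
    # Illustrative older entry: timestamp from the changelog, tag format assumed.
    {"updatedAt": "2024-01-16T20:31:02Z", "tag": "2024-01-16--20-31-02Z"},
]

latest = latest_released(versions)
print(latest["tag"] if latest else "no released version")
```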
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+## Unreleased
+
+- Clade Ia and clade Ib are now distinguished
+- Sequences shared via Genbank since 2024 have been added
+
+## 2024-04-19T07:50:39Z
+
+- New hMPXV-1 lineages B.1.21, B.1.22, and C.1.1 are now included in the dataset. For more information on these lineages, see the [hMPXV-1 lineage definitions PR](https://github.com/mpxv-lineages/lineage-designation/pull/37)
+- The sequences used in the reference trees have been updated to include the latest sequences available in Genbank as of 2024-04-16
+
+## 2024-01-16T20:31:02Z
+
+Initial release of this dataset. This dataset is similar to the v2 dataset [`MPXV/ancestral`](https://github.com/nextstrain/nextclade_data/tree/2023-08-17--15-51-24--UTC/data/datasets/MPXV/references/ancestral/versions/2023-08-01T12%3A00%3A00Z/files) with some differences.
+
+### New and changed gene names
+
+Some genes have been renamed and one has been added. The new annotation is based on NCBI refseq annotations that were released in November 2022. The v2 dataset predates this refseq:
+
+- The 4 genes in the inverted terminal repeat segment (ITR) on both ends of the genome (OPG001, OPG002, OPG003,OPG015) are now all included. The genes on the 3' end (~positions 190000-197000) now have an `_dup` appended to distinguish them.
+- The gene previously named `NBT03_gp052` is now called `OPG073`
+- The gene previously named `NBT03_gp174` is now called `OPG016`
+- The gene previously named `NBT03_gp175` is now called `OPG015_dup`
+- Gene `OPG166` has been added
+
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Nextclade dataset for "Mpox virus (All Clades)"
+
+| Key | Value |
+| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
+| data source | Genbank |
+| workflow | [github.com/nextstrain/mpox/nextclade](https://github.com/nextstrain/mpox/nextclade) |
+| nextclade dataset path | nextstrain/mpox/all-clades |
+| annotation | [NC_063383.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_063383) |
+| clade definitions | [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation) |
+| related datasets | Mpox virus (Clade IIb): `nextstrain/mpox/clade-iib`<br>Mpox virus (Lineage B.1) `nextstrain/mpox/lineage-b.1`<br>Mpox virus (Clade I): `nextstrain/mpox/clade-i` |
+
+## Scope of this dataset
+
+This dataset is for Mpox viruses of all clades (Ia, Ib, IIa and IIb). For a focused analysis of sequences from clade IIb, you may want to use the more specific dataset: "Clade IIb" (`nextstrain/mpox/clade-iib`). For an even more focused analysis of 2022-2023 outbreak sequences (lineage B.1 and sublineages), you may want to use the even more specific dataset: "Lineage B.1" (`nextstrain/mpox/lineage-b.1`). For clade I sequences, you may want to use the dataset "Clade I" (`nextstrain/mpox/clade-i`).
+
+## Reference sequence and reference tree
+
+The reference used in this dataset is the clade IIb NCBI refseq `NC_063383.1` (Isolate `MPXV-M5312_HM12_Rivers`).
+
+Sequences for the reference tree come from NCBI/Genbank and are downsampled to around 900 sequences from the diversity of clades, lineages, countries and collection dates.
+
+## Further reading
+
+The lineage system used is described in [Happi et. al. (2022)](https://doi.org/10.1371/journal.pbio.3001769). Lineage definitions are available at [github.com/mpxv-lineages/lineage-designation](https://github.com/nextstrain/mpox/nextclade).
+
+Read more about Nextclade datasets in Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
4.29 MB
Binary file not shown.