Commit 5c28551

feat(datasets): add example ebolavirus mutation pattern configs
Placeholder mutationPatterns configuration for BDBV, EBOV, and SUDV to demonstrate the feature; parameters are not scientifically validated.

- BDBV: ADAR (A>G, T>C) and APOBEC (G>A in YGH context), window 50, cutoff 3, with synthetic example sequences
- EBOV: ADAR (A>G, T>C) and APOBEC-like CpG deamination (C>T in HCG context), window 100, cutoff 5
- SUDV: ADAR (A>G, T>C), window 80, cutoff 4
1 parent fb6dd75 commit 5c28551

31 files changed

Lines changed: 34541 additions & 16 deletions

data/nextstrain/orthoebolavirus/bdbv/CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -1,3 +1,7 @@
+## Unreleased
+
+- Add mutationPatterns config for ADAR-mediated A-to-I editing detection (A>G and T>C substitution type filtering)
+
 ## 2026-05-22T16:04:17Z
 
 - adjust QC param settings to reduce private mutation threshold (outbreak genomes should be very similar)
@@ -10,7 +14,6 @@
 - add GP_003:367 to known stop codons
 - Include 2026 genomes
 
-
 ## 2026-05-15T16:16:45Z
 
 Initial release of this dataset.

data/nextstrain/orthoebolavirus/bdbv/examples.fasta

Lines changed: 951 additions & 0 deletions
Large diffs are not rendered by default.

data/nextstrain/orthoebolavirus/bdbv/pathogen.json

Lines changed: 48 additions & 1 deletion
@@ -29,6 +29,48 @@
     "reference": "reference.fasta"
   },
   "defaultCds": "GP",
+  "mutationPatterns": {
+    "patterns": [
+      {
+        "id": "adar",
+        "name": "ADAR-like RNA editing",
+        "description": "ADAR-mediated A-to-I editing (observed as A>G and complementary T>C)",
+        "events": [
+          {
+            "type": "nucSubstitution",
+            "ref": ["A"],
+            "qry": ["G"]
+          },
+          {
+            "type": "nucSubstitution",
+            "ref": ["T"],
+            "qry": ["C"]
+          }
+        ],
+        "cluster": {
+          "windowSize": 50,
+          "cutoff": 3
+        }
+      },
+      {
+        "id": "apobec",
+        "name": "APOBEC-like cytosine deamination",
+        "description": "APOBEC-like cytosine deamination (observed as G>A)",
+        "events": [
+          {
+            "type": "nucSubstitution",
+            "ref": ["G"],
+            "qry": ["A"],
+            "motifs": ["[CT]G[ACT]"]
+          }
+        ],
+        "cluster": {
+          "windowSize": 50,
+          "cutoff": 3
+        }
+      }
+    ]
+  },
   "qc": {
     "frameShifts": {
       "enabled": true,
@@ -60,7 +102,12 @@
     },
     "stopCodons": {
       "enabled": true,
-      "ignoredStopCodons": [{"cdsName": "GP_003", "codon": 366}],
+      "ignoredStopCodons": [
+        {
+          "cdsName": "GP_003",
+          "codon": 366
+        }
+      ],
       "scoreWeight": 20
     }
   },
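
The `motifs` entry on the APOBEC event is a character-class pattern, `[CT]G[ACT]`, which the commit message describes as a YGH context. Below is a minimal Python sketch of how such a motif could be checked. It assumes (the commit does not state this) that the motif is a regular expression matched against the three-base reference context centred on the substituted base. The function name `context_matches` and the toy sequence are hypothetical; this is not Nextclade's implementation.

```python
import re


def context_matches(reference: str, pos: int, motif: str) -> bool:
    """Check the trinucleotide around 0-based `pos` against `motif`.

    Assumption: the motif is a regex over (previous base, substituted base,
    next base) in the reference. Positions lacking a neighbour never match.
    """
    if pos < 1 or pos + 1 >= len(reference):
        return False
    context = reference[pos - 1 : pos + 2].upper()
    return re.fullmatch(motif, context) is not None


# Toy reference, not a real ebolavirus sequence.
toy_reference = "ACGATTGGA"
apobec_motif = "[CT]G[ACT]"  # motif value from the BDBV config

# Collect the G positions that sit in the motif context.
hits = [
    i
    for i, base in enumerate(toy_reference)
    if base == "G" and context_matches(toy_reference, i, apobec_motif)
]
print(hits)
```

Only the G at index 2 (context `CGA`) qualifies here; the other two G bases have `TGG` and `GGA` contexts, which the motif rejects.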

data/nextstrain/orthoebolavirus/ebov/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## Unreleased
+
+- Add mutationPatterns config for ADAR editing and APOBEC-like CpG deamination detection
+
 ## 2026-04-14T11:55:23Z
 
 - Align `pathogen.json` metadata with the current Nextclade schema layout.

data/nextstrain/orthoebolavirus/ebov/pathogen.json

Lines changed: 33 additions & 4 deletions
@@ -27,11 +27,41 @@
     "readme": "README.md",
     "reference": "reference.fasta"
   },
+  "mutationPatterns": {
+    "patterns": [
+      {
+        "id": "rna_editing",
+        "name": "RNA editing signatures",
+        "description": "ADAR editing (A>G, T>C) and APOBEC-like deamination (C>T in CpG context)",
+        "events": [
+          {
+            "type": "nucSubstitution",
+            "ref": ["A"],
+            "qry": ["G"]
+          },
+          {
+            "type": "nucSubstitution",
+            "ref": ["T"],
+            "qry": ["C"]
+          },
+          {
+            "type": "nucSubstitution",
+            "ref": ["C"],
+            "qry": ["T"],
+            "motifs": ["[ACGT]CG"]
+          }
+        ],
+        "cluster": {
+          "windowSize": 100,
+          "cutoff": 5
+        }
+      }
+    ]
+  },
   "qc": {
     "frameShifts": {
       "enabled": true,
-      "ignoredFrameShifts": [
-      ],
+      "ignoredFrameShifts": [],
       "scoreWeight": 50
     },
     "missingData": {
@@ -59,8 +89,7 @@
     },
     "stopCodons": {
       "enabled": true,
-      "ignoredStopCodons": [
-      ],
+      "ignoredStopCodons": [],
       "scoreWeight": 50
     }
   },
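
The commit message describes the EBOV CpG event as C>T in an HCG context, while the config uses the motif `[ACGT]CG`. Under the standard IUPAC nucleotide codes, H stands for A, C or T and N for any base, so `[ACGT]CG` corresponds to NCG rather than HCG; which of the two was intended is not stated in this commit. The Python sketch below expands IUPAC shorthand into the character-class form used in these configs so the two notations can be compared. The helper name `iupac_to_regex` is hypothetical and not part of any tool referenced here.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif such as 'YGH' into a regex character-class string."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for symbols outside the IUPAC table
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


for shorthand in ("YGH", "HCG", "NCG"):
    print(shorthand, "->", iupac_to_regex(shorthand))
```

`YGH` expands to `[CT]G[ACT]`, matching the BDBV motif, whereas `HCG` expands to `[ACT]CG` and only `NCG` yields the `[ACGT]CG` string found in the EBOV config.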

data/nextstrain/orthoebolavirus/sudv/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,7 @@
1+
## Unreleased
2+
3+
- Add mutationPatterns config for ADAR editing detection
4+
15
## 2026-04-14T11:55:23Z
26

37
- Align `pathogen.json` metadata with the current Nextclade schema layout.

data/nextstrain/orthoebolavirus/sudv/pathogen.json

Lines changed: 27 additions & 4 deletions
@@ -26,11 +26,35 @@
     "readme": "README.md",
     "reference": "reference.fasta"
   },
+  "mutationPatterns": {
+    "patterns": [
+      {
+        "id": "adar",
+        "name": "ADAR-like RNA editing",
+        "description": "ADAR-mediated A-to-I editing",
+        "events": [
+          {
+            "type": "nucSubstitution",
+            "ref": ["A"],
+            "qry": ["G"]
+          },
+          {
+            "type": "nucSubstitution",
+            "ref": ["T"],
+            "qry": ["C"]
+          }
+        ],
+        "cluster": {
+          "windowSize": 80,
+          "cutoff": 4
+        }
+      }
+    ]
+  },
   "qc": {
     "frameShifts": {
       "enabled": true,
-      "ignoredFrameShifts": [
-      ],
+      "ignoredFrameShifts": [],
       "scoreWeight": 20
     },
     "missingData": {
@@ -58,8 +82,7 @@
     },
     "stopCodons": {
       "enabled": true,
-      "ignoredStopCodons": [
-      ],
+      "ignoredStopCodons": [],
       "scoreWeight": 20
     }
   },
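
Each pattern pairs its events with a `cluster` block (`windowSize` and `cutoff`). The commit does not spell out the semantics, so the following Python sketch assumes one plausible reading: a cluster is reported wherever at least `cutoff` matching substitutions fall within `windowSize` consecutive bases, and overlapping windows are merged. The functions `matches_event` and `find_clusters` and the toy substitutions are hypothetical, motif filtering is left out, and none of this is taken from Nextclade's code.

```python
from typing import Dict, List, Tuple

# Events and cluster parameters copied from the SUDV config.
ADAR_EVENTS: List[Dict[str, List[str]]] = [
    {"ref": ["A"], "qry": ["G"]},
    {"ref": ["T"], "qry": ["C"]},
]
WINDOW_SIZE = 80
CUTOFF = 4

Substitution = Tuple[int, str, str]  # (0-based position, ref base, query base)


def matches_event(sub: Substitution, events: List[Dict[str, List[str]]]) -> bool:
    """True if the substitution's ref/qry pair is listed by any event."""
    _, ref, qry = sub
    return any(ref in e["ref"] and qry in e["qry"] for e in events)


def find_clusters(positions: List[int], window_size: int, cutoff: int) -> List[Tuple[int, int]]:
    """Return merged (first, last) position spans holding >= cutoff hits per window."""
    pos = sorted(set(positions))
    clusters: List[Tuple[int, int]] = []
    left = 0
    for right in range(len(pos)):
        # Shrink from the left until the span fits in window_size bases.
        while pos[right] - pos[left] >= window_size:
            left += 1
        if right - left + 1 >= cutoff:
            start, end = pos[left], pos[right]
            if clusters and start <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], end)  # extend overlapping cluster
            else:
                clusters.append((start, end))
    return clusters


# Toy substitutions, not real data; the G>A at 130 matches no ADAR event.
subs: List[Substitution] = [
    (100, "A", "G"), (120, "T", "C"), (130, "G", "A"),
    (150, "A", "G"), (170, "T", "C"), (900, "A", "G"),
]
matching = [p for (p, r, q) in subs if matches_event((p, r, q), ADAR_EVENTS)]
clusters = find_clusters(matching, WINDOW_SIZE, CUTOFF)
print(clusters)
```

With these toy inputs the four ADAR-type substitutions between positions 100 and 170 form a single cluster, and the isolated hit at 900 is ignored.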

data_output/index.json

Lines changed: 24 additions & 6 deletions
@@ -2233,6 +2233,13 @@
   ]
 },
 "versions": [
+  {
+    "tag": "unreleased",
+    "compatibility": {
+      "cli": "3.0.0-alpha.0",
+      "web": "3.0.0-alpha.0"
+    }
+  },
   {
     "updatedAt": "2026-04-14T11:55:23Z",
     "tag": "2026-04-14--11-55-23Z",
@@ -2251,8 +2258,7 @@
   }
 ],
 "version": {
-  "updatedAt": "2026-04-14T11:55:23Z",
-  "tag": "2026-04-14--11-55-23Z",
+  "tag": "unreleased",
   "compatibility": {
     "cli": "3.0.0-alpha.0",
     "web": "3.0.0-alpha.0"
@@ -2288,6 +2294,13 @@
   ]
 },
 "versions": [
+  {
+    "tag": "unreleased",
+    "compatibility": {
+      "cli": "3.0.0-alpha.0",
+      "web": "3.0.0-alpha.0"
+    }
+  },
   {
     "updatedAt": "2026-04-14T11:55:23Z",
     "tag": "2026-04-14--11-55-23Z",
@@ -2306,8 +2319,7 @@
   }
 ],
 "version": {
-  "updatedAt": "2026-04-14T11:55:23Z",
-  "tag": "2026-04-14--11-55-23Z",
+  "tag": "unreleased",
   "compatibility": {
     "cli": "3.0.0-alpha.0",
     "web": "3.0.0-alpha.0"
@@ -2348,6 +2360,13 @@
   ]
 },
 "versions": [
+  {
+    "tag": "unreleased",
+    "compatibility": {
+      "cli": "3.0.0-alpha.0",
+      "web": "3.0.0-alpha.0"
+    }
+  },
   {
     "updatedAt": "2026-05-22T16:04:17Z",
     "tag": "2026-05-22--16-04-17Z",
@@ -2374,8 +2393,7 @@
   }
 ],
 "version": {
-  "updatedAt": "2026-05-22T16:04:17Z",
-  "tag": "2026-05-22--16-04-17Z",
+  "tag": "unreleased",
   "compatibility": {
     "cli": "3.0.0-alpha.0",
     "web": "3.0.0-alpha.0"
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+## Unreleased
+
+- Add mutationPatterns config for ADAR-mediated A-to-I editing detection (A>G and T>C substitution type filtering)
+
+## 2026-05-22T16:04:17Z
+
+- adjust QC param settings to reduce private mutation threshold (outbreak genomes should be very similar)
+- add SNP cluster QC rule to trigger on stretches of high private mutation density
+- update tree
+
+## 2026-05-18T20:09:34Z
+
+- Add outbreak annotation
+- add GP_003:367 to known stop codons
+- Include 2026 genomes
+
+## 2026-05-15T16:16:45Z
+
+Initial release of this dataset.
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Nextclade dataset for Bundibugyo virus (Orthoebolavirus bundibugyoense)
+
+| Key                    | Value                                                                           |
+| ---------------------- | ------------------------------------------------------------------------------- |
+| authors                | [Richard Neher](https://neherlab.org)                                           |
+| data source            | Genbank                                                                         |
+| nextclade dataset path | nextstrain/orthoebolavirus/bdbv                                                 |
+| annotation             | [NC_014373.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_014373.1)                 |
+
+This Nextclade dataset for Bundibugyo virus [(Orthoebolavirus bundibugyoense)](https://ictv.global/report/chapter/filoviridae/filoviridae/orthoebolavirus) aligns to the reference sequence [NC_014373.1](https://www.ncbi.nlm.nih.gov/nuccore/NC_014373) and translates major CDS. It scores the sequence with respect to unexpected frameshifts or stop codons, missing sequence (in form of `NNN`s) and mixed bases.
+
+Data from the 2026 outbreak were generously shared by the groups of Prof. Placide Mbala-Kingebeni (INRB, DRC) and Dr Isaac Ssewanyana (CPHL, Uganda) to facilitate the public health response and containment of the virus. These data are described in a post on [Virological.org](https://virological.org/t/initial-genomes-from-may-2026-bundibugyo-virus-disease-outbreak-in-the-democratic-republic-of-the-congo-and-uganda/1032) and were deposited in Pathoplexus under [Restricted Data-Use terms](https://pathoplexus.org/about/terms-of-use/restricted-data). Please consult the authors and the [data-use terms](https://pathoplexus.org/about/terms-of-use/restricted-data) before using these sequences.
+
+
+
