Commit bb8faae

dezordi authored and rneher committed
Add orov dataset
1 parent 6b2fb7b commit bb8faae

42 files changed

Lines changed: 58971 additions & 0 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
## Unreleased

Initial release of an Oropouche virus (OROV) dataset for segment L, based on the NCBI RefSeq reference genome.

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Nextclade dataset for Oropouche virus (OROV) segment L, based on the RefSeq reference genome

## Dataset Attributes

| Attribute            | Value                                    |
| -------------------- | ---------------------------------------- |
| Name                 | orov/L/refseq                            |
| RefName              | Oropouche virus segment L                |
| RefAccession         | NC_005776.1                              |

## Scope of This Dataset

This dataset enables quality control of Oropouche virus segment L sequences, using the NCBI RefSeq genome as the reference.


The source code is available at [InstitutoTodosPelaSaude/nextclade-datasets-workflows](https://github.com/InstitutoTodosPelaSaude/nextclade-datasets-workflows/tree/main/orov).

To report bugs, please open an [issue](https://github.com/InstitutoTodosPelaSaude/nextclade-datasets-workflows/issues).

Read more about Nextclade datasets in the Nextclade documentation: [Nextclade Datasets](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html).
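As an illustration of how a dataset like this is typically used, here is a minimal sketch that assembles a `nextclade run` command for a local copy of the dataset directory. It assumes Nextclade CLI v3 and its `--input-dataset` and `--output-all` options. The helper name `build_nextclade_command` and all paths are hypothetical placeholders, not part of this commit.

```python
import shlex


def build_nextclade_command(dataset_dir, sequences, output_dir):
    """Assemble a `nextclade run` invocation as an argument list.

    Assumption: Nextclade CLI v3, which accepts a dataset directory via
    --input-dataset and writes all outputs to the --output-all directory.
    """
    return [
        "nextclade", "run",
        "--input-dataset", str(dataset_dir),
        "--output-all", str(output_dir),
        str(sequences),
    ]


# Hypothetical paths, for illustration only.
cmd = build_nextclade_command("orov/L/refseq", "my_sequences.fasta", "results")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` on a machine where the Nextclade CLI is installed.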
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region NC_005776.1 1 6846
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=118655
NC_005776.1	RefSeq	region	1	6846	.	+	.	ID=NC_005776.1:1..6846;Dbxref=taxon:118655;Name=L;gbkey=Src;genome=genomic;mol_type=genomic RNA;segment=L
NC_005776.1	GenBank	gene	44	6796	.	+	.	gene_name=L
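The gene feature above uses GFF3 coordinates, which are 1-based and inclusive. The sketch below parses that feature line (assuming tab-separated columns, as GFF3 specifies) and converts it to a 0-based, half-open range so the length and reading frame can be checked. The helper names are illustrative, not part of the dataset.

```python
def parse_gff3_feature(line):
    """Split one tab-separated GFF3 feature line into a small dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 GFF3 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = dict(item.split("=", 1) for item in attrs.split(";") if item)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),      # 1-based, inclusive
        "strand": strand,
        "attributes": attributes,
    }


def to_zero_based_half_open(start, end):
    """Convert 1-based inclusive coordinates to a 0-based half-open range."""
    return start - 1, end


GENE_LINE = "NC_005776.1\tGenBank\tgene\t44\t6796\t.\t+\t.\tgene_name=L"
gene = parse_gff3_feature(GENE_LINE)
begin, end = to_zero_based_half_open(gene["start"], gene["end"])
length_nt = end - begin

print(begin, end, length_nt, length_nt % 3)  # 43 6796 6753 0
```

The gene spans 6753 nucleotides, or 2251 codons including the stop codon. If the `placementMaskRanges` in `pathogen.json` follow the same 0-based, half-open convention (an assumption; the commit does not say), the regions outside the gene would be `[0, 43)` and `[6796, 6846)`. The committed ranges `[0, 44)` and `[6797, 6846)` each differ from that by one position, which may be worth confirming.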
Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
{
  "$schema": "https://raw.githubusercontent.com/nextstrain/nextclade/refs/heads/release/packages/nextclade-schemas/input-pathogen-json.schema.json",
  "alignmentParams": {
    "retryReverseComplement": true
  },
  "attributes": {
    "name": "orov/L/refseq",
    "reference accession": "NC_005776.1",
    "reference name": "Oropouche virus, L segment"
  },
  "compatibility": {
    "cli": "3.0.0-alpha.0",
    "web": "3.0.0-alpha.0"
  },
  "placementMaskRanges": [
    {
      "begin": 0,
      "end": 44
    },
    {
      "begin": 6797,
      "end": 6846
    }
  ],
  "deprecated": false,
  "enabled": true,
  "experimental": true,
  "files": {
    "changelog": "CHANGELOG.md",
    "examples": "sequences.fasta",
    "genomeAnnotation": "genome_annotation.gff3",
    "pathogenJson": "pathogen.json",
    "readme": "README.md",
    "reference": "reference.fasta",
    "treeJson": "tree.json"
  },
  "meta": {
    "bugs": "https://github.com/dezordi/nextclade_data_workflows/issues",
    "source code": "https://github.com/dezordi/nextclade_data_workflows/tree/main/oroV"
  },
  "qc": {
    "frameShifts": {
      "enabled": true,
      "ignoredFrameShifts": [
        {
          "codonRange": {
            "begin": 788,
            "end": 792
          },
          "cdsName": "L"
        },
        {
          "codonRange": {
            "begin": 797,
            "end": 800
          },
          "cdsName": "L"
        },
        {
          "codonRange": {
            "begin": 846,
            "end": 855
          },
          "cdsName": "L"
        }
      ]
    },
    "missingData": {
      "enabled": true,
      "missingDataThreshold": 1369,
      "scoreBias": 95
    },
    "mixedSites": {
      "enabled": true,
      "mixedSitesThreshold": 7
    },
    "privateMutations": {
      "cutoff": 20,
      "enabled": true,
      "typical": 10,
      "weightLabeledSubstitutions": 2,
      "weightReversionSubstitutions": 1,
      "weightUnlabeledSubstitutions": 1
    },
    "snpClusters": {
      "enabled": false
    },
    "stopCodons": {
      "enabled": true
    }
  },
  "schemaVersion": "3.0.0",
  "version": {
    "tag": "unreleased"
  },
  "defaultCds": "L"
}
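To make the `missingData` settings above more concrete, here is a minimal sketch of how such a threshold and bias could translate into a QC score. The linear rule is an assumption based on how Nextclade's missing-data rule is commonly described: the score stays at zero up to `scoreBias` Ns and reaches 100 at `scoreBias + missingDataThreshold` Ns. It is not taken from this commit, and the function name is illustrative.

```python
def missing_data_score(total_missing, missing_data_threshold, score_bias):
    """Assumed linear scoring rule for missing data (Ns).

    Returns 0 while total_missing <= score_bias, then rises linearly and
    reaches 100 when total_missing == score_bias + missing_data_threshold.
    """
    return max(0.0, (total_missing - score_bias) * 100.0 / missing_data_threshold)


GENOME_LENGTH = 6846  # from the ##sequence-region line of the annotation
THRESHOLD = 1369      # qc.missingData.missingDataThreshold
SCORE_BIAS = 95       # qc.missingData.scoreBias

for n_missing in (0, 95, 780, 1464):
    score = missing_data_score(n_missing, THRESHOLD, SCORE_BIAS)
    print(f"{n_missing:>5} Ns -> score {score:.1f}")

# The threshold corresponds to roughly 20% of the segment length.
print(f"threshold / genome length = {THRESHOLD / GENOME_LENGTH:.3f}")
```

Under this assumed rule, a sequence would reach a score of 100 at 1464 missing bases, which is about 21% of the 6846-nucleotide segment.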
Lines changed: 116 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,116 @@
>NC_005776.1 Oropouche virus segment L, complete genome
AGTAGTGTGCTCCTATTCCGAAACAAACAAAAACAATCTCAAAATGTCACAACTGTTGCT
CAACCAATATCGGAATAGGATATTGCACTGCCGTGAACCTGAGATAGCAAAGGATATATG
GCGAGATCTATTAAATGATCGACACAATTACTTTTCTCGGGAATTTTGCAGAGCTGCAAA
TCTTGAGTACAGAAATGATGTTCCTGCTGAGGATATTTGTGCTGAAGTTCTTGATGGTTA
TAAAGCAAGGAAAGTTCGCTTTTGTACACCTGATAATTACTTACTACATGATGGAAAGAT
GTATATAATAGACTTCAAAGTGTCTGTAGACGACCGATCTTCTAGAATCACAAGGGAGAA
ATATAATGAGATTTTTGGAGAGGTATTCAATCCAGAAGGTGTAGATTTTGAAATTGTTAT
TATTAGATTAGATCCTTCAAATATGACGATACATGTGGACTCTCGAGATTTCGTGAATAC
AATTGGGCCGATTACATTAAACATTAGTATGCAATGGTTTTTTGATATGAAAGACTTCTT
GTTCGGGAAATTTCGGGATGATGATAAATTCCATGCTATAATAAGTCAAGGAGAATTCAC
AATGACATTGCCATGGATTGAAGAAGACACCCCAGAATTGCTTACTCATCCTATATACAA
TGAATTCATGAGTTCAATGCCAGAGGCAGAACAGGCCCTATTCAAGGAAGCATTGGAATT
CAAATCATTTGGGGCAGAAAAATGGAATATCTTTTTGAAGGGGGTGATGTCAAAGTATGG
TGAATATTATAAAGAATTTACTAAAGGACATGCTCATTCTATATTTCTGACAACAGGGGA
CTACCCCAAGCCAGACAAAGACCAAATTTCAGCAGGTTGGAGAGAAATGGTAAACAGAGT
AAGCTCTGAACGTGACATGTCAAATGACATAAATCAGGAAAAACCAAGCATGCATTTTAT
ATGGGCAAAGAATGATTCAAATAGCAACAATAATATACAAAAGCTAATCAAACTATCTAA
ATCACTGCAAGCTATGAGCGGGACAGGGAGCTATGTAAATGCTTTCAAGTCATTAGGGAG
ATTAATGGATATATCATCAGATGTTAAAAAATATGAATCATTTTGTGGGAAATTGAAATC
TCTGGCAAGGTCTAGTATAAAAAAACTTGACAGGAAAATAGAGCCAATACAAATTGGGAC
TGCAACTGTCTTATGGGAACAGCAATTTAAACTAGATACAGATGTTATAAAAAGAGAAGA
CAGAATACATTTAATGAAAGATTATCTTGGGATCGGTAAGCACAAATCATTTTCAAAGAA
ATTAAACAACGACATAAATACTGATAAGCCTAAAATATTAAATTTCAACAATGATGATAT
AGTCAGGAAATGCAAAGATAAATATAATCAAGTCATACATAACCTATCCCAAATCAATGA
ATTAGATAAGATTGGAAACTACCTAGAGCACTTTTCAGCTAAAATTAGTGCCTGCAGTGT
AGAAATGTGGGATTTTATATATAATACAACCAAAACTAAATACTGGCAATGCATCAATGA
CTATTCCACCCTAATGAAAAACATGTTAGCTGTCTCTCAATATAATAGACACAATACGTT
TAGAATTGTCTCATGTGCAAACAATAATGTATTTGGTCTAGTAATGCCAAGCTCAGATAT
AAAGACAAAAAAAGCAACTTTAGTCTATGCAATAATGGCTCTCCATAATGAGGAGGCAGA
AATAGCAGAACTTGGCTCACTCTACTCAACTTTTAAGACAGCAACAGGATATATTTCAAT
ATCAAAGGCTTTTAGGCTGGATAAAGAAAGATGCCAACGCATAGTATCCTCTCCAGGCTT
GTTCCTCATGACAAGCTGCCTATTATTCAACGGTAACAAGAGTTTAGAATTTGATAAATT
ACTAGGATTTTCATTTTTTACGTCAATATCAATTACGAAAGCTATGCTCTCCCTTACTGA
GCCTTCACGTTATATGATCATGAACTCGTTAGCAGTTTCCAGCCATGTAAGAGAGTATAT
ATCTGAAAAATTCTCCCCTTATACAAAAACATCATTTTCTGTGGTAATGACAGACTTAAT
CAAGAAGGGTTGCTATTCAGCATATGAACAGAGAAAAAAAGTACAAATAAGAGACATAAA
ATTAACAGATTATGATATAACACAAAAGGGAGTGGATTCCAAAAGAGATCTTAAATCTAT
TTGGTTCCCAGGAAAGGTAAACCTGAAAGAATATTTAAACCAAATTTATCTACCATTTTA
TTTTAACTCTAAAGGATTACATGAAAAACATCATGTCTTGATAGATTTGGCTAAAACAGT
ACTAGAAATCGAAAAAGAGCAAAGGGAGTCATTACCTGAGCCATGGTCAGAGATACCTGC
TAAGCGACTGTCACTTAATGTTTTAATTTACTCATTGCAGGAACTGAATTTAGATACTTC
AAGACATAATTTTGTAAGAAGCCGGGTGGAAAACGCAAATAATTTCAACAGATCTATAAC
GACAATATCTACTTTTACCAGCTCAAAATCATGCATTAAGATTGGTGATTTTGAAGAAGA
AAAAAGAGAAAAACTAAGAATGATACAAAAGAAACTTGCAAAGGATATTTCTAAATTAAC
CATAGCCAACCCAGCATTCTTAGATGAGATCACAAACGAACATGAGATAAGGCATTCAAC
TTATGAGGACTTAAAACAATCTATCCCAGATTACACAGATTATATGTCTGTGAAAGTTTT
TGACAGATTGTACGAGAAGATTACTACCAATGAAATAAATGATAAGGAAACAGTCAAGCT
GATTCTAGAGACCATGAAAAAACATAAAATATTTCATTTTGGATTCTTCAATAAAGGACA
AAAAACAGCCAAAGATAGAGAAATATTTTTAGGTGAATTTGAAGCAAAAATGTGTCTGTA
CCTTGTCGAAAGAATAGCTAAAGAGAGGTGCAAATTAAACCCTGAAGAAATGATAAGTGA
ACCAGGCGACTCGAAACTAAGGGTATTAGAGAAGCAATCAGAAGACGAAATCAGGTATAT
TAGCAATACAATAAAGACATTAGGGAATGCCATAGAGAACTTGCAATCTGGATCTTTAAA
TTGGGCAGATATATGCGAAAACAAAGCAAGAGGACTTAAGATAGAAATAAATGCTGATAT
GTCCAAATGGAGTGCCCAAGATGTACTTTTTAAATATTTTTGGTTGATAGTGCTTGATCC
CATCTTATATCCTGCTGAGAGGAAAAGGATAATTTATTTCCTCTGTAATTATATGCAGAA
AAGGCTTATAATGCCCGATGAATTGCTCACTACTATATTGGATCAAAGAGTTCCTTATTC
AAATGACATAATTGGATTAATGACAAACAATTATAGGTCTAATACAGTAGAAATAAAGCG
TAACTGGCTTCAAGGCAACTTAAATTATACAAGCAGTTACTTACACAGCTGTAGTATGTC
TGTGTACAAAGATATAATAAGAGAAGCAGCAATATTATTAGAAGGAGAAGCCCTTGTGAA
CTCAATGGTACATTCTGATGATAATCAAACATCTATATGTATGGTGCAGAATAAATTACC
AGATGACAATATAATTGAATTTTGCATTAAGATATTCGAGAAGATATGCTTAACTTTTGG
CAATCAGGCAAATATGAAGAAGACATATCTAACTAACTTCATCAAAGAGTTTGTTTCTTT
ATTTAATATACATGGAGAACCATTTTCTATATATGGGAGATTTCTACTCACAGCAGTAGG
AGACTGTGCCTATCTAGGGCCTTATGAAGATTTAGCAAGTAGGCTATCTGCAACACAAAC
TGCTATAAAGCATGGTTGCCCACCATCACTTGCATGGGTATCTATCGCTCTAAATCACTG
GATAACCCACACTACATATAATATGTTGCCTGGCCAAAATAATGACCCGTTACCATTCTT
CCCTACTAACAATAGAAGTGAAATACCAGTAGAGATGTGCGGAATACTAGAAAGTGATTT
ATCAACAATTGCACTAACTGGTTTAGAAGCAGGGAATGTCACGTTTCTAACAAATATAGC
AAGGAAGTTATCATCCCCAATCTTACAAAGAGAAAGTATTCAAGATCAATACAATTCTAT
AGAAAAGTGGGATCTGAGCAAATTATCACAGATCGACATTCTAAGGCTTAAAATGCTCAG
GTATATATCTCTTGATAGTTCAGTCACATCTGATGATGGTATGGGGGAGACTAGTGAAAT
GAGATCTCGATCACTTTTAACACCTCGTAAATTCACAACAAGTGGGTCACTTAATAGGTT
GAAATCATATAAAGACTTTCAAGATATAATAGCAGATGAGGACAAGACAAACGAACTATT
TGAGAATTTCATTAGACACCCAGAGTTACTGGTTACAAAAGGCGAAACATTTGAAGAATT
TGTTAATACGATATTATTTAGGTACAATTCAAAGAAATTCAAAGAATCTTTGTCAATACA
AAACCCAGCACAGCTTTTTATTGAGCAAATATTATTTTCCAATAAACCAGTAATTGACTA
CACTAGCATACATGACAAGATTTTTGGATTACAAGACATGCCAGGAATTGAAGAACTAGA
TACAATTATAGGTCGCAAAACATTTGTTGAGAGTTATGTTCAAATCGTAGATGACTTAAG
CAATTTAACATTGGATATAAACGATGTCAAGACTATATTTGCCTTTTGTCTTATGAATGA
CCCACTACTGATCACATCTGCTAACAATATAATAATGTCTGTTAAGGGACATAGTCAAGA
AAGAATAGGTCAATCAGCATGCAAAATGCCAGAGGTCCGAAGTCTAAAACTCATACATTA
TTCACCAGCAGTTGTTTTGAGAGCCTATGTGAGAGGGCCAACAAATGTACCGAATGTAGA
TATAGATGAACTTGCAAGGGATCTATCTCATTTAGAAGACTTCATACAAAGTACAAAACT
CAGAGAAAATATGAGAGAGAGAATAGAAATAAATGAGAAGCGGCACTTAGGAAGGGATTT
CAAATTTGAAATCAAAGAACTAACTAGATTTTACCAAGTGTGTTATGATTACATAAAGTC
TACAGAACATAAAGTCAAGGTATTCATATTGCCATACAAAGTTTTCACATCAATAGAATT
CTGCGGGGCACTGACAGGTAACTTGATAAATGACAAATTATGGTACATAACGCATTATCT
GAAAAATATAGTGTCTACTACACATAAGGCACAAATTTCTTCTTCACCTGAATTGGAATT
GCAAATTGCTGATGAGGCACTAAGACTAGTAGCACATTTTGCTGATACTTTCTTGGCATC
AGAATCAAGAATACAATTTCTGAAGAAAATTATTGAAGAATTCACATACAAAGGGATACC
TGTAAAACATTTATACTCAAAAATAAAGAACTCCAAGTTGAGGGTTAAATTTCTAGGGAT
TCTTTTATGGTTAGATGATCTAACACAGAATGATCTGGATAAATTTGATGCAGATAAATC
AGATGAAAAGATTATATGGAATAACTGGCAAGTGTCAAGAGATATGAATACTGGACCAAT
AGACTTAATGATAAGCGGTTACTCTAGACAGCTGCGGATCACTGGGGAAGATGACAAATT
GATTGCTGCTGAATTGCAGGTTACTAGATTGTCAGAAGATTTAATTTATAGACACGGTCA
GGCAATGTTGAATAAGCCACACGGCTTAAAGCTTGAAAAAATGCAACCTGTGACTGAGAT
GTCTAAACGATTACATTATATCGTTTTCCAGCAAAGATCACGGAAACGATACTTCTATTC
TATATTACCCACCCAAGTAATTGAGGACCATAATTCTAGAGTTGAATCATCTAGGCTAAG
CAGAGATTCAAAATGGGTTCCTGTATGCCCTGTTGCAATATCAAAACTCTACCAACAAGG
ACGGCCTATACTTTCCAAAGTTAGAAATCTGAATATGCAGACTCATTCGCTTTCCAGAAT
ACAAGTTAATGTAGATGAATATGCCATCACGAGAAGAGCACATTTTCAGAAAATGCCTTT
CTTCGAAGGACCATCAATCCCTTCTGGTGGTATGGATTTGTCTGAGTTGATGAAATCTAC
ATCCCTATTAAGCTTGAATTATGATAACATAAAAAATGCATCCTTATTGGACATGTCTAG
GGTATTTAAGTGCAATGGCAGTGGAGATGACCAAATGGCTTTCGAATTTCTATCGGACGA
AATTTTGGAGCAAGATGTAGTTGAAGAAATAGAATGCAACCCTATATTTTCTATTAGTTA
TACAAAAAGAGGAGAATCCAATATGACTTATAAAAATGCTTTCCACAAAGCCTTAATCTC
AGAATGTGACAAATTTGAAGAAGCATTTGACTTCCTCGACATGGGATTTTGCTCGAATGA
AAATCTTAGTATTCTGGAGGAAATACATTGGATAATCAGTTATTTAAAAACAAATCAATG
GTCTACGGAACTAGACAATTGTATTCACATGTGCATGTACAGGAATGGATATGATGCAGA
ATATCATAAATTTGATATACCCTCTAAATTCCTCAAAGACCCAATAAACCGAACAATAAA
TTGGACTGAAGTCATTGAATTTATATTATTAATTGAAGATTTCCAAACAAAAATTGAGCC
ATGGTCTAGTATGAAGTCACACTTCTGTTCAAAAGCACACAGTGTAGCACTAGAGTGTAT
GAAAAATGAGAAAAGATCATTGGCAGAATTTGTAGACAAAAGTAAGAAAACTGGCAAATC
CAAATTTGACTTCTAAGGTATACACATGTAAAAGTAGTGTTTGTTTCTAAATAGGAGCAC
ACTACT
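A reference sequence and its annotation can be sanity-checked against each other. The sketch below shows one way to read a FASTA record and confirm that a coding region starts with `ATG`, ends with a stop codon and is a multiple of three in length. It uses a short toy sequence so the example is self-contained. The function names are illustrative, and the coordinates are treated as 1-based and inclusive, as in GFF3.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first token of the header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def check_cds(seq, start, end):
    """Check a coding region given 1-based inclusive coordinates."""
    cds = seq[start - 1:end]
    return {
        "length": len(cds),
        "starts_with_atg": cds.startswith("ATG"),
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        "in_frame": len(cds) % 3 == 0,
    }


# Toy record (not the real reference): the CDS occupies positions 3..14.
toy = ">toy example\nCCATGAAATTT\nTAAGG\n"
seq = read_fasta(toy)["toy"]
report = check_cds(seq, 3, 14)
print(report)
```

For the reference above, the same check could be run as `check_cds(read_fasta(open("reference.fasta").read())["NC_005776.1"], 44, 6796)`, using the gene coordinates from the annotation file. The file path here is an assumption about the local layout.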
