diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000000000..e43b0f988
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/README.md b/README.md
index fb12179ae..72e556cb2 100644
--- a/README.md
+++ b/README.md
@@ -32,6 +32,44 @@ git remote set-branches --add origin [remote-branch]
 git fetch
 ```
 
+## rnastructurome pipeline test data
+
+Test data for the [nf-core/rnastructurome](https://github.com/nf-core/rnastructurome) pipeline lives on the `rnastructurome` branch under `testdata/`.
+
+There are three independent test datasets, each exercising a different route through the pipeline:
+
+- **`test_genome`**: tests the genome-alignment route. Run with `-profile test,docker`.
+- **`test_transcriptome`**: tests the transcriptome-alignment route, including the `structextract` module. Run with `-profile test_transcriptome,docker`.
+- **`test_prokaryote`**: tests the `jackknife` and `eval` modules using prokaryote 16S rRNA data and a known reference structure for that rRNA. Run with `-profile test_prokaryote,conda`.
+
+**Human (HEK293) data**: the `test_genome` and `test_transcriptome` datasets are both derived from a real HEK293 dataset (accessions listed below). Reads aligning to the mitochondrial gene RNR1 (`ENST00000389680.2`) were extracted from the full dataset to create this small test dataset.
+- For the genome route, the full GRCh38 FASTA and GTF were each subsetted to just the MT chromosome.
+- For the transcriptome route, the RNR1 transcript was extracted from the full GRCh38 transcriptome FASTA, and the same MT GTF was reused.
+
+**Prokaryote (E. coli) data**: reads aligning to 16S rRNA were extracted from a real *E. coli* dataset (accession listed below) to create the tiny prokaryote dataset. The reference structure for 16S rRNA was supplied by Danny Incarnato and is used by both the `jackknife` and `eval` modules. The FASTA and GTF files were created from the 16S rRNA reference structure.
+
+### Directory structure
+
+```
+testdata/
+├── HEK293T_untreated_r1.fastq.gz                    # shared SHAPE-seq reads (untreated, GSM4333255)
+├── HEK293T_treated_r1.fastq.gz                      # shared SHAPE-seq reads (treated, GSM4333256)
+├── test_genome/
+│   ├── Homo_sapiens.GRCh38.MT.fa                    # human mitochondrial chromosome
+│   ├── Homo_sapiens.GRCh38.MT.gtf                   # MT genome annotation
+│   └── samplesheet.genome_test.csv
+├── test_transcriptome/
+│   ├── Homo_sapiens.GRCh38.ENST00000389680.fa       # single-transcript FASTA (MT-RNR1)
+│   ├── Homo_sapiens.GRCh38.MT.gtf                   # MT genome annotation
+│   └── samplesheet.transcriptome_test.csv
+└── test_prokaryote/
+    ├── 16S_rRNA.fa                                  # E. coli 16S rRNA FASTA
+    ├── 16S_rRNA.gtf                                 # matching single-transcript GTF
+    ├── 16S_rRNA.reference.db                        # dot-bracket reference structure for rf-eval/rf-jackknife
+    ├── E_coli_DH5a_treated.16S_rRNA.fastq.gz        # treated, GSM7885842
+    └── samplesheet.prokaryote_test.csv
+```
+
 ## Support
 
 For further information or help, don't hesitate to get in touch on our [Slack organisation](https://nf-co.re/join/slack) (a tool for instant messaging).
diff --git a/testdata/HEK293T_treated_r1.fastq.gz b/testdata/HEK293T_treated_r1.fastq.gz new file mode 100644 index 000000000..7d855aea3 Binary files /dev/null and b/testdata/HEK293T_treated_r1.fastq.gz differ diff --git a/testdata/HEK293T_untreated_r1.fastq.gz b/testdata/HEK293T_untreated_r1.fastq.gz new file mode 100644 index 000000000..3a7688205 Binary files /dev/null and b/testdata/HEK293T_untreated_r1.fastq.gz differ diff --git a/testdata/test_genome/Homo_sapiens.GRCh38.MT.fa b/testdata/test_genome/Homo_sapiens.GRCh38.MT.fa new file mode 100644 index 000000000..6c80cb43f --- /dev/null +++ b/testdata/test_genome/Homo_sapiens.GRCh38.MT.fa @@ -0,0 +1,278 @@ +>MT +GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT +CGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC +GCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATT +ACAGGCGAACATACTTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA +ACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATCATAACAAAAAATTTCCACCA +AACCCCCCCTCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAAAA +ACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCAC +TTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAAT +CTCATCAATACAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATA +CCCCGAACCAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCCTCAAA +GCAATACACTGAAAATGTTTAGACGGGCTCACATCACCCCATAAACAAATAGGTTTGGTC +CTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCCCGTTCCAGTGAGT +TCACCCTCTAAATCACCACGATCAAAAGGAACAAGCATCAAGCACGCAGCAATGCAGCTC +AAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAA +ACGAAAGTTTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACCGC +GGTCACACGATTAACCCAAGTCAATAGAAGCCGGCGTAAAGAGTGTTTTAGATCACCCCC +TCCCCAATAAAGCTAAAACTCACCTGAGTTGTAAAAAACTCCAGTTGACACAAAATAGAC +TACGAAAGTGGCTTTAACATATCTGAACACACAATAGCTAAGACCCAAACTGGGATTAGA +TACCCCACTATGCTTAGCCCTAAACCTCAACAGTTAAATCAACAAAACTGCTCGCCAGAA +CACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCATATCCCTCTAGAGG +AGCCTGTTCTGTAATCGATAAACCCCGATCAACCTCACCACCTCTTGCTCAGCCTATATA 
+CCGCCATCTTCAGCAAACCCTGATGAAGGCTACAAAGTAAGCGCAAGTACCCACGTAAAG +ACGTTAGGTCAAGGTGTAGCCCATGAGGTGGCAAGAAATGGGCTACATTTTCTACCCCAG +AAAACTACGATAGCCCTTATGAAACTTAAGGGTCGAAGGTGGATTTAGCAGTAAACTAAG +AGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCCCGTCACCCTCCTC +AAGTATACTTCAAAGGACATTTAACTAAAACCCCTACGCATTTATATAGAGGAGACAAGT +CGTAACATGGTAAGTGTACTGGAAAGTGCACTTGGACGAACCAGAGTGTAGCTTAACACA +AAGCACCCAACTTACACTTAGGAGATTTCAACTTAACTTGACCGCTCTGAGCTAAACCTA +GCCCCAAACCCACTCCACCTTACTACCAGACAACCTTAGCCAAACCATTTACCCAAATAA +AGTATAGGCGATAGAAATTGAAACCTGGCGCAATAGATATAGTACCGCAAGGGAAAGATG +AAAAATTATAACCAAGCATAATATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAA +TTAACTAGAAATAACTTTGCAAGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCT +ACCTAAGAACAGCTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATA +GGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAG +TTCAACTTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTC +CAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTA +ACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCA +CTACCTAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATC +ACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAGC +CTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATATCTACAATCAACCAAC +AAGTCATTATTACCCTCACTGTCAACCCAACACAGGCATGCTCATAAGGAAAGGTTAAAA +AAAGTAAAAGGAACTCGGCAAATCTTACCCCGCCTGTTTACCAAAAACATCACCTCTAGC +ATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATGTTTAACGGCCGCGGTACCCT +AACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAAATAGGGACCTGTATGAATGGCTCC +ACGAGGGTTCAGCTGTCTCTTACTTTTAACCAGTGAAATTGACCTGCCCGTGAAGAGGCG +GGCATAACACAGCAAGACGAGAAGACCCTATGGAGCTTTAATTTATTAATGCAAACAGTA +CCTAACAAACCCACAGGTCCTAAACTACCAAACCTGCATTAAAAATTTCGGTTGGGGCGA +CCTCGGAGCAGAACCCAACCTCCGAGCAGTACATGCTAAGACTTCACCAGTCAAAGCGAA +CTACTATACTCAATTGATCCAATAACTTGACCAACGGAACAAGTTACCCTAGGGATAACA +GCGCAATCCTATTCTAGAGTCCATATCAACAATAGGGTTTACGACCTCGATGTTGGATCA +GGACATCCCGATGGTGCAGCCGCTATTAAAGGTTCGTTTGTTCAACGATTAAAGTCCTAC +GTGATCTGAGTTCAGACCGGAGTAATCCAGGTCGGTTTCTATCTACNTTCAAATTCCTCC +CTGTACGAAAGGACAAGAGAAATAAGGCCTACTTCACAAAGCGCCTTCCCCCGTAAATGA 
+TATCATCTCAACTTAGTATTATACCCACACCCACCCAAGAACAGGGTTTGTTAAGATGGC +AGAGCCCGGTAATCGCATAAAACTTAAAACTTTACAGTCAGAGGTTCAATTCCTCTTCTT +AACAACATACCCATGGCCAACCTCCTACTCCTCATTGTACCCATTCTAATCGCAATGGCA +TTCCTAATGCTTACCGAACGAAAAATTCTAGGCTATATACAACTACGCAAAGGCCCCAAC +GTTGTAGGCCCCTACGGGCTACTACAACCCTTCGCTGACGCCATAAAACTCTTCACCAAA +GAGCCCCTAAAACCCGCCACATCTACCATCACCCTCTACATCACCGCCCCGACCTTAGCT +CTCACCATCGCTCTTCTACTATGAACCCCCCTCCCCATACCCAACCCCCTGGTCAACCTC +AACCTAGGCCTCCTATTTATTCTAGCCACCTCTAGCCTAGCCGTTTACTCAATCCTCTGA +TCAGGGTGAGCATCAAACTCAAACTACGCCCTGATCGGCGCACTGCGAGCAGTAGCCCAA +ACAATCTCATATGAAGTCACCCTAGCCATCATTCTACTATCAACATTACTAATAAGTGGC +TCCTTTAACCTCTCCACCCTTATCACAACACAAGAACACCTCTGATTACTCCTGCCATCA +TGACCCTTGGCCATAATATGATTTATCTCCACACTAGCAGAGACCAACCGAACCCCCTTC +GACCTTGCCGAAGGGGAGTCCGAACTAGTCTCAGGCTTCAACATCGAATACGCCGCAGGC +CCCTTCGCCCTATTCTTCATAGCCGAATACACAAACATTATTATAATAAACACCCTCACC +ACTACAATCTTCCTAGGAACAACATATGACGCACTCTCCCCTGAACTCTACACAACATAT +TTTGTCACCAAGACCCTACTTCTAACCTCCCTGTTCTTATGAATTCGAACAGCATACCCC +CGATTCCGCTACGACCAACTCATACACCTCCTATGAAAAAACTTCCTACCACTCACCCTA +GCATTACTTATATGATATGTCTCCATACCCATTACAATCTCCAGCATTCCCCCTCAAACC +TAAGAAATATGTCTGATAAAAGAGTTACTTTGATAGAGTAAATAATAGGAGCTTAAACCC +CCTTATTTCTAGGACTATGAGAATCGAACCCATCCCTGAGAATCCAAAATTCTCCGTGCC +ACCTATCACACCCCATCCTAAAGTAAGGTCAGCTAAATAAGCTATCGGGCCCATACCCCG +AAAATGTTGGTTATACCCTTCCCGTACTAATTAATCCCCTGGCCCAACCCGTCATCTACT +CTACCATCTTTGCAGGCACACTCATCACAGCGCTAAGCTCGCACTGATTTTTTACCTGAG +TAGGCCTAGAAATAAACATGCTAGCTTTTATTCCAGTTCTAACCAAAAAAATAAACCCTC +GTTCCACAGAAGCTGCCATCAAGTATTTCCTCACGCAAGCAACCGCATCCATAATCCTTC +TAATAGCTATCCTCTTCAACAATATACTCTCCGGACAATGAACCATAACCAATACTACCA +ATCAATACTCATCATTAATAATCATAATAGCTATAGCAATAAAACTAGGAATAGCCCCCT +TTCACTTCTGAGTCCCAGAGGTTACCCAAGGCACCCCTCTGACATCCGGCCTGCTTCTTC +TCACATGACAAAAACTAGCCCCCATCTCAATCATATACCAAATCTCTCCCTCACTAAACG +TAAGCCTTCTCCTCACTCTCTCAATCTTATCCATCATAGCAGGCAGTTGAGGTGGATTAA +ACCAAACCCAGCTACGCAAAATCTTAGCATACTCCTCAATTACCCACATAGGATGAATAA +TAGCAGTTCTACCGTACAACCCTAACATAACCATTCTTAATTTAACTATTTATATTATCC 
+TAACTACTACCGCATTCCTACTACTCAACTTAAACTCCAGCACCACGACCCTACTACTAT +CTCGCACCTGAAACAAGCTAACATGACTAACACCCTTAATTCCATCCACCCTCCTCTCCC +TAGGAGGCCTGCCCCCGCTAACCGGCTTTTTGCCCAAATGGGCCATTATCGAAGAATTCA +CAAAAAACAATAGCCTCATCATCCCCACCATCATAGCCACCATCACCCTCCTTAACCTCT +ACTTCTACCTACGCCTAATCTACTCCACCTCAATCACACTACTCCCCATATCTAACAACG +TAAAAATAAAATGACAGTTTGAACATACAAAACCCACCCCATTCCTCCCCACACTCATCG +CCCTTACCACGCTACTCCTACCTATCTCCCCTTTTATACTAATAATCTTATAGAAATTTA +GGTTAAATACAGACCAAGAGCCTTCAAAGCCCTCAGTAAGTTGCAATACTTAATTTCTGT +AACAGCTAAGGACTGCAAAACCCCACTCTGCATCAACTGAACGCAAATCAGCCACTTTAA +TTAAGCTAAGCCCTTACTAGACCAATGGGACTTAAACCCACAAACACTTAGTTAACAGCT +AAGCACCCTAATCAACTGGCTTCAATCTACTTCTCCCGCCGCCGGGAAAAAAGGCGGGAG +AAGCCCCGGCAGGTTTGAAGCTGCTTCTTCGAATTTGCAATTCAATATGAAAATCACCTC +GGAGCTGGTAAAAAGAGGCCTAACCCCTGTCTTTAGATTTACAGTCCAATGCTTCACTCA +GCCATTTTACCTCACCCCCACTGATGTTCGCCGACCGTTGACTATTCTCTACAAACCACA +AAGACATTGGAACACTATACCTATTATTCGGCGCATGAGCTGGAGTCCTAGGCACAGCTC +TAAGCCTCCTTATTCGAGCCGAGCTGGGCCAGCCAGGCAACCTTCTAGGTAACGACCACA +TCTACAACGTTATCGTCACAGCCCATGCATTTGTAATAATCTTCTTCATAGTAATACCCA +TCATAATCGGAGGCTTTGGCAACTGACTAGTTCCCCTAATAATCGGTGCCCCCGATATGG +CGTTTCCCCGCATAAACAACATAAGCTTCTGACTCTTACCTCCCTCTCTCCTACTCCTGC +TCGCATCTGCTATAGTGGAGGCCGGAGCAGGAACAGGTTGAACAGTCTACCCTCCCTTAG +CAGGGAACTACTCCCACCCTGGAGCCTCCGTAGACCTAACCATCTTCTCCTTACACCTAG +CAGGTGTCTCCTCTATCTTAGGGGCCATCAATTTCATCACAACAATTATCAATATAAAAC +CCCCTGCCATAACCCAATACCAAACGCCCCTCTTCGTCTGATCCGTCCTAATCACAGCAG +TCCTACTTCTCCTATCTCTCCCAGTCCTAGCTGCTGGCATCACTATACTACTAACAGACC +GCAACCTCAACACCACCTTCTTCGACCCCGCCGGAGGAGGAGACCCCATTCTATACCAAC +ACCTATTCTGATTTTTCGGTCACCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAA +TAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTA +TGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGTTTATCGTGTGAGCACACCATATAT +TTACAGTAGGAATAGACGTAGACACACGAGCATATTTCACCTCCGCTACCATAATCATCG +CTATCCCCACCGGCGTCAAAGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGA +AATGATCTGCTGCAGTGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCC +TGACTGGCATTGTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACG 
+TTGTAGCCCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCT +TCATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCTACGCCAAAATCC +ATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTTTCTCGGCC +TATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCACATGAAACATCC +TATCATCTGTAGGCTCATTCATTTCTCTAACAGCAGTAATATTAATAATTTTCATGATTT +GAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTAATAGTAGAAGAACCCTCCATAAACCTGG +AGTGACTATATGGATGCCCCCCACCCTACCACACATTCGAAGAACCCGTATACATAAAAT +CTAGACAAAAAAGGAAGGAATCGAACCCCCCAAAGCTGGTTTCAAGCCAACCCCATGGCC +TCCATGACTTTTTCAAAAAGGTATTAGAAAAACCATTTCATAACTTTGTCAAAGTTAAAT +TATAGGCTAAATCCTATATATCTTAATGGCACATGCAGCGCAAGTAGGTCTACAAGACGC +TACTTCCCCTATCATAGAAGAGCTTATCACCTTTCATGATCACGCCCTCATAATCATTTT +CCTTATCTGCTTCCTAGTCCTGTATGCCCTTTTCCTAACACTCACAACAAAACTAACTAA +TACTAACATCTCAGACGCTCAGGAAATAGAAACCGTCTGAACTATCCTGCCCGCCATCAT +CCTAGTCCTCATCGCCCTCCCATCCCTACGCATCCTTTACATAACAGACGAGGTCAACGA +TCCCTCCCTTACCATCAAATCAATTGGCCACCAATGGTACTGAACCTACGAGTACACCGA +CTACGGCGGACTAATCTTCAACTCCTACATACTTCCCCCATTATTCCTAGAACCAGGCGA +CCTGCGACTCCTTGACGTTGACAATCGAGTAGTACTCCCGATTGAAGCCCCCATTCGTAT +AATAATTACATCACAAGACGTCTTGCACTCATGAGCTGTCCCCACATTAGGCTTAAAAAC +AGATGCAATTCCCGGACGTCTAAACCAAACCACTTTCACCGCTACACGACCGGGGGTATA +CTACGGTCAATGCTCTGAAATCTGTGGAGCAAACCACAGTTTCATGCCCATCGTCCTAGA +ATTAATTCCCCTAAAAATCTTTGAAATAGGGCCCGTATTTACCCTATAGCACCCCCTCTA +CCCCCTCTAGAGCCCACTGTAAAGCTAACTTAGCATTAACCTTTTAAGTTAAAGATTAAG +AGAACCAACACCTCTTTACAGTGAAATGCCCCAACTAAATACTACCGTATGGCCCACCAT +AATTACCCCCATACTCCTTACACTATTCCTCATCACCCAACTAAAAATATTAAACACAAA +CTACCACCTACCTCCCTCACCAAAGCCCATAAAAATAAAAAATTATAACAAACCCTGAGA +ACCAAAATGAACGAAAATCTGTTCGCTTCATTCATTGCCCCCACAATCCTAGGCCTACCC +GCCGCAGTACTGATCATTCTATTTCCCCCTCTATTGATCCCCACCTCCAAATATCTCATC +AACAACCGACTAATCACCACCCAACAATGACTAATCAAACTAACCTCAAAACAAATGATA +ACCATACACAACACTAAAGGACGAACCTGATCTCTTATACTAGTATCCTTAATCATTTTT +ATTGCCACAACTAACCTCCTCGGACTCCTGCCTCACTCATTTACACCAACCACCCAACTA +TCTATAAACCTAGCCATGGCCATCCCCTTATGAGCGGGCACAGTGATTATAGGCTTTCGC +TCTAAGATTAAAAATGCCCTAGCCCACTTCTTACCACAAGGCACACCTACACCCCTTATC 
+CCCATACTAGTTATTATCGAAACCATCAGCCTACTCATTCAACCAATAGCCCTGGCCGTA +CGCCTAACCGCTAACATTACTGCAGGCCACCTACTCATGCACCTAATTGGAAGCGCCACC +CTAGCAATATCAACCATTAACCTTCCCTCTACACTTATCATCTTCACAATTCTAATTCTA +CTGACTATCCTAGAAATCGCTGTCGCCTTAATCCAAGCCTACGTTTTCACACTTCTAGTA +AGCCTCTACCTGCACGACAACACATAATGACCCACCAATCACATGCCTATCATATAGTAA +AACCCAGCCCATGACCCCTAACAGGGGCCCTCTCAGCCCTCCTAATGACCTCCGGCCTAG +CCATGTGATTTCACTTCCACTCCATAACGCTCCTCATACTAGGCCTACTAACCAACACAC +TAACCATATACCAATGATGGCGCGATGTAACACGAGAAAGCACATACCAAGGCCACCACA +CACCACCTGTCCAAAAAGGCCTTCGATACGGGATAATCCTATTTATTACCTCAGAAGTTT +TTTTCTTCGCAGGATTTTTCTGAGCCTTTTACCACTCCAGCCTAGCCCCTACCCCCCAAT +TAGGAGGGCACTGGCCCCCAACAGGCATCACCCCGCTAAATCCCCTAGAAGTCCCACTCC +TAAACACATCCGTATTACTCGCATCAGGAGTATCAATCACCTGAGCTCACCATAGTCTAA +TAGAAAACAACCGAAACCAAATAATTCAAGCACTGCTTATTACAATTTTACTGGGTCTCT +ATTTTACCCTCCTACAAGCCTCAGAGTACTTCGAGTCTCCCTTCACCATTTCCGACGGCA +TCTACGGCTCAACATTTTTTGTAGCCACAGGCTTCCACGGACTTCACGTCATTATTGGCT +CAACTTTCCTCACTATCTGCTTCATCCGCCAACTAATATTTCACTTTACATCCAAACATC +ACTTTGGCTTCGAAGCCGCCGCCTGATACTGGCATTTTGTAGATGTGGTTTGACTATTTC +TGTATGTCTCCATCTATTGATGAGGGTCTTACTCTTTTAGTATAAATAGTACCGTTAACT +TCCAATTAACTAGTTTTGACAACATTCAAAAAAGAGTAATAAACTTCGCCTTAATTTTAA +TAATCAACACCCTCCTAGCCTTACTACTAATAATTATTACATTTTGACTACCACAACTCA +ACGGCTACATAGAAAAATCCACCCCTTACGAGTGCGGCTTCGACCCTATATCCCCCGCCC +GCGTCCCTTTCTCCATAAAATTCTTCTTAGTAGCTATTACCTTCTTATTATTTGATCTAG +AAATTGCCCTCCTTTTACCCCTACCATGAGCCCTACAAACAACTAACCTGCCACTAATAG +TTATGTCATCCCTCTTATTAATCATCATCCTAGCCCTAAGTCTGGCCTATGAGTGACTAC +AAAAAGGATTAGACTGAACCGAATTGGTATATAGTTTAAACAAAACGAATGATTTCGACT +CATTAAATTATGATAATCATATTTACCAAATGCCCCTCATTTACATAAATATTATACTAG +CATTTACCATCTCACTTCTAGGAATACTAGTATATCGCTCACACCTCATATCCTCCCTAC +TATGCCTAGAAGGAATAATACTATCGCTGTTCATTATAGCTACTCTCATAACCCTCAACA +CCCACTCCCTCTTAGCCAATATTGTGCCTATTGCCATACTAGTCTTTGCCGCCTGCGAAG +CAGCGGTGGGCCTAGCCCTACTAGTCTCAATCTCCAACACATATGGCCTAGACTACGTAC +ATAACCTAAACCTACTCCAATGCTAAAACTAATCGTCCCAACAATTATATTACTACCACT +GACATGACTTTCCAAAAAACACATAATTTGAATCAACACAACCACCCACAGCCTAATTAT 
+TAGCATCATCCCTCTACTATTTTTTAACCAAATCAACAACAACCTATTTAGCTGTTCCCC +AACCTTTTCCTCCGACCCCCTAACAACCCCCCTCCTAATACTAACTACCTGACTCCTACC +CCTCACAATCATGGCAAGCCAACGCCACTTATCCAGTGAACCACTATCACGAAAAAAACT +CTACCTCTCTATACTAATCTCCCTACAAATCTCCTTAATTATAACATTCACAGCCACAGA +ACTAATCATATTTTATATCTTCTTCGAAACCACACTTATCCCCACCTTGGCTATCATCAC +CCGATGAGGCAACCAGCCAGAACGCCTGAACGCAGGCACATACTTCCTATTCTACACCCT +AGTAGGCTCCCTTCCCCTACTCATCGCACTAATTTACACTCACAACACCCTAGGCTCACT +AAACATTCTACTACTCACTCTCACTGCCCAAGAACTATCAAACTCCTGAGCCAACAACTT +AATATGACTAGCTTACACAATAGCTTTTATAGTAAAGATACCTCTTTACGGACTCCACTT +ATGACTCCCTAAAGCCCATGTCGAAGCCCCCATCGCTGGGTCAATAGTACTTGCCGCAGT +ACTCTTAAAACTAGGCGGCTATGGTATAATACGCCTCACACTCATTCTCAACCCCCTGAC +AAAACACATAGCCTACCCCTTCCTTGTACTATCCCTATGAGGCATAATTATAACAAGCTC +CATCTGCCTACGACAAACAGACCTAAAATCGCTCATTGCATACTCTTCAATCAGCCACAT +AGCCCTCGTAGTAACAGCCATTCTCATCCAAACCCCCTGAAGCTTCACCGGCGCAGTCAT +TCTCATAATCGCCCACGGGCTTACATCCTCATTACTATTCTGCCTAGCAAACTCAAACTA +CGAACGCACTCACAGTCGCATCATAATCCTCTCTCAAGGACTTCAAACTCTACTCCCACT +AATAGCTTTTTGATGACTTCTAGCAAGCCTCGCTAACCTCGCCTTACCCCCCACTATTAA +CCTACTGGGAGAACTCTCTGTGCTAGTAACCACGTTCTCCTGATCAAATATCACTCTCCT +ACTTACAGGACTCAACATACTAGTCACAGCCCTATACTCCCTCTACATATTTACCACAAC +ACAATGGGGCTCACTCACCCACCACATTAACAACATAAAACCCTCATTCACACGAGAAAA +CACCCTCATGTTCATACACCTATCCCCCATTCTCCTCCTATCCCTCAACCCCGACATCAT +TACCGGGTTTTCCTCTTGTAAATATAGTTTAACCAAAACATCAGATTGTGAATCTGACAA +CAGAGGCTTACGACCCCTTATTTACCGAGAAAGCTCACAAGAACTGCTAACTCATGCCCC +CATGTCTAACAACATGGCTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGGTCTTAG +GCCCCAAAAATTTTGGTGCAACTCCAAATAAAAGTAATAACCATGCACACTACTATAACC +ACCCTAACCCTGACTTCCCTAATTCCCCCCATCCTTACCACCCTCGTTAACCCTAACAAA +AAAAACTCATACCCCCATTATGTAAAATCCATTGTCGCATCCACCTTTATTATCAGTCTC +TTCCCCACAACAATATTCATGTGCCTAGACCAAGAAGTTATTATCTCGAACTGACACTGA +GCCACAACCCAAACAACCCAGCTCTCCCTAAGCTTCAAACTAGACTACTTCTCCATAATA +TTCATCCCTGTAGCATTGTTCGTTACATGGTCCATCATAGAATTCTCACTGTGATATATA +AACTCAGACCCAAACATTAATCAGTTCTTCAAATATCTACTCATCTTCCTAATTACCATA +CTAATCTTAGTTACCGCTAACAACCTATTCCAACTGTTCATCGGCTGAGAGGGCGTAGGA 
+ATTATATCCTTCTTGCTCATCAGTTGATGATACGCCCGAGCAGATGCCAACACAGCAGCC +ATTCAAGCAATCCTATACAACCGTATCGGCGATATCGGTTTCATCCTCGCCTTAGCATGA +TTTATCCTACACTCCAACTCATGAGACCCACAACAAATAGCCCTTCTAAACGCTAATCCA +AGCCTCACCCCACTACTAGGCCTCCTCCTAGCAGCAGCAGGCAAATCAGCCCAATTAGGT +CTCCACCCCTGACTCCCCTCAGCCATAGAAGGCCCCACCCCAGTCTCAGCCCTACTCCAC +TCAAGCACTATAGTTGTAGCAGGAATCTTCTTACTCATCCGCTTCCACCCCCTAGCAGAA +AATAGCCCACTAATCCAAACTCTAACACTATGCTTAGGCGCTATCACCACTCTGTTCGCA +GCAGTCTGCGCCCTTACACAAAATGACATCAAAAAAATCGTAGCCTTCTCCACTTCAAGT +CAACTAGGACTCATAATAGTTACAATCGGCATCAACCAACCACACCTAGCATTCCTGCAC +ATCTGTACCCACGCCTTCTTCAAAGCCATACTATTTATGTGCTCCGGGTCCATCATCCAC +AACCTTAACAATGAACAAGATATTCGAAAAATAGGAGGACTACTCAAAACCATACCTCTC +ACTTCAACCTCCCTCACCATTGGCAGCCTAGCATTAGCAGGAATACCTTTCCTCACAGGT +TTCTACTCCAAAGACCACATCATCGAAACCGCAAACATATCATACACAAACGCCTGAGCC +CTATCTATTACTCTCATCGCTACCTCCCTGACAAGCGCCTATAGCACTCGAATAATTCTT +CTCACCCTAACAGGTCAACCTCGCTTCCCCACCCTTACTAACATTAACGAAAATAACCCC +ACCCTACTAAACCCCATTAAACGCCTGGCAGCCGGAAGCCTATTCGCAGGATTTCTCATT +ACTAACAACATTTCCCCCGCATCCCCCTTCCAAACAACAATCCCCCTCTACCTAAAACTC +ACAGCCCTCGCTGTCACTTTCCTAGGACTTCTAACAGCCCTAGACCTCAACTACCTAACC +AACAAACTTAAAATAAAATCCCCACTATGCACATTTTATTTCTCCAACATACTCGGATTC +TACCCTAGCATCACACACCGCACAATCCCCTATCTAGGCCTTCTTACGAGCCAAAACCTG +CCCCTACTCCTCCTAGACCTAACCTGACTAGAAAAGCTATTACCTAAAACAATTTCACAG +CACCAAATCTCCACCTCCATCATCACCTCAACCCAAAAAGGCATAATTAAACTTTACTTC +CTCTCTTTCTTCTTCCCACTCATCCTAACCCTACTCCTAATCACATAACCTATTCCCCCG +AGCAATCTCAATTACAATATATACACCAACAAACAATGTTCAACCAGTAACTACTACTAA +TCAACGCCCATAATCATACAAAGCCCCCGCACCAATAGGATCCTCCCGAATCAACCCTGA +CCCCTCTCCTTCATAAATTATTCAGCTTCCTACACTATTAAAGTTTACCACAACCACCAC +CCCATCATACTCTTTCACCCACAGCACCAATCCTACCTCCATCGCTAACCCCACTAAAAC +ACTCACCAAGACCTCAACCCCTGACCCCCATGCCTCAGGATACTCCTCAATAGCCATCGC +TGTAGTATATCCAAAGACAACCATCATTCCCCCTAAATAAATTAAAAAAACTATTAAACC +CATATAACCTCCCCCAAAATTCAGAATAATAACACACCCGACCACACCGCTAACAATCAA +TACTAAACCCCCATAAATAGGAGAAGGCTTAGAAGAAAACCCCACAAACCCCATTACTAA +ACCCACACTCAACAGAAACAAAGCATACATCATTATTCTCGCACGGACTACAACCACGAC 
+CAATGATATGAAAAACCATCGTTGTATTTCAACTACAAGAACACCAATGACCCCAATACG +CAAAACTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGACCTCCCCACCCCATC +CAACATCTCCGCATGATGAAACTTCGGCTCACTCCTTGGCGCCTGCCTGATCCTCCAAAT +CACCACAGGACTATTCCTAGCCATGCACTACTCACCAGACGCCTCAACCGCCTTTTCATC +AATCGCCCACATCACTCGAGACGTAAATTATGGCTGAATCATCCGCTACCTTCACGCCAA +TGGCGCCTCAATATTCTTTATCTGCCTCTTCCTACACATCGGGCGAGGCCTATATTACGG +ATCATTTCTCTACTCAGAAACCTGAAACATCGGCATTATCCTCCTGCTTGCAACTATAGC +AACAGCCTTCATAGGCTATGTCCTCCCGTGAGGCCAAATATCATTCTGAGGGGCCACAGT +AATTACAAACTTACTATCCGCCATCCCATACATTGGGACAGACCTAGTTCAATGAATCTG +AGGAGGCTACTCAGTAGACAGTCCCACCCTCACACGATTCTTTACCTTTCACTTCATCTT +GCCCTTCATTATTGCAGCCCTAGCAACACTCCACCTCCTATTCTTGCACGAAACGGGATC +AAACAACCCCCTAGGAATCACCTCCCATTCCGATAAAATCACCTTCCACCCTTACTACAC +AATCAAAGACGCCCTCGGCTTACTTCTCTTCCTTCTCTCCTTAATGACATTAACACTATT +CTCACCAGACCTCCTAGGCGACCCAGACAATTATACCCTAGCCAACCCCTTAAACACCCC +TCCCCACATCAAGCCCGAATGATATTTCCTATTCGCCTACACAATTCTCCGATCCGTCCC +TAACAAACTAGGAGGCGTCCTTGCCCTATTACTATCCATCCTCATCCTAGCAATAATCCC +CATCCTCCATATATCCAAACAACAAAGCATAATATTTCGCCCACTAAGCCAATCACTTTA +TTGACTCCTAGCCGCAGACCTCCTCATTCTAACCTGAATCGGAGGACAACCAGTAAGCTA +CCCTTTTACCATCATTGGACAAGTAGCATCCGTACTATACTTCACAACAATCCTAATCCT +AATACCAACTATCTCCCTAATTGAAAACAAAATACTCAAATGGGCCTGTCCTTGTAGTAT +AAACTAATACACCAGTCTTGTAAACCGGAGATGAAAACCTTTTTCCAAGGACAAATCAGA +GAAAAAGTCTTTAACTCCACCATTAGCACCCAAAGCTAAGATTCTAATTTAAACTATTCT +CTGTTCTTTCATGGGGAAGCAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAACA +ACCGCTATGTATTTCGTACATTACTGCCAGCCACCATGAATATTGTACGGTACCATAAAT +ACTTGACCACCTGTAGTACATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTA +CAAGCAAGTACAGCAATCAACCCTCAACTATCACACATCAACTGCAACTCCAAAGCCACC +CCTCACCCACTAGGATACCAACAAACCTACCCACCCTTAACAGTACATAGTACATAAAGC +CATTTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATGACCCCCC +TCAGATAGGGGTCCCTTGACCACCATCCTCCGTGAAATCAATATCCCGCACAAGAGTGCT +ACTCTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTGAACTGTATCCGACAT +CTGGTTCCTACTTCAGGGTCATAAAGCCTAAATAGCCCACACGTTCCCCTTAAATAAGAC +ATCACGATG diff --git a/testdata/test_genome/Homo_sapiens.GRCh38.MT.gtf 
b/testdata/test_genome/Homo_sapiens.GRCh38.MT.gtf new file mode 100644 index 000000000..66618681c --- /dev/null +++ b/testdata/test_genome/Homo_sapiens.GRCh38.MT.gtf @@ -0,0 +1,127 @@ +##description: Homo sapiens GRCh38 mitochondrial chromosome annotations +##provider: Ensembl REST API +##sequence-region MT 1 16569 +MT ensembl_havana gene 577 647 . + . gene_id "ENSG00000210049"; gene_name "MT-TF"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 648 1601 . + . gene_id "ENSG00000211459"; gene_name "MT-RNR1"; gene_biotype "Mt_rRNA"; gene_source "insdc"; +MT ensembl_havana gene 1602 1670 . + . gene_id "ENSG00000210077"; gene_name "MT-TV"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 1671 3229 . + . gene_id "ENSG00000210082"; gene_name "MT-RNR2"; gene_biotype "Mt_rRNA"; gene_source "insdc"; +MT ensembl_havana gene 3230 3304 . + . gene_id "ENSG00000209082"; gene_name "MT-TL1"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 3307 4262 . + . gene_id "ENSG00000198888"; gene_name "MT-ND1"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 4263 4331 . + . gene_id "ENSG00000210100"; gene_name "MT-TI"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 4329 4400 . - . gene_id "ENSG00000210107"; gene_name "MT-TQ"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 4402 4469 . + . gene_id "ENSG00000210112"; gene_name "MT-TM"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 4470 5511 . + . gene_id "ENSG00000198763"; gene_name "MT-ND2"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 5512 5579 . + . gene_id "ENSG00000210117"; gene_name "MT-TW"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5587 5655 . - . gene_id "ENSG00000210127"; gene_name "MT-TA"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5657 5729 . - . 
gene_id "ENSG00000210135"; gene_name "MT-TN"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5761 5826 . - . gene_id "ENSG00000210140"; gene_name "MT-TC"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5826 5891 . - . gene_id "ENSG00000210144"; gene_name "MT-TY"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5904 7445 . + . gene_id "ENSG00000198804"; gene_name "MT-CO1"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 7446 7514 . - . gene_id "ENSG00000210151"; gene_name "MT-TS1"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 7518 7585 . + . gene_id "ENSG00000210154"; gene_name "MT-TD"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 7586 8269 . + . gene_id "ENSG00000198712"; gene_name "MT-CO2"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 8295 8364 . + . gene_id "ENSG00000210156"; gene_name "MT-TK"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 8366 8572 . + . gene_id "ENSG00000228253"; gene_name "MT-ATP8"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 8527 9207 . + . gene_id "ENSG00000198899"; gene_name "MT-ATP6"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 9207 9990 . + . gene_id "ENSG00000198938"; gene_name "MT-CO3"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 9991 10058 . + . gene_id "ENSG00000210164"; gene_name "MT-TG"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 10059 10404 . + . gene_id "ENSG00000198840"; gene_name "MT-ND3"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 10405 10469 . + . gene_id "ENSG00000210174"; gene_name "MT-TR"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 10470 10766 . + . 
gene_id "ENSG00000212907"; gene_name "MT-ND4L"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 10760 12137 . + . gene_id "ENSG00000198886"; gene_name "MT-ND4"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 12138 12206 . + . gene_id "ENSG00000210176"; gene_name "MT-TH"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 12207 12265 . + . gene_id "ENSG00000210184"; gene_name "MT-TS2"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 12266 12336 . + . gene_id "ENSG00000210191"; gene_name "MT-TL2"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 12337 14148 . + . gene_id "ENSG00000198786"; gene_name "MT-ND5"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 14149 14673 . - . gene_id "ENSG00000198695"; gene_name "MT-ND6"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 14674 14742 . - . gene_id "ENSG00000210194"; gene_name "MT-TE"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 14747 15887 . + . gene_id "ENSG00000198727"; gene_name "MT-CYB"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 15888 15953 . + . gene_id "ENSG00000210195"; gene_name "MT-TT"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 15956 16023 . - . gene_id "ENSG00000210196"; gene_name "MT-TP"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana transcript 577 647 . + . gene_id "ENSG00000210049"; transcript_id "ENST00000387314"; gene_name "MT-TF-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 648 1601 . + . gene_id "ENSG00000211459"; transcript_id "ENST00000389680"; gene_name "MT-RNR1-201"; transcript_biotype "Mt_rRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 1602 1670 . + . 
gene_id "ENSG00000210077"; transcript_id "ENST00000387342"; gene_name "MT-TV-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 1671 3229 . + . gene_id "ENSG00000210082"; transcript_id "ENST00000387347"; gene_name "MT-RNR2-201"; transcript_biotype "Mt_rRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 3230 3304 . + . gene_id "ENSG00000209082"; transcript_id "ENST00000386347"; gene_name "MT-TL1-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 3307 4262 . + . gene_id "ENSG00000198888"; transcript_id "ENST00000361390"; gene_name "MT-ND1-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 4263 4331 . + . gene_id "ENSG00000210100"; transcript_id "ENST00000387365"; gene_name "MT-TI-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 4329 4400 . - . gene_id "ENSG00000210107"; transcript_id "ENST00000387372"; gene_name "MT-TQ-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 4402 4469 . + . gene_id "ENSG00000210112"; transcript_id "ENST00000387377"; gene_name "MT-TM-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 4470 5511 . + . gene_id "ENSG00000198763"; transcript_id "ENST00000361453"; gene_name "MT-ND2-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 5512 5579 . + . gene_id "ENSG00000210117"; transcript_id "ENST00000387382"; gene_name "MT-TW-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5587 5655 . - . gene_id "ENSG00000210127"; transcript_id "ENST00000387392"; gene_name "MT-TA-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5657 5729 . - . 
gene_id "ENSG00000210135"; transcript_id "ENST00000387400"; gene_name "MT-TN-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5761 5826 . - . gene_id "ENSG00000210140"; transcript_id "ENST00000387405"; gene_name "MT-TC-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5826 5891 . - . gene_id "ENSG00000210144"; transcript_id "ENST00000387409"; gene_name "MT-TY-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5904 7445 . + . gene_id "ENSG00000198804"; transcript_id "ENST00000361624"; gene_name "MT-CO1-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 7446 7514 . - . gene_id "ENSG00000210151"; transcript_id "ENST00000387416"; gene_name "MT-TS1-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 7518 7585 . + . gene_id "ENSG00000210154"; transcript_id "ENST00000387419"; gene_name "MT-TD-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 7586 8269 . + . gene_id "ENSG00000198712"; transcript_id "ENST00000361739"; gene_name "MT-CO2-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 8295 8364 . + . gene_id "ENSG00000210156"; transcript_id "ENST00000387421"; gene_name "MT-TK-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 8366 8572 . + . gene_id "ENSG00000228253"; transcript_id "ENST00000361851"; gene_name "MT-ATP8-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 8527 9207 . + . gene_id "ENSG00000198899"; transcript_id "ENST00000361899"; gene_name "MT-ATP6-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 9207 9990 . + . 
gene_id "ENSG00000198938"; transcript_id "ENST00000362079"; gene_name "MT-CO3-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 9991 10058 . + . gene_id "ENSG00000210164"; transcript_id "ENST00000387429"; gene_name "MT-TG-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 10059 10404 . + . gene_id "ENSG00000198840"; transcript_id "ENST00000361227"; gene_name "MT-ND3-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 10405 10469 . + . gene_id "ENSG00000210174"; transcript_id "ENST00000387439"; gene_name "MT-TR-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 10470 10766 . + . gene_id "ENSG00000212907"; transcript_id "ENST00000361335"; gene_name "MT-ND4L-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 10760 12137 . + . gene_id "ENSG00000198886"; transcript_id "ENST00000361381"; gene_name "MT-ND4-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 12138 12206 . + . gene_id "ENSG00000210176"; transcript_id "ENST00000387441"; gene_name "MT-TH-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 12207 12265 . + . gene_id "ENSG00000210184"; transcript_id "ENST00000387449"; gene_name "MT-TS2-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 12266 12336 . + . gene_id "ENSG00000210191"; transcript_id "ENST00000387456"; gene_name "MT-TL2-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 12337 14148 . + . gene_id "ENSG00000198786"; transcript_id "ENST00000361567"; gene_name "MT-ND5-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 14149 14673 . - . 
gene_id "ENSG00000198695"; transcript_id "ENST00000361681"; gene_name "MT-ND6-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 14674 14742 . - . gene_id "ENSG00000210194"; transcript_id "ENST00000387459"; gene_name "MT-TE-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 14747 15887 . + . gene_id "ENSG00000198727"; transcript_id "ENST00000361789"; gene_name "MT-CYB-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 15888 15953 . + . gene_id "ENSG00000210195"; transcript_id "ENST00000387460"; gene_name "MT-TT-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 15956 16023 . - . gene_id "ENSG00000210196"; transcript_id "ENST00000387461"; gene_name "MT-TP-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana exon 577 647 . + . gene_id "ENSG00000210049"; transcript_id "ENST00000387314"; exon_id "ENSE00001544501"; exon_number "1"; +MT ensembl_havana exon 648 1601 . + . gene_id "ENSG00000211459"; transcript_id "ENST00000389680"; exon_id "ENSE00001544499"; exon_number "1"; +MT ensembl_havana exon 1602 1670 . + . gene_id "ENSG00000210077"; transcript_id "ENST00000387342"; exon_id "ENSE00001544498"; exon_number "1"; +MT ensembl_havana exon 1671 3229 . + . gene_id "ENSG00000210082"; transcript_id "ENST00000387347"; exon_id "ENSE00001544497"; exon_number "1"; +MT ensembl_havana exon 3230 3304 . + . gene_id "ENSG00000209082"; transcript_id "ENST00000386347"; exon_id "ENSE00002006242"; exon_number "1"; +MT ensembl_havana exon 3307 4262 . + . gene_id "ENSG00000198888"; transcript_id "ENST00000361390"; exon_id "ENSE00001435714"; exon_number "1"; +MT ensembl_havana exon 4263 4331 . + . gene_id "ENSG00000210100"; transcript_id "ENST00000387365"; exon_id "ENSE00001993597"; exon_number "1"; +MT ensembl_havana exon 4329 4400 . - . 
gene_id "ENSG00000210107"; transcript_id "ENST00000387372"; exon_id "ENSE00001544494"; exon_number "1"; +MT ensembl_havana exon 4402 4469 . + . gene_id "ENSG00000210112"; transcript_id "ENST00000387377"; exon_id "ENSE00001544493"; exon_number "1"; +MT ensembl_havana exon 4470 5511 . + . gene_id "ENSG00000198763"; transcript_id "ENST00000361453"; exon_id "ENSE00001435686"; exon_number "1"; +MT ensembl_havana exon 5512 5579 . + . gene_id "ENSG00000210117"; transcript_id "ENST00000387382"; exon_id "ENSE00001544492"; exon_number "1"; +MT ensembl_havana exon 5587 5655 . - . gene_id "ENSG00000210127"; transcript_id "ENST00000387392"; exon_id "ENSE00001544491"; exon_number "1"; +MT ensembl_havana exon 5657 5729 . - . gene_id "ENSG00000210135"; transcript_id "ENST00000387400"; exon_id "ENSE00001544490"; exon_number "1"; +MT ensembl_havana exon 5761 5826 . - . gene_id "ENSG00000210140"; transcript_id "ENST00000387405"; exon_id "ENSE00001544489"; exon_number "1"; +MT ensembl_havana exon 5826 5891 . - . gene_id "ENSG00000210144"; transcript_id "ENST00000387409"; exon_id "ENSE00001544488"; exon_number "1"; +MT ensembl_havana exon 5904 7445 . + . gene_id "ENSG00000198804"; transcript_id "ENST00000361624"; exon_id "ENSE00001435647"; exon_number "1"; +MT ensembl_havana exon 7446 7514 . - . gene_id "ENSG00000210151"; transcript_id "ENST00000387416"; exon_id "ENSE00001544487"; exon_number "1"; +MT ensembl_havana exon 7518 7585 . + . gene_id "ENSG00000210154"; transcript_id "ENST00000387419"; exon_id "ENSE00001544486"; exon_number "1"; +MT ensembl_havana exon 7586 8269 . + . gene_id "ENSG00000198712"; transcript_id "ENST00000361739"; exon_id "ENSE00001435613"; exon_number "1"; +MT ensembl_havana exon 8295 8364 . + . gene_id "ENSG00000210156"; transcript_id "ENST00000387421"; exon_id "ENSE00001544484"; exon_number "1"; +MT ensembl_havana exon 8366 8572 . + . 
gene_id "ENSG00000228253"; transcript_id "ENST00000361851"; exon_id "ENSE00001435286"; exon_number "1"; +MT ensembl_havana exon 8527 9207 . + . gene_id "ENSG00000198899"; transcript_id "ENST00000361899"; exon_id "ENSE00001727012"; exon_number "1"; +MT ensembl_havana exon 9207 9990 . + . gene_id "ENSG00000198938"; transcript_id "ENST00000362079"; exon_id "ENSE00001608952"; exon_number "1"; +MT ensembl_havana exon 9991 10058 . + . gene_id "ENSG00000210164"; transcript_id "ENST00000387429"; exon_id "ENSE00001544483"; exon_number "1"; +MT ensembl_havana exon 10059 10404 . + . gene_id "ENSG00000198840"; transcript_id "ENST00000361227"; exon_id "ENSE00001435444"; exon_number "1"; +MT ensembl_havana exon 10405 10469 . + . gene_id "ENSG00000210174"; transcript_id "ENST00000387439"; exon_id "ENSE00001544482"; exon_number "1"; +MT ensembl_havana exon 10470 10766 . + . gene_id "ENSG00000212907"; transcript_id "ENST00000361335"; exon_id "ENSE00001596097"; exon_number "1"; +MT ensembl_havana exon 10760 12137 . + . gene_id "ENSG00000198886"; transcript_id "ENST00000361381"; exon_id "ENSE00001666004"; exon_number "1"; +MT ensembl_havana exon 12138 12206 . + . gene_id "ENSG00000210176"; transcript_id "ENST00000387441"; exon_id "ENSE00001544480"; exon_number "1"; +MT ensembl_havana exon 12207 12265 . + . gene_id "ENSG00000210184"; transcript_id "ENST00000387449"; exon_id "ENSE00001544479"; exon_number "1"; +MT ensembl_havana exon 12266 12336 . + . gene_id "ENSG00000210191"; transcript_id "ENST00000387456"; exon_id "ENSE00001544478"; exon_number "1"; +MT ensembl_havana exon 12337 14148 . + . gene_id "ENSG00000198786"; transcript_id "ENST00000361567"; exon_id "ENSE00001435330"; exon_number "1"; +MT ensembl_havana exon 14149 14673 . - . gene_id "ENSG00000198695"; transcript_id "ENST00000361681"; exon_id "ENSE00001434974"; exon_number "1"; +MT ensembl_havana exon 14674 14742 . - . 
gene_id "ENSG00000210194"; transcript_id "ENST00000387459"; exon_id "ENSE00001544476"; exon_number "1"; +MT ensembl_havana exon 14747 15887 . + . gene_id "ENSG00000198727"; transcript_id "ENST00000361789"; exon_id "ENSE00001436074"; exon_number "1"; +MT ensembl_havana exon 15888 15953 . + . gene_id "ENSG00000210195"; transcript_id "ENST00000387460"; exon_id "ENSE00001544475"; exon_number "1"; +MT ensembl_havana exon 15956 16023 . - . gene_id "ENSG00000210196"; transcript_id "ENST00000387461"; exon_id "ENSE00001544473"; exon_number "1"; +MT ensembl_havana CDS 3307 4262 . + 0 gene_id "ENSG00000198888"; transcript_id "ENST00000361390"; +MT ensembl_havana CDS 4470 5511 . + 0 gene_id "ENSG00000198763"; transcript_id "ENST00000361453"; +MT ensembl_havana CDS 5904 7445 . + 0 gene_id "ENSG00000198804"; transcript_id "ENST00000361624"; +MT ensembl_havana CDS 7586 8269 . + 0 gene_id "ENSG00000198712"; transcript_id "ENST00000361739"; +MT ensembl_havana CDS 8366 8572 . + 0 gene_id "ENSG00000228253"; transcript_id "ENST00000361851"; +MT ensembl_havana CDS 8527 9207 . + 0 gene_id "ENSG00000198899"; transcript_id "ENST00000361899"; +MT ensembl_havana CDS 9207 9990 . + 0 gene_id "ENSG00000198938"; transcript_id "ENST00000362079"; +MT ensembl_havana CDS 10059 10404 . + 0 gene_id "ENSG00000198840"; transcript_id "ENST00000361227"; +MT ensembl_havana CDS 10470 10766 . + 0 gene_id "ENSG00000212907"; transcript_id "ENST00000361335"; +MT ensembl_havana CDS 10760 12137 . + 0 gene_id "ENSG00000198886"; transcript_id "ENST00000361381"; +MT ensembl_havana CDS 12337 14148 . + 0 gene_id "ENSG00000198786"; transcript_id "ENST00000361567"; +MT ensembl_havana CDS 14149 14673 . - 0 gene_id "ENSG00000198695"; transcript_id "ENST00000361681"; +MT ensembl_havana CDS 14747 15887 . 
+ 0 gene_id "ENSG00000198727"; transcript_id "ENST00000361789";
diff --git a/testdata/test_genome/samplesheet.genome_test.csv b/testdata/test_genome/samplesheet.genome_test.csv
new file mode 100644
index 000000000..f5ec4e0ed
--- /dev/null
+++ b/testdata/test_genome/samplesheet.genome_test.csv
@@ -0,0 +1,3 @@
+sample,sample_id,fastq_1,fastq_2,method,principle,sample_group,condition,replicate,organism,pH,adapter_3p,adapter_5p,umi_pattern
+HEK293T_untreated_r1,GSM4333255,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/HEK293T_untreated_r1.fastq.gz,,SHAPE,RT-stop,HEK293T,untreated,1,Homo sapiens,7.5,,,
+HEK293T_treated_r1,GSM4333256,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/HEK293T_treated_r1.fastq.gz,,SHAPE,RT-stop,HEK293T,treated,1,Homo sapiens,7.5,,,
diff --git a/testdata/test_prokaryote/16S_rRNA.fa b/testdata/test_prokaryote/16S_rRNA.fa
new file mode 100644
index 000000000..f461f6435
--- /dev/null
+++ b/testdata/test_prokaryote/16S_rRNA.fa
@@ -0,0 +1,2 @@
+>16S_rRNA Escherichia coli 16S ribosomal RNA
+AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA diff --git a/testdata/test_prokaryote/16S_rRNA.gtf b/testdata/test_prokaryote/16S_rRNA.gtf new file mode 100644 index 000000000..ec9dae26b --- /dev/null +++ b/testdata/test_prokaryote/16S_rRNA.gtf @@ -0,0 +1,3 @@ +16S_rRNA testdata gene 1 1542 . + . gene_id "16S_rRNA"; gene_name "16S_rRNA"; gene_biotype "rRNA"; +16S_rRNA testdata transcript 1 1542 . + . 
gene_id "16S_rRNA"; transcript_id "16S_rRNA"; gene_name "16S_rRNA"; transcript_biotype "rRNA"; +16S_rRNA testdata exon 1 1542 . + . gene_id "16S_rRNA"; transcript_id "16S_rRNA"; exon_id "16S_rRNA.exon1"; exon_number "1"; diff --git a/testdata/test_prokaryote/16S_rRNA.reference.db b/testdata/test_prokaryote/16S_rRNA.reference.db new file mode 100644 index 000000000..1f7ada7c9 --- /dev/null +++ b/testdata/test_prokaryote/16S_rRNA.reference.db @@ -0,0 +1,3 @@ +>16S_rRNA +AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGT
TGGATCACCTCCTTA +........(((((...[[[.))))).((((.((((((.(((((((((....(((.(((..(((..((.((..((((((((((....))))))).)))))..)))))......(((......(((((((..((...(((((((.(.((.....((((((....))))))......)).).....(((....)))....((((((..((....)))))))).)))))))..)).)))))))(((....(((..((((((((.......)))))))))))......)))..((((((((....))))...))))))).(((((............))))).((((....))))...)))))).).....(.(((...(((((....))))).)))).)).))))))..((((......((((....)))).....)))).[.(.(((((...(....((((((((.......)))))))).....)....)))))...])..((((([[[...(((((.....((.]]])).......)))))))))).))))))))))..........((([[...(.((((...(((.(((((((.(((((((((((.....((((((.....))))))...)))))))))..)))))))))...((((((((...((((((((...((((((((...(((......)))......))))))))...).......((....)).)))))))..)))).))))...)))...))))....((((((...((...((((.........))))...))))))))..........((((((..((((((((((((.....))))))))))))...((..]])).....)))))))))).(((......((((....))))....)))...]]]..(((((.(((((((.((..((((((.((((((((((....((((........))))........(((((((......(((((((..(((((((....))))))).(.((....)).)))))).))..((.((((..((((((.((...(((((((((....)))..((((......))))..)))))).....((((.(((((((...((..(((.....)))))....)))))))..((.(((((.....))))).)).....))))....)).).)))...))))))))....)))))))...)).)))))))).)...(((((((.....(((..((...(((....)))...))....))).....)))))))......(....(((((((........)))))))....).....))))).....(((((((.........)))))))......))...)))))))))).))..(.(..((.(.((((.(((..((((((((((((....((((((.((((..((....)).))))))))))...))))))))))))..))).))))..).))...)..).((((((((((....))))))))))............. 
diff --git a/testdata/test_prokaryote/E_coli_DH5a_treated.16S_rRNA.fastq.gz b/testdata/test_prokaryote/E_coli_DH5a_treated.16S_rRNA.fastq.gz
new file mode 100644
index 000000000..c2c834420
Binary files /dev/null and b/testdata/test_prokaryote/E_coli_DH5a_treated.16S_rRNA.fastq.gz differ
diff --git a/testdata/test_prokaryote/samplesheet.prokaryote_test.csv b/testdata/test_prokaryote/samplesheet.prokaryote_test.csv
new file mode 100644
index 000000000..901224c89
--- /dev/null
+++ b/testdata/test_prokaryote/samplesheet.prokaryote_test.csv
@@ -0,0 +1,2 @@
+sample,sample_id,fastq_1,fastq_2,method,principle,sample_group,condition,replicate,organism,pH,adapter_3p,adapter_5p,umi_pattern
+E_coli_DH5a_treated,GSM7885844,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/test_prokaryote/E_coli_DH5a_treated.16S_rRNA.fastq.gz,,DMS,MaP,E_coli_DH5a,treated,1,Escherichia coli,,,,
diff --git a/testdata/test_transcriptome/Homo_sapiens.GRCh38.ENST00000389680.fa b/testdata/test_transcriptome/Homo_sapiens.GRCh38.ENST00000389680.fa
new file mode 100644
index 000000000..29edb6bab
--- /dev/null
+++ b/testdata/test_transcriptome/Homo_sapiens.GRCh38.ENST00000389680.fa
@@ -0,0 +1,17 @@
+>ENST00000389680.2
+AATAGGTTTGGTCCTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCC
+CGTTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAGGAACAAGCATCAAGCACGC
+AGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAA
+CCTTTAGCAATAAACGAAAGTTTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCG
+TGCCAGCCACCGCGGTCACACGATTAACCCAAGTCAATAGAAGCCGGCGTAAAGAGTGTT
+TTAGATCACCCCCTCCCCAATAAAGCTAAAACTCACCTGAGTTGTAAAAAACTCCAGTTG
+ACACAAAATAGACTACGAAAGTGGCTTTAACATATCTGAACACACAATAGCTAAGACCCA
+AACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTCAACAGTTAAATCAACAAAA
+CTGCTCGCCAGAACACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCAT
+ATCCCTCTAGAGGAGCCTGTTCTGTAATCGATAAACCCCGATCAACCTCACCACCTCTTG
+CTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGCTACAAAGTAAGCGCAAG
+TACCCACGTAAAGACGTTAGGTCAAGGTGTAGCCCATGAGGTGGCAAGAAATGGGCTACA
+TTTTCTACCCCAGAAAACTACGATAGCCCTTATGAAACTTAAGGGTCGAAGGTGGATTTA
+GCAGTAAACTAAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCC
+CGTCACCCTCCTCAAGTATACTTCAAAGGACATTTAACTAAAACCCCTACGCATTTATAT
+AGAGGAGACAAGTCGTAACATGGTAAGTGTACTGGAAAGTGCACTTGGACGAAC
diff --git a/testdata/test_transcriptome/Homo_sapiens.GRCh38.MT.gtf b/testdata/test_transcriptome/Homo_sapiens.GRCh38.MT.gtf
new file mode 100644
index 000000000..66618681c
--- /dev/null
+++ b/testdata/test_transcriptome/Homo_sapiens.GRCh38.MT.gtf
@@ -0,0 +1,127 @@
+##description: Homo sapiens GRCh38 mitochondrial chromosome annotations
+##provider: Ensembl REST API
+##sequence-region MT 1 16569
+MT ensembl_havana gene 577 647 . + . gene_id "ENSG00000210049"; gene_name "MT-TF"; gene_biotype "Mt_tRNA"; gene_source "insdc";
+MT ensembl_havana gene 648 1601 . + . gene_id "ENSG00000211459"; gene_name "MT-RNR1"; gene_biotype "Mt_rRNA"; gene_source "insdc";
+MT ensembl_havana gene 1602 1670 . + . gene_id "ENSG00000210077"; gene_name "MT-TV"; gene_biotype "Mt_tRNA"; gene_source "insdc";
+MT ensembl_havana gene 1671 3229 . + . gene_id "ENSG00000210082"; gene_name "MT-RNR2"; gene_biotype "Mt_rRNA"; gene_source "insdc";
+MT ensembl_havana gene 3230 3304 . + . gene_id "ENSG00000209082"; gene_name "MT-TL1"; gene_biotype "Mt_tRNA"; gene_source "insdc";
+MT ensembl_havana gene 3307 4262 . + . gene_id "ENSG00000198888"; gene_name "MT-ND1"; gene_biotype "protein_coding"; gene_source "insdc";
+MT ensembl_havana gene 4263 4331 . + . gene_id "ENSG00000210100"; gene_name "MT-TI"; gene_biotype "Mt_tRNA"; gene_source "insdc";
+MT ensembl_havana gene 4329 4400 . - . gene_id "ENSG00000210107"; gene_name "MT-TQ"; gene_biotype "Mt_tRNA"; gene_source "insdc";
+MT ensembl_havana gene 4402 4469 . + . gene_id "ENSG00000210112"; gene_name "MT-TM"; gene_biotype "Mt_tRNA"; gene_source "insdc";
+MT ensembl_havana gene 4470 5511 . + .
gene_id "ENSG00000198763"; gene_name "MT-ND2"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 5512 5579 . + . gene_id "ENSG00000210117"; gene_name "MT-TW"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5587 5655 . - . gene_id "ENSG00000210127"; gene_name "MT-TA"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5657 5729 . - . gene_id "ENSG00000210135"; gene_name "MT-TN"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5761 5826 . - . gene_id "ENSG00000210140"; gene_name "MT-TC"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5826 5891 . - . gene_id "ENSG00000210144"; gene_name "MT-TY"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 5904 7445 . + . gene_id "ENSG00000198804"; gene_name "MT-CO1"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 7446 7514 . - . gene_id "ENSG00000210151"; gene_name "MT-TS1"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 7518 7585 . + . gene_id "ENSG00000210154"; gene_name "MT-TD"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 7586 8269 . + . gene_id "ENSG00000198712"; gene_name "MT-CO2"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 8295 8364 . + . gene_id "ENSG00000210156"; gene_name "MT-TK"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 8366 8572 . + . gene_id "ENSG00000228253"; gene_name "MT-ATP8"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 8527 9207 . + . gene_id "ENSG00000198899"; gene_name "MT-ATP6"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 9207 9990 . + . gene_id "ENSG00000198938"; gene_name "MT-CO3"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 9991 10058 . + . 
gene_id "ENSG00000210164"; gene_name "MT-TG"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 10059 10404 . + . gene_id "ENSG00000198840"; gene_name "MT-ND3"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 10405 10469 . + . gene_id "ENSG00000210174"; gene_name "MT-TR"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 10470 10766 . + . gene_id "ENSG00000212907"; gene_name "MT-ND4L"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 10760 12137 . + . gene_id "ENSG00000198886"; gene_name "MT-ND4"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 12138 12206 . + . gene_id "ENSG00000210176"; gene_name "MT-TH"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 12207 12265 . + . gene_id "ENSG00000210184"; gene_name "MT-TS2"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 12266 12336 . + . gene_id "ENSG00000210191"; gene_name "MT-TL2"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 12337 14148 . + . gene_id "ENSG00000198786"; gene_name "MT-ND5"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 14149 14673 . - . gene_id "ENSG00000198695"; gene_name "MT-ND6"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 14674 14742 . - . gene_id "ENSG00000210194"; gene_name "MT-TE"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 14747 15887 . + . gene_id "ENSG00000198727"; gene_name "MT-CYB"; gene_biotype "protein_coding"; gene_source "insdc"; +MT ensembl_havana gene 15888 15953 . + . gene_id "ENSG00000210195"; gene_name "MT-TT"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana gene 15956 16023 . - . gene_id "ENSG00000210196"; gene_name "MT-TP"; gene_biotype "Mt_tRNA"; gene_source "insdc"; +MT ensembl_havana transcript 577 647 . + . 
gene_id "ENSG00000210049"; transcript_id "ENST00000387314"; gene_name "MT-TF-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 648 1601 . + . gene_id "ENSG00000211459"; transcript_id "ENST00000389680"; gene_name "MT-RNR1-201"; transcript_biotype "Mt_rRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 1602 1670 . + . gene_id "ENSG00000210077"; transcript_id "ENST00000387342"; gene_name "MT-TV-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 1671 3229 . + . gene_id "ENSG00000210082"; transcript_id "ENST00000387347"; gene_name "MT-RNR2-201"; transcript_biotype "Mt_rRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 3230 3304 . + . gene_id "ENSG00000209082"; transcript_id "ENST00000386347"; gene_name "MT-TL1-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 3307 4262 . + . gene_id "ENSG00000198888"; transcript_id "ENST00000361390"; gene_name "MT-ND1-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 4263 4331 . + . gene_id "ENSG00000210100"; transcript_id "ENST00000387365"; gene_name "MT-TI-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 4329 4400 . - . gene_id "ENSG00000210107"; transcript_id "ENST00000387372"; gene_name "MT-TQ-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 4402 4469 . + . gene_id "ENSG00000210112"; transcript_id "ENST00000387377"; gene_name "MT-TM-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 4470 5511 . + . gene_id "ENSG00000198763"; transcript_id "ENST00000361453"; gene_name "MT-ND2-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 5512 5579 . + . 
gene_id "ENSG00000210117"; transcript_id "ENST00000387382"; gene_name "MT-TW-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5587 5655 . - . gene_id "ENSG00000210127"; transcript_id "ENST00000387392"; gene_name "MT-TA-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5657 5729 . - . gene_id "ENSG00000210135"; transcript_id "ENST00000387400"; gene_name "MT-TN-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5761 5826 . - . gene_id "ENSG00000210140"; transcript_id "ENST00000387405"; gene_name "MT-TC-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5826 5891 . - . gene_id "ENSG00000210144"; transcript_id "ENST00000387409"; gene_name "MT-TY-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 5904 7445 . + . gene_id "ENSG00000198804"; transcript_id "ENST00000361624"; gene_name "MT-CO1-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 7446 7514 . - . gene_id "ENSG00000210151"; transcript_id "ENST00000387416"; gene_name "MT-TS1-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 7518 7585 . + . gene_id "ENSG00000210154"; transcript_id "ENST00000387419"; gene_name "MT-TD-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 7586 8269 . + . gene_id "ENSG00000198712"; transcript_id "ENST00000361739"; gene_name "MT-CO2-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 8295 8364 . + . gene_id "ENSG00000210156"; transcript_id "ENST00000387421"; gene_name "MT-TK-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 8366 8572 . + . 
gene_id "ENSG00000228253"; transcript_id "ENST00000361851"; gene_name "MT-ATP8-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 8527 9207 . + . gene_id "ENSG00000198899"; transcript_id "ENST00000361899"; gene_name "MT-ATP6-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 9207 9990 . + . gene_id "ENSG00000198938"; transcript_id "ENST00000362079"; gene_name "MT-CO3-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 9991 10058 . + . gene_id "ENSG00000210164"; transcript_id "ENST00000387429"; gene_name "MT-TG-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 10059 10404 . + . gene_id "ENSG00000198840"; transcript_id "ENST00000361227"; gene_name "MT-ND3-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 10405 10469 . + . gene_id "ENSG00000210174"; transcript_id "ENST00000387439"; gene_name "MT-TR-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 10470 10766 . + . gene_id "ENSG00000212907"; transcript_id "ENST00000361335"; gene_name "MT-ND4L-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 10760 12137 . + . gene_id "ENSG00000198886"; transcript_id "ENST00000361381"; gene_name "MT-ND4-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 12138 12206 . + . gene_id "ENSG00000210176"; transcript_id "ENST00000387441"; gene_name "MT-TH-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 12207 12265 . + . gene_id "ENSG00000210184"; transcript_id "ENST00000387449"; gene_name "MT-TS2-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 12266 12336 . + . 
gene_id "ENSG00000210191"; transcript_id "ENST00000387456"; gene_name "MT-TL2-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 12337 14148 . + . gene_id "ENSG00000198786"; transcript_id "ENST00000361567"; gene_name "MT-ND5-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 14149 14673 . - . gene_id "ENSG00000198695"; transcript_id "ENST00000361681"; gene_name "MT-ND6-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 14674 14742 . - . gene_id "ENSG00000210194"; transcript_id "ENST00000387459"; gene_name "MT-TE-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 14747 15887 . + . gene_id "ENSG00000198727"; transcript_id "ENST00000361789"; gene_name "MT-CYB-201"; transcript_biotype "protein_coding"; transcript_source "insdc"; +MT ensembl_havana transcript 15888 15953 . + . gene_id "ENSG00000210195"; transcript_id "ENST00000387460"; gene_name "MT-TT-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana transcript 15956 16023 . - . gene_id "ENSG00000210196"; transcript_id "ENST00000387461"; gene_name "MT-TP-201"; transcript_biotype "Mt_tRNA"; transcript_source "insdc"; +MT ensembl_havana exon 577 647 . + . gene_id "ENSG00000210049"; transcript_id "ENST00000387314"; exon_id "ENSE00001544501"; exon_number "1"; +MT ensembl_havana exon 648 1601 . + . gene_id "ENSG00000211459"; transcript_id "ENST00000389680"; exon_id "ENSE00001544499"; exon_number "1"; +MT ensembl_havana exon 1602 1670 . + . gene_id "ENSG00000210077"; transcript_id "ENST00000387342"; exon_id "ENSE00001544498"; exon_number "1"; +MT ensembl_havana exon 1671 3229 . + . gene_id "ENSG00000210082"; transcript_id "ENST00000387347"; exon_id "ENSE00001544497"; exon_number "1"; +MT ensembl_havana exon 3230 3304 . + . 
gene_id "ENSG00000209082"; transcript_id "ENST00000386347"; exon_id "ENSE00002006242"; exon_number "1";
+MT ensembl_havana exon 3307 4262 . + . gene_id "ENSG00000198888"; transcript_id "ENST00000361390"; exon_id "ENSE00001435714"; exon_number "1";
+MT ensembl_havana exon 4263 4331 . + . gene_id "ENSG00000210100"; transcript_id "ENST00000387365"; exon_id "ENSE00001993597"; exon_number "1";
+MT ensembl_havana exon 4329 4400 . - . gene_id "ENSG00000210107"; transcript_id "ENST00000387372"; exon_id "ENSE00001544494"; exon_number "1";
+MT ensembl_havana exon 4402 4469 . + . gene_id "ENSG00000210112"; transcript_id "ENST00000387377"; exon_id "ENSE00001544493"; exon_number "1";
+MT ensembl_havana exon 4470 5511 . + . gene_id "ENSG00000198763"; transcript_id "ENST00000361453"; exon_id "ENSE00001435686"; exon_number "1";
+MT ensembl_havana exon 5512 5579 . + . gene_id "ENSG00000210117"; transcript_id "ENST00000387382"; exon_id "ENSE00001544492"; exon_number "1";
+MT ensembl_havana exon 5587 5655 . - . gene_id "ENSG00000210127"; transcript_id "ENST00000387392"; exon_id "ENSE00001544491"; exon_number "1";
+MT ensembl_havana exon 5657 5729 . - . gene_id "ENSG00000210135"; transcript_id "ENST00000387400"; exon_id "ENSE00001544490"; exon_number "1";
+MT ensembl_havana exon 5761 5826 . - . gene_id "ENSG00000210140"; transcript_id "ENST00000387405"; exon_id "ENSE00001544489"; exon_number "1";
+MT ensembl_havana exon 5826 5891 . - . gene_id "ENSG00000210144"; transcript_id "ENST00000387409"; exon_id "ENSE00001544488"; exon_number "1";
+MT ensembl_havana exon 5904 7445 . + . gene_id "ENSG00000198804"; transcript_id "ENST00000361624"; exon_id "ENSE00001435647"; exon_number "1";
+MT ensembl_havana exon 7446 7514 . - . gene_id "ENSG00000210151"; transcript_id "ENST00000387416"; exon_id "ENSE00001544487"; exon_number "1";
+MT ensembl_havana exon 7518 7585 . + . gene_id "ENSG00000210154"; transcript_id "ENST00000387419"; exon_id "ENSE00001544486"; exon_number "1";
+MT ensembl_havana exon 7586 8269 . + . gene_id "ENSG00000198712"; transcript_id "ENST00000361739"; exon_id "ENSE00001435613"; exon_number "1";
+MT ensembl_havana exon 8295 8364 . + . gene_id "ENSG00000210156"; transcript_id "ENST00000387421"; exon_id "ENSE00001544484"; exon_number "1";
+MT ensembl_havana exon 8366 8572 . + . gene_id "ENSG00000228253"; transcript_id "ENST00000361851"; exon_id "ENSE00001435286"; exon_number "1";
+MT ensembl_havana exon 8527 9207 . + . gene_id "ENSG00000198899"; transcript_id "ENST00000361899"; exon_id "ENSE00001727012"; exon_number "1";
+MT ensembl_havana exon 9207 9990 . + . gene_id "ENSG00000198938"; transcript_id "ENST00000362079"; exon_id "ENSE00001608952"; exon_number "1";
+MT ensembl_havana exon 9991 10058 . + . gene_id "ENSG00000210164"; transcript_id "ENST00000387429"; exon_id "ENSE00001544483"; exon_number "1";
+MT ensembl_havana exon 10059 10404 . + . gene_id "ENSG00000198840"; transcript_id "ENST00000361227"; exon_id "ENSE00001435444"; exon_number "1";
+MT ensembl_havana exon 10405 10469 . + . gene_id "ENSG00000210174"; transcript_id "ENST00000387439"; exon_id "ENSE00001544482"; exon_number "1";
+MT ensembl_havana exon 10470 10766 . + . gene_id "ENSG00000212907"; transcript_id "ENST00000361335"; exon_id "ENSE00001596097"; exon_number "1";
+MT ensembl_havana exon 10760 12137 . + . gene_id "ENSG00000198886"; transcript_id "ENST00000361381"; exon_id "ENSE00001666004"; exon_number "1";
+MT ensembl_havana exon 12138 12206 . + . gene_id "ENSG00000210176"; transcript_id "ENST00000387441"; exon_id "ENSE00001544480"; exon_number "1";
+MT ensembl_havana exon 12207 12265 . + . gene_id "ENSG00000210184"; transcript_id "ENST00000387449"; exon_id "ENSE00001544479"; exon_number "1";
+MT ensembl_havana exon 12266 12336 . + . gene_id "ENSG00000210191"; transcript_id "ENST00000387456"; exon_id "ENSE00001544478"; exon_number "1";
+MT ensembl_havana exon 12337 14148 . + . gene_id "ENSG00000198786"; transcript_id "ENST00000361567"; exon_id "ENSE00001435330"; exon_number "1";
+MT ensembl_havana exon 14149 14673 . - . gene_id "ENSG00000198695"; transcript_id "ENST00000361681"; exon_id "ENSE00001434974"; exon_number "1";
+MT ensembl_havana exon 14674 14742 . - . gene_id "ENSG00000210194"; transcript_id "ENST00000387459"; exon_id "ENSE00001544476"; exon_number "1";
+MT ensembl_havana exon 14747 15887 . + . gene_id "ENSG00000198727"; transcript_id "ENST00000361789"; exon_id "ENSE00001436074"; exon_number "1";
+MT ensembl_havana exon 15888 15953 . + . gene_id "ENSG00000210195"; transcript_id "ENST00000387460"; exon_id "ENSE00001544475"; exon_number "1";
+MT ensembl_havana exon 15956 16023 . - . gene_id "ENSG00000210196"; transcript_id "ENST00000387461"; exon_id "ENSE00001544473"; exon_number "1";
+MT ensembl_havana CDS 3307 4262 . + 0 gene_id "ENSG00000198888"; transcript_id "ENST00000361390";
+MT ensembl_havana CDS 4470 5511 . + 0 gene_id "ENSG00000198763"; transcript_id "ENST00000361453";
+MT ensembl_havana CDS 5904 7445 . + 0 gene_id "ENSG00000198804"; transcript_id "ENST00000361624";
+MT ensembl_havana CDS 7586 8269 . + 0 gene_id "ENSG00000198712"; transcript_id "ENST00000361739";
+MT ensembl_havana CDS 8366 8572 . + 0 gene_id "ENSG00000228253"; transcript_id "ENST00000361851";
+MT ensembl_havana CDS 8527 9207 . + 0 gene_id "ENSG00000198899"; transcript_id "ENST00000361899";
+MT ensembl_havana CDS 9207 9990 . + 0 gene_id "ENSG00000198938"; transcript_id "ENST00000362079";
+MT ensembl_havana CDS 10059 10404 . + 0 gene_id "ENSG00000198840"; transcript_id "ENST00000361227";
+MT ensembl_havana CDS 10470 10766 . + 0 gene_id "ENSG00000212907"; transcript_id "ENST00000361335";
+MT ensembl_havana CDS 10760 12137 . + 0 gene_id "ENSG00000198886"; transcript_id "ENST00000361381";
+MT ensembl_havana CDS 12337 14148 . + 0 gene_id "ENSG00000198786"; transcript_id "ENST00000361567";
+MT ensembl_havana CDS 14149 14673 . - 0 gene_id "ENSG00000198695"; transcript_id "ENST00000361681";
+MT ensembl_havana CDS 14747 15887 . + 0 gene_id "ENSG00000198727"; transcript_id "ENST00000361789";
diff --git a/testdata/test_transcriptome/samplesheet.transcriptome_test.csv b/testdata/test_transcriptome/samplesheet.transcriptome_test.csv
new file mode 100644
index 000000000..f5ec4e0ed
--- /dev/null
+++ b/testdata/test_transcriptome/samplesheet.transcriptome_test.csv
@@ -0,0 +1,3 @@
+sample,sample_id,fastq_1,fastq_2,method,principle,sample_group,condition,replicate,organism,pH,adapter_3p,adapter_5p,umi_pattern
+HEK293T_untreated_r1,GSM4333255,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/HEK293T_untreated_r1.fastq.gz,,SHAPE,RT-stop,HEK293T,untreated,1,Homo sapiens,7.5,,,
+HEK293T_treated_r1,GSM4333256,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/HEK293T_treated_r1.fastq.gz,,SHAPE,RT-stop,HEK293T,treated,1,Homo sapiens,7.5,,,