Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.DS_Store
38 changes: 38 additions & 0 deletions README.md
@@ -32,6 +32,44 @@ git remote set-branches --add origin [remote-branch]
git fetch
```

## rnastructurome pipeline test data

Test data for the [nf-core/rnastructurome](https://github.com/nf-core/rnastructurome) pipeline lives on the `rnastructurome` branch under `testdata/`.

There are three independent test datasets, each exercising a different route through the pipeline:

- **`test_genome`**: tests the genome-alignment route. Run with `-profile test,docker`.
- **`test_transcriptome`**: tests the transcriptome-alignment route, including the `structextract` module. Run with `-profile test_transcriptome,docker`.
- **`test_prokaryote`**: tests the `jackknife` and `eval` modules using prokaryote 16S rRNA data and a known reference structure for that rRNA. Run with `-profile test_prokaryote,conda`.

**Human (HEK293T) data**: the `test_genome` and `test_transcriptome` datasets are both derived from a real HEK293T dataset (accessions listed below). Reads aligning to the mitochondrial gene RNR1 (`ENST00000389680.2`) were extracted from the full dataset to create this small test dataset.
- For the genome route, the full GRCh38 FASTA and GTF were each subsetted to just the MT chromosome.
- For the transcriptome route, the RNR1 transcript was extracted from the full GRCh38 transcriptome FASTA, and the same MT GTF was reused.
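
The subsetting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not necessarily the procedure used to build these files: the function names `read_fasta`, `subset_fasta` and `subset_gtf` are hypothetical, and the toy records stand in for the full GRCh38 FASTA and GTF.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(handle, wanted_id, width=60):
    """Return FASTA text for the single record whose ID is wanted_id.

    The ID is taken to be the first whitespace-delimited word of the header.
    """
    for header, seq in read_fasta(handle):
        record_id = header.split()[0] if header.strip() else ""
        if record_id == wanted_id:
            wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
            return ">" + header + "\n" + "\n".join(wrapped) + "\n"
    raise KeyError(f"no record with ID {wanted_id!r}")


def subset_gtf(text, seqname):
    """Keep only the GTF feature lines whose first column equals seqname."""
    kept = [
        line for line in text.splitlines()
        if line and not line.startswith("#")
        and line.split("\t", 1)[0] == seqname
    ]
    return "".join(line + "\n" for line in kept)


# Toy input standing in for a multi-chromosome genome FASTA.
toy_fasta = ">1 chromosome one\nACGT\nACGT\n>MT mitochondrion\nGATCAC\nAGGT\n"
print(subset_fasta(io.StringIO(toy_fasta), "MT"), end="")
```

The same `subset_fasta` call with a transcript ID such as `ENST00000389680.2` would cover the transcriptome route, assuming the transcript ID is the first word of the FASTA header.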

**Prokaryote (E. coli) data**: reads aligning to 16S rRNA were extracted from a real *E. coli* dataset (accession listed below) to create the tiny prokaryote dataset. The reference structure for 16S rRNA was supplied by Danny Incarnato and is used by both the `jackknife` and `eval` modules. The FASTA and GTF files were both created from the 16S rRNA reference structure file.
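
As a rough sketch of how a FASTA and a single-transcript GTF could be derived from a three-line reference structure record (header, sequence, dot-bracket string), consider the following. It is an assumption about the general approach, not a record of how the files in this PR were made: `parse_db`, `to_fasta` and `to_gtf` are hypothetical helpers, and the `testdata` source label and `rRNA` biotype defaults simply mirror the GTF added here.

```python
def parse_db(text):
    """Split a three-line record into (name, sequence, structure)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 3 or not lines[0].startswith(">") or not lines[0][1:].strip():
        raise ValueError("expected a header, a sequence and a structure line")
    name = lines[0][1:].split()[0]
    seq, structure = lines[1], lines[2]
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    return name, seq, structure


def to_fasta(name, seq, description=""):
    """Format one FASTA record with the sequence on a single line."""
    header = f">{name} {description}".rstrip()
    return f"{header}\n{seq}\n"


def to_gtf(name, length, source="testdata", biotype="rRNA"):
    """Build gene, transcript and exon rows spanning the whole sequence."""
    def row(feature, attrs):
        attr_text = " ".join(f'{key} "{value}";' for key, value in attrs)
        fields = [name, source, feature, "1", str(length), ".", "+", ".", attr_text]
        return "\t".join(fields)

    rows = [
        row("gene", [("gene_id", name), ("gene_name", name),
                     ("gene_biotype", biotype)]),
        row("transcript", [("gene_id", name), ("transcript_id", name),
                           ("gene_name", name), ("transcript_biotype", biotype)]),
        row("exon", [("gene_id", name), ("transcript_id", name),
                     ("exon_id", f"{name}.exon1"), ("exon_number", "1")]),
    ]
    return "\n".join(rows) + "\n"


# Toy record; a real record would be read from the reference structure file.
name, seq, structure = parse_db(">toy\nGGGAAACCC\n(((...)))\n")
print(to_fasta(name, seq), end="")
print(to_gtf(name, len(seq)), end="")
```

Deriving all three files from one record keeps the sequence ID and the coordinates consistent across them, which matters because the GTF end coordinate must equal the sequence length.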

### Directory structure

```
testdata/
├── HEK293T_untreated_r1.fastq.gz                  # shared SHAPE-seq reads (untreated, GSM4333255)
├── HEK293T_treated_r1.fastq.gz                    # shared SHAPE-seq reads (treated, GSM4333256)
├── test_genome/
│   ├── Homo_sapiens.GRCh38.MT.fa                  # human mitochondrial chromosome
│   ├── Homo_sapiens.GRCh38.MT.gtf                 # MT genome annotation
│   └── samplesheet.genome_test.csv
├── test_transcriptome/
│   ├── Homo_sapiens.GRCh38.ENST00000389680.fa     # single-transcript FASTA (MT-RNR1)
│   ├── Homo_sapiens.GRCh38.MT.gtf                 # MT genome annotation
│   └── samplesheet.transcriptome_test.csv
└── test_prokaryote/
    ├── 16S_rRNA.fa                                # E. coli 16S rRNA FASTA
    ├── 16S_rRNA.gtf                               # matching single-transcript GTF
    ├── 16S_rRNA.reference.db                      # dot-bracket reference structure for rf-eval/rf-jackknife
    ├── E_coli_DH5a_treated.16S_rRNA.fastq.gz      # treated, GSM7885842
    └── samplesheet.prokaryote_test.csv
```

## Support

For further information or help, don't hesitate to get in touch on our [Slack organisation](https://nf-co.re/join/slack) (a tool for instant messaging).
Binary file added testdata/HEK293T_treated_r1.fastq.gz
Binary file not shown.
Binary file added testdata/HEK293T_untreated_r1.fastq.gz
Binary file not shown.
278 changes: 278 additions & 0 deletions testdata/test_genome/Homo_sapiens.GRCh38.MT.fa

Large diffs are not rendered by default.

127 changes: 127 additions & 0 deletions testdata/test_genome/Homo_sapiens.GRCh38.MT.gtf

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions testdata/test_genome/samplesheet.genome_test.csv
@@ -0,0 +1,3 @@
sample,sample_id,fastq_1,fastq_2,method,principle,sample_group,condition,replicate,organism,pH,adapter_3p,adapter_5p,umi_pattern
HEK293T_untreated_r1,GSM4333255,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/HEK293T_untreated_r1.fastq.gz,,SHAPE,RT-stop,HEK293T,untreated,1,Homo sapiens,7.5,,,
HEK293T_treated_r1,GSM4333256,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/HEK293T_treated_r1.fastq.gz,,SHAPE,RT-stop,HEK293T,treated,1,Homo sapiens,7.5,,,
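
A samplesheet with this header can be sanity-checked before a run with a short script. The sketch below is a hypothetical helper, not part of the pipeline: the column list is copied from the header above, but the choice of which columns must be non-empty (`REQUIRED`) and the accepted FASTQ suffixes are assumptions, and the `example.org` URL is a placeholder.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "sample", "sample_id", "fastq_1", "fastq_2", "method", "principle",
    "sample_group", "condition", "replicate", "organism", "pH",
    "adapter_3p", "adapter_5p", "umi_pattern",
]
# Assumption: these columns may not be empty; the pipeline schema may differ.
REQUIRED = ("sample", "sample_id", "fastq_1", "method", "condition", "replicate")


def check_samplesheet(text):
    """Return a list of human-readable problems (empty if none were found)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: '{column}' is empty")
        fastq = (row["fastq_1"] or "").strip()
        if fastq and not fastq.endswith((".fastq.gz", ".fq.gz")):
            problems.append(f"line {line_no}: fastq_1 is not a gzipped FASTQ")
        replicate = (row["replicate"] or "").strip()
        if replicate and not replicate.isdigit():
            problems.append(f"line {line_no}: replicate is not an integer")
    return problems


header = ",".join(EXPECTED_COLUMNS)
good = header + "\n" + (
    "HEK293T_untreated_r1,GSM4333255,"
    "https://example.org/HEK293T_untreated_r1.fastq.gz,,"
    "SHAPE,RT-stop,HEK293T,untreated,1,Homo sapiens,7.5,,,\n"
)
print(check_samplesheet(good))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a samplesheet in a single pass.
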
2 changes: 2 additions & 0 deletions testdata/test_prokaryote/16S_rRNA.fa
@@ -0,0 +1,2 @@
>16S_rRNA Escherichia coli 16S ribosomal RNA
AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
3 changes: 3 additions & 0 deletions testdata/test_prokaryote/16S_rRNA.gtf
@@ -0,0 +1,3 @@
16S_rRNA testdata gene 1 1542 . + . gene_id "16S_rRNA"; gene_name "16S_rRNA"; gene_biotype "rRNA";
16S_rRNA testdata transcript 1 1542 . + . gene_id "16S_rRNA"; transcript_id "16S_rRNA"; gene_name "16S_rRNA"; transcript_biotype "rRNA";
16S_rRNA testdata exon 1 1542 . + . gene_id "16S_rRNA"; transcript_id "16S_rRNA"; exon_id "16S_rRNA.exon1"; exon_number "1";
3 changes: 3 additions & 0 deletions testdata/test_prokaryote/16S_rRNA.reference.db
@@ -0,0 +1,3 @@
>16S_rRNA
AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
........(((((...[[[.))))).((((.((((((.(((((((((....(((.(((..(((..((.((..((((((((((....))))))).)))))..)))))......(((......(((((((..((...(((((((.(.((.....((((((....))))))......)).).....(((....)))....((((((..((....)))))))).)))))))..)).)))))))(((....(((..((((((((.......)))))))))))......)))..((((((((....))))...))))))).(((((............))))).((((....))))...)))))).).....(.(((...(((((....))))).)))).)).))))))..((((......((((....)))).....)))).[.(.(((((...(....((((((((.......)))))))).....)....)))))...])..((((([[[...(((((.....((.]]])).......)))))))))).))))))))))..........((([[...(.((((...(((.(((((((.(((((((((((.....((((((.....))))))...)))))))))..)))))))))...((((((((...((((((((...((((((((...(((......)))......))))))))...).......((....)).)))))))..)))).))))...)))...))))....((((((...((...((((.........))))...))))))))..........((((((..((((((((((((.....))))))))))))...((..]])).....)))))))))).(((......((((....))))....)))...]]]..(((((.(((((((.((..((((((.((((((((((....((((........))))........(((((((......(((((((..(((((((....))))))).(.((....)).)))))).))..((.((((..((((((.((...(((((((((....)))..((((......))))..)))))).....((((.(((((((...((..(((.....)))))....)))))))..((.(((((.....))))).)).....))))....)).).)))...))))))))....)))))))...)).)))))))).)...(((((((.....(((..((...(((....)))...))....))).....)))))))......(....(((((((........)))))))....).....))))).....(((((((.........)))))))......))...)))))))))).))..(.(..((.(.((((.(((..((((((((((((....((((((.((((..((....)).))))))))))...))))))))))))..))).))))..).))...)..).((((((((((....)))))))))).............
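
The structure line above uses square brackets alongside parentheses, which in dot-bracket notation conventionally marks pseudoknotted base pairs. A minimal sketch of a consistency check for such a record follows; `pair_table` and `check_record` are hypothetical names, and the set of accepted bracket types is an assumption rather than a statement about what `rf-eval` or `rf-jackknife` accept.

```python
# Closing bracket -> opening bracket. Each bracket type is matched on its own
# stack, so pairs of different types may cross (pseudoknots).
CLOSERS = {")": "(", "]": "[", "}": "{", ">": "<"}
OPENERS = set(CLOSERS.values())


def pair_table(structure):
    """Map each paired 0-based position to its partner position."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = {}
    for i, char in enumerate(structure):
        if char in OPENERS:
            stacks[char].append(i)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {i + 1}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {i + 1}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1] + 1}")
    return pairs


def check_record(sequence, structure):
    """Validate a sequence/structure pair and return the number of base pairs."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    return len(pair_table(structure)) // 2


# Toy hairpin with one crossing (pseudoknotted) pair.
print(check_record("GGGAAACCCU", "((([..)))]"))
```

Running `check_record` on the sequence and structure lines of a reference file would flag unbalanced brackets or a length mismatch before the file reaches the pipeline.
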
Binary file not shown.
2 changes: 2 additions & 0 deletions testdata/test_prokaryote/samplesheet.prokaryote_test.csv
@@ -0,0 +1,2 @@
sample,sample_id,fastq_1,fastq_2,method,principle,sample_group,condition,replicate,organism,pH,adapter_3p,adapter_5p,umi_pattern
E_coli_DH5a_treated,GSM7885844,https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/rnastructurome/testdata/test_prokaryote/E_coli_DH5a_treated.16S_rRNA.fastq.gz,,DMS,MaP,E_coli_DH5a,treated,1,Escherichia coli,,,,
17 changes: 17 additions & 0 deletions testdata/test_transcriptome/Homo_sapiens.GRCh38.ENST00000389680.fa
@@ -0,0 +1,17 @@
>ENST00000389680.2
AATAGGTTTGGTCCTAGCCTTTCTATTAGCTCTTAGTAAGATTACACATGCAAGCATCCC
CGTTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAGGAACAAGCATCAAGCACGC
AGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAA
CCTTTAGCAATAAACGAAAGTTTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCG
TGCCAGCCACCGCGGTCACACGATTAACCCAAGTCAATAGAAGCCGGCGTAAAGAGTGTT
TTAGATCACCCCCTCCCCAATAAAGCTAAAACTCACCTGAGTTGTAAAAAACTCCAGTTG
ACACAAAATAGACTACGAAAGTGGCTTTAACATATCTGAACACACAATAGCTAAGACCCA
AACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTCAACAGTTAAATCAACAAAA
CTGCTCGCCAGAACACTACGAGCCACAGCTTAAAACTCAAAGGACCTGGCGGTGCTTCAT
ATCCCTCTAGAGGAGCCTGTTCTGTAATCGATAAACCCCGATCAACCTCACCACCTCTTG
CTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGCTACAAAGTAAGCGCAAG
TACCCACGTAAAGACGTTAGGTCAAGGTGTAGCCCATGAGGTGGCAAGAAATGGGCTACA
TTTTCTACCCCAGAAAACTACGATAGCCCTTATGAAACTTAAGGGTCGAAGGTGGATTTA
GCAGTAAACTAAGAGTAGAGTGCTTAGTTGAACAGGGCCCTGAAGCGCGTACACACCGCC
CGTCACCCTCCTCAAGTATACTTCAAAGGACATTTAACTAAAACCCCTACGCATTTATAT
AGAGGAGACAAGTCGTAACATGGTAAGTGTACTGGAAAGTGCACTTGGACGAAC