04-04-Pipeline-fastqc.Rmd: 8 additions & 4 deletions
@@ -76,8 +76,11 @@ What happens to the adapter content and sequence length distribution ?

 #### {-}

-Here's our [example RNAseq run](https://laxy.io/#/job/1qUtszdMrQnXC4tawunu92/?access_token=7ad22813-fa68-4848-b93e-de30d53a35b3) without a trimming step. For this dataset,
-important QC metrics like the "% Aligned" to the reference genome are very close, but not identical between trimmed and untrimmed reads.
+Here's the MultiQC report for the example dataset with
+<a href="files/multiqc/SRP062287/multiqc_report.html" target="_blank">trimming</a> and
+<a href="files/multiqc/SRP062287_untrimmed/multiqc_report.html" target="_blank">excluding the trimming step</a>
+(and <a href="https://laxy.io/#/job/1qUtszdMrQnXC4tawunu92/?access_token=7ad22813-fa68-4848-b93e-de30d53a35b3" target="_blank">the full untrimmed run on laxy.io</a>).
+For this dataset, important QC metrics like the "% Aligned" to the reference genome are very close (within ~1%), but not identical between trimmed and untrimmed reads.

 > Trimming _is_ beneficial if you are _de novo_ assembling RNA-seq reads (eg, for a non-model organism), or using RNA-seq for variant calling.

@@ -86,7 +89,7 @@ important QC metrics like the "% Aligned" to the reference genome are very close

 FastQC reports "Per base sequence content".

-{width="100%"}
+{width="100%" fig-alt="FastQC 'Per base sequence content' plot showing nucleotide composition bias in the first ~12 bases"}

 It is quite common to see some sequence bias at the 5' end of reads in RNA-seq libraries.

@@ -111,7 +114,8 @@ Ideally the %GC distribution of reads will match the theoretical distribution -

 FastQC shows plots summarizing exact duplicate reads, and overrepresented sequences.
 (or <a href="https://laxy.io/#/job/20nDhlhpb8xv53x3R2CACg/?access_token=29ddd348-0a55-4596-b6f9-060e5186236f" target="_blank">on laxy.io</a>) of attempting to use
 the mouse genome with our human cell line example dataset.
-- M3 / Monarch HPC clusters (eg using [https://nf-co.re/rnaseq](nf-co.re/rnaseq) )
+- M3 HPC cluster (eg using [https://nf-co.re/rnaseq](nf-co.re/rnaseq) )
 - NeCTAR virtual machine (~32 - 64Gb RAM)
 - Your laptop .. _while not recommended, you may be able to do this for smaller genomes (bacteria, yeast), and larger genomes if using the Salmon pseudo-mapping protocol_