|
14 | 14 | ---------- |
15 | 15 |
|
16 | 16 | ## Final take-aways from the workshop |
| 17 | +- Spend the time (and money) to plan, consult, and practice in order to generate high-quality data sets |
| 18 | + |
| 19 | +- If you are going to do a lot of bioinformatics, you should get really good at the command line (Bash); otherwise, pre-processing will be slow and painful |
| 20 | + |
| 21 | +- Identify, understand, and check key QC metrics to ensure the quality of your results |
| 22 | + |
17 | 23 | - Spend the time to plan your experiment well (enough replicates, appropriate sequencing depth, etc.). No analysis can rescue a bad dataset. |
18 | 24 |
|
19 | 25 | - If you will perform differential expression analysis regularly, you should build your experience in R, as well as your knowledge of fundamental statistical concepts |
|
29 | 35 |
|
30 | 36 | - Edit/annotate the code, run sub-sections, and read the `help` pages for important functions |
31 | 37 |
|
| 38 | +- Practice with the complete dataset (all chromosomes), which is available to you for approx. 1 month on Discovery in /scratch/rnaseq1/data/. This will give you experience running data from the entire genome, and an appreciation for the computational resources and time required to complete these tasks. |
| 39 | + |
32 | 40 | - Read the methods sections of published papers that perform differential expression analyses. This will help you to understand how many of the concepts we have discussed are applied and reported in practice |
33 | 41 |
|
34 | 42 | - Read reviews like [this one](https://pubmed.ncbi.nlm.nih.gov/31341269/) from Stark *et al*, 2019, *Nat. Rev. Genetics*, `RNA Sequencing: The Teenage Years`. |
|
37 | 45 |
|
38 | 46 | ---------- |
39 | 47 |
|
| 48 | +## High performance computing (HPC) |
| 49 | + |
| 50 | + |
| 51 | +During the workshop, we all logged onto Discovery and ran our workflow interactively. Generally, this type of 'interactive' work is not encouraged on Discovery, and is better performed using other servers such as Andes & Polaris (see article [here](https://rc.dartmouth.edu/index.php/hrf_faq/how-do-i-run-interactive-jobs/) on this topic from research computing). |
| 52 | + |
| 53 | +However, working interactively with genomics data can be quite slow since many operations will need to be run in sequence across multiple samples. As an alternative, you can use the scheduler system on Discovery, which controls how jobs are submitted and managed on the cluster. |
| 54 | + |
| 55 | +Using the scheduler will allow you to make use of specific resources (such as high-memory nodes) and streamline your workflows. Dartmouth just transitioned to using [Slurm](https://services.dartmouth.edu/TDClient/1806/Portal/KB/ArticleDet?ID=132625). |
| 56 | + |
| 57 | +We encourage you to get comfortable using the scheduler and submitting jobs using an HPC system. [Research Computing](https://rc.dartmouth.edu/) has a lot of great material regarding using Discovery on their website. |
| 58 | + |
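As a rough illustration of what scheduler-based work can look like, the sketch below writes one Slurm batch script per sample. It is a minimal example under stated assumptions, not the workshop's own workflow: the sample names (`sampleA`, `sampleB`), the resource requests, and the `make_job_script` helper are all hypothetical, and the `echo` line stands in for whatever pre-processing command you would actually run. Check the Research Computing documentation for the settings appropriate to Discovery.

```shell
#!/usr/bin/env bash
# Sketch: print a Slurm batch script for one sample.
# All resource values and sample names below are illustrative assumptions.
make_job_script() {
    local sample="$1"
    cat <<EOF
#!/usr/bin/env bash
#SBATCH --job-name=rnaseq_${sample}
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=rnaseq_${sample}_%j.log

# Placeholder: replace with the real pre-processing command(s) for this sample.
echo "Processing sample ${sample} on \$(hostname)"
EOF
}

# Write one job script per (hypothetical) sample name.
for sample in sampleA sampleB; do
    make_job_script "$sample" > "job_${sample}.sh"
    # To submit on a cluster where Slurm is installed, uncomment:
    # sbatch "job_${sample}.sh"
done
```

Generating the scripts separately from submitting them lets you inspect each `job_<sample>.sh` file before anything is sent to the scheduler.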
| 59 | +---------- |
| 60 | + |
| 61 | +## Suggested reading: |
| 62 | + |
| 63 | +Reading manuscripts that use RNA-seq, or reviews specifically focused on RNA-seq are excellent ways to further consolidate your learning. |
| 64 | + |
| 65 | +In addition, reading the original manuscripts behind common tools will improve your understanding of how that tool works, and allow you to leverage more complicated options and implementations of that tool when required. |
| 66 | + |
| 67 | +Below we provide some suggested reading to help get you on your way: |
| 68 | + |
| 69 | +#### Review articles |
| 70 | +- [Stark *et al*, 2019, *Nat. Rev. Genetics*.](https://pubmed.ncbi.nlm.nih.gov/31341269/) `RNA Sequencing: The Teenage Years` |
| 71 | +- [Conesa *et al*, 2016, *Genome Biology*.](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8) `A survey of best practices for RNA-seq data analysis` |
| 72 | +- [Wang, *et al*, 2009, *Nat. Rev. Genetics*.](https://www.nature.com/articles/nrg2484) `RNA-Seq: a revolutionary tool for transcriptomics` |
| 73 | +- [Cresko Lab, University of Oregon.](https://rnaseq.uoregon.edu/) `RNA-seqlopedia |
| 74 | +: provides an overview of RNA-seq and of the choices necessary to carry out a successful RNA-seq experiment.` |
| 75 | + |
| 76 | +#### Original manuscripts: Popular RNA-seq tools |
| 77 | +- [Cutadapt:](http://journal.embnet.org/index.php/embnetjournal/article/view/200) `Cutadapt Removes Adapter Sequences From High-Throughput Sequencing Reads`. |
| 78 | +- [STAR:](https://academic.oup.com/bioinformatics/article/29/1/15/272537) `STAR: ultrafast universal RNA-seq aligner` |
| 79 | +- [HISAT2:](https://www.nature.com/articles/s41587-019-0201-4) `Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype` |
| 80 | +- [Bowtie2:](https://www.nature.com/articles/nmeth.1923) `Fast gapped-read alignment with Bowtie 2` |
| 81 | +- [HTSeq-count:](https://academic.oup.com/bioinformatics/article/31/2/166/2366196) `HTSeq—a Python framework to work with high-throughput sequencing data` |
| 82 | +- [DESeq2:](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) `Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2` |
| 83 | + |
| 84 | +---------- |
| 85 | + |
40 | 86 | ## Post DE analysis |
41 | 87 |
|
42 | 88 | After completing a DE analysis, we are usually left with a handful of genes that we wish to extract further meaning from. Depending on our hypothesis and what data is available to us, there are several ways to do this. |
@@ -81,8 +127,7 @@ This workshop will be offered again, in addition to our other bioinformatics wor |
81 | 127 |
|
82 | 128 | --------- |
83 | 129 |
|
84 | | -## Bioinformatics office hours & consultations |
| 130 | +## Bioinformatics consultations |
85 | 131 |
|
86 | | -Please reach out to us with questions related to content from this workshop, or for analysis consultations. We also host **bioinformatics office hours** on **Fridays 1-2pm** for general questions and inquiries (currently on Zoom: https://dartmouth.zoom.us/s/96998379866, password: *bioinfo*) |
| 132 | +Please reach out to us with questions related to content from this workshop, or for analysis consultations. We also host office hours and consultations by request. |
87 | 133 |
|
88 | | -### Now.... Discussion/question time! |
|