This repository was archived by the owner on Jun 21, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 8
Expand file tree
/
Copy pathbrain.Rmd
More file actions
155 lines (114 loc) · 5.33 KB
/
brain.Rmd
File metadata and controls
155 lines (114 loc) · 5.33 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
bibliography: ref.bib
---
# Cross-annotating mouse brains
```{r, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble(cache=TRUE)
```
## Loading the data
We load the classic @zeisel2015brain dataset as our reference.
Here, we'll rely on the fact that the authors have already performed quality control.
```{r loading-zeisel}
library(scRNAseq)
sceZ <- ZeiselBrainData()
```
We compute log-expression values for use in marker detection inside `SingleR()`.
```{r normalize-zeisel}
library(scater)
sceZ <- logNormCounts(sceZ)
```
We examine the distribution of labels in this reference.
```{r}
table(sceZ$level2class)
```
We load the @tasic2016adult dataset as our test.
While not strictly necessary, we remove putative low-quality cells to simplify later interpretation.
```{r loading-tasic}
sceT <- TasicBrainData()
sceT <- addPerCellQC(sceT, subsets=list(mito=grep("^mt_", rownames(sceT))))
qc <- quickPerCellQC(colData(sceT),
percent_subsets=c("subsets_mito_percent", "altexps_ERCC_percent"))
sceT <- sceT[,which(!qc$discard)]
```
The Tasic dataset was generated using read-based technologies so we need to adjust for the transcript length.
```{r normalize-tasic}
library(AnnotationHub)
mm.db <- AnnotationHub()[["AH73905"]]
mm.exons <- exonsBy(mm.db, by="gene")
mm.exons <- reduce(mm.exons)
mm.len <- sum(width(mm.exons))
mm.symb <- mapIds(mm.db, keys=names(mm.len), keytype="GENEID", column="SYMBOL")
names(mm.len) <- mm.symb
library(scater)
keep <- intersect(names(mm.len), rownames(sceT))
sceT <- sceT[keep,]
assay(sceT, "TPM") <- calculateTPM(sceT, lengths=mm.len[keep])
```
## Applying the annotation
We apply `SingleR()` with Wilcoxon rank sum test-based marker detection to annotate the Tasic dataset with the Zeisel labels.
```{r annotation}
library(SingleR)
pred.tasic <- SingleR(test=sceT, ref=sceZ, labels=sceZ$level2class,
assay.type.test="TPM", de.method="wilcox")
```
We examine the distribution of predicted labels:
```{r}
table(pred.tasic$labels)
```
We can also examine the number of discarded cells for each label:
```{r}
table(Label=pred.tasic$labels,
Lost=is.na(pred.tasic$pruned.labels))
```
## Diagnostics
We visualize the assignment scores for each label in Figure \@ref(fig:unref-brain-score-heatmap).
```{r unref-brain-score-heatmap, fig.width=10, fig.height=10, fig.cap="Heatmap of the (normalized) assignment scores for each cell (column) in the Tasic test dataset with respect to each label (row) in the Zeisel reference dataset. The final assignment for each cell is shown in the annotation bar at the top."}
plotScoreHeatmap(pred.tasic)
```
The delta for each cell is visualized in Figure \@ref(fig:unref-brain-delta-dist).
```{r unref-brain-delta-dist, fig.width=10, fig.height=10, fig.cap="Distributions of the deltas for each cell in the Tasic dataset assigned to each label in the Zeisel dataset. Each cell is represented by a point; low-quality assignments that were pruned out are colored in orange."}
plotDeltaDistribution(pred.tasic)
```
Finally, we visualize the heatmaps of the marker genes for the most frequent label in Figure \@ref(fig:unref-brain-marker-heat).
We could show these for all labels but I wouldn't want to bore you with a parade of large heatmaps.
```{r unref-brain-marker-heat, fig.width=10, fig.height=15, fig.cap="Heatmap of log-expression values in the Tasic dataset for all marker genes upregulated in the most frequent label from the Zeisel reference dataset."}
library(scater)
collected <- list()
all.markers <- metadata(pred.tasic)$de.genes
sceT <- logNormCounts(sceT)
top.label <- names(sort(table(pred.tasic$labels), decreasing=TRUE))[1]
per.label <- sumCountsAcrossCells(logcounts(sceT),
ids=pred.tasic$labels, average=TRUE)
per.label <- assay(per.label)[unique(unlist(all.markers[[top.label]])),]
pheatmap::pheatmap(per.label, main=top.label)
```
## Comparison to clusters
For comparison, we will perform a quick unsupervised analysis of the Grun dataset.
We model the variances using the spike-in data and we perform graph-based clustering.
```{r}
library(scran)
decT <- modelGeneVarWithSpikes(sceT, "ERCC")
set.seed(1000100)
sceT <- denoisePCA(sceT, decT, subset.row=getTopHVGs(decT, n=2500))
library(bluster)
sceT$cluster <- clusterRows(reducedDim(sceT, "PCA"), NNGraphParam())
```
We do not observe a clean 1:1 mapping between clusters and labels in Figure \@ref(fig:unref-brain-label-clusters),
probably because many of the labels represent closely related cell types that are difficult to distinguish.
```{r unref-brain-label-clusters, fig.cap="Heatmap of the log-transformed number of cells in each combination of label (column) and cluster (row) in the Tasic dataset."}
tab <- table(cluster=sceT$cluster, label=pred.tasic$labels)
pheatmap::pheatmap(log10(tab+10))
```
We proceed to the most important part of the analysis.
Yes, that's right, the $t$-SNE plot (Figure \@ref(fig:unref-brain-label-tsne)).
```{r unref-brain-label-tsne, fig.cap="$t$-SNE plot of the Tasic dataset, where each point is a cell and is colored by the assigned cluster. Reference labels from the Zeisel dataset are also placed on the median coordinate across all cells assigned with that label."}
set.seed(101010100)
sceT <- runTSNE(sceT, dimred="PCA")
plotTSNE(sceT, colour_by="cluster", text_colour="red",
text_by=I(pred.tasic$labels))
```
## Session information {-}
```{r, echo=FALSE, results='asis'}
prettySessionInfo()
```