Image-Processing-Workflow/6 Analysis.Rmd at main · CVR-MucosalImmunology/Image-Processing-Workflow · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
---
title: "R IMC Pipeline"
output: html_document
---

# R IMC Pipeline

**Authors:** oscardong4\@gmail.com and heeva.baharlou\@gmail.com (20/05/2024)

## 0. Setting Up

#### a) Set Your `analysis` Directory

Change the `root.dir` variable to your `analysis` folder directory, then run the chunk below.

```{r setup, include=FALSE}
source("https://raw.githubusercontent.com/CVR-MucosalImmunology/IMC/main/.assets/functions.R")
if (!require("knitr", quietly = TRUE)) {install.packages("knitr")}

# Set your directory here (do NOT touch anything ABOVE this line)
knitr::opts_knit$set(root.dir = "analysis")
```

#### b) Install Packages

If you do **not** have the necessary packages installed yet, run the chunk below. Otherwise, there is **no need** to run this chunk again.

```{r}
# Install normal packages
norm_pack = c("tidyverse", "BiocManager", "viridis", "RColorBrewer", "cowplot", "gridExtra", "tiff", "kohonen", "ijtiff")
for (p in norm_pack) {
  if (!require(p, quietly = TRUE)) {
    install.packages(p)
  }
}

# Install Bioconductor packages
bioc_pack = c("SpatialExperiment", "dittoSeq", "flowCore", "scater", "BiocSingular", "bluster", "ConsensusClusterPlus", "CATALYST")
for (p in bioc_pack) {
  if (!require(p, quietly = TRUE)) {
    BiocManager::install(p)
  }
}
```

#### c) Load in Required Packages

Make sure to run the chunk before **every** analysis session - this loads in the required packages.

```{r}
# Import required packages
library(tidyverse)
library(SpatialExperiment)
library(dittoSeq)
library(viridis)
library(RColorBrewer)
library(flowCore)
library(scater)
library(cowplot)
library(BiocSingular)
library(bluster)
library(ConsensusClusterPlus)
library(gridExtra)
library(tiff)
library(ijtiff)
```

## 1. Importing the Relevant Data

#### a) Create a DataFrame From Your CellProfiler Output

This step combines the `.csv` files in your `5_cellprofiler_output` folder into a single DataFrame, which is saved as `IMCData.rds` in a new `RDSFiles` folder.

**Important:** before this step, make sure to modify `Image.csv` as necessary to include the `DonorID` and `Condition` associated with each image.

```{r}
# Load CSVs
cells = read.csv("5_cellprofiler_output/cell.csv")
panel = read.csv("../raw/panel.csv")
image = read.csv("5_cellprofiler_output/Image.csv")

# Filter rows and columns
image = image %>%
  mutate(
    Image = str_remove(FileName_FullStack , "_full\\.tiff"),
    ROI = as.integer(str_extract(FileName_FullStack , "(?<=a)\\d+(?=_ac)"))
  ) %>%
  select(-FileName_FullStack)
panel = panel %>%
  dplyr::filter(trimws(Target) != "") %>%
  mutate(Metal = Metal.Tag) %>%
  select("Metal", "Target") %>%
  dplyr::filter(Metal != "Ir191")

# Join image data with cell data
cellsCombined = left_join(cells, image, by = join_by(ImageNumber))

# Add 'ImageShort' column to dataframe
cellsCombined = cellsCombined %>%
  mutate(ImShort = str_extract(Image, "(?<=_)[^_]+_s0_a\\d+")) %>%
  mutate(ImShort = str_replace(ImShort, "_s0_a", "_"))

# Define old column names and what to change them to
rename_vec = c(
  "Image" = "Image",
  "ImShort" = "ImShort",
  "ImageNumber" = "ImageID",
  "ROI" = "ROI",
  "ObjectNumber" = "CellID",
  "AreaShape_Area" = "Area",
  "AreaShape_Center_X" = "X",
  "AreaShape_Center_Y" = "Y",
  "DonorID" = "DonorID",
  "Condition" = "Condition"
)

# Change names in 'cellsCombined' from old names to new names
names(cellsCombined) = ifelse(names(cellsCombined) %in% names(rename_vec), rename_vec[names(cellsCombined)], names(cellsCombined))

# Name marker columns
markers = panel[,"Target"]
ordered_markers = markers[createSortedVector(length(markers))]
colnames(cellsCombined)[6:(length(markers)+5)] = ordered_markers

# Keep only desired columns
meta = unname(sapply(strsplit(rename_vec, " = "), '[', 1))
keep = c(meta, ordered_markers)
cellsCombined = cellsCombined %>% select(all_of(keep))

# Save DataFrame as a '.rds' file
dir.create("RDSFiles", showWarnings = FALSE)
saveRDS(cellsCombined, "RDSFiles/IMCData.rds")
```

#### b) Creating a SpatialExperiment Object

This step creates a `SpatialExperiment` object called `spe`, which we will be working out of for most of the subsequent analysis below. For more information on the `SpatialExperiment` class, see [this](https://www.bioconductor.org/packages/release/bioc/vignettes/SpatialExperiment/inst/doc/SpatialExperiment.html).

An arcsinh transformation is applied to the raw marker values, followed by min-max scaling to standardise the values so they lie between 0 and 1. The distributions of a chosen marker (eg. CD3) are plotted to help visualise the effect of these steps.

An `Image_uCellID_key.csv` file is also generated in your `analysis` folder that indicates which unique cell IDs (uCellIDs) correspond to which images.

```{r}
# Marker expression, other cellular features and spatial features (co-ordinates) are read and merged into a spatial experiment object
dt = readRDS("RDSFiles/IMCData.rds")

# Split data into marker intensities, metadata and co-ordinates
counts = dt[, ordered_markers]*65535
metadata = dt[, setdiff(meta, c("X","Y"))]
coords = dt[, c("X","Y")]

# Create spatial object
spe = SpatialExperiment(
  assays = list(counts = t(counts)),
  colData = metadata,
  spatialCoords = as.matrix(coords),
  sample_id = as.character(metadata$ImageID)
)

# Change variables to 'factor' type
spe$Image = as.factor(spe$Image)
spe$ImageID = as.factor(spe$ImageID)
spe$DonorID = as.factor(spe$DonorID)

### Assign colour palettes
color_vectors = list()
# DonorID
donor_ids = unique(spe$DonorID)
# Create a color vector that repeats  if there are more than 12 unique DonorIDs
donor_colors = brewer.pal(12, name = "Paired")[as.numeric(donor_ids) %% 12 + 1]
# Set the names of the colors to match the unique DonorIDs
DonorID = setNames(donor_colors, donor_ids)
color_vectors$DonorID = DonorID

# ImageID
image_ids = unique(spe$ImageID)
# Create a color vector that repeats  if there are more than 12 unique ImageIDs
image_colors = brewer.pal(12, name = "Paired")[as.numeric(image_ids) %% 12 + 1]
# Set the names of the colors to match the unique ImageIDs
ImageID = setNames(image_colors, image_ids)
color_vectors$ImageID = ImageID

# Merge colour palettes back into 'spe' object
metadata(spe)$color_vectors = color_vectors
colnames(spe) = paste0(spe$ImageID, "_", spe$CellID)

# Plot current distribution
suppressMessages(print(dittoRidgePlot(spe, var="CD3", group.by="ImageID", assay="counts") + ggtitle("CD3 - before transformation")))

# Plot distribution after arcsinh transforming data
assay(spe, "exprs") = asinh(counts(spe))
suppressMessages(print(dittoRidgePlot(spe, var="CD3", group.by="ImageID", assay="exprs") + ggtitle("CD3 - after arcsinh transformation")))

# Extract the expression matrix and image information
exprs_matrix = assay(spe, "exprs")
images = spe$Image

# Get unique images and channels
unique_images = unique(images)
channels = rownames(exprs_matrix)

#--------------------------------------------------#
# Scale (Z-transform) the arcsinh-transformed data #
#--------------------------------------------------#

# Prepare a matrix to store scaled expressions
base_matrix = matrix(NA, nrow = nrow(exprs_matrix), ncol = ncol(exprs_matrix))
rownames(base_matrix) = channels
colnames(base_matrix) = colnames(exprs_matrix)

# Iterate over each image and channel to scale
for (image in unique_images) {
  image_indices = which(images == image)
  for (channel in channels) {
    channel_data = exprs_matrix[channel, image_indices]
    # Perform z-normalization if there are enough non-NA values
    scaled_data = scale(channel_data)
    base_matrix[channel, image_indices] = scaled_data
  }
}
# Assign the scaled data back to the assay
assay(spe, "scaled_exprs") = base_matrix

# Plot distribution after scaling the data
suppressMessages(print(dittoRidgePlot(spe, var="CD3", group.by="ImageID", assay="scaled_exprs") + ggtitle("CD3 - after scaling")))

#-------------------------------------------------#
# Normalise values between 0 and 1 for each image #
#-------------------------------------------------#

# Prepare a matrix to store normalised expressions
base_matrix = matrix(NA, nrow = nrow(exprs_matrix), ncol = ncol(exprs_matrix))
rownames(base_matrix) = channels
colnames(base_matrix) = colnames(exprs_matrix)

# Iterate over each image and channel to normalise
for (image in unique_images) {
    image_indices = which(images == image)
    for (channel in channels) {
        channel_data = exprs_matrix[channel, image_indices]
        # Apply min-max scaling
        min_val = min(channel_data, na.rm = TRUE)
        max_val = max(channel_data, na.rm = TRUE)
        range_val = max_val - min_val
        # Avoid division by zero if all values in channel_data are the same
        if (range_val != 0) {
            norm_data = (channel_data - min_val) / range_val
        } else {
            norm_data = rep(0, length(channel_data))
        }
        base_matrix[channel, image_indices] = norm_data
    }
}

# Assign the normalised data back to the assay
assay(spe, "norm_exprs") = base_matrix

# Plot distribution after normalising the data
suppressMessages(print(dittoRidgePlot(spe, var="CD3", group.by="ImageID", assay="norm_exprs") + ggtitle("CD3 - after normalising")))

# Add a universal CellID column
colData(spe)$uCellID = 1:length(spe$CellID)
uIDKey = as.data.frame(colData(spe))
uIDKey = uIDKey %>%
  group_by(Image) %>%
  dplyr::filter(row_number()==1) %>%
  select(Image, ImShort, ImageID, uCellID)

# Write a CSV matching ImageIDs with uCellIDs
write.csv(uIDKey, "Image_uCellID_key.csv", row.names = FALSE)
saveRDS(spe, "RDSFiles/spe.rds")
```

## 2. Gating Populations in FlowJo with Help From Mantis

#### a) Export as `.fcs` to Gate in FlowJo

**CHANGE:** This step generates one `.fcs` file for **each** image in a new `FlowJo` folder, which you can then gate on as usual. There is also an `exports` folder created **within** your `FlowJo` folder for you to export your gated populations to.

```{r}
# Set desired assay
assay_name = "scaled_exprs"
```

```{r}
# Load in 'spe' object
spe = readRDS("RDSFiles/spe.rds")

# Re-extract image data and metadata
image = read.csv("5_cellprofiler_output/image.csv") %>%
  select(ImageNumber) %>%
  dplyr::rename(ImageID = ImageNumber)
metadata = as.data.frame(colData(spe)) %>%
  select(ImageID, uCellID, CellID, Area)

# Multiply by 1000 to match scale of visualisation in flowjo.
assayData = as.data.frame(t(assay(spe, assay_name)))

# Combine metadata and assay data
df = cbind(metadata, as.data.frame(spatialCoords(spe)), assayData)
df$ImageID = as.numeric(df$ImageID)
df = df %>%
  left_join(image, by="ImageID") %>%
  group_by(ImageID) %>%
  mutate(Y = max(Y)-Y)

# Convert dataframe to matrix for exporting
data_mat = apply(as.matrix(df), 2, as.numeric)
# Ensure column names are set
colnames(data_mat) = make.names(colnames(data_mat))
# Convert to flowFrame
ff = flowFrame(data_mat)

# Export data as '.fcs' to gate in FlowJo
dir.create("../FlowJo/exports", showWarnings = FALSE, recursive = TRUE)
invisible(write.FCS(ff, file = "../FlowJo/Images.fcs"))
```

#### b) Prepare a `MantisProject` Folder to View Cells in Mantis

You can download Mantis Viewer from [here](https://github.com/CANDELbio/mantis-viewer/releases). It is an application we will use to visualise cells and marker expressions from our IMC images. Mantis Viewer requires a `MantisProject` folder to view images from, which will be generated by the chunk below.

```{r}
# Set desired assay
assay_name = "scaled_exprs"
```

```{r}
# Load in appropriate files
spe = readRDS("RDSFiles/spe.rds")
panel = read.csv("../raw/panel.csv")
image_key = read.csv("Image_uCellID_key.csv")

# Create 'MantisProject' folder
dir.create("../MantisProject", showWarnings = FALSE)

# List TIFFs and CSVs in the '1c_full_images' folder
tiffs = list.files("1c_full_images", pattern = "\\.tiff$")
csvs = list.files("1c_full_images", pattern = "\\.csv$")

# Generate individual channel images as TIFFs and copy them to the 'MantisProject' folder
for (x in seq_along(tiffs)) {
  # Extract the name of the current image from the filename
  cur_img_full = gsub(".{10}$", "", tiffs[x])
  # Obtain corresponding 'ImShort' label
  cur_img = image_key %>%
    dplyr::filter(Image == cur_img_full) %>%
    pull(ImShort)

  # Create a new folder for the current image
  dir.create(paste0("../MantisProject/", cur_img), showWarnings = FALSE)
  # Read TIFF and CSV associated with the current image
  cur_tiff = suppressWarnings(readTIFF(paste0("1c_full_images/", tiffs[x]), all=TRUE))
  cur_csv = read.csv(paste0("1c_full_images/", csvs[x]), header=FALSE)$V1
  # Save each individual channel in the TIFF as a separate image
  for (channel in seq_along(cur_tiff)) {
    metal_tag = cur_csv[channel]
    marker = panel$Target[panel$Metal.Tag == metal_tag]
    filename = paste0("../MantisProject/", cur_img, "/", marker, ".tiff")
    mat = cur_tiff[[channel]]*65535
    write_tif(mat, filename, overwrite=TRUE, msg=FALSE)
  }
}

# Copy segmentation masks to 'MantisProject' folder
seg_list = list.files("3a_segmentation_masks", pattern = "\\.tif$")
for (cur_dir in seg_list) {
  cur_img_full = gsub(".{23}$", "", cur_dir)
  cur_img = image_key %>%
    dplyr::filter(Image == cur_img_full) %>%
    pull(ImShort)
  new_dir = paste0("../MantisProject/", cur_img, "/SegmentationFile.tif")
  file.copy(paste0("3a_segmentation_masks/", cur_dir), new_dir)
}

# Extract the full assay data (markers) and transpose it
markers = rownames(spe)
marker_data = as.data.frame(t(assay(spe, assay_name)))

# Add metadata as columns to marker_data
metadata = as.data.frame(colData(spe))
marker_data = cbind(metadata, marker_data)

for (img in unique(colData(spe)$ImShort)) {
  # Filter the combined data to keep only rows corresponding to images in Mantis
  filtered_data = marker_data[marker_data$ImShort %in% c(img), ]
  # Select only relevant columns
  selected_columns = c("ImShort", "CellID", markers)
  final_data = filtered_data[, selected_columns]
  # Check if all entries in the 'ImShort' column are the same (only one image)
  if (length(unique(final_data$ImShort)) == 1) {
      final_data$ImShort <- NULL
  }
  # Rename the remaining columns if necessary
  colnames(final_data)[colnames(final_data) %in% c("CellID")] = c("Segment ID")
  # Save the final filtered and selected data to a CSV file
  write.csv(final_data, paste0("../MantisProject/", img, "/SegmentFeatures.csv"), row.names = FALSE)
}
```

#### c) Gate in FlowJo with Reference to Mantis Viewer

Use Mantis Viewer to assist your gating in FlowJo, by visually checking your threshold values for each marker to see that they are accurate.

#### d) Import FlowJo Clusters Into Mantis Viewer For Verifying

To check that you are happy with your gating in FlowJo, you can import your populations into Mantis Viewer and view them. This step generates a `FlowJo_CellTypes.csv` file in your `MantisProject` folder, which you can **import** and **view** as a new **population** in Mantis Viewer.

If you wish to change your gating a bit, simply create new gates in FlowJo and re-export your populations as `.csv` files to your `FlowJo/exports` folder. Then, **re-run** the chunk below to **overwrite** the existing `FlowJo_CellTypes.csv` file so you can view your new populations.

```{r}
# Load in 'spe' object
spe = readRDS("RDSFiles/spe.rds")

# List exported CSVs from FlowJo
csvs = list.files(path = "../FlowJo/exports", pattern = "*.csv", full.names = TRUE)

# Combine CSVs from FlowJo and metadata into one DataFrame
csvs = lapply(csvs, add_celltype)
flowjo_df = bind_rows(csvs) %>%
  select(uCellID, celltype_flowjo)
flowjo_df$celltype_flowjo = as.factor(flowjo_df$celltype_flowjo)
metadata = as.data.frame(colData(spe))
flowjo_df = left_join(metadata, flowjo_df, by = c("uCellID"))

# Remove NAs (cells without an assigned population from FlowJo)
celltype_info = flowjo_df[, c("ImShort", "CellID", "celltype_flowjo")] %>%
  na.omit()

# Export FlowJo populations for viewing in Mantis Viewer
celltype_by_image = split(celltype_info, celltype_info$ImShort)
celltypes = lapply(names(celltype_by_image), extract_celltypes)
celltype_df = bind_rows(celltypes)
write.table(celltype_df, "../MantisProject/FlowJo_CellTypes.csv", sep=",",  col.names=FALSE, row.names=FALSE)

# Save FlowJo populations as a '.rds' file
saveRDS(celltype_df, "RDSFiles/flowjo_celltypes.rds")
```

## 3. Clustering and Annotating Cell Populations

#### a) Perform SOM Clustering on FlowJo Populations and View in Mantis Viewer

This step uses FlowSOM to perform SOM clustering on the populations you gated in FlowJo. Before running the code, there are some variables you need to set first:

-   `to_cluster`: a list of FlowJo populations to perform clustering on - these should be named **exactly** as you named them in FlowJo

-   `maxClust`: the maximum number of clusters for FlowSOM to generate - this is **not** the final number of clusters you will end up with

-   `clustNumber`: the **final** number of clusters to keep and display

-   `assay_name`: the name of the assay you wish to take values from

-   `forClust`: a list of markers to perform clustering using - these should be named **exactly** as they are in your `panel.csv` file

-   `batchCol`: if wanting to perform batch correction (using Harmony), set this to the column name that denotes your unique batches (eg. `"DonorID"`) - otherwise, set this to `NULL`

-   `heatmapOrder`: the order in which to display markers on your generated heatmap (from top to bottom) - note this does **not** have to include all markers in your image

Set these variables in the chunk below, then remember to **run** it after.

```{r}
# List of FlowJo populations to perform clustering on
to_cluster = c("T cell")
# Maximum number of clusters to generate
maxClust = 50
# Final number of clusters to keep
clustNumber = 30
# Name of assay you wish to take values from
assay_name = "norm_exprs"
# Markers to perform clustering using
forClust = c("CD4","CD8","Ki67","CCR5","CCR6","CD127","CD45RO",
             "CD45RA","CD69-Cy5","FoxP3-Biotin","HLA-DR")
# Column name denoting unique batches if batch correction is needed
batchCol = NULL
# Order to display markers in on the generated heatmap (does NOT have to include all markers)
heatmapOrder = c("CD4","CD8","Ki67","CCR5","CCR6","CD127","CD45RO",
                 "CD45RA","CD69-Cy5","FoxP3-Biotin","HLA-DR")
```

The code chunk below performs SOM clustering using the variables you have assigned above, then generates 4 graphs that are saved to a new `Graphs/Unannotated` folder:

1.  `Heatmap.png`: shows the marker expression values of individual cells and the clusters they belong to

2.  `Marker_plots.png`: shows the expression values of different markers, each on its own separate UMAP graph

3.  `UMAP_labelled.png`: a single UMAP graph labelled with the different cell clusters

4.  `Zscore.png`: shows how the marker expression values in each cluster compare to each other (as z-scores)

It also generates a `Clustered_CellTypes.csv` file in your `MantisProject` folder, which you can **import** and **view** as new **populations** in Mantis Viewer.

```{r}
# Load in FlowJo populations and 'spe' object
celltype_df = readRDS("RDSFiles/flowjo_celltypes.rds")
spe = readRDS("RDSFiles/spe.rds")

# Select only cells in desired FlowJo populations
desired_cells = celltype_df$celltype_flowjo %in% to_cluster
cells_forClust = celltype_df[desired_cells, ] %>%
  select(-celltype_flowjo) %>%
  distinct()

# Perform clustering
cluster_results = cluster_cells(
  spe,
  cells_forClust,
  forClust,
  batchCol,
  maxClust
)
clustOut = cluster_results$clustOut
ccp = cluster_results$ccp

# Visualize delta area plot
CATALYST:::.plot_delta_area(ccp)

# Save Heatmap, Z-score and UMAP graphs to 'Graphs' folder using clustering results
dir.create("../Graphs", showWarnings = FALSE)
dir.create("../Graphs/Unannotated", showWarnings = FALSE)
dir.create("../Graphs/Annotated", showWarnings = FALSE)
dir.create("../Graphs/Reassigned", showWarnings = FALSE)

spe_subset = makeGraphs(
  clustOutF=clustOut,
  ccpF=ccp,
  ccp_clustNum=clustNumber,
  graph_type="All",
  annotate_labels=TRUE,
  save_path="../Graphs/Unannotated",
  genes=heatmapOrder,
  assayName = assay_name
)
spe_subset$SOM_Cluster = spe_subset$Cluster

# Generate UMAP of individual marker expression values and save to 'Graphs' folder
save_marker_plots(
  spe_subset,
  assayName = assay_name
)

# Save clustered data
saveRDS(spe_subset, "RDSFiles/clustered_spe.rds")

# Export clusters as a CSV to view them in Mantis Viewer
clust_celltypes = as.data.frame(colData(spe_subset))
clust_celltypes = clust_celltypes %>%
  select(Image, CellID, Cluster)
write.table(clust_celltypes, "../MantisProject/Clustered_CellTypes.csv", sep=",",  col.names=FALSE, row.names=FALSE)
saveRDS(clust_celltypes, "RDSFiles/clustered_celltypes.rds")
```

#### b) Annotate Clusters and View Them in Mantis Viewer

After you have viewed and analysed your clusters, you can annotate them by matching the number of each cluster to a corresponding label. For example, you can label Cluster #1 as "CD8+ T" by typing `"1" = "CD8+ T"` in the code chunk below.

This step will generate an `Annotated_CellTypes.csv` file in your `MantisProject` folder, which you can **import** and **view** as new **populations** in Mantis Viewer.

Additionally, it will also generate a new set of graphs using your annotations, which will be saved to a new `Graphs/Annotated` folder.

```{r}
# Load in clustered data from above
spe_subset = readRDS("RDSFiles/clustered_spe.rds")

# Write the number corresponding to the cluster on the left, and the newly annotated name on the right
named_clusters = recode(
  spe_subset$SOM_Cluster,
  "1" = "CD8+ T",
  "2" = "CD4+ T",
  "3" = "Tregs",
  "4" = "CD4+ T",
  "5" = "Cycling cells",
  "6" = "Non-immune",
  "7" = "Epithelial CD8+",
  "8" = "MNP",
  "9" = "Non-immune",
  "10" = "Tregs"
)

# Load in clusters from above and rename them
named_celltypes = readRDS("RDSFiles/clustered_celltypes.rds")
named_celltypes$Cluster = named_clusters
saveRDS(named_celltypes, "RDSFiles/annotated_celltypes.rds")

# Save newly annotated clusters as a CSV to view them in Mantis Viewer
spe_subset$Cluster = named_clusters
write.table(named_celltypes, "../MantisProject/Annotated_CellTypes.csv", sep=",",  col.names=FALSE, row.names=FALSE)

# Save annotated data
saveRDS(spe_subset, "RDSFiles/annotated_spe.rds")

# Save newly annotated Heatmap, Z-score and UMAP graphs to 'Graphs' folder
makeAnnotGraphs(
  speF=spe_subset,
  ccp_clustNum=clustNumber,
  graph_type="All",
  annotate_labels=TRUE,
  save_path="../Graphs/Annotated",
  genes=heatmapOrder
)
```

#### c) Adjust Clusters by Re-assigning Cells If Necessary

After annotating your cell clusters, you can adjust them by re-assigning certain cells to a different population, based on their expression of a specific marker.

The chunk below allows you to visualise the distribution of marker counts for a specific population, so that you can determine an appropriate cut-off for re-assigning cells in the chunk after. To do so, adjust the following parameters in the `visMarker` function:

-   `target`: the population you would like to visualise marker counts for (this should have the **same** **name** as one of your annotations above)

-   `marker`: the marker you would like to visualise the distribution of

-   `value`: the cut-off value you would like to visualise on the graph

For example, the sample code given evaluates "Tregs" cells for their "CD45" expression, with a cut-off of 0.6 drawn on the graph.

```{r}
# Load in annotated data from above
spe_subset = readRDS("RDSFiles/annotated_spe.rds")

# Visualise marker distributions
visMarker(
  spe_subset,
  target="Tregs",
  marker="CD45",
  value=0.6
)
```

Once you have decided on an appropriate cut-off value, you can then re-assign cells based on the specific marker you have looked at. This is done using the `moveCells` function below:

-   `from`: the **original** population to move cells **out of**

-   `to`: the **new** population to move cells **into**

-   `marker`: the specific marker to move cells based on

-   `direction`: should be either "\<" or "\>", indicating whether to move cells with a marker expression **below** or **above** the cut-off value

-   `value`: the cut-off point chosen for the specific marker

For example, the sample code given below moves any cells from **"Tregs"** with CD45 expression less than ("\<") 0.6 to the **"Non-immune"** population:

`spe_subset = moveCells(spe_subset, from="Tregs", to="Non-immune", marker="CD45", direction="<", value=0.6)`

Following this, the number of cells before and after the re-assignment in the **original** population will be shown as a table. For example, the table below indicates that for **Donor #1**, there were 224 cells in the **"Tregs"** population **prior** to the re-assignment and 213 cells **after**.

| DonorID | counts_before | counts_after |
|---------|---------------|--------------|
| 1       | 224           | 213          |

To keep track of your re-assignments a `ClusterModTracker.csv` file will be saved to your `analysis` folder, indicating **all** re-assignments that have been completed:

| from_pop | to_pop     | condition   |
|----------|------------|-------------|
| Tregs    | Non-immune | CD45 \< 0.6 |

Finally, a `Reassigned_CellTypes.csv` file will be generated in your `MantisProject` folder, which you can **import** and **view** as new **populations** in Mantis Viewer. The graphs in your `Graphs/Annotated` folder will also be **updated** to reflect your re-assignments.

```{r}
# Load in annotateed data from above
spe_subset = readRDS("RDSFiles/annotated_spe.rds")

# Re-assign cells
spe_subset = moveCells(
  spe_subset,
  from="Tregs",
  to="Non-immune",
  marker="CD45",
  direction="<",
  value=0.6
)

# Make temporary column the new 'celltype' column
colData(spe_subset)$Cluster = colData(spe_subset)$reassigned_celltype
colData(spe_subset)$reassigned_celltype = NULL

# Save newly annotated Heatmap, Z-score and UMAP graphs to 'Graphs' folder
makeAnnotGraphs(
  speF=spe_subset,
  ccp_clustNum=clustNumber,
  graph_type="All",
  annotate_labels=TRUE,
  save_path="../Graphs/Reassigned",
  genes=heatmapOrder
)

# Save subsetted data
saveRDS(spe_subset, "RDSFiles/reassigned_spe.rds")

# Create new population CSV for viewing in Mantis
reassigned_celltypes = as.data.frame(colData(spe_subset)) %>%
  select(Image, CellID, Cluster)
write.table(reassigned_celltypes, "../MantisProject/Reassigned_CellTypes.csv", sep=",",  col.names=FALSE, row.names=FALSE)
saveRDS(reassigned_celltypes, "RDSFiles/reassigned_celltypes.rds")
```