-
Notifications
You must be signed in to change notification settings - Fork 1
/
5_marker_gene_identification.qmd
193 lines (133 loc) · 5.38 KB
/
5_marker_gene_identification.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
---
title: "Marker gene identification"
engine: knitr
---
We load the required packages:
```{r}
#| output: false
library(Seurat)
library(ggplot2)
library(clustree)
library(patchwork)
library(dplyr)
```
And we load the list created after normalization and scaling, followed by a merge:
```{r}
seu <- readRDS("output/seu_part4.rds")
```
## Marker genes
Finally, we can identify marker genes for each cluster. For that, we move away from the `integrated` assay, and use `SCT` as default again. Because our object is a merge between two slices, each with their own SCT model, we need to correct the counts and data slots. This is done by `PrepSCTFindMakers`. After that, we use `FindAllMarkers` to identify markers for each cluster versus all other spots.
```{r}
#| warning: false
DefaultAssay(seu) <- "SCT"
seu <- PrepSCTFindMarkers(seu)
all_marks <- FindAllMarkers(seu, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
```
::: {.callout-important}
## Exercise
Check out the results in `all_marks`. What is the top marker for cluster 4?
:::
::: {.callout-tip collapse="true"}
## Answer
Here's a oneliner for a table representation:
```{r}
all_marks |>
filter(cluster == 4) |>
arrange(desc(avg_log2FC)) |>
head(5) |>
knitr::kable()
```
:::
To get a broad overview of all marker genes we can do the following:
- Find the top 3 marker genes for each cluster. Here, we take an approximation of the test statistic (`avg_log2FC * -log10(p_val_adj + 1e-300)`) to sort the genes.
- Then we create a dotplot of the expression of each gene per cluster
```{r}
#| fig-width: 9
top_markers <- all_marks |>
mutate(order_value = avg_log2FC * -log10(p_val_adj + 1e-300)) |>
group_by(cluster) |>
slice_max(n = 3, order_by = order_value)
DotPlot(seu, features = unique(top_markers$gene)) +
scale_x_discrete(guide = guide_axis(angle = 45))
```
Now, we can check whether the expression pattern corresponds with the cluster, e.g. the top marker of cluster 3:
```{r}
SpatialDimPlot(seu,
cells.highlight = CellsByIdentities(object = seu,
idents = 7)) +
plot_layout(guides='collect') &
theme(legend.position = "none")
SpatialPlot(seu,
features = top_markers$gene[top_markers$cluster == 7][1],
pt.size.factor = 2)
```
::: {.callout-important}
## Exercise
What kind of tissue is cluster 7 you think? Does the mouse brain atlas show similar patterns for this gene?
Compare the expression to the images at [Allen Brain Atlas](http://atlas.brain-map.org/atlas?atlas=2&plate=100883804#atlas=2&plate=100884129&resolution=19.04&x=7671.818403764205&y=4000&zoom=-4&structure=549) to figure out the tissue type.
To compare the expression, go to [mouse.brain-map.org](https://mouse.brain-map.org/), type the gene name in the search box. Select the gene entry in the sagittal plane, click 'View selections' at the left bottom of the page. Select a similar slice (i.e. in the middle of the brain).
:::
::: {.callout-tip collapse="true"}
## Answer
Based on the comparison of the brain atlas it seems to be layer 2/3 of the isocortex.
It has similar expression based on in-situ hybridization from the brain atlas. Also expression in the striatium and medulla correspond.
![](assets/images/Lamp5_sagittal_ISH.png)
:::
::: {.callout-important}
## Exercise
Create the same visualization for cluster 0. What is the top marker? Can you guess what kind of cells this tissue is mostly comprised of?
The marker is expressed in many spots, and therefore it is not part of specific cell groups. Therefore, check out the gene at e.g. wikipedia.
:::
::: {.callout-tip collapse="true"}
## Answer
```{r}
SpatialDimPlot(seu,
cells.highlight = CellsByIdentities(object = seu,
idents = 0)) +
plot_layout(guides='collect') &
theme(legend.position = "none")
SpatialPlot(seu,
features = top_markers$gene[top_markers$cluster == 0][1],
pt.size.factor = 2)
```
The top gene is [Plp1](https://en.wikipedia.org/wiki/Proteolipid_protein_1) which protein is an important component of the myelin sheets of neurons. Therefore cluster 0 seems to represent the white matter of the brain.
:::
## Bonus exercise: spatially variable features
In stead of looking for features that are variable among all spots, you can also identify that mainly vary in space, i.e. while taking spot distance into account:
```{r}
#| warning: FALSE
#| message: FALSE
#| output: FALSE
# take the output of part 3 (non-integrated and non-clustered)
seu_list <- readRDS("output/seu_part3.rds")
# we only do Anterior for now
DefaultAssay(seu_list$Anterior) <- "SCT"
seu_list$Anterior <-
FindSpatiallyVariableFeatures(
seu_list$Anterior,
selection.method = "moransi",
features = VariableFeatures(seu_list$Anterior)
)
spatialFeatures <-
SVFInfo(seu_list$Anterior, method = "moransi", status = TRUE)
spatialFeatures <-
spatialFeatures |> arrange(rank)
```
::: {.callout-important}
## Exercise
Run the code and plot the top spatial variable genes.
The spatial features are ordered already ordered, so you can get them with e.g.
```{r}
#| eval: FALSE
rownames(spatialFeatures)[1:4]
```
Any interesting marker genes in there?
:::
::: {.callout-tip collapse="true"}
## Answer
```{r}
SpatialPlot(seu_list$Anterior,
features = rownames(spatialFeatures)[1:4],
ncol = 2)
```
:::