5_marker_gene_identification.qmd

---
title: "Marker gene identification"
engine: knitr
---

We load the required packages:

```{r}
#| output: false
library(Seurat)
library(ggplot2)
library(clustree)
library(patchwork)
library(dplyr)
```

And we load the list created after normalization and scaling, followed by a merge:

```{r}
seu <- readRDS("output/seu_part4.rds")
```

## Marker genes

Finally, we can identify marker genes for each cluster. For that, we move away from the `integrated` assay, and use `SCT` as default again. Because our object is a merge between two slices, each with their own SCT model, we need to correct the counts and data slots. This is done by `PrepSCTFindMakers`. After that, we use `FindAllMarkers` to identify markers for each cluster versus all other spots. 

```{r}
#| warning: false
DefaultAssay(seu) <- "SCT"
seu <- PrepSCTFindMarkers(seu)
all_marks <- FindAllMarkers(seu, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
```

::: {.callout-important}
## Exercise

Check out the results in `all_marks`. What is the top marker for cluster 4? 

::: 

::: {.callout-tip collapse="true"}
## Answer

Here's a oneliner for a table representation:

```{r}
all_marks |>
  filter(cluster == 4) |>
  arrange(desc(avg_log2FC)) |>
  head(5) |>
  knitr::kable()
```
:::

To get a broad overview of all marker genes we can do the following:

-  Find the top 3 marker genes for each cluster. Here, we take an approximation of the test statistic (`avg_log2FC * -log10(p_val_adj + 1e-300)`) to sort the genes.
- Then we create a dotplot of the expression of each gene per cluster

```{r}
#| fig-width: 9
top_markers <- all_marks |>
  mutate(order_value = avg_log2FC * -log10(p_val_adj + 1e-300)) |>
  group_by(cluster) |>
  slice_max(n = 3, order_by = order_value)

DotPlot(seu, features = unique(top_markers$gene)) +
  scale_x_discrete(guide = guide_axis(angle = 45))
```

Now, we can check whether the expression pattern corresponds with the cluster, e.g. the top marker of cluster 3:

```{r}
SpatialDimPlot(seu, 
               cells.highlight = CellsByIdentities(object = seu,
                                                   idents = 7)) + 
  plot_layout(guides='collect') &
  theme(legend.position = "none") 

SpatialPlot(seu,
            features = top_markers$gene[top_markers$cluster == 7][1],
            pt.size.factor = 2) 
```

::: {.callout-important}
## Exercise

What kind of tissue is cluster 7 you think? Does the mouse brain atlas show similar patterns for this gene?

Compare the expression to the images at [Allen Brain Atlas](http://atlas.brain-map.org/atlas?atlas=2&plate=100883804#atlas=2&plate=100884129&resolution=19.04&x=7671.818403764205&y=4000&zoom=-4&structure=549) to figure out the tissue type.

To compare the expression, go to [mouse.brain-map.org](https://mouse.brain-map.org/), type the gene name in the search box. Select the gene entry in the sagittal plane, click 'View selections' at the left bottom of the page. Select a similar slice (i.e. in the middle of the brain).


::: 

::: {.callout-tip collapse="true"}
## Answer

Based on the comparison of the brain atlas it seems to be layer 2/3 of the isocortex. 

It has similar expression based on in-situ hybridization from the brain atlas. Also expression in the striatium and medulla correspond. 

![](assets/images/Lamp5_sagittal_ISH.png)

:::

::: {.callout-important}
## Exercise

Create the same visualization for cluster 0. What is the top marker? Can you guess what kind of cells this tissue is mostly comprised of? 

The marker is expressed in many spots, and therefore it is not part of specific cell groups. Therefore, check out the gene at e.g. wikipedia. 

::: 

::: {.callout-tip collapse="true"}
## Answer

```{r}
SpatialDimPlot(seu, 
               cells.highlight = CellsByIdentities(object = seu,
                                                   idents = 0)) + 
  plot_layout(guides='collect') &
  theme(legend.position = "none") 

SpatialPlot(seu,
            features = top_markers$gene[top_markers$cluster == 0][1],
            pt.size.factor = 2) 
```

The top gene is [Plp1](https://en.wikipedia.org/wiki/Proteolipid_protein_1) which protein is an important component of the myelin sheets of neurons. Therefore cluster 0 seems to represent the white matter of the brain. 

:::


## Bonus exercise: spatially variable features

In stead of looking for features that are variable among all spots, you can also identify that mainly vary in space, i.e. while taking spot distance into account:

```{r}
#| warning: FALSE
#| message: FALSE
#| output: FALSE

# take the output of part 3 (non-integrated and non-clustered)
seu_list <- readRDS("output/seu_part3.rds")

# we only do Anterior for now
DefaultAssay(seu_list$Anterior) <- "SCT"

seu_list$Anterior <-
  FindSpatiallyVariableFeatures(
    seu_list$Anterior,
    selection.method = "moransi",
    features = VariableFeatures(seu_list$Anterior)
  )

spatialFeatures <-
  SVFInfo(seu_list$Anterior, method = "moransi", status = TRUE)

spatialFeatures <-
  spatialFeatures |> arrange(rank)


```

::: {.callout-important}
## Exercise

Run the code and plot the top spatial variable genes. 

The spatial features are ordered already ordered, so you can get them with e.g.

```{r}
#| eval: FALSE
rownames(spatialFeatures)[1:4]
```

Any interesting marker genes in there?

::: 

::: {.callout-tip collapse="true"}
## Answer

```{r}
SpatialPlot(seu_list$Anterior,
            features = rownames(spatialFeatures)[1:4],
            ncol = 2)
```
:::