Skip to content

Commit

Permalink
Day 1 answer key
Browse files Browse the repository at this point in the history
  • Loading branch information
epartan authored Nov 8, 2024
1 parent ef160b8 commit d24d157
Showing 1 changed file with 105 additions and 1 deletion.
106 changes: 105 additions & 1 deletion homework/Day_answer_key.md
Original file line number Diff line number Diff line change
@@ -1 +1,105 @@
## Day 1 Answer key
# Day 1 Answer key

## Aggregating counts by celltype using pseudobulk approach

**Another cell type in this dataset that was particularly interesting to the authors were the Pdgfr α+ adipose progentior cells (APCs).**
**Subset the bulk object to isolate only adipose progenitor cells for the TN and cold7 conditions. Assign it to variable called bulk_APC.** Hint: You may need to review celltypes to determine what this cell type is called in our data.

```
celltypes
# Compare TN vs cold7 in APC cells
bulk_APC <- subset(bulk, subset = (celltype == "AP") & (condition %in% c("TN", "cold7")))
```

**Plot the cell number distribution across samples. How do the numbers compare to VSM cells?**

```
# Visualize number of cells per condition
ggplot([email protected], aes(x = sample, y = n_cells, fill = condition)) +
geom_bar(stat = "identity", color = "black") +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
labs(x = "Sample name", y = "Number of cells") +
geom_text(aes(label = n_cells), vjust = -0.5)
```

Overall we see far fewer cells, by an order of magnitude (scale goes to 2,000 for VSM but only 600 for APC).
There is also a different distribution: the counts for Sample-10 and Sample-9 go down relative to other samples, while Sample-8 goes up.

**Using the code below, create a DESeq2 object for the Pdgfr α+ APCs data.** There is nothing to submit for this exercise, but please run the code as you will need dds_APC for future exercises.

```
# Get count matrix
APC_counts <- FetchData(bulk_APC, layer = "counts", vars = rownames(bulk_APC))
# Create DESeq2 object
# transpose it to get genes as rows
dds_APC <- DESeqDataSetFromMatrix(t(APC_counts),
colData = [email protected],
design = ~ condition)
dds
```

## DE analysis of pseudobulk data using DESeq2

**Use the dds_APC object to compute the rlog transformed counts for the Pdgfr α+ APCs.**

```
# Transform counts for data visualization
rld_APC <- rlog(dds_APC, blind = TRUE)
```

**Create a PCA plot for the Pdgfr α+ APCs, coloring points by condition. Do samples segregate by condition? Is there more or less variability within group than observed with the VSM cells?**

```
# Plot PCA
plotPCA(rld_APC, intgroup = c("condition")) + theme_classic()
```

Samples still segregate by TN/cold7 but there is greater variaibility than in VSM cells.

**Evaluate the sample similarity using a correlation heatmap. How does this compare with the trends observed in the PCA plot?**

```
# Calculate sample correlation
rld_APC_mat <- assay(rld_APC)
rld_APC_cor <- cor(rld_APC_mat)
# Change sample names to original values
# For nicer plots
rename_samples <- bulk_APC$sample
colnames(rld_APC_cor) <- str_replace_all(colnames(rld_APC_cor), rename_samples)
rownames(rld_APC_cor) <- str_replace_all(rownames(rld_APC_cor), rename_samples)
# Plot heatmap
anno <- [email protected] %>%
select(sample, condition) %>%
remove_rownames() %>%
column_to_rownames("sample")
pheatmap(rld_APC_cor, annotation_col = anno, annotation_row = anno)
```

Unfortunately, the samples do not neatly separate by condition in this celltype. As shown in the PCA plot, three cold7 samples (7,8,15) are closely related on PC1, while the last cold7 sample (16) is more distant and groups with two TN samples (9,10) on PC2.

**Using the code below, run DESeq2 for the Pdgfr α+ APCs data. Following that draw the dispersion plot. Based on this plot do you think there is a reasonable fit to the model?**

```
# Run DESeq2 differential expression analysis
dds_APC <- DESeq(dds_APC)
# Plot gene-wise dispersion estimates to check model fit
plotDispEsts(dds_APC)
```

We do see an inverse relationship between mean and dispersion (line slopes downward from left to right). This indicates that our data is a good fit for the model.

**Generate results for the Pdgfr α+ APCs and save it to a variable called res_APC.** There is nothing to submit for this exercise, but please run the code as you will need res_APC for future exercises.

```
# Results of Wald test
res_APC <- results(dds_APC,
contrast = contrast,
alpha = 0.05)
# Shrink the log2 fold changes to be more appropriate using the apeglm method
res_APC <- lfcShrink(dds_APC,
coef = "condition_cold7_vs_TN",
res = res_APC,
type = "apeglm")
```

0 comments on commit d24d157

Please sign in to comment.