Day 1 answer key

hbctraining · Nov 8, 2024 · d24d157 · d24d157
1 parent ef160b8
commit d24d157
Showing 1 changed file with 105 additions and 1 deletion.
diff --git a/homework/Day_answer_key.md b/homework/Day_answer_key.md
@@ -1 +1,105 @@
-## Day 1 Answer key
+# Day 1 Answer key
+
+## Aggregating counts by celltype using pseudobulk approach
+
+**Another cell type in this dataset that was particularly interesting to the authors were the Pdgfr α+ adipose progentior cells (APCs).**
+**Subset the bulk object to isolate only adipose progenitor cells for the TN and cold7 conditions. Assign it to variable called bulk_APC.** Hint: You may need to review celltypes to determine what this cell type is called in our data.
+
+```
+celltypes
+# Compare TN vs cold7 in APC cells
+bulk_APC <- subset(bulk, subset = (celltype == "AP")  & (condition %in% c("TN", "cold7")))
+```
+
+**Plot the cell number distribution across samples. How do the numbers compare to VSM cells?**
+
+```
+# Visualize number of cells per condition
+ggplot([email protected], aes(x = sample, y = n_cells, fill = condition)) +
+  geom_bar(stat = "identity", color = "black") +
+  theme_classic() +
+  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
+  labs(x = "Sample name", y = "Number of cells") +
+  geom_text(aes(label = n_cells), vjust = -0.5)
+```
+
+Overall we see far fewer cells, by an order of magnitude (scale goes to 2,000 for VSM but only 600 for APC).
+There is also a different distribution: the counts for Sample-10 and Sample-9 go down relative to other samples, while Sample-8 goes up.
+
+**Using the code below, create a DESeq2 object for the Pdgfr α+ APCs data.** There is nothing to submit for this exercise, but please run the code as you will need dds_APC for future exercises.
+
+```
+# Get count matrix
+APC_counts <- FetchData(bulk_APC, layer = "counts", vars = rownames(bulk_APC))
+
+# Create DESeq2 object
+# transpose it to get genes as rows
+dds_APC <- DESeqDataSetFromMatrix(t(APC_counts),
+                                  colData = [email protected],
+                                  design = ~ condition)
+dds
+```
+
+## DE analysis of pseudobulk data using DESeq2
+
+**Use the dds_APC object to compute the rlog transformed counts for the Pdgfr α+ APCs.**
+
+```
+# Transform counts for data visualization
+rld_APC <- rlog(dds_APC, blind = TRUE)
+```
+
+**Create a PCA plot for the Pdgfr α+ APCs, coloring points by condition. Do samples segregate by condition? Is there more or less variability within group than observed with the VSM cells?**
+
+```
+# Plot PCA
+plotPCA(rld_APC, intgroup = c("condition")) + theme_classic()
+```
+
+Samples still segregate by TN/cold7 but there is greater variaibility than in VSM cells.
+
+**Evaluate the sample similarity using a correlation heatmap. How does this compare with the trends observed in the PCA plot?**
+
+```
+# Calculate sample correlation
+rld_APC_mat <- assay(rld_APC)
+rld_APC_cor <- cor(rld_APC_mat)
+# Change sample names to original values
+# For nicer plots
+rename_samples <- bulk_APC$sample
+colnames(rld_APC_cor) <- str_replace_all(colnames(rld_APC_cor), rename_samples)
+rownames(rld_APC_cor) <- str_replace_all(rownames(rld_APC_cor), rename_samples)
+# Plot heatmap
+anno <- [email protected] %>%
+  select(sample, condition) %>% 
+  remove_rownames() %>% 
+  column_to_rownames("sample")
+pheatmap(rld_APC_cor, annotation_col = anno, annotation_row = anno)
+```
+
+Unfortunately, the samples do not neatly separate by condition in this celltype. As shown in the PCA plot, three cold7 samples (7,8,15) are closely related on PC1, while the last cold7 sample (16) is more distant and groups with two TN samples (9,10) on PC2.
+
+**Using the code below, run DESeq2 for the Pdgfr α+ APCs data. Following that draw the dispersion plot. Based on this plot do you think there is a reasonable fit to the model?**
+
+```
+# Run DESeq2 differential expression analysis
+dds_APC <- DESeq(dds_APC)
+# Plot gene-wise dispersion estimates to check model fit
+plotDispEsts(dds_APC)
+```
+
+We do see an inverse relationship between mean and dispersion (line slopes downward from left to right). This indicates that our data is a good fit for the model.
+
+**Generate results for the Pdgfr α+ APCs and save it to a variable called res_APC.** There is nothing to submit for this exercise, but please run the code as you will need res_APC for future exercises.
+
+```
+# Results of Wald test
+res_APC <- results(dds_APC, 
+                   contrast = contrast,
+                   alpha = 0.05)
+# Shrink the log2 fold changes to be more appropriate using the apeglm method
+res_APC <- lfcShrink(dds_APC,
+                     coef = "condition_cold7_vs_TN",
+                     res  = res_APC,
+                     type = "apeglm")
+```