---
title: "Evaluating the recount2 subsampled models"
output: html_notebook
---
**J. Taroni 2018**

In `11-subsample_recount_PLIER.R`, we subsampled the recount2 dataset ten times;
each subsample contains the same number of samples as the SLE WB compendium
(`n = 1640`).
We trained a PLIER model on each of the ten randomly selected datasets.

Here, we'll evaluate the ten models in the following ways:

* Sparsity of `U` (prior information coefficient matrix; proxy for "ease
  of interpretation")
* Number of latent variables
* Pathway coverage (e.g., what percentage of pathways are associated with an
  LV, how many LVs have a pathway significantly associated with them)
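
To make the three metrics listed above concrete, here is a minimal sketch on a made-up `U` matrix. It is an illustration only: the matrix `toy.U` and its values are hypothetical, and a nonzero entry stands in for "associated". The actual evaluation in `util/plier_util.R` may define these quantities differently (for example, by using significance testing for pathway associations).

```r
# Toy U matrix: rows are pathways (gene sets), columns are latent variables.
# All values are hypothetical and chosen only for illustration.
toy.U <- matrix(
  c(0.8, 0, 0,
    0,   0, 0.3,
    0,   0, 0,
    0.5, 0, 0.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("pathway", 1:4), paste0("LV", 1:3))
)

# Assumption: an entry > 0 means "this pathway is associated with this LV"
associated <- toy.U > 0

# Sparsity-style summary: proportion of nonzero entries in each LV (column)
prop.nonzero <- colMeans(associated)

# Number of latent variables
num.lvs <- ncol(toy.U)

# Pathway coverage, two views:
# proportion of pathways associated with at least one LV
pathway.coverage <- mean(rowSums(associated) > 0)
# proportion of LVs associated with at least one pathway
lv.coverage <- mean(colSums(associated) > 0)
```

A column with few nonzero entries is easier to interpret because the LV maps onto a small number of pathways.
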
## Functions and directory set up
```{r}
`%>%` <- dplyr::`%>%`
source(file.path("util", "plier_util.R"))
```
```{r}
# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "15")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "15")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)
```
## Main evaluation
```{r}
# directory where the models RDS were saved
subsampled.dir <- file.path("results", "11")
# list files in the directory -- we'll use lapply to generate a list of lists
plier.files <- list.files(subsampled.dir, full.names = TRUE)
```
```{r}
# read in models -- each of the files contains a list where PLIER corresponds
# to the PLIER model
model.list <- lapply(plier.files, function(x) readRDS(x)$PLIER)
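# name each model after its file: drop the directory and the .RDS extension
# (the dot is escaped and the pattern anchored so only the extension is removed)
names(model.list) <- sub("\\.RDS$", "", basename(plier.files))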
# evaluate models with wrapper function
eval.list <- lapply(model.list, EvalWrapper)
```
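
The model names come from the file names, so it is worth checking how the extension is stripped. The sketch below uses hypothetical file names (the real contents of `results/11` are not shown here) to compare an unescaped pattern with an escaped, anchored one.

```r
# Hypothetical file paths, for illustration only
toy.files <- c(
  file.path("results", "11", "subsampled_1.RDS"),
  file.path("results", "11", "myRDSfile_2.RDS")
)

# Unescaped pattern: "." matches any character and sub() replaces the FIRST
# match, so "yRDS" inside "myRDSfile_2.RDS" is removed instead of the extension
loose.names <- sub(".RDS", "", basename(toy.files))

# Escaped and anchored pattern: only a literal ".RDS" at the end is removed
strict.names <- sub("\\.RDS$", "", basename(toy.files))
```

With file names that contain "RDS" only in the extension, both patterns give the same result; the anchored form simply guards against the less tidy case.
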
```{r}
# reshape list to data.frame for wrangling
eval.df <- reshape2::melt(eval.list)
colnames(eval.df) <- c("value", "pathway_coverage_type", "metric", "model")

# U sparsity -- we'll keep all and significant only in the same data.frame
sparsity.df <- eval.df %>%
  dplyr::filter(metric %in% c("all.sparsity", "sig.sparsity")) %>%
  dplyr::mutate(sparsity_type = metric) %>%
  dplyr::select(c(model, sparsity_type, value))

# number of lvs
num.lvs.df <- eval.df %>%
  dplyr::filter(metric == "num.lvs") %>%
  dplyr::mutate(num_lvs = value) %>%
  dplyr::select(c(model, num_lvs))

# pathway coverage
pathway.df <- eval.df %>%
  dplyr::filter(metric == "pathway.coverage") %>%
  dplyr::select(c(model, pathway_coverage_type, value))
```
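
The four column names assigned after `reshape2::melt` follow from how `melt` handles nested lists: it adds one label column per nesting level, innermost level first. The sketch below shows this on a toy list. The structure of `toy.eval` is an assumption about what `EvalWrapper` returns (model, then metric, then an optional third level for pathway coverage), and the values are made up.

```r
# Toy stand-in for eval.list; structure and values are hypothetical
toy.eval <- list(
  model_a = list(
    num.lvs = 3,
    pathway.coverage = list(pathway = 0.4, lv = 0.6)
  ),
  model_b = list(
    num.lvs = 5,
    pathway.coverage = list(pathway = 0.5, lv = 0.2)
  )
)

# melt() on a nested list returns a long data.frame with a value column
# followed by one label column per nesting level (L3, L2, L1 here)
toy.df <- reshape2::melt(toy.eval)
default.names <- colnames(toy.df)

# Rename to match the notebook: L3 is the pathway coverage type (NA for
# metrics without a third level), L2 is the metric, L1 is the model
colnames(toy.df) <- c("value", "pathway_coverage_type", "metric", "model")
```

Rows that come from metrics without a third level, such as `num.lvs` in this toy list, carry `NA` in `pathway_coverage_type`, which is why that column is dropped for the sparsity and LV count data frames.
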
```{r}
# write to file
sparsity.file <- file.path(results.dir, "subsampled_sparsity.tsv")
readr::write_tsv(sparsity.df, sparsity.file)
num.file <- file.path(results.dir, "subsampled_num_lvs.tsv")
readr::write_tsv(num.lvs.df, num.file)
pathway.file <- file.path(results.dir, "subsampled_pathway.tsv")
readr::write_tsv(pathway.df, pathway.file)
```