forked from YuLab-SMU/biomedical-knowledge-mining-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
16_dplyr.Rmd
140 lines (97 loc) · 3.49 KB
/
16_dplyr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# (PART\*) Part III: Miscellaneous topics {-}
# dplyr verbs for manipulating enrichment result {#clusterProfiler-dplyr}
```{r include=F}
library(DOSE)
library(ggplot2)
library(forcats)
library(enrichplot)
theme_set(theme_grey())
select <- clusterProfiler:::select.enrichResult
```
```{r}
library(DOSE)
data(geneList)
de = names(geneList)[1:100]
x = enrichDO(de)
```
## filter
```{r}
filter(x, p.adjust < .05, qvalue < 0.2)
```
## arrange
```{r}
mutate(x, geneRatio = parse_ratio(GeneRatio)) %>%
arrange(desc(geneRatio))
```
## select
```{r}
select(x, -geneID) %>% head
```
## mutate
(ref:richfactorplotscap) Visualizing rich factor of enriched terms using lolliplot.
(ref:richfactorplotcap) **Visualizing rich factor of enriched terms using lolliplot.**
```{r dot-richfactor, fig.width=7, fig.height=7, fig.cap="(ref:richfactorplotcap)", fig.scap="(ref:richfactorplotscap)"}
# k/M
y <- mutate(x, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))
y
library(ggplot2)
library(forcats)
library(enrichplot)
ggplot(y, showCategory = 20,
aes(richFactor, fct_reorder(Description, richFactor))) +
geom_segment(aes(xend=0, yend = Description)) +
geom_point(aes(color=p.adjust, size = Count)) +
scale_color_viridis_c(guide=guide_colorbar(reverse=TRUE)) +
scale_size_continuous(range=c(2, 10)) +
theme_minimal() +
xlab("rich factor") +
ylab(NULL) +
ggtitle("Enriched Disease Ontology")
```
A very similar concept is Fold Enrichment, which is defined as the ratio of two proportions, (k/n) / (M/N). Using `mutate` to add the fold enrichment variable is also easy:
```r
mutate(x, FoldEnrichment = parse_ratio(GeneRatio) / parse_ratio(BgRatio))
```
## slice
We can use `slice` to choose rows by their ordinal position in the enrichment result. Grouped result use the ordinal position with the group.
In the following example, a GSEA result of Reactome pathway was sorted by the absolute values of NES and the result was grouped by the sign of NES. We then extracted first 5 rows of each groups. The result was displayed in Figure \@ref(fig:sliceNES).
(ref:sliceNESscap) Choose pathways by ordinal positions.
(ref:sliceNEScap) **Choose pathways by ordinal positions.**
```{r sliceNES, fig.width=9, fig.height=5, fig.cap="(ref:sliceNEScap)", fig.scap="(ref:sliceNESscap)"}
library(ReactomePA)
x <- gsePathway(geneList)
y <- arrange(x, abs(NES)) %>%
group_by(sign(NES)) %>%
slice(1:5)
library(forcats)
library(ggplot2)
library(ggstance)
library(enrichplot)
ggplot(y, aes(NES, fct_reorder(Description, NES), fill=qvalues), showCategory=10) +
geom_col(orientation='y') +
scale_fill_continuous(low='red', high='blue', guide=guide_colorbar(reverse=TRUE)) +
theme_minimal() + ylab(NULL)
```
## summarise
(ref:pBarplotscap) Distribution of adjusted p-values.
(ref:pBarplotcap) **Distribution of adjusted p-values.**
```{r p-bar, fig.width=8, fig.height=6, fig.cap="(ref:pBarplotcap)", fig.scap="(ref:pBarplotscap)"}
pbar <- function(x) {
pi=seq(0, 1, length.out=11)
mutate(x, pp = cut(p.adjust, pi)) %>%
group_by(pp) %>%
summarise(cnt = n()) %>%
ggplot(aes(pp, cnt)) + geom_col() +
theme_minimal() +
xlab("p value intervals") +
ylab("Frequency") +
ggtitle("p value distribution")
}
x <- enrichDO(de, pvalueCutoff=1, qvalueCutoff=1)
set.seed(2020-09-10)
random_genes <- sample(names(geneList), 100)
y <- enrichDO(random_genes, pvalueCutoff=1, qvalueCutoff=1)
p1 <- pbar(x)
p2 <- pbar(y)
cowplot::plot_grid(p1, p2, ncol=1, labels = LETTERS[1:2])
```