# KEGG enrichment analysis {#clusterprofiler-kegg}
```{r include=FALSE}
library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE, echo=TRUE, cache=TRUE)
library(clusterProfiler)
```
The KEGG FTP service has not been freely available for academic use since 2012, and many software packages still rely on outdated KEGG annotation data. The `r Biocpkg("clusterProfiler")` package supports downloading the latest online version of KEGG data from the [KEGG website](https://www.kegg.jp), which is freely available for academic users. Both KEGG pathways and modules are supported in `r Biocpkg("clusterProfiler")`.
## Supported organisms {#clusterProfiler-kegg-supported-organisms}
The `r Biocpkg("clusterProfiler")` package supports all organisms that have KEGG annotation data available in the KEGG database. Users should pass the KEGG organism code, an abbreviation of the scientific name (for example, `hsa` for *Homo sapiens*), to the `organism` parameter. The full list of KEGG supported organisms can be accessed via <http://www.genome.jp/kegg/catalog/org_list.html>. The [KEGG Orthology](https://www.genome.jp/kegg/ko.html) (KO) database is also supported by specifying `organism = "ko"`.
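As a minimal sketch of the `organism = "ko"` option, the chunk below runs `enrichKEGG()` on a handful of KO identifiers. The K numbers and the object names `ko_ids` and `kk_ko` are chosen for illustration only, and the call needs network access to KEGG, so the chunk is not evaluated here.

```{r eval=FALSE}
library(clusterProfiler)

# KO identifiers picked for illustration only; replace with your own K numbers.
ko_ids <- c("K00844", "K01810", "K00850", "K01623", "K01803",
            "K00134", "K00927", "K01689", "K00873")

# organism = "ko" switches the annotation to KEGG Orthology (needs network access).
kk_ko <- enrichKEGG(gene         = ko_ids,
                    organism     = "ko",
                    pvalueCutoff = 0.05)
head(kk_ko)
```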
The `r Biocpkg("clusterProfiler")` package provides the `search_kegg_organism()` function to help users search for supported organisms.
```{r}
library(clusterProfiler)
search_kegg_organism('ece', by='kegg_code')
ecoli <- search_kegg_organism('Escherichia coli', by='scientific_name')
dim(ecoli)
head(ecoli)
```
## KEGG pathway over-representation analysis {#clusterprofiler-kegg-pathway-ora}
```{r}
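# geneList: named numeric vector of fold-change values (names are gene IDs)
data(geneList, package="DOSE")
# Select genes with an absolute value above 2 as the genes of interest
gene <- names(geneList)[abs(geneList) > 2]

kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(kk)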
```
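As a rough sketch of the statistics behind over-representation analysis, the p-value for a single pathway is commonly obtained from a hypergeometric test. The counts below are hypothetical and only illustrate the calculation; they are not taken from the analysis above, and the variable names are introduced here for illustration.

```{r}
# Hypothetical counts, for illustration only
N_bg <- 8000   # background genes that have a KEGG annotation
M_pw <- 120    # background genes annotated to the pathway
n_de <- 200    # genes of interest that have a KEGG annotation
k_ov <- 9      # genes of interest annotated to the pathway

# P(X >= k_ov): probability of seeing at least k_ov overlapping genes by chance
p_value <- phyper(k_ov - 1, M_pw, N_bg - M_pw, n_de, lower.tail = FALSE)
p_value
```

Subtracting 1 from the overlap is what turns the upper tail `P(X > k_ov - 1)` into `P(X >= k_ov)`.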
The input ID type, set via the `keyType` parameter, can be `kegg`, `ncbi-geneid`, `ncbi-proteinid` or `uniprot` (see also [section 16.1.2](#bitr_kegg)). Unlike `enrichGO()`, `enrichKEGG()` has no `readable` parameter. However, users can apply the [`setReadable()` function](#setReadable) to the result if an `OrgDb` is available for the species.
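The chunk below is a minimal sketch of both points: converting IDs with `bitr_kegg()` before running `enrichKEGG()` on a different ID type, and mapping gene IDs in a result to gene symbols with `setReadable()`. It assumes that the `r Biocpkg("org.Hs.eg.db")` package is installed and that KEGG is reachable over the network; the object names `uniprot_map`, `kk_uniprot`, `kk_entrez` and `kk_readable` are introduced here for illustration.

```{r eval=FALSE}
library(clusterProfiler)
library(org.Hs.eg.db)

data(geneList, package = "DOSE")
gene <- names(geneList)[abs(geneList) > 2]

# Convert KEGG gene IDs to UniProt accessions through the KEGG API
uniprot_map <- bitr_kegg(gene, fromType = "kegg", toType = "uniprot",
                         organism = "hsa")

# Run the enrichment on UniProt accessions (second column of the mapping)
kk_uniprot <- enrichKEGG(gene         = uniprot_map[[2]],
                         organism     = "hsa",
                         keyType      = "uniprot",
                         pvalueCutoff = 0.05)

# Replace Entrez gene IDs with gene symbols in a result based on the default ID type
kk_entrez   <- enrichKEGG(gene = gene, organism = "hsa", pvalueCutoff = 0.05)
kk_readable <- setReadable(kk_entrez, OrgDb = org.Hs.eg.db, keyType = "ENTREZID")
head(kk_readable)
```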
## KEGG pathway gene set enrichment analysis {#clusterprofiler-kegg-pathway-gsea}
```{r}
kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'hsa',
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(kk2)
```
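The `geneList` passed to `gseKEGG()` is a named numeric vector sorted in decreasing order. The sketch below shows one way to build such a vector from a differential expression table; the data frame `de_table`, its column names, and its values are hypothetical and serve only as an illustration.

```{r}
# Hypothetical differential expression table; IDs and values are made up
de_table <- data.frame(
  entrez_id = c("1001", "1002", "1003", "1004", "1002"),
  log2fc    = c(1.8, -2.4, 0.3, NA, -2.4),
  stringsAsFactors = FALSE
)

# Drop rows with missing values, then keep the first row for each gene ID
de_table <- de_table[!is.na(de_table$log2fc), ]
de_table <- de_table[!duplicated(de_table$entrez_id), ]

# Build the named vector and rank it from highest to lowest value
ranked <- de_table$log2fc
names(ranked) <- de_table$entrez_id
ranked <- sort(ranked, decreasing = TRUE)
ranked
```

Keeping only the first row per ID is a simplification; how to resolve duplicated IDs depends on the data at hand.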
## KEGG module over-representation analysis {#clusterprofiler-kegg-module-ora}
[KEGG Module](http://www.genome.jp/kegg/module.html) is a collection of manually defined functional units. In some situations, KEGG Modules have a more straightforward interpretation than pathways.
```{r}
# Cutoffs of 1 disable filtering, so all tested modules are kept in the result
mkk <- enrichMKEGG(gene         = gene,
                   organism     = 'hsa',
                   pvalueCutoff = 1,
                   qvalueCutoff = 1)
head(mkk)
```
## KEGG module gene set enrichment analysis {#clusterprofiler-kegg-module-gsea}
```{r}
mkk2 <- gseMKEGG(geneList     = geneList,
                 organism     = 'hsa',
                 pvalueCutoff = 1)
head(mkk2)
```
## Visualize enriched KEGG pathways
The `r Biocpkg("enrichplot")` package implements [several methods](#enrichplot) to visualize enriched terms. Most of them are general methods that can be used on GO, KEGG, MSigDB, and other gene set annotations. Here, we introduce the `clusterProfiler::browseKEGG()` and `pathview::pathview()` functions to help users explore enriched KEGG pathways with genes of interest.
To view the KEGG pathway, users can use the `browseKEGG` function, which will open a web browser and highlight enriched genes.
```{r eval=FALSE}
browseKEGG(kk, 'hsa04110')
```
(ref:browseKEGGscap) Explore selected KEGG pathway.
(ref:browseKEGGcap) **Explore selected KEGG pathway.** Differentially expressed genes that are enriched in the selected pathway will be highlighted.
```{r browseKEGG, out.width="100%", echo=FALSE, fig.cap="(ref:browseKEGGcap)", fig.scap="(ref:browseKEGGscap)"}
knitr::include_graphics("figures/browseKEGG.png")
```
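To see which genes belong to a selected pathway in an enrichment result, the `geneID` column of the result table can be split on `/`. The sketch below uses a small mock table with the same column layout so that it runs without network access; with a real result, the table would come from `as.data.frame(kk)`. The object names and the rows of the mock table are for illustration only.

```{r}
# Mock of the relevant result columns, for illustration only
mock_result <- data.frame(
  ID          = c("hsa04110", "hsa04114"),
  Description = c("Cell cycle", "Oocyte meiosis"),
  geneID      = c("8318/991/9133", "991/9133/983"),
  stringsAsFactors = FALSE
)

# Genes of interest assigned to the selected pathway
pathway_row <- mock_result$ID == "hsa04110"
pathway_genes <- strsplit(mock_result$geneID[pathway_row], "/", fixed = TRUE)[[1]]
pathway_genes
```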
Users can also use the `pathview()` function from the `r Biocpkg("pathview")` package [@luo_pathview] to visualize enriched KEGG pathways identified by the `r Biocpkg("clusterProfiler")` package [@yu2012].
The following example illustrates how to visualize the "hsa04110" pathway, which was enriched in our previous analysis.
```{r eval=FALSE}
library("pathview")
hsa04110 <- pathview(gene.data  = geneList,
                     pathway.id = "hsa04110",
                     species    = "hsa",
                     limit      = list(gene=max(abs(geneList)), cpd=1))
```
(ref:pathviewscap) Visualize selected KEGG pathway by `pathview()`.
(ref:pathviewcap) **Visualize selected KEGG pathway by `pathview()`.** Gene expression values can be mapped to a gradient color scale.
```{r pathview, out.width="100%", echo=FALSE, fig.cap="(ref:pathviewcap)", fig.scap="(ref:pathviewscap)"}
knitr::include_graphics("figures/hsa04110_pathview.png")
```
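The `limit` argument controls the range of the color scale. As a hedged variation on the call above, the sketch below caps the gene scale at an arbitrary value of 2, so that a few extreme values do not dominate the gradient, and uses `out.suffix` to give the output image a distinct file name. The cap and the suffix `clipped` are illustrative choices, and the call needs network access to KEGG.

```{r eval=FALSE}
library("pathview")
data(geneList, package = "DOSE")

# Values beyond +/-2 are drawn with the end colors of the scale;
# the image is written to the working directory as hsa04110.clipped.png
pv_clipped <- pathview(gene.data  = geneList,
                       pathway.id = "hsa04110",
                       species    = "hsa",
                       out.suffix = "clipped",
                       limit      = list(gene = 2, cpd = 1))
```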