-
Notifications
You must be signed in to change notification settings - Fork 4
/
README.rmd
227 lines (162 loc) · 8.78 KB
/
README.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
devtools::load_all()
```
# ErmineR <img src="ermineR.png" align="right" style="height:100px"/>
This is an R wrapper for Pavlidis Lab's [ermineJ](http://erminej.msl.ubc.ca/). A tool for gene set enrichment analysis with multifunctionality correction.
Table of Contents
=================
* [Installation](#installation)
* [Usage](#usage)
* [Replicable go terms](#replicable-go-terms)
* [Annotations](#annotations)
* [Examples](#examples)
* [Use GSR with gene scores](#use-gsr-with-gene-scores)
* [Use Precision Recall with gene scores](#use-precision-recall-with-gene-scores)
* [Use ORA with a hitlist](#use-ora-with-a-hitlist)
* [Using your own GO annotations](#using-your-own-go-annotations)
## Installation
ermineR requries 64 bit version of java to function. If you are a Mac user make sure you have the java SDK.
After java is installed you can install ermineR by doing
```r
devtools::install_github('PavlidisLab/ermineR')
```
If ermineR cannot find your java home by itself. Either install rJava or use `Sys.setenv(JAVA_HOME=javaHome)` to point
ermineR to the right path.
Some users report that the ermineJ executable loses its exection privilage after installation. If this happens you will get an error like
```
"Error in (function (annotation = NULL, aspects = c("Molecular Function", :
Something went wrong. Blame the dev
sh: [library installation path]/ermineR/ermineJ-3.1.2/bin/ermineJ.sh: Permission denied "
```
To fix this just do
```
chmod +x [library installation path]/ermineR/ermineJ-3.1.2/bin/ermineJ.sh
```
You may need `sudo` depending on where you install your packages
## Usage
See documentation for `ora`, `roc`, `gsr`, `precRecall` and `corr` to see how to use them.
An explanation of what each method does is given. We recommend users start with the `precRecall` (for gene ranking-based enrichment analysis) or `ora` (for hit-list over-representation analysis).
### Replicable go terms
GO terms are updated frequently so results [can differ between versions](https://gotrack.msl.ubc.ca/). The default
option of all ermineR functions is to get the latest GO version however this means
you may get different results when you repeat the experiment later. If you want
to use a specific version of GO, ermineR provides functions to deal with that.
* `goToday`: Downloads the latest version of go to a path you provide
* `getGoDates`: Lists all dates where a go version is available, from the most recent to oldest
* `goAtDate`: Given a valid date, downloads the Go version from a specific date to a file path you provide
To use a specific version of GO, make sure to set `geneSetDescription` argument
of all ermineR functions to the file path where you saved the go terms
### Annotations
ErmineR requires annotation files to work. These files include gene identifiers
and their Go annotations, along with some optional information. By default, ermineR supports annotation files generated by [Gemma](https://gemma.msl.ubc.ca/home.html).
And will automatically download them if you provide a valid annotation name. You can get
a list of valid annotation names using `listGemmaAnnotations()`. As a general rule,
if your platform has an identifier in GEO, the identifier that starts with "GPL" is used as
the Gemma identifier as well. There are also generic annotation files available that
contain all genes from a species. These are typically named something like "Generic_human".
You can manually download these annotation files from https://gemma.msl.ubc.ca/annots/
or by using the `gemma.R::get_platform_annotations` function. ErmineR typically uses "noParents" versions
of these files since parent terms are derived using the ontology file acquired from
GO.
### Examples
#### Use GSR with gene scores
Here we will use a mock scores file located in our tests directory. The score file is specifically created to be enriched in genes with the term GO:0051082.
```{r}
scores = read.table("tests/testthat/testFiles/pValues", header=T, row.names = 1)
head(scores)
```
This scores file only includes scores for 118 genes. The file was generated using GPL96's probesets so that is the annotation
we'll be using. Any gene that is not reperesented by the score file will be ignored.
```{r}
gsrOut = gsr(annotation = 'GPL96',
scores = scores,
scoreColumn = 1,
iterations = 10000,
bigIsBetter = FALSE,
logTrans = TRUE)
head(gsrOut$results) %>% knitr::kable()
```
#### Use Precision Recall with gene scores
We will use the same scores file from the example above
```{r}
precRecallOut = precRecall(annotation = 'GPL96',
scores = scores,
scoreColumn = 1,
iterations = 10000,
bigIsBetter = FALSE,
logTrans = TRUE)
head(precRecallOut$results) %>% knitr::kable()
```
#### Use ORA with a hitlist
```{r,message=FALSE}
library(dplyr)
# genes for GO:0051082
hitlist = c("AAMP", "AFG3L2", "AHSP", "AIP", "AIPL1", "APCS", "BBS12",
"CALR", "CALR3", "CANX", "CCDC115", "CCT2", "CCT3", "CCT4", "CCT5",
"CCT6A", "CCT6B", "CCT7", "CCT8", "CCT8L1P", "CCT8L2", "CDC37",
"CDC37L1", "CHAF1A", "CHAF1B", "CLGN", "CLN3", "CLPX", "CRYAA",
"CRYAB", "DNAJA1", "DNAJA2", "DNAJA3", "DNAJA4", "DNAJB1", "DNAJB11",
"DNAJB13", "DNAJB2", "DNAJB4", "DNAJB5", "DNAJB6", "DNAJB8",
"DNAJC4", "DZIP3", "ERLEC1", "ERO1B", "FYCO1", "GRPEL1", "GRPEL2",
"GRXCR2", "HEATR3", "HSP90AA1", "HSP90AA2P", "HSP90AA4P", "HSP90AA5P",
"HSP90AB1", "HSP90AB2P", "HSP90AB3P", "HSP90AB4P", "HSP90B1",
"HSP90B2P", "HSPA1A", "HSPA1B", "HSPA1L", "HSPA2", "HSPA5", "HSPA6",
"HSPA8", "HSPA9", "HSPB6", "HSPD1", "HSPE1", "HTRA2", "LMAN1",
"MDN1", "MKKS", "NAP1L4", "NDUFAF1", "NPM1", "NUDC", "NUDCD2",
"NUDCD3", "PDRG1", "PET100", "PFDN1", "PFDN2", "PFDN4", "PFDN5",
"PFDN6", "PIKFYVE", "PPIA", "PPIB", "PTGES3", "RP2", "RUVBL2",
"SCAP", "SCG5", "SERPINH1", "SHQ1", "SIL1", "SPG7", "SRSF10",
"SRSF12", "ST13", "SYVN1", "TAPBP", "TCP1", "TMEM67", "TOMM20",
"TOR1A", "TRAP1", "TTC1", "TUBB4B", "UGGT1", "ZFYVE21")
oraOut = ora(annotation = 'Generic_human',
hitlist = hitlist)
head(oraOut$results) %>% knitr::kable()
```
#### Using your own GO annotations
If you want to use your own GO annotations instead of getting files provided by
Pavlidis Lab, you can use `makeAnnotation` after turning your annotations into
a list. See the example below
```{r, message = FALSE}
library('org.Hs.eg.db') # get go terms from bioconductor
goAnnots = as.list(org.Hs.egGO)
goAnnots = goAnnots %>% lapply(names)
goAnnots %>% head
```
The goAnnots object we created has go terms per entrez ID. Similar lists can be
obtained from other species db packages in bioconductor and some array annotation
packages. We will now use the `makeAnnotation` function to create our annotation
file. This file will have the names of this list (entrez IDs) as gene identifiers
so any score or hitlist file you provide should have the entrez IDs as well.
`makeAnnotation` only needs the list with gene identifiers and go terms to work.
But if you want to have a complete annotation file you can also provide gene symbols
and gene names. Gene names have no effect on the analysis. Gene symbols matter if
you are [providing custom gene sets](http://erminej.msl.ubc.ca/help/input-files/gene-sets/)
and using "Option 2" or if same genes are represented by multiple gene
identifiers (eg. probes). Gene symbols will also be returned in the `GeneMembers`
column of the output. If they are not provided, gene IDs will also be used as gene symbols
Here we'll set them both for good measure.
```{r}
geneSymbols = as.list(org.Hs.egSYMBOL) %>% unlist
geneName = as.list(org.Hs.egGENENAME) %>% unlist
annotation = makeAnnotation(goAnnots,
symbol = geneSymbols,
name = geneName,
output = NULL, # you can choose to save the annotation to a file
return = TRUE) # if you only want to save it to a file, you don't need to return
```
Now that we have the annotation object, we can use it to run an analysis. We'll try
to generate a hitlist only composed of genes annotated with GO:0051082.
```{r}
mockHitlist = goAnnots %>% sapply(function(x){'GO:0051082' %in% x}) %>%
{goAnnots[.]} %>%
names
mockHitlist %>% head
oraOut = ora(annotation = annotation,
hitlist = mockHitlist)
head(oraOut$results) %>% knitr::kable()
```
We can see GO:0051082 is the top scoring hit as expected.