---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  error = TRUE
)
```
# aPPR
<!-- badges: start -->
[![R-CMD-check](https://github.com/RoheLab/aPPR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/RoheLab/aPPR/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/RoheLab/aPPR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/RoheLab/aPPR?branch=main)
<!-- badges: end -->
`aPPR` helps you calculate approximate personalized pageranks from large graphs, including those that can only be queried via an API. `aPPR` additionally performs degree correction and regularization, allowing you to recover blocks from stochastic blockmodels.
To learn more about `aPPR` you can:
1. Glance through slides from the [JSM2021](https://github.com/alexpghayes/JSM2021) talk
2. Read the accompanying [paper][chen]
### Installation
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("RoheLab/aPPR")
```
### Find the personalized pagerank of a node in an `igraph` graph
```{r igraph-example, message = FALSE}
library(aPPR)
library(igraph)
set.seed(27)
erdos_renyi_graph <- sample_gnp(n = 100, p = 0.5)
erdos_tracker <- appr(
  erdos_renyi_graph,  # the graph to work with
  seeds = "5",        # name of seed node (character)
  epsilon = 0.0005    # desired approximation quality (see ?appr)
)
erdos_tracker
```
You can access the Personalized PageRanks themselves via the `stats` field of `Tracker` objects.
```{r}
erdos_tracker$stats
```
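The degree correction and regularization mentioned in the introduction can be sketched in a few lines of base R. This is only an illustration under stated assumptions: it assumes the degree-adjusted score is the PageRank divided by the node's in-degree, and that the regularized score first adds a constant `tau` to the in-degree, with `tau` set to the average in-degree of the observed nodes. The data frame, its values, and its column names are invented for this example and are not output from `aPPR`.

```r
# Toy values, invented for illustration only (not output from aPPR).
scores <- data.frame(
  name = c("a", "b", "c", "d"),
  p = c(0.40, 0.30, 0.20, 0.10),  # approximate personalized PageRanks
  in_degree = c(100, 10, 4, 1)
)

# Assumption: regularize with the average in-degree of the observed nodes.
tau <- mean(scores$in_degree)

# Degree adjustment divides by in-degree; regularization inflates the
# in-degree by tau first, which dampens very low-degree nodes.
scores$degree_adjusted <- scores$p / scores$in_degree
scores$regularized <- scores$p / (scores$in_degree + tau)

# Rank nodes by the regularized score, highest first.
scores[order(scores$regularized, decreasing = TRUE), ]
```

In this toy data, plain degree adjustment ranks the node with in-degree 1 (`d`) first, while the regularized score ranks `b` first, which shows why regularization can matter for nodes with very few incoming edges.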
Sometimes you may wish to limit computation time by limiting the number of nodes to visit, which you can do as follows:
```{r igraph-example2}
limited_visits_tracker <- appr(
  erdos_renyi_graph,
  seeds = "5",
  epsilon = 1e-10,
  max_visits = 20  # max unique nodes to visit during approximation
)
limited_visits_tracker
```
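To make the role of `epsilon` more concrete, here is a minimal base-R sketch of a push-style approximation in the spirit of Andersen, Chung, and Lang (reference 2). It is not the code `aPPR` runs. The function `approx_ppr()`, its arguments, and the adjacency-list input are hypothetical, the graph is assumed to be simple (no repeated edges), and the `max_pushes` cap is a stand-in for a computation budget rather than the same thing as `max_visits`.

```r
# Hypothetical helper, not part of aPPR: lazy push-style approximate
# personalized PageRank on an adjacency list (list of neighbor indices).
approx_ppr <- function(adj, seed, alpha = 0.15, epsilon = 1e-4,
                       max_pushes = 1e5) {
  n <- length(adj)
  degree <- lengths(adj)  # number of neighbors of each node
  p <- numeric(n)         # current PageRank estimates
  r <- numeric(n)         # residual mass that has not been pushed yet
  r[seed] <- 1            # all mass starts on the seed node

  pushes <- 0
  repeat {
    # nodes whose residual is still large relative to their degree
    active <- which(degree > 0 & r >= epsilon * degree)
    if (length(active) == 0 || pushes >= max_pushes) break

    u <- active[1]
    mass <- r[u]
    p[u] <- p[u] + alpha * mass     # settle a share of the mass at u
    r[u] <- (1 - alpha) * mass / 2  # lazy walk: half of the rest stays at u
    nbrs <- adj[[u]]
    # the other half is split evenly among the neighbors of u
    r[nbrs] <- r[nbrs] + (1 - alpha) * mass / (2 * degree[u])
    pushes <- pushes + 1
  }

  list(p = p, r = r, pushes = pushes)
}

# A ring of six nodes, with node 1 as the seed.
adj <- list(c(2, 6), c(1, 3), c(2, 4), c(3, 5), c(4, 6), c(5, 1))
res <- approx_ppr(adj, seed = 1)
res$p
```

Smaller values of `epsilon` force more pushes before every residual drops below `epsilon` times the node's degree, which is why a very small `epsilon` is usually paired with a cap on the amount of work.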
### Find the personalized pagerank of a Twitter user using `rtweet`
```{r rtweet-example}
ftrevorc_ppr <- appr(
  rtweet_graph(),
  "ftrevorc",
  epsilon = 1e-4,
  max_visits = 5
)
ftrevorc_ppr
```
### Logging
`aPPR` uses [`logger`](https://daroczig.github.io/logger/) for displaying information to the user. By default, `aPPR` is quite verbose. You can control verbosity by loading `logger` and setting the logging threshold.
```{r logging-example-1, eval = FALSE}
library(logger)
# hide basically all messages (not recommended)
log_threshold(FATAL, namespace = "aPPR")
appr(
  erdos_renyi_graph,  # the graph to work with
  seeds = "5",        # name of seed node (character)
  epsilon = 0.0005    # desired approximation quality (see ?appr)
)
```
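A less drastic option is to keep warnings and errors while silencing routine progress messages. The sketch below uses `log_threshold()` and the `WARN` level from `logger`. It assumes, as in the example above, that `aPPR` logs under the `"aPPR"` namespace.

```r
library(logger)

# show only warnings and more severe messages from aPPR
log_threshold(WARN, namespace = "aPPR")

# calling log_threshold() without a level returns the current threshold,
# which is a quick way to confirm the change
current_threshold <- log_threshold(namespace = "aPPR")
current_threshold
```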
If you submit a bug report, please please please include a log file using the TRACE threshold. You can set up this kind of detailed logging via the following:
```{r log-file-example, eval = FALSE}
set.seed(528491) # be sure to set seed for bug reports
log_appender(
  appender_file(
    "/path/to/logfile.log"  ## TODO: choose a path to log to
  ),
  namespace = "aPPR"
)
log_threshold(TRACE, namespace = "aPPR")
tracker <- appr(
  rtweet_graph(),
  seeds = c("hadleywickham", "gvanrossum"),
  epsilon = 1e-6
)
```
### Ethical considerations
People have a right to choose how public and discoverable their information is. `aPPR` will often lead you to accounts that are interesting, but also small and out of sight. Do not change the public profile of, or direct attention towards, the people running these accounts, or any other accounts, without their permission.
### References
1. Chen, Fan, Yini Zhang, and Karl Rohe. “Targeted Sampling from Massive Block Model Graphs with Personalized PageRank.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, no. 1 (February 2020): 99–126. https://doi.org/10.1111/rssb.12349. [arxiv][chen]
2. Andersen, Reid, Fan Chung, and Kevin Lang. “Local Graph Partitioning Using PageRank Vectors.” In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), 475–86. Berkeley, CA, USA: IEEE, 2006. https://doi.org/10.1109/FOCS.2006.44.
[chen]: https://arxiv.org/abs/1910.12937