forked from geek-lunch/workshops
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Visualizing_extreme_data.Rmd
143 lines (102 loc) · 4.42 KB
/
Visualizing_extreme_data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
---
title: "Sweet insets for extreme data"
author: "Florent Déry"
date: "26/02/2020"
output: pdf_document
---
Sometimes, we have rare extreme data in a dataset that are informative, but enlarge so much the scale of the response variable we are looking at that we can't see nothing among non-extreme variables.
There is some possibilities when using `base` functions to generate figures (e.g. breaks in axis: through `R` package `plotrix`), but `ggplot2` fans are left with nothing except anger and resentment. However, some turn-arounds exist, such as including insets showing the non-extreme variables in a plot that contains all values.
This mini-tutorial shoes how to do this with `ggplot2` figures combined to `grid` package, often already installed by default on `R`. Make sure you have both packages to run what follows.
## 1. Generate fake data with extreme values
![Me in X-Country ski]("/figures/fast_learner_fast_skier.png")
This dataset (`df`) is fictive and represent the number of falls/hour (`y`) you take depending on your experience in Cross-Country skiing (`x` ; weeks). I hope that everybody was or will be better than that when they first try it.
```{r generate_data, message=FALSE, warning=FALSE, echo=TRUE}
df=data.frame(
y=c(
sample(c(0:400), 100, replace=T),
sample(c(3:7), 20, replace=T),
sample(c(1:4), 20, replace=T),
sample(c(2:3), 20, replace=T),
sample(c(0:2), 20, replace=T),
sample(c(0:1), 20, replace=T)
),
x=c(
rep(0, times=100),
rep(1, times=20),
rep(2, times=20),
rep(3, times=20),
rep(4, times=20),
rep(5, times=20)
)
)
```
## 2. Generate the base figure
```{r load_packages, message=FALSE, warning=FALSE}
require(ggplot2)
```
```{r plot_main, echo=TRUE, warning=FALSE}
mainplot <- ggplot(data = df, aes(y = y, x = x) ) +
geom_jitter(width = 0.1, height = 0) +
geom_smooth()+
geom_rect(data=df,aes(xmin=0.8,
xmax = 5.2,
ymin = -5,
ymax = 12), color = "black", fill=NA,
linetype = 2)+
xlab('Time since you first X-country skied (weeks)')+
ylab('Number of falls per hour of X-country skiing')+
theme_bw()+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour="black", linetype="solid"),
panel.background = element_rect(fill = "white",colour = "white", size = 0.5, linetype = "solid"),
panel.border = element_blank())
mainplot
```
From that, We don't see if there is an amelioration or not after your first trial, Although we are still amazed that you kept trying after your first time! Ouh... You definitely know how to get back up.
## 3. Generate the inset figure
```{r plot_inset, eval=FALSE, echo=TRUE, warning=FALSE}
inset=ggplot(data=df[df$x>0,],
aes(x=x, y=y)) +
geom_jitter(width = 0.05, height = 0) +
geom_smooth(method = lm, se = F)+
xlab('')+
ylab('')+
theme_bw()
```
Ok, So now we have a better Idea! Let's combine both!
## Adding an inset ggplot figure to another main ggplot figure using `grid`!
```{r combine_inset_main_plots, eval=FALSE, echo=TRUE, warning=FALSE}
png(file="fast_learner_fast_skier.png",
w=7,
h=7,
res=720,
units = 'in')# Where we save our figure as a PNG
grid::grid.newpage()
v1<-grid::viewport(width = 1, height = 1, x = 0.5, y = 0.5) # Viewport 1 (plot area for the main plot)
v2<-grid::viewport(width = 0.6, height = 0.5, x = 0.65, y = 0.71) # Viewport 2 (plot area for the inset plot)
print(mainplot,vp=v1)
print(inset,vp=v2)
dev.off()#clear device and save figure
```
![Ski learning curve]("/figures/fast_learner_fast_skier.png")
## Adding an inset ggplot figure to another main ggplot figure using `cowplot`
And even adding arrows and rectangle!
```{r combine_with_cowplot, eval=FALSE, echo=TRUE}
require(cowplot)
combined=ggdraw()+
draw_plot(mainplot, x=0,y=0,height=1, width = 1)+
draw_plot(inset, x=0.35,y=.45, height = 0.5, width=0.6)+
draw_line(x= c(.26, 0.4),
y= c(0.18, 0.52),
linetype = 2,
arrow = arrow(length = unit( .01, "npc"), type = "closed")
)+
draw_line(x=c(0.95,0.94),
y=c(0.18,.52),
linetype = 2,
arrow = arrow(length = unit( .01, "npc"), type = "closed"))
ggsave(combined, filename ="fast_learner_fast_skier2.png",
width = 7,
height = 7) #alternate way to save it
```