# A case study {#sec-spring1}
```{r}
#| echo: false
#| message: false
#| results: asis
source("common.R")
status("polishing")
```
```{r}
#| include: false
library(grid)
```
@sec-extensions provided a high-level overview of creating ggplot2 extensions, building on the discussion of ggplot2 internals in @sec-internals.
In this chapter we'll take that knowledge a step further, providing a deeper dive into the process of developing full-featured extensions.
To do this, we'll take a single example --- building a new geom that looks like a spring --- and follow it all the way through the development process.
This is a carefully crafted example.
You're unlikely to actually want to use springs to visualise your data, so ggplot2 doesn't already provide a spring geom.
However, springs are just complicated enough to illustrate the most important parts of the process.
This makes them ideal for our purposes!
We'll develop the extension in five phases:
1. We'll start as simple as possible by using the existing `geom_path()`, pairing it with a new stat (@sec-spring-stat).
2. The new stat only allows fixed diameter and tension, so we'll next allow these to be used as aesthetics (@sec-spring2).
3. A stat is a great place to start but has some fundamental restrictions so we'll next convert our work to a proper geom (@sec-spring3).
4. Geoms can only use dimensions relative to the data, and can't use absolute sizes like 2cm, so next we'll show you how to draw the spring with grid (@sec-spring4).
5. We'll finish up by providing a custom scale and legend to pair with the geom (@sec-spring5).
Once you've worked your way through this chapter, we highly recommend browsing the ggplot2 source code to look at how other stats and geoms are implemented.
They will often be more complicated than what you need, but they'll give you a sense of the things you can do.
## What is a spring? {#sec-intuitive-spring}
Developing an extension usually starts with an idea of what you want to draw.
In this case, we want to draw a spring between two points, so we need some code that will draw a plausible-looking spring!
There are probably many ways to do this, but one simple approach is to draw a circle while moving the "pen" in one direction.
Here's a data set that defines a circle using 100 points:
```{r}
circle <- tibble(
  radians = seq(0, 2 * pi, length.out = 100),
  x = cos(radians),
  y = sin(radians),
  index = 1:100,
  type = "circle"
)
ggplot(circle, aes(x = x, y = y, alpha = -index)) +
  geom_path(show.legend = FALSE) +
  coord_equal()
```
Using dplyr, we might transform this circle into a spring that stretches along the x axis like this:
```{r}
spring <- circle %>%
  mutate(
    motion = seq(0, 1, length.out = 100),
    x = x + motion,
    type = "spring"
  )
ggplot(spring, aes(x = x, y = y, alpha = -index)) +
  geom_path(show.legend = FALSE) +
  coord_equal()
```
In this case our "spring" has only looped around once -- and doesn't look much like an actual spring -- but if we were to continue tracing the circle while moving along the x axis we'd end up with a spring with multiple loops.
The faster we move the "pen", the more we will stretch the spring.
This gives us some insight into the two parameters that characterise our springs:
- The `diameter` of the spring, defined by the size of the circle.
- The `tension` of the spring, defined by how fast we move along `x`.
Although this is probably not a physically correct parameterisation of springs in the real world, it's good enough for our purposes.
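To see how the two parameters play out, here is a minimal base R sketch that keeps tracing the circle for several revolutions while the pen moves along x. The helper name `trace_spring()` and its arguments (`loops`, `speed`, `n`) are hypothetical and exist only for this illustration; they are not part of the extension developed in the rest of the chapter.

```r
# Hypothetical helper: trace a circle `loops` times while the "pen" moves along x.
# `speed` is the distance travelled along x per revolution.
trace_spring <- function(loops = 5, diameter = 1, speed = 0.5, n = 100) {
  n_points <- loops * n
  radians <- seq(0, loops * 2 * pi, length.out = n_points)
  motion <- seq(0, loops * speed, length.out = n_points)
  data.frame(
    x = cos(radians) * diameter / 2 + motion,
    y = sin(radians) * diameter / 2
  )
}

loose <- trace_spring(speed = 1)
tight <- trace_spring(speed = 0.25)

# Same number of loops, but the faster pen covers more ground along x
diff(range(loose$x))
diff(range(tight$x))
```

Passing either data frame to `geom_path()` with `coord_equal()` should give a spring with five loops, stretched more or less depending on `speed`.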
Now that we have a method for drawing springs, it's worth spending a little time thinking about what a geom based on this method will require.
The code we've written up to this point is perfectly fine for a single plot, but there are new questions to consider when creating an extension:
- How will we specify the diameter of a spring?
- How do we keep the circles circular even as we change the aspect ratio of the plot?
- Can we map diameter and tension to variables in the data?
- Should diameter and tension be parameters that must be the same for all springs in a layer or should they be scaled aesthetics that can vary from spring to spring?
- If we plan to distribute our spring geom to other R users, do we want to depend on the dplyr package?
We'll consider these questions as we work through the chapter.
## Part 1: A stat {#sec-spring-stat}
Let's start turning this idea into a ggplot2 extension.
Because we're creating an extension that draws a new ggplot2 layer, we need to decide whether the ggproto object we create should be a `Stat` or a `Geom`.
Perhaps surprisingly, this decision isn't guided by whether we want to end up with `geom_spring()` or `stat_spring()`: there are a lot of `Stat` extensions that are used via a `geom_*()` constructor.
A better way to think about this decision is to consider if we can use an existing geom with transformed data.
In that case we can use a `Stat` which is usually simpler to code than a `Geom`.
The code we wrote in the last section fits this description nicely.
All we're doing is drawing a path, but we're circling around instead of going in a straight line.
That suggests we can use a `Stat` that transforms the data and then uses `GeomPath` to take care of the actual drawing.
### Building functionality
Whenever you are developing a new `Stat` a sensible strategy is to begin by writing the data transformation function, and then once it's working incorporating it into a ggproto `Stat`.
In this case, we're going to need a `create_spring()` function that takes a start point, an end point, a diameter, and a tension.
More precisely:
- Our start point will be defined by arguments `x` and `y`.
- Our end point will be defined by arguments `xend` and `yend`.
- The `diameter` argument will be used to scale the size of our circle.
- Defining `tension` is slightly trickier. The quantity we actually want to express is "how far the spring moves relative to the size of the circles". So we'll define `tension` to refer to the total distance moved from the start point to the end point, divided by the size of the circles.[^ext-springs-1]
- We'll also have a parameter `n` to give the number of points used per revolution, defining the visual fidelity of the spring.
[^ext-springs-1]: If you have a background in statistics, you'll recognise this as roughly analogous to how a z-statistic is calculated.
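As a rough worked example of this definition, the sketch below computes how many revolutions a single spring needs. The numbers are illustrative values only, and the variable names are chosen for this example.

```r
# Start and end points of a single spring (illustrative values)
x <- 4
y <- 2
xend <- 10
yend <- 6
diameter <- 2
tension <- 0.6
n <- 50

# Straight-line distance between the two points
spring_length <- sqrt((xend - x)^2 + (yend - y)^2)

# Each revolution advances the spring by diameter * tension
n_revolutions <- spring_length / (diameter * tension)

# n points are used per revolution
n_points <- n * n_revolutions

c(length = spring_length, revolutions = n_revolutions, points = n_points)
```

With these values the spring is about 7.2 units long and each revolution covers 1.2 units, so the spring makes roughly six revolutions. Halving `tension` would double the number of revolutions over the same distance.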
We can now write code for our `create_spring()` function:
```{r}
create_spring <- function(x,
                          y,
                          xend,
                          yend,
                          diameter = 1,
                          tension = 0.75,
                          n = 50) {
  # Validate the input arguments
  if (tension <= 0) {
    rlang::abort("`tension` must be larger than zero.")
  }
  if (diameter == 0) {
    rlang::abort("`diameter` can not be zero.")
  }
  if (n <= 0) {
    rlang::abort("`n` must be greater than zero.")
  }
  # Calculate the direct length of the spring path
  length <- sqrt((x - xend)^2 + (y - yend)^2)
  # Calculate the number of revolutions and points we need
  n_revolutions <- length / (diameter * tension)
  n_points <- n * n_revolutions
  # Calculate the sequence of radians and the x and y offset values
  radians <- seq(0, n_revolutions * 2 * pi, length.out = n_points)
  x <- seq(x, xend, length.out = n_points)
  y <- seq(y, yend, length.out = n_points)
  # Create and return the transformed data frame
  data.frame(
    x = cos(radians) * diameter / 2 + x,
    y = sin(radians) * diameter / 2 + y
  )
}
```
This function preserves the logic of the spring code we wrote in @sec-intuitive-spring, but it does a few new things that matter a lot when writing extensions:
- It is precise in specifying the parameters that define the spring.
- It explicitly checks the input, and uses `rlang::abort()` to throw an error if the user passes an invalid value to the function.
- It uses base R functions to do the work: there is no dplyr code in this function because we don't want our `Stat` to depend on dplyr.[^ext-springs-2]
[^ext-springs-2]: If you have experience developing packages you might wonder about the choice to use `rlang::abort()` rather than using the base `stop()` function.
We could certainly have chosen to use the base R function here, but since ggplot2 itself uses the rlang package it makes very little difference in this case.
One nice thing about writing `create_spring()` as a function is that we can test it out[^ext-springs-3] to convince ourselves that the logic works:
[^ext-springs-3]: If we were planning to bundle this code as an R package, we could expand on this and write formal unit tests for `create_spring()` using the testthat package.
```{r}
spring <- create_spring(
  x = 4, y = 2, xend = 10, yend = 6,
  diameter = 2, tension = 0.6, n = 50
)
ggplot(spring) +
  geom_path(aes(x = x, y = y)) +
  coord_equal()
```
### Creating the stat
Now that we have our transformation function, our next task is to encapsulate it into a `Stat`.
To do this, we'll take what we learned about creating `Stat` objects in @sec-new-stats and extend it a little.
Our first step is to write some code that creates a subclass of `Stat` that we'll call `StatSpring`:
```{r}
#| eval: false
StatSpring <- ggproto("StatSpring", Stat)
```
This creates a new `Stat` subclass named `StatSpring`.
This class doesn't do anything interesting at this point: the only thing this code does so far is give the class a name.[^ext-springs-4]
To make this useful, we'll need to specify the methods that will build in the functionality we desire.
In @sec-new-stats we created a `Stat` by overriding the default method `compute_group()` and the default field for `required_aes`,[^ext-springs-5] but `Stat` objects have many properties you can modify.
If we print the `Stat` object, we can see a list of those properties:
[^ext-springs-4]: By convention ggproto classes always use CamelCase for naming, and the new class is always saved into a variable with the same name.
[^ext-springs-5]: As we mentioned earlier, ggproto doesn't make a strong distinction between methods and fields.
`Stat` objects expect `compute_group()` to be a function, so we refer to `compute_group()` as a method because that is the standard terminology in object oriented programming.
In contrast, `Stat` expects `required_aes` to be a variable, so we call it a field.
```{r}
Stat
```
You can modify almost any of these: the only ones that you shouldn't touch are `aesthetics` and `parameters`, which are intended for internal use only.
For our `StatSpring` example, the three methods/fields that we'll need to specify are `setup_data()`, `compute_panel()`, and `required_aes`.
We'll go through this in more detail in the next section, but to help you see what we're aiming for, here's the complete code for our stat:
```{r}
StatSpring <- ggproto("StatSpring", Stat,
  # Edit the input data to ensure the group identifiers are unique
  setup_data = function(data, params) {
    if (anyDuplicated(data$group)) {
      data$group <- paste(data$group, seq_len(nrow(data)), sep = "-")
    }
    data
  },
  # Construct data for this panel by calling create_spring()
  compute_panel = function(data,
                           scales,
                           diameter = 1,
                           tension = 0.75,
                           n = 50) {
    cols_to_keep <- setdiff(names(data), c("x", "y", "xend", "yend"))
    springs <- lapply(
      seq_len(nrow(data)),
      function(i) {
        spring_path <- create_spring(
          data$x[i],
          data$y[i],
          data$xend[i],
          data$yend[i],
          diameter = diameter,
          tension = tension,
          n = n
        )
        cbind(spring_path, unclass(data[i, cols_to_keep]))
      }
    )
    do.call(rbind, springs)
  },
  # Specify which aesthetics are required input
  required_aes = c("x", "y", "xend", "yend")
)
```
We can print any of these methods with a command such as `StatSpring$compute_panel` or `StatSpring$setup_data`.
### Methods
Let's take a closer look at the methods defined for our `StatSpring`.
As discussed in @sec-new-stats the most important methods for a stat are the three `compute_*` methods.
One of these must always be defined, usually `compute_group()` or `compute_panel()`.
As a rule of thumb, if the stat operates on multiple rows we start by implementing a `compute_group()` method, and if the stat operates on single rows we implement a `compute_panel()` method.
Our spring stat is the latter kind: each spring is defined by a single row of the original data, so we'll use the `compute_panel()` method which receives all the data for a single panel.
As you can see by looking at the source code for our `compute_panel()` method, we're doing a bit more than simply calling our `create_spring()` function:
```{r}
#| results: hide
function(data, scales, diameter = 1, tension = 0.75, n = 50) {
  cols_to_keep <- setdiff(names(data), c("x", "y", "xend", "yend"))
  springs <- lapply(
    seq_len(nrow(data)),
    function(i) {
      spring_path <- create_spring(
        data$x[i],
        data$y[i],
        data$xend[i],
        data$yend[i],
        diameter = diameter,
        tension = tension,
        n = n
      )
      cbind(spring_path, unclass(data[i, cols_to_keep]))
    }
  )
  do.call(rbind, springs)
}
```
We use `lapply()` to loop over each row of the data and create the points required to draw the corresponding spring.
For each such spring, we use `cbind()` to combine the spring data with all the non-position columns of the input row.
This is very important, since otherwise the aesthetic mappings to e.g. color and size would be lost.
Finally, because the output of `lapply()` is a list of data frames (one per spring), we use `rbind()` to combine these into a single data frame that gets returned.
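The row-by-row pattern is easier to see on a toy input. The sketch below uses the same `lapply()`, `cbind()`, and `rbind()` idiom, but replaces `create_spring()` with a stand-in that returns three points on a straight line; the data frame `input` and its values are made up for illustration.

```r
# Two input rows, each describing one (hypothetical) spring
input <- data.frame(
  x = c(0, 5), y = c(0, 5), xend = c(1, 6), yend = c(1, 6),
  colour = c("red", "blue"),
  group = c(1L, 2L)
)
cols_to_keep <- setdiff(names(input), c("x", "y", "xend", "yend"))

paths <- lapply(seq_len(nrow(input)), function(i) {
  # Stand-in for create_spring(): three points on a straight line
  path <- data.frame(
    x = seq(input$x[i], input$xend[i], length.out = 3),
    y = seq(input$y[i], input$yend[i], length.out = 3)
  )
  # The one-row list of extra columns is recycled to match nrow(path)
  cbind(path, unclass(input[i, cols_to_keep]))
})

# Stack the per-row results into a single data frame
result <- do.call(rbind, paths)
result
```

Each input row expands to three output rows, and every one of those rows carries the `colour` and `group` values of the row it came from.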
When defining a new stat, it is very common to specify one or both of the `setup_data()` and `setup_params()` methods.
These methods are called at the very beginning of the plot building process, so you can use them to do early checks and modifications of the parameters and data.
For our `StatSpring` example, we use the `setup_data()` method to ensure that each input row has a unique group aesthetic.
This is important because we're going to draw our springs with `GeomPath`, and we need to make sure that the data frame output by the stat has a unique identifier for each spring.
Doing so ensures that the geom draws each spring as a distinct path, and won't draw any connecting lines between different springs.
Again, there are some subtle details to call attention to in the implementation:
```{r}
#| results: hide
function(data, params) {
  if (anyDuplicated(data$group)) {
    data$group <- paste(data$group, seq_len(nrow(data)), sep = "-")
  }
  data
}
```
Notice that this implementation preserves the original value of `data$group`, appending a unique id if needed.
This is important because the group aesthetic is sometimes used to carry metadata, and we don't want to lose that information.
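To make the effect concrete, here is the same logic applied to two small data frames outside of ggplot2. The data frames and their group values are invented for this illustration.

```r
# Three rows that all share the same group value
shared <- data.frame(x = c(1, 2, 3), group = c(-1, -1, -1))
if (anyDuplicated(shared$group)) {
  shared$group <- paste(shared$group, seq_len(nrow(shared)), sep = "-")
}
shared$group

# Rows that already have distinct groups are left untouched
distinct <- data.frame(x = c(1, 2, 3), group = c(1, 2, 3))
if (anyDuplicated(distinct$group)) {
  distinct$group <- paste(distinct$group, seq_len(nrow(distinct)), sep = "-")
}
distinct$group
```

In the first case the groups become `"-1-1"`, `"-1-2"`, and `"-1-3"`: the original value survives as a prefix and the row index makes each id unique. In the second case the column is returned unchanged.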
The final part of our new class is the `required_aes` field.
This is a character vector that gives the names of aesthetics that the user *must* provide to the stat.
In this case, we need to make sure the user specifies four position aesthetics: x and y define where the spring starts, while xend and yend define where it ends.
The `required_aes` field, along with `default_aes` and `non_missing_aes`, also defines the aesthetics that this stat understands.
Any aesthetics that don't appear in these fields (or in the fields of the corresponding geom) will generate a warning and the mapping will be ignored.
### Constructors
Now that we have our `StatSpring` ggproto object, it's time to write constructor functions that the user will interact with.
Strictly speaking we don't need to do this, because `geom_path(stat = "spring")` will already work, but it's good practice to write constructor functions for the convenience of your users.
In addition, the constructor function provides a good place to document the new functionality.
Perhaps surprisingly, stat objects are almost always paired with a `geom_*()` constructor because most ggplot2 users are accustomed to adding geoms, not stats, when building up a plot.
The constructor itself is mostly boilerplate code that wraps a call to `layer()`; just take care to match the argument order and naming used in ggplot2's constructors so you don't surprise your users.
```{r}
geom_spring <- function(mapping = NULL,
                        data = NULL,
                        stat = "spring",
                        position = "identity",
                        ...,
                        diameter = 1,
                        tension = 0.75,
                        n = 50,
                        arrow = NULL,
                        lineend = "butt",
                        linejoin = "round",
                        na.rm = FALSE,
                        show.legend = NA,
                        inherit.aes = TRUE
) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomPath,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      diameter = diameter,
      tension = tension,
      n = n,
      arrow = arrow,
      lineend = lineend,
      linejoin = linejoin,
      na.rm = na.rm,
      ...
    )
  )
}
```
For the sake of completeness you should also create a `stat_*()` constructor function.
There are no surprises here: `stat_spring()` is very similar to `geom_spring()` except that it provides a default geom instead of a default stat.
```{r}
stat_spring <- function(mapping = NULL,
                        data = NULL,
                        geom = "path",
                        position = "identity",
                        ...,
                        diameter = 1,
                        tension = 0.75,
                        n = 50,
                        na.rm = FALSE,
                        show.legend = NA,
                        inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatSpring,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      diameter = diameter,
      tension = tension,
      n = n,
      na.rm = na.rm,
      ...
    )
  )
}
```
### Testing the stat
Now that everything is in place, we can test out our new layer:
```{r}
df <- tibble(
  x = runif(5, max = 10),
  y = runif(5, max = 10),
  xend = runif(5, max = 10),
  yend = runif(5, max = 10),
  class = sample(letters[1:2], 5, replace = TRUE)
)
ggplot(df) +
  geom_spring(aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_equal()
```
This looks pretty good.
Users can call our `geom_spring()` constructor function and get sensible results.
Better yet, because we've written a new stat, we get a number of features for free, like scaling and faceting:
```{r}
#| fig-width: 8
#| fig-height: 4
ggplot(df) +
  geom_spring(
    aes(x, y, xend = xend, yend = yend, colour = class),
    linewidth = 1
  ) +
  coord_equal() +
  facet_wrap(~ class)
```
Users also have the option of calling the `stat_spring()` constructor, which can be helpful if for some reason they want to draw the springs with points rather than paths:
```{r}
#| fig-width: 8
#| fig-height: 4
ggplot(df) +
  stat_spring(
    aes(x, y, xend = xend, yend = yend, colour = class),
    geom = "point",
    n = 15
  ) +
  coord_equal() +
  facet_wrap(~ class)
```
### Post-mortem
We have now successfully created our first extension.
It works, but it has some limitations that we now need to think about.
One shortcoming of our implementation is that diameter and tension are constants that can only be set for the full layer.
These settings feel more like aesthetics and it would be nice if their values could be mapped to a variable in the data.
We'll discuss solutions to this problem in @sec-spring2 and @sec-spring3.
## Part 2: Adding aesthetics {#sec-spring2}
The stat we created in the last section treats the `diameter` and `tension` as constant arguments: they're not aesthetics and the user can't map them onto a variable in the data.
We can fix this by making a few small changes to the `StatSpring` code:
```{r}
StatSpring <- ggproto("StatSpring", Stat,
  setup_data = function(data, params) {
    if (anyDuplicated(data$group)) {
      data$group <- paste(data$group, seq_len(nrow(data)), sep = "-")
    }
    data
  },
  compute_panel = function(data, scales, n = 50) {
    cols_to_keep <- setdiff(names(data), c("x", "y", "xend", "yend"))
    springs <- lapply(seq_len(nrow(data)), function(i) {
      spring_path <- create_spring(
        data$x[i],
        data$y[i],
        data$xend[i],
        data$yend[i],
        data$diameter[i],
        data$tension[i],
        n
      )
      cbind(spring_path, unclass(data[i, cols_to_keep]))
    })
    do.call(rbind, springs)
  },
  required_aes = c("x", "y", "xend", "yend"),
  optional_aes = c("diameter", "tension")
)
```
The main difference with our previous attempt is that the `diameter` and `tension` arguments to `compute_panel()` have gone away, and they're now taken from the data (just like `x`, `y`, etc).
This has a downside that we'll fix in @sec-spring3: we can no longer set fixed aesthetics.
Because of this, we'll need to remove those arguments from the constructor function:
```{r}
geom_spring <- function(mapping = NULL,
                        data = NULL,
                        stat = "spring",
                        position = "identity",
                        ...,
                        n = 50,
                        arrow = NULL,
                        lineend = "butt",
                        linejoin = "round",
                        na.rm = FALSE,
                        show.legend = NA,
                        inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomPath,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      n = n,
      arrow = arrow,
      lineend = lineend,
      linejoin = linejoin,
      na.rm = na.rm,
      ...
    )
  )
}
```
The `stat_spring()` constructor would require the same kind of change.
All that is left is to test our new implementation out:
```{r}
df <- tibble(
  x = runif(5, max = 10),
  y = runif(5, max = 10),
  xend = runif(5, max = 10),
  yend = runif(5, max = 10),
  class = sample(letters[1:2], 5, replace = TRUE),
  tension = runif(5),
  diameter = runif(5, 0.5, 1.5)
)
ggplot(df, aes(x, y, xend = xend, yend = yend)) +
  geom_spring(aes(tension = tension, diameter = diameter))
```
It appears to work.
However, as we expected, it's no longer possible to set `diameter` and `tension` as parameters:
```{r}
ggplot(df, aes(x, y, xend = xend, yend = yend)) +
  geom_spring(diameter = 0.5)
```
### Post-mortem
In this section we further developed our spring stat so that `diameter` and `tension` can be used as aesthetics, varying across springs.
Unfortunately, there's a major downside: these features can no longer be set globally.
We're also still missing a way to control the scaling of the two aesthetics.
Fixing both these problems requires the same next step: move our implementation away from `Stat` and towards a proper `Geom`.
## Part 3: A geom {#sec-spring3}
In many cases a Stat-centred approach is sufficient: for example, many of the graphic primitives provided by the [ggforce](https://ggforce.data-imaginist.com) package are Stats.
But we need to go further with the spring geom because the `tension` and `diameter` aesthetics need to be specified in units that are unrelated to the coordinate system.
Consequently, we'll rewrite our geom to be a proper `Geom` extension.
### Geom extensions
As discussed in @sec-extensions, there are many similarities between `Stat` and `Geom` extensions.
The biggest difference is that `Stat` extensions return a modified version of the input data, whereas `Geom` extensions return graphical objects.
In some cases creating a new geom requires you to use the grid package (we'll cover this in @sec-spring4), but often you don't have to.
Much like stat objects, geom objects in ggproto have several methods and fields you can modify.
You can see the list by printing the object:
```{r}
Geom
```
### Creating the geom
In much the same way that a stat uses the `compute_layer()`, `compute_panel()`, and `compute_group()` methods to transform the data, a geom uses `draw_layer()`, `draw_panel()`, and `draw_group()` to create graphical representations of the data.
In the same way that we created `StatSpring` by writing a `compute_panel()` method to do the heavy lifting, we'll create `GeomSpring` by writing a `draw_panel()` method:
```{r}
GeomSpring <- ggproto("GeomSpring", Geom,
  # Ensure that each row has a unique group id
  setup_data = function(data, params) {
    if (is.null(data$group)) {
      data$group <- seq_len(nrow(data))
    }
    if (anyDuplicated(data$group)) {
      data$group <- paste(data$group, seq_len(nrow(data)), sep = "-")
    }
    data
  },
  # Transform the data inside the draw_panel() method
  draw_panel = function(data,
                        panel_params,
                        coord,
                        n = 50,
                        arrow = NULL,
                        lineend = "butt",
                        linejoin = "round",
                        linemitre = 10,
                        na.rm = FALSE) {
    # Transform the input data to specify the spring paths
    cols_to_keep <- setdiff(names(data), c("x", "y", "xend", "yend"))
    springs <- lapply(seq_len(nrow(data)), function(i) {
      spring_path <- create_spring(
        data$x[i],
        data$y[i],
        data$xend[i],
        data$yend[i],
        data$diameter[i],
        data$tension[i],
        n
      )
      cbind(spring_path, unclass(data[i, cols_to_keep]))
    })
    springs <- do.call(rbind, springs)
    # Use the draw_panel() method from GeomPath to do the drawing
    GeomPath$draw_panel(
      data = springs,
      panel_params = panel_params,
      coord = coord,
      arrow = arrow,
      lineend = lineend,
      linejoin = linejoin,
      linemitre = linemitre,
      na.rm = na.rm
    )
  },
  # Specify the default and required aesthetics
  required_aes = c("x", "y", "xend", "yend"),
  default_aes = aes(
    colour = "black",
    linewidth = 0.5,
    linetype = 1L,
    alpha = NA,
    diameter = 1,
    tension = 0.75
  )
)
```
Despite the length of this code, most of it is familiar:
- The `setup_data()` methods are essentially the same: in both cases they ensure that every row in the input data has a unique group identifier.
- The `draw_panel()` method for our `GeomSpring` object is very similar to the `compute_panel()` method.
The main difference is that our `draw_panel()` method has an extra step: it passes the computed spring coordinates to `GeomPath$draw_panel()`.
Because springs are just fancy paths, the `GeomPath$draw_panel()` method works perfectly well here.
- Unlike the `StatSpring` code that we wrote earlier, the `GeomSpring` code uses the `default_aes` field to provide default values for any aesthetic that the user does not specify.
One aspect to this code may surprise developers who are used to object-oriented design from other languages.
Calling the method of a kindred object directly, as we do when invoking `GeomPath$draw_panel()` from within `GeomSpring$draw_panel()`, is not considered good practice in other object-oriented systems.
However, because ggproto objects are stateless (@sec-ggproto-style), this is exactly as safe as subclassing `GeomPath` and calling the parent method.
You can see this approach all over the place in the ggplot2 source code.
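The two options can be compared on a toy example that has nothing to do with drawing. The classes below (`Greeter`, `Shouter`, `ShouterChild`) are hypothetical and exist only to contrast calling a kindred object's method directly with subclassing and calling the parent method via `ggproto_parent()`; this is a sketch, not code from ggplot2 itself.

```r
library(ggplot2)

# Toy class with a single method that uses no object state
Greeter <- ggproto("Greeter", NULL,
  greet = function(name) paste0("Hello, ", name)
)

# Option 1: call the method of a kindred object directly
Shouter <- ggproto("Shouter", NULL,
  greet = function(name) toupper(Greeter$greet(name))
)

# Option 2: subclass and call the parent method
ShouterChild <- ggproto("ShouterChild", Greeter,
  greet = function(self, name) {
    toupper(ggproto_parent(Greeter, self)$greet(name))
  }
)

Shouter$greet("spring")
ShouterChild$greet("spring")
```

Because `greet()` depends only on its arguments, both calls return the same string. The same reasoning applies to `GeomPath$draw_panel()`: everything it needs is passed in as arguments.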
### A constructor
As with our earlier attempts the final step is to write a constructor function `geom_spring()`.
The code is not much different to earlier versions: we use `GeomSpring` instead of `GeomPath`, and we use the identity stat instead of `StatSpring`.
```{r}
geom_spring <- function(mapping = NULL,
                        data = NULL,
                        stat = "identity",
                        position = "identity",
                        ...,
                        n = 50,
                        arrow = NULL,
                        lineend = "butt",
                        linejoin = "round",
                        na.rm = FALSE,
                        show.legend = NA,
                        inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomSpring,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      n = n,
      arrow = arrow,
      lineend = lineend,
      linejoin = linejoin,
      na.rm = na.rm,
      ...
    )
  )
}
```
### Testing the geom
We now have a proper geom with working default aesthetics and the ability to set aesthetics as parameters:
```{r}
#| layout-ncol: 2
#| fig-width: 4
ggplot(df, aes(x, y, xend = xend, yend = yend)) +
  geom_spring(aes(tension = tension, diameter = diameter))
ggplot(df, aes(x, y, xend = xend, yend = yend)) +
  geom_spring(diameter = 0.5)
```
It does have some limitations still, because the units for `diameter` and `tension` are expressed relative to the scale of the raw data.
The actual diameter of a spring with `diameter = 0.5` will be different depending on the axis limits, and if the x and y axes are not on the same scale the shape of the spring will be distorted.
You can see this in the example below:
```{r}
ggplot() +
  geom_spring(aes(x = 0, y = 0, xend = 3, yend = 20))
```
The same underlying problem means that the diameter of the spring is expressed in coordinate space.
This makes it difficult to define a meaningful default because the absolute size of the spring diameter changes when the scale of the data changes:
```{r}
ggplot() +
  geom_spring(aes(x = 0, y = 0, xend = 100, yend = 80))
```
We'll address this issue in @sec-spring4.
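A little arithmetic shows where the distortion comes from. The sketch below assumes a square 10 cm panel and axis limits that exactly match the data range (ignoring scale expansion and the extra room the spring itself needs); all of these numbers are assumptions chosen for illustration.

```r
# Assumed panel size and axis limits (illustrative values)
panel_width_cm <- 10
panel_height_cm <- 10
x_limits <- c(0, 3)
y_limits <- c(0, 20)
diameter <- 1 # in data units

# Physical size of one data unit along each axis
cm_per_x_unit <- panel_width_cm / diff(x_limits)
cm_per_y_unit <- panel_height_cm / diff(y_limits)

# On-screen extent of a "circle" with the given diameter
extent <- c(
  width_cm = diameter * cm_per_x_unit,
  height_cm = diameter * cm_per_y_unit
)
extent
```

Under these assumptions a coil that is one data unit across ends up roughly 3.3 cm wide but only 0.5 cm tall, so the circles are squashed into ellipses. Changing the limits to `c(0, 100)` and `c(0, 80)` shrinks the same coil to about a millimetre in each direction, which is why a fixed default in data units is hard to choose.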
### Post-mortem
In this section we finally created our own `Geom` extension.
This is often the natural conclusion to the development of a new layer, but not always.
Sometimes you'll find that the `Stat` approach works perfectly well for your purposes, and it has the advantage that you can use the stat with multiple geoms.
The final choice is up to you as the developer, and should be guided by how you expect people to use the layer.
Perhaps surprisingly, we haven't talked about what goes on inside the `draw_*()` methods yet.
Our `GeomSpring` object relies on the `draw_panel()` method from `GeomPath` to do the work of creating the graphical output.
This is quite common.
For example, even the relatively complex [`GeomBoxplot`](https://github.com/tidyverse/ggplot2/blob/master/R/geom-boxplot.r) just uses the draw methods from `GeomPoint`, `GeomSegment` and `GeomCrossbar`.
If you need to go deeper, you'll need to learn a little about grid.
Creating grid grobs is an advanced technique, needed by relatively few geoms.
But creating a grid grob gives you the power to use absolute units for diameter (e.g. 1cm) and to adjust the display of the geom based on the size of the output device.
We'll turn to that next.
## An introduction to grid
The grid package provides the underlying graphics system upon which ggplot2 is built.
It's one of two quite different drawing systems that are included in base R: base graphics and grid.
Base graphics has an imperative "pen-on-paper" model: every function immediately draws something on the graphics device.
Much like ggplot2 itself, grid takes a more declarative approach where you build up a description of the graphic as an object, which is later rendered.
This declarative approach allows us to create objects that exist independently of the graphics device and can be passed around, analysed, and modified.
Importantly, parts of a graphical object can refer to other parts, which allows you to do things like define this rectangle to have width equal to the length of that string of text, and so on.
As a ggplot2 developer you will find that you can achieve a lot without ever needing to interact with grid directly, but there are situations where it is impossible to achieve what you want without going down to the grid level.
The two most common situations are:
1. You need to create graphical objects that are positioned correctly on the coordinate system, but where some part of their appearance has a fixed absolute size.
In our case this would be the spring correctly going between two points in the plot, but the diameter being defined in cm instead of relative to the coordinate system.
2. You need graphical objects that are updated during resizing.
For example, this could be the position of labels, as in the ggrepel package or the `geom_mark_*()` geoms in ggforce.
A comprehensive introduction to grid is far more than we can cover in this book, but to help you get started we'll give you the absolute minimum vocabulary to understand how ggplot2 uses grid.
We'll introduce core concepts like grobs, viewports, graphical parameters, and units but please read *R Graphics* by @murrell:2018 to get the full details.
### Grobs
To understand how grid works, we first need to talk about **grobs**.
Grobs (**gr**aphic **ob**jects) are the atomic representations of graphical elements in grid, and include types like points, lines, circles, rectangles, and text.
The grid package provides functions like `pointsGrob()`, `linesGrob()`, `circleGrob()`, `rectGrob()`, and `textGrob()` that create graphical objects without drawing anything to the graphics device.
These functions are vectorised, so a single grob can represent multiple elements. For instance, one circle grob can describe three circles:
```{r}
library(grid)
circles <- circleGrob(
  x = c(0.1, 0.4, 0.7),
  y = c(0.5, 0.3, 0.6),
  r = c(0.1, 0.2, 0.3)
)
circles
```
Note that this code doesn't draw anything: it's just a description of a set of circles.
To draw it, we first call `grid.newpage()` to clear the current graphics device then `grid.draw()`:
```{r}
grid.newpage()
grid.draw(circles)
```
grid also provides `grobTree()`, which constructs composite objects from multiple atomic grobs.
Here's an illustration:
```{r}
labels <- textGrob(
  label = c("small", "medium", "large"),
  x = c(0.1, 0.4, 0.7),
  y = c(0.5, 0.3, 0.6)
)
composite <- grobTree(circles, labels)
grid.newpage()
grid.draw(composite)
```
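Because a composite grob is an ordinary R object, its children can be inspected and modified before anything is drawn. The sketch below is illustrative only: the grob names `"dots"` and `"captions"`, and the object names, are our own choices rather than anything ggplot2 relies on.

```{r}
library(grid)

# Name the children so that they are easy to find later
dots <- circleGrob(x = c(0.25, 0.75), y = 0.5, r = 0.1, name = "dots")
captions <- textGrob(
  label = c("left", "right"),
  x = c(0.25, 0.75),
  y = 0.5,
  name = "captions"
)
tree <- grobTree(dots, captions, name = "tree")

# List the children of the composite grob
childNames(tree)

# Modify one child by name; this returns a new tree and draws nothing
tree_red <- editGrob(tree, gPath("dots"), gp = gpar(col = "red"))
getGrob(tree_red, "dots")$gp$col
```

The original `tree` is left unchanged, which is what makes it safe to pass grobs around and derive variations from them.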
It is also possible to define your own grobs.
You can define a new primitive grob class using `grob()` or a new composite class using `gTree()`, then specify special behaviour for your new class.
We'll see an example of this in a moment.
### Viewports
The second key concept in grid is the idea of a **viewport**.
A viewport is a rectangular plotting region that supplies its own coordinate system for grobs that are drawn within it, and can also provide a tabular grid in which other viewports can be nested.
An individual grob can have its own viewport or, if none is provided, it will inherit one.
While we won't need to consider viewports when building the grob for our springs, they're an important concept that powers much of the high-level layout of ggplot2 graphics so we'll very briefly introduce them here.
In the example below we use `viewport()` to define two different viewports: one with default parameters, and a second one that is rotated around its midpoint by 15 degrees:
```{r}
vp_default <- viewport()
vp_rotated <- viewport(angle = 15)
```
This time around, when we create our composite grobs, we'll explicitly assign them to specific viewports by setting the `vp` argument:
```{r}
composite_default <- grobTree(circles, labels, vp = vp_default)
composite_rotated <- grobTree(circles, labels, vp = vp_rotated)
```
When we plot these two grobs, we can see the effect of the viewport: although `composite_default` and `composite_rotated` are composed of the same two primitive grobs (i.e., `circles` and `labels`), they belong to different viewports, so they look different when the plot is drawn:
```{r}
grid.newpage()
grid.draw(composite_default)
grid.draw(composite_rotated)
```
ggplot2 automatically generates most of the viewports that you'll need for plotting, but it's important to understand the basic idea.
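Viewports do not have to be attached to a grob through `vp`; they can also be pushed and popped directly. As a minimal sketch, assuming a hypothetical inset viewport called `vp_inset` that covers the top-right corner of the page, the same two grobs are drawn once in the root viewport and once inside the inset:

```{r}
library(grid)

# A hypothetical inset viewport in the top-right corner of the page
vp_inset <- viewport(
  x = 0.75,
  y = 0.75,
  width = 0.4,
  height = 0.4,
  name = "inset"
)

frame <- rectGrob(gp = gpar(fill = NA))
label <- textGrob("centre")

grid.newpage()

# Drawn in the root viewport: the frame spans the whole page
grid.draw(frame)
grid.draw(label)

# Drawn after pushing the inset: the same grobs now span only the inset
pushViewport(vp_inset)
grid.draw(frame)
grid.draw(label)
popViewport()
```

Neither grob was changed between the two draws. Only the coordinate system against which their default `npc` positions are resolved is different.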
### Graphical parameters
The next concept we need to understand is the idea of **graphical parameters**.
When we defined the `circles` and `labels` grobs, we only specified some of their properties.
For example, we said nothing about colour or transparency, so these properties are all set to their default values.
The `gpar()` function in grid allows you to specify graphical parameters as distinct objects:
```{r}
gp_blue <- gpar(col = "blue", fill = NA)
gp_orange <- gpar(col = "orange", fill = NA)
```
The `gp_blue` and `gp_orange` objects provide lists of graphical settings that can now be applied to any grob we like using the `gp` argument:
```{r}
grob1 <- grobTree(circles, labels, vp = vp_default, gp = gp_blue)
grob2 <- grobTree(circles, labels, vp = vp_rotated, gp = gp_orange)
```
When we plot these two grobs, they inherit the settings provided by the graphical parameters as well as the viewports to which they are assigned:
```{r}
grid.newpage()
grid.draw(grob1)
grid.draw(grob2)
```
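Graphical parameters are inherited from parent to child, but a setting made on the child takes precedence. The following sketch (the object names are illustrative) gives the parent a blue, thick outline and lets one child override only the colour:

```{r}
library(grid)

# This child sets its own line colour, so it ignores the parent's `col`
inner_circle <- circleGrob(r = 0.2, gp = gpar(col = "red"))

# This child sets nothing, so it inherits every setting from the parent
outer_circle <- circleGrob(r = 0.4)

parent <- grobTree(
  inner_circle,
  outer_circle,
  gp = gpar(col = "blue", fill = NA, lwd = 3)
)

grid.newpage()
grid.draw(parent)
```

Both circles are drawn with the parent's line width and empty fill, but only the outer one picks up the blue colour.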
### Units
The last core concept that we need to discuss is the **unit** system.
The grid package allows you to specify the positions (e.g. `x` and `y`) and dimensions (e.g. `width` and `height`) of grobs and viewports using a flexible specification.
In the grid unit system there are three qualitatively different styles of unit:
- Absolute units, e.g. centimeters, inches, and points, refer to physical sizes.
- Relative units, e.g. `npc`, which represents a proportion of the current viewport size.
- Units defined by strings or other grobs, e.g. `strwidth`, `grobwidth`.
The `unit()` function is the main function we use when specifying units: `unit(1, "cm")` is 1 centimeter, whereas `unit(0.5, "npc")` is half the size of the relevant viewport.
The unit system supports arithmetic operations that are only resolved at draw time, which makes it possible to combine different types of units: `unit(0.5, "npc") + unit(1, "cm")` defines a point one centimeter to the right of the center of the current viewport.
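A short sketch of these ideas, using only functions from grid (the object name `pos` is illustrative):

```{r}
library(grid)

# A position one centimetre to the right of the viewport centre.
# The sum is stored as an expression and only resolved when drawing.
pos <- unit(0.5, "npc") + unit(1, "cm")
is.unit(pos)

# Absolute units can be converted to each other in any viewport
convertUnit(unit(1, "inches"), "cm", valueOnly = TRUE)

# Units are vectorised, and a single vector can mix unit types
unit(c(1, 0.5), c("cm", "npc"))
```

Converting between an absolute and a relative unit, by contrast, gives an answer that depends on the viewport in which the conversion happens.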
### Building grob classes
Now that we have a basic understanding of grid, let's attempt to create our own "surprise" grob class: objects that are circles if they are smaller than 3cm, but transform into squares whenever they are larger than 3cm.
This is not the most useful kind of graphical object, but it's useful for illustrating the flexibility of the grid system.
The first step is to write our own constructor function using `grob()` or `gTree()`, depending on whether we are creating a primitive or composite object.
We begin by creating a "thin" constructor function:
```{r}
surpriseGrob <- function(x,
                         y,
                         size,
                         default.units = "npc",
                         name = NULL,
                         gp = gpar(),
                         vp = NULL) {
  # Ensure that input arguments are units
  if (!is.unit(x)) x <- unit(x, default.units)
  if (!is.unit(y)) y <- unit(y, default.units)
  if (!is.unit(size)) size <- unit(size, default.units)

  # Construct the surprise grob subclass as a gTree
  gTree(
    x = x,
    y = y,
    size = size,
    name = name,
    gp = gp,
    vp = vp,
    cl = "surprise"
  )
}
```
This function doesn't do very much.
All it does is ensure that the `x`, `y`, and `size` arguments are grid units, and sets the class name to be "surprise".
To define the behaviour of our grob, we need to specify methods for one or both of the generic functions `makeContext()` and `makeContent()`:
- `makeContext()` is called when the parent grob is rendered and allows you to control the viewport of the grob.
We won't need to use that for our surprise grob.
- `makeContent()` is called every time the grob is drawn, including whenever the drawing region is resized, and allows you to customise the look of the grob based on its size or other aspects.
Because these generic functions use the S3 object oriented programming system, we can define our method simply by appending the class name to the end of the function name.
That is, the `makeContent()` method for our surprise grob is defined by creating a function called `makeContent.surprise()` that takes a grob as input and returns a modified grob as output:
```{r}
makeContent.surprise <- function(x) {
  x_pos <- x$x
  y_pos <- x$y
  size <- convertWidth(x$size, unitTo = "cm", valueOnly = TRUE)

  # Figure out if the given sizes are bigger or smaller than 3 cm
  circles <- size < 3

  # Create a circle grob for the small ones
  if (any(circles)) {
    circle_grob <- circleGrob(
      x = x_pos[circles],
      y = y_pos[circles],
      r = unit(size[circles] / 2, "cm")
    )
  } else {
    circle_grob <- NULL
  }

  # Create a rect grob for the large ones
  if (any(!circles)) {
    square_grob <- rectGrob(
      x = x_pos[!circles],
      y = y_pos[!circles],
      width = unit(size[!circles], "cm"),
      height = unit(size[!circles], "cm")
    )
  } else {
    square_grob <- NULL
  }

  # Add the circle and rect grob as children of our input grob
  setChildren(x, gList(circle_grob, square_grob))
}
```
Some of the functions we've called here are new, but they all reuse the core concepts that we discussed earlier.
Specifically:
- `convertWidth()` is used to convert grid units from one type to another.
- `gList()` creates a list of grobs.
- `setChildren()` specifies the grobs that belong to a gTree composite grob.
The effect of this function is to ensure that every time the grob is rendered the absolute size of each shape is recalculated.
All shapes smaller than 3cm become circles, and all shapes that are 3cm or larger become squares.
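The size has to be recalculated at draw time because a relative unit such as `npc` corresponds to a different physical size in every viewport. A minimal sketch, assuming two hypothetical viewports with fixed widths of 4cm and 10cm:

```{r}
library(grid)

size <- unit(0.5, "npc")

grid.newpage()

# Inside a 4cm wide viewport, 0.5npc resolves to 2cm: below the threshold
pushViewport(viewport(width = unit(4, "cm")))
small <- convertWidth(size, unitTo = "cm", valueOnly = TRUE)
popViewport()

# Inside a 10cm wide viewport, the same unit resolves to 5cm: above it
pushViewport(viewport(width = unit(10, "cm")))
large <- convertWidth(size, unitTo = "cm", valueOnly = TRUE)
popViewport()

c(small = small, large = large)
```

The same `size` value would therefore be a circle in the first viewport and a square in the second, which is exactly the decision that the `makeContent()` method makes each time it runs.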
To see how this plays out, let's call our new function:
```{r}
surprises <- surpriseGrob(
  x = c(0.25, 0.45, 0.75),
  y = c(0.5, 0.5, 0.5),
  size = c(0.05, 0.15, 0.25)
)
```