generated from r4ds/bookclub-template
-
Notifications
You must be signed in to change notification settings - Fork 25
/
18_Expressions.Rmd
844 lines (615 loc) · 20.1 KB
/
18_Expressions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
# Expressions
**Learning objectives:**
* Understand the idea of the abstract syntax tree (AST).
* Discuss the data structures that underlie the AST:
* Constants
* Symbols
* Calls
* Explore the idea behind parsing.
* Explore some details of R's grammar.
* Discuss the use or recursive functions to compute on the language.
* Work with three other more specialized data structures:
* Pairlists
* Missing arguments
* Expression vectors
```{r, message = FALSE, warning = FALSE}
library(rlang)
library(lobstr)
```
## Introduction
> To compute on the language, we first need to understand its structure.
* This requires a few things:
* New vocabulary.
* New tools to inspect and modify expressions.
* Approach the use of the language with new ways of thinking.
* One of the first new ways of thinking is the distinction between an operation and its result.
```{r, error = TRUE}
y <- x * 10
```
* We can capture the intent of the code without executing it using the rlang package.
```{r}
z <- rlang::expr(y <- x * 10)
z
```
* We can then evaluate the expression using the **base::eval** function.
```{r}
x <- 4
base::eval(expr(y <- x * 10))
y
```
### Evaluating multiple expressions
* The function `expression()` allows for multiple expressions, and in some ways it acts similarly to the way files are `source()`d in. That is, we `eval()`uate all of the expressions at once.
* `expression()` returns a vector and can be passed to `eval()`.
```{r}
z <- expression(x <- 4, x * 10)
eval(z)
is.atomic(z)
is.vector(z)
```
* `exprs()` does not evaluate everything at once. To evaluate each expression, the individual expressions must be evaluated in a loop.
```{r}
for (i in exprs(x <- 4, x * 10)) {
print(i)
print(eval(i))
}
```
## Abstract Syntax Tree (AST)
* Expressions are objects that capture the structure of code without evaluating it.
* Expressions are also called abstract syntax trees (ASTs) because the structure of code is hierarchical and can be naturally represented as a tree.
* Understanding this tree structure is crucial for inspecting and modifying expressions.
* Branches = Calls
* Leaves = Symbols and constants
```{r,eval=FALSE}
f(x, "y", 1)
```
![](images/simple.png)
### With `lobstr::ast():`
```{r}
lobstr::ast(f(x, "y", 1))
```
* Some functions might also contain more calls like the example below:
```{r,eval=FALSE}
f(g(1, 2), h(3, 4, i())):
```
![](images/complicated.png)
```{r}
lobstr::ast(f(g(1, 2), h(3, 4, i())))
```
* Read the **hand-drawn diagrams** from left-to-right (ignoring vertical position)
* Read the **lobstr-drawn diagrams** from top-to-bottom (ignoring horizontal position).
* The depth within the tree is determined by the nesting of function calls.
* Depth also determines evaluation order, **as evaluation generally proceeds from deepest-to-shallowest, but this is not guaranteed because of lazy evaluation**.
### Infix calls
> Every call in R can be written in tree form because any call can be written in prefix form.
An infix operator is a function where the function name is placed between its arguments. Prefix form is when then function name comes before the arguments, which are enclosed in parentheses. [Note that the name infix comes from the words prefix and suffix.]
```{r}
y <- x * 10
`<-`(y, `*`(x, 10))
```
* A characteristic of the language is that infix functions can always be written as prefix functions; therefore, all function calls can be represented using an AST.
![](images/prefix.png)
```{r}
lobstr::ast(y <- x * 10)
```
```{r}
lobstr::ast(`<-`(y, `*`(x, 10)))
```
* There is no difference between the ASTs for the infix version vs the prefix version, and if you generate an expression with prefix calls, R will still print it in infix form:
```{r}
rlang::expr(`<-`(y, `*`(x, 10)))
```
## Expression
* Collectively, the data structures present in the AST are called expressions.
* These include:
1. Constants
2. Symbols
3. Calls
4. Pairlists
### Constants
* Scalar constants are the simplest component of the AST.
* A constant is either **NULL** or a **length-1** atomic vector (or scalar)
* e.g., `TRUE`, `1L`, `2.5`, `"x"`, or `"hello"`.
* We can test for a constant with `rlang::is_syntactic_literal()`.
* Constants are self-quoting in the sense that the expression used to represent a constant is the same constant:
```{r}
identical(expr(TRUE), TRUE)
identical(expr(1), 1)
identical(expr(2L), 2L)
identical(expr("x"), "x")
identical(expr("hello"), "hello")
```
### Symbols
* A symbol represents the name of an object.
* `x`
* `mtcars`
* `mean`
* In base R, the terms symbol and name are used interchangeably (i.e., `is.name()` is identical to `is.symbol()`), but this book used symbol consistently because **"name"** has many other meanings.
* You can create a symbol in two ways:
1. by capturing code that references an object with `expr()`.
2. turning a string into a symbol with `rlang::sym()`.
```{r}
expr(x)
```
```{r}
sym("x")
```
* A symbol can be turned back into a string with `as.character()` or `rlang::as_string()`.
* `as_string()` has the advantage of clearly signalling that you’ll get a character vector of length 1.
```{r}
as_string(expr(x))
```
* We can recognize a symbol because it is printed without quotes
```{r}
expr(x)
```
* `str()` tells you that it is a symbol, and `is.symbol()` is TRUE:
```{r}
str(expr(x))
```
```{r}
is.symbol(expr(x))
```
* The symbol type is not vectorised, i.e., a symbol is always length 1.
* If you want multiple symbols, you’ll need to put them in a list, using `rlang::syms()`.
Note that `as_string()` will not work on expressions which are not symbols.
```{r}
#| error: true
as_string(expr(x+y))
```
### Calls
* A call object represents a captured function call.
* Call objects are a special type of list.
* The first component specifies the function to call (usually a symbol, i.e., the name fo the function).
* The remaining elements are the arguments for that call.
* Call objects create branches in the AST, because calls can be nested inside other calls.
* You can identify a call object when printed because it looks just like a function call.
* Confusingly `typeof()` and `str()` print language for call objects (where we might expect it to return that it is a "call" object), but `is.call()` returns TRUE:
```{r}
lobstr::ast(read.table("important.csv", row.names = FALSE))
```
```{r}
x <- expr(read.table("important.csv", row.names = FALSE))
```
```{r}
typeof(x)
```
```{r}
is.call(x)
```
### Subsetting
* Calls generally behave like lists.
* Since they are list-like, you can use standard subsetting tools.
* The first element of the call object is the function to call, which is usually a symbol:
```{r}
x[[1]]
```
```{r}
is.symbol(x[[1]])
```
* The remainder of the elements are the arguments:
```{r}
is.symbol(x[-1])
as.list(x[-1])
```
* We can extract individual arguments with [[ or, if named, $:
```{r}
x[[2]]
```
```{r}
x$row.names
```
* We can determine the number of arguments in a call object by subtracting 1 from its length:
```{r}
length(x) - 1
```
* Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching:
* It could potentially be in any location, with the full name, with an abbreviated name, or with no name.
* To work around this problem, you can use `rlang::call_standardise()` which standardizes all arguments to use the full name:
```{r}
rlang::call_standardise(x)
```
* But If the function uses ... it’s not possible to standardise all arguments.
* Calls can be modified in the same way as lists:
```{r}
x$header <- TRUE
x
```
### Function position
* The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol.
```{r}
lobstr::ast(foo())
```
* While R allows you to surround the name of the function with quotes, the parser converts it to a symbol:
```{r}
lobstr::ast("foo"())
```
* However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it:
* For example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:
```{r}
lobstr::ast(pkg::foo(1))
```
```{r}
lobstr::ast(obj$foo(1))
```
```{r}
lobstr::ast(foo(1)(2))
```
![](images/call-call.png)
### Constructing
* You can construct a call object from its components using `rlang::call2()`.
* The first argument is the name of the function to call (either as a string, a symbol, or another call).
* The remaining arguments will be passed along to the call:
```{r}
call2("mean", x = expr(x), na.rm = TRUE)
```
```{r}
call2(expr(base::mean), x = expr(x), na.rm = TRUE)
```
* Infix calls created in this way still print as usual.
```{r}
call2("<-", expr(x), 10)
```
## Parsing and grammar
* **Parsing** - The process by which a computer language takes a string and constructs an expression. Parsing is governed by a set of rules known as a grammar.
* We are going to use `lobstr::ast()` to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings.
* **Operator precedence** - Conventions used by the programming language to resolve ambiguity.
* Infix functions introduce two sources of ambiguity.
* The first source of ambiguity arises from infix functions: what does 1 + 2 * 3 yield? Do you get 9 (i.e., (1 + 2) * 3), or 7 (i.e., 1 + (2 * 3))? In other words, which of the two possible parse trees below does R use?
![](images/ambig-order.png)
* Programming languages use conventions called operator precedence to resolve this ambiguity. We can use `ast()` to see what R does:
```{r}
lobstr::ast(1 + 2 * 3)
```
* PEMDAS (or BEDMAS or BODMAS, depending on where in the world you grew up) is pretty clear on what to do. Other operator precedence isn't as clear.
* There’s one particularly surprising case in R:
* ! has a much lower precedence (i.e., it binds less tightly) than you might expect.
* This allows you to write useful operations like:
```{r}
lobstr::ast(!x %in% y)
```
* **R has over 30 infix operators divided into 18 precedence** groups.
* While the details are described in `?Syntax`, very few people have memorized the complete ordering.
* If there’s any confusion, use parentheses!
```{r}
# override PEMDAS
lobstr::ast((1 + 2) * 3)
```
### Associativity
* The second source of ambiguity is introduced by repeated usage of the same infix function.
```{r}
1 + 2 + 3
# What does R do first?
(1 + 2) + 3
# or
1 + (2 + 3)
```
* In this case it doesn't matter. Other places it might, like in `ggplot2`.
* In R, most operators are left-associative, i.e., the operations on the left are evaluated first:
```{r}
lobstr::ast(1 + 2 + 3)
```
* There are two exceptions to the left-associative rule:
1. exponentiation
2. assignment
```{r}
lobstr::ast(2 ^ 2 ^ 3)
```
```{r}
lobstr::ast(x <- y <- z)
```
### Parsing and deparsing
* Parsing - turning characters you've typed into an AST (i.e., from strings to expressions).
* R usually takes care of parsing code for us.
* But occasionally you have code stored as a string, and you want to parse it yourself.
* You can do so using `rlang::parse_expr()`:
```{r}
x1 <- "y <- x + 10"
x1
is.call(x1)
```
```{r}
x2 <- rlang::parse_expr(x1)
x2
is.call(x2)
```
* `parse_expr()` always returns a single expression.
* If you have multiple expression separated by `;` or `,`, you’ll need to use `rlang::parse_exprs()` which is the plural version of `rlang::parse_expr()`. It returns a list of expressions:
```{r}
x3 <- "a <- 1; a + 1"
```
```{r}
rlang::parse_exprs(x3)
```
* If you find yourself parsing strings into expressions often, **quasiquotation** may be a safer approach.
* More about quasiquaotation in Chapter 19.
* The inverse of parsing is deparsing.
* **Deparsing** - given an expression, you want the string that would generate it.
* Deparsing happens automatically when you print an expression.
* You can get the string with `rlang::expr_text()`:
* Parsing and deparsing are not symmetric.
* Parsing creates the AST which means that we lose backticks around ordinary names, comments, and whitespace.
```{r}
cat(expr_text(expr({
# This is a comment
x <- `x` + 1
})))
```
## Using the AST to solve more complicated problems
* Here we focus on what we learned to perform recursion on the AST.
* Two parts of a recursive function:
* Recursive case: handles the nodes in the tree. Typically, you’ll do something to each child of a node, usually calling the recursive function again, and then combine the results back together again. For expressions, you’ll need to handle calls and pairlists (function arguments).
* Base case: handles the leaves of the tree. The base cases ensure that the function eventually terminates, by solving the simplest cases directly. For expressions, you need to handle symbols and constants in the base case.
### Two helper functions
* First, we need an `epxr_type()` function to return the type of expression element as a string.
```{r}
expr_type <- function(x) {
if (rlang::is_syntactic_literal(x)) {
"constant"
} else if (is.symbol(x)) {
"symbol"
} else if (is.call(x)) {
"call"
} else if (is.pairlist(x)) {
"pairlist"
} else {
typeof(x)
}
}
```
```{r}
expr_type(expr("a"))
expr_type(expr(x))
expr_type(expr(f(1, 2)))
```
* Second, we need a wrapper function to handle exceptions.
```{r}
switch_expr <- function(x, ...) {
switch(expr_type(x),
...,
stop("Don't know how to handle type ", typeof(x), call. = FALSE)
)
}
```
* Lastly, we can write a basic template that walks the AST using the `switch()` statement.
```{r,}
recurse_call <- function(x) {
switch_expr(x,
# Base cases
symbol = ,
constant = ,
# Recursive cases
call = ,
pairlist =
)
}
```
### Specific use cases for `recurse_call()`
### Example 1: Finding F and T
* Using `F` and `T` in our code rather than `FALSE` and `TRUE` is bad practice.
* Say we want to walk the AST to find times when we use `F` and `T`.
* Start off by finding the type of `T` vs `TRUE`.
```{r}
expr_type(expr(TRUE))
expr_type(expr(T))
```
* With this knowledge, we can now write the base cases of our recursive function.
* The logic is as follows:
* A constant is never a logical abbreviation and a symbol is an abbreviation if it is "F" or "T":
```{r}
logical_abbr_rec <- function(x) {
switch_expr(x,
constant = FALSE,
symbol = as_string(x) %in% c("F", "T")
)
}
```
```{r}
logical_abbr_rec(expr(TRUE))
logical_abbr_rec(expr(T))
```
* It's best practice to write another wrapper, assuming every input you receive will be an expression.
```{r}
logical_abbr <- function(x) {
logical_abbr_rec(enexpr(x))
}
logical_abbr(T)
logical_abbr(FALSE)
```
#### Next step: code for the recursive cases
* Here we want to do the same thing for calls and for pairlists.
* Here's the logic: recursively apply the function to each subcomponent, and return `TRUE` if any subcomponent contains a logical abbreviation.
* This is simplified by using the `purrr::some()` function, which iterates over a list and returns `TRUE` if the predicate function is true for any element.
```{r}
logical_abbr_rec <- function(x) {
switch_expr(x,
# Base cases
constant = FALSE,
symbol = as_string(x) %in% c("F", "T"),
# Recursive cases
call = ,
# Are we sure this is the correct function to use?
# Why not logical_abbr_rec?
pairlist = purrr::some(x, logical_abbr_rec)
)
}
logical_abbr(mean(x, na.rm = T))
logical_abbr(function(x, na.rm = T) FALSE)
```
### Example 2: Finding all variables created by assignment
* Listing all the variables is a little more complicated.
* Figure out what assignment looks like based on the AST.
```{r}
ast(x <- 10)
```
* Now we need to decide what data structure we're going to use for the results.
* Easiest thing will be to return a character vector.
* We would need to use a list if we wanted to return symbols.
### Dealing with the base cases
```{r}
find_assign_rec <- function(x) {
switch_expr(x,
constant = ,
symbol = character()
)
}
find_assign <- function(x) find_assign_rec(enexpr(x))
find_assign("x")
find_assign(x)
```
### Dealing with the recursive cases
* Here is the function to flatten pairlists.
```{r}
flat_map_chr <- function(.x, .f, ...) {
purrr::flatten_chr(purrr::map(.x, .f, ...))
}
flat_map_chr(letters[1:3], ~ rep(., sample(3, 1)))
```
* Here is the code needed to identify calls.
```{r}
find_assign_rec <- function(x) {
switch_expr(x,
# Base cases
constant = ,
symbol = character(),
# Recursive cases
pairlist = flat_map_chr(as.list(x), find_assign_rec),
call = {
if (is_call(x, "<-")) {
as_string(x[[2]])
} else {
flat_map_chr(as.list(x), find_assign_rec)
}
}
)
}
find_assign(a <- 1)
find_assign({
a <- 1
{
b <- 2
}
})
```
### Make the function more robust
* Throw cases at it that we think might break the function.
* Write a function to handle these cases.
```{r}
find_assign_call <- function(x) {
if (is_call(x, "<-") && is_symbol(x[[2]])) {
lhs <- as_string(x[[2]])
children <- as.list(x)[-1]
} else {
lhs <- character()
children <- as.list(x)
}
c(lhs, flat_map_chr(children, find_assign_rec))
}
find_assign_rec <- function(x) {
switch_expr(x,
# Base cases
constant = ,
symbol = character(),
# Recursive cases
pairlist = flat_map_chr(x, find_assign_rec),
call = find_assign_call(x)
)
}
find_assign(a <- b <- c <- 1)
find_assign(system.time(x <- print(y <- 5)))
```
* This approach certainly is more complicated, but it's important to start simple and move up.
## Specialised data structures
* Pairlists
* Missing arguments
* Expression vectors
### Pairlists
* Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere.
* The only place you are likely to see pairlists in R is when working with calls to the function, as the formal arguments to a function are stored in a pairlist:
```{r}
f <- expr(function(x, y = 10) x + y)
```
```{r}
args <- f[[2]]
args
```
```{r}
typeof(args)
```
* Fortunately, whenever you encounter a pairlist, you can treat it just like a regular list:
```{r}
pl <- pairlist(x = 1, y = 2)
```
```{r}
length(pl)
```
```{r}
pl$x
```
### Missing arguments
* Empty symbols
* To create an empty symbol, you need to use `missing_arg()` or `expr()`.
```{r}
missing_arg()
typeof(missing_arg())
```
* Empty symbols don't print anything.
* To check, we need to use `rlang::is_missing()`
```{r}
is_missing(missing_arg())
```
* These are usually present in function formals:
```{r}
f <- expr(function(x, y = 10) x + y)
args <- f[[2]]
is_missing(args[[1]])
```
### Expression vectors
* An expression vector is just a list of expressions.
* The only difference is that calling `eval()` on an expression evaluates each individual expression.
* Instead, it might be more advantageous to use a list of expressions.
* Expression vectors are only produced by two base functions:
`expression()` and `parse()`:
```{r}
exp1 <- parse(text = c("
x <- 4
x
"))
exp1
```
```{r}
exp2 <- expression(x <- 4, x)
exp2
```
```{r}
typeof(exp1)
typeof(exp2)
```
- Like calls and pairlists, expression vectors behave like lists:
```{r}
length(exp1)
exp1[[1]]
```
## Meeting Videos
### Cohort 1
`r knitr::include_url("https://www.youtube.com/embed/2NixH3QAerQ")`
### Cohort 2
`r knitr::include_url("https://www.youtube.com/embed/mYOUgzoRcjI")`
### Cohort 3
`r knitr::include_url("https://www.youtube.com/embed/5RLCRFli6QI")`
`r knitr::include_url("https://www.youtube.com/embed/F8df5PMNC8Y")`
### Cohort 4
`r knitr::include_url("https://www.youtube.com/embed/tSVBlAP5DIY")`
### Cohort 5
`r knitr::include_url("https://www.youtube.com/embed/Jc_R4yFsYeE")`
### Cohort 6
`r knitr::include_url("https://www.youtube.com/embed/K8w28ee3CR8")`
### Cohort 7
`r knitr::include_url("https://www.youtube.com/embed/XPs-TI4BYjk")`
`r knitr::include_url("https://www.youtube.com/embed/8LPw_VTBsmQ")`
<details>
<summary>Meeting chat log</summary>
```
00:50:48 Stone: https://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/
00:58:26 iPhone: See ya next week!
```
</details>