forked from rstudio-education/hopr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
programs.rmd
955 lines (697 loc) · 38.6 KB
/
programs.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
# Programs {#programs}
In this chapter, you will build a real, working slot machine that you can play by running an R function. When you're finished, you'll be able to play it like this:
```r
play()
## 0 0 DD
## $0
play()
## 7 7 7
## $80
```
The `play` function will need to do two things. First, it will need to randomly generate three symbols; and, second, it will need to calculate a prize based on those symbols.
The first step is easy to simulate. You can randomly generate three symbols with the `sample` function—just like you randomly "rolled" two dice in [Project 1: Weighted Dice]. The following function generates three symbols from a group of common slot machine symbols: diamonds (`DD`), sevens (`7`), triple bars (`BBB`), double bars (`BB`), single bars (`B`), cherries (`C`), and zeroes (`0`). The symbols are selected randomly, and each symbol appears with a different probability:
```r
get_symbols <- function() {
wheel <- c("DD", "7", "BBB", "BB", "B", "C", "0")
sample(wheel, size = 3, replace = TRUE,
prob = c(0.03, 0.03, 0.06, 0.1, 0.25, 0.01, 0.52))
}
```
You can use `get_symbols` to generate the symbols used in your slot machine:
```r
get_symbols()
## "BBB" "0" "C"
get_symbols()
## "0" "0" "0"
get_symbols()
## "7" "0" "B"
```
`get_symbols` uses the probabilities observed in a group of video lottery terminals from Manitoba, Canada. These slot machines became briefly controversial in the 1990s, when a reporter decided to test their payout rate. The machines appeared to pay out only 40 cents on the dollar, even though the manufacturer claimed they would pay out 92 cents on the dollar. The original data collected on the machines and a description of the controversy is available online in [a journal article by W. John Braun](http://bit.ly/jse_Braun). The controversy died down when additional testing showed that the manufacturer was correct.
The Manitoba slot machines use the complicated payout scheme shown in Table \@ref(tab:prizes). A player will win a prize if he gets:
* Three of the same type of symbol (except for three zeroes)
* Three bars (of mixed variety)
* One or more cherries
Otherwise, the player receives no prize.
The monetary value of the prize is determined by the exact combination of symbols and is further modified by the presence of diamonds. Diamonds are treated like "wild cards," which means they can be considered any other symbol if it would increase a player's prize. For example, a player who rolls `7` `7` `DD` would earn a prize for getting three sevens. There is one exception to this rule, however: a diamond cannot be considered a cherry unless the player also gets one real cherry. This prevents a dud roll like, `0` `DD` `0` from being scored as `0` `C` `0`.
Diamonds are also special in another way. Every diamond that appears in a combination doubles the amount of the final prize. So `7` `7` `DD` would actually be scored _higher_ than `7` `7` `7`. Three sevens would earn you $80, but two sevens and a diamond would earn you $160. One seven and two diamonds would be even better, resulting in a prize that has been doubled twice, or $320. A jackpot occurs when a player rolls `DD` `DD` `DD`. Then a player earns $100 doubled three times, which is $800.
Table: (\#tab:prizes) Each play of the slot machine costs $1. A player's symbols determine how much they win. Diamonds (`DD`) are wild, and each diamond doubles the final prize. * = any symbol.
|Combination|Prize($)
|-----------|--------
|`DD DD DD`|100
|`7 7 7`|80
|`BBB BBB BBB`|40
|`BB BB BB`|25
|`B B B`|10
|`C C C`|10
|Any combination of bars|5
|`C C *`|5
|`C * C`|5
|`* C C`|5
|`C * *`|2
|`* C *`|2
|`* * C`|2
To create your `play` function, you will need to write a program that can take the output of `get_symbols` and calculate the correct prize based on Table \@ref(tab:prizes).
In R, programs are saved either as R scripts or as functions. We'll save your program as a function named `score`. When you are finished, you will be able to use `score` to calculate a prize like this:
```r
score(c("DD", "DD", "DD"))
## 800
```
After that it will be easy to create the full slot machine, like this:
```r
play <- function() {
symbols <- get_symbols()
print(symbols)
score(symbols)
}
```
```{block2, type = "rmdnote"}
The `print` command prints its output to the console window, which makes `print` a useful way to display messages from within the body of a function.
```
You may notice that `play` calls a new function, `print`. This will help `play` display the three slot machine symbols, since they do not get returned by the last line of the function. The `print` command prints its output to the console window -- even if R calls it from within a function.
In [Project 1: Weighted Dice], I encouraged you to write all of your R code in an R script, a text file where you can compose and save code. That advice will become very important as you work through this chapter. Remember that you can open an R script in RStudio by going to the menu bar and clicking on File > New File > R Script.
## Strategy
Scoring slot-machine results is a complex task that will require a complex algorithm. You can make this, and other coding tasks, easier by using a simple strategy:
* Break complex tasks into simple subtasks.
* Use concrete examples.
* Describe your solutions in English, then convert them to R.
Let's start by looking at how you can divide a program into subtasks that are simple to work with.
A program is a set of step-by-step instructions for your computer to follow. Taken together, these instructions may accomplish something very sophisticated. Taken apart, each individual step will likely be simple and straightforward.
You can make coding easier by identifying the individual steps or subtasks within your program. You can then work on each subtask separately. If a subtask seems complicated, try to divide it again into even subtasks that are even more simple. You can often reduce an R program into substasks so simple that each can be performed with a preexisting function.
R programs contain two types of subtasks: sequential steps and parallel cases.
### Sequential Steps
One way to subdivide a program is into a series of sequential steps. The `play` function takes the approach, shown in Figure \@ref(fig:subdivide1). First, it generates three symbols (step 1), then it displays them in the console window (step 2), and then it scores them (step 3):
```r
play <- function() {
# step 1: generate symbols
symbols <- get_symbols()
# step 2: display the symbols
print(symbols)
# step 3: score the symbols
score(symbols)
}
```
To have R execute steps in sequence, place the steps one after another in an R script or function body.
```{r subdivide1, echo = FALSE, fig.cap = "The play function uses a series of steps."}
knitr::include_graphics("images/hopr_0701.png")
```
### Parallel Cases
Another way to divide a task is to spot groups of similar cases within the task. Some tasks require different algorithms for different groups of input. If you can identify those groups, you can work out their algorithms one at a time.
For example, `score` will need to calculate the prize one way if `symbols` contains three of a kind (In that case, `score` will need to match the common symbol to a prize). `score` will need to calculate the prize a second way if the symbols are all bars (In that case, `score` can just assign a prize of $5). And, finally, `score` will need to calculate the prize in a third way if the symbols do not contain three of a kind or all bars (In that case, `score` must count the number of cherries present). `score` will never use all three of these algorithms at once; it will always choose just one algorithm to run based on the combination of symbols.
Diamonds complicate all of this because diamonds can be treated as wild cards. Let's ignore that for now and focus on the simpler case where diamonds double the prize but are not wilds. `score` can double the prize as necessary after it runs one of the following algorithms, as shown in Figure \@ref(fig:subdivide2).
Adding the `score` cases to the `play` steps reveals a strategy for the complete slot machine program, as shown in Figure \@ref(fig:subdivide3).
We've already solved the first few steps in this strategy. Our program can get three slot machine symbols with the `get_symbols` function. Then it can display the symbols with the `print` function. Now let's examine how the program can handle the parallel score cases.
```{r subdivide2, echo = FALSE, fig.cap = "The score function must distinguish between parallel cases."}
knitr::include_graphics("images/hopr_0702.png")
```
```{r subdivide3, echo = FALSE, fig.cap = "The complete slot machine simulation will involve subtasks that are arranged both in series and in parallel."}
knitr::include_graphics("images/hopr_0703.png")
```
## if Statements
Linking cases together in parallel requires a bit of structure; your program faces a fork in the road whenever it must choose between cases. You can help the program navigate this fork with an `if` statement.
An `if` statement tells R to do a certain task for a certain case. In English you would say something like, "If this is true, do that." In R, you would say:
```r
if (this) {
that
}
```
The `this` object should be a logical test or an R expression that evaluates to a single `TRUE` or `FALSE`. If `this` evaluates to `TRUE`, R will run all of the code that appears between the braces that follow the `if` statement (i.e., between the `{` and `}` symbols). If `this` evaluates to `FALSE`, R will skip the code between the braces without running it.
For example, you could write an `if` statement that ensures some object, `num`, is positive:
```r
if (num < 0) {
num <- num * -1
}
```
If `num < 0` is `TRUE`, R will multiply `num` by negative one, which will make `num` positive:
```r
num <- -2
if (num < 0) {
num <- num * -1
}
num
## 2
```
If `num < 0` is `FALSE`, R will do nothing and `num` will remain as it is—positive (or zero):
```r
num <- 4
if (num < 0) {
num <- num * -1
}
num
## 4
```
The condition of an `if` statement must evaluate to a _single_ `TRUE` or `FALSE`. If the condition creates a vector of `TRUE`s and `FALSE`s (which is easier to make than you may think), your `if` statement will print a warning message and use only the first element of the vector. Remember that you can condense vectors of logical values to a single `TRUE` or `FALSE` with the functions `any` and `all`.
You don't have to limit your `if` statements to a single line of code; you can include as many lines as you like between the braces. For example, the following code uses many lines to ensure that `num` is positive. The additional lines print some informative statements if `num` begins as a negative number. R will skip the entire code block—`print` statements and all—if `num` begins as a positive number:
```r
num <- -1
if (num < 0) {
print("num is negative.")
print("Don't worry, I'll fix it.")
num <- num * -1
print("Now num is positive.")
}
## "num is negative."
## "Don't worry, I'll fix it."
## "Now num is positive."
num
## 1
```
Try the following quizzes to develop your understanding of `if` statements.
```{exercise, name = "Quiz A"}
What will this return?
```
```r
x <- 1
if (3 == 3) {
x <- 2
}
x
```
```{solution}
The code will return the number 2. `x` begins as 1, and then R encounters the `if` statement. Since the condition evaluates to `TRUE`, R will run `x <- 2`, changing the value of `x`.
```
```{exercise, name = "Quiz B"}
What will this return?
```
```r
x <- 1
if (TRUE) {
x <- 2
}
x
```
```{solution}
This code will also return the number 2. It works the same as the code in Quiz A, except the condition in this statement is already `TRUE`. R doesn't even need to evaluate it. As a result, the code inside the `if` statement will be run, and `x` will be set to 2.
```
```{exercise, name = "Quiz C"}
What will this return?
```
```r
x <- 1
if (x == 1) {
x <- 2
if (x == 1) {
x <- 3
}
}
x
```
```{solution}
Once again, the code will return the number 2. `x` starts out as 1, and the condition of the first `if` statement will evaluate to `TRUE`, which causes R to run the code in the body of the `if` statement. First, R sets `x` equal to 2, then R evaluates the second `if` statement, which is in the body of the first. This time `x == 1` will evaluate to `FALSE` because `x` now equals 2. As a result, R ignores `x <- 3` and exits both `if` statements.
```
## else Statements
`if` statements tell R what to do when your condition is _true_, but you can also tell R what to do when the condition is _false_. `else` is a counterpart to `if` that extends an `if` statement to include a second case. In English, you would say, "If this is true, do plan A; else do plan B." In R, you would say:
```r
if (this) {
Plan A
} else {
Plan B
}
```
When `this` evaluates to `TRUE`, R will run the code in the first set of braces, but not the code in the second. When `this` evaluates to `FALSE`, R will run the code in the second set of braces, but not the first. You can use this arrangement to cover all of the possible cases. For example, you could write some code that rounds a decimal to the nearest integer.
Start with a decimal:
```r
a <- 3.14
```
Then isolate the decimal component with `trunc`:
```r
dec <- a - trunc(a)
dec
## 0.14
```
```{block2, type = "rmdnote"}
`trunc` takes a number and returns only the portion of the number that appears to the left of the decimal place (i.e., the integer part of the number).
```
```{block2, type = "rmdnote"}
`a - trunc(a)` is a convenient way to return the decimal part of `a`.
```
Then use an `if else` tree to round the number (either up or down):
```r
if (dec >= 0.5) {
a <- trunc(a) + 1
} else {
a <- trunc(a)
}
a
## 3
```
If your situation has more than two mutually exclusive cases, you can string multiple `if` and `else` statements together by adding a new `if` statement immediately after `else`. For example:
```r
a <- 1
b <- 1
if (a > b) {
print("A wins!")
} else if (a < b) {
print("B wins!")
} else {
print("Tie.")
}
## "Tie."
```
R will work through the `if` conditions until one evaluates to `TRUE`, then R will ignore any remaining `if` and `else` clauses in the tree. If no conditions evaluate to `TRUE`, R will run the final `else` statement.
If two `if` statements describe mutually exclusive events, it is better to join the `if` statements with an `else if` than to list them separately. This lets R ignore the second `if` statement whenever the first returns a `TRUE`, which saves work.
You can use `if` and `else` to link the subtasks in your slot-machine function. Open a fresh R script, and copy this code into it. The code will be the skeleton of our final `score` function. Compare it to the flow chart for `score` in Figure \@ref(fig:subdivide2):
```r
if ( # Case 1: all the same <1>) {
prize <- # look up the prize <3>
} else if ( # Case 2: all bars <2> ) {
prize <- # assign $5 <4>
} else {
# count cherries <5>
prize <- # calculate a prize <7>
}
# count diamonds <6>
# double the prize if necessary <8>
```
Our skeleton is rather incomplete; there are many sections that are just code comments instead of real code. However, we've reduced the program to eight simple subtasks:
**<1>** - Test whether the symbols are three of a kind.
**<2>** - Test whether the symbols are all bars.
**<3>** - Look up the prize for three of a kind based on the common symbol.
**<4>** - Assign a prize of $5.
**<5>** - Count the number of cherries.
**<6>** - Count the number of diamonds.
**<7>** - Calculate a prize based on the number of cherries.
**<8>** - Adjust the prize for diamonds.
If you like, you can reorganize your flow chart around these tasks, as in Figure \@ref(fig:subdivide4). The chart will describe the same strategy, but in a more precise way. I'll use a diamond shape to symbolize an `if else` decision.
```{r subdivide4, echo = FALSE, fig.cap = "score can navigate three cases with two if else decisions. We can also break some of our tasks into two steps."}
knitr::include_graphics("images/hopr_0704.png")
```
Now we can work through the subtasks one at a time, adding R code to the `if` tree as we go. Each subtask will be easy to solve if you set up a concrete example to work with and try to describe a solution in English before coding in R.
The first subtask asks you to test whether the symbols are three of a kind. How should you begin writing the code for this subtask?
You know that the final `score` function will look something like this:
```r
score <- function(symbols) {
# calculate a prize
prize
}
```
Its argument, `symbols`, will be the output of `get_symbols`, a vector that contains three character strings. You could start writing `score` as I have written it, by defining an object named `score` and then slowly filling in the body of the function. However, this would be a bad idea. The eventual function will have eight separate parts, and it will not work correctly until _all_ of those parts are written (and themselves work correctly). This means you would have to write the entire `score` function before you could test any of the subtasks. If `score` doesn't work—which is very likely—you will not know which subtask needs fixed.
You can save yourself time and headaches if you focus on one subtask at a time. For each subtask, create a concrete example that you can test your code on. For example, you know that `score` will need to work on a vector named `symbols` that contains three character strings. If you make a real vector named `symbols`, you can run the code for many of your subtasks on the vector as you go:
```r
symbols <- c("7", "7", "7")
```
If a piece of code does not work on `symbols`, you will know that you need to fix it before you move on. You can change the value of `symbols` from subtask to subtask to ensure that your code works in every situation:
```r
symbols <- c("B", "BB", "BBB")
symbols <- c("C", "DD", "0")
```
Only combine your subtasks into a `score` function once each subtask works on a concrete example. If you follow this plan, you will spend more time using your functions and less time trying to figure out why they do not work.
After you set up a concrete example, try to describe how you will do the subtask in English. The more precisely you can describe your solution, the easier it will be to write your R code.
Our first subtask asks us to "test whether the symbols are three of a kind." This phrase does not suggest any useful R code to me. However, I could describe a more precise test for three of a kind: three symbols will be the same if the first symbol is equal to the second and the second symbol is equal to the third. Or, even more precisely:
_A vector named `symbols` will contain three of the same symbol if the first element of `symbols` is equal to the second element of `symbols` and the second element of `symbols` is equal to the third element of `symbols`_.
```{exercise, name = "Write a Test"}
Turn the preceding statement into a logical test written in R. Use your knowledge of logical tests, Boolean operators, and subsetting from [R Notation]. The test should work with the vector `symbols` and return a `TRUE` _if and only if_ each element in `symbols` is the same. Be sure to test your code on `symbols`.
```
```{solution}
Here are a couple of ways to test that `symbols` contains three of the same symbol. The first method parallels the English suggestion above, but there are other ways to do the same test. There is no right or wrong answer, so long as your solution works, which is easy to check because you've created a vector named `symbols`:
```
```r
symbols
## "7" "7" "7"
symbols[1] == symbols[2] & symbols[2] == symbols[3]
## TRUE
symbols[1] == symbols[2] & symbols[1] == symbols[3]
## TRUE
all(symbols == symbols[1])
## TRUE
```
As your vocabulary of R functions broadens, you'll think of more ways to do basic tasks. One method that I like for checking three of a kind is:
```r
length(unique(symbols) == 1)
```
The `unique` function returns every unique term that appears in a vector. If your `symbols` vector contains three of a kind (i.e., one unique term that appears three times), then `unique(symbols)` will return a vector of length `1`.
Now that you have a working test, you can add it to your slot-machine script:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
if (same) {
prize <- # look up the prize
} else if ( # Case 2: all bars ) {
prize <- # assign $5
} else {
# count cherries
prize <- # calculate a prize
}
# count diamonds
# double the prize if necessary
```
```{block2, type = "rmdnote"}
`&&` and `||` behave like `&` and `|` but can sometimes be more efficient. The double operators will not evaluate the second test in a pair of tests if the first test makes the result clear. For example, if `symbols[1]` does not equal `symbols[2]` in the next expression, `&&` will not evaluate `symbols[2] == symbols[3]`; it can immediately return a `FALSE` for the whole expression (because `FALSE & TRUE` and `FALSE & FALSE` both evaluate to `FALSE`). This efficiency can speed up your programs; however, double operators are not appropriate everywhere. `&&` and `||` are not vectorized, which means they can only handle a single logical test on each side of the operator.
```
The second prize case occurs when all the symbols are a type of bar, for example, `B`, `BB`, and `BBB`. Let's begin by creating a concrete example to work with:
```r
symbols <- c("B", "BBB", "BB")
```
```{exercise, name = "Test for All Bars"}
Use R's logical and Boolean operators to write a test that will determine whether a vector named `symbols` contains only symbols that are a type of bar. Check whether your test works with our example `symbols` vector. Remember to describe how the test should work in English, and then convert the solution to R.
```
```{solution}
As with many things in R, there are multiple ways to test whether `symbols` contains all bars. For example, you could write a very long test that uses multiple Boolean operators, like this:
```
```r
(symbols[1] == "B" | symbols[1] == "BB" | symbols[1] == "BBB") &
(symbols[2] == "B" | symbols[2] == "BB" | symbols[2] == "BBB") &
(symbols[3] == "B" | symbols[3] == "BB" | symbols[3] == "BBB")
## TRUE
```
However, this is not a very efficient solution, because R has to run nine logical tests (and you have to type them). You can often replace multiple `|` operators with a single `%in%`. Also, you can check that a test is true for each element in a vector with `all`. These two changes shorten the preceding code to:
```r
all(symbols %in% c("B", "BB", "BBB"))
## TRUE
```
Let's add this code to our script:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
if (same) {
prize <- # look up the prize
} else if (all(bars)) {
prize <- # assign $5
} else {
# count cherries
prize <- # calculate a prize
}
# count diamonds
# double the prize if necessary
```
You may have noticed that I split this test up into two steps, `bars` and `all(bars)`. That's just a matter of personal preference. Wherever possible, I like to write my code so it can be read with function and object names conveying what they do.
You also may have noticed that our test for Case 2 will capture some symbols that should be in Case 1 because they contain three of a kind:
```r
symbols <- c("B", "B", "B")
all(symbols %in% c("B", "BB", "BBB"))
## TRUE
```
That won't be a problem, however, because we've connected our cases with `else if` in the `if` tree. As soon as R comes to a case that evaluates to `TRUE`, it will skip over the rest of the tree. Think of it this way: each `else` tells R to only run the code that follows it _if none of the previous conditions have been met_. So when we have three of the same type of bar, R will evaluate the code for Case 1 and then skip the code for Case 2 (and Case 3).
Our next subtask is to assign a prize for `symbols`. When the `symbols` vector contains three of the same symbol, the prize will depend on which symbol there are three of. If there are three `DD`s, the prize will be $100; if there are three `7`s, the prize will be $80; and so on.
This suggests another `if` tree. You could assign a prize with some code like this:
```r
if (same) {
symbol <- symbols[1]
if (symbol == "DD") {
prize <- 800
} else if (symbol == "7") {
prize <- 80
} else if (symbol == "BBB") {
prize <- 40
} else if (symbol == "BB") {
prize <- 5
} else if (symbol == "B") {
prize <- 10
} else if (symbol == "C") {
prize <- 10
} else if (symbol == "0") {
prize <- 0
}
}
```
While this code will work, it is a bit long to write and read, and it may require R to perform multiple logical tests before delivering the correct prize. We can do better with a different method.
## Lookup Tables
Very often in R, the simplest way to do something will involve subsetting. How could you use subsetting here? Since you know the exact relationship between the symbols and their prizes, you can create a vector that captures this information. This vector can store symbols as names and prize values as elements:
```r
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
payouts
## DD 7 BBB BB B C 0
## 100 80 40 25 10 10 0
```
Now you can extract the correct prize for any symbol by subsetting the vector with the symbol's name:
```r
payouts["DD"]
## DD
## 100
payouts["B"]
## B
## 10
```
If you want to leave behind the symbol's name when subsetting, you can run the `unname` function on the output:
```r
unname(payouts["DD"])
## 100
```
`unname` returns a copy of an object with the names attribute removed.
`payouts` is a type of _lookup table_, an R object that you can use to look up values. Subsetting `payouts` provides a simple way to find the prize for a symbol. It doesn't take many lines of code, and it does the same amount of work whether your symbol is `DD` or `0`. You can create lookup tables in R by creating named objects that can be subsetted in clever ways.
Sadly, our method is not quite automatic; we need to tell R which symbol to look up in `payouts`. Or do we? What would happen if you subsetted `payouts` by `symbols[1]`? Give it a try:
```r
symbols <- c("7", "7", "7")
symbols[1]
## "7"
payouts[symbols[1]]
## 7
## 80
symbols <- c("C", "C", "C")
payouts[symbols[1]]
## C
## 10
```
You don't need to know the exact symbol to look up because you can tell R to look up whichever symbol happens to be in `symbols`. You can find this symbol with `symbols[1]`, `symbols[2]`, or `symbols[3]`, because each contains the same symbol in this case. You now have a simple automated way to calculate the prize when `symbols` contains three of a kind. Let's add it to our code and then look at Case 2:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- # assign $5
} else {
# count cherries
prize <- # calculate a prize
}
# count diamonds
# double the prize if necessary
```
Case 2 occurs whenever the symbols are all bars. In that case, the prize will be $5, which is easy to assign:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- 5
} else {
# count cherries
prize <- # calculate a prize
}
# count diamonds
# double the prize if necessary
```
Now we can work on the last case. Here, you'll need to know how many cherries are in `symbols` before you can calculate a prize.
```{exercise, name = "Find C's"}
How can you tell which elements of a vector named `symbols` are a `C`? Devise a test and try it out.
```
```{block2, type = "rmdnote"}
**Challenge**
How might you count the number of `C`s in a vector named `symbols`? Remember R's coercion rules.
```
```{solution}
As always, let's work with a real example:
```
```r
symbols <- c("C", "DD", "C")
```
One way to test for cherries would be to check which, if any, of the symbols are a `C`:
```r
symbols == "C"
## TRUE FALSE TRUE
```
It'd be even more useful to count how many of the symbols are cherries. You can do this with `sum`, which expects numeric input, not logical. Knowing this, R will coerce the `TRUE`s and `FALSE`s to `1`s and `0`s before doing the summation. As a result, `sum` will return the number of `TRUE`s, which is also the number of cherries:
```r
sum(symbols == "C")
## 2
```
You can use the same method to count the number of diamonds in `symbols`:
```r
sum(symbols == "DD")
## 1
```
Let's add both of these subtasks to the program skeleton:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- 5
} else {
cherries <- sum(symbols == "C")
prize <- # calculate a prize
}
diamonds <- sum(symbols == "DD")
# double the prize if necessary
```
Since Case 3 appears further down the `if` tree than Cases 1 and 2, the code in Case 3 will only be applied to players that do not have three of a kind or all bars. According to the slot machine's payout scheme, these players will win $5 if they have two cherries and $2 if they have one cherry. If the player has no cherries, she gets a prize of $0. We don't need to worry about three cherries because that outcome is already covered in Case 1.
As in Case 1, you could write an `if` tree that handles each combination of cherries, but just like in Case 1, this would be an inefficient solution:
```r
if (cherries == 2) {
prize <- 5
} else if (cherries == 1) {
prize <- 2
} else {}
prize <- 0
}
```
Again, I think the best solution will involve subsetting. If you are feeling ambitious, you can try to work this solution out on your own, but you will learn just as quickly by mentally working through the following proposed solution.
We know that our prize should be $0 if we have no cherries, $2 if we have one cherry, and $5 if we have two cherries. You can create a vector that contains this information. This will be a very simple lookup table:
```r
c(0, 2, 5)
```
Now, like in Case 1, you can subset the vector to retrieve the correct prize. In this case, the prize's aren't identified by a symbol name, but by the number of cherries present. Do we have that information? Yes, it is stored in `cherries`. We can use basic integer subsetting to get the correct prize from the prior lookup table, for example, `c(0, 2, 5)[1]`.
`cherries` isn't exactly suited for integer subsetting because it could contain a zero, but that's easy to fix. We can subset with `cherries + 1`. Now when `cherries` equals zero, we have:
```r
cherries + 1
## 1
c(0, 2, 5)[cherries + 1]
## 0
```
When `cherries` equals one, we have:
```r
cherries + 1
## 2
c(0, 2, 5)[cherries + 1]
## 2
```
And when `cherries` equals two, we have:
```r
cherries + 1
## 3
c(0, 2, 5)[cherries + 1]
## 5
```
Examine these solutions until you are satisfied that they return the correct prize for each number of cherries. Then add the code to your script, as follows:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- 5
} else {
cherries <- sum(symbols == "C")
prize <- c(0, 2, 5)[cherries + 1]
}
diamonds <- sum(symbols == "DD")
# double the prize if necessary
```
```{block2, type = "rmdimportant"}
**Lookup Tables Versus if Trees**
This is the second time we've created a lookup table to avoid writing an `if` tree. Why is this technique helpful and why does it keep appearing? Many `if` trees in R are essential. They provide a useful way to tell R to use different algorithms in different cases. However, `if` trees are not appropriate everywhere.
`if` trees have a couple of drawbacks. First, they require R to run multiple tests as it works its way down the `if` tree, which can create unnecessary work. Second, as you'll see in [Speed], it can be difficult to use `if` trees in vectorized code, a style of code that takes advantage of R's programming strengths to create fast programs. Lookup tables do not suffer from either of these drawbacks.
You won't be able to replace every `if` tree with a lookup table, nor should you. However, you can usually use lookup tables to avoid assigning variables with `if` trees. As a general rule, use an `if` tree if each branch of the tree runs different _code_. Use a lookup table if each branch of the tree only assigns a different _value_.
To convert an `if` tree to a lookup table, identify the values to be assigned and store them in a vector. Next, identify the selection criteria used in the conditions of the `if` tree. If the conditions use character strings, give your vector names and use name-based subsetting. If the conditions use integers, use integer-based subsetting.
```
The final subtask is to double the prize once for every diamond present. This means that the final prize will be some multiple of the current prize. For example, if no diamonds are present, the prize will be:
```r
prize * 1 # 1 = 2 ^ 0
```
If one diamond is present, it will be:
```r
prize * 2 # 2 = 2 ^ 1
```
If two diamonds are present, it will be:
```r
prize * 4 # 4 = 2 ^ 2
```
And if three diamonds are present, it will be:
```r
prize * 8 # 8 = 2 ^ 3
```
Can you think of an easy way to handle this? How about something similar to these examples?
```{exercise, name = "Adjust for Diamonds"}
Write a method for adjusting `prize` based on `diamonds`. Describe a solution in English first, and then write your code.
```
```{solution}
Here is a concise solution inspired by the previous pattern. The adjusted prize will equal:
```
```r
prize * 2 ^ diamonds
```
which gives us our final `score` script:
```r
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- 5
} else {
cherries <- sum(symbols == "C")
prize <- c(0, 2, 5)[cherries + 1]
}
diamonds <- sum(symbols == "DD")
prize * 2 ^ diamonds
```
## Code Comments
You now have a working score script that you can save to a function. Before you save your script, though, consider adding comments to your code with a `#`. Comments can make your code easier to understand by explaining _why_ the code does what it does. You can also use comments to break long programs into scannable chunks. For example, I would include three comments in the `score` code:
```r
# identify case
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
# get prize
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- 5
} else {
cherries <- sum(symbols == "C")
prize <- c(0, 2, 5)[cherries + 1]
}
# adjust for diamonds
diamonds <- sum(symbols == "DD")
prize * 2 ^ diamonds
```
Now that each part of your code works, you can wrap it into a function with the methods you learned in [Writing Your Own Functions](#write-functions). Either use RStudio's Extract Function option in the menu bar under Code, or use the `function` function. Ensure that the last line of the function returns a result (it does), and identify any arguments used by your function. Often the concrete examples that you used to test your code, like `symbols`, will become the arguments of your function. Run the following code to start using the `score` function:
```r
score <- function (symbols) {
# identify case
same <- symbols[1] == symbols[2] && symbols[2] == symbols[3]
bars <- symbols %in% c("B", "BB", "BBB")
# get prize
if (same) {
payouts <- c("DD" = 100, "7" = 80, "BBB" = 40, "BB" = 25,
"B" = 10, "C" = 10, "0" = 0)
prize <- unname(payouts[symbols[1]])
} else if (all(bars)) {
prize <- 5
} else {
cherries <- sum(symbols == "C")
prize <- c(0, 2, 5)[cherries + 1]
}
# adjust for diamonds
diamonds <- sum(symbols == "DD")
prize * 2 ^ diamonds
}
```
Once you have defined the `score` function, the `play` function will work as well:
```r
play <- function() {
symbols <- get_symbols()
print(symbols)
score(symbols)
}
```
Now it is easy to play the slot machine:
```r
play()
## "0" "BB" "B"
## 0
play()
## "DD" "0" "B"
## 0
play()
## "BB" "BB" "B"
## 25
```
## Summary
An R program is a set of instructions for your computer to follow that has been organized into a sequence of steps and cases. This may make programs seem simple, but don't be fooled: you can create complicated results with the right combination of simple steps (and cases).
As a programmer, you are more likely to be fooled in the opposite way. A program may seem impossible to write when you know that it must do something impressive. Do not panic in these situations. Divide the job before you into simple tasks, and then divide the tasks again. You can visualize the relationship between tasks with a flow chart if it helps. Then work on the subtasks one at a time. Describe solutions in English, then convert them to R code. Test each solution against concrete examples as you go. Once each of your subtasks works, combine your code into a function that you can share and reuse.
R provides tools that can help you do this. You can manage cases with `if` and `else` statements. You can create a lookup table with objects and subsetting. You can add code comments with `#`. And you can save your programs as a function with `function`.
Things often go wrong when people write programs. It will be up to you to find the source of any errors that occur and to fix them. It should be easy to find the source of your errors if you use a stepwise approach to writing functions, writing—and then testing—one bit at a time. However, if the source of an error eludes you, or you find yourself working with large chunks of untested code, consider using R's built in debugging tools, described in [Debugging R Code].
The next two chapters will teach you more tools that you can use in your programs. As you master these tools, you will find it easier to write R programs that let you do whatever you wish to your data. In [S3], you will learn how to use R's S3 system, an invisible hand that shapes many parts of R. You will use the system to build a custom class for your slot machine output, and you will tell R how to display objects that have your class.