# Inference for comparing paired means {#sec-inference-paired-means}
```{r}
#| include: false
source("_common.R")
```
::: {.chapterintro data-latex=""}
In [Chapter -@sec-inference-two-means] analysis was done to compare the average population value across two different groups.
Recall that one of the important conditions in doing a two-sample analysis is that the two groups are independent.
Here, independence across groups means that knowledge of the observations in one group does not change what we would expect to happen in the other group.
But what happens if the groups are **dependent**?
Sometimes dependency is not something that can be addressed through a statistical method.
However, a particular dependency, **pairing**, can be modeled quite effectively using many of the same tools we have already covered in this text.
:::
Paired data represent a particular type of experimental structure where the analysis is somewhat akin to a one-sample analysis (see [Chapter -@sec-inference-one-mean]) but has other features that resemble a two-sample analysis (see [Chapter -@sec-inference-two-means]).
As with a two-sample analysis, quantitative measurements are made on each of two different levels of the explanatory variable.
However, because the observational unit is **paired** across the two groups, the two measurements are subtracted such that only the difference is retained.
@tbl-pairedexamples presents some examples of studies where paired designs were implemented.
```{r}
#| label: tbl-pairedexamples
#| tbl-cap: |
#| Examples of studies where a paired design is used to measure the
#| difference in the measurement over two conditions.
#| tbl-pos: H
paired_study_examples <- tribble(
~variable, ~col1, ~col2, ~col3,
"Car", "Smooth Turn vs Quick Spin", "amount of tire tread after 1,000 miles", "difference in tread",
"Textbook", "UCLA vs Amazon", "price of new text", "difference in price",
"Individual person", "Pre-course vs Post-course", "exam score", "difference in score"
)
paired_study_examples |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Observational unit", "Comparison groups", "Measurement", "Value of interest")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"),
full_width = FALSE
) |>
column_spec(1:3, width = "8em")
```
::: {.important data-latex=""}
**Paired data.**
\index{paired data}
Two sets of observations are *paired* if each observation in one set has a special correspondence or connection with exactly one observation in the other set.
:::
It is worth noting that if mathematical modeling is chosen as the analysis tool, paired data inference on the difference in measurements will be identical to the one-sample mathematical techniques described in [Chapter -@sec-inference-one-mean].
However, recall from [Chapter -@sec-inference-one-mean] that with pure one-sample data, the computational tools for hypothesis testing are not easy to implement and were not presented (although the bootstrap was presented as a computational approach for constructing a one-sample confidence interval).
With paired data, the randomization test fits nicely with the structure of the experiment and is presented here.
```{r}
#| include: false
terms_chp_21 <- c("paired data")
```
## Randomization test for the mean paired difference
Consider an experiment done to measure whether tire brand Smooth Turn or tire brand Quick Spin has longer tread wear (in cm).
That is, after 1,000 miles on a car, which brand of tires has more tread, on average?
### Observed data
The observed data represent 25 tread measurements (in cm) taken on 25 tires of Smooth Turn and 25 tires of Quick Spin.
The study used a total of 25 cars, so on each car, one tire was of Smooth Turn and one was of Quick Spin.
The mean tread for the Quick Spin tires was 0.308 cm and the mean tread for the Smooth Turn tires was 0.310 cm.
@fig-tiredata presents the observed data: measurements of tread remaining (in cm).
The Smooth Turn manufacturer looks at the box plots and says:
> *Clearly the tread on Smooth Turn tires is higher, on average, than the tread on Quick Spin tires after 1,000 miles of driving.*
The Quick Spin manufacturer is skeptical and retorts:
> *But with only 25 cars, it seems that the variability in road conditions (sometimes one tire hits a pothole, etc.) could be what leads to the small difference in average tread amount.*
```{r}
#| label: fig-tiredata
#| fig-cap: |
#| Box plots of the tire tread data (in cm) and the brand of tire from which
#| the original measurements came.
#| fig-alt: |
#| Box plots of the amount of tire tread for each of the two
#| brands of tires with data values superimposed over the box plots.
#| Each superimposed dot represents a car that drove with both types
#| of tires. A grey line connects each car across the two box plots
#| indicating that Smooth Turn has more tire wear than Quick Spin.
set.seed(47)
brandA <- rnorm(25, 0.310, 0.003)
brandB <- rnorm(25, 0.308, 0.003)
car <- c(paste("car", 1:25))
miny <- min(brandA, brandB) - .003
maxy <- max(brandA, brandB) + .003
tires <- tibble(
tread = c(brandA, brandB),
car = rep(car, 2),
brand = c(rep("Smooth Turn", 25), rep("Quick Spin", 25))
)
orig_means <- tires |>
group_by(brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(
mean_label = c("bar(x)[QS]", "bar(x)[ST]"),
mean_label = paste(mean_label, "==", mean_tread)
)
ggplot(tires, aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_boxplot(
show.legend = FALSE,
outlier.shape = "triangle"
) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = 25),
show.legend = FALSE, size = 4
) +
geom_line(aes(group = car), color = "grey", linewidth = 0.25) +
geom_point(show.legend = FALSE) +
geom_text(
data = orig_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Tire brand",
y = NULL,
title = "Original data"
)
```
We'd like to be able to systematically distinguish between what the Smooth Turn manufacturer sees in the plot and what the Quick Spin manufacturer sees in the plot.
Fortunately for us, we have an excellent way to simulate the natural variability (from road conditions, etc.) that can lead to tires being worn at different rates.
### Variability of the statistic
A randomization test will identify whether the differences seen in the box plot of the original data in @fig-tiredata could have happened just by chance variability.
As before, we will simulate the variability in the study under the assumption that the null hypothesis is true.
In this study, the null hypothesis is that average tire tread wear is the same across Smooth Turn and Quick Spin tires.
- $H_0: \mu_{diff} = 0,$ the average tread wear is the same for the two tire brands.
- $H_A: \mu_{diff} \ne 0,$ the average tread wear is different across the two tire brands.
When observations are paired, the randomization process randomly assigns the tire brand to each of the observed tread values.
Note that in the randomization test for the two-sample mean setting (see @sec-rand2mean) the explanatory variable was *also* randomly assigned to the responses.
The change in the paired setting, however, is that the assignment happens *within* an observational unit (here, a car).
Remember, if the null hypothesis is true, it will not matter which brand is put on which tire because the overall tread wear will be the same across pairs.
@fig-tiredata4 and @fig-tiredata5 show that the random assignment of group (tire brand) happens within a single car.
That is, every single car will still have one tire of each type.
In the first randomization, it just so happens that the 4th car's tire brands were swapped and the 5th car's tire brands were not swapped.
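Because each car keeps one tire of each brand, shuffling the brand labels within a car is equivalent to randomly flipping the sign of that car's difference in tread. A minimal base R sketch of one such shuffle, using simulated tread values as illustrative stand-ins (not the exact data behind the figures):

```{r}
# Simulated paired tread measurements (in cm) for 25 cars -- illustrative only
set.seed(47)
smooth_turn <- rnorm(25, mean = 0.310, sd = 0.003)
quick_spin  <- rnorm(25, mean = 0.308, sd = 0.003)
diff_tread  <- smooth_turn - quick_spin

# One randomization: within each car, swap the two brand labels (or not)
# at random, which amounts to flipping the sign of each difference
signs         <- sample(c(-1, 1), size = 25, replace = TRUE)
shuffled_diff <- signs * diff_tread
mean(shuffled_diff)
```

Note that each shuffle keeps the pairing intact; only the brand labels move within a car.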
```{r}
#| label: fig-tiredata4
#| fig-cap: |
#| The 4th car: the tire brand was randomly permuted, and in the randomization
#| calculation, the measurements (in cm) ended up in different groups.
#| fig-alt: |
#| Line plot connecting the tread for the 4th car in the dataset. The first
#| plot is the original data and the second plot is the permuted data where
#| the groups happened to get permuted randomly.
#| fig-asp: 0.5
set.seed(47)
permdata <- tires |>
group_by(car) |>
mutate(random_brand = sample(brand))
origplot4 <- tires |>
filter(car == "car 4") |>
ggplot(aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand4 <- permdata |>
filter(car == "car 4") |>
ggplot(aes(
x = brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
annotate(
"text",
x = 1.5, y = 0.315,
label = "Shuffled assignment\nof tire brand",
size = 4
) +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(color = NULL, shape = NULL)
origplot4 + shuffbrand4
```
```{r}
#| label: fig-tiredata5
#| fig-cap: |
#| The 5th car: the tire brand was randomly permuted to stay the same! In the
#| randomization calculation, the measurements (in cm) ended up in the
#| original groups.
#| fig-alt: |
#| Line plot connecting the tread for the 5th car in the dataset. The first
#| plot is the original data and the second plot is the permuted data where
#| the groups happened to stay connected to the original measurements.
#| fig-asp: 0.5
origplot5 <- tires |>
filter(car == "car 5") |>
ggplot(aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand5 <- permdata |>
filter(car == "car 5") |>
ggplot(aes(
x = brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
annotate(
"text",
x = 1.5, y = 0.315,
label = "Shuffled assignment\nof tire brand",
size = 4
) +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(color = NULL, shape = NULL)
origplot5 + shuffbrand5
```
We can put the shuffled assignments for all the cars into one plot as seen in @fig-tiredataPerm-2.
```{r}
#| label: fig-tiredataPerm
#| fig-cap: |
#| Tire tread (in cm) by brand, original and shuffled. As evidenced by the
#| colors, some of the cars kept their original tire assignments and some
#| cars swapped the tire assignments.
#| fig-subcap:
#| - Brand of tire is the original brand.
#| - Brand of tire is the shuffled brand assignment.
#| fig-alt: |
#| Line plot connecting the tread for all of the cars in the
#| dataset. The first plot is the original data and the second plot is
#| the permuted data where some of the brands are connected to the
#| original tread measurements and some of the brands have been swapped
#| across the two tread measurements, within a car.
#| out-width: 100%
#| layout-ncol: 2
#| fig-width: 5
#| fig-asp: 0.8
origplot <- tires |>
ggplot(aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = nrow(tires) / 2),
show.legend = FALSE, size = 3
) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand <- permdata |>
ggplot(aes(
x = brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = nrow(tires) / 2),
show.legend = FALSE, size = 3
) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(
title = "Shuffled data",
color = NULL, shape = NULL
)
origplot
shuffbrand
```
The next step in the randomization test is to sort the brands so that the assigned brand value on the x-axis aligns with the assigned group from the randomization.
@fig-tiredataPermSort-1 shows the same randomized groups, as seen in @fig-tiredataPerm-2 previously.
However, @fig-tiredataPermSort-2 sorts the randomized groups so that we can measure the variability *across* groups as compared to the variability *within* groups.
```{r}
#| label: fig-tiredataPermSort
#| fig-cap: |
#| Tire tread (in cm) by brand.
#| fig-subcap:
#| - Randomized brand assignment
#| - Randomized brand assignment sorted by brand.
#| out-width: 100%
#| fig-alt: |
#| Scatterplot of tread on the y axis and tire brand on the
#| x axis. The left panel has the observed tread values matched to
#| the original tire brand for the x axis location. The
#| observations are colored based on the permuted tire brand
#| assigned in the randomization. The right panel has the tread
#| values matched to the permuted tire brand, so some observations
#| have swapped orientation. The observations are also colored by
#| the permuted tire brand assigned in the randomization. In the
#| right panel, the two tire brands seem equivalent with respect
#| to tire wear.
#| layout-ncol: 2
#| fig-width: 5
#| fig-asp: 0.8
permed_means <- permdata |>
group_by(random_brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(mean_label = c("bar(x)[QSsim1]", "bar(x)[STsim1]")) |>
mutate(mean_label = paste(mean_label, "==", mean_tread))
shuffbrandfull <- permdata |>
ggplot(aes(
x = random_brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = 25),
show.legend = FALSE, size = 3
) +
geom_line(aes(group = car), color = "grey") +
ylim(c(miny, maxy)) +
geom_point(size = 3, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
labs(
x = "Randomized brand of tire",
y = NULL,
title = "Sorted data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand
shuffbrandfull
```
@fig-tiredatarand1 presents a second randomization of the data.
Notice that the two observations from the same car are linked with a grey line; some of the tread values have been randomly assigned to the other tire brand, while some are still connected to their original tire brands.
```{r}
#| label: fig-tiredatarand1
#| fig-cap: |
#| A second randomization where the brand is randomly swapped (or not)
#| across the two tread wear measurements (in cm) from the same car.
#| fig-alt: |
#| Box plots and scatterplot with tire brand on the x-axis and
#| tread on the y-axis. The points are assigned to the x-axis brand
#| given by the permutation, but the plot differs from previous figures
#| in that it is a second permutation of the brands. Again, the two
#| permuted brands seem equivalent with respect to tire wear.
set.seed(47)
tires1 <- tires |>
group_by(car) |>
mutate(random_brand = sample(brand))
permed_means <- tires1 |>
group_by(random_brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(mean_label = c("bar(x)[QSsim2]", "bar(x)[STsim2]")) |>
mutate(mean_label = paste(mean_label, "==", mean_tread))
ggplot(
tires1, aes(
x = random_brand, y = tread,
color = random_brand, shape = random_brand
)
) +
geom_boxplot(
show.legend = FALSE, color = "black",
outlier.shape = "triangle"
) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 2, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Brand of tires, randomly assigned (2)",
y = NULL
)
```
@fig-tiredatarand2 presents yet another randomization of the data.
Again, the same observations are linked by a grey line, and some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).
```{r}
#| label: fig-tiredatarand2
#| fig-cap: |
#| An additional randomization where the brand is randomly swapped (or not)
#| across the two tread wear measurements (in cm) from the same car.
#| fig-alt: |
#| Box plots and scatterplot with tire brand on the x-axis and
#| tread on the y-axis. The points are assigned to the x-axis brand
#| given by the permutation, but the plot differs from previous figures
#| in that it is yet another permutation of the brands. The additional
#| permutation demonstrates that the box plots
#| continue to change for each permutation yet the tire tread is
#| equivalent across the permuted groups.
set.seed(4747)
tires2 <- tires |>
group_by(car) |>
mutate(random_brand = sample(brand))
permed_means <- tires2 |>
group_by(random_brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(mean_label = c("bar(x)[QSsim3]", "bar(x)[STsim3]")) |>
mutate(mean_label = paste(mean_label, "==", mean_tread))
ggplot(
tires2,
aes(
x = random_brand, y = tread,
color = random_brand, shape = random_brand
)
) +
geom_boxplot(
show.legend = FALSE, color = "black",
outlier.shape = "triangle"
) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 2, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Brand of tires, randomly assigned (3)",
y = NULL
)
```
### Observed statistic vs. null statistics
By repeating the randomization process, we can create a distribution of the average of the differences in tire treads, as seen in @fig-pair-randomize.
As expected (because the differences were generated under the null hypothesis), the center of the histogram is zero.
A line has been drawn at the observed difference, which falls well outside the majority of the null differences simulated from natural variability by mixing up which tire received Smooth Turn and which received Quick Spin.
Because the observed statistic is so far away from the natural variability of the randomized differences, we are convinced that there is a difference between Smooth Turn and Quick Spin.
Our conclusion is that the extra amount of average tire tread in Smooth Turn is due to more than just natural variability: we reject $H_0$ and conclude that $\mu_{ST} \ne \mu_{QS}.$
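Repeating the within-pair shuffle many times produces a null distribution like the one in @fig-pair-randomize. As a sketch of the full test in base R, with simulated tread values standing in for the observed data:

```{r}
# Simulated paired differences (in cm) -- stand-ins for the 25 observed values
set.seed(47)
diff_tread <- rnorm(25, mean = 0.310, sd = 0.003) -
  rnorm(25, mean = 0.308, sd = 0.003)
obs_stat <- mean(diff_tread)

# 1,000 randomizations: flip each paired difference's sign at random
null_stats <- replicate(
  1000,
  mean(sample(c(-1, 1), 25, replace = TRUE) * diff_tread)
)

# Two-sided p-value: proportion of null statistics at least as extreme
# as the observed statistic
p_val <- mean(abs(null_stats) >= abs(obs_stat))
p_val
```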
```{r}
#| label: fig-pairrandfull
#| fig-cap: Process of randomizing across pairs.
#| warning: false
#| out-width: 75%
#| eval: false
#| echo: false
include_graphics("images/pairrandfull.png")
```
```{r}
#| label: fig-pair-randomize
#| fig-cap: |
#| Histogram of 1,000 mean differences with tire brand randomly assigned across
#| the two tread measurements (in cm) per pair.
#| fig-alt: |
#| Histogram of the average difference in tire wear, Quick Spin
#| minus Smooth Turn over 1000 different permutations. The histogram is
#| centered at 0 and spreads to approximately -0.0025 and 0.0025. A
#| red line at approximately 0.002 indicates the observed difference in
#| average tread from the original data.
set.seed(474756)
tires_shuffled <- tires |>
pivot_wider(
id_cols = car, names_from = brand, names_prefix = "brand_",
values_from = tread
) |>
mutate(diff_tread = `brand_Quick Spin` - `brand_Smooth Turn`) |>
specify(response = diff_tread) |>
hypothesize(null = "paired independence") |>
generate(1000, type = "permute") |>
calculate(stat = "mean")
ggplot(tires_shuffled, aes(x = stat)) +
geom_histogram(binwidth = 0.0002) +
labs(
x = "Mean of randomized differences of tire wear\n(Quick Spin - Smooth Turn)",
title = "1,000 means of randomized differences",
y = "Count"
) +
geom_vline(
xintercept = mean(brandA) - mean(brandB),
color = IMSCOL["red", "full"]
)
```
## Bootstrap confidence interval for the mean paired difference
For both the bootstrap and the mathematical models applied to paired data, the analysis is virtually identical to the one-sample approach given in [Chapter -@sec-inference-one-mean].
The key to working with paired data (for bootstrapping and mathematical approaches) is to consider the measurement of interest to be the difference in measured values across the pair of observations.
\index{bootstrap!CI paired difference}\index{bootstrap CI paired difference}
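As a sketch of the resampling step described above, the list of paired differences can be bootstrapped directly in base R. The values below are hypothetical stand-ins; the real differences come from the `ucla_textbooks_f18` data analyzed later in this section:

```{r}
# Hypothetical stand-in for the 68 observed price differences (UCLA - Amazon)
set.seed(1234)
price_diff <- rnorm(68, mean = 3.58, sd = 13.4)

# 1,000 bootstrap resamples of the differences; record each resample's mean
boot_means <- replicate(1000, mean(sample(price_diff, replace = TRUE)))

# 99% bootstrap percentile interval: the 0.5 and 99.5 percentiles
quantile(boot_means, probs = c(0.005, 0.995))
```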
```{r}
#| include: false
terms_chp_21 <- c(terms_chp_21, "bootstrap CI paired difference")
```
### Observed data
In an earlier edition of this textbook, we found that Amazon prices were, on average, lower than those of the UCLA Bookstore for UCLA courses in 2010.
It's been several years, and many stores have adapted to the online market, so we wondered, how is the UCLA Bookstore doing today?
We sampled 201 UCLA courses.
Of those, 68 required books could be found on Amazon.
A portion of the dataset from these courses is shown in @tbl-textbooksDF, where prices are in US dollars.
::: {.data data-latex=""}
The [`ucla_textbooks_f18`](http://openintrostat.github.io/openintro/reference/ucla_textbooks_f18.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
```{r}
#| label: tbl-textbooksDF
#| tbl-cap: Four cases from the `ucla_textbooks_f18` dataset.
#| tbl-pos: H
ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
mutate(price_diff = bookstore_new - amazon_new) |>
filter(!is.na(bookstore_new) & !is.na(amazon_new)) |>
head(4) |>
kbl(linesep = "", booktabs = TRUE) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
\index{paired data}
Each textbook has two corresponding prices in the dataset: one for the UCLA Bookstore and one for Amazon.
When two sets of observations have this special correspondence, they are said to be **paired**.
\clearpage
### Variability of the statistic
Following the example of bootstrapping the one-sample statistic, the observed *differences* can be bootstrapped in order to understand the variability of the average difference from sample to sample.
Remember, the differences act as a single value to bootstrap.
That is, the original dataset would include the list of 68 price differences, and each resample will also include 68 price differences (some repeated through the bootstrap resampling process).
The bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in @sec-boot1mean.
In @fig-pairboot, two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated.
The bootstrap percentile confidence interval is computed using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\$0.25, \$7.87).
::: {.guidedpractice data-latex=""}
Using the histogram of bootstrapped difference in means, estimate the standard error of the mean of the sample differences, $\bar{x}_{diff}.$[^21-inference-paired-means-1]
:::
[^21-inference-paired-means-1]: The bootstrapped differences in sample means vary roughly from 0.7 to 7.5, a range of \$6.80.
Although the bootstrap distribution is not perfectly symmetric, applying the empirical rule (with bell-shaped distributions, most observations fall within two standard errors of the center) gives a standard error of the mean differences of approximately \$1.70.
You might note that the standard error calculation given in @sec-mathpaired, $SE(\bar{x}_{diff}) = \sqrt{s^2_{diff}/n_{diff}} = \sqrt{13.4^2/68} = \$1.62,$ is very close to this bootstrap approximation.
The bootstrap SE interval is found by computing the SE of the bootstrapped differences $(SE_{\overline{x}_{diff}} = \$1.64)$ and using the normal multiplier $z^{\star} = 2.58.$ The average difference is $\bar{x}_{diff} = \$3.58,$ so the 99% confidence interval is $\$3.58 \pm 2.58 \times \$1.64 = (\$-0.65, \$7.81).$
The confidence intervals seem to indicate that the UCLA bookstore price is, on average, higher than the Amazon price, as the majority of the confidence interval is positive.
However, if the analysis required a strong degree of certainty (e.g., 99% confidence), and the bootstrap SE interval were deemed most appropriate (in a second course in statistics, the nuances of the two methods can be investigated), the question of which bookseller is higher would not be well determined, because the bootstrap SE interval overlaps zero.
That is, the 99% bootstrap SE interval allows for the possibility that the UCLA Bookstore is lower, on average, than Amazon (because of the possible negative values for the true mean difference in price).
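The SE interval arithmetic can be verified directly from the quantities reported above:

```{r}
x_bar   <- 3.58   # observed mean difference (UCLA - Amazon), in dollars
se_diff <- 1.64   # SE of the bootstrapped mean differences
z_star  <- 2.58   # normal multiplier for 99% confidence

x_bar + c(-1, 1) * z_star * se_diff  # 99% SE interval: (-0.65, 7.81)
```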
```{r}
#| label: fig-pairboot
#| fig-cap: |
#| Bootstrap distribution for the average difference in new book
#| price at the UCLA bookstore versus Amazon. 99% confidence intervals
#| are superimposed using blue dashed (bootstrap percentile interval) and
#| red dotted (bootstrap SE interval) lines.
#| fig-alt: |
#| Histogram showing the distribution of the average
#| bootstrapped difference of price, UCLA minus Amazon. The center of
#| the distribution is given at approximately $3.75. Two bootstrap
#| intervals are given. The percentile interval is approximately $0.25
#| to $8.50. The SE interval is approximately -$0.50 to $7.75.
diff_price <- ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
mutate(price_diff = bookstore_new - amazon_new) |>
filter(!is.na(bookstore_new) & !is.na(amazon_new)) |>
specify(response = price_diff) |>
calculate(stat = "mean")
boot_diff <- ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
mutate(price_diff = bookstore_new - amazon_new) |>
filter(!is.na(bookstore_new) & !is.na(amazon_new)) |>
specify(response = price_diff) |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = "mean")
ci_perc_diff <- boot_diff |>
get_confidence_interval(level = 0.99, type = "percentile")
ci_se_diff <- boot_diff |>
get_confidence_interval(
level = 0.99, type = "se",
point_estimate = diff_price
)
boot_diff |>
infer::visualize(bins = 20) +
infer::shade_confidence_interval(ci_perc_diff,
color = IMSCOL["blue", "full"],
fill = NULL, linewidth = 1, linetype = "dashed"
) +
infer::shade_confidence_interval(ci_se_diff,
color = IMSCOL["red", "full"],
fill = NULL, linewidth = 1, linetype = "dotted"
) +
geom_vline(xintercept = diff_price$stat, linewidth = 1) +
geom_line(aes(y = replicate, x = stat, color = "a", linetype = "a"), alpha = 0, linewidth = 1) + # bogus code
geom_line(aes(y = replicate, x = stat, color = "b", linetype = "b"), alpha = 0, linewidth = 1) + # bogus code
geom_line(aes(y = replicate, x = stat, color = "c", linetype = "c"), alpha = 0) + # bogus code
ylim(c(0, 200)) +
guides(color = guide_legend(override.aes = list(alpha = 1))) +
scale_color_manual(
name = NULL,
values = c("a" = IMSCOL["blue", "full"], "b" = IMSCOL["red", "full"], "c" = IMSCOL["black", "full"]),
labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
guide = "legend"
) +
scale_linetype_manual(
name = NULL,
values = c("a" = "dashed", "b" = "dotted", "c" = "solid"),
labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
guide = "legend"
) +
labs(
x = "Mean of bootstrapped differences of price\n(UCLA - Amazon)",
title = "1,000 means of bootstrapped differences",
y = "Count"
) +
theme(
legend.position = c(0.925, 0.8),
legend.background = element_rect(color = "white")
)
```
\clearpage
## Mathematical model for the mean paired difference {#sec-mathpaired}
Thinking about the differences as a single observation on an observational unit changes the paired setting into the one-sample setting.
The mathematical model for the one-sample case is covered in @sec-one-mean-math.
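This reduction can be seen directly in R: a paired t-test on the two price columns is numerically identical to a one-sample t-test on their differences. A minimal sketch, using small made-up price vectors (not the actual textbook data):

```r
# Hypothetical prices for five books (illustration only)
ucla   <- c(47.97, 14.26, 13.50, 50.25, 28.00)
amazon <- c(47.45, 13.55, 12.53, 45.40, 29.10)

# A paired t-test on the two columns ...
paired <- t.test(ucla, amazon, paired = TRUE)

# ... is the same as a one-sample t-test on the differences
one_sample <- t.test(ucla - amazon, mu = 0)

paired$statistic == one_sample$statistic  # TRUE: same T score
paired$p.value   == one_sample$p.value    # TRUE: same p-value
```

Internally, `t.test(..., paired = TRUE)` computes the differences and runs a one-sample test on them, which is exactly the reduction described above.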
\vspace{-2mm}
### Observed data
:::: {layout="[0.55, -0.05, 0.4]"}
:::{#text}
To analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations.
In the textbook data, we look at the differences in prices, which are represented by the `price_diff` variable in the dataset.
Here the differences are taken as
$$\text{UCLA Bookstore price} - \text{Amazon price}$$
It is important that we always subtract using a consistent order; here Amazon prices are always subtracted from UCLA prices.
The first difference shown in @tbl-textbooksDF is computed as $47.97 - 47.45 = 0.52.$ Similarly, the second difference is computed as $14.26 - 13.55 = 0.71,$ and the third is $13.50 - 12.53 = 0.97.$ A histogram of the differences is shown in @fig-diffInTextbookPricesF18.
:::
:::{#plot}
```{r}
#| label: fig-diffInTextbookPricesF18
#| fig-cap: Histogram of the differences in prices for each book sampled.
#| fig-alt: |
#|   Histogram of the differences in prices for each book sampled,
#|   UCLA minus Amazon. The price differences range from -$10 to $80 with a
#|   strong right skew.
#| out-width: 100%
#| fig-width: 4
ucla_textbooks_f18 <- ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
filter(!is.na(bookstore_new), !is.na(amazon_new)) |>
mutate(price_diff = bookstore_new - amazon_new)
ggplot(ucla_textbooks_f18, aes(x = price_diff)) +
geom_histogram(binwidth = 10) +
scale_x_continuous(
labels = label_dollar(accuracy = 1),
limits = c(-20, 100),
breaks = seq(-20, 100, 20)
) +
labs(
x = "UCLA Bookstore Price - Amazon Price (USD)",
y = "Count"
)
```
:::
::::
### Variability of the statistic
To analyze a paired dataset, we simply analyze the differences.
@tbl-textbooksSummaryStats provides the data summaries from the textbook data.
Note that instead of reporting the prices separately for UCLA and Amazon, the summary statistics are given by the mean of the differences, the standard deviation of the differences, and the total number of pairs (i.e., differences).
The parameter of interest is also a single value, $\mu_{diff},$ so we can use the same $t$-distribution techniques we applied in @sec-one-mean-math directly onto the observed differences.
```{r}
#| label: tbl-textbooksSummaryStats
#| tbl-cap: Summary statistics for the 68 price differences.
#| tbl-pos: H
ucla_textbooks_f18 |>
summarise(
n = n(),
mean = mean(price_diff),
sd = sd(price_diff)
) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("n", "Mean", "SD"),
align = "ccc", digits = 2
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
) |>
column_spec(1, width = "8em") |>
column_spec(2, width = "8em") |>
column_spec(3, width = "8em")
```
::: {.workedexample data-latex=""}
Set up a hypothesis test to determine whether, on average, there is a difference between Amazon's price for a book and the UCLA bookstore's price.
Also, check the conditions for whether we can move forward with the test using the $t$-distribution.
------------------------------------------------------------------------
We are considering two scenarios:
- $H_0:$ $\mu_{diff} = 0.$ There is no difference in the average textbook prices.
- $H_A:$ $\mu_{diff} \neq 0.$ There is a difference in average prices.
Next, we check the independence and normality conditions.
This is a simple random sample, so assuming the textbooks are independent seems reasonable.
While there are some outliers, $n = 68$ and none of the outliers are particularly extreme, so the normality of $\bar{x}_{diff}$ is satisfied.
With these conditions satisfied, we can move forward with the $t$-distribution.
:::
\clearpage
### Observed statistic vs. null statistics
As mentioned previously, the methods applied to a difference will be identical to the one-sample techniques.
Therefore, the full hypothesis test framework is presented as guided practices.
::: {.important data-latex=""}
**The test statistic for assessing a paired mean is a T.**
\index{T score!paired difference}\index{paired difference!T score}
The T score is a ratio of how far the sample mean difference is from zero to how much the observations vary.
$$T = \frac{\bar{x}_{diff} - 0 }{s_{diff}/\sqrt{n_{diff}}}$$
When the null hypothesis is true and the conditions are met, T has a t-distribution with $df = n_{diff} - 1.$
Conditions:
- Independently sampled pairs.
- Large samples and no extreme outliers.
:::
```{r}
#| include: false
terms_chp_21 <- c(terms_chp_21, "T score paired difference")
```
::: {.workedexample data-latex=""}
Complete the hypothesis test started in the previous Example.
------------------------------------------------------------------------
To compute the test statistic, first compute the standard error associated with $\bar{x}_{diff}$ using the standard deviation of the differences $(s_{diff} = 13.42)$ and the number of differences $(n_{diff} = 68):$
$$SE_{\bar{x}_{diff}} = \frac{s_{diff}}{\sqrt{n_{diff}}} = \frac{13.42}{\sqrt{68}} = 1.63$$
The test statistic is the T score of $\bar{x}_{diff}$ under the null hypothesis that the true mean difference is 0:
$$T = \frac{\bar{x}_{diff} - 0}{SE_{\bar{x}_{diff}}} = \frac{3.58 - 0}{1.63} = 2.20$$
To visualize the p-value, the sampling distribution of $\bar{x}_{diff}$ is drawn as though $H_0$ is true, and the p-value is represented by the two shaded tails in the figure below.
The degrees of freedom is $df = 68 - 1 = 67.$ Using statistical software, we find the one-tail area of 0.0156.
```{r}
#| label: textbooksF18HTTails
#| out-width: 60%
#| fig-asp: 0.35
#| fig-align: center
#| fig-width: 5
m <- mean(ucla_textbooks_f18$price_diff)
s <- sd(ucla_textbooks_f18$price_diff)
se <- s / sqrt(length(ucla_textbooks_f18$price_diff))
z <- m / se
par(mar = c(2, 0, 0, 0))
normTail(
L = -abs(m),
U = abs(m),
s = se,
  df = 67,
col = IMSCOL["blue", "full"],
axes = FALSE
)
at <- c(-100, 0, m, 100)
labels <- expression(0, mu[0] * " = 0", bar(x)[diff] * " = 3.58", 0)
axis(1, at, labels, cex.axis = 0.9)
```
Doubling this area gives the p-value: 0.0312.
Because the p-value is less than 0.05, we reject the null hypothesis.
The data provide evidence that Amazon prices differ, on average, from the UCLA Bookstore prices for UCLA courses.
:::
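The arithmetic in the example above can be reproduced in base R from the summary statistics alone (the values are copied from the text, not recomputed from the data):

```r
xbar_diff <- 3.58   # observed mean of the 68 price differences
s_diff    <- 13.42  # standard deviation of the differences
n_diff    <- 68     # number of pairs

se_diff <- s_diff / sqrt(n_diff)                  # standard error, roughly 1.63
t_stat  <- (xbar_diff - 0) / se_diff              # T score, roughly 2.20
p_value <- 2 * pt(-abs(t_stat), df = n_diff - 1)  # two-sided p-value, roughly 0.031
```

`pt()` gives the lower-tail area of the t-distribution; doubling it produces the two-sided p-value described in the example.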
\clearpage
Recall that the margin of error is defined by the standard error.
The margin of error for $\bar{x}_{diff}$ can be directly obtained from $SE(\bar{x}_{diff}).$
::: {.important data-latex=""}
**Margin of error for** $\bar{x}_{diff}.$
The margin of error is $t^\star_{df} \times s_{diff}/\sqrt{n_{diff}}$ where $t^\star_{df}$ is calculated from a specified percentile on the t-distribution with *df* degrees of freedom.
:::
\index{t-CI!paired difference}
::: {.workedexample data-latex=""}
Create a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.
------------------------------------------------------------------------
Conditions have already been verified and the standard error computed in a previous Example.\
To find the confidence interval, identify $t^{\star}_{67}$ using statistical software or the $t$-table $(t^{\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula:
$$
\begin{aligned}
\text{point estimate} \ &\pm \ t^{\star}_{67} \ \times \ SE \\
3.58 \ &\pm \ 2.00 \ \times \ 1.63 \\
(0.32 \ &, \ 6.84)
\end{aligned}
$$
We are 95% confident that the UCLA Bookstore is, on average, between \$0.32 and \$6.84 more expensive than Amazon for UCLA course books.
:::
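The same interval can be sketched in base R. Note that `qt()` returns the exact $t^{\star}_{67}$ rather than the rounded value 2.00 used above, so the endpoints shift very slightly:

```r
xbar_diff <- 3.58   # observed mean of the price differences
s_diff    <- 13.42  # standard deviation of the differences
n_diff    <- 68     # number of pairs

t_star <- qt(0.975, df = n_diff - 1)      # roughly 2.00 for a 95% CI
moe    <- t_star * s_diff / sqrt(n_diff)  # margin of error, roughly 3.25
ci     <- xbar_diff + c(-1, 1) * moe      # roughly (0.33, 6.83)
```

For a different confidence level, only the percentile passed to `qt()` changes, e.g., `qt(0.995, df = 67)` for a 99% interval.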
::: {.guidedpractice data-latex=""}
We have convincing evidence that Amazon is, on average, less expensive.
How should this conclusion affect UCLA student buying habits?
Should UCLA students always buy their books on Amazon?[^21-inference-paired-means-2]
:::
[^21-inference-paired-means-2]: The average price difference is only mildly useful for this question.
Examine the distribution shown in @fig-diffInTextbookPricesF18.
There are certainly a handful of cases where Amazon prices are far below the UCLA Bookstore's, which suggests it is worth checking Amazon (and probably other online sites) before purchasing.
However, in many cases the Amazon price is above what the UCLA Bookstore charges, and most of the time the price isn't that different.
Ultimately, if getting a book immediately from the bookstore is notably more convenient, e.g., to get started on reading or homework, it's likely a good idea to go with the UCLA Bookstore unless the price difference on a specific book happens to be quite large.
For reference, this is a very different result from what we (the authors) had seen in a similar dataset from 2010.
At that time, Amazon prices were almost uniformly lower than the UCLA Bookstore's, and by a large margin, making the case for buying from Amazon quite compelling.
Now we frequently check multiple websites to find the best price.
A small note on the power of the paired t-test (recall the discussion of power in @sec-pow): the paired t-test given here is often more powerful than the independent t-test discussed in @sec-math2samp. That said, depending on how the data are collected, we don't always have a mechanism for pairing the data and reducing the inherent variability across observations.
```{r}
#| include: false
terms_chp_21 <- c(terms_chp_21, "paired difference t-test", "paired difference CI")
```
\clearpage
## Chapter review {#sec-chp21-review}
### Summary
Like the two independent sample procedures in [Chapter -@sec-inference-two-means], the paired difference analysis can be done using a t-distribution.
The randomization test applied to the paired differences is slightly different, however.
Note that when randomizing under the paired setting, each null statistic is created by randomly assigning the two group labels to the numerical outcomes **within** each observational unit.
The procedure for creating a confidence interval for the paired difference is almost identical to the confidence intervals created in [Chapter -@sec-inference-one-mean] for a single mean.
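The within-pair randomization described above can be sketched in a few lines of R, using hypothetical differences: swapping the two group labels within a pair simply flips the sign of that pair's difference, so each null statistic is the mean of randomly sign-flipped differences.

```r
set.seed(47)  # arbitrary seed, for reproducibility
diffs <- c(0.52, 0.71, 0.97, -1.10, 4.85)  # hypothetical paired differences

# Each replicate swaps the two group labels at random within each pair,
# which flips the sign of that pair's difference
null_stats <- replicate(
  1000,
  mean(diffs * sample(c(-1, 1), length(diffs), replace = TRUE))
)

# Two-sided p-value: proportion of null statistics at least as extreme
# as the observed mean difference
p_value <- mean(abs(null_stats) >= abs(mean(diffs)))
```

This is the same logic the `infer` pipeline carries out when generating null statistics for paired data.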
### Terms
The terms introduced in this chapter are presented in @tbl-terms-chp-21.
If you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.
You should be able to easily spot them as **bolded text**.
```{r}
#| label: tbl-terms-chp-21
#| tbl-cap: Terms introduced in this chapter.
#| tbl-pos: H
make_terms_table(terms_chp_21)
```
\clearpage
## Exercises {#sec-chp21-exercises}
Answers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-21].
::: {.exercises data-latex=""}
{{< include exercises/_21-ex-inference-paired-means.qmd >}}
:::