# Inference for comparing paired means {#sec-inference-paired-means}
```{r}
#| include: false
source("_common.R")
```
::: {.chapterintro data-latex=""}
In [Chapter -@sec-inference-two-means] analysis was done to compare the average population value across two different groups.
Recall that one of the important conditions in doing a two-sample analysis is that the two groups are independent.
Here, independence across groups means that knowledge of the observations in one group does not change what we would expect to happen in the other group.
But what happens if the groups are **dependent**?
Sometimes dependency is not something that can be addressed through a statistical method.
However, a particular dependency, **pairing**, can be modeled quite effectively using many of the same tools we have already covered in this text.
:::
Paired data represent a particular type of experimental structure where the analysis is somewhat akin to a one-sample analysis (see [Chapter -@sec-inference-one-mean]) but has other features that resemble a two-sample analysis (see [Chapter -@sec-inference-two-means]).
As with a two-sample analysis, quantitative measurements are made on each of two different levels of the explanatory variable.
However, because the observational unit is **paired** across the two groups, the two measurements are subtracted such that only the difference is retained.
@tbl-pairedexamples presents some examples of studies where paired designs were implemented.
```{r}
#| label: tbl-pairedexamples
#| tbl-cap: |
#| Examples of studies where a paired design is used to measure the
#| difference in the measurement over two conditions.
#| tbl-pos: H
paired_study_examples <- tribble(
~variable, ~col1, ~col2, ~col3,
"Car", "Smooth Turn vs Quick Spin", "amount of tire tread after 1,000 miles", "difference in tread",
"Textbook", "UCLA vs Amazon", "price of new text", "difference in price",
"Individual person", "Pre-course vs Post-course", "exam score", "difference in score"
)
paired_study_examples |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Observational unit", "Comparison groups", "Measurement", "Value of interest")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"),
full_width = FALSE
) |>
column_spec(1:3, width = "8em")
```
::: {.important data-latex=""}
**Paired data.**
\index{paired data}
Two sets of observations are *paired* if each observation in one set has a special correspondence or connection with exactly one observation in the other set.
:::
It is worth noting that if mathematical modeling is chosen as the analysis tool, paired data inference on the difference in measurements will be identical to the one-sample mathematical techniques described in [Chapter -@sec-inference-one-mean].
However, recall from [Chapter -@sec-inference-one-mean] that with pure one-sample data, the computational tools for hypothesis testing are not easy to implement and were not presented (although the bootstrap was presented as a computational approach for constructing a one-sample confidence interval).
With paired data, the randomization test fits nicely with the structure of the experiment and is presented here.
```{r}
#| include: false
terms_chp_21 <- c("paired data")
```
## Randomization test for the mean paired difference
Consider an experiment done to measure whether tire brand Smooth Turn or tire brand Quick Spin has longer tread wear (in cm).
That is, after 1,000 miles on a car, which brand of tires has more tread, on average?
### Observed data
The observed data represent 25 tread measurements (in cm) taken on 25 tires of Smooth Turn and 25 tires of Quick Spin.
The study used a total of 25 cars, so on each car, one tire was of Smooth Turn and one was of Quick Spin.
The mean tread for the Quick Spin tires was 0.308 cm and the mean tread for the Smooth Turn tires was 0.310 cm.
@fig-tiredata presents the observed data: measurements of tread remaining (in cm).
The Smooth Turn manufacturer looks at the box plots and says:
> *Clearly the tread on Smooth Turn tires is higher, on average, than the tread on Quick Spin tires after 1,000 miles of driving.*
The Quick Spin manufacturer is skeptical and retorts:
> *But with only 25 cars, it seems that the variability in road conditions (sometimes one tire hits a pothole, etc.) could be what leads to the small difference in average tread amount.*
```{r}
#| label: fig-tiredata
#| fig-cap: |
#| Box plots of the tire tread data (in cm) and the brand of tire from which
#| the original measurements came.
#| fig-alt: |
#| Box plots of the amount of tire tread for each of the two
#| brands of tires with data values superimposed over the box plots.
#| Each superimposed dot represents a car that drove with both types
#| of tires. A grey line connects each car across the two box plots
#| indicating that Smooth Turn has more tire wear than Quick Spin.
set.seed(47)
brandA <- rnorm(25, 0.310, 0.003)
brandB <- rnorm(25, 0.308, 0.003)
car <- c(paste("car", 1:25))
miny <- min(brandA, brandB) - .003
maxy <- max(brandA, brandB) + .003
tires <- tibble(
tread = c(brandA, brandB),
car = rep(car, 2),
brand = c(rep("Smooth Turn", 25), rep("Quick Spin", 25))
)
orig_means <- tires |>
group_by(brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(
mean_label = c("bar(x)[QS]", "bar(x)[ST]"),
mean_label = paste(mean_label, "==", mean_tread)
)
ggplot(tires, aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_boxplot(
show.legend = FALSE,
outlier.shape = "triangle"
) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = 25),
show.legend = FALSE, size = 4
) +
geom_line(aes(group = car), color = "grey", linewidth = 0.25) +
geom_point(show.legend = FALSE) +
geom_text(
data = orig_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Tire brand",
y = NULL,
title = "Original data"
)
```
We'd like to be able to systematically distinguish between what the Smooth Turn manufacturer sees in the plot and what the Quick Spin manufacturer sees in the plot.
Fortunately for us, we have an excellent way to simulate the natural variability (from road conditions, etc.) that can lead to tires being worn at different rates.
### Variability of the statistic
A randomization test will identify whether the differences seen in the box plot of the original data in @fig-tiredata could have happened just by chance variability.
As before, we will simulate the variability in the study under the assumption that the null hypothesis is true.
In this study, the null hypothesis is that average tire tread wear is the same across Smooth Turn and Quick Spin tires.
- $H_0: \mu_{diff} = 0,$ the average tread wear is the same for the two tire brands.
- $H_A: \mu_{diff} \ne 0,$ the average tread wear is different across the two tire brands.
When observations are paired, the randomization process randomly assigns the tire brand to each of the observed tread values.
Note that in the randomization test for the two-sample mean setting (see @sec-rand2mean) the explanatory variable was *also* randomly assigned to the responses.
The change in the paired setting, however, is that the assignment happens *within* an observational unit (here, a car).
Remember, if the null hypothesis is true, it will not matter which brand is put on which tire because the overall tread wear will be the same across pairs.
@fig-tiredata4 and @fig-tiredata5 show that the random assignment of group (tire brand) happens within a single car.
That is, every single car will still have one tire of each type.
In the first randomization, it just so happens that the 4th car's tire brands were swapped and the 5th car's tire brands were not swapped.
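Because each car keeps one tire of each brand, shuffling the brand labels within a car is equivalent to randomly flipping the sign of that car's difference in tread. A minimal base R sketch of one such shuffle, using simulated tread values as illustrative stand-ins (not the exact data behind the figures):

```{r}
# Simulated paired tread measurements (in cm) for 25 cars -- illustrative only
set.seed(47)
smooth_turn <- rnorm(25, mean = 0.310, sd = 0.003)
quick_spin  <- rnorm(25, mean = 0.308, sd = 0.003)
diff_tread  <- smooth_turn - quick_spin

# One randomization: within each car, swap the two brand labels (or not)
# at random, which amounts to flipping the sign of each difference
signs         <- sample(c(-1, 1), size = 25, replace = TRUE)
shuffled_diff <- signs * diff_tread
mean(shuffled_diff)
```

Note that each shuffle keeps the pairing intact; only the brand labels move within a car.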
```{r}
#| label: fig-tiredata4
#| fig-cap: |
#| The 4th car: the tire brand was randomly permuted, and in the randomization
#| calculation, the measurements (in cm) ended up in different groups.
#| fig-alt: |
#| Line plot connecting the tread for the 4th car in the dataset. The first
#| plot is the original data and the second plot is the permuted data where
#| the groups happened to get permuted randomly.
#| fig-asp: 0.5
set.seed(47)
permdata <- tires |>
group_by(car) |>
mutate(random_brand = sample(brand))
origplot4 <- tires |>
filter(car == "car 4") |>
ggplot(aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand4 <- permdata |>
filter(car == "car 4") |>
ggplot(aes(
x = brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
annotate(
"text",
x = 1.5, y = 0.315,
label = "Shuffled assignment\nof tire brand",
size = 4
) +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(color = NULL, shape = NULL)
origplot4 + shuffbrand4
```
```{r}
#| label: fig-tiredata5
#| fig-cap: |
#| The 5th car: the tire brand was randomly permuted to stay the same! In the
#| randomization calculation, the measurements (in cm) ended up in the
#| original groups.
#| fig-alt: |
#| Line plot connecting the tread for the 5th car in the dataset. The first
#| plot is the original data and the second plot is the permuted data where
#| the groups happened to stay connected to the original measurements.
#| fig-asp: 0.5
origplot5 <- tires |>
filter(car == "car 5") |>
ggplot(aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand5 <- permdata |>
filter(car == "car 5") |>
ggplot(aes(
x = brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),
color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6
) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
annotate(
"text",
x = 1.5, y = 0.315,
label = "Shuffled assignment\nof tire brand",
size = 4
) +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(color = NULL, shape = NULL)
origplot5 + shuffbrand5
```
We can put the shuffled assignments for all the cars into one plot as seen in @fig-tiredataPerm-2.
```{r}
#| label: fig-tiredataPerm
#| fig-cap: |
#| Tire tread (in cm) by brand, original and shuffled. As evidenced by the
#| colors, some of the cars kept their original tire assignments and some
#| cars swapped the tire assignments.
#| fig-subcap:
#| - Brand of tire is the original brand.
#| - Brand of tire is the shuffled brand assignment.
#| fig-alt: |
#| Line plot connecting the tread for all of the cars in the
#| dataset. The first plot is the original data and the second plot is
#| the permuted data where some of the brands are connected to the
#| original tread measurements and some of the brands have been swapped
#| across the two tread measurements, within a car.
#| out-width: 100%
#| layout-ncol: 2
#| fig-width: 5
#| fig-asp: 0.8
origplot <- tires |>
ggplot(aes(
x = brand, y = tread,
color = brand, shape = brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = nrow(tires) / 2),
show.legend = FALSE, size = 3
) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand <- permdata |>
ggplot(aes(
x = brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = nrow(tires) / 2),
show.legend = FALSE, size = 3
) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(
title = "Shuffled data",
color = NULL, shape = NULL
)
origplot
shuffbrand
```
The next step in the randomization test is to sort the brands so that the assigned brand value on the x-axis aligns with the assigned group from the randomization.
@fig-tiredataPermSort-1 shows the same randomized groups, as seen in @fig-tiredataPerm-2 previously.
However, @fig-tiredataPermSort-2 sorts the randomized groups so that we can measure the variability *across* groups as compared to the variability *within* groups.
```{r}
#| label: fig-tiredataPermSort
#| fig-cap: |
#| Tire tread (in cm) by brand.
#| fig-subcap:
#| - Randomized brand assignment
#| - Randomized brand assignment sorted by brand.
#| out-width: 100%
#| fig-alt: |
#| Scatterplot of tread on the y axis and tire brand on the
#| x axis. The left panel has the observed tread values matched to
#| the original tire brand for the x axis location. The
#| observations are colored based on the permuted tire brand
#| assigned in the randomization. The right panel has the tread
#| values matched to the permuted tire brand, so some observations
#| have swapped orientation. The observations are also colored by
#| the permuted tire brand assigned in the randomization. In the
#| right panel, the two tire brands seem equivalent with respect
#| to tire wear.
#| layout-ncol: 2
#| fig-width: 5
#| fig-asp: 0.8
permed_means <- permdata |>
group_by(random_brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(mean_label = c("bar(x)[QSsim1]", "bar(x)[STsim1]")) |>
mutate(mean_label = paste(mean_label, "==", mean_tread))
shuffbrandfull <- permdata |>
ggplot(aes(
x = random_brand, y = tread,
color = random_brand, shape = random_brand
)) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = 25),
show.legend = FALSE, size = 3
) +
geom_line(aes(group = car), color = "grey") +
ylim(c(miny, maxy)) +
geom_point(size = 3, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
labs(
x = "Randomized brand of tire",
y = NULL,
title = "Sorted data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand
shuffbrandfull
```
@fig-tiredatarand1 presents a second randomization of the data.
Notice that the two observations from the same car are linked with a grey line; some of the tread values have been randomly assigned to the other tire brand, while some are still connected to their original tire brands.
```{r}
#| label: fig-tiredatarand1
#| fig-cap: |
#| A second randomization where the brand is randomly swapped (or not)
#| across the two tread wear measurements (in cm) from the same car.
#| fig-alt: |
#| Box plots and scatterplot with tire brand on the x-axis and
#| tread on the y-axis. The points are assigned to the x-axis brand
#| given by the permutation, but the plot differs from previous figures
#| in that it is a second permutation of the brands. Again, the two
#| permuted brands seem equivalent with respect to tire wear.
set.seed(47)
tires1 <- tires |>
group_by(car) |>
mutate(random_brand = sample(brand))
permed_means <- tires1 |>
group_by(random_brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(mean_label = c("bar(x)[QSsim2]", "bar(x)[STsim2]")) |>
mutate(mean_label = paste(mean_label, "==", mean_tread))
ggplot(
tires1, aes(
x = random_brand, y = tread,
color = random_brand, shape = random_brand
)
) +
geom_boxplot(
show.legend = FALSE, color = "black",
outlier.shape = "triangle"
) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 2, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Brand of tires, randomly assigned (2)",
y = NULL
)
```
@fig-tiredatarand2 presents yet another randomization of the data.
Again, the same observations are linked by a grey line, and some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).
```{r}
#| label: fig-tiredatarand2
#| fig-cap: |
#| An additional randomization where the brand is randomly swapped (or not)
#| across the two tread wear measurements (in cm) from the same car.
#| fig-alt: |
#| Box plots and scatterplot with tire brand on the x-axis and
#| tread on the y-axis. The points are assigned to the x-axis brand
#| given by the permutation, but the plot differs from previous figures
#| in that it is yet another permutation of the brands. The additional
#| permutation demonstrates that the box plots
#| continue to change for each permutation yet the tire tread is
#| equivalent across the permuted groups.
set.seed(4747)
tires2 <- tires |>
group_by(car) |>
mutate(random_brand = sample(brand))
permed_means <- tires2 |>
group_by(random_brand) |>
summarize(mean_tread = round(mean(tread), 4)) |>
mutate(mean_label = c("bar(x)[QSsim3]", "bar(x)[STsim3]")) |>
mutate(mean_label = paste(mean_label, "==", mean_tread))
ggplot(
tires2,
aes(
x = random_brand, y = tread,
color = random_brand, shape = random_brand
)
) +
geom_boxplot(
show.legend = FALSE, color = "black",
outlier.shape = "triangle"
) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 2, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Brand of tires, randomly assigned (3)",
y = NULL
)
```
### Observed statistic vs. null statistics
By repeating the randomization process, we can create a distribution of the average of the differences in tire treads, as seen in @fig-pair-randomize.
As expected (because the differences were generated under the null hypothesis), the center of the histogram is zero.
A line has been drawn at the observed difference, which falls well outside the majority of the null differences simulated from natural variability by mixing up which tire received Smooth Turn and which received Quick Spin.
Because the observed statistic is so far away from the natural variability of the randomized differences, we are convinced that there is a difference between Smooth Turn and Quick Spin.
Our conclusion is that the extra amount of average tire tread in Smooth Turn is due to more than just natural variability: we reject $H_0$ and conclude that $\mu_{ST} \ne \mu_{QS}.$
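Repeating the within-pair shuffle many times produces a null distribution like the one in @fig-pair-randomize. As a sketch of the full test in base R, with simulated tread values standing in for the observed data:

```{r}
# Simulated paired differences (in cm) -- stand-ins for the 25 observed values
set.seed(47)
diff_tread <- rnorm(25, mean = 0.310, sd = 0.003) -
  rnorm(25, mean = 0.308, sd = 0.003)
obs_stat <- mean(diff_tread)

# 1,000 randomizations: flip each paired difference's sign at random
null_stats <- replicate(
  1000,
  mean(sample(c(-1, 1), 25, replace = TRUE) * diff_tread)
)

# Two-sided p-value: proportion of null statistics at least as extreme
# as the observed statistic
p_val <- mean(abs(null_stats) >= abs(obs_stat))
p_val
```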
```{r}
#| label: fig-pairrandfull
#| fig-cap: Process of randomizing across pairs.
#| warning: false
#| out-width: 75%
#| eval: false
#| echo: false
include_graphics("images/pairrandfull.png")
```
```{r}
#| label: fig-pair-randomize
#| fig-cap: |
#| Histogram of 1,000 mean differences with tire brand randomly assigned across
#| the two tread measurements (in cm) per pair.
#| fig-alt: |
#| Histogram of the average difference in tire wear, Quick Spin
#| minus Smooth Turn over 1000 different permutations. The histogram is
#| centered at 0 and spreads to approximately -0.0025 and 0.0025. A
#| red line at approximately 0.002 indicates the observed difference in
#| average tread from the original data.
set.seed(474756)
tires_shuffled <- tires |>
pivot_wider(
id_cols = car, names_from = brand, names_prefix = "brand_",
values_from = tread
) |>
mutate(diff_tread = `brand_Quick Spin` - `brand_Smooth Turn`) |>
specify(response = diff_tread) |>
hypothesize(null = "paired independence") |>
generate(1000, type = "permute") |>
calculate(stat = "mean")
ggplot(tires_shuffled, aes(x = stat)) +
geom_histogram(binwidth = 0.0002) +
labs(
x = "Mean of randomized differences of tire wear\n(Quick Spin - Smooth Turn)",
title = "1,000 means of randomized differences",
y = "Count"
) +
geom_vline(
xintercept = mean(brandA) - mean(brandB),
color = IMSCOL["red", "full"]
)
```
## Bootstrap confidence interval for the mean paired difference
For both the bootstrap and the mathematical models applied to paired data, the analysis is virtually identical to the one-sample approach given in [Chapter -@sec-inference-one-mean].
The key to working with paired data (for bootstrapping and mathematical approaches) is to consider the measurement of interest to be the difference in measured values across the pair of observations.
\index{bootstrap!CI paired difference}\index{bootstrap CI paired difference}
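As a sketch of the resampling step described above, the list of paired differences can be bootstrapped directly in base R. The values below are hypothetical stand-ins; the real differences come from the `ucla_textbooks_f18` data analyzed later in this section:

```{r}
# Hypothetical stand-in for the 68 observed price differences (UCLA - Amazon)
set.seed(1234)
price_diff <- rnorm(68, mean = 3.58, sd = 13.4)

# 1,000 bootstrap resamples of the differences; record each resample's mean
boot_means <- replicate(1000, mean(sample(price_diff, replace = TRUE)))

# 99% bootstrap percentile interval: the 0.5 and 99.5 percentiles
quantile(boot_means, probs = c(0.005, 0.995))
```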
```{r}
#| include: false
terms_chp_21 <- c(terms_chp_21, "bootstrap CI paired difference")
```
### Observed data
In an earlier edition of this textbook, we found that Amazon prices were, on average, lower than those of the UCLA Bookstore for UCLA courses in 2010.
It's been several years, and many stores have adapted to the online market, so we wondered, how is the UCLA Bookstore doing today?
We sampled 201 UCLA courses.
Of those, 68 required books could be found on Amazon.
A portion of the dataset from these courses is shown in @tbl-textbooksDF, where prices are in US dollars.
::: {.data data-latex=""}
The [`ucla_textbooks_f18`](http://openintrostat.github.io/openintro/reference/ucla_textbooks_f18.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
```{r}
#| label: tbl-textbooksDF
#| tbl-cap: Four cases from the `ucla_textbooks_f18` dataset.
#| tbl-pos: H
ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
mutate(price_diff = bookstore_new - amazon_new) |>
filter(!is.na(bookstore_new) & !is.na(amazon_new)) |>
head(4) |>
kbl(linesep = "", booktabs = TRUE) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
\index{paired data}
Each textbook has two corresponding prices in the dataset: one for the UCLA Bookstore and one for Amazon.
When two sets of observations have this special correspondence, they are said to be **paired**.
\clearpage
### Variability of the statistic
Following the example of bootstrapping the one-sample statistic, the observed *differences* can be bootstrapped in order to understand the variability of the average difference from sample to sample.
Remember, the differences act as a single value to bootstrap.
That is, the original dataset would include the list of 68 price differences, and each resample will also include 68 price differences (some repeated through the bootstrap resampling process).
The bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in @sec-boot1mean.
In @fig-pairboot, two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated.
The bootstrap percentile confidence interval is computed using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\$0.25, \$7.87).
::: {.guidedpractice data-latex=""}
Using the histogram of bootstrapped difference in means, estimate the standard error of the mean of the sample differences, $\bar{x}_{diff}.$[^21-inference-paired-means-1]
:::
[^21-inference-paired-means-1]: The bootstrapped differences in sample means vary roughly from 0.7 to 7.5, a range of \$6.80.
Although the bootstrap distribution is not perfectly symmetric, applying the empirical rule (with bell-shaped distributions, most observations fall within two standard errors of the center) gives a standard error of the mean differences of approximately \$1.70.
You might note that the standard error calculation given in @sec-mathpaired, $SE(\bar{x}_{diff}) = \sqrt{s^2_{diff}/n_{diff}} = \sqrt{13.4^2/68} = \$1.62,$ is very close to this bootstrap approximation.
The bootstrap SE interval is found by computing the SE of the bootstrapped differences $(SE_{\overline{x}_{diff}} = \$1.64)$ and using the normal multiplier $z^{\star} = 2.58.$ The average difference is $\bar{x}_{diff} = \$3.58,$ so the 99% confidence interval is $\$3.58 \pm 2.58 \times \$1.64 = (\$-0.65, \$7.81).$
The confidence intervals seem to indicate that the UCLA bookstore price is, on average, higher than the Amazon price, as the majority of the confidence interval is positive.
However, if the analysis required a strong degree of certainty (e.g., 99% confidence), and the bootstrap SE interval were deemed most appropriate (in a second course in statistics, the nuances of the two methods can be investigated), the question of which bookseller is higher would not be well determined, because the bootstrap SE interval overlaps zero.
That is, the 99% bootstrap SE interval allows for the possibility that the UCLA Bookstore is lower, on average, than Amazon (because of the possible negative values for the true mean difference in price).
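The SE interval arithmetic can be verified directly from the quantities reported above:

```{r}
x_bar   <- 3.58   # observed mean difference (UCLA - Amazon), in dollars
se_diff <- 1.64   # SE of the bootstrapped mean differences
z_star  <- 2.58   # normal multiplier for 99% confidence

x_bar + c(-1, 1) * z_star * se_diff  # 99% SE interval: (-0.65, 7.81)
```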
```{r}
#| label: fig-pairboot
#| fig-cap: |
#| Bootstrap distribution for the average difference in new book
#| price at the UCLA bookstore versus Amazon. 99% confidence intervals
#| are superimposed using blue dashed (bootstrap percentile interval) and
#| red dotted (bootstrap SE interval) lines.
#| fig-alt: |
#| Histogram showing the distribution of the average
#| bootstrapped difference of price, UCLA minus Amazon. The center of
#| the distribution is given at approximately $3.75. Two bootstrap
#| intervals are given. The percentile interval is approximately $0.25
#| to $8.50. The SE interval is approximately -$0.50 to $7.75.
diff_price <- ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
mutate(price_diff = bookstore_new - amazon_new) |>
filter(!is.na(bookstore_new) & !is.na(amazon_new)) |>
specify(response = price_diff) |>
calculate(stat = "mean")
boot_diff <- ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
mutate(price_diff = bookstore_new - amazon_new) |>
filter(!is.na(bookstore_new) & !is.na(amazon_new)) |>
specify(response = price_diff) |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = "mean")
ci_perc_diff <- boot_diff |>
get_confidence_interval(level = 0.99, type = "percentile")
ci_se_diff <- boot_diff |>
get_confidence_interval(
level = 0.99, type = "se",
point_estimate = diff_price
)
boot_diff |>
infer::visualize(bins = 20) +
infer::shade_confidence_interval(ci_perc_diff,
color = IMSCOL["blue", "full"],
fill = NULL, linewidth = 1, linetype = "dashed"
) +
infer::shade_confidence_interval(ci_se_diff,
color = IMSCOL["red", "full"],
fill = NULL, linewidth = 1, linetype = "dotted"
) +
geom_vline(xintercept = diff_price$stat, linewidth = 1) +
geom_line(aes(y = replicate, x = stat, color = "a", linetype = "a"), alpha = 0, linewidth = 1) + # bogus code
geom_line(aes(y = replicate, x = stat, color = "b", linetype = "b"), alpha = 0, linewidth = 1) + # bogus code
geom_line(aes(y = replicate, x = stat, color = "c", linetype = "c"), alpha = 0) + # bogus code
ylim(c(0, 200)) +
guides(color = guide_legend(override.aes = list(alpha = 1))) +
scale_color_manual(
name = NULL,
values = c("a" = IMSCOL["blue", "full"], "b" = IMSCOL["red", "full"], "c" = IMSCOL["black", "full"]),
labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
guide = "legend"
) +
scale_linetype_manual(
name = NULL,
values = c("a" = "dashed", "b" = "dotted", "c" = "solid"),
labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
guide = "legend"
) +
labs(
x = "Mean of bootstrapped differences of price\n(UCLA - Amazon)",
title = "1,000 means of bootstrapped differences",
y = "Count"
) +
theme(
legend.position = c(0.925, 0.8),
legend.background = element_rect(color = "white")
)
```
\clearpage
## Mathematical model for the mean paired difference {#sec-mathpaired}
Thinking about the differences as a single observation on an observational unit changes the paired setting into the one-sample setting.
The mathematical model for the one-sample case is covered in @sec-one-mean-math.
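This reduction can be seen directly in R: a paired t-test on the two price columns is numerically identical to a one-sample t-test on their differences. A minimal sketch, using small made-up price vectors (not the actual textbook data):

```r
# Hypothetical prices for five books (illustration only)
ucla   <- c(47.97, 14.26, 13.50, 50.25, 28.00)
amazon <- c(47.45, 13.55, 12.53, 45.40, 29.10)

# A paired t-test on the two columns ...
paired <- t.test(ucla, amazon, paired = TRUE)

# ... is the same as a one-sample t-test on the differences
one_sample <- t.test(ucla - amazon, mu = 0)

paired$statistic == one_sample$statistic  # TRUE: same T score
paired$p.value   == one_sample$p.value    # TRUE: same p-value
```

Internally, `t.test(..., paired = TRUE)` computes the differences and runs a one-sample test on them, which is exactly the reduction described above.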
\vspace{-2mm}
### Observed data
:::: {layout="[0.55, -0.05, 0.4]"}
:::{#text}
To analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations.
In the textbook data, we look at the differences in prices, which are represented by the `price_diff` variable in the dataset.
Here the differences are taken as
$$\text{UCLA Bookstore price} - \text{Amazon price}$$
It is important that we always subtract using a consistent order; here Amazon prices are always subtracted from UCLA prices.
The first difference shown in @tbl-textbooksDF is computed as $47.97 - 47.45 = 0.52.$ Similarly, the second difference is computed as $14.26 - 13.55 = 0.71,$ and the third is $13.50 - 12.53 = 0.97.$ A histogram of the differences is shown in @fig-diffInTextbookPricesF18.
:::
:::{#plot}
```{r}
#| label: fig-diffInTextbookPricesF18
#| fig-cap: Histogram of the differences in prices for each book sampled.
#| fig-alt: |
#|   Histogram of the differences in prices for each book sampled,
#|   UCLA minus Amazon. The price differences range from -$10 to $80 with a
#|   strong right skew.
#| out-width: 100%
#| fig-width: 4
ucla_textbooks_f18 <- ucla_textbooks_f18 |>
select(subject, course_num, bookstore_new, amazon_new) |>
filter(!is.na(bookstore_new), !is.na(amazon_new)) |>
mutate(price_diff = bookstore_new - amazon_new)
ggplot(ucla_textbooks_f18, aes(x = price_diff)) +
geom_histogram(binwidth = 10) +
scale_x_continuous(
labels = label_dollar(accuracy = 1),
limits = c(-20, 100),
breaks = seq(-20, 100, 20)
) +
labs(
x = "UCLA Bookstore Price - Amazon Price (USD)",
y = "Count"
)
```
:::
::::
### Variability of the statistic
To analyze a paired dataset, we simply analyze the differences.
@tbl-textbooksSummaryStats provides the data summaries from the textbook data.
Note that instead of reporting the prices separately for UCLA and Amazon, the summary statistics are given by the mean of the differences, the standard deviation of the differences, and the total number of pairs (i.e., differences).
The parameter of interest is also a single value, $\mu_{diff},$ so we can use the same $t$-distribution techniques we applied in @sec-one-mean-math directly onto the observed differences.
```{r}
#| label: tbl-textbooksSummaryStats
#| tbl-cap: Summary statistics for the 68 price differences.
#| tbl-pos: H
ucla_textbooks_f18 |>
summarise(
n = n(),
mean = mean(price_diff),
sd = sd(price_diff)
) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("n", "Mean", "SD"),
align = "ccc", digits = 2
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
) |>
column_spec(1, width = "8em") |>
column_spec(2, width = "8em") |>
column_spec(3, width = "8em")
```
::: {.workedexample data-latex=""}
Set up a hypothesis test to determine whether, on average, there is a difference between Amazon's price for a book and the UCLA bookstore's price.
Also, check the conditions for whether we can move forward with the test using the $t$-distribution.
------------------------------------------------------------------------
We are considering two scenarios:
- $H_0:$ $\mu_{diff} = 0.$ There is no difference in the average textbook prices.
- $H_A:$ $\mu_{diff} \neq 0.$ There is a difference in average prices.
Next, we check the independence and normality conditions.
This is a simple random sample, so assuming the textbooks are independent seems reasonable.
While there are some outliers, $n = 68$ and none of the outliers are particularly extreme, so the normality of $\bar{x}_{diff}$ is satisfied.
With these conditions satisfied, we can move forward with the $t$-distribution.
:::
\clearpage
### Observed statistic vs. null statistics
As mentioned previously, the methods applied to a difference will be identical to the one-sample techniques.
Therefore, the full hypothesis test framework is presented as guided practices.
::: {.important data-latex=""}
**The test statistic for assessing a paired mean is a T.**
\index{T score!paired difference}\index{paired difference!T score}
The T score is a ratio of how far the sample mean difference is from zero to how much the observations vary.
$$T = \frac{\bar{x}_{diff} - 0 }{s_{diff}/\sqrt{n_{diff}}}$$
When the null hypothesis is true and the conditions are met, T has a t-distribution with $df = n_{diff} - 1.$
Conditions:
- Independently sampled pairs.
- Large samples and no extreme outliers.
:::
```{r}
#| include: false
terms_chp_21 <- c(terms_chp_21, "T score paired difference")
```
::: {.workedexample data-latex=""}
Complete the hypothesis test started in the previous Example.
------------------------------------------------------------------------
To compute the test statistic, first compute the standard error associated with $\bar{x}_{diff}$ using the standard deviation of the differences $(s_{diff} = 13.42)$ and the number of differences $(n_{diff} = 68):$
$$SE_{\bar{x}_{diff}} = \frac{s_{diff}}{\sqrt{n_{diff}}} = \frac{13.42}{\sqrt{68}} = 1.63$$
The test statistic is the T score of $\bar{x}_{diff}$ under the null hypothesis that the true mean difference is 0:
$$T = \frac{\bar{x}_{diff} - 0}{SE_{\bar{x}_{diff}}} = \frac{3.58 - 0}{1.63} = 2.20$$
To visualize the p-value, the sampling distribution of $\bar{x}_{diff}$ is drawn as though $H_0$ is true, and the p-value is represented by the two shaded tails in the figure below.
The degrees of freedom is $df = 68 - 1 = 67.$ Using statistical software, we find the one-tail area of 0.0156.
```{r}
#| label: textbooksF18HTTails
#| out-width: 60%
#| fig-asp: 0.35
#| fig-align: center
#| fig-width: 5
m <- mean(ucla_textbooks_f18$price_diff)
s <- sd(ucla_textbooks_f18$price_diff)
se <- s / sqrt(length(ucla_textbooks_f18$price_diff))
z <- m / se
par(mar = c(2, 0, 0, 0))
normTail(
L = -abs(m),
U = abs(m),
s = se,
  df = 67,
col = IMSCOL["blue", "full"],
axes = FALSE
)
at <- c(-100, 0, m, 100)
labels <- expression(0, mu[0] * " = 0", bar(x)[diff] * " = 3.58", 0)
axis(1, at, labels, cex.axis = 0.9)
```
Doubling this area gives the p-value: 0.0312.
Because the p-value is less than 0.05, we reject the null hypothesis.
The data provide evidence that Amazon prices differ, on average, from the UCLA Bookstore prices for UCLA courses.
:::
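The arithmetic in the example above can be reproduced in base R from the summary statistics alone (the values are copied from the text, not recomputed from the data):

```r
xbar_diff <- 3.58   # observed mean of the 68 price differences
s_diff    <- 13.42  # standard deviation of the differences
n_diff    <- 68     # number of pairs

se_diff <- s_diff / sqrt(n_diff)                  # standard error, roughly 1.63
t_stat  <- (xbar_diff - 0) / se_diff              # T score, roughly 2.20
p_value <- 2 * pt(-abs(t_stat), df = n_diff - 1)  # two-sided p-value, roughly 0.031
```

`pt()` gives the lower-tail area of the t-distribution; doubling it produces the two-sided p-value described in the example.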
\clearpage
Recall that the margin of error is defined by the standard error.
The margin of error for $\bar{x}_{diff}$ can be directly obtained from $SE(\bar{x}_{diff}).$
::: {.important data-latex=""}
**Margin of error for** $\bar{x}_{diff}.$
The margin of error is $t^\star_{df} \times s_{diff}/\sqrt{n_{diff}}$ where $t^\star_{df}$ is calculated from a specified percentile on the t-distribution with *df* degrees of freedom.
:::
\index{t-CI!paired difference}
::: {.workedexample data-latex=""}
Create a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.
------------------------------------------------------------------------
Conditions have already been verified and the standard error computed in a previous Example.\
To find the confidence interval, identify $t^{\star}_{67}$ using statistical software or the $t$-table $(t^{\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula:
$$
\begin{aligned}
\text{point estimate} \ &\pm \ t^{\star}_{67} \ \times \ SE \\
3.58 \ &\pm \ 2.00 \ \times \ 1.63 \\
(0.32 \ &, \ 6.84)
\end{aligned}
$$
We are 95% confident that the UCLA Bookstore is, on average, between \$0.32 and \$6.84 more expensive than Amazon for UCLA course books.
:::
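The same interval can be sketched in base R. Note that `qt()` returns the exact $t^{\star}_{67}$ rather than the rounded value 2.00 used above, so the endpoints shift very slightly:

```r
xbar_diff <- 3.58   # observed mean of the price differences
s_diff    <- 13.42  # standard deviation of the differences
n_diff    <- 68     # number of pairs

t_star <- qt(0.975, df = n_diff - 1)      # roughly 2.00 for a 95% CI
moe    <- t_star * s_diff / sqrt(n_diff)  # margin of error, roughly 3.25
ci     <- xbar_diff + c(-1, 1) * moe      # roughly (0.33, 6.83)
```

For a different confidence level, only the percentile passed to `qt()` changes, e.g., `qt(0.995, df = 67)` for a 99% interval.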
::: {.guidedpractice data-latex=""}
We have convincing evidence that Amazon is, on average, less expensive.
How should this conclusion affect UCLA student buying habits?
Should UCLA students always buy their books on Amazon?[^21-inference-paired-means-2]
:::
[^21-inference-paired-means-2]: The average price difference is only mildly useful for this question.
Examine the distribution shown in @fig-diffInTextbookPricesF18.
There are certainly a handful of cases where Amazon prices are far below the UCLA Bookstore's, which suggests it is worth checking Amazon (and probably other online sites) before purchasing.
However, in many cases the Amazon price is above what the UCLA Bookstore charges, and most of the time the price isn't that different.
Ultimately, if getting a book immediately from the bookstore is notably more convenient, e.g., to get started on reading or homework, it's likely a good idea to go with the UCLA Bookstore unless the price difference on a specific book happens to be quite large.
For reference, this is a very different result from what we (the authors) had seen in a similar dataset from 2010.
At that time, Amazon prices were almost uniformly lower than the UCLA Bookstore's, and by a large margin, making the case for buying from Amazon quite compelling.
Now we frequently check multiple websites to find the best price.
A small note on the power of the paired t-test (recall the discussion of power in @sec-pow): the paired t-test given here is often more powerful than the independent t-test discussed in @sec-math2samp. That said, depending on how the data are collected, we don't always have a mechanism for pairing the data and reducing the inherent variability across observations.
```{r}
#| include: false
terms_chp_21 <- c(terms_chp_21, "paired difference t-test", "paired difference CI")
```
\clearpage
## Chapter review {#sec-chp21-review}
### Summary
Like the two independent sample procedures in [Chapter -@sec-inference-two-means], the paired difference analysis can be done using a t-distribution.
The randomization test applied to the paired differences is slightly different, however.
Note that when randomizing under the paired setting, each null statistic is created by randomly assigning the two group labels to the numerical outcomes **within** each observational unit.
The procedure for creating a confidence interval for the paired difference is almost identical to the confidence intervals created in [Chapter -@sec-inference-one-mean] for a single mean.
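The within-pair randomization described above can be sketched in a few lines of R, using hypothetical differences: swapping the two group labels within a pair simply flips the sign of that pair's difference, so each null statistic is the mean of randomly sign-flipped differences.

```r
set.seed(47)  # arbitrary seed, for reproducibility
diffs <- c(0.52, 0.71, 0.97, -1.10, 4.85)  # hypothetical paired differences

# Each replicate swaps the two group labels at random within each pair,
# which flips the sign of that pair's difference
null_stats <- replicate(
  1000,
  mean(diffs * sample(c(-1, 1), length(diffs), replace = TRUE))
)

# Two-sided p-value: proportion of null statistics at least as extreme
# as the observed mean difference
p_value <- mean(abs(null_stats) >= abs(mean(diffs)))
```

This is the same logic the `infer` pipeline carries out when generating null statistics for paired data.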
### Terms
The terms introduced in this chapter are presented in @tbl-terms-chp-21.
If you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.
You should be able to easily spot them as **bolded text**.
```{r}
#| label: tbl-terms-chp-21
#| tbl-cap: Terms introduced in this chapter.
#| tbl-pos: H
make_terms_table(terms_chp_21)
```
\clearpage
## Exercises {#sec-chp21-exercises}
Answers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-21].
::: {.exercises data-latex=""}
{{< include exercises/_21-ex-inference-paired-means.qmd >}}
:::