slides2.qmd

---
title: "Creating data plots for effective decision-making using statistical inference with R"
author: "Dianne Cook <br> Monash University"
format:
  revealjs: 
    theme: 
      - default
      - custom.scss
    slide-number: c/t
    chalkboard: true
code-line-numbers: false
message: false
highlight-style: pygments
footer: "[https://github.com/StatSocAus/tutorial_effective_data_plots](https://github.com/StatSocAus/tutorial_effective_data_plots)"
---

```{r, include = FALSE}
#| label: libraries-for-participants
library(tidyverse)
library(colorspace)
library(patchwork)
library(broom)
library(palmerpenguins)
library(nullabor)
library(dslabs) # For stars data
```

```{r, include = FALSE}
#| label: code-for-nice-slides
library(DT)

options(width = 200)
knitr::opts_chunk$set(
  fig.width = 3,
  fig.height = 3,
  fig.align = "center",
  dev.args = list(bg = 'transparent'),
  out.width = "100%",
  fig.retina = 3,
  echo = FALSE,
  warning = FALSE,
  message = FALSE,
  cache = FALSE
)
theme_set(ggthemes::theme_gdocs(base_size = 12) +
  theme(plot.background = 
        element_rect(fill = 'transparent', colour = NA),
        axis.line.x = element_line(color = "black", 
                                   linetype = "solid"),
        axis.line.y = element_line(color = "black", 
                                   linetype = "solid"),
        plot.title.position = "plot",
        plot.title = element_text(size = 18),
        panel.background  = 
          element_rect(fill = 'transparent', colour = "black"),
        legend.background = 
          element_rect(fill = 'transparent', colour = NA),
        legend.key        = 
          element_rect(fill = 'transparent', colour = NA)
  ) 
)
```

```{r}
#| echo: false
#| eval: false
# divergingx_hcl(palette="Zissou 1", n=10)
# [1] "#3B99B1" "#40ABA5" "#6FB798" "#9FC095" "#C7C98A"
# [6] "#EABE23" "#E8A419" "#E78802" "#EA6400" "#F5191C"
# specplot(divergingx_hcl(palette="Zissou 1", n=10))
```

## Session 2: Making decisions and inferential statements based on data plots {.center .center-align}

## Outline

```{r}
plan <- tribble(~time, ~topic,
                "3:00-3:20", "What is your plot testing?",
                "3:20-3:35", "Creating null samples",
                "3:35-4:00", "Conducting a lineup test", 
                "4:00-4:30", "Testing for best plot design")
knitr::kable(plan)
```

## What is your plot testing? 

:::: {.columns}

::: {.column width=50%}

<br>

```{r}
#| eval: false
#| echo: true
LM_FIT <- lm(VAR2 ~ VAR1, 
             data = DATA)
FIT_ALL <- augment(LM_FIT)
ggplot(FIT_ALL, aes(x=.FITTED, 
                    y=.RESID)) + 
  geom_point()
```

<br>
What will we  be assessing using this plot?

:::

::: {.column width=10%}
:::

::: {.column width=40%}

::: {.fragment}
Is the model misspecified? 

::: {style="font-size: 90%;"}
- non-linearity
- heteroskedasticity
- outliers/anomalies
- non-normality
- fitted value distribution
:::
:::

:::

::::


## What is your plot testing? 

:::: {.columns}

::: {.column width=60%}

```{r}
#| fig-width: 4
#| fig-height: 4
cars_lm <- lm(mpg ~ hp, data = mtcars)
cars_all <- augment(cars_lm)
ggplot(cars_all, aes(x=.fitted, y=.resid)) + geom_point()
```

:::

::: {.column width=40%}

What do you see?

::: {.fragment}
&cross; non-linearity <br>
&check; heteroskedasticity <br>
&cross; outliers/anomalies <br>
&check; non-normality <br>
&cross; fitted value distribution is uniform
:::

::: {.fragment}
<br>
<span style="color: #F5191C;"> Are you sure? </span>
:::

:::

::::


## What is your plot testing? 

:::: {.columns}

::: {.column width=50%}

<br>

```{r}
#| eval: false
#| echo: true
ggplot(DATA, 
       aes(x=VAR1, 
           y=VAR2, 
           color=CLASS)) + 
  geom_point() 
```

<br>
What will we  be assessing using this plot?

:::

::: {.column width=10%}
:::

::: {.column width=40%}

::: {.fragment}
<br>
Is there a difference between the groups?

- location
- shape
- outliers/anomalies
:::

:::

::::

## What is your plot testing? 

:::: {.columns}

::: {.column width=50%}

```{r}
#| fig-width: 4
#| fig-height: 4.5
ggplot(penguins, 
       aes(x=flipper_length_mm, 
           y=bill_length_mm, 
           color=species)) + 
  geom_point(alpha=0.8) +
  scale_color_discrete_divergingx(palette="Zissou 1") +
  theme(legend.title = element_blank(), 
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.text = element_text(size="8"))
```

<br>

:::

::: {.column width=10%}
:::

::: {.column width=40%}

What do you see?

::: {.fragment}
There a difference between the groups

&check; location <br>
&cross; shape <br>
&check; outliers/anomalies
:::

::: {.fragment}
<br>
<span style="color: #F5191C;"> Are you sure? </span>
:::
:::

::::

::: {.hidden}

```{r}
ebs <- read_csv("data/EBS_faculty_keywords.csv") %>%
  mutate(id = 1:n())

theory_stem <- c("infer", "tsa", "statist", "theory", "theori", "bayesian", "infer", "robust", "highdimension", "estim", "nonlinear", "depend", "probabilist", "memori", "risk", "oper", "constrain", "stochast", "minimum", "distanc", "nonparametr", "econometr", "simul", "semiparametr", "mathemat", "actuari","optimisationoper", "test","extrem","causal", "multivari", "econometr", "actuari","cross","section","quantit","uncertainti")

```

:::

## Statistical thinking

- Because the <span style="color: #3B99B1;"> plot </span> is specified using a functional mapping of the variables, it <span style="color: #3B99B1;"> is a statistic</span>. 
- The null and alternative hypotheses are indicated from the plot description.
- Applying the function to a dataset provides the observed value.


## Null hypothesis, example 1

:::: {.columns}

::: {.column width=50%}

<br>

```{r}
#| eval: false
#| echo: true
LM_FIT <- lm(VAR2 ~ VAR1, 
             data = DATA)
FIT_ALL <- augment(LM_FIT)
ggplot(FIT_ALL, aes(x=.FITTED, 
                    y=.RESID)) + 
  geom_point()
```

<br>
What is the null hypothesis?

::: {.fragment}
*There is no relationship between residuals and fitted values.* This is $H_o$.
:::

:::

::: {.column width=5%}
:::

::: {.column width=45%}


::: {.fragment .f80}
<br><br><br>

**Alternative hypothesis**, $H_a$:

*There is some relationship*, which might be

- non-linearity
- heteroskedasticity
- outliers/anomalies
:::

:::

::::

## Null hypothesis, example 2

:::: {.columns}

::: {.column width=50%}

<br>

```{r}
#| eval: false
#| echo: true
ggplot(DATA, 
       aes(x=VAR1, 
           y=VAR2, 
           color=CLASS)) + 
  geom_point()
```

<br>
What is the null hypothesis?

::: {.fragment}
*There is no difference between the classes.* This is $H_o$.
:::

:::

::: {.column width=5%}
:::

::: {.column width=45%}


::: {.fragment .f80}
<br><br><br>
**Alternative hypothesis**, $H_a$:

*There is some difference between the classes*, which might be

- location
- shape
- outliers/anomalies
:::

:::

::::

## Exercise

What is being tested in each of these plot descriptions?

:::: {.columns}

::: {.column width=30% .fragment}
```{r}
#| eval: false
#| echo: true
ggplot(DATA, 
       aes(x=VAR1)) +
  geom_histogram()
```
:::

::: {.column width=38% .fragment}
```{r}
#| eval: false
#| echo: true
ggplot(DATA, 
       aes(x=VAR1, 
           fill=VAR2)) +
  geom_bar(position="fill")
```
:::

::: {.column width=30% .fragment}
```{r}
#| eval: false
#| echo: true
ggplot(DATA, 
       aes(x=VAR1, 
           y=VAR2)) +
  geom_point() +
  geom_smooth()
```
:::

::: {.column width=30% .fragment}
Distribution of VAR1 is ?
:::

::: {.column width=38% .fragment}
There is no relationship between VAR1 and VAR2. More specifically, the proportion of VAR2 in each level of VAR1 is the same.
:::

::: {.column width=30% .fragment .f80}
<br>
There is no relationship between VAR1 and VAR2. Particularly, VAR2 is not dependent on VAR1 and there is no trend.</span>
:::

::::

## Creating null samples {.center}

## Statistical thinking

:::: {.columns}
::: {.column width=50% .f80}

Sampling distribution for a t-statistic. Values expected assuming $H_o$ is true. <span style="color: #F5191C;"> Shaded areas </span> indicate extreme values. 

```{r}
#| fig-width: 3
#| fig-height: 2.6
#| out-width: 70%
t_df <- tibble(x=seq(-4, 4, 0.1), y=dt(x, df=5))
t_cr1 <- tibble(x=c(-4, seq(-4, -2.57, 0.05), -2.57), 
                y=c(0, dt(x[-c(1,31)], df=5), 0))
t_cr2 <- tibble(x=c(2.57, seq(2.57, 4, 0.05), 4), 
                y=c(0, dt(x[-c(1,31)], df=5), 0))
ggplot() + 
  geom_hline(yintercept = 0) +
  geom_line(data=t_df, aes(x=x, y=y)) +
  geom_polygon(data=t_cr1, aes(x=x, y=y), fill="#F5191C", alpha=0.8) +
  geom_polygon(data=t_cr2, aes(x=x, y=y), fill="#F5191C", alpha=0.8) +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        panel.grid.major = element_blank())

```

:::

::: {.column width=50% .fragment}
<br><br><br><br>

<span style="color: #3B99B1;"> For making comparisons when plotting, draw a number of null samples, and plot them with the same script in the plot description.</span>

:::

::::


## Creating null samples, example 1

:::: {.columns}
::: {.column width=50% .f80}

```{r}
#| eval: false
#| echo: true
ggplot(DATA, 
       aes(x=VAR1, 
           y=VAR2, 
           color=CLASS)) + 
  geom_point()
```

<br>

$H_o$: *There is no difference between the classes.* 

::: {.fragment}
How would you generate null samples?
:::


::: {.fragment}
<br>
Break any association by permuting (scrambling/shuffling/re-sampling) the CLASS variable.
:::

:::
::: {.column width=50%}

::: {.fragment}
```{r}
#| fig-width: 4
#| fig-height: 8
#| out-width: 60%
p1 <- ggplot(penguins, 
       aes(x=flipper_length_mm, 
           y=bill_length_mm, 
           color=species)) + 
  geom_point(alpha=0.8) +
  scale_color_discrete_divergingx(palette="Zissou 1") +
  ggtitle("Original") +
  theme(legend.position = "none")
set.seed(235)
p2 <- ggplot(penguins, 
       aes(x=flipper_length_mm, 
           y=bill_length_mm, 
           color=sample(species, replace = TRUE))) + 
  geom_point(alpha=0.8) +
  scale_color_discrete_divergingx(palette="Zissou 1") +
  ggtitle("Permuted") +
  theme(legend.position = "none")
p1/p2
```
:::

:::

::::

## Creating null samples, example 2

:::: {.columns}
::: {.column width=50% .f80}

```{r}
#| eval: false
#| echo: true
LM_FIT <- lm(VAR2 ~ VAR1, 
             data = DATA)
FIT_ALL <- augment(LM_FIT)
ggplot(FIT_ALL, aes(x=.FITTED, 
                    y=.RESID)) + 
  geom_point()
```

::: {.fragment}
$H_o$: *There is no relationship between residuals and fitted values.* 
:::

::: {.fragment}
How would you generate null samples?
:::

::: {.fragment}
<br>
Break any association by 

- permuting residuals,
- or residual rotation, 
- or simulate residuals from a normal distribution.
:::

:::
::: {.column width=50%}

::: {.fragment}
```{r}
#| fig-width: 3
#| fig-height: 5.5
#| out-width: 60%
cars_lm <- lm(mpg ~ hp, data = mtcars)
cars_all <- augment(cars_lm)
set.seed(1235)
ggplot(lineup(null_lm(mpg ~ hp, method="rotate"), 
                    cars_all, n=2, pos=1), aes(x=.fitted, y=.resid)) + 
  geom_point() +
  facet_wrap(~.sample, ncol=1) +
  theme(axis.title = element_blank(),
        axis.text = element_blank())
```
:::

:::

::::

## Conducting a lineup test {.center}

## Steps

1. Create a lineup of $m-1$ null plots + 1 data plot, where the data plot is randomly placed among nulls. Remove any distracting information, like tick labels, titles.
2. Ask uninvolved observer(s) to pick the plot that is most different. (May need to use a crowd-sourcing service.)
3. Compute the probability that the data plot was chosen, assuming it is no different from the null plots. This is the $p$-value.
4. Decide to reject or fail to reject the null.

## Lineup example 1 <span style="font-size: 70%;"> (1/2) </span>

::: {.f60}
```{r}
#| code-fold: true
#| echo: true
#| fig-width: 12
#| fig-height: 8
#| out.width: 80%
set.seed(241)
ggplot(lineup(null_permute("species"), penguins, n=15), 
       aes(x=flipper_length_mm, 
           y=bill_length_mm, 
           color=species)) + 
  geom_point(alpha=0.8) +
  facet_wrap(~.sample, ncol=5) +
  scale_color_discrete_divergingx(palette="Zissou 1") +
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        panel.grid.major = element_blank())
```
:::

## Lineup example 1 <span style="font-size: 70%;"> (2/2) </span>

If 10 people are shown this lineup and all 10 pick plot 2, which is the data plot, the $p$-value will be 0.

Generally, we can compute the probability that the data plot is chosen by $x$ out of $K$ observers, shown a lineup of $m$ plots, using a simulation approach that extends from a binomial distribution, with $p=1/m$.

```{r}
#| echo: true
pvisual(10, 10, 15)
```

This means we would reject $H_o$ and conclude that there is a difference in the distribution of bill length and flipper length between the species of penguins. 

## Lineup example 2 <span style="font-size: 70%;"> (1/2) </span>

::: {.f60}
```{r}
#| code-fold: true
#| echo: true
#| fig-width: 9
#| fig-height: 6
#| out.width: 80%
data(wasps)
set.seed(258)
wasps_l <- lineup(null_permute("Group"), wasps[,-1], n=15)
wasps_l <- wasps_l %>%
  mutate(LD1 = NA, LD2 = NA)
for (i in unique(wasps_l$.sample)) {
  x <- filter(wasps_l, .sample == i)
  xlda <- MASS::lda(Group~., data=x[,1:42])
  xp <- MASS:::predict.lda(xlda, x, dimen=2)$x
  wasps_l$LD1[wasps_l$.sample == i] <- xp[,1]
  wasps_l$LD2[wasps_l$.sample == i] <- xp[,2]
}
ggplot(wasps_l, 
       aes(x=LD1, 
           y=LD2, 
           color=Group)) + 
  geom_point(alpha=0.8) +
  facet_wrap(~.sample, ncol=5) +
  scale_color_discrete_divergingx(palette="Zissou 1") +
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        panel.grid.major = element_blank())
```
:::

## Lineup example 2 <span style="font-size: 70%;"> (2/2) </span>

If 10 people are shown this lineup and 1 picked the data plot (position 6), which is the data plot, the $p$-value will be large.

```{r}
#| echo: true
pvisual(1, 10, 15)
```

This means we would NOT reject $H_o$ and conclude that there is NO difference in the distribution of groups. 

## What is the $p$-value?

```{r}
#| label: coin
head <- '<img src="images/heads.png" height = "70px" style="vertical-align:middle;">'
tail <- '<img src="images/tails.png" height = "70px" style="vertical-align:middle;">'
```


- Suppose $X$ is the number of nandus out of $n$ independent tosses.
- Let $p$ be the probability of getting a `r head` for this coin.
- **Hypotheses**: $H_0: p = 0.5$ vs. $H_a: p > 0.5$. <br> Alternative $H_a$ is saying we believe that the coin is biased to nandus. <br> Alternative needs to be decided before seeing data. 
- **Assumption**: Each toss is independent with equal chance of getting a nandu. 

<!--
**Test statistic**: $X \sim B(n, p)$. Recall $E(X\mid H_0) = np_0$.<br> We observe $n, x, \widehat{p}$. Test statistic is $\widehat{p} - p_0$.
- **P-value**: $P(X ~ \geq ~ x\mid H_0)$ 
- Conclusion**: Reject null hypothesis when the $p$-value is less than<br> some significance level $\alpha$. Usually $\alpha = 0.05$.
-->

## What is the $p$-value?

- Suppose I have a coin that I'm going to flip `r tail` `r head` 
- **Experiment 1**: I flipped the coin 10 times and this is the result:

<center>
```{r coin-bias, results='asis'}
set.seed(924)
samp10 <- sample(rep(c(head, tail), c(7, 3)))
cat(paste0(samp10, collapse = ""))
```
</center>
- The result is 7 nandus and 3 tails. So 70% are nandus. 
- Do you believe the coin is biased based on this data?


## What is the $p$-value?

- **Experiment 2**: Suppose now I flip the coin 100 times and this is the outcome:

```{r coin-bias100, results='asis'}
samp100 <- sample(rep(c(head, tail), c(70, 30)))
cat(paste0(samp100[1:42], collapse = ""))
```
- We observe 70 nandus and 30 tails. So again 70% are nandus. 
- Based on this data, do you think the coin is biased?

# Calculate the $p$-value

:::: {.columns}

::: {.column width=50% .f70}

**Experiment 1 (n=10)**

- We observed $x=7$, or $\widehat{p} = 0.7$.
- Assuming $H_0$ is true, we expect $np=10\times 0.5=5$.
- Calculate the $P(X \geq 7)$

<br>
<br>

```{r echo=TRUE}
sum(dbinom(7:10, 10, 0.5))
```


:::

::: {.column width=50% .f70}

**Experiment 2 (n=100)**

- We observed $x=70$, or $\widehat{p} = 0.7$.
- Assuming $H_0$ is true, we expect $np=100\times 0.5=50$.
- Calculate the $P(X \geq 70)$

<br>
<br>

```{r echo=TRUE}
sum(dbinom(70:100, 100, 0.5))
```

:::

::::

## Lineup $p$-value and power

Suppose $x$ out of $n$ people detected the data plot from a lineup, then
the **visual inference p-value** is given as $P(X \geq x)$ where $X \sim B(n, 1/m)$, but

::: {.fragment}
the assumption of independence is not strictly satisfied, if people are shown the same lineup. So the $p$-value is computed by simulation with 

```{r}
#| eval: false
#| echo: true
nullabor::pvisual()
```

:::

::: {.fragment}
<br> and the **power of a lineup** is estimated as $x/n$. We'll use this to compare the signal strengths for different plot designs. *Stay tuned!*
:::

## Lineup example 3 <span style="font-size: 70%;"> (1/2) </span>

:::: {.columns}

::: {.column width=15%}
<br><br>Which plot is the most different?
:::

::: {.column width=85%}

```{r}
#| fig-width: 8
#| fig-height: 5.5
#| out-width: 90%
set.seed(357)
ggplot(lineup(null_dist("temp", "exp", 
                        list(rate = 1 / mean(dslabs::stars$temp))), 
                        stars, n=15),
       aes(x=temp)) +
  geom_density(fill="black", alpha=0.7) +
  facet_wrap(~.sample, ncol=5, scales="free") +
  theme(axis.title = element_blank(),
        axis.text = element_blank())
```

:::
::::

## Lineup example 3 <span style="font-size: 70%;"> (2/2) </span>

:::: {.columns}

::: {.column width=50%}

Plot description was:

```{r}
#| eval: false
#| echo: true
ggplot(stars, aes(x=temp)) +
  geom_density()
```

::: {.fragment}
In particular, the researcher is interested to know if star temperature is a skewed distribution.
:::

::: {.fragment}

$H_o: X\sim exp(\widehat{\lambda})$ <br>
$H_a:$ it has a different distribution.
:::

:::

::: {.column width=50% .fragment}

::: {.fragment}
Generate the lineup with:

```{r}
#| eval: false
#| echo: true
lineup(null_dist("temp", "exp", 
  list(rate = 1 / 
         mean(dslabs::stars$temp))), 
  stars, n=15)
```

:::

::: {.fragment}

<br><br>Compute the $p$-value based on your responses to the lineup (previous slide).


```{r}
#| eval: false
#| echo: true
pvisual(n=??, k=??, m=15)
```
:::

:::

::::

## Lineup example 4 <span style="font-size: 70%;"> (1/2) </span>

:::: {.columns}

::: {.column width=15%}
<br><br>Which row of plots is the most different?
:::

::: {.column width=85%}

```{r}
tb <- read_csv(here::here("data/TB_notifications_2023-08-21.csv"))
tb_aus <- tb %>%
  filter(iso3 == "AUS") %>%
  filter(year > 1996)
tb_aus_sa <- tb_aus %>%
  filter(year > 2012) %>%
  select(iso3, year, 
         newrel_f014:newrel_f65, 
         newrel_m014:newrel_m65) %>%
  pivot_longer(cols=newrel_f014:newrel_m65,
               names_to = "sex_age", 
               values_to = "count") %>%
  filter(!is.na(count)) %>%
  separate(sex_age, into=c("stuff", 
                           "sex_age")) %>%
  mutate(sex = str_sub(sex_age, 1, 1),
         age = str_sub(sex_age, 2, 
                       str_length(sex_age))) %>%
  mutate(age = case_when(
    age == "014" ~ "0-14",
    age == "1524" ~ "15-24",
    age == "2534" ~ "25-34",
    age == "3544" ~ "35-44",
    age == "4554" ~ "45-54",
    age == "5564" ~ "55-64",
    age == "65" ~ "65")) %>%
  select(iso3, year, sex, age, count)
```


```{r}
#| fig-width: 8
#| fig-height: 5.5
#| out-width: 90%
l1 <- ggplot(tb_aus_sa, 
       aes(x=year, 
           y=count, 
           fill=sex)) + 
  geom_col(position="fill") +
  facet_wrap(~age, ncol=7) +
  scale_fill_discrete_divergingx(palette="ArmyRose") +
  theme(legend.position="none",
        axis.title = element_blank(),
        axis.text = element_blank()) +
  ggtitle("sample 1")
set.seed(446)
l2 <- tb_aus_sa %>% 
  uncount(count) %>%
  group_by(age) %>%
  mutate(sex = sample(sex, replace = FALSE)) %>%
  count(sex, age, year) %>%
  ggplot(aes(x=year, 
             y=n, 
             fill=sex)) + 
  geom_col(position="fill") +
  facet_wrap(~age, ncol=7) +
  scale_fill_discrete_divergingx(palette="ArmyRose") +
  theme(legend.position="none",
        axis.title = element_blank(),
        axis.text = element_blank()) +
  ggtitle("sample 2")
l3 <- tb_aus_sa %>% 
  uncount(count) %>%
  group_by(age) %>%
  mutate(sex = sample(sex, replace = FALSE)) %>%
  count(sex, age, year) %>%
  ggplot(aes(x=year, 
             y=n, 
             fill=sex)) + 
  geom_col(position="fill") +
  facet_wrap(~age, ncol=7) +
  scale_fill_discrete_divergingx(palette="ArmyRose") +
  theme(legend.position="none",
        axis.title = element_blank(),
        axis.text = element_blank()) +
  ggtitle("sample 3")
l1 + l2 + l3 + plot_layout(ncol=1)
```

:::
::::

## Lineup example 4 <span style="font-size: 70%;"> (2/2) </span>

:::: {.columns}

::: {.column width=50%}

Plot description was:


```{r}
#| eval: false
#| echo: true
ggplot(tb_aus_sa, 
       aes(x=year, 
           y=count, 
           fill=sex)) + 
  geom_col(position="fill") +
  facet_wrap(~age, ncol=7)
```

::: {.fragment}

$H_o:$ Proportion of males and females is the same for each year, conditional on age group <br>
$H_a:$ it's not
:::

:::

::: {.column width=50% .fragment}

::: {.fragment}
Generate the lineup with:

```{r}
#| eval: false
#| echo: true
b_aus_sa %>% 
  uncount(count) %>%
  group_by(age) %>%
  mutate(sex = sample(sex, 
    replace = FALSE)) %>%
  count(sex, age, year)
```

:::

::: {.fragment}

<br><br>Compute the $p$-value based on your responses to the lineup (previous slide).


```{r}
#| eval: false
#| echo: true
pvisual(n=??, k=??, m=3)
```
:::

:::

::::


## Practical considerations

- Testing can be done informally with the `nullabor` package
- For practical use, one should
    - Create multiple lineups for a data plot, different positions, different nulls
    - Show each to different groups of observers
    - Compute $p$-value by combining results from each lineup.
- Crowd-sourcing services include: [Amazon Mechanical Turk](https://www.mturk.com/), [prolific](https://www.prolific.co/), 
[Appen](https://appen.com), [LABVANCED](https://www.labvanced.com/).

## Exercise

Take a moment to look at the `lineup` function documentation. Run the sample code to make a lineup, eg:

```{r}
#| eval: false
#| echo: true
ggplot(lineup(null_permute('mpg'), mtcars), aes(mpg, wt)) +
  geom_point() +
  facet_wrap(~ .sample)
```

or your own.

::: {.f80}
<br>And then the different null sample generating functions: `null_permute`, `null_lm`,  `null_ts`, `null_dist`. 
:::

## Testing for best plot design {.center}

## Steps

1. Decide on plot descriptions, say two possibilities.
2. Using the same data, and same null data create lineups that only differ because of the plot description.
3. Show each lineup to two samples of uninvolved observers (one observer cannot see both lineups).
4. Compute the proportion of each sample who identified the data plot, this is the **signal strength** or statistical power of each plot design.
5. The plot with the greater value is the best design (for that problem).

## If your birthday is between Jan 1 and Jun 30, CLOSE YOUR EYES NOW {.center}

No peeking!

## Plot design example 1A

:::: {.columns}

::: {.column width=15%}
<br><br>Which plot is the most different?
:::

::: {.column width=85%}

```{r}
#| fig-width: 8
#| fig-height: 6
#| out-width: 80%
set.seed(601)
ggplot(lineup(null_permute("year"), tb_aus, n=15), 
       aes(x=year, y=c_newinc)) + 
  geom_point() +
  geom_smooth(se=F, colour="#F5191C") +
  facet_wrap(~.sample, ncol=5) +
  theme(axis.title = element_blank(),
        axis.text = element_blank())
```
:::

::::

## Plot design example 1B

:::: {.columns}

::: {.column width=15%}
<br><br>Which plot is the most different?
:::

::: {.column width=85%}

```{r}
#| fig-width: 8
#| fig-height: 5
#| out-width: 90%
set.seed(601)
ggplot(lineup(null_permute("year"), tb_aus, n=15), 
       aes(x=year, y=c_newinc)) + 
  geom_col(fill="#F5191C") +
  facet_wrap(~.sample, ncol=5) +
  theme(axis.title = element_blank(),
        axis.text = element_blank())
```
:::

::::

## Plot design example 1

This is the pair of plot designs we are evaluating.

```{r}
#| message: false
#| warning: false
#| fig-width: 6
#| fig-align: center
#| out-width: 60%
tb_aus <- tb %>%
  filter(iso3 == "AUS") %>%
  filter(year > 1996)
p1 <- ggplot(tb_aus, aes(x=year, y=c_newinc)) + 
  geom_point() +
  geom_smooth(se=F, colour="#F5191C") +
  scale_x_continuous("Year", breaks = seq(1980, 2020, 10), labels = c("80", "90", "00", "10", "20")) +
  ylab("TB incidence") +
  ggtitle("Plot A")

p2 <- ggplot(tb_aus, aes(x=year, y=c_newinc)) + 
  geom_col(fill="#F5191C") +
  scale_x_continuous("Year", breaks = seq(1980, 2020, 10), labels = c("80", "90", "00", "10", "20")) +
  ylab("TB incidence") +
  ggtitle("Plot B")

p1 + p2
```

Compute signal strength:

```{r}
#| eval: false
#| echo: true
?? / ??
```

## If your birthday is between Jul 1 and Dec 31, CLOSE YOUR EYES NOW {.center}

No peeking!

## Plot design example 2A

:::: {.columns}

::: {.column width=15%}
<br><br>Which plot is the most different?
:::

::: {.column width=85%}

```{r}
#| eval: true
#| fig-width: 8
#| fig-height: 6
library(StatsBombR)
library(SBpitch)
load("data/aus_brazil.rda")
passes = aus_brazil %>%
  filter(type.name=="Pass" & is.na(pass.outcome.name))
#create_Pitch() +  
set.seed(622)
ggplot(data=lineup(null_permute("possession_team.name"),
                         passes, n=12), 
             aes(x=location.x, 
                 y=location.y,
                 colour=possession_team.name))  +
  geom_point(alpha=0.7) +
  scale_color_discrete_divergingx(palette="Temps", nmax=2, rev=TRUE) + 
  facet_wrap(~.sample, ncol=4) +
  coord_equal() +
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank())
```

:::

::::

## Plot design example 2B

:::: {.columns}

::: {.column width=15%}
<br><br>Which plot is the most different?
:::

::: {.column width=85%}

```{r}
#| eval: true
#| fig-width: 8
#| fig-height: 6
library(StatsBombR)
library(SBpitch)
load("data/aus_brazil.rda")
passes = aus_brazil %>%
  filter(type.name=="Pass" & is.na(pass.outcome.name))
#create_Pitch() +  
set.seed(622)
ggplot(data=lineup(null_permute("possession_team.name"),
                         passes, n=12), 
             aes(x=location.x, 
                 y=location.y,
                 colour=possession_team.name))  +
  geom_density2d(alpha=0.7) +
  scale_color_discrete_divergingx(palette="Temps", nmax=2, rev=TRUE) + 
  facet_wrap(~.sample, ncol=4) +
  coord_equal() +
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank())
```

:::

::::

## Plot design example 2

This is the pair of plot designs we are evaluating. Comparing the positions at which passes were made by both teams.

```{r}
#| message: false
#| warning: false
#| fig-width: 8
#| fig-height: 3
#| fig-align: center
#| out-width: 60%
p1 <- create_Pitch() + 
  geom_point(data=passes, 
             aes(x=location.x, 
                 y=location.y,
                 colour=possession_team.name)) +
  scale_color_discrete_divergingx(palette="Temps", nmax=2, rev=TRUE) + 
  coord_equal() +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.text = element_text(size="8"),
        axis.line.x = element_blank(),
        axis.line.y = element_blank()) +
  ggtitle("Plot A")
p2 <- create_Pitch() + 
  geom_density2d(data=passes, 
             aes(x=location.x, 
                 y=location.y,
                 colour=possession_team.name)) +
  scale_color_discrete_divergingx(palette="Temps", nmax=2, rev=TRUE) + 
  coord_equal() +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.text = element_text(size="8"),
        axis.line.x = element_blank(),
        axis.line.y = element_blank()) +
  ggtitle("Plot B")

p1 + p2
```

Compute signal strength:

```{r}
#| eval: false
#| echo: true
?? / ??
```

## Exercise

For the star temperature data, where we used this plot design

```{r}
#| eval: false
#| echo: true
ggplot(stars, aes(x=temp)) +
  geom_density()
```

create a lineup with a different design, that you think might reveal the data distribution as different from null samples, better than the density plot. Possibilities could include `geom_histogram`, `ggbeeswarm::quasirandom`, `lvplot::geom_lv`.

```{r}
#| eval: false
#| code-fold: true
#| code-summary: "Code for answer"
ggplot(lineup(null_dist("temp", "exp", 
  list(rate = 1 / 
         mean(dslabs::stars$temp))), 
  stars, n=15), aes(x=temp)) +
  geom_density(fill="black", alpha=0.7) + 
  facet_wrap(~.sample, ncol=5, scales="free") +
  theme(axis.title = element_blank(),
        axis.text = element_blank())
ggplot(lineup(null_dist("temp", "exp", 
  list(rate = 1 / 
         mean(dslabs::stars$temp))), 
  stars, n=15), aes(x=temp)) +
  geom_histogram(bins = 15) + 
  facet_wrap(~.sample, ncol=5, scales="free") +
  theme(axis.title = element_blank(),
        axis.text = element_blank())
ggplot(lineup(null_dist("temp", "exp", 
  list(rate = 1 / 
         mean(dslabs::stars$temp))), 
  stars, n=15), aes(x=.sample, y=temp)) +
  scale_x_continuous("sample", breaks=1:15) +
  geom_quasirandom() +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank())
```

## Take-aways

- Think about what variability or associations your plot will be assessing
- What would be not interesting pattern(s)
- How would you generate data with where there is **no interesting** pattern
- Plot samples of null data to understand what patterns occur by chance
- With lineups you can compute the 
    - significance of a pattern
    - strength of the signal in one type of display relative to another

## Why

:::: {.columns}
::: {.column width=70%}
![](images/expertise.png)
:::

::: {.column width=30%}

::: {style="font-size: 80%;"}
Do you see any clusters here?

::: {.fragment}
<br><br>
My colleague does, but I don't.
:::
:::

:::

::::
## Wrap up {.center}

## References

::: {.f60}
- Wickham, Cook, Hofmann, Buja (2010) Graphical Inference for Infovis, IEEE TVCG, https://doi.org/10.1109/TVCG.2010.161.
- Hofmann, Follett, Majumder, Cook (2012) Graphical Tests for Power Comparison of Competing Designs, IEEE TVCG, https://doi.org/10.1109/TVCG.2012.230.
- Buja, Cook, Hofmann, Lawrence, Lee EK, Swayne, Wickham (2009) Statistical inference for exploratory data analysis and model diagnostics, https://doi.org/10.1098/rsta.2009.0120.
- Majumder, Hofmann, Cook (2013) Validation of visual statistical inference, applied to linear models, https://doi.org/10.1080/01621459.2013.808157.
- VanderPlas, Rottger, Cook, Hofmann (2021) Statistical significance calculations for scenarios in visual inference, https://doi.org/10.1002/sta4.337.
:::

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.