-
Notifications
You must be signed in to change notification settings - Fork 1
/
10-Hypothesis-Testing-Slides.qmd
439 lines (331 loc) · 24.9 KB
/
10-Hypothesis-Testing-Slides.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
---
title: "Hypothesis Testing"
author:
- name: A. Stamm
affiliations:
- ref: lmjl
corresponding: true
orcid: 0000-0002-8725-3654
affiliations:
- id: lmjl
name: Department of Mathematics Jean Leray, UMR CNRS 6629, Nantes University, Ecole Centrale de Nantes
city: Nantes
country: France
format:
revealjs:
footer: "Data Science with R - <[email protected]> - <https://astamm.github.io/data-science-with-r/>"
slide-number: true
logo: ../logo-lmjl.png
theme: simple
code-annotations: select
chalkboard: true
include-in-header:
text: |
<style>
.v-center-container {
display: flex;
justify-content: center;
align-items: center;
height: 90%;
}
</style>
lightbox: true
---
```{r setup, include=FALSE}
library(tidyverse)
knitr::opts_chunk$set(comment = NA)
```
# Statistical Framework
## General data modeling framework
::: {.r-fit-text}
- You have some data $\{x_1, \dots, x_n\}$ that you presume have been sampled **independently** from their corresponding random variables $\{X_1, \dots, X_n\}$;
- You formulate a simple hypothesis $H_0$, called *null* hypothesis, about the distribution from which this data were sampled;
- You want to confront this hypothesis against an *alternative* hypothesis $H_a$ using your **finite** amount of data;
- You use a *test statistic* $T(X_1, \dots, X_n)$ that depends on the sample (and thus that you can calculate anytime you observe a sample) and of which you know (an approximation of) the distribution when you assume that your null hypothesis is true; it is called the **null distribution** of the test statistic $T$;
- You compute the value $t_0$ of this test statistic with your observed data;
- You strongly reject $H_0$ if $t_0$ falls on the tails of the null distribution of $T$; **or**,
- You lack evidence to reject $H_0$ if $t_0$ ends up in the central part of the null distribution of $T$.
- **Warning:** It is straightforward from this setup to understand that the problem is not symmetric in the hypotheses. Indeed, the procedure relies on what happens when $H_0$ is true but does not depend on $H_a$.
:::
# Test Statistic
## Theoretical aspects
::: {.r-fit-text}
You design a **test statistic** $T(X_1, \dots, X_n)$ for the purpose of performing this test which must satisfy at least the first three of the following four properties:
- It evaluates to real values;
- You have all the required knowledge to compute its value once you observe the data;
- **If** $H_0$ is true, then small values of the statistic should comfort you with the idea that $H_0$ is a reasonable assumption, while larger values of the statistic should generate suspicion about $H_0$ **in favor** of the alternative hypothesis $H_a$;
- \[*optional*\] **If** $H_0$ is true, then it can be very helpful to have access to the (asymptotic) distribution of the test statistic under *classical* assumptions (such as normality).
:::
## Example: Test on the mean
::: {.r-fit-text}
Suppose you have a sample of $n$ *i.i.d.* random variables $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ and you know the value of $\sigma^2$. We want to test whether the mean $\mu$ of the distribution is equal to some pre-specified value $\mu_0$. We can therefore use $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$. At this point, a good candidate test statistic to look at for performing this test is: $$ T(X_1, \dots, X_n) := \sqrt{n} \frac{\overline X - \mu_0}{\sigma}, \quad \mbox{with} \quad \overline X := \frac{1}{n} \sum_{i=1}^n X_i. $$
- It evaluates to real values;
- You have all the required knowledge to compute its value once you observe the data ($\overline x$);
- The sample mean $\overline X$ is an unbiased estimator of the true unknown mean $\mu$; so, if $H_0$ is true, then $\overline X$ will produce values that are close to $\mu_0$; hence, small values of $T$ will comfort you with the idea that $H_0$ is a reasonable assumption, while larger values, both positive or negative, of the statistic should generate suspicion about $H_0$ **in favor** of the alternative hypothesis $H_a$;
- If you assume normality and independence of the sample, then $T \sim \mathcal{N}(0, 1)$ under $H_0$.
:::
## Distribution of the test statistic under $H_0$
::: {.r-fit-text}
- **Parametric testing.** If you designed your test statistic carefully, you might have access to its theoretical distribution when $H_0$ is true under distributional assumptions about the data. This is called **parametric** hypothesis testing.
- **Asymptotic testing.** If it is not the case, you can often derive the theoretical distribution of the statistic under the null hypothesis asymptotically, i.e. assuming that you have a large sample ($n \gg 1$); this is called **asymptotic** hypothesis testing.
- **Bootstrap testing.** If you are in a large sample size regime but still cannot have access to the theoretical distribution of your test statistic, you can approach this distribution using bootstrapping; this is called **bootstrap** hypothesis testing.
- **Permutation testing.** If you are in a low sample size regime, then you can approach the distribution of the test statistic using permutations; this is called **permutation** hypothesis testing.
:::
## Two-sided vs. one-sided hypothesis tests
::: {.r-fit-text}
Depending on what you put into the alternative hypothesis $H_a$, larger values of the test statistic that raise suspicion regarding the validity of $H_0$ might mean:
- larger values only on the right tail of the null distribution of $T$;
- larger values only on the left tail of the null distribution of $T$;
- larger values on both tails.
In the first two cases, we say that the test is **one-sided**. In the latter case, we say that the test is **two-sided** because we interpret large values in both tails as suspicious.
:::
## Example: Test on the mean (*continued*)
::: {.r-fit-text}
Suppose you have a sample of $n$ *i.i.d.* random variables $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ and you know the value of $\sigma^2$. We want to test whether the mean $\mu$ of the distribution is equal to some pre-specified value $\mu_0$. As we have seen, a good candidate test statistic to look at for performing this test is:
$$ T(X_1, \dots, X_n) := \sqrt{n} \frac{\overline X - \mu_0}{\sigma}, \quad \mbox{with} \quad \overline X := \frac{1}{n} \sum_{i=1}^n X_i. $$
Now, using this test statistic, we might be interested in performing three different tests:
1. $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$;
2. $H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$;
3. $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$.
:::
## $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$
::: {.r-fit-text}
$$ T(X_1, \dots, X_n) := \sqrt{n} \frac{\overline X - \mu_0}{\sigma} $$ Remember that we must look at large values of $T$ that raise suspicion **in favor** of $H_a$.
```{r test-right-tail, echo=FALSE, fig.align='center'}
knitr::include_graphics("images/right-tail-test.jpeg")
```
- $\overline X$ is an unbiased estimator of the true mean.
- Large negative values of $T$ that happen when the true mean is far less than $\mu_0$ **does not** raise suspicion **in favor** of $H_a$.
- Large positive values of $T$ that happen when the true mean is far more than $\mu_0$ **does** raise suspicion **in favor** of $H_a$.
- **Conclusion:** we are interested only in the **right** tail of the null distribution of $T$ to find evidence to reject $H_0$.
:::
## $H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$
::: {.r-fit-text}
$$ T(X_1, \dots, X_n) := \sqrt{n} \frac{\overline X - \mu_0}{\sigma} $$ Remember that we must look at large values of $T$ that raise suspicion **in favor** of $H_a$.
```{r test-left-tail, echo=FALSE, fig.align='center'}
knitr::include_graphics("images/left-tail-test.jpeg")
```
- $\overline X$ is an unbiased estimator of the true mean.
- Large negative values of $T$ that happen when the true mean is far more than $\mu_0$ **does not** raise suspicion **in favor** of $H_a$.
- Large positive values of $T$ that happen when the true mean is far less than $\mu_0$ **does** raise suspicion **in favor** of $H_a$.
- **Conclusion:** we are interested only in the **left** tail of the null distribution of $T$ to find evidence to reject $H_0$.
:::
## $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$
::: {.r-fit-text}
$$ T(X_1, \dots, X_n) := \sqrt{n} \frac{\overline X - \mu_0}{\sigma} $$ Remember that we must look at large values of $T$ that raise suspicion **in favor** of $H_a$.
```{r test-two-tail, echo=FALSE, fig.align='center'}
knitr::include_graphics("images/two-tail-test.jpeg")
```
- $\overline X$ is an unbiased estimator of the true mean.
- Large negative values of $T$ that happen when the true mean is far more than $\mu_0$ **does** raise suspicion **in favor** of $H_a$.
- Large positive values of $T$ that happen when the true mean is far less than $\mu_0$ **does** raise suspicion **in favor** of $H_a$.
- **Conclusion:** we are interested in both the **left** and the **right** tails of the null distribution of $T$ to find evidence to reject $H_0$.
:::
# Making a decision
## Type I and Type II errors
::: {.r-fit-text}
So we are now at a point where we know which tail(s) which we should look at and we now need to make a decision as to what *large* means. In other words, above which **threshold** on all possible values of my test statistic should I consider that I can reject $H_0$.
- Notice that you are going to take this decision based on the null distribution.
- When you decide to reject or not, you might make an error:
```{r error-table, echo=FALSE, message=FALSE}
library(tidyverse)
library(kableExtra)
tibble(
Decision = c("Do not reject \\(H_0\\)", "Reject \\(H_0\\)"),
`\\(H_0\\) is true` = c("Well done", "Type I error"),
`\\(H_a\\) is true` = c("Type II error", "Well done")
) %>%
kbl(format = "html", align = 'c', escape = FALSE) %>%
kable_styling(position = "center")
```
- The only error rate you can control is the type I error rate because it is a probability computed **assuming that the null hypothesis is true**, which is exactly the situation we put ourselves in for making the decision (i.e. looking at the tails of the null distribution of $T$).
:::
## Significance level
::: {.r-fit-text}
- At this point, you can decide that you do not want to make more than a certain amount of type I errors. So you want to force that $\mathbb{P}_{H_0}(\mbox{reject } H_0) \le \alpha$, for some upper bound threshold $\alpha$ on the probability of type I errors. This threshold is called **significance level** of the test and is often denoted by the greek letter $\alpha$.
- Let us now translate what this rule implies for the right-tail alternative case. The event "$\mbox{reject } H_0$" translates in this case into $T > x$ for some $x$ value of the test statistic $T$. Hence, the rule becomes: $$ \mathbb{P}_{H_0}(T > x) \le \alpha \\ \Leftrightarrow 1 - \mathbb{P}_{H_0}(T \le x) \le \alpha \\ \Leftrightarrow 1 - F_T^{\left(H_0\right)}(x) \le \alpha \\ \Leftrightarrow F_T^{\left(H_0\right)}(x) \ge 1 - \alpha $$ verified for all $x \ge q_{1-\alpha}$, where $q_{1-\alpha}$ is the quantile of order $1-\alpha$ of the null distribution of $T$.
:::
## Significance level (*continued*)
::: {.r-fit-text}
**Decision-making rule**:
- We reject $H_0$ if the value $t_0$ of the test statistic computed on the observed sample is greater than the quantile of order $1-\alpha$ of the null distribution of $T$;
- We decide that we lack evidence to reject $H_0$ if the value $t_0$ of the test statistic calculated on the observed sample is smaller than the quantile of order $1-\alpha$ of the null distribution of $T$.
- This decision-making rule guarantees that the probability of making a type I error is upper-bounded by the significance level $\alpha$.
:::
## p-value {.flexbox .vcenter}
::: {.r-fit-text}
- **Definition.** The **p-value** is a scalar value between $0$ and $1$ that measures *what was the probability, assuming that the null hypothesis* $H_0$ is true, of observing the data we did observe, or data even more in favor of the alternative hypothesis.
- **Mathematical expression.** If $t_0$ is the value of the test statistic computed from the observed sample, then:
- $p = \mathbb{P}_{H_0}(T > t_0)$ for right-tail hypothesis tests (e.g. $H_a: \mu > \mu_0$ when testing the mean);
- $p = \mathbb{P}_{H_0}(T < t_0)$ for left-tail hypothesis tests (e.g. $H_a: \mu < \mu_0$ when testing the mean);
- $p = 2 \min\left( \mathbb{P}_{H_0} \left( T > t_0 \right), \mathbb{P}_{H_0} \left( T < t_0 \right) \right)$ for two-tail hypothesis tests (e.g. $H_a: \mu \ne \mu_0$ when testing the mean).
- **Interpretation.** If the $p$-value is very small, it means that
- *either* we observed a miracle,
- *or* the null hypothesis might be wrong.
- **Decision-making rule.** We can show that rejecting the null hypothesis when $p \le \alpha$ also produces a decision-making rule that guarantees a probability of type I error at most $\alpha$.
:::
## Decision-Making: Summary (right-tail scenario)
```{r decision-summary}
#| echo: false
#| fig-align: center
#| out-width: 65%
knitr::include_graphics("images/pvalue-alpha.png")
```
## Statistical power of a test
::: {.r-fit-text}
- **Definition.** It is the probability of correctly rejecting the null hypothesis, *i.e.* to reject it when the alternative is in fact correct. It is often denoted by $\mu$. In terms of events, it is defined by: $$ \mu := \mathbb{P}_{H_a}(\mbox{Reject } H_0). $$
- **Usage.** The statistical power of a test is an important aspect of the test because:
- it is an important performance indicator to compare different testing procedures (observe that there is not a unique statistic to perform a given test);
- it is often used in clinical trials or other types of trials (e.g. crash tests) to calibrate the number of observations required to achieve a given statistical power.
- **Remarks.**
- The statistical power $\mu$ is equal to $1 - \beta$, where $\beta$ is the greek letter often used to designate the probability of type II errors, $\beta := \mathbb{P}_{H_a}(\mbox{Do not reject } H_0)$.
- Power calculations are difficult because it requires to put ourselves under $H_a$, which is often of the form $\mu > \mu_0$ or $\mu < \mu_0$ or $\mu \ne \mu_0$. In other words, you often lack information to compute probabilities assuming that the alternative hypothesis is true. You have to assess how the power changes as you explore different alternatives.
:::
# Compliance testing
::: {.r-fit-text}
Compliance testing aims at determining whether a process, product, or service complies with the requirements of a specification, technical standard, contract, or regulation.
:::
## Testing the mean
::: {.r-fit-text}
- **Model.** Let $(X_1, \dots, X_n)$ be $n$ *i.i.d.* random variables that follow a normal distribution $\mathcal{N}(\mu, \sigma^2)$.
- **Hypotheses.** $$ H_0: \mu = \mu_0 \quad \mbox{vs.} \quad H_a: \mu \ne \mu_0 \quad \mbox{or} \quad H_a: \mu > \mu_0 \quad \mbox{or} \quad H_a: \mu < \mu_0. $$
- **Test Statistic.** $$ Z_n = \sqrt{n} \frac{\overline X_n - \mu_0}{\sigma}. $$
- **Problem.** Under the null hypothesis, I do not have complete knowledge to compute the test statistic because I do not know the value of $\sigma$.
:::
## Testing the mean with known variance
::: {.r-fit-text}
- **Problem solved.** I now have complete knowledge to compute the test statistic.
- **Null distribution.** The null distribution is $$ Z_n \stackrel{H_0}{\sim} \mathcal{N}(0, 1). $$
- **R function.** None
:::
## Testing the mean with unknown variance
::: {.r-fit-text}
- **Problem solved.** I plug in the empirical standard deviation instead of $\sigma$ in the definition of the test statistic.
- **Test Statistic.** $$ T_n = \frac{\overline X_n - \mu_0}{\sqrt{\frac{S_{n-1}^2}{n}}}, \quad \mbox{with} \quad S_{n-1}^2 := \frac{1}{n-1} \sum_{i = 1}^n (X_i - \overline X)^2. $$
- **Null distribution.** $$ T_n \stackrel{H_0}{\sim} \mathcal{S}tudent(n-1). $$
- **R function.** [t.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html)
:::
## Testing the variance
::: {.r-fit-text}
- **Assumptions.** Let $(X_1, \dots, X_n)$ be $n$ *i.i.d.* random variables that follow a normal distribution $\mathcal{N}(\mu, \sigma^2)$.
- **Hypotheses.** $$ H_0: \sigma^2 = \sigma_0^2 \quad \mbox{vs.} \quad H_a: \sigma^2 \ne \sigma_0^2 \quad \mbox{or} \quad H_a: \sigma^2 > \sigma_0^2 \quad \mbox{or} \quad H_a: \sigma^2 < \sigma_0^2. $$
- **Test Statistic.** $$ U_n = \sum_{i = 1}^n \frac{(X_i - \mu)^2}{\sigma_0^2}. $$
- **Problem.** Under the null hypothesis, I do not have complete knowledge to compute the test statistic because I do not know the value of $\mu$.
:::
## Testing the variance with known mean
::: {.r-fit-text}
- **Problem solved.** I now have complete knowledge to compute the test statistic.
- **Null distribution.** The null distribution is $$ U_n \stackrel{H_0}{\sim} \chi^2(n). $$
- **R function.** None.
:::
## Testing the variance with unknown mean
::: {.r-fit-text}
- **Problem solved.** I plug in the empirical mean instead of $\mu$ in the definition of the test statistic.
- **Test Statistic.** $$ U_{n-1} = \sum_{i = 1}^n \frac{\left( X_i - \overline X_n \right)^2}{\sigma_0^2}. $$
- **Null distribution.** $$ U_{n-1} \stackrel{H_0}{\sim} \chi^2(n-1). $$
- **R function.** None.
:::
## Testing a proportion
::: {.r-fit-text}
Here we want to test whether the proportion $p$ of individuals in a given
population who have a feature of interest is equal to a pre-specified rate.
- **Model.** Let $(X_1, \dots, X_n)$ be $n$ *i.i.d.* random variables that follow a Bernoulli distribution $\mathcal{B}e(p)$. The interpretation is that $X_i$ measures if individual $i$ possesses the characteristic of which we want to know the proportion or not.
- **Hypotheses.** $$ H_0: p = p_0 \quad \mbox{vs.} \quad H_a: p \ne p_0 \quad \mbox{or} \quad H_a: p > p_0 \quad \mbox{or} \quad H_a: p < p_0. $$
- **Test Statistic.** $$ B_n = \sum_{i=1}^n X_i $$
- **Null distribution.** $$ B_n \sim \mathcal{B}inom(n,p_0) $$
- **R function.** [binom.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/binom.test.html)
:::
# Two-Sample Testing
::: {.r-fit-text}
Hypothesis tests for comparing the distributions that generated two independent samples.
:::
## Testing for variance differences
::: {.r-fit-text}
- **Model.** Let $(X_1, \dots, X_{n_x})$ be $n_X$ *i.i.d.* random variables that follow a normal distribution $\mathcal{N}(\mu_X, \sigma_X^2)$ and $(Y_1, \dots, Y_{n_Y})$ be $n_Y$ *i.i.d.* random variables that follow a normal distribution $\mathcal{N}(\mu_Y, \sigma_Y^2)$. Assume furthermore that the two samples are statistically independent.
- **Hypotheses.** $$ H_0: \sigma_X^2 = \sigma_Y^2 \quad \mbox{vs.} \quad H_a: \sigma_X^2 \ne \sigma_Y^2 \quad \mbox{or} \quad H_a: \sigma_X^2 > \sigma_Y^2 \quad \mbox{or} \quad H_a: \sigma_X^2 < \sigma_Y^2 $$
- **Test Statistic.** $$ V_n = \frac{S_X^2}{S_Y^2}, \quad \mbox{with} \quad S_X^2 = \frac{1}{n_X - 1} \sum_{i = 1}^{n_X} (X_i - \overline X_n)^2 \quad \mbox{and} \quad S_Y^2 = \frac{1}{n_Y - 1} \sum_{i = 1}^{n_Y} (Y_i - \overline Y_n)^2 $$
- **Null distribution.** $$ V_n \stackrel{H_0}{\sim} \mathcal{F}isher(n_X - 1, n_Y - 1) $$
- **R function.** [var.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/var.test.html)
- **Validity.** Relies on the normality assumption and independence within and between samples.
:::
## Testing for mean differences
::: {.r-fit-text}
- **Model.** Let $(X_1, \dots, X_{n_x})$ be $n_X$ *i.i.d.* random variables that follow a normal distribution $\mathcal{N}(\mu_X, \sigma_X^2)$ and $(Y_1, \dots, Y_{n_Y})$ be $n_Y$ *i.i.d.* random variables that follow a normal distribution $\mathcal{N}(\mu_Y, \sigma_Y^2)$. Assume furthermore that the two samples are statistically independent.
- **Hypotheses.** $$ H_0: \mu_X = \mu_Y \quad \mbox{vs.} \quad H_a: \mu_X \ne \mu_Y \quad \mbox{or} \quad H_a: \mu_X > \mu_Y \quad \mbox{or} \quad H_a: \mu_X < \mu_Y $$
- **R function.** [t.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/t.test.html)
- **Validity.** Relies on the normality assumption and independence within and between samples.
:::
## Testing for mean differences when variances are equal
::: {.r-fit-text}
- **Test Statistic.** Let $\delta = \mu_X - \mu_Y$ be the mean difference and $\delta_0$ be the assumed mean difference under $H_0$. Then,
$$
T_n = \frac{\left( \overline X_n - \overline Y_n \right) - \delta_0}{\sqrt{S_\mathrm{pooled}^2 \left( \frac{1}{n_X} + \frac{1}{n_Y} \right)}} \mbox{ with } S_\mathrm{pooled}^2 = \frac{(n_X - 1) S_X^2 + (n_Y - 1) S_Y^2}{n_X + n_Y - 2}
$$
- **Null distribution.**
$$
T_n \stackrel{H_0}{\sim} \mathcal{S}tudent(n_X + n_Y - 2)
$$
:::
## Testing for mean differences when variances are not equal
::: {.r-fit-text}
- **Test Statistic.**
$$
T_n = \frac{\left( \overline X_n - \overline Y_n \right) - \delta_0}{\sqrt{\frac{S_X^2}{n_X} + \frac{S_Y^2}{n_Y}}}
$$
- **Null distribution.**
$$
T_n \stackrel{H_0}{\sim} \mathcal{S}tudent(m) \mbox{ with } m = \frac{\left( \frac{S_X^2}{n_X} + \frac{S_Y^2}{n_Y} \right)^2}{\frac{\left( \frac{S_X^2}{n_X} \right)^2}{n_X-1} + \frac{\left( \frac{S_Y^2}{n_Y} \right)^2}{n_Y-1}}.
$$
:::
# Adequacy testing
::: {.r-fit-text}
Adequacy tests aim at determining if the distribution of the observations is coherent with a given probability distribution.
:::
## Shapiro-Wilk test
::: {.r-fit-text}
- **Hypotheses.** $$ H_0: (X_1, \dots, X_n) \stackrel{i.i.d.}{\sim} \mathcal{N}(\mu, \sigma^2) \quad \mbox{vs.} \quad H_a: (X_1, \dots, X_n) \not\sim \mathcal{N}(\mu, \sigma^2). $$
- **Test Statistic.**
$$
T_n = {\left(\sum_{i=1}^n a_i X_{(i)}\right)^2 \over \sum_{i=1}^n (X_i-\overline{X}_n)^2}
$$
where $X_{(i)}$ is the order statistic for observation $i$, $\overline X_n$ the sample mean and $(a_1, \dots, a_n)$ are weights computed from the first two moments of the order statistics of standard normal variables.
- **Null distribution.** $$ T \stackrel{H_0}{\sim} \mathcal{W}ilks(n). $$
- **R function.** [shapiro.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/shapiro.test.html)
- **Validity.** The sample size should meet $3 \le n \le 5000$. The Wilks distribution is approximated except for $n=3$.
:::
## Kolmogorov-Smirnov test
::: {.r-fit-text}
- **Hypotheses.** $$ H_0: (X_1, \dots, X_n) \stackrel{i.i.d.}{\sim} F_0 \quad \mbox{vs.} \quad H_a: (X_1, \dots, X_n) \not\sim F_0. $$
- **Test Statistic.**
$$
T_n = \sup_{x \in \mathbb R} | F_n(x) - F(x)|
$$
where $F_n$ is the cumulative distribution function and $F$ is the cumulative distribution function of the law under testing.
- **Null distribution.** $$ \sqrt{n} T \xrightarrow{\mathcal{L}} \sup_{x \in \mathbb R} | B(F(x))|, $$ where $B$ is the Brownian bridge.
- **R function.** [ks.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ks.test.html)
- **Validity.** Asymptotic.
:::
## $\chi^2$ test of adequacy for a single categorical variable
::: {.r-fit-text}
- **Model.** We group observations in classes and we compare the observed frequencies of these classes to the corresponding theoretical frequencies as given by the hypothesized law $F_0$.
- **Hypotheses.** $H_0: (X_1, \dots, X_n) \stackrel{i.i.d.}{\sim} F_0 \quad \mbox{vs.} \quad H_a: (X_1, \dots, X_n) \not\sim F_0$.
- **Test Statistic.**
$$
U_n = n\sum_{i=1}^k\frac{(f_i-f_{0i})^2}{f_{0i}},
$$
where $n$ is the total number of observations, $k$ the number of classes, $f_i = n_i / n$ and $f_{i0}$ the theoretical frequency of class $i$, *i.e.* the probability that the random variable ends up in class $i$.
- **Null distribution.** $$ U_n \xrightarrow{\mathcal{L}} \chi^2_{k - 1 - \ell}, \mbox{ where } \ell \mbox{ is the number of estimated parameters for } F_0.$$
- **R function.** [chisq.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/chisq.test.html)
- **Validity.** Requires large class frequencies. Typically, $n_i \ge 5$.
:::
## $\chi ^2$ test of independence between two categorical variables
::: {.r-fit-text}
- **Model.** Let $X = (Y, Z) = ((Y_1, Z_1), \dots, (Y_n, Z_n))$ be a bivariate sample of $n$ *i.i.d.* pairs of categorical random variables. Let $\nu$ be the law of $(Y_1, Z_1)$, $\mu$ the law of $Y_1$ and $\lambda$ the law of $Z_1$. Let $\{y_1, \dots, y_s\}$ be the set of possible values for $Y_1$ and $\{z_1, \dots, z_r\}$ the set of possible values for $Z_1$. For $\ell \in \{1, \dots, s\}$ et $h \in \{1, \dots, r\}$, let
$$ N_{\ell,\cdot} = \left| \left\{ i \in \{1, \dots, n\}; Y_i = y_\ell \right\} \right|,\quad N_{\cdot, h} = \left| \left\{ i \in \{1, \dots, n\}; Z_i = z_h \right\} \right|, \\ N_{\ell,h} = \left| \left\{(i \in \{1, \dots, n\}; Y_i = y_\ell, Z_i = z_h \right\} \right|.
$$
- **Hypotheses.** $H_0: \nu = \mu \otimes \lambda \quad \mbox{vs.} \quad H_a: \nu \ne \mu \otimes \lambda$.
- **Test statistic.** $$ U_n = n \sum_{\ell = 1}^s \sum_{h = 1}^r \frac{ \left( \frac{N_{\ell, h}}{n} - \frac{N_{\ell, \cdot}}{n} \frac{N_{\cdot, h}}{n} \right)^2 }{ \frac{N_{\ell, \cdot}}{n} \frac{N_{\cdot, h}}{n} }. $$
- **Null distribution.** $U_n \xrightarrow{\mathcal{L}} \chi^2 \left( (s-1)(r-1) \right)$.
- **R function.** [chisq.test()](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/chisq.test.html)
- **Validity.** Asymptotic. Often, $n \gg 30$ and $N_{\ell, h} \gg 5$, for each pair $(\ell, h)$.
:::