PSYC 5301 - Lecture 9

Review from last time

Recall that by Bayes' theorem, we have

\[ \underbrace{\frac{p(\mathcal{H}_1\mid \text{data})}{p(\mathcal{H}_0\mid \text{data})}}_{\text{posterior odds}} = \underbrace{\frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}}_{\text{prior odds}} \times \underbrace{\frac{p(\text{data}\mid \mathcal{H}_1)}{p(\text{data}\mid \mathcal{H}_0)}}_{\text{predictive updating factor}} \]

The predictive updating factor

\[ B_{10} = \frac{p(\text{data}\mid \mathcal{H}_1)}{p(\text{data}\mid \mathcal{H}_0)} \]

tells us how much better \(\mathcal{H}_1\) predicts our observed data than \(\mathcal{H}_0\).

This ratio is called the \textbf{Bayes factor}

We can compute Bayes factors for ANOVA models using the BIC:

\[ BIC = N\ln(SS_{\text{residual}}) + k\ln(N) \] where

  • \(N\)=total number of independent observations
  • \(k\)=number of parameters in the model
  • $SS_{\text{residual}}$ = variance NOT explained by the model
  • Note: smaller BIC = better model fit

Steps:

  • set up two models: $\mathcal{H}_0$ and $\mathcal{H}_1$
  • compute BIC (Bayesian information criterion) for each model
  • compute the Bayes factor as $\displaystyle{B_{10} = e^{\frac{\Delta BIC}{2}}}$, where $\Delta BIC = BIC_0 - BIC_1$
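As a concrete illustration of these steps, here is a minimal Python sketch (the helper names =bic= and =bayes_factor_10= are mine, not from any package); we will check the worked examples below against it:

#+begin_src python
from math import log, exp

def bic(n, k, ss_residual):
    """BIC approximation used in these notes: N*ln(SS_residual) + k*ln(N)."""
    return n * log(ss_residual) + k * log(n)

def bayes_factor_10(bic_null, bic_alt):
    """B10 = exp(Delta BIC / 2), with Delta BIC = BIC(H0) - BIC(H1)."""
    return exp((bic_null - bic_alt) / 2)
#+end_src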

Example

(this is from HW 8, #4)

| Treatment 1 | Treatment 2 | Treatment 3 |
|-------------+-------------+-------------|
|           1 |           5 |           7 |
|           2 |           2 |           3 |
|           0 |           1 |           6 |
|           1 |           2 |           4 |

First, we model the data with a one-way ANOVA:

| Source      |    SS | df |    MS |    F |
|-------------+-------+----+-------+------|
| bet tmts    | 32.67 |  2 | 16.33 | 7.01 |
| within tmts |    21 |  9 |  2.33 |      |
| total       | 53.67 | 11 |       |      |
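As a sanity check on this table, here is a short self-contained Python sketch that recomputes the sums of squares from the raw scores (the variable names are just illustrative):

#+begin_src python
# raw scores from the data table, one list per treatment
groups = [[1, 2, 0, 1],   # Treatment 1
          [5, 2, 1, 2],   # Treatment 2
          [7, 3, 6, 4]]   # Treatment 3

scores = [x for g in groups for x in g]
N = len(scores)                       # 12 observations
grand_mean = sum(scores) / N          # 2.83

# SS_total = sum(X^2) - (sum X)^2 / N
ss_total = sum(x**2 for x in scores) - sum(scores)**2 / N                    # 53.67

# SS_between = sum over groups of n_j * (group mean - grand mean)^2
ss_between = sum(len(g) * (sum(g)/len(g) - grand_mean)**2 for g in groups)   # 32.67

# SS_within is whatever the group means leave unexplained
ss_within = ss_total - ss_between                                            # 21.00
#+end_src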

We’ll set up our two models:

Null model: $\mathcal{H}_0:μ_1=μ_2=μ_3$

  • this model has $k=1$ parameter (the data is explained by a SINGLE mean)
  • $SS_{\text{residual}} = 53.67$ (the model has only one mean, so all variance is left unexplained)

\begin{align*}
BIC &= N\ln(SS_{\text{residual}}) + k\ln(N)\\
&= 12\ln(53.67) + 1\cdot \ln(12)\\
&= 50.28
\end{align*}

Alternative model: $\mathcal{H}_1: μ_1 ≠ μ_2 ≠ μ_3$

  • this model has $k=3$ parameters (the data is explained by THREE means)
  • $SS_{\text{residual}} = 21$ (the three means account for the variance between treatments, so only $SS_{\text{within}}$ is left unexplained)

\begin{align*}
BIC &= N\ln(SS_{\text{residual}}) + k\ln(N)\\
&= 12\ln(21) + 3\cdot \ln(12)\\
&= 43.99
\end{align*}

Thus, \[ B_{10} = e^{\frac{\Delta BIC}{2}} = e^{\frac{50.28-43.99}{2}} = 23.22 \]

meaning that the data are approximately 23 times more likely under $\mathcal{H}_1$ than $\mathcal{H}_0$
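As a quick check, the same numbers fall out of a few lines of Python (a self-contained sketch of the BIC approximation above):

#+begin_src python
from math import log, exp

N = 12                            # total independent observations
ss_null, ss_alt = 53.67, 21       # SS_residual under H0 and H1
k_null, k_alt = 1, 3              # number of means in each model

bic_null = N * log(ss_null) + k_null * log(N)   # 50.28
bic_alt  = N * log(ss_alt)  + k_alt  * log(N)   # 43.99
b10 = exp((bic_null - bic_alt) / 2)             # roughly 23.2
#+end_src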

Repeated measures designs

The same ideas extend to repeated measures designs. The only difference is that we need to think carefully about:

  • the number of independent observations
  • residual $SS$

Example

Consider the following example from Exam 1, which asks about task performance as a function of computer desk layout:

| Subject | Layout 1 | Layout 2 | Layout 3 |
|---------+----------+----------+----------|
| #1      |        6 |        2 |        4 |
| #2      |        8 |        6 |        7 |
| #3      |        3 |        6 |        9 |
| #4      |        3 |        2 |        4 |

Let’s work through the ANOVA model, since it has been a while:

Step 1 - compute condition means AND subject means:

| Subject | Layout 1 | Layout 2 | Layout 3 | \(M\) |
|---------+----------+----------+----------+-------|
| #1      |        6 |        2 |        4 |     4 |
| #2      |        8 |        6 |        7 |     7 |
| #3      |        3 |        6 |        9 |     6 |
| #4      |        3 |        2 |        4 |     3 |
| \(M\)   |        5 |        4 |        6 |     5 |

Remember that once we find $SS_{\text{total}}$, we remove subject variability and partition what’s left:

\begin{align*}
SS_{\text{total}} &= \sum X^2 - \frac{(\sum X)^2}{N}\\
&= 360 - \frac{60^2}{12}\\
&= 60
\end{align*}

\begin{align*}
SS_{\text{bet subj}} &= n\sum_{i=1}^{4} \bigl(\overline{X}_{\text{subj } i} - \overline{X}\bigr)^2\\
&= 3\Bigl[(4-5)^2+(7-5)^2+(6-5)^2+(3-5)^2\Bigr]\\
&= 30
\end{align*}

\begin{align*}
SS_{\text{bet tmts}} &= n\sum_{j=1}^{3} \bigl(\overline{X}_{\text{tmt } j} - \overline{X}\bigr)^2\\
&= 4\Bigl[(5-5)^2 + (4-5)^2 + (6-5)^2\Bigr]\\
&= 8
\end{align*}

The residual is whatever is left after removing subject and treatment variability: $SS_{\text{residual}} = 60 - 30 - 8 = 22$. Thus, our ANOVA table is as follows:

| Source   | SS | df |   MS |    F |
|----------+----+----+------+------|
| bet tmts |  8 |  2 |    4 | 1.09 |
| residual | 22 |  6 | 3.67 |      |
| subject  | 30 |  3 |   10 |      |
| total    | 60 | 11 |      |      |
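The same partition can be verified numerically; here is a short self-contained Python sketch using the raw scores from the table above (the variable names are mine):

#+begin_src python
# task performance scores: one row per subject, one column per layout
data = [[6, 2, 4],
        [8, 6, 7],
        [3, 6, 9],
        [3, 2, 4]]

scores = [x for row in data for x in row]
N = len(scores)                        # 12 scores in total
grand_mean = sum(scores) / N           # 5

ss_total = sum(x**2 for x in scores) - sum(scores)**2 / N                # 60

# each subject mean is based on 3 scores; each layout mean on 4 scores
ss_subj = sum(3 * (sum(row) / 3 - grand_mean)**2 for row in data)        # 30
ss_tmts = sum(4 * (sum(col) / 4 - grand_mean)**2 for col in zip(*data))  # 8

ss_resid = ss_total - ss_subj - ss_tmts                                  # 22
#+end_src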

BIC computations

We’ll set up our two models:

Null model: $\mathcal{H}_0:α_1 = α_2 = α_3$

  • this model has $k=1$ parameter (the data is explained by a SINGLE treatment effect)
  • $SS_{\text{residual}} = 30$ (what is left after removing subject variance)
  • $N=8$ independent observations (for each of 4 subjects, there are $3-1=2$ independent observations)
  • Note: general formula: $N=s(c-1)$, where $s=$ number of subjects and $c=$ number of conditions

\begin{align*}
BIC &= N\ln(SS_{\text{residual}}) + k\ln(N)\\
&= 8\ln(30) + 1\cdot \ln(8)\\
&= 29.29
\end{align*}

Alternative model: $\mathcal{H}_1: α_1 ≠ α_2 ≠ α_3$

  • this model has $k=3$ parameters (the data is explained by THREE treatment effects)
  • $SS_{\text{residual}} = 22$ (the residual term from the ANOVA table)

\begin{align*}
BIC &= N\ln(SS_{\text{residual}}) + k\ln(N)\\
&= 8\ln(22) + 3\cdot \ln(8)\\
&= 30.97
\end{align*}

Thus, \[ B_{01} = e^{\frac{\Delta BIC}{2}} = e^{\frac{30.97-29.29}{2}} = 2.32 \]

meaning that the data are approximately 2 times more likely under $\mathcal{H}_0$ than $\mathcal{H}_1$
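The repeated-measures computation differs from the first example only in $N$ and the residual $SS$; a minimal Python sketch, again assuming the BIC approximation from these notes:

#+begin_src python
from math import log, exp

s, c = 4, 3                  # subjects, conditions
N = s * (c - 1)              # 8 independent observations

bic_null = N * log(30) + 1 * log(N)   # SS_residual = 30, k = 1 -> 29.29
bic_alt  = N * log(22) + 3 * log(N)   # SS_residual = 22, k = 3 -> 30.97

b01 = exp((bic_alt - bic_null) / 2)   # roughly 2.3, favoring H0
#+end_src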

Some lessons

The previous homework questions give us some lessons about \(p\)-values:

  1. \(p\)-values are uniformly distributed under the null. The implication is that a single \(p\)-value gives us no information about the likelihood of any model
  2. optional stopping inflates Type I error rate.
  3. $p=p(\text{data}\mid \mathcal{H}_0)$. This is NOT equal to $p(\mathcal{H}_0\mid \text{data})$

However, with some cleverness, we can actually calculate $p(\mathcal{H}_0\mid \text{data})$. All we need is Bayes' theorem:

Posterior model probabilities

Recall from Bayes theorem:

\[ \frac{p(\mathcal{H}_0\mid \text{data})}{p(\mathcal{H}_1\mid \text{data})} = B_{01}\cdot \frac{p(\mathcal{H}_0)}{p(\mathcal{H}_1)} \]

Let’s assume $p(\mathcal{H}_0)=p(\mathcal{H}_1)$ (that is, $\mathcal{H}_0$ and $\mathcal{H}_1$ are equally likely, a priori).

Then the previous equation reduces to

\[ \frac{p(\mathcal{H}_0\mid \text{data})}{p(\mathcal{H}_1\mid \text{data})} = B_{01} \]

Then we have:

\begin{align*}
p(\mathcal{H}_0\mid \text{data}) &= B_{01}\cdot p(\mathcal{H}_1\mid \text{data})\\
&= B_{01}\Bigl[1-p(\mathcal{H}_0\mid \text{data})\Bigr]\\
&= B_{01} - B_{01}\cdot p(\mathcal{H}_0\mid \text{data})
\end{align*}

Let’s solve this equation for $p(\mathcal{H}_0\mid \text{data})$:

\[ p(\mathcal{H}_0\mid \text{data}) + B_{01}\cdot p(\mathcal{H}_0\mid \text{data}) = B_{01} \]

which implies by factoring:

\[ p(\mathcal{H}_0\mid \text{data})\Bigl[1+B_{01}\Bigr] = B_{01} \]

or equivalently

\[ p(\mathcal{H}_0\mid \text{data}) = \frac{B_{01}}{1+B_{01}} \]

Note: by the same reasoning, we can prove

\[ p(\mathcal{H}_1\mid \text{data}) = \frac{B_{10}}{1+B_{10}} \]

Let’s compute these for the examples we’ve done tonight:

Example 1: $B_{10}=23.22$

This example showed that $\mathcal{H}_1$ was the better fit. Thus,

\begin{align*}
p(\mathcal{H}_1\mid \text{data}) &= \frac{B_{10}}{1+B_{10}}\\
&= \frac{23.22}{1+23.22}\\
&= 0.959
\end{align*}

Example 2: $B_{01}=2.32$

This example showed that $\mathcal{H}_0$ was the better fit. Thus,

\begin{align*}
p(\mathcal{H}_0\mid \text{data}) &= \frac{B_{01}}{1+B_{01}}\\
&= \frac{2.32}{1+2.32}\\
&= 0.699
\end{align*}
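Both conversions follow the same one-line rule; a tiny Python sketch, assuming equal prior odds as above (the name =posterior_prob= is just illustrative):

#+begin_src python
def posterior_prob(bf):
    """Posterior probability of the favored model, given equal prior odds."""
    return bf / (1 + bf)

posterior_prob(23.22)   # about 0.96 -> p(H1 | data) in Example 1
posterior_prob(2.32)    # about 0.70 -> p(H0 | data) in Example 2
#+end_src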