Skip to content

Commit

Permalink
reran vignettes
Browse files Browse the repository at this point in the history
  • Loading branch information
osorensen committed Sep 16, 2024
1 parent e8143a8 commit 36e0e56
Show file tree
Hide file tree
Showing 17 changed files with 256 additions and 250 deletions.
52 changes: 23 additions & 29 deletions vignettes/glmm_factor.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ vignette: >



```r
``` r
library(galamm)
```

Expand All @@ -28,7 +28,7 @@ As for the vignette on linear mixed models, we thank @rockwoodEstimatingComplexM
We use a simulated dataset from the PLmixed package for this example. The observations are binomial responses representing an ability measurement. The first few lines are shown below.


```r
``` r
library(PLmixed)
head(IRTsim)
#> sid school item y
Expand Down Expand Up @@ -63,14 +63,14 @@ $$
where $N(\mathbf{0}, \boldsymbol{\Psi})$ denotes a bivariate normal distribution with mean zero covariance matrix $\boldsymbol{\Psi}$. The covariance matrix is assumed to be diagonal.


```r
``` r
IRTsim$item <- factor(IRTsim$item)
```

We confirm that `item` has five levels. This means that $\boldsymbol{\lambda}$ is a vector with five elements.


```r
``` r
table(IRTsim$item)
#>
#> 1 2 3 4 5
Expand All @@ -80,7 +80,7 @@ table(IRTsim$item)
For identifiability, we fix the first element of $\boldsymbol{\lambda}$ to one. The rest will be freely estimated. We do this by defining the following matrix. Any numeric value implies that the element is fixed, whereas `NA` implies that the element is unknown, and will be estimated.


```r
``` r
(loading_matrix <- matrix(c(1, NA, NA, NA, NA), ncol = 1))
#> [,1]
#> [1,] 1
Expand All @@ -93,7 +93,7 @@ For identifiability, we fix the first element of $\boldsymbol{\lambda}$ to one.
We fit the model as follows:


```r
``` r
mod <- galamm(
formula = y ~ item + (0 + ability | school / sid),
data = IRTsim,
Expand All @@ -109,7 +109,7 @@ A couple of things in the model formula are worth pointing out. First the part `
We can start by inspecting the fitted model:


```r
``` r
summary(mod)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ item + (0 + ability | school/sid)
Expand Down Expand Up @@ -148,7 +148,7 @@ summary(mod)
The `fixef` method lets us consider the fixed effects:


```r
``` r
fixef(mod)
#> (Intercept) item2 item3 item4 item5
#> 0.5112006 0.3255941 -0.4491070 0.4929827 0.4584897
Expand All @@ -157,7 +157,7 @@ fixef(mod)
We can also get Wald type confidence intervals for the fixed effects.


```r
``` r
confint(mod, parm = "beta")
#> 2.5 % 97.5 %
#> (Intercept) -0.001727334 1.0241286
Expand All @@ -170,7 +170,7 @@ confint(mod, parm = "beta")
We can similarly extract the factor loadings.


```r
``` r
factor_loadings(mod)
#> ability SE
#> lambda1 1.0000000 NA
Expand All @@ -183,7 +183,7 @@ factor_loadings(mod)
And we can find confidence intervals for them. Currently, only Wald type confidence intervals are available. Be aware that such intervals may have poor coverage properties.


```r
``` r
confint(mod, parm = "lambda")
#> 2.5 % 97.5 %
#> lambda1 0.4516974 1.0223534
Expand All @@ -195,7 +195,7 @@ confint(mod, parm = "lambda")
We can also show a diagnostic plot, although for a binomial model like this it is not very informative.


```r
``` r
plot(mod)
```

Expand All @@ -207,7 +207,7 @@ plot(mod)
We now show how the model studied above can be extended to handle binomially distributed data with multiple trials. We simulate such data by computing predictions from the model fitted above, and drawing binomial samples with multiple trials.


```r
``` r
set.seed(1234)
dat <- IRTsim
dat$trials <- sample(1:10, nrow(dat), replace = TRUE)
Expand All @@ -228,7 +228,7 @@ head(dat)
For binomial models with more than one trial, the response should be specified `cbind(successes, failures)`.


```r
``` r
galamm_mod_trials <- galamm(
formula = cbind(y, trials - y) ~ item + (0 + ability | school / sid),
data = dat,
Expand All @@ -242,7 +242,7 @@ galamm_mod_trials <- galamm(
All the utility functions apply to this model as well. We simply post its summary output here.


```r
``` r
summary(galamm_mod_trials)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: cbind(y, trials - y) ~ item + (0 + ability | school/sid)
Expand Down Expand Up @@ -284,7 +284,7 @@ summary(galamm_mod_trials)
To illustrate the model for counts, we consider an example from Chapter 11.3 in @skrondalGeneralizedLatentVariable2004. The model does not contain factor loadings, but we use it to demonstrate how to fit GLMMs with Poisson distributed responses.


```r
``` r
count_mod <- galamm(
formula = y ~ lbas * treat + lage + v4 + (1 | subj),
data = epilep,
Expand All @@ -295,7 +295,7 @@ count_mod <- galamm(
We can look at the summary output.


```r
``` r
summary(count_mod)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ lbas * treat + lage + v4 + (1 | subj)
Expand Down Expand Up @@ -327,14 +327,14 @@ summary(count_mod)
In this case there are no factor loadings to return:


```r
``` r
factor_loadings(count_mod)
```

We can again look at a diagnostic plot, which in this case looks much more reasonable.


```r
``` r
plot(count_mod)
```

Expand All @@ -344,14 +344,8 @@ plot(count_mod)
In this case we can confirm that the `galamm` function is correctly implemented by comparing it to the output of `lme4::glmer`. For a model like this, it would also be best to use `lme4`, but with factor structures or other nonlinearities, `lme4` no longer provides the flexibility we need.


```r
``` r
library(lme4)
#> Loading required package: Matrix
#>
#> Attaching package: 'lme4'
#> The following object is masked from 'package:galamm':
#>
#> llikAIC
count_mod_lme4 <- glmer(
formula = y ~ lbas * treat + lage + v4 + (1 | subj),
data = epilep,
Expand All @@ -362,7 +356,7 @@ count_mod_lme4 <- glmer(
We can confirm that the diagnostic plot has the same values as for galamm:


```r
``` r
plot(count_mod_lme4)
```

Expand All @@ -371,7 +365,7 @@ plot(count_mod_lme4)
And we can do the same for the summary output.


```r
``` r
summary(count_mod_lme4)
#> Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
#> Family: poisson ( log )
Expand Down Expand Up @@ -414,7 +408,7 @@ summary(count_mod_lme4)
You might note that the deviance in the summary output of the model fitted by lme4 is different from the deviance of the model fitted by galamm. This is because in the summary output, lme4 shows the deviance as minus two times the log likelihood. In contrast, calling the deviance function on a model object fitted by glmer gives the same output as galamm.


```r
``` r
deviance(count_mod_lme4)
#> [1] 407.0092
```
Expand Down
48 changes: 24 additions & 24 deletions vignettes/latent_observed_interaction.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ vignette: >



```r
``` r
library(galamm)
```

Expand All @@ -26,7 +26,7 @@ This vignette describes how `galamm` can be used to model interactions between l
For this example we use the simulated `latent_covariates` dataset, of which the first six rows are displayed below:


```r
``` r
head(latent_covariates)
#> id type x y response
#> 1 1 measurement1 0.2655087 -0.530999307 0
Expand Down Expand Up @@ -79,7 +79,7 @@ The structural model is simply $\eta = \zeta \sim N(0, \psi)$, where $\psi$ is i
It can be instructive to start by considering a model in which we fix $\\lambda_{4} = 0$. This type of model would be estimated with the following code:


```r
``` r
lambda <- matrix(c(1, NA, NA), ncol = 1)

mod0 <- galamm(
Expand All @@ -94,7 +94,7 @@ mod0 <- galamm(
In the data generating simulations, the true values were $\lambda_{1}=1$, $\lambda_{2} = 1.3$ and $\lambda_{3} = -0.3$. The former two are very well recovered, but the latter is too positive, which is likely due to us omitting the interaction $\lambda_{4}$, whose true value was 0.2.


```r
``` r
summary(mod0)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ type + x:response + (0 + loading | id)
Expand Down Expand Up @@ -168,14 +168,14 @@ $$
We specify the factor interactions with a list, one for each row of `lambda`:


```r
``` r
factor_interactions <- list(~ 1, ~ 1, ~ x)
```

This specifies that for the first two rows, there are no covariates, but for the third row, we want a linear regression with $x$ as covariate. Next, we specify the loading matrix **without** the interaction parameter, i.e., we reuse the `lambda` object that was specified for `mod0` above. This lets us fit the model as follows:


```r
``` r
mod <- galamm(
formula = y ~ type + x:response + (0 + loading | id),
data = latent_covariates,
Expand All @@ -189,7 +189,7 @@ mod <- galamm(
A model comparison shows overwhelming evidence in favor of this model, which is not surprising since this is how the data were simulated.


```r
``` r
anova(mod, mod0)
#> Data: latent_covariates
#> Models:
Expand All @@ -205,7 +205,7 @@ anova(mod, mod0)
The summary also shows that the bias in $\lambda_{3}$ has basically disappeared, as it is up to -0.318 from -0.195, with the true value being -0.3. The interaction is estimated at 0.233, which is also very close to the true value 0.2. It should of course be noted here that the noise level in this simulated dataset was set unrealistically low, to let us confirm that the implementation itself is correct.


```r
``` r
summary(mod)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ type + x:response + (0 + loading | id)
Expand Down Expand Up @@ -244,14 +244,14 @@ summary(mod)
We can also try to add interactions between the $x^{2}$ and $\eta$. We first update the formula in `factor_interactions`:


```r
``` r
factor_interactions <- list(~ 1, ~ 1, ~ x + I(x^2))
```

Then we fit the model as before:


```r
``` r
mod2 <- galamm(
formula = y ~ type + x:response + (0 + loading | id),
data = latent_covariates,
Expand All @@ -265,7 +265,7 @@ mod2 <- galamm(
As can be seen, the coefficient for this squared interaction is not significantly different from zero.


```r
``` r
summary(mod2)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ type + x:response + (0 + loading | id)
Expand Down Expand Up @@ -305,7 +305,7 @@ summary(mod2)
It is also straightforward to include additional random effects in models containing interactions between latent and observed covariates. The dataset `latent_covariates_long` is similar to `latent_covariates` that was used above, but it has six repeated measurements of the response for each subject. The first ten rows of the dataset are shown below.


```r
``` r
head(latent_covariates_long, 10)
#> id type x y response
#> 1 1 measurement1 0.2655087 -0.530999307 0
Expand All @@ -323,14 +323,14 @@ head(latent_covariates_long, 10)
For these data we add a random intercept for the response terms, in addition to the terms that were used above. We start by resetting the interaction models to a linear term:


```r
``` r
factor_interactions <- list(~ 1, ~ 1, ~ x)
```

Next we fit the model using `galamm`. The difference to notice here is that we added `(0 + response | id)` to the formula. This implies that for observations that are responses, for which `response = 1`, there should be a random intercept per subject.


```r
``` r
mod <- galamm(
formula = y ~ type + x:response + (0 + loading | id) + (0 + response | id),
data = latent_covariates_long,
Expand All @@ -344,7 +344,7 @@ mod <- galamm(
From the summary, we see that also in this case the factor loadings are very well recovered.


```r
``` r
summary(mod)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ type + x:response + (0 + loading | id) + (0 + response | id)
Expand Down Expand Up @@ -386,7 +386,7 @@ summary(mod)
We can also include smooth terms in models containing interactions between latent and observed variables. In this example, we replace the linear term `x:response` with a smooth term `s(x, by = response)`. Since this smooth term also includes the main effect of `response`, which corresponds to an intercept for the response observations, we must remove the `type` term and instead insert two dummy variables, one for each measurement. We first create these dummy variables:


```r
``` r
dat <- latent_covariates
dat$m1 <- as.numeric(dat$type == "measurement1")
dat$m2 <- as.numeric(dat$type == "measurement2")
Expand All @@ -395,7 +395,7 @@ dat$m2 <- as.numeric(dat$type == "measurement2")
We then fit the model:


```r
``` r
mod <- galamm(
formula = y ~ 0 + m1 + m2 + s(x, by = response) + (0 + loading | id),
data = dat,
Expand All @@ -409,7 +409,7 @@ mod <- galamm(
The summary output again suggest that the factor loadings are very well recovered.


```r
``` r
summary(mod)
#> GALAMM fit by maximum marginal likelihood.
#> Formula: y ~ 0 + m1 + m2 + s(x, by = response) + (0 + loading | id)
Expand Down Expand Up @@ -437,11 +437,11 @@ summary(mod)
#> Number of obs: 600, groups: id, 200; Xr, 8
#>
#> Fixed effects:
#> Estimate Std. Error t value Pr(>|t|)
#> m1 -0.01059 0.070477 -0.1503 8.805e-01
#> m2 -0.01276 0.091638 -0.1393 8.892e-01
#> s(x):responseFx1 -0.12412 0.008856 -14.0155 1.252e-44
#> s(x):responseFx2 -0.26284 0.015809 -16.6255 4.558e-62
#> Estimate Std. Error t value Pr(>|t|)
#> m1 -0.01059 0.070477 -0.1503 8.805e-01
#> m2 -0.01276 0.091638 -0.1393 8.892e-01
#> s(x):responseFx1 0.12412 0.008856 14.0155 1.252e-44
#> s(x):responseFx2 0.26284 0.015809 16.6255 4.558e-62
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
Expand All @@ -451,7 +451,7 @@ summary(mod)
We can also plot the smooth term, which is linear. That is, in this case the smooth term was not necessary. Not that we can also see this from the zero variance estimate of the random effect named `s(x):response` in the summary above, which mean that the smoothing parameter for this term is infinite, and hence that the smooth term is exactly linear.


```r
``` r
plot_smooth(mod)
```

Expand Down
Loading

0 comments on commit 36e0e56

Please sign in to comment.