reran vignettes

LCBC-UiO · Sep 16, 2024 · 36e0e56 · 36e0e56
1 parent e8143a8
commit 36e0e56
Show file tree

Hide file tree

Showing 17 changed files with 256 additions and 250 deletions.
diff --git a/vignettes/glmm_factor.Rmd b/vignettes/glmm_factor.Rmd
@@ -15,7 +15,7 @@ vignette: >
 
 
 
-```r
+``` r
 library(galamm)
 ```
 
@@ -28,7 +28,7 @@ As for the vignette on linear mixed models, we thank @rockwoodEstimatingComplexM
 We use a simulated dataset from the PLmixed package for this example. The observations are binomial responses representing an ability measurement. The first few lines are shown below.
 
 
-```r
+``` r
 library(PLmixed)
 head(IRTsim)
 #>     sid school item y
@@ -63,14 +63,14 @@ $$
 where $N(\mathbf{0}, \boldsymbol{\Psi})$ denotes a bivariate normal distribution with mean zero covariance matrix $\boldsymbol{\Psi}$. The covariance matrix is assumed to be diagonal.
 
 
-```r
+``` r
 IRTsim$item <- factor(IRTsim$item)
 ```
 
 We confirm that `item` has five levels. This means that $\boldsymbol{\lambda}$ is a vector with five elements.
 
 
-```r
+``` r
 table(IRTsim$item)
 #> 
 #>   1   2   3   4   5 
@@ -80,7 +80,7 @@ table(IRTsim$item)
 For identifiability, we fix the first element of $\boldsymbol{\lambda}$ to one. The rest will be freely estimated. We do this by defining the following matrix. Any numeric value implies that the element is fixed, whereas `NA` implies that the element is unknown, and will be estimated.
 
 
-```r
+``` r
 (loading_matrix <- matrix(c(1, NA, NA, NA, NA), ncol = 1))
 #>      [,1]
 #> [1,]    1
@@ -93,7 +93,7 @@ For identifiability, we fix the first element of $\boldsymbol{\lambda}$ to one.
 We fit the model as follows:
 
 
-```r
+``` r
 mod <- galamm(
   formula = y ~ item + (0 + ability | school / sid),
   data = IRTsim,
@@ -109,7 +109,7 @@ A couple of things in the model formula are worth pointing out. First the part `
 We can start by inspecting the fitted model:
 
 
-```r
+``` r
 summary(mod)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ item + (0 + ability | school/sid)
@@ -148,7 +148,7 @@ summary(mod)
 The `fixef` method lets us consider the fixed effects:
 
 
-```r
+``` r
 fixef(mod)
 #> (Intercept)       item2       item3       item4       item5 
 #>   0.5112006   0.3255941  -0.4491070   0.4929827   0.4584897
@@ -157,7 +157,7 @@ fixef(mod)
 We can also get Wald type confidence intervals for the fixed effects.
 
 
-```r
+``` r
 confint(mod, parm = "beta")
 #>                    2.5 %     97.5 %
 #> (Intercept) -0.001727334  1.0241286
@@ -170,7 +170,7 @@ confint(mod, parm = "beta")
 We can similarly extract the factor loadings.
 
 
-```r
+``` r
 factor_loadings(mod)
 #>           ability        SE
 #> lambda1 1.0000000        NA
@@ -183,7 +183,7 @@ factor_loadings(mod)
 And we can find confidence intervals for them. Currently, only Wald type confidence intervals are available. Be aware that such intervals may have poor coverage properties.
 
 
-```r
+``` r
 confint(mod, parm = "lambda")
 #>             2.5 %    97.5 %
 #> lambda1 0.4516974 1.0223534
@@ -195,7 +195,7 @@ confint(mod, parm = "lambda")
 We can also show a diagnostic plot, although for a binomial model like this it is not very informative.
 
 
-```r
+``` r
 plot(mod)
 ```
 
@@ -207,7 +207,7 @@ plot(mod)
 We now show how the model studied above can be extended to handle binomially distributed data with multiple trials. We simulate such data by computing predictions from the model fitted above, and drawing binomial samples with multiple trials.
 
 
-```r
+``` r
 set.seed(1234)
 dat <- IRTsim
 dat$trials <- sample(1:10, nrow(dat), replace = TRUE)
@@ -228,7 +228,7 @@ head(dat)
 For binomial models with more than one trial, the response should be specified `cbind(successes, failures)`.
 
 
-```r
+``` r
 galamm_mod_trials <- galamm(
   formula = cbind(y, trials - y) ~ item + (0 + ability | school / sid),
   data = dat,
@@ -242,7 +242,7 @@ galamm_mod_trials <- galamm(
 All the utility functions apply to this model as well. We simply post its summary output here.
 
 
-```r
+``` r
 summary(galamm_mod_trials)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: cbind(y, trials - y) ~ item + (0 + ability | school/sid)
@@ -284,7 +284,7 @@ summary(galamm_mod_trials)
 To illustrate the model for counts, we consider an example from Chapter 11.3 in @skrondalGeneralizedLatentVariable2004. The model does not contain factor loadings, but we use it to demonstrate how to fit GLMMs with Poisson distributed responses.
 
 
-```r
+``` r
 count_mod <- galamm(
   formula = y ~ lbas * treat + lage + v4 + (1 | subj),
   data = epilep,
@@ -295,7 +295,7 @@ count_mod <- galamm(
 We can look at the summary output.
 
 
-```r
+``` r
 summary(count_mod)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ lbas * treat + lage + v4 + (1 | subj)
@@ -327,14 +327,14 @@ summary(count_mod)
 In this case there are no factor loadings to return:
 
 
-```r
+``` r
 factor_loadings(count_mod)
 ```
 
 We can again look at a diagnostic plot, which in this case looks much more reasonable.
 
 
-```r
+``` r
 plot(count_mod)
 ```
 
@@ -344,14 +344,8 @@ plot(count_mod)
 In this case we can confirm that the `galamm` function is correctly implemented by comparing it to the output of `lme4::glmer`. For a model like this, it would also be best to use `lme4`, but with factor structures or other nonlinearities, `lme4` no longer provides the flexibility we need.
 
 
-```r
+``` r
 library(lme4)
-#> Loading required package: Matrix
-#> 
-#> Attaching package: 'lme4'
-#> The following object is masked from 'package:galamm':
-#> 
-#>     llikAIC
 count_mod_lme4 <- glmer(
   formula = y ~ lbas * treat + lage + v4 + (1 | subj),
   data = epilep,
@@ -362,7 +356,7 @@ count_mod_lme4 <- glmer(
 We can confirm that the diagnostic plot has the same values as for galamm:
 
 
-```r
+``` r
 plot(count_mod_lme4)
 ```
 
@@ -371,7 +365,7 @@ plot(count_mod_lme4)
 And we can do the same for the summary output.
 
 
-```r
+``` r
 summary(count_mod_lme4)
 #> Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 #>  Family: poisson  ( log )
@@ -414,7 +408,7 @@ summary(count_mod_lme4)
 You might note that the deviance in the summary output of the model fitted by lme4 is different from the deviance of the model fitted by galamm. This is because in the summary output, lme4 shows the deviance as minus two times the log likelihood. In contrast, calling the deviance function on a model object fitted by glmer gives the same output as galamm.
 
 
-```r
+``` r
 deviance(count_mod_lme4)
 #> [1] 407.0092
 ```

diff --git a/vignettes/latent_observed_interaction.Rmd b/vignettes/latent_observed_interaction.Rmd
@@ -15,7 +15,7 @@ vignette: >
 
 
 
-```r
+``` r
 library(galamm)
 ```
 
@@ -26,7 +26,7 @@ This vignette describes how `galamm` can be used to model interactions between l
 For this example we use the simulated `latent_covariates` dataset, of which the first six rows are displayed below:
 
 
-```r
+``` r
 head(latent_covariates)
 #>   id         type         x            y response
 #> 1  1 measurement1 0.2655087 -0.530999307        0
@@ -79,7 +79,7 @@ The structural model is simply $\eta = \zeta \sim N(0, \psi)$, where $\psi$ is i
 It can be instructive to start by considering a model in which we fix $\\lambda_{4} = 0$. This type of model would be estimated with the following code:
 
 
-```r
+``` r
 lambda <- matrix(c(1, NA, NA), ncol = 1)
 
 mod0 <- galamm(
@@ -94,7 +94,7 @@ mod0 <- galamm(
 In the data generating simulations, the true values were $\lambda_{1}=1$, $\lambda_{2} = 1.3$ and $\lambda_{3} = -0.3$. The former two are very well recovered, but the latter is too positive, which is likely due to us omitting the interaction $\lambda_{4}$, whose true value was 0.2.
 
 
-```r
+``` r
 summary(mod0)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ type + x:response + (0 + loading | id)
@@ -168,14 +168,14 @@ $$
 We specify the factor interactions with a list, one for each row of `lambda`:
 
 
-```r
+``` r
 factor_interactions <- list(~ 1, ~ 1, ~ x)
 ```
 
 This specifies that for the first two rows, there are no covariates, but for the third row, we want a linear regression with $x$ as covariate. Next, we specify the loading matrix **without** the interaction parameter, i.e., we reuse the `lambda` object that was specified for `mod0` above. This lets us fit the model as follows:
 
 
-```r
+``` r
 mod <- galamm(
   formula = y ~ type + x:response + (0 + loading | id),
   data = latent_covariates,
@@ -189,7 +189,7 @@ mod <- galamm(
 A model comparison shows overwhelming evidence in favor of this model, which is not surprising since this is how the data were simulated.
 
 
-```r
+``` r
 anova(mod, mod0)
 #> Data: latent_covariates
 #> Models:
@@ -205,7 +205,7 @@ anova(mod, mod0)
 The summary also shows that the bias in $\lambda_{3}$ has basically disappeared, as it is up to -0.318 from -0.195, with the true value being -0.3. The interaction is estimated at 0.233, which is also very close to the true value 0.2. It should of course be noted here that the noise level in this simulated dataset was set unrealistically low, to let us confirm that the implementation itself is correct.
 
 
-```r
+``` r
 summary(mod)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ type + x:response + (0 + loading | id)
@@ -244,14 +244,14 @@ summary(mod)
 We can also try to add interactions between the $x^{2}$ and $\eta$. We first update the formula in `factor_interactions`:
 
 
-```r
+``` r
 factor_interactions <- list(~ 1, ~ 1, ~ x + I(x^2))
 ```
 
 Then we fit the model as before:
 
 
-```r
+``` r
 mod2 <- galamm(
   formula = y ~ type + x:response + (0 + loading | id),
   data = latent_covariates,
@@ -265,7 +265,7 @@ mod2 <- galamm(
 As can be seen, the coefficient for this squared interaction is not significantly different from zero.
 
 
-```r
+``` r
 summary(mod2)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ type + x:response + (0 + loading | id)
@@ -305,7 +305,7 @@ summary(mod2)
 It is also straightforward to include additional random effects in models containing interactions between latent and observed covariates. The dataset `latent_covariates_long` is similar to `latent_covariates` that was used above, but it has six repeated measurements of the response for each subject. The first ten rows of the dataset are shown below.
 
 
-```r
+``` r
 head(latent_covariates_long, 10)
 #>    id         type         x            y response
 #> 1   1 measurement1 0.2655087 -0.530999307        0
@@ -323,14 +323,14 @@ head(latent_covariates_long, 10)
 For these data we add a random intercept for the response terms, in addition to the terms that were used above. We start by resetting the interaction models to a linear term:
 
 
-```r
+``` r
 factor_interactions <- list(~ 1, ~ 1, ~ x)
 ```
 
 Next we fit the model using `galamm`. The difference to notice here is that we added `(0 + response | id)` to the formula. This implies that for observations that are responses, for which `response = 1`, there should be a random intercept per subject.
 
 
-```r
+``` r
 mod <- galamm(
   formula = y ~ type + x:response + (0 + loading | id) + (0 + response | id),
   data = latent_covariates_long,
@@ -344,7 +344,7 @@ mod <- galamm(
 From the summary, we see that also in this case the factor loadings are very well recovered.
 
 
-```r
+``` r
 summary(mod)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ type + x:response + (0 + loading | id) + (0 + response |      id)
@@ -386,7 +386,7 @@ summary(mod)
 We can also include smooth terms in models containing interactions between latent and observed variables. In this example, we replace the linear term `x:response` with a smooth term `s(x, by = response)`. Since this smooth term also includes the main effect of `response`, which corresponds to an intercept for the response observations, we must remove the `type` term and instead insert two dummy variables, one for each measurement. We first create these dummy variables:
 
 
-```r
+``` r
 dat <- latent_covariates
 dat$m1 <- as.numeric(dat$type == "measurement1")
 dat$m2 <- as.numeric(dat$type == "measurement2")
@@ -395,7 +395,7 @@ dat$m2 <- as.numeric(dat$type == "measurement2")
 We then fit the model:
 
 
-```r
+``` r
 mod <- galamm(
   formula = y ~ 0 + m1 + m2 + s(x, by = response) + (0 + loading | id),
   data = dat,
@@ -409,7 +409,7 @@ mod <- galamm(
 The summary output again suggest that the factor loadings are very well recovered.
 
 
-```r
+``` r
 summary(mod)
 #> GALAMM fit by maximum marginal likelihood.
 #> Formula: y ~ 0 + m1 + m2 + s(x, by = response) + (0 + loading | id)
@@ -437,11 +437,11 @@ summary(mod)
 #> Number of obs: 600, groups:  id, 200; Xr, 8
 #> 
 #> Fixed effects:
-#>                  Estimate Std. Error  t value  Pr(>|t|)
-#> m1               -0.01059   0.070477  -0.1503 8.805e-01
-#> m2               -0.01276   0.091638  -0.1393 8.892e-01
-#> s(x):responseFx1 -0.12412   0.008856 -14.0155 1.252e-44
-#> s(x):responseFx2 -0.26284   0.015809 -16.6255 4.558e-62
+#>                  Estimate Std. Error t value  Pr(>|t|)
+#> m1               -0.01059   0.070477 -0.1503 8.805e-01
+#> m2               -0.01276   0.091638 -0.1393 8.892e-01
+#> s(x):responseFx1  0.12412   0.008856 14.0155 1.252e-44
+#> s(x):responseFx2  0.26284   0.015809 16.6255 4.558e-62
 #> 
 #> Approximate significance of smooth terms:
 #>               edf Ref.df     F p-value
@@ -451,7 +451,7 @@ summary(mod)
 We can also plot the smooth term, which is linear. That is, in this case the smooth term was not necessary. Not that we can also see this from the zero variance estimate of the random effect named `s(x):response` in the summary above, which mean that the smoothing parameter for this term is infinite, and hence that the smooth term is exactly linear.
 
 
-```r
+``` r
 plot_smooth(mod)
 ```