-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
174 lines (120 loc) · 9 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
---
output:
github_document:
fig_width: 6
fig_height: 4
bibliography: inst/REFERENCES.bib
link-citations: yes
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# Generalized Additive Latent and Mixed Models <a href="https://lcbc-uio.github.io/galamm/"><img src="man/figures/logo.png" align="right" height="139" alt="galamm website" /></a>
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/galamm)](https://cran.r-project.org/package=galamm)
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/615_status.svg)](https://github.com/ropensci/software-review/issues/615)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![R-CMD-check](https://github.com/LCBC-UiO/galamm/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/LCBC-UiO/galamm/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/LCBC-UiO/galamm/branch/main/graph/badge.svg)](https://app.codecov.io/gh/LCBC-UiO/galamm?branch=main)
<!-- badges: end -->
```{r srr-tags1, eval = FALSE, echo = FALSE}
#' @srrstats {G1.0} Primary and secondary references to literature in the paragraph below.
#' @srrstats {G1.1} It is stated in the paragraph below that this is the first implementation of the algorithm developed in Sørensen, Fjell, and Walhovd (2023).
#' @srrstats {G1.3} Statistical terminology defined in detail in the references in the paragraph below. The main paper, Sørensen, FJell, and Walhovd (2023) is open access.
```
galamm estimates generalized additive latent and mixed models (GALAMMs). This is the first package implementing the model framework and the computational algorithms introduced in @sorensenLongitudinalModelingAgeDependent2023. It is an extension of the GLLAMM framework for multilevel latent variable modeling detailed in @rabe-heskethGeneralizedMultilevelStructural2004 and @skrondalGeneralizedLatentVariable2004, in particular by efficiently handling crossed random effects and semiparametric estimation.
## What Can the Package Do?
Many applications, particularly in the social sciences, require modeling capabilities beyond what is easily supported and computationally feasible with popular R packages like [mgcv](https://cran.r-project.org/package=mgcv) [@woodGeneralizedAdditiveModels2017], [lavaan](https://lavaan.ugent.be/) [@rosseelLavaanPackageStructural2012], [lme4](https://cran.r-project.org/package=lme4) [@batesFittingLinearMixedEffects2015], and [OpenMx](https://openmx.ssri.psu.edu/) [@nealeOpenMxExtendedStructural2016], as well as the Stata based [GLLAMM](http://www.gllamm.org/) software [@rabe-heskethGeneralizedMultilevelStructural2004;@rabe-heskethMaximumLikelihoodEstimation2005]. In particular, to maximally utilize large datasets available today, it is typically necessary to combine tools from latent variable modeling, hierarchical modeling, and semiparametric estimation. While this is possible with Bayesian hierarchical models and tools like [Stan](https://mc-stan.org/), it requires considerable expertise and may be beyond scope for a single data analysis project.
The goal of galamm is to enable estimation of models with an arbitrary number of grouping levels, both crossed and hierarchical, and any combination of the following features (click the links to go to the relevant vignette):
- [Linear mixed models with factor structures](https://lcbc-uio.github.io/galamm/articles/lmm_factor.html).
- [Generalized linear mixed models with factor structures](https://lcbc-uio.github.io/galamm/articles/glmm_factor.html).
- [Linear mixed models with heteroscedastic residuals](https://lcbc-uio.github.io/galamm/articles/lmm_heteroscedastic.html).
- [Mixed models with mixed response types](https://lcbc-uio.github.io/galamm/articles/mixed_response.html).
- [Generalized additive mixed models with factor structures](https://lcbc-uio.github.io/galamm/articles/semiparametric.html).
- [Interactions between latent and observed covariates](https://lcbc-uio.github.io/galamm/articles/latent_observed_interaction.html).
```{r srr-tags2, eval = FALSE, echo = FALSE}
#' @srrstats {G1.6} Code for comparing the performance of galamm to the of PLmixed for example models is provided in the vignette on linear mixed models with factor structures.
```
Random effects are defined using [lme4](https://cran.r-project.org/package=lme4) syntax, and the syntax for factor structures are close to that of [PLmixed](https://cran.r-project.org/package=PLmixed) [@rockwoodEstimatingComplexMeasurement2019]. However, for the types of models supported by both PLmixed and galamm, galamm is usually considerably faster. Smooth terms, as in generalized additive mixed models, use the same syntax as [mgcv](https://cran.r-project.org/package=mgcv).
For most users, it should not be necessary to think about how the actual computations are performed, although they are detailed in the [optimization vignette](https://lcbc-uio.github.io/galamm/articles/optimization.html). In short, the core computations are done using sparse matrix methods supported by [RcppEigen](https://cran.r-project.org/package=RcppEigen) [@batesFastElegantNumerical2013] and automatic differentiation using the C++ library [autodiff](https://autodiff.github.io/) [@lealAutodiffModernFast2018]. Scaling of the algorithm is investigated further in [the vignette on computational scaling](https://lcbc-uio.github.io/galamm/articles/scaling.html).
## Where Do I Start?
```{r srr-tags3, eval = FALSE, echo = FALSE}
#' @srrstats {G1.3} Statistical terminology defined in the introductory vignette.
```
To get started, take a look at the [introductory vignette](https://lcbc-uio.github.io/galamm/articles/galamm.html).
## Installation
Install the package from CRAN using
``` r
install.packages("galamm")
```
You can install the development version of galamm from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("LCBC-UiO/galamm")
```
## Examples
```{r}
library(galamm)
```
### Mixed Response Model
The dataframe `mresp` contains simulated data with mixed response types.
```{r}
head(mresp)
```
Responses in rows with `itemgroup = "a"` are normally distributed while those in rows with `itemgroup = "b"` are binomially distributed. For a given subject, identified by the `id` variable, both responses are associated with the same underlying latent variable. We hence need to model this process jointly, and the model is set up as follows:
```{r, message=FALSE}
mixed_resp <- galamm(
formula = y ~ x + (0 + loading | id),
data = mresp,
family = c(gaussian, binomial),
family_mapping = ifelse(mresp$itemgroup == "a", 1L, 2L),
load.var = "itemgroup",
lambda = matrix(c(1, NA), ncol = 1),
factor = "loading"
)
```
The summary function gives some information about the model fit.
```{r}
summary(mixed_resp)
```
### Generalized Additive Mixed Model with Factor Structures
The dataframe `cognition` contains simulated for which latent ability in three cognitive domains is measured across time. We focus on the first cognitive domain, and estimate a smooth trajectory for how the latent ability depends on time.
We start by reducing the data.
```{r}
dat <- subset(cognition, domain == 1)
dat$item <- factor(dat$item)
```
Next we define the matrix of factor loadings, where `NA` denotes unknown values to be estimated.
```{r}
loading_matrix <- matrix(c(1, NA, NA), ncol = 1)
```
We then compute the model estimates, containing both a smooth term for the latent ability and random intercept for subject and timepoints.
```{r}
mod <- galamm(
formula = y ~ 0 + item + sl(x, factor = "loading") +
(0 + loading | id / timepoint),
data = dat,
load.var = "item",
lambda = loading_matrix,
factor = "loading"
)
```
We finally plot the estimated smooth term.
```{r}
plot_smooth(mod)
```
## How to cite this package
```{r}
citation("galamm")
```
## Acknowledgement
Some parts of the code base for galamm has been derived from internal functions of the R packages, [gamm4](https://cran.r-project.org/package=gamm4) (authors: Simon Wood and Fabian Scheipl), [lme4](https://cran.r-project.org/package=lme4) (authors: Douglas Bates, Martin Maechler, Ben Bolker, and Steven Walker), and [mgcv](https://cran.r-project.org/package=mgcv) (author: Simon Wood), as well the C++ library [autodiff](https://autodiff.github.io/) (author: Allan Leal). In accordance with the [CRAN Repository Policy](https://cran.r-project.org/web/packages/policies.html), all these authors are listed as contributors in the `DESCRIPTION` file. If you are among these authors, and don't want to be listed as a contributor to this package, please let me know, and I will remove you.
## Contributing
Contributions are very welcome, see [CONTRIBUTING.md](https://github.com/LCBC-UiO/galamm/blob/main/.github/CONTRIBUTING.md) for general guidelines.
## References