Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add conditional_bernoulli distribution #27

Open
goldingn opened this issue Nov 2, 2017 · 5 comments
Open

add conditional_bernoulli distribution #27

goldingn opened this issue Nov 2, 2017 · 5 comments

Comments

@goldingn
Copy link
Member

goldingn commented Nov 2, 2017

Add a distribution node type for this compound distribution, where the observed variable y can only take value 1 if a latent bernoulli variable z takes value 1, else it must take 0, i.e.:

y_i ~ bernoulli(z * p_i)
z ~ bernoulli(\psi)

p_i = p(y_i = 1 | z = 1)
\psi = p(z = 1)

Inference on p and psi is tractable when there are multiple bernoulli trials in each observation (where p take different values in each trial). The density of this compound distribution (i.e. the likelihood for y) can be calculated directly, explicitly integrating over the latent variable z.

This formulation underpins the ecological imperfect-detection model of MacKenzie et al. which gives the formulation of the likelihood/density.

For each observation in the imperfect detection model, y and p are vectors giving indicating whether a species was detected at each visit, and the probability of detection (which may vary between visits), and z and psi are scalars indicating whether the species was present (assumed the same at all visits) and the probability of being present.

The syntax would be:

distribution(y) <- conditional_bernoulli(p, psi)

(since the distribution is discrete, y can't be a variable), where y and p are vectors (or matrices) and psi is a scalar (or vector).

@goldingn
Copy link
Member Author

goldingn commented Nov 2, 2017

there are extensions to this model, for example introducing a parameter q_i = p(y_i = 1 | z_i = 0), corresponding to erroneous detections in the imperfect detection case. These could be added via additional, optional parameters

@jdyen
Copy link

jdyen commented Oct 5, 2018

could this be setup to take more flexible inputs, e.g., for a model with constant p or constant psi across multiple individuals?
z_i = bernoulli(\psi) or z_i = bernoulli(\psi_i)
with
y_ij = bernoulli(\psi * p) or y_ij = bernoulli(\psi * p_i) or y_ij = bernoulli(\psi * p_j)

So \psi could have dims (1 x 1) or (n x 1), and p could have dims (1 x 1), (n x 1), (1 x k), or (n x k), assuming y has dims (n x k).

My current way of handling this is:

# with a common psi for all individuals
psi <- beta(1, 1, dim = 1)
psi_expanded <- do.call('c', lapply(1:n, function(x) psi))

and

# with a single p for each individual
p <- beta(1, 1, dim = c(n, 1))
p_expanded <- do.call('cbind', lapply(1:k, function(x) p))

@hrlai
Copy link
Contributor

hrlai commented Mar 28, 2021

Dear @goldingn and @jdyen , thank you pioneering these models in greta. I am exploring how to incorporate imperfect detection into abundance joint species distribution modelling after reading Tobler et al. (2019), which then led me to Yamaura et al. (2012).

However, I ran into an error while trying to implement Yamaura et al. (2012)'s model in greta:

Error: model contains a discrete random variable that doesn't have a fixed value, so cannot be sampled from

I think it is related to this thread, because I cannot use poisson for unobserved variable? This is also hinted in the helpfile. I hope to get some thoughts on whether I'm coding it incorrectly or is it something not yet implemented in greta?

Below is two code chunks, one to simulate abundance data and another to fit model (apologies if they look ugly!) :

Simulate data

# Simulate data to test fit_model.R script

# Following the data generative process in 
# Yamaura et al. (2012) Biodiversity and Conservation

set.seed(123)

# number of species
n_spp  <- 10   
# number of sites  
n_site <- 100 
# number of sampling events
t <- 10

# probability of detection for each species
# assuming the same p_j across sites for now
alpha0_true <- rnorm(n_spp, 2, 1)
mu_true     <- alpha0_true
sigma_true  <- rexp(n_spp, 2)
p_true      <- numeric(n_spp)
for (j in seq_len(n_spp)) {
    logit_p_true <- rnorm(1, mu_true, sigma_true)
    p_true[j] <- exp(logit_p_true) / (exp(logit_p_true) + 1)
}

# true abundance
# using Poisson for now (explore neg. binom. later)
beta0_true  <- rnorm(n_spp, 2, 1)
lambda_true <- exp(rep(1, n_site) %*% t(beta0_true))
N_true <- matrix(rpois(n_site * n_spp, lambda_true), n_site, n_spp)
    
# observed 
n_true <- array(NA, dim = c(n_site, n_spp, t))
for (i in seq_len(n_site)) {
    for (j in seq_len(n_spp)) {
        n_true[i, j, ] <- rbinom(t, N_true[i, j], p_true[j])
    }
}

Fit model

library(greta)

# source("code/sim_data.R")

# data
n <- as_data(n_true)

# variables
X <- ones(n_site)
V <- ones(n_site)

# priors
sd_beta0 <- exponential(2)
beta0    <- normal(0, sd_beta0, dim = n_spp)
lambda   <- exp(X %*% t(beta0))
N        <- poisson(lambda)    # I think this doesn't work as per helpfile
N <- do.call(abind, c(replicate(t, N, simplify = FALSE), along = 3))

sd_alpha0   <- exponential(2)
alpha0      <- normal(0, sd_alpha0, dim = n_spp)
mu          <- V %*% t(alpha0)
sigma       <- exponential(2, dim = n_spp)
sigma <- do.call(rbind, replicate(n_site, t(sigma), simplify = FALSE))
# p           <- ilogit(multivariate_normal(mu, sigma))
p           <- ilogit(normal(mu, sigma))
p <- do.call(abind, c(replicate(t, p, simplify = FALSE), along = 3))

# observational model
distribution(n) <- binomial(N, p)
mod <- model(N, p)

@jdyen
Copy link

jdyen commented Mar 28, 2021

I don't think this has been added to greta. A post on the greta forum might get some good answers (maybe do a quick search first, someone might have already posted a workaround).

@hrlai
Copy link
Contributor

hrlai commented Mar 29, 2021

Thanks for the heads up. Sure I'll move over to the forum and apologies if here isn't the best place to ask :)

@njtierney njtierney transferred this issue from greta-dev/greta.multivariate Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants