Replies: 3 comments 3 replies
-
Hi @jainraj <https://github.com/jainraj>,
thanks for digging into HSSM and helping us improve it!
As Alex mentioned, the t parameter (particularly when using gradient-based
samplers, and especially in hierarchical settings) can give us some
problems, and we have a couple of avenues in the works to address that.
But one thing you could try here is simply to tighten the prior on the sigma
for t. Really, it shouldn't vary across individuals by more than 1 or so.
Your Gamma prior on the mean of t seems fine (and in line with HDDM), but
the sigma prior is HalfNormal(1) with an infinite upper bound. I would
suggest reducing the prior on the t sigma to HalfNormal(0.3), as in HDDM,
with an upper bound of 1 instead of infinity.
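Concretely, that change only touches the t entry in the spec from the quoted post. A sketch of my reading of the suggestion (swapping in a Uniform for the sigma prior is one illustrative way to get a hard cap at 1, since HalfNormal's support is unbounded):

```python
# Revised `t` entry for the `parameters` list in the quoted post: the group
# sigma prior is tightened from HalfNormal(1) to HalfNormal(0.3), as in HDDM.
t_parameter = {
    "name": "t",
    "formula": "t ~ 0 + (1|participant_id)",
    "link": "identity",
    "prior": {
        "1|participant_id": {
            "name": "Gamma",
            "mu": {"name": "Gamma", "mu": 0.4, "sigma": 0.2, "initval": 0.001},
            # Tightened sigma prior; using e.g. {"name": "Uniform", "lower": 0,
            # "upper": 1} instead would impose a hard upper bound of 1.
            "sigma": {"name": "HalfNormal", "sigma": 0.3, "initval": 0.1},
        },
    },
    "bounds": (0.0, float("inf")),  # bounds on t itself, unchanged from the post
}
```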
Please let us know how that goes!
Michael
…On Tue, May 7, 2024 at 3:51 AM Raj V Jain ***@***.***> wrote:
Hello everyone. I am trying to fit a hierarchical DDM as proposed by Wiecki
et al. 2013
<https://www.frontiersin.org/articles/10.3389/fninf.2013.00014/full> to
my data where (just FYI)
1. Stimulus is coded as -1, 0, and 1 (three levels).
2. Data has ~25 participants with about 100 trials per participant.
I am sampling 6 chains with 1250 samples for tuning and 1250 drawn
samples. All my parameters *except the following* have r_hat < 1.01 and
ESS > 1000:
                            ess_bulk   ess_tail     r_hat
v_1|participant_id_sigma   348.51330  505.54732   1.01247
a_1|participant_id_sigma    66.01460  102.07171   1.08330
t_1|participant_id_sigma     6.32525   10.51081   4.70159
I suspect that running the chains longer may help with v, but I am unsure
about a and t.
What I am looking for: Any tips / practical ideas that worked for you to
converge these variance parameters, e.g., some specific combination of
priors or initial values or using a different loglikelihood estimation
method or some best practices suggested in the literature.
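For reference on what the numbers above measure, the r_hat reported is (roughly) split-R-hat; a minimal numpy sketch of the classic, non-rank-normalized version (ArviZ's actual diagnostic is rank-normalized, so values differ slightly):

```python
import numpy as np

def split_rhat(chains):
    """Classic split-R-hat for one parameter; chains is (n_chains, n_draws).
    Each chain is split in half so within-chain drift also inflates R-hat."""
    n = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    within = halves.var(axis=1, ddof=1).mean()        # W: mean within-half variance
    between = n * halves.mean(axis=1).var(ddof=1)     # B: scaled variance of half means
    var_hat = (n - 1) / n * within + between / n      # pooled posterior-variance estimate
    return float(np.sqrt(var_hat / within))

rng = np.random.default_rng(0)
good = rng.normal(size=(6, 1250))           # six well-mixed chains -> R-hat near 1
bad = good + np.arange(6)[:, None] * 3.0    # chains stuck at different levels -> R-hat >> 1
```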
My code for reference:
import numpy
import hssm

parameters = [
    {
        "name": "v",
        "formula": "v ~ 0 + (1|participant_id) + (stimulus|participant_id)",
        "link": "identity",
        "prior": {
            "1|participant_id": {
                "name": "Normal",
                "mu": {"name": "Normal", "mu": 2, "sigma": 3, "initval": 2},
                "sigma": {"name": "HalfNormal", "sigma": 2, "initval": 0.1},
            },
            "stimulus|participant_id": {
                "name": "Normal",
                "mu": {"name": "Normal", "mu": 0, "sigma": 15, "initval": 0},
                "sigma": {"name": "Uniform", "lower": 1e-10, "upper": 100, "initval": 0.1},
            },
        },
        "bounds": (-numpy.inf, numpy.inf),
    },
    {
        "name": "z",
        "formula": "z ~ 0 + (1|participant_id)",
        "link": "gen_logit",
        "prior": {
            "1|participant_id": {
                "name": "Normal",
                "mu": {"name": "Normal", "mu": 0.5, "sigma": 0.5, "initval": 0.5},
                "sigma": {"name": "HalfNormal", "sigma": 0.05, "initval": 0.1},
            },
        },
        "bounds": (0.0, 1.0),
    },
    {
        "name": "a",
        "formula": "a ~ 0 + (1|participant_id)",
        "link": "identity",
        "prior": {
            "1|participant_id": {
                "name": "Gamma",
                "mu": {"name": "Gamma", "mu": 1.5, "sigma": 0.75, "initval": 1},
                "sigma": {"name": "HalfNormal", "sigma": 0.1, "initval": 0.1},
            },
        },
        "bounds": (0, numpy.inf),
    },
    {
        "name": "t",
        "formula": "t ~ 0 + (1|participant_id)",
        "link": "identity",
        "prior": {
            "1|participant_id": {
                # deviating from Wiecki's paper because the HDDM library implements Gamma
                "name": "Gamma",
                "mu": {
                    "name": "Gamma",
                    "mu": 0.4,
                    "sigma": 0.2,
                    "initval": 0.001,  # a small value here can help convergence
                },
                "sigma": {"name": "HalfNormal", "sigma": 1, "initval": 0.2},
            },
        },
        "bounds": (0.0, numpy.inf),
    },
    {
        "name": "sv",
        "prior": {"name": "HalfNormal", "sigma": 2.0, "initval": 1.0},
        "bounds": (0, numpy.inf),
    },
]

ddm_model = hssm.HSSM(
    data=data,
    model="ddm_sdv",
    loglik_kind="blackbox",
    p_outlier=0.05,
    include=parameters,
)
-
I was going to say: scaling is not the best approach to deal with that
issue. If you have different response deadlines, that changes the decision
process (and not linearly). The usual approach would be to consider a
collapsing-bound model (e.g., the angle LAN model or the Weibull; see the
eLife or JoCN papers on this), because the participant knows that they need
to respond before a certain time, and the angle of the collapse may differ
for different deadlines. So you could either fit those participants in two
separate models, or fit a collapsing-bound model and see whether allowing
only the collapse angle theta (and/or the initial boundary height a) to
vary by deadline is enough.
I suspect that if you fit this model you would find that it fits better
than the standard DDM (in both model selection metrics like WAIC but more
importantly in posterior predictive checks such as quantile probability
plots).
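A sketch of what such a fit and comparison could look like; `model="angle"` and `loglik_kind="approx_differentiable"` are HSSM options, but treat the exact call as illustrative and check the current docs:

```python
# Hypothetical kwargs for a collapsing-bound fit; verify the option names
# against your installed HSSM version.
angle_fit_config = {
    "model": "angle",                        # DDM with linearly collapsing bounds
    "loglik_kind": "approx_differentiable",  # LAN-based likelihood, gradient-friendly
    "p_outlier": 0.05,                       # same outlier handling as the DDM fit
}
# angle_model = hssm.HSSM(data=data, **angle_fit_config)
# idata_angle = angle_model.sample()
# Compare to the DDM via WAIC/LOO, then check posterior predictives
# (e.g. quantile-probability plots) before trusting either model:
# arviz.compare({"ddm": idata_ddm, "angle": idata_angle}, ic="waic")
```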
Note, though, that if you have a reasonable number of trials in which
subjects omit responses altogether, then even the angle model will be
biased if you just throw out those trials. We have developed a method to
address this, which considers the likelihood of producing an omission and
uses the omitted trials as part of the data to be fit. (This is a CogSci
paper but will be ported to HSSM soon.) For now, though, if the number of
omissions is very small, you could still try fitting the angle LAN.
This might also interact with t because of collinearities, especially in
the regular DDM, which will try however it can to force a fit to the RT
distributions via some combination of t, a, z, and v. And this would be
worse if the subjects have different deadlines.
That said, ultimately one should still be able to get the model to converge
even if the fit is not great. So it would also be useful to know for
diagnostic purposes if simply fitting subjects with the different deadlines
in two separate models helps with convergence, again using a prior on
standard deviation for t that is not too wide.
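The per-deadline fit is just a data partition; a sketch, assuming a hypothetical `deadline` column in the dataframe (the column name is illustrative, not an HSSM convention):

```python
import pandas as pd

# Toy stand-in for the real dataset, with a per-participant response deadline.
data = pd.DataFrame({
    "participant_id": [1, 1, 2, 2, 3, 3],
    "rt":             [0.41, 0.62, 0.55, 0.48, 0.90, 1.10],
    "response":       [1, -1, 1, 1, -1, 1],
    "deadline":       [1.0, 1.0, 1.0, 1.0, 2.0, 2.0],
})

# One hierarchical model per cohort: all parameters stay hierarchical within
# the cohort, and convergence can be diagnosed per deadline group.
cohorts = {d: g.reset_index(drop=True) for d, g in data.groupby("deadline")}
# for d, cohort_data in cohorts.items():
#     hssm.HSSM(data=cohort_data, model="ddm", ...).sample()
```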
As mentioned in other threads, one of the issues here is that we are using
a gradient sampler, which is more efficient, but the gradient for t is not
well behaved due to the sharp decline in likelihood as soon as any proposed
individual t exceeds the minimum RT for that participant. HDDM used a slice
sampler, which is not gradient-based. One could also try that here in HSSM,
which also has a slice sampler, but for now it is slow because we have not
optimized the slice-sampler parameters as we had done in HDDM (this is a
TODO for cases in which gradient samplers might fail). We also have other
potential solutions in the works for the gradient samplers.
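To illustrate why a gradient-free sampler shrugs off exactly this kind of likelihood cliff, here is a toy stepping-out slice sampler (Neal, 2003) on a made-up non-decision-time posterior with a hard cutoff at the minimum RT. This is purely an illustration of the idea, not HSSM's or HDDM's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
MIN_RT = 0.25  # log-likelihood drops to -inf the moment t exceeds the fastest RT

def logp(t):
    """Toy unnormalized log-posterior for t: Gamma-like shape, hard cliff at MIN_RT."""
    if t <= 0 or t >= MIN_RT:
        return -np.inf
    return 2.0 * np.log(t) - t / 0.05  # no gradient needed at the cliff

def slice_step(x, logp, w=0.1):
    """One stepping-out slice-sampling update (Neal 2003)."""
    y = logp(x) + np.log(rng.uniform())   # slice height under the density at x
    left = x - w * rng.uniform()          # randomly positioned initial interval
    right = left + w
    while logp(left) > y:                 # step out until both ends fall
        left -= w                         # below the slice (the -inf cliff
    while logp(right) > y:                # stops this immediately)
        right += w
    while True:                           # shrink the interval until a draw lands
        x_new = rng.uniform(left, right)  # inside the slice
        if logp(x_new) > y:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new

t, samples = 0.1, []
for _ in range(2000):
    t = slice_step(t, logp)
    samples.append(t)
samples = np.array(samples)  # every draw respects 0 < t < MIN_RT by construction
```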
…On Wed, May 8, 2024 at 8:07 AM Raj V Jain ***@***.***> wrote:
Update: even the scaling didn't help
-
For purposes of understanding and convergence, I would start by simply
fitting a separate hierarchical model per cohort, where all parameters
within that cohort are estimated hierarchically (i.e., theta is fit per
participant from a group distribution). Then you can try a version where
you fit a single hierarchical model but allow cohort to be a
between-subjects factor that modifies theta and/or a.
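A sketch of the between-subjects version; the `cohort` column and the exact formula are illustrative (HSSM parameter formulas follow bambi-style Wilkinson syntax, so double-check against the docs):

```python
# Hypothetical spec letting the collapse angle vary by cohort (fixed effect)
# while remaining hierarchical over participants within cohorts.
theta_parameter = {
    "name": "theta",
    "formula": "theta ~ 0 + cohort + (1|participant_id)",
    "link": "identity",
}
# hssm.HSSM(data=data, model="angle", include=[theta_parameter], ...)
```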
And correct, we have not yet implemented the omissions into HSSM, but this
is on the TODO list for one of the next releases.
M
…On Wed, May 8, 2024 at 9:03 AM Raj V Jain ***@***.***> wrote:
Thank you, Professor, for these suggestions. I can try the angle LAN model
as we have few no-response trials. Some follow-up questions:
1. In your opinion, would it be better to fit theta per *participant*
or per *cohort* (cohort = group of participants having the same
response window)?
2. Do you suggest trying to fit all the parameters, i.e., theta, t,
and a simultaneously or in a staggered fashion?
3. To clarify: the current implementation of the angle LAN in HSSM doesn't
support omissions, but future versions could. Right?
-
Hello everyone. I am trying to fit a hierarchical DDM as proposed by Weicki et al. 2013 to my data where (just FYI)
data
has ~25 participants with about 100 trials per participant.I am sampling 6 chains with 1250 samples for tuning and 1250 drawn samples. All my parameters except the following have r_hat < 1.01 and ESS > 1000:
I suspect that running the chain longer may help with
v
, but unsure abouta
andt
.What I am looking for: Any tips / practical ideas that worked for you to converge these variance parameters, e.g., some specific combination of priors or initial values or using a different loglikelihood estimation method or some best practices suggested in the literature.
My code for reference:
Beta Was this translation helpful? Give feedback.
All reactions