Replies: 3 comments 3 replies
-
Hi @jainraj <https://github.com/jainraj>,
thanks for digging into HSSM and helping us improve it!
As Alex mentioned, the t parameter (particularly when using gradient-based
samplers, and especially in hierarchical settings) can give us some
problems, and we have a couple of avenues in the works to address that.
But one thing you could try here is simply to tighten the prior on the sigma
for t. Really, it shouldn't vary across individuals by more than 1 or so.
Your Gamma prior on the mean of t seems fine (and in line with HDDM), but
the sigma prior is HalfNormal(1) with an infinite upper bound. I would
suggest reducing the prior on the t sigma to HalfNormal(0.3), as in HDDM,
with an upper bound of 1 instead of infinity.
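Concretely, that change only touches the t entry in the spec from the quoted post. A sketch of my reading of the suggestion (swapping in a Uniform for the sigma prior is one illustrative way to get a hard cap at 1, since HalfNormal's support is unbounded):

```python
# Revised `t` entry for the `parameters` list in the quoted post: the group
# sigma prior is tightened from HalfNormal(1) to HalfNormal(0.3), as in HDDM.
t_parameter = {
    "name": "t",
    "formula": "t ~ 0 + (1|participant_id)",
    "link": "identity",
    "prior": {
        "1|participant_id": {
            "name": "Gamma",
            "mu": {"name": "Gamma", "mu": 0.4, "sigma": 0.2, "initval": 0.001},
            # Tightened sigma prior; using e.g. {"name": "Uniform", "lower": 0,
            # "upper": 1} instead would impose a hard upper bound of 1.
            "sigma": {"name": "HalfNormal", "sigma": 0.3, "initval": 0.1},
        },
    },
    "bounds": (0.0, float("inf")),  # bounds on t itself, unchanged from the post
}
```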
Please let us know how that goes!
Michael
…On Tue, May 7, 2024 at 3:51 AM Raj V Jain ***@***.***> wrote:
Hello everyone. I am trying to fit a hierarchical DDM as proposed by Wiecki
et al. 2013
<https://www.frontiersin.org/articles/10.3389/fninf.2013.00014/full> to
my data where (just FYI)
1. Stimulus is coded as -1, 0, and 1 (three levels).
2. Data has ~25 participants with about 100 trials per participant.
I am sampling 6 chains with 1250 samples for tuning and 1250 drawn
samples. All my parameters *except the following* have r_hat < 1.01 and
ESS > 1000:
                            ess_bulk   ess_tail     r_hat
v_1|participant_id_sigma   348.51330  505.54732   1.01247
a_1|participant_id_sigma    66.01460  102.07171   1.08330
t_1|participant_id_sigma     6.32525   10.51081   4.70159
I suspect that running the chains longer may help with v, but I am unsure
about a and t.
What I am looking for: Any tips / practical ideas that worked for you to
converge these variance parameters, e.g., some specific combination of
priors or initial values or using a different loglikelihood estimation
method or some best practices suggested in the literature.
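For reference on what the numbers above measure, the r_hat reported is (roughly) split-R-hat; a minimal numpy sketch of the classic, non-rank-normalized version (ArviZ's actual diagnostic is rank-normalized, so values differ slightly):

```python
import numpy as np

def split_rhat(chains):
    """Classic split-R-hat for one parameter; chains is (n_chains, n_draws).
    Each chain is split in half so within-chain drift also inflates R-hat."""
    n = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    within = halves.var(axis=1, ddof=1).mean()        # W: mean within-half variance
    between = n * halves.mean(axis=1).var(ddof=1)     # B: scaled variance of half means
    var_hat = (n - 1) / n * within + between / n      # pooled posterior-variance estimate
    return float(np.sqrt(var_hat / within))

rng = np.random.default_rng(0)
good = rng.normal(size=(6, 1250))           # six well-mixed chains -> R-hat near 1
bad = good + np.arange(6)[:, None] * 3.0    # chains stuck at different levels -> R-hat >> 1
```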
My code for reference:
import numpy
import hssm

parameters = [
    {
        "name": "v",
        "formula": "v ~ 0 + (1|participant_id) + (stimulus|participant_id)",
        "link": "identity",
        "prior": {
            "1|participant_id": {
                "name": "Normal",
                "mu": {"name": "Normal", "mu": 2, "sigma": 3, "initval": 2},
                "sigma": {"name": "HalfNormal", "sigma": 2, "initval": 0.1},
            },
            "stimulus|participant_id": {
                "name": "Normal",
                "mu": {"name": "Normal", "mu": 0, "sigma": 15, "initval": 0},
                "sigma": {"name": "Uniform", "lower": 1e-10, "upper": 100, "initval": 0.1},
            },
        },
        "bounds": (-numpy.inf, numpy.inf),
    },
    {
        "name": "z",
        "formula": "z ~ 0 + (1|participant_id)",
        "link": "gen_logit",
        "prior": {
            "1|participant_id": {
                "name": "Normal",
                "mu": {"name": "Normal", "mu": 0.5, "sigma": 0.5, "initval": 0.5},
                "sigma": {"name": "HalfNormal", "sigma": 0.05, "initval": 0.1},
            },
        },
        "bounds": (0.0, 1.0),
    },
    {
        "name": "a",
        "formula": "a ~ 0 + (1|participant_id)",
        "link": "identity",
        "prior": {
            "1|participant_id": {
                "name": "Gamma",
                "mu": {"name": "Gamma", "mu": 1.5, "sigma": 0.75, "initval": 1},
                "sigma": {"name": "HalfNormal", "sigma": 0.1, "initval": 0.1},
            },
        },
        "bounds": (0, numpy.inf),
    },
    {
        "name": "t",
        "formula": "t ~ 0 + (1|participant_id)",
        "link": "identity",
        "prior": {
            "1|participant_id": {
                # deviating from Wiecki's paper because the HDDM library implements Gamma
                "name": "Gamma",
                "mu": {
                    "name": "Gamma",
                    "mu": 0.4,
                    "sigma": 0.2,
                    "initval": 0.001,  # a small value here can help convergence
                },
                "sigma": {"name": "HalfNormal", "sigma": 1, "initval": 0.2},
            },
        },
        "bounds": (0.0, numpy.inf),
    },
    {
        "name": "sv",
        "prior": {"name": "HalfNormal", "sigma": 2.0, "initval": 1.0},
        "bounds": (0, numpy.inf),
    },
]

ddm_model = hssm.HSSM(
    data=data,
    model="ddm_sdv",
    loglik_kind="blackbox",
    p_outlier=0.05,
    include=parameters,
)
-
I was going to say: scaling is not the best approach to deal with that
issue. If you have different response deadlines, that changes the decision
process (and not linearly). The usual approach would be to consider a
collapsing-bound model (e.g., the angle LAN model or the Weibull; see the
eLife or JoCN papers on this), because the participant knows that they need
to respond before a certain time, and the angle of the collapse may differ
for different deadlines. So you could either fit those participants in two
separate models, or fit a collapsing-bound model and see whether allowing
only the collapse angle theta (and/or the initial boundary height a) to
vary by deadline is enough.
I suspect that if you fit this model you would find that it fits better
than the standard DDM (in both model selection metrics like WAIC but more
importantly in posterior predictive checks such as quantile probability
plots).
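A sketch of what such a fit and comparison could look like; `model="angle"` and `loglik_kind="approx_differentiable"` are HSSM options, but treat the exact call as illustrative and check the current docs:

```python
# Hypothetical kwargs for a collapsing-bound fit; verify the option names
# against your installed HSSM version.
angle_fit_config = {
    "model": "angle",                        # DDM with linearly collapsing bounds
    "loglik_kind": "approx_differentiable",  # LAN-based likelihood, gradient-friendly
    "p_outlier": 0.05,                       # same outlier handling as the DDM fit
}
# angle_model = hssm.HSSM(data=data, **angle_fit_config)
# idata_angle = angle_model.sample()
# Compare to the DDM via WAIC/LOO, then check posterior predictives
# (e.g. quantile-probability plots) before trusting either model:
# arviz.compare({"ddm": idata_ddm, "angle": idata_angle}, ic="waic")
```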
Note, though, that if you have a reasonable number of trials in which
subjects omit responses altogether, then even the angle model will be
biased if you just throw out those trials. We have developed a method to
address this, which considers the likelihood of producing an omission and
uses the omitted trials as part of the data to be fit. (This is a CogSci
paper but will be ported to HSSM soon.) For now, though, if the number of
omissions is very small, you could still try fitting the angle LAN.
This might also interact with t because of collinearities, especially in
the regular DDM, which will try however it can to force a fit to the RT
distributions via some combination of t, a, z, and v. And this would be
worse if the subjects have different deadlines.
That said, ultimately one should still be able to get the model to converge
even if the fit is not great. So it would also be useful to know for
diagnostic purposes if simply fitting subjects with the different deadlines
in two separate models helps with convergence, again using a prior on
standard deviation for t that is not too wide.
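The per-deadline fit is just a data partition; a sketch, assuming a hypothetical `deadline` column in the dataframe (the column name is illustrative, not an HSSM convention):

```python
import pandas as pd

# Toy stand-in for the real dataset, with a per-participant response deadline.
data = pd.DataFrame({
    "participant_id": [1, 1, 2, 2, 3, 3],
    "rt":             [0.41, 0.62, 0.55, 0.48, 0.90, 1.10],
    "response":       [1, -1, 1, 1, -1, 1],
    "deadline":       [1.0, 1.0, 1.0, 1.0, 2.0, 2.0],
})

# One hierarchical model per cohort: all parameters stay hierarchical within
# the cohort, and convergence can be diagnosed per deadline group.
cohorts = {d: g.reset_index(drop=True) for d, g in data.groupby("deadline")}
# for d, cohort_data in cohorts.items():
#     hssm.HSSM(data=cohort_data, model="ddm", ...).sample()
```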
As mentioned in other threads, one of the issues here is that we are using
a gradient sampler, which is more efficient, but the gradient for t is not
well behaved due to the sharp decline in likelihood as soon as any proposed
individual t exceeds the minimum RT for that participant. HDDM used a slice
sampler, which is not gradient-based. One could also try that here in HSSM,
which also has a slice sampler, but for now it is slow because we have not
optimized the slice-sampler parameters as we had done in HDDM (this is a
TODO for cases in which gradient samplers might fail). We also have other
potential solutions in the works for the gradient samplers.
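To illustrate why a gradient-free sampler shrugs off exactly this kind of likelihood cliff, here is a toy stepping-out slice sampler (Neal, 2003) on a made-up non-decision-time posterior with a hard cutoff at the minimum RT. This is purely an illustration of the idea, not HSSM's or HDDM's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
MIN_RT = 0.25  # log-likelihood drops to -inf the moment t exceeds the fastest RT

def logp(t):
    """Toy unnormalized log-posterior for t: Gamma-like shape, hard cliff at MIN_RT."""
    if t <= 0 or t >= MIN_RT:
        return -np.inf
    return 2.0 * np.log(t) - t / 0.05  # no gradient needed at the cliff

def slice_step(x, logp, w=0.1):
    """One stepping-out slice-sampling update (Neal 2003)."""
    y = logp(x) + np.log(rng.uniform())   # slice height under the density at x
    left = x - w * rng.uniform()          # randomly positioned initial interval
    right = left + w
    while logp(left) > y:                 # step out until both ends fall
        left -= w                         # below the slice (the -inf cliff
    while logp(right) > y:                # stops this immediately)
        right += w
    while True:                           # shrink the interval until a draw lands
        x_new = rng.uniform(left, right)  # inside the slice
        if logp(x_new) > y:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new

t, samples = 0.1, []
for _ in range(2000):
    t = slice_step(t, logp)
    samples.append(t)
samples = np.array(samples)  # every draw respects 0 < t < MIN_RT by construction
```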
…On Wed, May 8, 2024 at 8:07 AM Raj V Jain ***@***.***> wrote:
Update: even the scaling didn't help
-
For purposes of understanding and convergence, I would start by simply
fitting a separate hierarchical model per cohort, where all parameters
within that cohort are estimated hierarchically (i.e., theta is fit per
participant from a group distribution). Then you can try a version where
you fit a single hierarchical model but allow cohort to be a
between-subjects factor that modifies theta and/or a.
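A sketch of the between-subjects version; the `cohort` column and the exact formula are illustrative (HSSM parameter formulas follow bambi-style Wilkinson syntax, so double-check against the docs):

```python
# Hypothetical spec letting the collapse angle vary by cohort (fixed effect)
# while remaining hierarchical over participants within cohorts.
theta_parameter = {
    "name": "theta",
    "formula": "theta ~ 0 + cohort + (1|participant_id)",
    "link": "identity",
}
# hssm.HSSM(data=data, model="angle", include=[theta_parameter], ...)
```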
And correct, we have not yet implemented the omissions into HSSM, but this
is on the TODO list for one of the next releases.
M
…On Wed, May 8, 2024 at 9:03 AM Raj V Jain ***@***.***> wrote:
Thank you, Professor, for these suggestions. I can try the angle LAN model
as we have few no-response trials. Some follow-up questions:
1. In your opinion, would it be better to fit theta per *participant*
or per *cohort* (cohort = group of participants having the same
response window)?
2. Do you suggest trying to fit all the parameters, i.e., theta, t,
and a simultaneously or in a staggered fashion?
3. To clarify: the current implementation of the angle LAN in HSSM doesn't
support omissions, but future versions could. Right?
-
Hello everyone. I am trying to fit a hierarchical DDM as proposed by Weicki et al. 2013 to my data where (just FYI)
data
has ~25 participants with about 100 trials per participant.I am sampling 6 chains with 1250 samples for tuning and 1250 drawn samples. All my parameters except the following have r_hat < 1.01 and ESS > 1000:
I suspect that running the chain longer may help with
v
, but unsure abouta
andt
.What I am looking for: Any tips / practical ideas that worked for you to converge these variance parameters, e.g., some specific combination of priors or initial values or using a different loglikelihood estimation method or some best practices suggested in the literature.
My code for reference:
Beta Was this translation helpful? Give feedback.
All reactions