
svyglm.nb() function hangs #112

Open
sconti555 opened this issue Jan 25, 2022 · 7 comments

Comments

@sconti555

sconti555 commented Jan 25, 2022

Dear Daniel,

First and foremost, I'd like to thank you for your time and dedication in making your code and modelling facilities available to the wider community through your very handy 'sjstats' package.

I've been attempting to use your svyglm.nb() function to fit a survey-weighted negative binomial regression model to some data, and the code invariably hangs. I first tried to fit a relatively sizeable dataset (ca. 750,000 cases, ca. 80 predictors) with a variety of weights, to no avail: the code would just hang, with no way to halt its execution (at least in RStudio) other than killing the RStudio task. Thinking that the issue might lie in an awkward distribution of weights, I then took a much smaller subset of the data (ca. 4,600 cases) and subsetted the weights accordingly (with the subset.survey.design() function); again to no avail, and with no change in behaviour.

The standard glm.nb() function from the MASS package, by contrast, fits the unweighted data (either the full sample or the sub-sample) without problems and within a reasonable run time. Using svyglm.nb() with either a set of unit weights -- which should make it equivalent to glm.nb() -- or any of the four sets of weights I'm exploring instead leaves me killing RStudio after run times of over 24 hours.
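For concreteness, the calls I'm describing look roughly like this (all dataset, variable, and weight names below are placeholders, not my actual data):

```r
library(survey)
library(sjstats)

# placeholder names throughout -- not the real data
des <- svydesign(ids = ~1, weights = ~samp_wt, data = dat)

# the smaller subset mentioned above, via subset.survey.design()
des_sub <- subset(des, subsample_flag == 1)

# this is the call that never returns
fit <- svyglm.nb(outcome ~ x1 + x2, design = des_sub)
```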

Looking at the code underlying your svyglm.nb() function, I have the impression that the issue may lie in overflow/underflow in the evaluation of the digamma functions appearing in the first derivatives of the negative binomial likelihood. I am writing to ask whether you have experienced similar issues with svyglm.nb(), and if so, how (if at all) you managed to circumvent them.
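To illustrate what I mean (with made-up inputs): base R's digamma() diverges like -1/x as its argument approaches zero, so any Newton-type update that pushes the dispersion parameter towards zero produces enormous score contributions.

```r
# digamma() diverges like -1/x near zero, so an iteration that drives the
# NB dispersion parameter towards zero sees the score explode;
# the values below are purely illustrative
digamma(1)      # finite: -0.5772157 (minus the Euler-Mascheroni constant)
digamma(1e-12)  # roughly -1e12: a numerically explosive score contribution
```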

Many thanks in advance for any help you may be able to offer!

@paige90

paige90 commented Apr 11, 2022

Hello,
I have exactly the same issue, and my RStudio has been running for hours with no sign of converging.
Did you by any chance find a solution to this problem?

@sconti555
Author

> Hello,
> I have exactly the same issue, and my RStudio has been running for hours with no sign of converging.
> Did you by any chance find a solution to this problem?

Hi @paige90,

Unfortunately I haven't. Not having heard from the developer on the subject (whose code, incidentally, I don't believe to be necessarily faulty), I'm now more inclined to try a bootstrap-based approach, as outlined in Prof. Lumley's monograph.

@strengejacke
Owner

I have used the function in the past on a rather large survey dataset (SHARE), and had no problems. I can't say for sure why the function hangs at some point, especially since this also happens for you with a relatively small dataset.

@paige90

paige90 commented Apr 13, 2022

@sconti555
Hello, thank you for your reply!
I tried the same command in Stata and it did not converge either. I did, however, get results using a zero-inflated Poisson model in R. I think the negative binomial is tricky under complex survey designs.

@sconti555
Author

> @sconti555
> Hello, thank you for your reply!
> I tried the same command in Stata and it did not converge either. I did, however, get results using a zero-inflated Poisson model in R. I think the negative binomial is tricky under complex survey designs.

Thank you for your follow-up, @paige90. I reckon that in my case the problem is entirely numerical, stemming from the digamma function in the information matrix. The weights aren't likely the issue for me, since setting a uniform sampling design (with unit weights) I run into exactly the same issue.

@CharlyMarie

Hello everyone

I have exactly the same problem with an emailed survey:

  • I had missing data, which I imputed using mice
  • I calculated ATE weights to account for non-response bias (around 10% of respondents)
  • I am using svyglm.nb() to regress a count variable on my variable of interest; the function does not seem to converge even for a simple regression like Y ~ X (at least not after several hours)

Have any of you, like @sconti555, found the problem and/or a way of solving it?

Many thanks

Charly

@sconti555
Author

Hi @CharlyMarie,

Unfortunately I had given up, as I was unable to overcome that same issue. Believing that it has to do with the underlying svymle() engine, I tried recoding the negative binomial likelihood function to stop it going into a tailspin around the (unavoidable, which is what makes this tricky) digamma function evaluations, to no avail. I also e-mailed Thomas Lumley about the problem, but I don't recall receiving a response.
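For the record, the sort of reformulation I was experimenting with amounts to writing the NB2 log-likelihood through lgamma(), which is numerically stable; the trouble is that the score with respect to theta still reintroduces digamma(). A base-R sketch (nb_loglik is just an illustrative name, not the package's code):

```r
# NB2 log-likelihood written via lgamma(), avoiding overflow in the
# gamma-function ratios; nb_loglik is an illustrative name, not package code
nb_loglik <- function(y, mu, theta) {
  sum(lgamma(y + theta) - lgamma(theta) - lgamma(y + 1) +
        theta * log(theta) + y * log(mu) -
        (theta + y) * log(theta + mu))
}

# sanity check against the built-in density (made-up values)
y     <- c(0, 1, 3, 7)
mu    <- c(1.2, 0.8, 2.5, 4.0)
theta <- 1.5
nb_loglik(y, mu, theta)  # matches sum(dnbinom(y, size = theta, mu = mu, log = TRUE))
```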

Best of luck with advancing this matter, which I'd love to see resolved.


4 participants