Sampler hanging above some hazy number of samples #97
Note: sampling using a SerialPool process does not reproduce this issue. |
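For context, a minimal sketch of switching between schwimmbad's SerialPool and MultiPool when running the sampler; `params`, `rvdata_j`, and `rvdata_k` are assumed to be the same objects built in the snippets later in this thread, and the prior cache filename is a placeholder:

import schwimmbad
import thejoker as tj

# params, rvdata_j, rvdata_k assumed to be defined as in the snippets below
use_serial = True  # set to False to try the parallel MultiPool instead

pool_cls = schwimmbad.SerialPool if use_serial else schwimmbad.MultiPool
with pool_cls() as pool:
    joker = tj.TheJoker(params, pool=pool)
    samples = joker.rejection_sample([rvdata_j, rvdata_k], 'prior_samples.hdf5')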
If you use conda, could you dump your environment info with: |
I think there are a few things going on here.

(1) I fixed some minor speed issues in prior.sample(), so please try updating with: pip install git+https://github.com/adrn/thejoker

(2) Generating that many prior samples is always going to be somewhat slow, so I would do it once and store the samples in a file on disk. I would use something like this:

import os
import schwimmbad
import thejoker as tj

prior_cache_file = 'prior_samples.hdf5'

# Generate and cache the prior samples only if the file doesn't exist yet
if not os.path.exists(prior_cache_file):
    prior_samples = prior.sample(2*10**8)
    prior_samples.write(prior_cache_file)

with schwimmbad.MultiPool() as pool:
    joker = tj.TheJoker(params, pool=pool)
    samples = joker.rejection_sample([rvdata_j, rvdata_k], prior_cache_file)

(3) When you pass in a JokerSamples object to rejection_sample(), the first thing it does (by default) is write it out to a temporary file. In this case, that is a ~7 GB file, so that is probably slow. This will be fixed if you use the code snippet above. |
Hmm, now I'm seeing that theano lock issue again. Also, the bottleneck didn't seem to be in the prior sampling; that always happened in a reasonable amount of time. |
I am still seeing the theano lock error message when attempting to run with MultiPool(). I am using a dataset with ~60 RVs; could that be part of the issue? |
@leerosenthalj Coming back to this after a long time, I'm realizing that this is likely an issue with running from a Jupyter notebook. If you switch to a script, and set the
@AstroSong I made a new issue to discuss your question #106 |
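For reference, a minimal sketch of what a standalone-script version might look like; the `if __name__ == '__main__'` guard is my assumption (the standard guard needed when a script uses multiprocessing-based pools), and `params` / `rvdata_j` / `rvdata_k` stand in for the objects built in the notebook:

# run_joker.py -- hypothetical script version of the notebook cell
import schwimmbad
import thejoker as tj

def main():
    params = ...     # JokerParams, built exactly as in the notebook (omitted)
    rvdata_j = ...   # RVData objects from the notebook (omitted)
    rvdata_k = ...

    with schwimmbad.MultiPool() as pool:
        joker = tj.TheJoker(params, pool=pool)
        samples = joker.rejection_sample([rvdata_j, rvdata_k],
                                         'prior_samples.hdf5')
    samples.write('posterior_samples.hdf5')

if __name__ == '__main__':
    main()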
I am running the sampler using MultiPool(), and keeping an eye on my activity monitor. I see eight python processes run for a few minutes, and then they stop -- but the notebook cell in which I am running the process does not complete. Is it possible that the joker.rejection_sample() code is stalling somewhere at the end of the process, after the actual sampling has been completed?
EDIT: code snippet shared via email
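If it helps narrow this down, one generic way to see where the parent process is stuck is Python's faulthandler module (a general diagnostic suggestion, not part of thejoker; the surrounding call mirrors the snippets above, with `params` and `rvdata_*` assumed to be defined as before):

import faulthandler
import schwimmbad
import thejoker as tj

# Dump a traceback for every thread every 10 minutes until cancelled,
# without killing the process -- shows whether the hang is inside
# rejection_sample() itself or in cleanup after the sampling finishes.
faulthandler.dump_traceback_later(600, repeat=True, exit=False)

with schwimmbad.MultiPool() as pool:
    joker = tj.TheJoker(params, pool=pool)
    samples = joker.rejection_sample([rvdata_j, rvdata_k], 'prior_samples.hdf5')

faulthandler.cancel_dump_traceback_later()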