issue with lsf-drmaa #14
Comments
Nothing comes to mind, but you could always try using faulthandler to see if you can get more debugging information. |
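For anyone trying the faulthandler suggestion, here is a minimal sketch of turning it on before any DRMAA calls are made. `faulthandler` is part of the standard library from Python 3.3 and is available as a separate backport package for Python 2. The helper name `enable_crash_log` and the log file name are made up for illustration; they are not part of drmaa-python.

```python
import faulthandler
import os


def enable_crash_log(path):
    """Write Python tracebacks to `path` if the process receives a fatal signal.

    Hypothetical helper: returns the open file object, which the caller must
    keep open for as long as faulthandler is enabled.
    """
    log = open(path, "a")
    # all_threads=True dumps every thread's stack, not only the crashing one.
    faulthandler.enable(file=log, all_threads=True)
    return log


if __name__ == "__main__":
    crash_log = enable_crash_log(os.path.join(os.getcwd(), "drmaa_crash.log"))
    # Import drmaa and submit the job after this point, so that a segfault
    # inside the C library leaves the Python-level stack in the log file.
```

Writing to a file rather than stderr can help when the script runs under a service that swallows stderr. Note that faulthandler only shows the Python frames leading to the crash, not the C frames inside the DRMAA library.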
Thanks, will try. I could only trace it to the c call in drmaa/helpers.py |
I'm afraid I can't get any further with faulthandler than I already got with python tracing
With faulthandler
It is clear that runJob calls "c", which crashes when returning from a call to the API. It does not look like an API problem to me, since the c function is called a number of times before that. |
I've never used LSF before (or even really heard much about it), but I think the problem is that the LSF DRMAA implementation seems to be expecting the job ID to be an integer, whereas we're passing it as a string. I'm basing this on the fact that the PyLSF library uses a I just double-checked the DRMAA specifications, and they definitely say that job IDs should be strings, so this seems to be a mistake in the LSF DRMAA interface. I would file an issue with the LSF people if you can, or try using PyLSF if you need Python bindings that work with LSF right away. Although, did you say that this is working on openSUSE? |
Dan, thank you for looking into this. LSF DRMAA specifies job id as a char array. For example, #define MAX_LEN_JOBID 100 This should work similarly to your
in runJob function For reference, LSF drmaa.h is here On 13 June 2014 18:23, Dan Blanchard [email protected] wrote:
|
I see that Galaxy has version 0.6 of our library set as what they require. Is that what you're using? I'm curious whether you see the same issues with 0.7.6. I've also just submitted a PR to that project to update their version to 0.7.6 (or at least, I tried to, but Bitbucket is being very slow with the forking at the moment). If LSF limits the JOBID length to 100, that might explain why you get a segfault (since we pass a string buffer of 128 characters). Although, that wouldn't explain why it works on openSUSE and not on Red Hat... Maybe try modifying your locally installed copy of DRMAA Python to change the buffer there to 100 characters and see if that works. If it does, I'll modify DRMAA Python to allow you to set an environment variable that controls how long the buffer can be. |
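To make the environment-variable idea concrete, here is a rough sketch of how a configurable job ID buffer could be built with `ctypes`. This is not how DRMAA Python does it today: the variable name `DRMAA_JOBID_BUFFER_LEN` and the helper `make_jobid_buffer` are hypothetical, and the default of 128 is simply the size mentioned in this thread.

```python
import ctypes
import os

DEFAULT_BUFFER_LEN = 128  # buffer size mentioned in this thread
ENV_VAR = "DRMAA_JOBID_BUFFER_LEN"  # hypothetical variable name


def make_jobid_buffer(environ=os.environ):
    """Return (buffer, length) for receiving a job ID from the C library.

    The length comes from ENV_VAR when it is set, else DEFAULT_BUFFER_LEN.
    """
    raw = environ.get(ENV_VAR)
    size = DEFAULT_BUFFER_LEN if raw is None else int(raw)
    if size < 1:
        raise ValueError("%s must be a positive integer" % ENV_VAR)
    buf = ctypes.create_string_buffer(size)
    # Report the size of the buffer actually allocated, so the length handed
    # to the C side can never exceed it.
    return buf, ctypes.sizeof(buf)


if __name__ == "__main__":
    buf, length = make_jobid_buffer({ENV_VAR: "100"})
    print(length)
```

The point of returning the buffer together with its length is that both would be passed to the C call, so the library is told exactly how many bytes it may write.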
Yes, once I identified where the problem is with Galaxy I switched to Indeed, LSF drmaa.h defines whereas drmaa/const.py sets so I've already tried changing the latter to I've also tried reducing string buffer in runJob in drmaa/session.py from
Running the sample C code from LSF -DRMAA produces the following
So the next line above after fsd_job_release should have been a return from |
Tested Python DRMAA on another cluster - SLES 11 SP1, python 2.6. |
Have been using a recent copy of lsf-drmaa and drmaa-python without issues. So maybe this was fixed at some point in one of them? |
I am having an issue with lsf-drmaa (1.1.1) and drmaa-python (0.7.9) related to string formatting. #!/usr/bin/env python import drmaa def main():
if __name__=='__main__': I got the following error message: A DRMAA object was created Does anyone have an idea how to fix it? |
Hello,
Could you think of a reason why drmaa-python and lsf-drmaa
https://github.com/PlatformLSF/lsf-drmaa
would segfault on RH 6.4 but not on openSUSE 12.1?
Trying some of the examples, I can see a job gets submitted (and run) but Python segfaults.
Thanks