Intermittent SIGSEGV when running multiple ssh_channel.exec_command() #645
Comments
When I enable logging via
|
Version of remote SSH daemon (if that makes any difference):
|
@Jakuje ideas? |
Having trace-level log output (see #597) would help to investigate the issue. From the current debug log it is not clear what is going on. I can only guess from the backtrace that, after the first channel got closed, either the callback or something else is probing a structure that might have been freed. Is there a way to install debuginfo on OpenBSD to see more information through gdb about the variables that make it crash, or does it look like it is crashing inside CPython? |
I haven't prepared an OpenBSD package with #597 yet, but I plan to do it. For now I have vanilla version ansible-libssh
|
I built a package based on Jakuje@6d1f467. Ran
output is as follows:
trace output is:
|
One more way to get debug logs from libssh is to use |
Not sure whether changing the logging level via Python's
output is still the same:
|
I reverted to ansible-libssh 1.2.2 and now logging works as expected, but the output is too long to paste here, so I am attaching a file: test4-output-ansible-libssh-1.2.2-v001.txt |
It's not visible in the attached file, but it did end up with |
I think I see the issue (sorry I did not notice it at first). SSH channels do not allow running multiple commands in them, as they are closed after the command execution. You need to allocate a new one for the next command or handle the IO in the shell yourself. I think the following should do:
(untested) Certainly pylibssh/libssh should not crash on this attempt, and if this is not clear from the documentation, we should be more explicit about it. Regarding handling the IO, it would mean opening a shell with |
The reference where this is described in RFC 4254:
|
Hmm. There is test |
I also suspected that and have another version of my Python script, but that also dumps core. However, I wanted to open a new GitHub issue for that. I am not sure which way is better for you: keep it here or open a new one. I'll open a new one; it can then be marked as a duplicate and you can bring the discussion back here. |
I've also opened #657 for another core dump, when multiple channels are opened. |
My bad, the So the following test basically does everything you do: pylibssh/tests/unit/channel_test.py Lines 42 to 48 in cc2ceff
This test has been present for 3 years, since #280 when this code was introduced, and I think it was not crashing in Linux builds, so I am wondering whether this is something specific to OpenBSD. Are you able to test your code on a different platform to pinpoint whether it is specific to OpenBSD? |
OK, reading further, the test is flaky and sometimes segfaults as described in #57, and they also have reports from Ubuntu and macOS. Let me see if I can reproduce it. |
Outside of this issue, I am wondering should |
That would make the most sense to me. The current location of this function is confusing.
I think this would break existing applications, which we do not want to do. |
I think the issue lives in the following code: pylibssh/src/pylibsshext/channel.pyx Lines 169 to 174 in cc2ceff
the callback structure |
I've packaged Jakuje@8c72faa to test your changes on OpenBSD and I cannot reproduce the core dump anymore with |
Thank you for testing! Good to hear that it worked for you! I will leave it up to @webknjaz to review and help with the Python side, as I am more of a C programmer. |
Having slept on it, I think libssh is also a bit to blame. The delayed freeing of channels sounds like a good idea, but it results in these issues (we also have some random failures in CI, but not as common as here). I think libssh should be changed to not invoke callbacks on channels the user has explicitly freed. If we keep them around for a reason, we should not assume the user kept the callbacks around. I will submit an MR to fix that later today. |
Could you try the top commit from https://gitlab.com/libssh/libssh-mirror/-/merge_requests/549 with the original pylibssh to see whether it still crashes? |
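The idea in the comment above (a channel that the user has freed may linger inside the session, so the library should stop invoking its callbacks) can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: the `Session` and `Channel` classes and all of their methods are hypothetical names invented for illustration, and they are neither libssh's nor pylibssh's actual data structures or API.

```python
"""Toy model of delayed channel freeing. NOT libssh's real implementation."""


class Channel:
    def __init__(self, session):
        self.session = session
        self.callbacks = None        # user-supplied callable, or None
        self.user_freed = False      # the user called free()
        self.remote_closed = False   # the peer confirmed the close
        session.channels.append(self)

    def set_callbacks(self, callbacks):
        self.callbacks = callbacks

    def free(self):
        """User-side free: the channel may linger until the peer closes it."""
        self.user_freed = True
        # Drop the callbacks right away: the user may already have released
        # whatever memory they pointed to.
        self.callbacks = None
        self._release_if_done()

    def on_remote_close(self):
        self.remote_closed = True
        self._release_if_done()

    def _release_if_done(self):
        # Only release once both sides are finished with the channel.
        done = self.user_freed and self.remote_closed
        if done and self in self.session.channels:
            self.session.channels.remove(self)


class Session:
    def __init__(self):
        self.channels = []

    def dispatch(self, channel, data):
        """Deliver incoming data; return True if a callback was invoked."""
        if channel.user_freed or channel.callbacks is None:
            return False  # never call into a channel the user has freed
        channel.callbacks(data)
        return True


session = Session()
channel = Channel(session)
received = []
channel.set_callbacks(received.append)

first = session.dispatch(channel, b"hello")       # callback runs
channel.free()                                    # user is done with it
lingering = channel in session.channels           # still tracked by session
late = session.dispatch(channel, b"late packet")  # must not reach callback
channel.on_remote_close()
released = channel not in session.channels
print(first, lingering, late, released, received)
```

In this model a packet that arrives between `free()` and the remote close is dropped instead of reaching a callback that the user may no longer own, which is the class of crash discussed in this issue.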
I needed to modify the patch a bit to make it apply to libssh
and I cannot reproduce the core dump with both |
Thank you for testing that! I updated the patch to keep freeing in both places because, on rare occasions, the callbacks could be assigned even after the shell is freed (which is an even more awkward condition that we should probably prevent too, but it sounds like our tests caught some occurrences of this with valgrind). |
I've tested merge request !549 (c40a1a16) against this GitHub issue and I cannot reproduce the core dump with both |
SUMMARY
On OpenBSD -current as of 2024-09-04 I have the following backtrace from a core dump:
ISSUE TYPE
PYLIBSSH and LIBSSH VERSION
OS / ENVIRONMENT
STEPS TO REPRODUCE
EXPECTED RESULTS
Execution of the script below should work all the time, but it core dumps intermittently.
ACTUAL RESULTS
Core dumps every now and then. Always on the second command. The first command always works.