Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent SEGFAULTs on consecutive exec_command() invocations #658

Open
wants to merge 5 commits into
base: devel
Choose a base branch
from

Conversation

Jakuje
Copy link
Contributor

@Jakuje Jakuje commented Nov 14, 2024

SUMMARY

The function exec_command() keeps the callbacks as a local variable before assigning them to the created channel. The channel is not guaranteed to be completely freed when ssh_channel_free() is called because there might be some leftover messages or responses to process (close confirmation, exit code ...).

Calling the exec_command() as done previously in the test from the same function without anything in between (except assert) will likely map the second function call to the same memory on the call stack so it was working most of the time. But calling it from different functions or contexts will likely change the call stack and processing of outstanding callbacks is more likely to result in addressing wrong memory location.

Likely fixes #57, #645 and #657

ISSUE TYPE
  • Bugfix Pull Request
ADDITIONAL INFORMATION

I was not able to reproduce the issue locally so pushing to see if the CI will be able to crash.

This is also introducing memory leaks as the callback structure is never freed. We should probably store it somewhere in the python code before returning to make sure it is not garbage collected (or can the python GC track the callback pointer is still stored on the libssh side?).

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Nov 14, 2024
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/ansible-pylibssh-658
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@Jakuje Jakuje force-pushed the crash-exec branch 3 times, most recently from 1de5bc0 to 251bb6d Compare November 19, 2024 10:12
@Jakuje Jakuje marked this pull request as ready for review November 19, 2024 10:12
strict=False,
)
def exec_second_command(ssh_channel):
"""Call exec_command() from different context in the call stack."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling from a different context is a part of the motivation. Within this function, there's nothing that changes the context. You're probably referring to the calling test but this is not something that would go into the docstring because it's not what the function does, it's what that test does, probably.

Copy link
Member

@webknjaz webknjaz Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Call exec_command() from different context in the call stack."""
"""Return ``exec_command()`` stdout as a text string."""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is more accurate as the function does not return anything -- just runs the command and checks the output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I made another suggestion now.

@webknjaz webknjaz changed the title Prevent crash on consecutive exec_command() Prevent SEGFAULTs on consecutive exec_command() invocations Nov 19, 2024
@webknjaz
Copy link
Member

@Jakuje it looks like this is making some CI jobs get stuck: https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210235668?pr=658 / https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210009382?pr=658.
(I cancelled the 3.9 one after 39 minutes, and the 3.10 one was killed by xdist after a 30-second timeout)

@webknjaz
Copy link
Member

I restarted said jobs, but this is something to look into, as it'll probably make the CI flakier if merged.

@webknjaz
Copy link
Member

@webknjaz
Copy link
Member

OTOH, it's also unstable on devel: https://github.com/ansible/pylibssh/actions/runs/11917417576/job/33212843276

@Jakuje Jakuje force-pushed the crash-exec branch 4 times, most recently from 9bc197c to 9a6e7d6 Compare November 20, 2024 14:05
@webknjaz
Copy link
Member

So the failure @ https://github.com/ansible/pylibssh/actions/runs/11935006968/job/33265619653?pr=658#step:15:181 is referring to https://github.com/ansible/pylibssh/blob/6cbf2dd/src/pylibsshext/channel.pyx#L45.
And for some reason, the result variable in that context seems to contain an instance of a pytest-internal object — ReprTraceback (https://github.com/pytest-dev/pytest/blob/72f17d1/src/_pytest/_code/code.py#L1156-L1180).

This is rather bizzare. How did it get in there?

============================= test session starts ==============================
platform linux -- Python 3.8.18, pytest-8.3.3, pluggy-1.5.0
cachedir: .tox/test-source-dists/.pytest_cache
rootdir: /home/runner/work/pylibssh/pylibssh
configfile: pytest.ini
testpaths: tests/
plugins: cov-5.0.0, forked-1.6.0, xdist-3.6.1
created: 4/4 workers
4 workers [25 items]

Timeout (0:00:30)!
Thread 0x00007fd72445b700 (most recent call first):
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 534 in read
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 567 in from_io
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1160 in _thread_receiver
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 341 in run
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn

Thread 0x00007fd7255ff740 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1780 in _try_wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1822 in _wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 1083 in wait
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 342 in call
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/subprocess.py", line 359 in check_call
  File "/home/runner/work/pylibssh/pylibssh/tests/conftest.py", line 103 in ssh_clientkey_path
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 898 in call_fixture_func
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1140 in pytest_fixture_setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1091 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 1047 in execute
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 617 in _get_active_fixturedef
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 532 in getfixturevalue
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/fixtures.py", line 697 in _fillfixtures
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/python.py", line 1630 in setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 514 in setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 160 in pytest_runtest_setup
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 195 in run_one_test
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 174 in pytest_runtestloop
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 337 in _main
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/xdist/remote.py", line 393 in <module>
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1291 in executetask
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 341 in run
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 389 in integrate_as_primary_thread
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1273 in serve
  File "/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/execnet/gateway_base.py", line 1806 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/coverage/inorout.py:520: CoverageWarning: Module pylibsshext was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
..FE......................                                               [100%]
==================================== ERRORS ====================================
____________________ ERROR at teardown of test_exec_command ____________________
[gw0] linux -- Python 3.8.18 /home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/bin/python

>   result.stdout += data_b
E   AttributeError: 'ReprTraceback' object has no attribute 'stdout'

BytesIO    = <class '_io.BytesIO'>
Channel    = <class 'pylibsshext.channel.Channel'>
ChannelCallback = <class 'pylibsshext.channel.ChannelCallback'>
CompletedProcess = <class 'subprocess.CompletedProcess'>
LibsshChannelReadFailure = <class 'pylibsshext.errors.LibsshChannelReadFailure'>
__builtins__ = <builtins>
__doc__    = None
__file__   = '/home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pylibsshext/channel.cpython-38-x86_64-linux-gnu.so'
__loader__ = <_frozen_importlib_external.ExtensionFileLoader object at 0x7f1714238b50>
__name__   = 'pylibsshext.channel'
__package__ = 'pylibsshext'
__reduce_cython__ = <cyfunction Channel.__reduce_cython__ at 0x7f17141352b0>
__setstate_cython__ = <cyfunction Channel.__setstate_cython__ at 0x7f1714133450>
__spec__   = ModuleSpec(name='pylibsshext.channel', loader=<_frozen_importlib_external.ExtensionFileLoader object at 0x7f1714238b50...libssh/pylibssh/.tox/test-source-dists/lib/python3.8/site-packages/pylibsshext/channel.cpython-38-x86_64-linux-gnu.so')
__test__   = {}
signal     = <module 'signal' from '/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/signal.py'>
time       = <module 'time' (built-in)>

src/pylibsshext/channel.pyx:45: AttributeError

The above exception was the direct cause of the following exception:

ssh_client_session = <pylibsshext.session.Session object at 0x7f170fbccca0>

    @pytest.fixture
    def ssh_channel(ssh_client_session):
        """Initialize a channel and tear it down after testing.
    
        :param ssh_client_session: A pre-authenticated SSH session.
        :type ssh_client_session: pylibsshext.session.Session
    
        :yields: A libssh channel instance.
        :ytype: pylibsshext.channel.Channel
        """
        chan = ssh_client_session.new_channel()
        try:  # noqa: WPS501
            yield chan
        finally:
>           chan.close()
E           SystemError: <cyfunction Channel.close at 0x7f17141351e0> returned a result with an error set

chan       = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
ssh_client_session = <pylibsshext.session.Session object at 0x7f170fbccca0>

tests/unit/channel_test.py:33: SystemError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:Us8tQjYuzYhTfcblu4lb5wrgGe2ldOtFQ++NgSVUrug ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|            o..  |
|       . . + .   |
|      . * + o +  |
|     o O B o B . |
|    o + S B B + .|
|     . + O * = =.|
|        o E = + o|
|           = +   |
|          . o..  |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw runner@fv-az524-896
The key's randomart image is:
+---[RSA 3072]----+
|==o..            |
|+.+. o   .       |
| +  +   o +      |
|  oo ..+ o =     |
| .  o +oS *     E|
|  . oo.o.= +   + |
|   o .ooo.=..   +|
|    . =oo.o  . . |
|    .+.=..    .  |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_8.2, OpenSSL 1.1.1f  31 Mar 2020
debug1: private host key #0: ssh-rsa SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=43559'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:43559' (RSA) to the list of known hosts.
=================================== FAILURES ===================================
______________________________ test_exec_command _______________________________
[gw0] linux -- Python 3.8.18 /home/runner/work/pylibssh/pylibssh/.tox/test-source-dists/bin/python

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>

    def test_exec_command(ssh_channel):
        """Test getting the output of a remotely executed command."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello World').stdout.decode()
        assert u_cmd_out == u'Hello World'  # noqa: WPS302
        # Test that repeated calls to exec_command do not segfault.
    
        # NOTE: Call `exec_command()` once again from another function to
        # NOTE: force it to happen in another place of the call stack,
        # NOTE: making sure that the context is different from one in this
        # NOTE: this test function. The resulting call stack will end up
        # NOTE: being more random.
>       exec_second_command(ssh_channel)

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
u_cmd_out  = 'Hello World'

tests/unit/channel_test.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>

    def exec_second_command(ssh_channel):
        """Call ``exec_command()`` and checks the output."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello Again').stdout.decode()
>       assert u_cmd_out == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         
E         - Hello Again

ssh_channel = <pylibsshext.channel.Channel object at 0x7f1714180fc0>
u_cmd_out  = ''

tests/unit/channel_test.py:39: AssertionError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:Us8tQjYuzYhTfcblu4lb5wrgGe2ldOtFQ++NgSVUrug ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|            o..  |
|       . . + .   |
|      . * + o +  |
|     o O B o B . |
|    o + S B B + .|
|     . + O * = =.|
|        o E = + o|
|           = +   |
|          . o..  |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw runner@fv-az524-896
The key's randomart image is:
+---[RSA 3072]----+
|==o..            |
|+.+. o   .       |
| +  +   o +      |
|  oo ..+ o =     |
| .  o +oS *     E|
|  . oo.o.= +   + |
|   o .ooo.=..   +|
|    . =oo.o  . . |
|    .+.=..    .  |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_8.2, OpenSSL 1.1.1f  31 Mar 2020
debug1: private host key #0: ssh-rsa SHA256:fplYscGeYIpDFgvJToQJg+83vm8zBbtnnQpDeef9xUw
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=43559'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-runner/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:43559' (RSA) to the list of known hosts.
- generated xml file: /home/runner/work/pylibssh/pylibssh/.test-results/pytest/results.xml -
============================= slowest 10 durations =============================
33.40s setup    tests/unit/scp_test.py::test_get
29.28s setup    tests/unit/scp_test.py::test_get_large
22.48s setup    tests/unit/channel_test.py::test_recv_eof
22.26s setup    tests/unit/scp_test.py::test_get_existing_local
16.19s setup    tests/unit/channel_test.py::test_channel_exit_status
12.88s setup    tests/unit/channel_test.py::test_send_eof
12.46s setup    tests/unit/scp_test.py::test_copy_from_non_existent_remote_path
12.35s setup    tests/unit/channel_test.py::test_request_exec
11.33s setup    tests/unit/channel_test.py::test_send_signal
11.08s setup    tests/unit/channel_test.py::test_is_eof
=========================== short test summary info ============================
ERROR tests/unit/channel_test.py::test_exec_command - SystemError: <cyfunctio...
FAILED tests/unit/channel_test.py::test_exec_command - AssertionError: assert...
=============== 1 failed, 24 passed, 1 error in 67.99s (0:01:07) ===============

'Ref: https://github.com/ansible/pylibssh/issues/57',
strict=False,
)
@pytest.mark.forked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jakuje did you mean to also delete this @pytest.mark.forked marker? It was isolating this test from the rest.

If it is no longer needed indeed, we should also drop pytest-forked from all the places: https://github.com/search?q=repo%3Aansible%2Fpylibssh%20forked&type=code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the forked was needed because of the random segfaults. Most of the tests work without that, but indeed not all. The commit cc6bf63 introducing this line does not show why it was needed. With "forked", its impossible to see the errors from that test, which complicates debugging. Let me try with that again ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it crashes only in rpm build for some reason. Is there something specific in the rpm builds that is different from the manylinux build? It fails consistently with different fedora versions, which means different OpenSSH and libssh versions ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jakuje the manylinux wheels bundle a statically compiled libssh 0.9.6, while RPMs link against the platform-provided libssh build. According to https://download.copr.fedorainfracloud.org/results/packit/ansible-pylibssh-658/fedora-39-x86_64/08285674-python-ansible-pylibssh/builder-live.log.gz, F39 has libssh-devel-0.10.6-2.fc39.x86_64 on Packit.

I see that the RPM tests don't segfault, but the exec_command() output is an empty string for some reason:

+ /usr/bin/python3 -m tox --current-env -q --recreate -e just-pytest -- --deselect tests/unit/scp_test.py::test_get --deselect tests/unit/scp_test.py::test_put
============================= test session starts ==============================
platform linux -- Python 3.12.7, pytest-7.3.2, pluggy-1.2.0
cachedir: .tox/just-pytest/.pytest_cache
rootdir: /builddir/build/BUILD/ansible-pylibssh-1.2.3.dev75+g566ab40
configfile: pytest.ini
testpaths: tests/
plugins: cov-4.0.0, xdist-3.3.1, forked-1.6.0
created: 4/4 workers
4 workers [21 items]

....F................                                                    [100%]
=================================== FAILURES ===================================
______________________________ test_exec_command _______________________________
[gw0] linux -- Python 3.12.7 /usr/bin/python3

ssh_channel = <pylibsshext.channel.Channel object at 0x7f61b14bfc80>

    def test_exec_command(ssh_channel):
        """Test getting the output of a remotely executed command."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello World').stdout.decode()
        assert u_cmd_out == u'Hello World'  # noqa: WPS302
        # Test that repeated calls to exec_command do not segfault.
    
        # NOTE: Call `exec_command()` once again from another function to
        # NOTE: force it to happen in another place of the call stack,
        # NOTE: making sure that the context is different from one in this
        # NOTE: this test function. The resulting call stack will end up
        # NOTE: being more random.
>       exec_second_command(ssh_channel)

ssh_channel = <pylibsshext.channel.Channel object at 0x7f61b14bfc80>
u_cmd_out  = 'Hello World'

tests/unit/channel_test.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ssh_channel = <pylibsshext.channel.Channel object at 0x7f61b14bfc80>

    @pytest.mark.forked
    def exec_second_command(ssh_channel):
        """Check the standard output of ``exec_command()`` as a string."""
        u_cmd_out = ssh_channel.exec_command('echo -n Hello Again').stdout.decode()
>       assert u_cmd_out == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         - Hello Again

ssh_channel = <pylibsshext.channel.Channel object at 0x7f61b14bfc80>
u_cmd_out  = ''

tests/unit/channel_test.py:40: AssertionError
---------------------------- Captured stdout setup -----------------------------
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key
Your public key has been saved in /tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_client_rsa_key.pub
The key fingerprint is:
SHA256:cEFCs7vu2RVSC89QLOat1Ug6lxeHFYyqHL4gnbpM24w ansible-pylibssh integration tests key
The key's randomart image is:
+---[RSA 8192]----+
|     .+.o..  =o. |
|       +ooo + o  |
|      oo+=.= o   |
|       +=*B.o    |
|     ..+SB=.     |
|    . +.=. .     |
|    .o.. ..      |
|   o.* o..       |
|    Eo* .        |
+----[SHA256]-----+
Generating public/private rsa key pair.
Your identification has been saved in /tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key
Your public key has been saved in /tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key.pub
The key fingerprint is:
SHA256:SdrScQOGHGawqyEUKm1dr8bBNtUn7KIbB4jV4ruWiII mockbuild@fd1b674cd54f4b538c6ebdb2012879fa
The key's randomart image is:
+---[RSA 3072]----+
| .  .+=o+o       |
|....o*++ .+ .    |
|o.o+oo= +.oo     |
|o.. o+.O.+..     |
|. . ..*oS.       |
| . o..o..        |
|.... o +         |
|E . + .          |
|.  .             |
+----[SHA256]-----+
---------------------------- Captured stderr setup -----------------------------
debug2: load_server_config: filename /dev/null
debug2: load_server_config: done config len = 1
debug2: parse_server_config_depth: config /dev/null len 1
debug1: sshd version OpenSSH_9.3, OpenSSL 3.1.4 24 Oct 2023
ssh: connect to host 127.0.0.1 port 45793: Connection refused
ssh: connect to host 127.0.0.1 port 45793: Connection refused
ssh: connect to host 127.0.0.1 port 45793: Connection refused
debug1: private host key #0: ssh-rsa SHA256:SdrScQOGHGawqyEUKm1dr8bBNtUn7KIbB4jV4ruWiII
debug1: setgroups() failed: Operation not permitted
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-D'
debug1: rexec_argv[2]='-f'
debug1: rexec_argv[3]='/dev/null'
debug1: rexec_argv[4]='-o'
debug1: rexec_argv[5]='LogLevel=DEBUG3'
debug1: rexec_argv[6]='-o'
debug1: rexec_argv[7]='HostKey=/tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/ssh_host_rsa_key'
debug1: rexec_argv[8]='-o'
debug1: rexec_argv[9]='PidFile=/tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/sshd.pid'
debug1: rexec_argv[10]='-o'
debug1: rexec_argv[11]='UsePAM=yes'
debug1: rexec_argv[12]='-o'
debug1: rexec_argv[13]='PasswordAuthentication=no'
debug1: rexec_argv[14]='-o'
debug1: rexec_argv[15]='ChallengeResponseAuthentication=no'
debug1: rexec_argv[16]='-o'
debug1: rexec_argv[17]='GSSAPIAuthentication=no'
debug1: rexec_argv[18]='-o'
debug1: rexec_argv[19]='StrictModes=no'
debug1: rexec_argv[20]='-o'
debug1: rexec_argv[21]='PermitEmptyPasswords=yes'
debug1: rexec_argv[22]='-o'
debug1: rexec_argv[23]='PermitRootLogin=yes'
debug1: rexec_argv[24]='-o'
debug1: rexec_argv[25]='Protocol=2'
debug1: rexec_argv[26]='-o'
debug1: rexec_argv[27]='HostbasedAuthentication=no'
debug1: rexec_argv[28]='-o'
debug1: rexec_argv[29]='IgnoreUserKnownHosts=yes'
debug1: rexec_argv[30]='-o'
debug1: rexec_argv[31]='Port=45793'
debug1: rexec_argv[32]='-o'
debug1: rexec_argv[33]='ListenAddress=127.0.0.1'
debug1: rexec_argv[34]='-o'
debug1: rexec_argv[35]='AuthorizedKeysFile=/tmp/pytest-of-mockbuild/pytest-0/popen-gw0/test_exec_command0/sshd/authorized_keys'
debug1: rexec_argv[36]='-o'
debug1: rexec_argv[37]='AcceptEnv=LANG LC_*'
debug1: rexec_argv[38]='-o'
debug1: rexec_argv[39]='Subsystem=sftp internal-sftp'
Warning: Permanently added '[127.0.0.1]:45793' (RSA) to the list of known hosts.
- generated xml file: /builddir/build/BUILD/ansible-pylibssh-1.2.3.dev75+g566ab40/.test-results/pytest/results.xml -
============================= slowest 10 durations =============================
28.25s setup    tests/unit/channel_test.py::test_double_close
24.28s setup    tests/integration/sshd_test.py::test_sshd_addr_fixture_port
22.69s setup    tests/unit/channel_test.py::test_send_signal
20.16s setup    tests/unit/channel_test.py::test_request_exec
17.74s setup    tests/unit/channel_test.py::test_channel_exit_status
13.68s setup    tests/unit/channel_test.py::test_read_bulk_response
13.57s setup    tests/unit/channel_test.py::test_recv_eof
7.16s setup    tests/unit/channel_test.py::test_send_eof
6.38s setup    tests/unit/sftp_test.py::test_make_sftp
5.75s setup    tests/unit/sftp_test.py::test_put
=========================== short test summary info ============================
FAILED tests/unit/channel_test.py::test_exec_command - AssertionError: assert...
======================== 1 failed, 20 passed in 51.25s =========================

Interestingly, we have RPM tests both in Packit and GHA. And the GHA ones pass while the Packit ones fail. The GHA build logs reveal the same libssh version being used.

So there must be something additional. The entire GHA pipeline builds an sdist in one job and then, all the following builds happen from that sdist. Packit also builds an sdist first and feeds that to an RPM spec. However, it does not reuse the sdist from GHA so that build is separate. Still, I can't imagine it causing this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for having a look into that! I am wondering if it could be just a timing issue, that the runners in one case just return from exec_command() without the callbacks being able to kick in and record the stdout. There is the ssh_channel_get_exit_status(), which waits with default timeout, but the exit code is not checked in the test. Let see if the exit code is recorded correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, the exit status looks ok.

Previously, stack-local variables hold the callbacks. This
worked as long as the call stack was not overridden by other
function calls or as long as the server closed the channel
very early. In other cases, the delayed close in libssh
could result in calling the callbacks even after we believed
the channel was freed, causing invalid memory access and crashes.

Fixes ansible#57

Signed-off-by: Jakub Jelen <[email protected]>
Removes the failed marker as it works now.

Signed-off-by: Jakub Jelen <[email protected]>
Signed-off-by: Jakub Jelen <[email protected]>
Copy link

sonarcloud bot commented Nov 25, 2024

@Jakuje
Copy link
Contributor Author

Jakuje commented Nov 25, 2024

I have the feeling there is some issue in how the references are counted or some timing/synchronization issue. The current code works for me locally as well as in manylinux, but fails in rpm build of Fedora 40+ fails already with the first command:

>       assert u_cmd.stdout.decode() == u'Hello World'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello World'
E         - Hello World

but on Fedora 39 fails on the second one:

>       assert u_cmd.stdout.decode() == u'Hello Again'  # noqa: WPS302
E       AssertionError: assert '' == 'Hello Again'
E         - Hello Again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Flaky SEGFAULT while testing exec_command
2 participants