Guacamole user sync timeout #2335

Open
5 tasks done
JimMadge opened this issue Dec 10, 2024 · 4 comments
Labels
bug Problem when deploying a Data Safe Haven.

Comments

@JimMadge
Member

JimMadge commented Dec 10, 2024

✅ Checklist

  • I have searched open and closed issues for duplicates.
  • This is a problem observed when managing a Data Safe Haven.
  • I can reproduce this with the latest version.
  • I have read through the documentation.
  • This isn't an open-ended question (open a discussion if it is).

💻 System information

  • Operating System:
  • Data Safe Haven version: 5.2.0

🚫 Describe the problem

The Guacamole user sync container may hit a database connection timeout, after which it appears to stall: it does not crash, restart, or attempt another sync.
This sometimes happens on first deployment.

2024-12-10 09:09:03 [ERROR   ] Exception during reset or similar
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sqlalchemy/pool/base.py", line 986, in _finalize_fairy
    fairy._reset(
  File "/usr/local/lib/python3.11/dist-packages/sqlalchemy/pool/base.py", line 1432, in _reset
    pool._dialect.do_rollback(self)
  File "/usr/local/lib/python3.11/dist-packages/sqlalchemy/engine/default.py", line 699, in do_rollback
    dbapi_connection.rollback()
  File "/usr/local/lib/python3.11/dist-packages/psycopg/connection.py", line 261, in rollback
    self.wait(self._rollback_gen())
  File "/usr/local/lib/python3.11/dist-packages/psycopg/connection.py", line 394, in wait
    return waiting.wait(gen, self.pgconn.socket, interval=interval)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psycopg_c/_psycopg/waiting.pyx", line 213, in psycopg_c._psycopg.wait_c
  File "/usr/local/lib/python3.11/dist-packages/psycopg/_connection_base.py", line 602, in _rollback_gen
    yield from self._exec_command(b"ROLLBACK")
  File "/usr/local/lib/python3.11/dist-packages/psycopg/_connection_base.py", line 469, in _exec_command
    result = (yield from generators.execute(self.pgconn))[-1]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psycopg_c/_psycopg/generators.pyx", line 122, in execute
  File "psycopg_c/_psycopg/generators.pyx", line 176, in fetch_many
  File "psycopg_c/_psycopg/generators.pyx", line 229, in fetch
psycopg.OperationalError: consuming input failed: could not receive data from server: Connection timed out
SSL SYSCALL error: Connection timed out
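
Because the container stalls rather than crashing, nothing outside it notices the failure. One possible way to surface the stall, offered here only as a sketch and not as the project's actual code, is a heartbeat that records when the last sync completed and reports when it has gone stale. The class name `SyncHeartbeat`, its methods, and the `max_age_seconds` threshold are all hypothetical.

```python
import time
from typing import Callable


class SyncHeartbeat:
    """Track when the last sync finished so that a stall can be detected.

    Hypothetical helper: none of these names come from the
    guacamole-user-sync code base.
    """

    def __init__(
        self,
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        # Treat start-up as the first "success" so a fresh process is not stale.
        self._last_success = clock()

    def record_success(self) -> None:
        """Call this after every sync run that completes without error."""
        self._last_success = self._clock()

    def is_stale(self) -> bool:
        """Return True if no sync has succeeded within max_age_seconds."""
        return (self._clock() - self._last_success) > self.max_age_seconds
```

A separate thread or health-check endpoint could poll `is_stale()` and exit the process with a non-zero status when it returns `True`, which would let a container restart policy take over, assuming one is configured for the container group.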

Steps to reproduce

  1. Deploy a brand new SRE, and add a new user to it. We've seen it happening on the green SHM.
  2. Use this user to access the brand new SRE via Guacamole. No SRD will be available (screenshot not included here).
  3. Changing browsers or using Incognito tabs won't help.
  4. In the Azure portal, the logs of the guacamole-user-sync container show the same traceback as quoted under "Describe the problem" above, ending in:
psycopg.OperationalError: consuming input failed: could not receive data from server: Connection timed out
SSL SYSCALL error: Connection timed out
  5. Restarting the -container-group-remote-desktop containers solves the problem.

🚂 Workarounds or solutions

Restarting the container group can fix the problem, presumably because Entra/Apricot are by then ready to receive requests.
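
If the cause really is that a dependency is not yet ready on first deployment, a manual restart could in principle be replaced by retrying the sync with a growing delay. The following is a minimal, standard-library-only sketch under that assumption; `retry_with_backoff` and all of its parameters are hypothetical and are not part of Data Safe Haven.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger("user_sync_retry")  # hypothetical logger name


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on failure with exponential backoff.

    Re-raises the last exception once `attempts` runs have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller (or container) fail
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")  # loop always returns or raises
```

In the sync container, `operation` would be whatever function performs one sync run, and `retry_on` could be narrowed to the database driver's error type, such as the `psycopg.OperationalError` seen in the traceback. SQLAlchemy's `pool_pre_ping=True` engine option, which tests pooled connections before use, may also be worth evaluating, although nothing in this issue confirms it would prevent the stall.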

JimMadge added the "bug" label (Problem when deploying a Data Safe Haven.) on Dec 10, 2024
@JimMadge
Member Author

@cptanalatriste anything to add here?

@cptanalatriste
Contributor

@JimMadge I added some steps to reproduce

@cptanalatriste

This comment was marked as off-topic.

@JimMadge

This comment was marked as off-topic.

2 participants