Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Site connection exceptions #127

Closed
EGI-ILM opened this issue Mar 1, 2022 · 13 comments
Closed

Site connection exceptions #127

EGI-ILM opened this issue Mar 1, 2022 · 13 comments

Comments

@EGI-ILM
Copy link
Contributor

EGI-ILM commented Mar 1, 2022

There is a current issue in one of the sites, which is affecting the program operation:

$ fedcloud --version
fedcloud, version 1.2.14

$ fedcloud endpoint projects -a
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request
...
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='dashboard2.ilifu.ac.za', port=5000): Read timed out. (read timeout=10)

$

I cannot retrieve any information because of this single site. I would suggest to capture the exception, continue processing the rest of the sites, and at the end provide the expected output with some warning about the failing sites (in stderr or some place where it can be easily removed if the output is later processed in a script).

@tdviet
Copy link
Owner

tdviet commented Mar 1, 2022

Sorry for misunderstanding your ticket. I am opening a new issue #128 for better description

@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 1, 2022

No problem at all. What I am trying is to be able to work with Fedcloud even if one or more sites are unavailable, like in this case. I think it will be very beneficial if a single site cannot disrupt the program operation.

If it can help, I can attempt to do a PR myself if I have time, although I am not familiar with the code.

@tdviet
Copy link
Owner

tdviet commented Mar 1, 2022

Yes, the code should be error-proofing. I tried to catch all exceptions, but some of them still manage to leak out :-). Thank you for reporting.

Fixing it would be simple (adding Timeout exception to this line https://github.com/tdviet/fedcloudclient/blob/master/fedcloudclient/endpoint.py#L214). And some sensible error messages should be useful instead of ignoring the errors ("pass"). I will do it when I have time.

@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 1, 2022

Thanks. I can give it a try tomorrow if you like. I just did a "quick-and-dirty" solution at the code by adding:

            except (requests.exceptions.ReadTimeout) as e:
                print('site is not accessible: ', site_ep[1])
                pass

At https://github.com/tdviet/fedcloudclient/blob/master/fedcloudclient/endpoint.py#L190, just to have it working. Tomorrow I will look for a more elegant way to fix this...

@tdviet
Copy link
Owner

tdviet commented Mar 1, 2022

OMG. I made mistake twice :) in the haste (advised you to change line 214 instead line 190). Thank you for fixing this issue. Once you finish, just make a pull request and I will merge it to the main branch.

@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 2, 2022

Well, in fact I assume both places should be modified the same way, but I would normally unify both methods (for example with an abstraction via a lambda in Java). I will look for the homologous in Python. I just addressed the one that affected my case.
I will have a look to it today

@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 2, 2022

I have finished the change, and I will do a PR in a few hours.
I have some comments that you might find helpful:

  • I have made some amendments in the code to avoid code duplication. In my version, the get_projects_from_sites can be called as usual, and it will return the projects as a list by default. Alternatively, the projects can be retrieved as dict (I did not find anywhere in the code that this was used, though), but the common code is unique, so only one place must be modified in case of need. Additionally, if a new output format is desired, just creating a new function will suffice, the common code remains unaltered. This greatly improves maintainability.
  • I think it can be very beneficial to implement proper logging in the application
  • There are some sites using self-signed certificates. Whereas this is not ideal, it is the actual situation. It could be good to have an argument (e.g. -k like in curl) so the user can instruct fedcloud "OK, I know the risks, but please retrieve the information from sites that did not implement HTTPS correctly". Perhaps a sign can be added to the output, so the user knows exactly which sites did give a TSL problem.
  • As you said in Better exception catching #128, it will be good to review exception handling in the whole code

I hope I could help!

@EGI-ILM EGI-ILM mentioned this issue Mar 2, 2022
@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 2, 2022

I have seen that the problem with the certificates is explained here

So, probably, instead of my previous suggestion, the code will have to be modified to actually inform the user about this situation. The problem currently is that the access to the site failed silently and there is no way for the user to know about the TLS error message explained in the PPT. I can do a new PR if you do not have time.

@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 3, 2022

I have made some new changes and I will do a new PR:

  • Fixed minor issues highlighted by Lint
  • Refactoring of some repeated string constants and error messages. This should be reviewed to be uniformly addressed in the project
  • Creation of a basic Exception for token errors, to replace the generic RuntimeError, so different errors can be addressed appropriately. This class can be extended in future and should be reviewed to be uniformly addressed in the project
  • Added message to explain why a connection to a site failed. This is temporary and should be improved in future.
  • Added from clause in SystemExit exception raised
  • Renamed (refactored) "list" function to "endpoint_list" as it was masking the built-in "list". In any case, this does not seem to be used in any part of the project, but confirmation is needed.

Please, review the last point, as I am not sure there this function is used somewhere.

@tdviet
Copy link
Owner

tdviet commented Mar 3, 2022

The "list" function in endpoint module is the "fedcloud endpoint list" command. By default, "click" lib uses function name as command name. If we want to change the function name, we need to configure to make "fedcloud endpoint list" to call the new function name

@EGI-ILM
Copy link
Contributor Author

EGI-ILM commented Mar 3, 2022

Oh I see. I knew that this could break some things.
Is it possible to use list externally (as a command option) but keep endpoint_list internally?

Perhaps something like:

@click.option('--format', '-f', 'format_arg_name')
def plug(format_arg_name):
    print(format_arg_name)

Source: https://stackoverflow.com/questions/35567753/python-click-library-rename-argument

@tdviet
Copy link
Owner

tdviet commented Mar 3, 2022

So far, I haven't found an elegant way to do it.

tdviet added a commit that referenced this issue Mar 4, 2022
@tdviet
Copy link
Owner

tdviet commented Mar 4, 2022

Temporary fixed by #129 and #131. Further discussion in #128 and #130

@tdviet tdviet closed this as completed Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants