Scrubber seems to be breaking service #14

Open
Gordonei opened this issue Sep 30, 2020 · 4 comments

@Gordonei
Contributor

Certain address values passed to the scrub path result in a 500 error.

Command to reproduce:

bin/cogpn-process-address.py --address "Site C, Khayelitsha" --username test-user --password <REDACTED> --verbose

Output:

cape_of_good_place_names_client.exceptions.ServiceException: (500)
Reason: INTERNAL SERVER ERROR
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 30 Sep 2020 06:17:53 GMT', 'Content-Type': 'application/problem+json', 'Content-Length': '251', 'Connection': 'keep-alive', 'X-Request-ID': 'd18d0607-36a6-456c-a355-43a8d1c0ddb7', 'Server': 'Werkzeug/0.16.1 Python/3.6.12'})
HTTP response body: {
  "detail": "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.",
  "status": 500,
  "title": "Internal Server Error",
  "type": "about:blank"
}
@Gordonei Gordonei added the bug Something isn't working label Sep 30, 2020
@Gordonei Gordonei self-assigned this Sep 30, 2020
@Gordonei
Contributor Author

Rather worryingly, it doesn't look like anything is showing up in the CW logs.

My current theory is that the scrub operation is failing, and the error isn't being caught correctly.

@ColinAnthony

ColinAnthony commented Sep 30, 2020

"Oude Molen Road and Ndabeni" works, but "Oude Molen Road" doesn't.

Similarly for other variants:

"Peak Rd and Plantation" works, but "Peak Rd" doesn't.

Interestingly, "SiteC - Khayelitsha" doesn't work but "Site C - Khayelitsha" does.
Also:

"SiteC, Khayelitsha" fails, but "SiteC , Khayelitsha" works.

@Gordonei
Contributor Author

Tracked down the missing logs - it is a failure in the PHDC scrubber:

[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] authorization_controller.check_basicAuth [DEBUG]: Checking 'city-user'...
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] authorization_controller.check_basicAuth [DEBUG]: user_auth_check='True'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] scrub_controller.scrub [INFO]: Scrubb[ing]...
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] scrub_controller.scrub [DEBUG]: address='melck road'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] BasicScrubber.scrub [DEBUG]: Received address 'melck road'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] BasicScrubber.scrub [DEBUG]: address after strip: 'melck road'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] BasicScrubber.scrub [DEBUG]: address after value injection: melck road, Western Cape, South Africa
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] PhdcScrubber.scrub [DEBUG]: Scrubbing address 'melck road'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] PhdcScrubber.get_street_info [DEBUG]: 'ROAD' in 'melck road'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] PhdcScrubber.get_street_info [DEBUG]: potential_street_number='None'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] PhdcScrubber.scrub [DEBUG]: street_type='ROAD', street_number='', address_string='MELCK ROAD'
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] PhdcScrubber.scrub [DEBUG]: postcode=None
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:ec124a01-48bb-4f96-9796-6f693b475272] app.log_exception [ERROR]: Exception on /v1/scrub [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
    response = function(request)
  File "/usr/local/lib/python3.6/site-packages/connexion/decorators/security.py", line 300, in wrapper
    return function(request)
  File "/usr/local/lib/python3.6/site-packages/connexion/decorators/uri_parsing.py", line 143, in wrapper
    response = function(request)
  File "/usr/local/lib/python3.6/site-packages/connexion/decorators/validation.py", line 347, in wrapper
    return function(request)
  File "/usr/local/lib/python3.6/site-packages/connexion/decorators/parameter.py", line 126, in wrapper
    return function(**kwargs)
  File "/usr/src/app/cape_of_good_place_names/controllers/scrub_controller.py", line 30, in scrub
    for scrubber in scrubbers
  File "/usr/src/app/cape_of_good_place_names/controllers/scrub_controller.py", line 30, in <listcomp>
    for scrubber in scrubbers
  File "/usr/local/lib/python3.6/site-packages/phdc_scrubber/PhdcScrubber.py", line 283, in scrub
    address_words, address_string, self.address_lookup, postcode
  File "/usr/local/lib/python3.6/site-packages/phdc_scrubber/PhdcScrubber.py", line 173, in get_matches
    end_of_address_string = address_string[address_string.index(address_words[-4]):]
IndexError: list index out of range
[2020-09-30T06:57:51+0000]-[PID:1]-[RID:NA] _internal._log [INFO]: 172.31.9.76 - - [30/Sep/2020 06:57:51] "GET /v1/scrub?address=melck%20road HTTP/1.1" 500 -
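
The failing line in the traceback indexes address_words[-4], which raises IndexError whenever that list has fewer than four entries, as it does for 'melck road'. A minimal sketch of one possible guard follows, assuming the intent is to take the tail of the address starting at most four words from the end. The names tail_of_address and max_words are hypothetical and not part of phdc_scrubber:

```python
def tail_of_address(address_string, address_words, max_words=4):
    """Hypothetical helper, not the phdc_scrubber implementation.

    Returns the part of address_string that starts at the word
    max_words from the end, clamped to the words actually available.
    """
    if not address_words:
        # Nothing to index into; return the string unchanged.
        return address_string
    look_back = min(max_words, len(address_words))
    start = address_string.index(address_words[-look_back])
    return address_string[start:]


# The unguarded expression from the traceback fails on short addresses.
words = "MELCK ROAD".split()
try:
    words[-4]
except IndexError as err:
    print("unguarded:", err)

# The clamped version handles both short and long addresses.
print(tail_of_address("MELCK ROAD", words))
long_address = "OUDE MOLEN ROAD AND NDABENI"
print(tail_of_address(long_address, long_address.split()))
```

Short inputs such as 'melck road' would also make reasonable regression cases for the scrubber's tests.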

@Gordonei
Contributor Author

There are two issues here:

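  1. There is an IndexError in geocode-array: PhdcScrubber.get_matches indexes address_words[-4], which fails when address_words has fewer than four entries. This should just be fixed via a more careful application of min. More importantly, some failing cases should be added to the tests.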
  2. An individual scrubber is breaking the overall scrub response - should this be allowed? In the geocoder case, the failure is properly trapped by the geocode-array library.
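
On the second point, one option is to trap failures per scrubber so that a single failing scrubber does not turn the whole response into a 500. A rough sketch follows, assuming each scrubber exposes a scrub(address) method as the log output suggests. The function run_scrubbers and the two stand-in classes are hypothetical, and whether a failed scrubber's result should be omitted or flagged in the response is a design decision this thread leaves open:

```python
import logging

logger = logging.getLogger(__name__)


def run_scrubbers(scrubbers, address):
    """Call every scrubber, collecting results and per-scrubber failures.

    Hypothetical helper, not the scrub_controller implementation.
    """
    results, failures = [], []
    for scrubber in scrubbers:
        name = type(scrubber).__name__
        try:
            results.append((name, scrubber.scrub(address)))
        except Exception:
            # Log with the traceback so the failure still reaches the logs.
            logger.exception("Scrubber %s failed on %r", name, address)
            failures.append(name)
    return results, failures


class WorkingScrubber:
    """Stand-in scrubber that always succeeds."""

    def scrub(self, address):
        return address.strip().upper()


class BrokenScrubber:
    """Stand-in scrubber that raises IndexError on short input."""

    def scrub(self, address):
        return address.split()[-4]


results, failures = run_scrubbers(
    [WorkingScrubber(), BrokenScrubber()], "melck road"
)
print(results)
print(failures)
```

Returning the failed scrubber names alongside the results lets the caller decide whether a partial response is acceptable.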
