[query] Failures to communicate with the spark/local backend result in cryptic error message #14557

Open
daniel-goldstein opened this issue May 22, 2024 · 0 comments


@daniel-goldstein
Contributor

What happened?

Hail propagates clearly explained error messages from Java to Python when an exception is thrown in the user's pipeline. However, the Hail Python frontend does not handle the situation where the Java backend disappears entirely, which can happen when the OOM killer kills the JVM. The result is the cryptic ConnectionError shown below. In such a scenario, the Python frontend should catch the connection failure and add a useful message suggesting that the backend is not reachable and might have run out of memory.

Version

0.2.130

Relevant log output

File ~/Library/Python/3.9/lib/python/site-packages/hail/table.py:2814, in Table.collect(self, _localize, _timed)
2812 e = construct_expr(rows_ir, hl.tarray(t.row.dtype))
2813 if _localize:
→ 2814 return Env.backend().execute(e._ir, timed=_timed)
2815 else:
2816 return e

File ~/Library/Python/3.9/lib/python/site-packages/hail/backend/backend.py:188, in Backend.execute(self, ir, timed)
186 payload = ExecutePayload(self._render_ir(ir), '{"name":"StreamBufferSpec"}', timed)
187 try:
→ 188 result, timings = self._rpc(ActionTag.EXECUTE, payload)
189 except FatalError as e:
190 raise e.maybe_user_error(ir) from None

File ~/Library/Python/3.9/lib/python/site-packages/hail/backend/py4j_backend.py:218, in Py4JBackend._rpc(self, action, payload)
216 path = action_routes[action]
217 port = self._backend_server_port
→ 218 resp = self._requests_session.post(f'http://localhost:{port}{path}', data=data)
219 if resp.status_code >= 400:
220 error_json = orjson.loads(resp.content)

File ~/Library/Python/3.9/lib/python/site-packages/requests/sessions.py:637, in Session.post(self, url, data, json, **kwargs)
626 def post(self, url, data=None, json=None, **kwargs):
627 r"""Sends a POST request. Returns :class:Response object.
628
629 :param url: URL for the new :class:Request object.
(…)
634 :rtype: requests.Response
635 """
→ 637 return self.request("POST", url, data=data, json=json, **kwargs)

File ~/Library/Python/3.9/lib/python/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
584 send_kwargs = {
585 "timeout": timeout,
586 "allow_redirects": allow_redirects,
587 }
588 send_kwargs.update(settings)
→ 589 resp = self.send(prep, **send_kwargs)
591 return resp

File ~/Library/Python/3.9/lib/python/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
700 start = preferred_clock()
702 # Send the request
→ 703 r = adapter.send(request, **kwargs)
705 # Total elapsed time of the request (approximately)
706 elapsed = preferred_clock() - start

File ~/Library/Python/3.9/lib/python/site-packages/requests/adapters.py:501, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
486 resp = conn.urlopen(
487 method=request.method,
488 url=url,
(…)
497 chunked=chunked,
498 )
500 except (ProtocolError, OSError) as err:
→ 501 raise ConnectionError(err, request=request)
503 except MaxRetryError as e:
504 if isinstance(e.reason, ConnectTimeoutError):
505 # TODO: Remove this in 3.0.0: see #2811

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

daniel-goldstein added the enhancement, needs-triage, query, and snack labels on May 22, 2024
daniel-goldstein removed the needs-triage label on Jun 3, 2024
daniel-goldstein self-assigned this on Jun 3, 2024