Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-44720: Support pickling of SlackException #256

Merged
merged 2 commits into from
Jun 8, 2024
Merged

DM-44720: Support pickling of SlackException #256

merged 2 commits into from
Jun 8, 2024

Conversation

rra
Copy link
Member

@rra rra commented Jun 7, 2024

Exception classes that call the BaseException.__init__ method (possibly via Exception) with a different number of arguments than the __init__ method of the derived exception class takes cannot be pickled and unpickled. Since arq uses pickle to record the exception of failed jobs, and we may want such exceptions to be derived from SlackException so that we can easily report them to Slack, this is an annoying limitation. The restriction is due to the properties of the default __reduce__ method on BaseException and probably cannot be changed in Python.

Work around this by using the pattern recommended at the end of the discussion in python/cpython#44791, namely avoid calling BaseException.__init__ entirely. This means we can no longer rely on the default __str__ behavior and must implement __str__.

Add a test that SlackException and classes derived from it with different numbers of constructor arguments can be pickled and unpickled without loss of the information that goes into to_slack.

@rra rra requested a review from jonathansick June 7, 2024 18:44
rra added 2 commits June 7, 2024 14:01
Exception classes that call the `BaseException.__init__` method
(possibly via `Exception`) with a different number of arguments than
the `__init__` method of the derived exception class takes cannot be
pickled and unpickled. Since arq uses pickle to record the exception
of failed jobs, and we may want such exceptions to be derived from
`SlackException` so that we can easily report them to Slack, this is
an annoying limitation. The restriction is due to the properties of
the default `__reduce__` method on `BaseException` and probably cannot
be changed in Python.

Work around this by using the pattern recommended at the end of the
discussion in python/cpython#44791, namely
avoid calling `BaseException.__init__` entirely. This means we can
no longer rely on the default `__str__` behavior and must implement
`__str__`.

Add a test that `SlackException` and classes derived from it with
different numbers of constructor arguments can be pickled and unpickled
without loss of the information that goes into `to_slack`.
Some of the parameters in the test for the Google Storage mock were
misspelled. This previously was not caught because that Python library
wasn't typed, but the latest version has types and mypy started
rejecting the misspelled arguments. Fix the spelling.
@rra rra force-pushed the tickets/DM-44720 branch from 88363ff to 6e7e3e3 Compare June 7, 2024 21:01
Copy link
Member

@jonathansick jonathansick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this is quite interesting. I'll trust your research on this approach.

One thing I will note is that I hadn't considered sending Slack webhook messsages from the arq client rather than in the Arq worker function itself. In Noteburst, for instance, I catch any exceptions in the worker function exception itself and then send a Slack message in the exception handler before reraising the original exception (essentially doing my own version of SlackException). Noteburst also doesn't raise SlackExceptions from any code related to task services, so I guess that's why this hadn't come up. But in a codebase where everything is already a SlackException, I can see this being quite useful.

On the other hand, do you think it'd be useful for the arq workers to be sending the Slack messages rather than waiting for the client to consume the result?

@rra
Copy link
Member Author

rra commented Jun 7, 2024

I agree with that strategy, and largely that's what I'm doing, but the trick in my case is that the backend worker is running in a stack container and is supposed to have minimal dependencies. In order for it to send Slack messages directly, I would have to pull in Safir and and least httpx (and more than that if I didn't want to break Safir apart even further). Also, I want to do as little as possible in the backend worker since the developer has to supply that, and move as much shared code as possible to other places.

The strategy I'm using is that the backend worker raises exceptions derived from TaskError that are guaranteed to be possible to pickle, and then the database worker, which is responsible for gathering the arq results and stuffing them in the database and which is code that will be provided by Safir, is responsible for sending the Slack notifications. The documentation for anyone else writing one of these services can then just say to raise TaskError or one of its subclasses, and Slack error reporting happens for free. (Unless they use TaskUserError, which suppresses the reporting.) TaskError also has support for a few other fancy things, such as serializing the traceback so that it can be included in the Slack notification.

Originally, I implemented this workaround entirely in TaskError, and I could do the conversion to something Slack-reportable in the database worker, but it was more convenient if a TaskError was a SlackException from the start since the code was more straightforward. Since this deserialization support had to live somewhere, and putting it in SlackException meant that all inherited classes didn't need any further precautions, this seemed like the highest-leverage spot.

@rra rra merged commit 17720c6 into main Jun 8, 2024
6 checks passed
@rra rra deleted the tickets/DM-44720 branch June 8, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants