Stream implementation is inconvenient #1025
Comments
I love this suggestion. Correct me if I'm wrong, but wouldn't there be no need to put the iterators for the submission and comment streams inside the try/except? E.g., would this work instead?

```python
while True:
    for submission in submission_stream:
        if submission is None:
            break
        try:
            print(submission)
            ...
        except Exception:
            # submission processing exception
            pass
    for comment in comment_stream:
        if comment is None:
            break
        try:
            print(comment)
            ...
        except Exception:
            # comment processing exception
            pass
```

My only other concern has to do with the asynchronous nature of the approach. I personally like it, but I worry it might be a bit confusing to new-to-programming users. That said, perhaps how it handles exceptions actually makes it easier for those users, and only the users who care enough to do something with the exceptions need to figure it out. Thoughts? Thanks for the suggestions.
Yes, there's no longer a strong need to do exception handling on the outside of the iterators, and it may make a little more sense to have the try/except on the inside of each stream loop. The exact exception handling written here isn't too important… the way I see it, having such a generic try/except at this level is simply a last-ditch effort to keep the bot alive if something goes wrong. More localised exception handling is, of course, encouraged as soon as those sources can be identified. I don't think exception handling this high up should need to be shown in the docs anyway (to reduce distractions), but if so, I'm perfectly comfortable with either way you want to write it.

On one hand, the decision to do fetching in a separate thread isn't completely necessary, because, when multi-streaming, if one of the streams holds up the other then it's likely the other will have fetch problems as well. But I like having this threading arrangement because it allows for a degree of control over how long (via …); a rough sketch of the general idea is at the end of this comment.

As far as I can tell, knowing that the fetch happens in parallel isn't going to change how anyone would write things. But one minor way this particular threading approach could make an impression is if the user has really bad internet and they decide to control-C the script. It could take a moment for the script to end, unless a second control-C is issued. A second control-C would end things immediately, but probably not on Python 2; on Python 2 you'd be forced to wait.

I've only tested the new … The new … In other progress efforts, I've tried to tinker around with the …
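For illustration only (this is not the draft Stream class's code): fetching in a worker thread lets the consumer bound how long it is willing to wait for a result.

```python
import concurrent.futures

# One background worker does the network fetches.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def fetch_with_timeout(fetch, timeout=30):
    """Run `fetch` in the worker thread and give up after `timeout` seconds."""
    future = executor.submit(fetch)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None  # the caller can decide to retry or move on
```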
While I don't have time to fully respond, I can respond to this statement:
The only reason for `limit` is to attempt to break any caches, the thought being that adjusting that parameter will result in a fresh response. While I'm not looking at the code, I think it's especially important when the `before` parameter doesn't change, because identical requests would otherwise be re-issued and might hit a cache. In addition to narrowing in on the `before` parameter for the purpose of breaking caches, using it means less data needs to be received when it works. Unfortunately, it isn't 100% reliable, as it's not possible to distinguish between an empty response meaning there are no more new items, and an empty response meaning that the item identified by the `before` parameter is no longer available. I hope that helps explain it.
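A minimal sketch of that idea follows; the helper name and parameters are placeholders for illustration, not PRAW internals.

```python
import random

def poll_new_items(fetch_listing, newest_fullname):
    """One polling pass over a listing, illustrating the limit/before idea.

    fetch_listing(limit=..., before=...) stands in for whatever call actually
    hits reddit's /new listing; it is not a real PRAW function.
    """
    # Vary limit slightly so consecutive requests are not identical, which
    # helps avoid a cached response when `before` has not changed.
    limit = random.randint(90, 100)

    # Anchoring on the newest seen item keeps responses small, but an empty
    # response is ambiguous: nothing new, or the anchor item no longer exists.
    items = fetch_listing(limit=limit, before=newest_fullname)
    if not items:
        # Fall back to an unanchored request to disambiguate.
        items = fetch_listing(limit=limit)
    return items
```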
Thanks @bboe, that information has really helped. So the idea essentially is to use …

limit adjusting

Current behaviour

A …

New behaviour

To dissuade the potential for cached results, I've made the value of …

Rationale

There's a greater chance to break caches if we alter …

before adjusting

Current behaviour

The value of …

Some notes about this behaviour:

…

New behaviour

I've bound the value of …

Rationale

[edit: outdated information in this section]

In the current system, if a listing goes silent then up to 100 (100 to 70) older items will still get pulled in from reddit. As a benchmark, the most inefficient approach, in terms of data consumption, would be to take 100 (old) items each time. As explained by @bboe, this is done because it's very difficult to tell whether an item has been deleted, removed, or fallen off the top 1000 listing due to new entries. A value of …

To circumvent the possibility that an item has been deleted or removed, after each request the stream picks a new random item from the last 45 to 18 seen entries and uses it for the value of …

Dealing with the potential that all of the last 45 to 18 seen items have fallen off the top 1000 is another matter. Although this may seem like an extremely unlikely case, if the bot has a really bad connection and disconnects from the internet for a while, and it's streaming a very active reddit listing, then there is a possibility that all those 45 to 18 items have fallen off the top 1000 listing by the time the bot reconnects. Since we can't assume the bot will always have a stable internet connection, to address this problem, … Failing any of this, I've written in a 1/200 chance that …

All this work should hopefully amount to about a 66% overall reduction in data consumption and unnecessary seen-item checking.

concurrent.futures

I've realised that the …

BoundedSet

Since I needed to index into … The constructor's signature has changed, but I made sure it's backward compatible with the old … By the way, I've added support for …
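Since much of the inline detail above was lost, here is a rough sketch of the before-anchor idea as I read it; the 45-item window and the 1/200 fallback come from the prose, but the code itself is illustrative and not the draft's actual implementation.

```python
import random
from collections import deque

# Short history of recently seen item fullnames (window size from the prose above).
recent_fullnames = deque(maxlen=45)

def choose_before_anchor():
    """Pick a random recently seen item to use as the `before` anchor.

    Randomising the anchor means a deleted or removed item only affects some
    requests, and occasionally dropping the anchor entirely (a 1/200 chance)
    guards against every remembered item having fallen off the top-1000 listing.
    """
    if not recent_fullnames or random.random() < 1 / 200:
        return None  # fall back to an unanchored request
    return random.choice(list(recent_fullnames))
```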
I was thinking of adding an option to suppress errors, but then we come to the case where someone tries streaming, e.g., comments of a private subreddit they lost access to, which would just go into an infinite loop. So maybe we should have an internal counter of exceptions, and throw an exception if X amount of errors happen in Y seconds.
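One way that counter could look; the class name, threshold, and window are made up for illustration.

```python
import time

class ErrorWindow:
    """Suppress stream errors until too many occur within a time window."""

    def __init__(self, max_errors=5, window_seconds=60.0):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = []

    def record(self, exc):
        now = time.monotonic()
        # Keep only the errors that happened inside the window.
        self._timestamps = [t for t in self._timestamps if now - t < self.window_seconds]
        self._timestamps.append(now)
        if len(self._timestamps) > self.max_errors:
            # Persistent failure (e.g. a subreddit that turned private):
            # stop suppressing and let the caller see the exception.
            raise exc
```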
Hi, I am having the same problem. The streams stop if an exception occurs, and the workaround of creating the stream again in the exception handler is not a very good solution. PRAW stream generator doesn't start after an exception occurs
This issue is stale because it has been open for 20 days with no activity. Remove the Stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stale for 10 days with no activity.
I would like to revisit the suggested implementation. Is this something you'd still like to see merged in, @Pyprohly?
This issue was closed because it has been stale for 10 days with no activity.
The way I am handling exceptions in my bots is by catching them before they propagate to the stream, by monkey-patching prawcore's `Session.request`:

```python
import time
from typing import Any

from prawcore.exceptions import BadJSON, RequestException, ServerError, TooManyRequests
from prawcore.sessions import Session

original_session_request = Session.request

def patched_session_request(*args, **kwargs) -> Any:
    # Retry the underlying request until it succeeds, swallowing transient errors.
    while True:
        try:
            return original_session_request(*args, **kwargs)
        except (RequestException, TooManyRequests, BadJSON, ServerError) as e:
            print(e)
            time.sleep(5)

Session.request = patched_session_request
```
I have just created my own stream using the .new method. I can share the example code if you still need it.
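Until that example is shared, here is a rough version of the same idea: a hand-rolled stream over `subreddit.new()`. The polling delay and seen-set handling are arbitrary choices, not the commenter's code.

```python
import time

def simple_stream(subreddit, delay=10):
    """Yield new submissions forever, surviving exceptions between polls."""
    seen = set()
    while True:
        try:
            # Iterate oldest-to-newest so items are yielded in arrival order.
            for submission in reversed(list(subreddit.new(limit=100))):
                if submission.id not in seen:
                    seen.add(submission.id)
                    yield submission
        except Exception as exc:
            # Swallow transient errors (bad connection, 5xx) and try again.
            print(f"poll failed, retrying: {exc}")
        # Note: `seen` grows without bound here; a real implementation would cap it.
        time.sleep(delay)
```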
Many reddit bots operate as a service that needs to stream in new content in real time. Often, we want the bot to be unkillable: if it disconnects from the internet it should try to reconnect continuously; if it finds that it cannot handle a particular item, it’s probably in its best interest to just ignore it. We want the bot to stay alive as much as possible.
To facilitate streaming, PRAW provides a high-level streaming function which lets a bot creator focus more on bot behaviour and less on filtering out older or already-seen items. Unfortunately, due to a lack of exception handling, the stream generator frequently breaks: when an exception is raised in a generator, the generator breaks by issuing a `StopIteration` on all further attempts to yield items from it. In the case of `stream_generator`, if the stream dies the bot can no longer continue running its service unless it recreates the stream.

Current approaches to streaming
Ideally, a stream object should only need to be created once…
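The code block that originally followed this sentence did not survive, but the single-creation setup being described presumably looked something like this (the subreddit and site name are placeholders):

```python
import praw

reddit = praw.Reddit("my_bot")  # assumes a praw.ini site named "my_bot"
subreddit = reddit.subreddit("AskReddit")

# Create the stream once and consume it forever.
for comment in subreddit.stream.comments(skip_existing=True):
    print(comment.body)
```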
However, if we go about this approach under the current stream implementation then we’d eventually find this to be an unstable setup, because an exception in the stream would bring things to a stop.
It’s currently more viable to set things up this way:
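Again the original snippet is missing; the recreate-on-failure arrangement it described is roughly:

```python
while True:
    try:
        # Recreate the stream whenever an exception kills it.
        for comment in subreddit.stream.comments(skip_existing=True):
            print(comment.body)
    except Exception as exc:
        print(f"stream died, restarting: {exc}")
```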
The bot doesn’t break so easily now because if the stream breaks it’ll just be recreated. The bot is stable and code is manageable so far.
But what if we want to do a double stream? Would a similar approach work?
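The snippet showing the double stream is also lost. Applying the same strategy (recreating both streams inside the loop, with `pause_after=-1` so each stream yields `None` when it has nothing new) would look roughly like this:

```python
while True:
    try:
        submission_stream = subreddit.stream.submissions(skip_existing=True, pause_after=-1)
        comment_stream = subreddit.stream.comments(skip_existing=True, pause_after=-1)
        for submission in submission_stream:
            if submission is None:
                break
            print(submission.title)
        for comment in comment_stream:
            if comment is None:
                break
            print(comment.body)
    except Exception as exc:
        print(f"streams died, restarting: {exc}")
```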
Turns out the same strategy won't work here; both streams would yield `None` the whole time. If we try to fix this by removing the `skip_existing=True` then suddenly we'd be dealing with old and duplicate items, which is something that the stream is supposed to be handling for us. We could go back to defining the streams outside the loop, but then we'd face the same problem we had before, where an exception could easily break things.

There are two real solutions here: …
Clearly, these are all terrible workarounds. If we want something better, the stream’s implementation has to change.
Designing a better streaming solution
The main problem with our current stream generator is its lack of exception handling.
Since there's no way to intercept an exception thrown in a generator from the consumer code without the generator breaking, exception handling needs to be written within the generator. At the same time, we don't want the exception handling logic to be predetermined. It's important that we give the user a way to listen to exceptions that come from the stream (sketched below). Given `stream_generator`'s current implementation, this may require more than minor changes to support.

If we're going to change streaming now, there are other inconveniences about the current streaming system that we may as well address…
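To make the exception-listening point concrete, here is a rough illustration; the generator below is invented for this write-up and is not `stream_generator` or the draft class.

```python
def resilient_stream(fetch, on_exception=None, seen_limit=1000):
    """Yield items from repeated fetches, reporting errors instead of dying."""
    seen = []
    while True:
        try:
            for item in fetch():
                if item.fullname not in seen:
                    seen.append(item.fullname)
                    del seen[:-seen_limit]  # keep only the most recent fullnames
                    yield item
        except Exception as exc:
            # Hand the exception to the listener rather than letting it
            # propagate, which would permanently break the generator.
            if on_exception is not None:
                on_exception(exc)
```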
Since the stream generator is intended to aid in bot making, we'd want to ensure that a new streaming object, if made, would have characteristics that maximise its usefulness in bot making. Namely, … (`skip_existing=True` by default).

With all this in mind, this is how I envision an ideal streaming program to look:
If this looks at all promising, please see and try out my `Stream` class draft here.