-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling timeouts/disconnects, dropping some pushes? #47
Comments
I also recently switched, and I've noticed this as well. (at least, increased dropped notifications) I wasn't sure where to assign blame yet, as Apple doesn't guarantee delivery either. |
Hi Ben, The intent is when an error comes back from Apple, any notifications from that point on are resent. See: However, it only currently buffers 50 notifications, which may not be enough. @bdotdub is the original maintainer of the project, and I imagine he hasn't ran into this issue in production. I'm still trying to get my app ready for production, so you're further along than me. :-) |
Hi Nathan, Thanks for the response. That's true, but I'm pretty sure that there's also a scenario when Apple gets a certain kind of error that causes no response, just a silent disconnect (great API, I know), and the persistent connection used in this library doesn't handle that. |
It does have a retry loop (in RunLoop). But right now it's not exposing the reason why there was an error and is retrying. |
Hmm, I see that. I'm not seeing "EOF trying to write notification", or "err writing to apns" in my logs, so I wonder if there's a kind of error that doesn't break it out of the run loop, thus it gets hung in a state that drops pushes until restart. |
Unfortunately all I can suggest atm is to add more logging in a fork of timehop/apns. 😦 If you do end up trying @pranavraja's library, it would be great to hear how it goes. |
I'll help look into this. On Fri, May 15, 2015 at 12:48 PM, Nathan Youngman
|
We've actually only just ran into this issue last week. @finkel you're right re: error in the run loop. Doing some logging and debugging on our end, we've found that it gets stuck in some failure condition in these lines and can continually get an error, ie. never successfully reconnects: https://github.com/timehop/apns/blob/master/client.go#L132-L136 I can't tell if Apple is giving us a slap on the wrist for trying to connect to quickly. We added logging in that
I'm currently at a bit of a loss as to why it's happening. In the short term, we could put a log statement in there or pass the connection error back on the channel. In the longer term, we've been talking about doing a refactor to remove the @finkel @wfreeman how are you guys detecting dropped pushes? |
Thanks Benny. I think it makes sense to add this logging into master in the interim. Passing connection errors over FailedNotifs might be a bit odd right now though, since it's currently a chan of NotificationResults. |
@bdotdub we've only been detecting them anecdotally (reports from users) which definitely started when we switched to this library, and of course they never repro on staging. They're very hard to log on the client. But recently I saw some very specific times when I could confirm they were being dropped, so now I'm investigated... |
For me it's just from user reports. I'm not seeing them in the FailedNotifs unless I'm doing something wrong (I increment an error count when I find those, and log the error count). |
I'd prefer a callback, so I could do the logging (and other custom app error recording) myself! |
Reporting connection errors and read errors over a channel (FailedNotifs atm) seems like a reasonable solution as well. I'm curious how many messages are lost? The resend logic starts after a failed notification. What happens if the notification failed for reasons other than it was invalid? What happens if the error from Apple is Shutdown or (maybe) Processing error, or if the connection is just dropped? Does it (should it) resend the notification that "caused" the failure? |
One cause of EOF looping I've seen is expired certificates. The crypto/x509 package provides a Verify() method for certificates. If it returns an error of type x509.CertificateInvalidError, the Reason field will be set to x509.Expired if the certificate is expired. I'm not sure if it's a good idea to verify the certificates every time before connecting or just when first loaded. Verify() will also return errors of type x509.UnknownAuthorityError though, as the Apple certificate isn't in the system roots (and I haven't figured out what needs to be added to VerifyOptions.Roots). For now I just ignore this one. |
Since we switched to using this library in production, we've had many reports of dropped push notifications. I think it's because of the persistent connection, which may at times error and then disconnect. Sounds like they're trying to handle this in https://github.com/pranavraja/apns, though that looks like less mature.
Any thoughts on how to detect/avoid this, or any recommendations? Appreciate the library and the help.
Thanks,
Ben
The text was updated successfully, but these errors were encountered: