Deadlock on MQTT keepalive timeout #77
Thanks for sharing. A deadlock should indeed not occur, so if it does, it is a bug. Regarding the 1-hour rule: is the broker disconnecting you after 1 hour regardless of activity?
Okay. It does vary somewhat when it times out, according to the RabbitMQ logs:
Most of the time it is around 1 hour. Never longer.
I'm not sure if I understand your question correctly, but I keep publishing a message every 60 seconds, and the keepalive is set to 10 seconds (the same as in the MQTT example of this library). So yes, there should still be activity on the connection when the broker disconnects the client. I have some debug logs of the MQTT client when 'disaster' strikes:
Hmm, after letting it sit in the deadlock state for another 10 minutes or so, it got out of it somehow:
Though ~10 minutes is quite long for recognizing that it cannot interact with the MQTT server...
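For comparison with the ~10 minutes observed above, here is a minimal sketch of the timing the MQTT 3.1.1 specification prescribes: a broker must drop the connection if it receives no control packet within 1.5 times the keep-alive interval. The function names are hypothetical and not part of lwgsm, and whether RabbitMQ or the library applies exactly this rule is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Broker-side deadline from MQTT 3.1.1: 1.5 x keep-alive, in milliseconds.
 * Hypothetical helper, not an lwgsm function. */
uint32_t broker_deadline_ms(uint16_t keep_alive_s) {
    return (uint32_t)keep_alive_s * 1500u;
}

/* Hypothetical client-side staleness check: true when nothing has been
 * received for longer than the broker deadline. The unsigned subtraction
 * stays correct when the millisecond tick counter wraps around. */
bool link_is_stale(uint32_t now_ms, uint32_t last_rx_ms, uint16_t keep_alive_s) {
    return (uint32_t)(now_ms - last_rx_ms) > broker_deadline_ms(keep_alive_s);
}
```

With the 10-second keepalive used in this thread, the deadline works out to 15 seconds, so a client applying a similar check could notice a dead link far sooner than 10 minutes.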
It would also help to see the raw AT command data, with echo enabled. Can you do that?
Sure! Give me another hour, as I have to wait before it times out again, haha.
If you use the Termite terminal for UART, please enable timestamps for both the AT port and your library debug port, so that we can match timings from both logs. Attach both files.
Here are the logs, with timestamps since boot, for the ESP32. (Unfortunately I don't have a way of directly monitoring the UART between the ESP32 and the SIM7000G today; the best I could come up with is letting the ESP32 report the AT communication.)
From this I think we can make the following conclusions:
This is the funny part:
Before the send failure, there was a CIPCLOSE command. Can you increase the timeout for the lwgsm_conn_send API function to a higher value than the one written in the AT commands datasheet?
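One way to pick such a value is to scale the datasheet maximum by a safety margin. The sketch below is only an illustration: the helper name is hypothetical, the datasheet figure is not stated in this thread and must be supplied by the caller, and how the result is handed to lwgsm_conn_send is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: returns the datasheet maximum response time
 * increased by margin_pct percent, saturating at UINT32_MAX. */
uint32_t timeout_with_margin_ms(uint32_t datasheet_max_ms, uint32_t margin_pct) {
    /* 64-bit arithmetic so the multiplication cannot overflow. */
    uint64_t scaled = (uint64_t)datasheet_max_ms * (100u + (uint64_t)margin_pct) / 100u;
    return scaled > UINT32_MAX ? UINT32_MAX : (uint32_t)scaled;
}
```

For example, a placeholder datasheet value of 1000 ms with a 50 percent margin gives 1500 ms.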
I then tried increasing the keepalive to 60 seconds, and that results in the library sometimes being able to recover from the disconnect:
Sorry for the delay. Quite some challenges these days. Have you been able to find the root cause yet?
Hi there!
I experienced my MQTT publisher application hanging after it gets disconnected by the broker for "keepalive timeout". The application hangs on lwgsm_mqtt_client_api_publish.
The line where I think the deadlock is: https://github.com/MaJerle/lwgsm/blob/94dfb5067234d0da257bceee01267cee8d0fa5de/lwgsm/src/apps/mqtt/lwgsm_mqtt_client_api.c#L411
I have a feeling it may be caused by the doubly taken semaphore, where the second take is the lwgsm_sys_sem_wait in lwgsm_mqtt_client_api_close, but that's a complete guess.
https://github.com/MaJerle/lwgsm/blob/94dfb5067234d0da257bceee01267cee8d0fa5de/lwgsm/src/apps/mqtt/lwgsm_mqtt_client_api.c#L318
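To illustrate the guess about a doubly taken semaphore, here is a toy model with hypothetical names. It is not lwgsm code and does not claim to reproduce what the library actually does. It uses a non-blocking "try take" so the example terminates; with a blocking wait and no timeout, the second take would never return.

```c
#include <stdbool.h>

/* Toy binary semaphore (hypothetical, single-threaded illustration only). */
typedef struct {
    bool available;
} toy_sem_t;

void toy_sem_init(toy_sem_t* s) { s->available = true; }

/* Non-blocking take: returns false where a blocking wait would hang. */
bool toy_sem_try_take(toy_sem_t* s) {
    if (!s->available) {
        return false;
    }
    s->available = false;
    return true;
}

void toy_sem_release(toy_sem_t* s) { s->available = true; }

/* Models a publish path that holds the semaphore while a close path,
 * reached from the same flow, tries to take it again.
 * Returns whether the second take succeeded. */
bool publish_then_close(toy_sem_t* s) {
    if (!toy_sem_try_take(s)) {
        return false;
    }
    bool close_got_it = toy_sem_try_take(s); /* second take, same holder */
    toy_sem_release(s);
    return close_got_it;
}
```

In this model publish_then_close reports that the second take fails, which is the situation that would turn into an indefinite hang if the wait had no timeout.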
Although the application shouldn't deadlock, I also couldn't figure out why the connection gets closed for "keepalive timeout"... It seems to occur consistently after the connection has been open for 1 hour. The PINGREQ packets seem to receive their PINGRESP correctly, and the PUBLISH packets seem to receive their PUBACK correctly. Does anything else need to happen?
System information:
System: FreeRTOS
MQTT broker: RabbitMQ
Code to reproduce:
Let me know your thoughts!