Something is wrong with MQTT #3356

Closed
chathurangawijetunge opened this issue Dec 28, 2020 · 6 comments

@chathurangawijetunge

NodeMCU 3.0.0.0 built on nodemcu-build.com provided by frightanic.com
branch: dev
commit: 0fb2a12
release:
release DTS: 202012252235
SSL: false
build type: integer
LFS: 0x40000 bytes total capacity
modules: file,gpio,mqtt,net,node,rtctime,sjson,sntp,tmr,uart,wifi
build 2020-12-27 02:08 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)

Even with the standard example MQTT code, the client acts as if the connection is okay but does not receive messages on its subscribed topics. Publishing also appears to work normally, but the published messages never reach the broker, and no offline event is triggered.
This happens after running for a long time (over 12 hours).

@chathurangawijetunge
Author

chathurangawijetunge commented Dec 29, 2020

URL="broker"

m = mqtt.Client(node.chipid(),30,"user","pwd")
m:lwt("test/lwt","offline",0,1)   
m:on("connect", function(client) print ("connected") end)
m:on("connfail", function(client, reason) print ("connection failed", reason) end)
m:on("offline", function(client) print ("offline") start_mqtt() end)

m:on("message", function(client, topic, data)
  print(topic .. ":" )
  if data ~= nil then
    print(data)
  end
end)

m:on("overflow", function(client, topic, data)
  print(topic .. " partial overflowed message: " .. data )
end)

function start_mqtt()
 tmr.create():alarm(3000,0, function()
   m:connect(URL, 1883, false, function(client)
     print("connected")
     client:publish("test/lwt","online",0,1)
     client:subscribe("test/lwt",0,nil) 
     client:subscribe("test",0,nil)    
   end,
   function(client, reason)
     print("failed reason: " .. reason)
     start_mqtt()
   end)
 end)
end

start_mqtt()

This simple code connects to the broker, and if the broker disconnects, it reconnects. But after wifi.sta.disconnect() followed by wifi.sta.connect(), it shows "connected" although it is not actually connected.
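
One way to make the Wi-Fi case explicit is to tie the MQTT session to Wi-Fi events instead of relying on the "offline" callback alone. The following is only a sketch under assumptions, not a confirmed fix: `on_wifi_lost` is a hypothetical helper, and `m` and `start_mqtt` are the client and reconnect function from the code above. Whether `close()` also fires the "offline" callback is not established in this thread; if it does, the reconnect would be scheduled twice.

```lua
-- Hypothetical helper: drop the MQTT session when Wi-Fi goes away.
-- pcall guards against close() raising an error on a client that is
-- not connected.
local function on_wifi_lost(client)
  pcall(function() client:close() end)
end

-- Device wiring; skipped where the NodeMCU wifi module is absent.
-- Assumes `m` and `start_mqtt` are defined as in the code above.
if wifi and wifi.eventmon then
  wifi.eventmon.register(wifi.eventmon.STA_DISCONNECTED, function(T)
    print("wifi lost, reason: " .. T.reason)
    on_wifi_lost(m)
  end)
  wifi.eventmon.register(wifi.eventmon.STA_GOT_IP, function(T)
    print("wifi up, ip: " .. T.IP)
    start_mqtt()
  end)
end
```

With this wiring, the reconnect is driven by the station obtaining an IP address rather than by the MQTT layer noticing that the link is gone.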

@nwf
Member

nwf commented Dec 30, 2020

Many things are wrong with MQTT (#2987, #3068, doubtless many more). My https://github.com/nwf/nodemcu-firmware/tree/dev-active branch has some fixes and refactorings that may help, but many, many things remain wrong with MQTT even after all that work and it's just been too depressing to even contemplate fixing and nobody seems really bothered by it.

Please attempt packet capture and investigate what's going on at the network level, together with transcripts from your demo program and other MQTT clients of the broker. That is, it would be most helpful to have narrative logs, with packet traces and debug information of the form "NodeMCU Device Under Test (DUT) connects and sends X Y Z to broker; broker establishes subscriptions and sends A B C to DUT; a client publishes M to Q and the broker forwards that to DUT, which acknowledges; 11 hours pass with no network traffic beyond MQTT PING and PONG between broker and DUT; a client publishes N to Q; the broker sends this to DUT, which fails to acknowledge and reports internally [...]". I'm aware that this is a huge amount of work, but someone's going to have to do it, and so far nobody, including me, has really been champing at the bit.

(ETA: Making things even more depressing... Even if we get MQTT right, it's likely that there are nigh unsolvable issues below, given, for example, #3040. It's not clear that there's a better solution at present than to give up and admit that NodeMCU is not a high-reliability platform except in very constrained circumstances; in general, your application and remote endpoints should conspire to actively keep and feed watchdog timers that cause reboots rather than trying to fix anything without.)
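
The watchdog idea in the note above can be sketched roughly as follows. This is a minimal illustration, not code from this thread or from the firmware: `new_watchdog`, the 600-second timeout, the 10-second check interval, and the `test/heartbeat` topic are all hypothetical. It assumes some remote endpoint publishes to that topic periodically and that the device has subscribed to it, so that a silent stall like the one reported here stops the feeding and the device reboots.

```lua
-- Hypothetical application-level watchdog: reboot unless fed regularly.
--   timeout_s: seconds allowed between feeds
--   now:       function returning a time in seconds
--   reboot:    function called when the watchdog expires
local function new_watchdog(timeout_s, now, reboot)
  local last_fed = now()
  local wd = {}

  -- Call whenever end-to-end traffic proves the link is alive.
  function wd.feed()
    last_fed = now()
  end

  -- Call periodically; invokes reboot() if not fed within timeout_s.
  function wd.check()
    if now() - last_fed > timeout_s then
      reboot()
      return true
    end
    return false
  end

  return wd
end

-- Device wiring; skipped where the NodeMCU modules are absent.
-- Assumes `m` is the mqtt client from the posts above. Note that this
-- m:on("message") call replaces any handler registered earlier.
if tmr and node and m then
  local wd = new_watchdog(600, tmr.time, node.restart)
  m:on("message", function(client, topic, data)
    if topic == "test/heartbeat" then wd.feed() end
  end)
  tmr.create():alarm(10000, tmr.ALARM_AUTO, function() wd.check() end)
end
```

Passing `now` and `reboot` in as functions keeps the expiry logic independent of the device, so it can be exercised on a desktop Lua interpreter with a fake clock.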

@marcelstoer
Member

OT but we gotta discuss this somewhere...

@nwf what is the best way out of this misery? The "upstream" https://github.com/tuanpmt/esp_mqtt has been unmaintained since 2017. Hence, we can't turn to it for fixes to port. Options:

My https://github.com/nwf/nodemcu-firmware/tree/dev-active branch has some fixes and refactorings that may help

Can we at least merge those?

@HHHartmann
Member

This has tests, as opposed to tuanpmt/ESP8266MQTTClient, which might lead to better quality.

Can we at least merge those?

Sounds reasonable

@chathurangawijetunge
Author

chathurangawijetunge commented Dec 31, 2020

I think I have found a small workaround. Adding a timer for the connection solves my issue for the time being.

URL="broker"

m = mqtt.Client(node.chipid(),30,"user","pwd")
m:lwt("test/lwt","offline",0,1)   
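--m:on("connect", function(client) print ("connected") end)
-- start_mqtt is defined further down, so it is still nil at this point;
-- wrap it in closures so that it is looked up when the event fires.
m:on("offline", function(client) start_mqtt() end)
m:on("connfail", function(client, reason) start_mqtt() end)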

m:on("message", function(client, topic, data)
  print(topic .. ":" )
  if data ~= nil then
    print(data)
  end
end)

--m:on("overflow", function(client, topic, data)
--  print(topic .. " partial overflowed message: " .. data )
--end)

Mqtt_Conn_tmr=tmr.create()

function start_mqtt()
 Mqtt_Conn_tmr:alarm(3000,0, function()
   m:connect(URL, 1883, false, function(client)
     print("connected")
     client:publish("test/lwt","online",0,1)
     client:subscribe("test/lwt",0,nil) 
     client:subscribe("test",0,nil)    
   end,
   function(client, reason)
     print("failed reason: " .. reason)
     start_mqtt()
   end)
 end)
end

start_mqtt()

@nwf nwf mentioned this issue Dec 31, 2020
4 tasks
@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 16, 2022
@stale stale bot closed this as completed Apr 30, 2022