Modules with mega_20210503 all off-line after unpredictable time #3649
Comments
Yep, there's something funky with the WiFi not reconnecting. |
Yes, I have also twice seen an ESP node running custom firmware built from the latest sources fail to reconnect, even after a RESET by button; I had to unplug it from power and plug it in again. I'll try to reproduce the issue. I am not sure, but it looks like an old, already fixed issue has come back; it was discussed here: #2757 and here: #2721 (comment). BTW, I have just tried to run a WiFi scan from the Tools page on one node which is connected to WiFi and working, and to my surprise it said: No Access Points found (there are plenty around for sure). |
OK, that's an important clue for me. |
Can you test with this PR: #3650 ? |
Encountered the issue again, this is a debug log:
100925625 : Info : WIFI : Set WiFi to OFFdel if0
100925860 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100928170 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1
100930694 : Info : WIFI : Set WiFi to OFFbcn 0
100930929 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100931807 : Info : WiFi : Start network scan all channels
100936447 : Info : WIFI : Set WiFi to OFFdel if0
100936681 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100937570 : Info : WiFi : Start network scan all channels
100942194 : Info : WIFI : Set WiFi to OFFdel if0
100942428 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100943308 : Info : WiFi : Start network scan all channels
100947966 : Info : WIFI : Set WiFi to OFFdel if0
100948200 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100950511 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1
100953056 : Info : WIFI : Set WiFi to OFFbcn 0
100953291 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100954169 : Info : WiFi : Start network scan all channels
100958820 : Info : WIFI : Set WiFi to OFFdel if0
100959055 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100959945 : Info : WiFi : Start network scan all channels
100964571 : Info : WIFI : Set WiFi to OFFdel if0
100964805 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100965696 : Info : WiFi : Start network scan all channels
100970333 : Info : WIFI : Set WiFi to OFFdel if0
100970568 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100972879 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1
100975403 : Info : WIFI : Set WiFi to OFFbcn 0
100975639 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100976517 : Info : WiFi : Start network scan all channels
100981157 : Info : WIFI : Set WiFi to OFFdel if0
100981392 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100982268 : Info : WiFi : Start network scan all channels
100986927 : Info : WIFI : Set WiFi to OFFdel if0
100987162 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn
100988041 : Info : WiFi : Start network scan all channels
I see it tries to connect to other BSSID than the Best AP candidate... |
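For anyone comparing logs like the one above across nodes, here is a minimal Python sketch that splits such a dump into entries and summarizes the repeating pattern. It assumes every entry has the shape `<millis> : <Info|Error> : <tag> : <message>`, which is inferred only from the lines quoted in this thread; the function names are made up for illustration and are not part of ESPEasy.

```python
import re
from collections import Counter

# Assumed entry shape, inferred from the log quoted in this thread:
#   "<millis> : <Info|Error> : <tag> : <message>"
# The lookahead ends a message where the next entry starts, so the parser
# works on one-entry-per-line text and on a dump flattened into one line.
ENTRY = re.compile(
    r"(\d+) : (Info|Error) : (\w+) : (.*?)"
    r"(?=\s+\d+ : (?:Info|Error) : |\s*$)",
    re.S,
)

def parse_log(text):
    """Return a list of (millis, level, tag, message) tuples."""
    return [(int(ms), level, tag, msg.strip())
            for ms, level, tag, msg in ENTRY.findall(text)]

def error_counts(entries):
    """Count how often each Error message occurs."""
    return Counter(msg for _, level, _, msg in entries if level == "Error")

def wifi_off_intervals(entries):
    """Milliseconds between consecutive 'Set WiFi to OFF' entries."""
    stamps = [ms for ms, _, _, msg in entries
              if msg.startswith("Set WiFi to OFF")]
    return [b - a for a, b in zip(stamps, stamps[1:])]

# First six entries of the log above, flattened as in the original post.
SAMPLE = (
    "100925625 : Info : WIFI : Set WiFi to OFFdel if0 "
    "100925860 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn "
    "100928170 : Error : WIFI : Error while starting AP Mode with SSID: ESP21 IP: 192.168.4.1 "
    "100930694 : Info : WIFI : Set WiFi to OFFbcn 0 "
    "100930929 : Error : WiFi : Scan not allowed, unprocessed WiFi events: disconn "
    "100931807 : Info : WiFi : Start network scan all channels"
)

if __name__ == "__main__":
    entries = parse_log(SAMPLE)
    for message, count in error_counts(entries).most_common():
        print(count, message)
    print("ms between WiFi OFF events:", wifi_off_intervals(entries))
```

Run on a full dump, the interval list makes it easy to see whether a node is stuck in the same off/scan cycle every few seconds.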
I already merged it, so you can just build using the latest sources. |
Thanks for the info, I am just compiling it. Fatal exception 29(StoreProhibitedCause):
Exception (29):
ctx: sys
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
ets Jan 8 2013,rst cause:2, boot mode:(3,7)
load 0x4010f000, len 3584, room 16
INIT : Booting version: My Build: May 19 2021 09:45:40 (ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support) |
I did notice that MQTT reconnect attempts can follow each other very quickly, so maybe that's hammering a bit too quickly on the data structures in the LWIP stack, which has shown more quirks in the past. |
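To illustrate what spacing out reconnect attempts could look like, here is a generic exponential backoff sketch in Python. This is not how ESPEasy schedules MQTT reconnects; the thread only says the attempts can happen quickly after each other. The function name, the 1 s base, the 60 s cap and the jitter parameter are all assumptions chosen for the example.

```python
import random

def reconnect_delays(base_ms=1000, factor=2.0, cap_ms=60000,
                     jitter=0.0, attempts=6, rng=None):
    """Return a list of wait times (ms) to use before successive reconnect attempts.

    Hypothetical helper: each delay grows by `factor` up to `cap_ms`;
    `jitter` (0.0 to 1.0) spreads each delay by +/- that fraction so that
    several nodes do not all retry at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    delay = float(base_ms)
    for _ in range(attempts):
        spread = delay * jitter
        jittered = delay + rng.uniform(-spread, spread)
        delays.append(int(min(cap_ms, jittered)))
        delay = min(cap_ms, delay * factor)
    return delays

if __name__ == "__main__":
    # Without jitter the schedule is deterministic.
    print(reconnect_delays(attempts=8))
    # With 20% jitter, seeded so the run is repeatable.
    print(reconnect_delays(jitter=0.2, attempts=8, rng=random.Random(1)))
```

The point of the cap is that a node which has been failing for hours still retries about once a minute, instead of either hammering the stack or backing off indefinitely.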
Well, the unwanted quick MQTT reconnecting was due to a (probably) bad IR sensor which is connected to this node and suddenly started to receive noise. :-( As it is configured to send the data to the MQTT controller, that was the reason for the crash... This should not happen under normal conditions. FirmwareBuild:⋄ | 20113 - Mega |
I looked into your log again and I guess there is still another improvement to be made in clearing the unhandled WiFi events when WiFi is turned off and on again. |
FYI, in my custom build, created with PR3655, I sometimes see (for a long time, even after refreshing the page) the RSSI of the Secondary AP (Configured SSID2) on the Main page, even though the node is in fact connected to the Primary AP (Configured SSID1). I see the same on the More info page - WiFi - SSID: Secondary AP (incorrect), Channel: channel of Primary AP (correct)
FirmwareBuild:⋄ | 20113 - Mega 144163148: WD : Uptime 2403 ConnectFailures 12619 FreeMem 12464 WiFiStatus WL_CONNECTED ESPeasy internal wifi status: Conn. IP Init |
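When collecting these WD lines from several nodes, a small parser makes them easier to compare. The Python sketch below assumes the field order shown in the line quoted above (Uptime, ConnectFailures, FreeMem, WiFiStatus, then the internal status text); the function name and dictionary keys are made up for illustration, and no units are assumed for any field.

```python
import re

# Assumed layout, taken from the single WD line quoted in this thread.
WD_LINE = re.compile(
    r"(?P<millis>\d+)\s*:\s*WD\s*:\s*"
    r"Uptime\s+(?P<uptime>\d+)\s+"
    r"ConnectFailures\s+(?P<connect_failures>\d+)\s+"
    r"FreeMem\s+(?P<free_mem>\d+)\s+"
    r"WiFiStatus\s+(?P<wifi_status>\S+)\s+"
    r"ESPeasy internal wifi status:\s*(?P<internal_status>.*\S)"
)

NUMERIC_FIELDS = ("millis", "uptime", "connect_failures", "free_mem")

def parse_wd(line):
    """Return the WD fields as a dict, or None if the line does not match."""
    match = WD_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for name in NUMERIC_FIELDS:
        fields[name] = int(fields[name])
    return fields

SAMPLE = ("144163148: WD : Uptime 2403 ConnectFailures 12619 FreeMem 12464 "
          "WiFiStatus WL_CONNECTED ESPeasy internal wifi status: Conn. IP Init")

if __name__ == "__main__":
    print(parse_wd(SAMPLE))
```

Because `search` is used rather than `match`, any text in front of the timestamp on the same line (such as a pasted FirmwareBuild cell) is ignored.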
The important question is, is it still working? I guess it is given your reply. So if a node is connected to the other SSID, it can mean:
|
Oh, just re-read your post.... The displayed connection isn't the actual connection. |
Yeah, everything was working fine. Just a "cosmetic" issue in Main page / Main - More info page.
Exactly, this was the case. I am not sure if or how it is reproducible - to be watched. |
I think this issue can be closed. With the latest builds, which already have the PR merged, everything looks OK with WiFi reconnect from my perspective. |
Thanks for reporting back. |
Yes that was me, reporting above the wrong info about AP connected. |
To test I have to compile myself, right? |
Or tell me what you need, so I can compile it for you. |
Hi TD-er, that would be great. I'll need the equivalent of ESP_Easy_mega_20210223_normal_ESP8266_4M1M |
Can you let me know how it performs (especially regarding the reported memory issues) |
Thanks! I'll monitor & will report back. |
I updated 5 out of 7 modules with this firmware and all 5 are still on-line and didn't reset since then (now 9 hours ago). Good result so far! How can I monitor the memory issues? |
You can add a task with the system info plugin and send it to a controller, so you can collect the data. |
It is more useful if we know what node uses what firmware version and whether or not it is comparable with another node. |
Well, unfortunately I have to report that one of my ESP nodes is still experiencing the issue, even with the Custom build created 2 days ago: FirmwareBuild:⋄ | 20114 - Mega It looks like, even though the Best AP candidate's name and BSSID are correct, the BSSID to which the ESP node is Connecting is wrong, so it never connects until a cold boot (RESET by button is not enough). |
OK, so there is some discrepancy between what is considered the 'active' AP candidate and what is being used. |
I still don't know how to reproduce it. I just experienced a reboot due to an Exception and the node reconnected successfully. RTC Struct |
Sorry for not providing updates for a while. I'm now using ESP_Easy_mega_20210530_normal_ESP8266_4M1M.bin for all my modules. They all stay reachable via WiFi but they are not very stable. On average they reboot within 3 days. |
Can you perhaps try this build: Experimental web flasher |
Thanks TD-er, I've updated all 7 modules with ESP_Easy_mega_20210615_normal_ESP8266_4M1M.bin, I'll monitor their behavior. |
1st update: 5 out of 7 modules haven't reset since the firmware update and have now all been running for approx. 23 hours. The other 2 did reset at least once; 1 has been running for 20 hours, the other for only 3 hours |
Do you have any stats of the crashing units? |
Do these units have such extremely constant memory allocations, or do you have far fewer samples? |
Should be fixed by these PRs:
Please let me know if it is indeed fixed now... finally |
Both PRs I linked were from today and yesterday, so I guess those were not yet present 3 days ago ;) If you don't have several access points with the same SSID, then I guess it will not make much of a difference with the WiFi PR. |
Aha 😀. I thought it was linked with the new version from 3 days ago ….. I do have 3 access points with the same SSID and I am using MQTT, so this change may help in my situation as well. I’ll see with the next firmware release. |
As briefly mentioned in #3638 (comment)
After updating to mega_20210503, all 7 of my Wemos D1 modules go off-line after an unpredictable time: some after minutes, one after a couple of days. Before this happened they also became very unresponsive. In the meantime I have downgraded to 20210223 and they have all been up and running for days again.