Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix_: no peers available supporting LightPush protocol after network restored from disabled state #6153

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

qfrank
Copy link
Contributor

@qfrank qfrank commented Dec 2, 2024

fix relate mobile issues: 21452 / 21394, and might also fix part(missing messages) of 21172

relate mobile PR

@qfrank qfrank self-assigned this Dec 2, 2024
@status-im-auto
Copy link
Member

status-im-auto commented Dec 2, 2024

Jenkins Builds

Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 77bce1b #1 2024-12-02 11:15:47 ~4 min windows 📦zip
✔️ 77bce1b #1 2024-12-02 11:15:59 ~4 min linux 📦zip
✔️ 77bce1b #1 2024-12-02 11:16:46 ~5 min macos 📦zip
✔️ 77bce1b #1 2024-12-02 11:17:07 ~5 min ios 📦zip
✔️ 77bce1b #1 2024-12-02 11:17:30 ~6 min tests-rpc 📄log
✔️ 77bce1b #1 2024-12-02 11:17:34 ~6 min android 📦aar
✔️ 77bce1b #1 2024-12-02 11:19:22 ~8 min macos 📦zip
✖️ 77bce1b #1 2024-12-02 11:41:49 ~30 min tests 📄log

@@ -1707,6 +1707,10 @@ func (w *Waku) handleNetworkChangeFromApp(state connection.State) {

func (w *Waku) ConnectionChanged(state connection.State) {
isOnline := !state.Offline
if isOnline && !w.onlineChecker.IsOnline() { // we are online and we were offline before
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the duplicated predicate logic and the comment, one idea is to create a predicate function, something like:

modified   wakuv2/waku.go
@@ -1705,9 +1705,13 @@ func (w *Waku) handleNetworkChangeFromApp(state connection.State) {
 	}
 }
 
+func (w *Waku) isGoingOnline(state connection.State) bool {
+	return !state.Offline && !w.onlineChecker.IsOnline()
+}
+
 func (w *Waku) ConnectionChanged(state connection.State) {
 	isOnline := !state.Offline
-	if isOnline && !w.onlineChecker.IsOnline() { // we are online and we were offline before
+	if w.isGoingOnline(state) {
 		//TODO: analyze if we need to discover and connect to peers for relay.
 		w.discoverAndConnectPeers()
 	}
@@ -1720,8 +1724,7 @@ func (w *Waku) ConnectionChanged(state connection.State) {
 		w.handleNetworkChangeFromApp(state)
 	} else {
 		// for lightClient state update and onlineChange is handled in filterManager.
-		// going online
-		if isOnline && !w.onlineChecker.IsOnline() {
+		if w.isGoingOnline(state) {
 			select {
 			case w.goingOnline <- struct{}{}:
 			default:

wakuv2/waku.go Show resolved Hide resolved
@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 3, 2024

The root-cause for issues status-im/status-mobile#21452 / status-im/status-mobile#21394 could not be due to peers not being connected after coming online. In light-clients peers can be connected as required i.e when we want to send messages via lightpush or subscribe to a filter to receive messages.

And store nodes are pre-configured and they get connected to as and when we want to do store queries automatically. Hence the discovery of peers and connecting to them need not be done explicitly after coming back online. So far one thing that i had noticed is that the peers that we discover may not be reachable and hence it takes time to make successful connections for filter and lightpush. In order to fix this i had worked on waku-org/go-waku#1244 which would remove peers from local store if they are not reachable. This doesn't solve the problem completely but atleast prevents the node from trying to reuse peers to which connection has failed recently.

@qfrank Can you give me the reasoning as to how you identified this as the root-cause for the issues?

@qfrank
Copy link
Contributor Author

qfrank commented Dec 3, 2024

Can you give me the reasoning as to how you identified this as the root-cause for the issues?

I identified it by debug status backend server ... what I found was: when user back to online from offline(keep offline before login), there's no peers available supporting LightPush protocol (PeerManager.FilterPeersByProto), with this fix, it works like a charm. You should use status backend server, which could help you a lot @chaitanyaprem

@qfrank
Copy link
Contributor Author

qfrank commented Dec 3, 2024

BTW, I found checkForConnectionChanges will invoke ConnectionChanged , and it checks if online by isOnline := len(w.node.Host().Network().Peers()) > 0 , I think this is not good. Thinking about mobile told backend network is available in 100%, but we also check if online by another condition. so online have different meanings at the same time, maybe we should handle these onlines separately? @chaitanyaprem

@chaitanyaprem
Copy link
Contributor

I identified it by debug status backend server ... what I found was: when user back to online from offline(keep offline before login), there's no peers available supporting LightPush protocol (PeerManager.FilterPeersByProto), with this fix, it works like a charm. You should use status backend server, which could help you a lot @chaitanyaprem

ah...interesting. got it since bootstrap nodes were not reachable during login and then they were post network is back online. Got it, this fix would definitely address the issue.

Thanks for this info, i will start using status backend server to simulate these scenarios. Any instructions on how to use and run it?

@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 3, 2024

BTW, I found checkForConnectionChanges will invoke ConnectionChanged , and it checks if online by isOnline := len(w.node.Host().Network().Peers()) > 0 , I think this is not good. Thinking about mobile told backend network is available in 100%, but we also check if online by another condition. so online have different meanings at the same time, maybe we should handle these onlines separately? @chaitanyaprem

This is also a good point, in case of mobile we can use network connectivity available from device to determine network status and not just peer connectivity alone. Is there a way to access this info from this layer? That way we can use either peers==0 or network-state-identified-by-app = offline to determine network status.

The peers == 0 was initially used for relay which makes more sense there because in desktop or other devices it is hard to detect network connectivity otherwise unless you ping to some standard server. Also note that this method of detecting offline is more decentralized and permissionless since we are not depending on any centralized server. But for mobile since we get info from the device that can also be used to drive online/offline status.

Copy link
Contributor

@chaitanyaprem chaitanyaprem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ilmotta
Copy link
Contributor

ilmotta commented Dec 3, 2024

The peers == 0 was initially used for relay which makes more sense there because in desktop or other devices it is hard to detect network connectivity otherwise unless you ping to some standard server.

@chaitanyaprem, do you know if we considered other ways to reliably detect network connectivity in desktop environments? I know there's no multiplatform way to solve this problem, but for such an important problem I would lean on platform specific solutions, going from most reliable to least reliable or a combo of 2 solutions (where checking the number of peers would be somewhere in the middle I guess). Probably such solutions would be better implemented outside the Waku's umbrella (?)

@osmaczko @igor-sirotin do you remember past discussions about network connectivity for the desktop app?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants