Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reconnection timeout handler not working in the token provider phase #3513

Conversation

nuno-vieira
Copy link
Member

@nuno-vieira nuno-vieira commented Nov 28, 2024

🔗 Issue Links

Fixes https://linear.app/stream/issue/IOS-78

🎯 Goal

Fixes the reconnection timeout handler not working in the token provider phase.

🛠 Implementation

The main change is that the reconnectionTimeoutHandler was moved from the ConnectionRecoveryHandler to the ChatClient so that it can handle the token provider phase. For this to work, a couple of things required some changes. There are comments in the PR explaining each change.

Flow diagram to explain why we need to set the state to .initialize again whenever we call connect in ChatClient:

Before

stateDiagram-v2
    [*] --> ChatClient.init
    ChatClient.init--> ChatClient.Connect
    ChatClient.Connect --> TokenProvider
    
    TokenProvider --> ReconnectionTimeout: Fails
    TokenProvider --> WebSocket.Connect: Success
    
     ReconnectionTimeout --> disconnected
     disconnected --> ChatClient.Connect
    
    note right of WebSocket.Connect
        State reported: .connecting
    end note
    
    note right of ChatClient.init
        State reported: .initialize
    end note
    
    note right of disconnected
        State reported: .disconnected
    end note
    
    note left of disconnected
        On the second cycle, the state is not reported
        Because it did not change from .disconnected
    end note
Loading

After

stateDiagram-v2
    [*] --> ChatClient.init
    ChatClient.init--> ChatClient.Connect
    ChatClient.Connect --> TokenProvider
    
    TokenProvider --> ReconnectionTimeout: Fails
    TokenProvider --> WebSocket.Connect: Success
    
     ReconnectionTimeout --> disconnected
     disconnected --> ChatClient.Connect
    
    note right of WebSocket.Connect
        State reported: .connecting
    end note
    
    note right of ChatClient.Connect
        State reported: .initialize
    end note
    
    note right of disconnected
        State reported: .disconnected
    end note
Loading

🧪 Manual Testing Notes

  1. Open the Demo App Configuration
  2. Setup token refresh details and make it fail, for example, 20 times
  3. Set a reconnection timeout of 15 seconds
  4. Connect a user
  5. Wait 15 seconds
  6. Should automatically disconnect
  7. Should not report anymore "Token refresh failed" in the console

☑️ Contributor Checklist

  • I have signed the Stream CLA (required)
  • This change should be manually QAed
  • Changelog is updated with client-facing changes
  • Changelog is updated with new localization keys
  • New code is covered by unit tests
  • Comparison screenshots added for visual changes
  • Affected documentation updated (docusaurus, tutorial, CMS)

@nuno-vieira nuno-vieira added 🐞 Bug An issue or PR related to a bug 🌐 SDK: StreamChat (LLC) Tasks related to the StreamChat LLC SDK labels Nov 28, 2024
@nuno-vieira nuno-vieira requested a review from a team as a code owner November 28, 2024 10:38
Comment on lines +45 to +47
func initialize() {
webSocketClient?.initialize()
}
Copy link
Member Author

@nuno-vieira nuno-vieira Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be introduced. Otherwise, we won't report the status of the new connection to the delegates. This is especially important when using ChatClient as a singleton. This is the problem:

  1. connectUser() (connectionStatus = .initialized) - keep in mind that it only goes to .connecting, after the token provider finishes
  2. timeout() (connectinStatus = .disconnected) - Disconnected status is reported ✅
  3. connectUser() again (connectionStatus == .disconnected) - Connecting status is not reported ❌ , because the status did not change

Since we don't reset the connection status, it keeps at the .disconnected, and so the .connecting or .initialized is not reported when manually reconnecting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you call connectUser again, shouldn't it move to initialized / connecting already? Because there should be a status change when you start connecting again.

Copy link
Member Author

@nuno-vieira nuno-vieira Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more details to point 1.. The problem here is that we only set connectionState = .connecting after the token phase finishes. So, if the token provider keeps failing, we never set the websocket to connecting and it will be kept at disconnected on the second try, which means we won't report any connection state change, since it will be stuck at disconnected.

This approach actually makes sense, before the connect() call is made to the WebSocket engine, the state is initialized. So, when the token provider fails and we did not connect the websocket engine, we need to restart the state to initialized. (Since the timeout action will report disconnect)

Copy link
Member Author

@nuno-vieira nuno-vieira Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall overview:

stateDiagram-v2
    [*] --> initialize
    initialize --> ChatClient.Connect
    ChatClient.Connect --> TokenProvider
    
    TokenProvider --> ReconnectionTimeout: Fails
    TokenProvider --> WebSocket.Connect: Success
    
     ReconnectionTimeout --> disconnected
     disconnected --> ChatClient.Connect
    
    note right of WebSocket.Connect
        State reported: .connecting
    end note
    
    note right of initialize
        State reported: .initialize
    end note
    
    note right of disconnected
        State reported: .disconnected
    end note
    
    note left of disconnected
        On the second cycle, the state is not reported
        Because it did not change from .disconnected
    end note
Loading

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it would be nice to see .connecting as the state just after calling connect(). As a SDK user, who does not know internals, that would make sense. That said, it might make sense to do this separately (because we have this distinction at the moment where .initialized is used during the token fetch phase).

Especially because documentation says (one typo there):

    /// The initial state meaning that  there was no atempt to connect yet.
    case initialized

That is confusing for the SDK user.

My view is that I feel like it is OK in the PR to go for .initialized, because this is what how the status has been reported previously, but then in a follow-up PR actually change this to .connecting because that would make more sense for SDK users (thinking about splitting because no idea what side-effects it could have and that should be tested throughly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, I had the same thinking as Toomas. Basically, this part:
"The problem here is that we only set connectionState = .connecting after the token phase finishes.", doesn't feel correct. We are connecting (we are trying to get a token), but that's not reflected anywhere. To reduce risks, I'm fine to merge it like this as well, but yeah, we should make this more explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, changing this would be very risky at the moment, that is why I decided not to do, and I remember I had a couple of issues trying it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laevandus I changed that comment since it is internal either way. I tried changing the state to .connecting but this breaks other stuff, and so for now I prefer to not risk it. At most, adding another internal state, like .fetchingToken or something like this, would be better and not cause any impact on the rest of the logic. But for now, I think this is more than enough 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we should try to tackle in another issue & PR.

Comment on lines -98 to -105
if connectionId == nil {
if source == .userInitiated {
log.warning("The client is already disconnected. Skipping the `disconnect` call.")
}
completion()
return
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unnecessary optimization that just complicates things. For example, when we timeout, we don't have the connectionId, so the webSocketClient? .disconnect won't be called. Which means no connection status changes will be reported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this makes sense.

Comment on lines +144 to +148
switch connectionState {
case .initialized, .disconnected, .disconnecting:
connectionState = .disconnected(source: source)
case .connecting, .waitingForConnectionId, .connected:
connectionState = .disconnecting(source: source)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when disconnecting, if we are already disconnected, we should not put the state to disconnecting since the engine won't do anything, and so it won't report the disconnected afterwards. For this reason, we need to instantly report as disconnected for this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems tricky, previously it seems we were also processing events. Can you explain a bit more why we delete that part?
Also, the switch itself is a bit strange to me, especially the second case. If you are disconnected, and you are connecting, why should it go disconnecting?

Copy link
Member Author

@nuno-vieira nuno-vieira Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not change the processing events part. The github diffing is not very clear, but the only thing changed here is the report of connectionState

Also, the switch itself is a bit strange to me, especially the second case. If you are disconnected, and you are connecting, why should it go disconnecting?

Where does this happen? We report disconnected if we were not connected previously, so theres nothing to disconnect, that is why we instantly report that we are disconnected. If we are connecting, or waiting for connection or connected, then we need to call disconnect to the engine, so the engine will eventually report the disconnected state.

Copy link
Member Author

@nuno-vieira nuno-vieira Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is quite clear actually:

When the state is initialized, .disconnected, and disconnecting, and the user disconnects, we instantly report disconnected since the WebSocket engine won't report anything because it is already disconnected.

When the state is connected, connecting or waiting for connection, we need to call the engine.disconnect() and the websocket delegate will report the disconnected state. This is why here we report disconnecting instead of disconnected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, now after seeing the diagram and the explanations, it makes more sense, thanks.

@Stream-SDK-Bot
Copy link
Collaborator

Stream-SDK-Bot commented Nov 28, 2024

SDK Size

title develop branch diff status
StreamChat 7.06 MB 7.06 MB +1 KB 🟢
StreamChatUI 4.96 MB 4.96 MB 0 KB 🟢

@Stream-SDK-Bot
Copy link
Collaborator

SDK Performance

target metric benchmark branch performance status
MessageList Hitches total duration 10 ms 6.68 ms 33.2% 🔼 🟢
Duration 2.6 s 2.55 s 1.92% 🔼 🟢
Hitch time ratio 4 ms per s 2.62 ms per s 34.5% 🔼 🟢
Frame rate 75 fps 78.33 fps 4.44% 🔼 🟢
Number of hitches 1 0.8 20.0% 🔼 🟢

@Stream-SDK-Bot
Copy link
Collaborator

Stream-SDK-Bot commented Nov 28, 2024

SDK Size

title develop branch diff status
StreamChat 7.08 MB 7.08 MB +1 KB 🟢
StreamChatUI 4.96 MB 4.96 MB 0 KB 🟢

Copy link
Contributor

@martinmitrevski martinmitrevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More eyes would be needed here, some things are not entirely clear to me. @laevandus good if you also have a look.

Comment on lines +45 to +47
func initialize() {
webSocketClient?.initialize()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you call connectUser again, shouldn't it move to initialized / connecting already? Because there should be a status change when you start connecting again.

Comment on lines -98 to -105
if connectionId == nil {
if source == .userInitiated {
log.warning("The client is already disconnected. Skipping the `disconnect` call.")
}
completion()
return
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this makes sense.

Comment on lines +144 to +148
switch connectionState {
case .initialized, .disconnected, .disconnecting:
connectionState = .disconnected(source: source)
case .connecting, .waitingForConnectionId, .connected:
connectionState = .disconnecting(source: source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems tricky, previously it seems we were also processing events. Can you explain a bit more why we delete that part?
Also, the switch itself is a bit strange to me, especially the second case. If you are disconnected, and you are connecting, why should it go disconnecting?

Comment on lines 474 to +475
authenticationRepository.clearTokenProvider()
authenticationRepository.cancelTimers()
authenticationRepository.reset()
Copy link
Contributor

@laevandus laevandus Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should reconnectionTimeoutHandler?.stop() be called here as well or it does not matter?

Thinking about the case of calling disconnect quickly after calling connect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not really matter, because this is triggered by the reconnectionTimeoutHandler which has repeats: false, so it will be already stopped by the time it reaches here

Copy link
Contributor

@martinmitrevski martinmitrevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's merge it with the current state. But first, @testableapple should show this PR some love.

@nuno-vieira nuno-vieira added the 🤞 Ready For QA A PR that is Ready for QA label Nov 29, 2024
@laevandus
Copy link
Contributor

laevandus commented Nov 29, 2024


Tested the branch and develop. In develop it did not disconnect after 15 seconds, but in this PR it correctly disconnected and the channel list updated (shimmering stopped).

Copy link

1 Warning
⚠️ The changes should be manually QAed before the Pull Request will be merged

Generated by 🚫 Danger

Copy link

sonarcloud bot commented Dec 3, 2024

@testableapple testableapple added 🟢 QAed A PR that was QAed and removed 🤞 Ready For QA A PR that is Ready for QA labels Dec 3, 2024
@testableapple testableapple merged commit 7941389 into develop Dec 3, 2024
12 of 14 checks passed
@testableapple testableapple deleted the fix/reconnection-timeout-handler-not-working-on-the-token-provider branch December 3, 2024 09:34
@laevandus laevandus mentioned this pull request Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 Bug An issue or PR related to a bug 🟢 QAed A PR that was QAed 🌐 SDK: StreamChat (LLC) Tasks related to the StreamChat LLC SDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants