Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'. #1050

Closed
HermanBilous opened this issue Feb 28, 2024 · 18 comments · Fixed by #1052
Assignees

Comments

@HermanBilous
Copy link
Contributor

Describe the bug

I have this issue that I cannot reproduce at all, but get a lot of errors in sentry for. Could you please take a look and point in a direction to how to fix it. Here's the log I can extract from sentry.

Reproduction

N/A

Logs

[
  {
    "level": "warning",
    "data": {
      "arguments": [
        "websocket closed",
        {
          "code": 1006,
          "identity": "<user_id>",
          "reason": "",
          "room": "room_b",
          "roomSid": "room_sid_b",
          "state": 1,
          "wasClean": false
        }
      ]
    }
  },
  {
    "level": "warning",
    "data": {
      "arguments": [
        "signal disconnected",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "info",
    "data": {
      "arguments": [
        "resuming signal connection, attempt 0",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "info",
    "data": {
      "arguments": [
        "Resuming signal connection",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "error",
    "data": {
      "arguments": [
        "Failed to execute 'setConfiguration' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.",
        {
          "error": "[DOMException]",
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "warning",
    "data": {
      "arguments": [
        "reconnect disconnected",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "error",
    "data": {
      "arguments": [
        "unable to set offer",
        {
          "fields": "[Object]",
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  }
]


### System Info

```shell
Browser: any, but mostly Chrome
OS: Windows / MacOS

Severity

annoyance

Additional Information

No response

@lukasIO
Copy link
Contributor

lukasIO commented Feb 28, 2024

thanks for the report!
which SDK version is this? Did it start popping up after a specific upgrade?

It looks like the PeerConnection went into a closed state without that triggering a reconnect (which it should).
On a signal resume the SDK then tries to update the PeerConnection's config, but that fails as it is already closed.
Any logs that happen before the one's you shared might be helpful here as well

@HermanBilous
Copy link
Contributor Author

I'm using 2.0.3 on dev and 1.15.10 on prod. I can see this issue reported on both environments. We had our sentry disabled for some time, but I remember having this issue on older versions of sdk as well.

We also get the same amount of errors like: InvalidStateError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'. reported as well.

I found this code:

      if (e instanceof UnexpectedConnectionState) {
        this.log.debug('received unrecoverable error', { ...this.logContext, error: e });
        // unrecoverable
        recoverable = false;
      } else if (!(e instanceof SignalReconnectError)) {
        // cannot resume
        this.fullReconnectOnNext = true;
      }

It is executed when the Failed to execute 'setRemoteDescription'... error occurs. Judging by the reported error, the code above receives e of type SignalReconnectError, so the error is considered recoverable.

I'll dig for more logs.

@lukasIO
Copy link
Contributor

lukasIO commented Feb 28, 2024

Thanks for the insight, that means that the clients to perform a reconnect then after hitting that error?

@HermanBilous
Copy link
Contributor Author

HermanBilous commented Feb 29, 2024

If it's not full reconnect, it will only do a signal reconnect. Maybe a pc reconnect at this point is better?

Nevertheless, I could not find code that will do anything when onSignalingStatechange is triggered. I mean, it calls PCTransportManager.updateState, but then in mentioned function nothing gets done about changes in pc.signalingState. Maybe that's why it never triggers a reconnect?

I looked into logs from other events, but they all have the same error pattern from original post in common.

@lukasIO
Copy link
Contributor

lukasIO commented Feb 29, 2024

Yeah, you're totally right. This code path failing should not be treated as a SignalReconnectError and instead force a full reconnect. Thanks for digging into this!

@HermanBilous
Copy link
Contributor Author

Thanks for the PR. I will update my project once it's released, and will report back if it fixed my problem.

@HermanBilous
Copy link
Contributor Author

So, the changes did not help, I think they made the issue even worse, since now I get 3 unhandled exceptions instead of 1. I created #1063 to at least add a catch clause on those errors. This should at least make the errors handled and not be reported by sentry, which helps my case. This won't help with the issue itself, however.

@davidzhao davidzhao reopened this Mar 6, 2024
@HermanBilous
Copy link
Contributor Author

Hey, I think we should close this one for now.. I thought that the fix went into the latest release somehow, I even read the release notes, and I swear I saw that PR in there.

Copy link
Contributor

lukasIO commented Mar 7, 2024

were you able to test the changes in the latest release?

@HermanBilous
Copy link
Contributor Author

yes, the error is still present. From error trace I can see that it's coming from callbacks added in setupSignalClientCallbacks()

@lukasIO
Copy link
Contributor

lukasIO commented Mar 7, 2024

could you provide the exact error and stack trace of an error that was raised in the version released yesterday (v2.0.5) ?

@lukasIO lukasIO self-assigned this Mar 7, 2024
@HermanBilous
Copy link
Contributor Author

The exact stack trace looks like this:

Error: Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.
  at ss.<anonymous>(/assets/lk-Bsa7hxT1.js:1:214013)
  at Generator.throw(<anonymous>)
  at c(/assets/lk-Bsa7hxT1.js:1:61326)

As you can see, not much of a stacktrace on this error, because ts wrapped it in a generator. If I look into code, I see this at where the error was thrown:

ject.assign(Object.assign({},this.logContext),{fields:o})),new Hn(s)}})}}function os(i,e,t){let n=0;i.rtp.some(r=>r.codec==="opus"?(n=r.payl

Which leads me to:
PCTransport.setMungedSDP throw new NegotiationError(msg);.

Looking into livekit's console logs:

websocket closed {
code: 1006,
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
reason: ,
room: rid,
roomID: RM_3jcnGDm7UKM7,
state: 1,
wasClean: false
}
signal disconnected {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
resuming signal connection, attempt 0 {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
Resuming signal connection {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
reconnect disconnected {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
unable to set offer {
fields: [Object],
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}

Then

Error: Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.

@HermanBilous
Copy link
Contributor Author

Right after error on setRemoteDescription I get

InvalidStateError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.

It originates in client.onTrickle

@lukasIO
Copy link
Contributor

lukasIO commented Mar 8, 2024

Thanks! How do the client logs continue after the setRemoteDescriptionError? Just trying to make sure that the reconnect logic behaves as expected now.

@HermanBilous
Copy link
Contributor Author

Hm, that's a good question. Unfortunately, we don't have tracking of what's happening after the error, I'm not even sure if this is possible with sentry. But, a good thing is that there are only 1 such error reported, not 3 of them consecutively, like I saw before.

@lukasIO
Copy link
Contributor

lukasIO commented Mar 8, 2024

It would be great if we were to replicate this issue in an isolated manner. Any hints on what kind of client (OS + browser combination) runs into this most frequently or what might be causing this reconnect pattern in the first place?

@HermanBilous
Copy link
Contributor Author

I think this is somehow connected to the server. E.g. room migrates, or lk server becomes unavailable for some reason and shutdowns abruptly. As for the browser, I see that it happens on chrome / edge, on windows, mac and even linux. The livekit-server we use is 1.5.0

@lukasIO
Copy link
Contributor

lukasIO commented Aug 26, 2024

@HermanBilous I'll close this issue for now. We haven't heard any similar reports. If you're still seeing this issue with the latest client+server versions feel free to let me know and I'll reopen it.

@lukasIO lukasIO closed this as completed Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants