JVB doesn't resume media stream after LastN is limited for long #1648
Comments
Interesting. My first thought is the SRTP ROC getting out of sync. None of the machines on beta.meet.jit.si have any packets dropped in SRTP, so if that's the case, it's between the bridge and a receiver. When the issue occurs, do you see an updated "forwarded endpoints" message in the console? Do the streams appear if you switch to tile/stage view, click on a thumbnail, or just wait for 2-3 minutes?
I waited for 2 min while toggling the tile/stage view back and forth and selecting different tiles to be on stage, and the video stream did not resume. I did not check the console earlier because with 10+ videos it's a wall of text, but I can run the test again and filter on "forwarded".
So, yes, there are "forwarded" messages after setting LastN back to the big number. It just keeps cycling through all the endpoints without being able to bring any of them back live.
Hi @bgrozev, are you able to reproduce the problem I encountered? I hope this can be fixed soon, as I've been trying to use ReceiverVideoConstraints to make Jitsi scale even better.
Update - I ran a different test:
Given this test, is it still the SRTP ROC getting out of sync? Is JVB-enforced last-n different from the client-requested constraints?
I think this issue is caused by the JVB incorrectly resuming the VP8 RTP SEQ number, which results in the WebRTC library discarding the packets as part of its replay protection.
Steps to reproduce
Join meet.jit.si with 5 people.
Ensure at least one participant is on Firefox or Safari (to drop back to VP8).
Ensure all participants are in tile mode.
Reduce one participant’s window size until the tiles are 2x2, so that one video stream is not displayed and becomes inactive.
Wait 20 minutes.
Reactivate the video stream by increasing the window size, scrolling down, or switching to stage view (so the reactivated stream is shown in the filmstrip).
The reactivated stream is now “corrupted” and will report packet losses, resulting in reduced BWE and other video streams being degraded.
Summary
After an endpoint’s stream has been suspended (due to bandwidth allocation, last-n or pagination) and later resumes, the RTP SEQ number should continue incrementing without a gap in the sequence. In the case of VP9 packets, or VP8 packets where no other endpoint is requesting that SSRC, the SEQ number correctly “pauses” and then “resumes” from the previous count. However, VP8 packets (with at least one other endpoint requesting the SSRC) resume after a pause with a gap in the SEQ number, as though the counter had continued incrementing during the pause.
If the pause was sufficiently long (15-20 mins), the discontiguous SEQ number of the resumed stream will appear to the WebRTC library as a replay attack and all of the packets for that SSRC will be discarded. The browser will then report all the packets as lost (via the TCC in its RTCP) and the JVB’s TCC node will reduce the BWE and suspend endpoints. After the problematic SSRC becomes inactive, the packet losses stop, the BWE increases, the allocator reactivates the endpoint, and then lost packets are reported again; rinse and repeat.
If the “corrupted” video is the current speaker, you won’t see any video.
If the corrupted video is one of the thumbnails in stage view, the on-stage video will be degraded and some thumbnails might become inactive.
If the corrupted video is included in tile view, most tiles will show low frame rates and/or numerous tiles will switch on and off (flickering every few seconds) without any improvement.
In larger meetings, if at least one person is viewing 5x5 tiles, then other participants viewing stage mode will have approximately 6 videos active (in the filmstrip) and 19 videos inactive and vulnerable to this issue. After 20 minutes, a participant who switches to tile mode will experience the issue. Or, if one of the inactive participants speaks, they become a recent speaker and appear in the other participants’ filmstrips, which triggers the issue.
Details
The first image shows the RTP original SEQ (green) and the RTP projected SEQ (yellow) sent by the JVB to the participant. The JVB stores a delta value and uses it to calculate the projected SEQ. In the image, the first gaps illustrate the bug: the SEQ effectively keeps incrementing during the period when the endpoint doesn’t require that SSRC but at least one other participant is receiving the stream. This results in a jump in the SEQ when the stream resumes. The last gap in the image illustrates the case where no participant is receiving the SSRC and there is no jump in the SEQ when it resumes.
The bottom chart in the second image shows the bug being triggered and the results. The red dotted lines were added to illustrate the continuation of the SEQ increments during the inactive period of the participant’s video. The chart shows that when the video resumed, the SEQ counter jumped by 35,000. This jump is more than half of 2^16, and therefore WebRTC will assume the latest SEQ value is less than the previous SEQ value (ROC of -1) and will discard the RTP packet as a replay attack.
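To make the "more than half of 2^16" argument concrete, here is a minimal sketch of the packet-index estimate described in RFC 3711 (Appendix A). It assumes the receiver follows that estimate; the real SRTP library may differ in detail, and the receiver state used below (`roc = 1`, `sL = 10000`) is invented purely for illustration.

```javascript
// Sketch of the SRTP packet-index estimate from RFC 3711, Appendix A.
// sL  = highest sequence number accepted so far (16 bit)
// roc = current rollover counter (assumed to be an integer in [0, 2^32))
// seq = sequence number of the newly arrived packet (16 bit)
function estimateIndex(sL, roc, seq) {
  let v;
  if (sL < 32768) {
    // A SEQ far "ahead" is interpreted as belonging to the previous rollover.
    v = seq - sL > 32768 ? (roc - 1) >>> 0 : roc; // >>> 0 wraps modulo 2^32
  } else {
    // A SEQ far "behind" is interpreted as belonging to the next rollover.
    v = sL - 32768 > seq ? (roc + 1) >>> 0 : roc;
  }
  return seq + v * 65536;
}

// Hypothetical receiver state, chosen only for illustration.
const roc = 1;
const sL = 10000;
const highestIndex = sL + roc * 65536;

// Next packet in order: index moves forward by one.
const nextIndex = estimateIndex(sL, roc, sL + 1);

// Forward gap of 30,000 (less than 2^15): still treated as newer.
const smallGapIndex = estimateIndex(sL, roc, sL + 30000);

// Forward gap of 35,000 (more than 2^15): estimated index lands in the past.
const jumpedIndex = estimateIndex(sL, roc, (sL + 35000) % 65536);

console.log({ highestIndex, nextIndex, smallGapIndex, jumpedIndex });
```

Under these assumptions the packet after the 35,000 jump gets an estimated index below the highest index already seen, which is why it would fail the replay check even though it is actually new.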
This behavior can be observed by watching the packets and noticing the relevant RTP packets being received, but the RTCP TCC packets reporting them as lost. You can also debug WebRTC in Chrome:
and look for errors such as “Failed to unprotect SRTP packet, err=9, previous failure count: 100”. Err 9 indicates “replay check failed (bad index)”, which means the index is in the list of recently received SEQ indexes; Err 10 indicates “replay check failed (index too old)”, which means the index is before the list of recent SEQ indexes. In both cases it only “appears” to be a replay attack because the index has “wrapped around”. The error messages are reported once per 100 occurrences.
The SEQ counter eventually increases back into the valid range and the system returns to a stable state. However, the incrementing rate slows down after the counter has become corrupted (shown in the second image), so this takes a long time. Also, this issue results in other videos becoming inactive (due to reduced BWE), and it’s likely those other streams will also become corrupted for the same reasons.
HD streams increment the SEQ counter significantly faster. The steep slopes at the beginning of the chart in the second image are an example of this. At this rate, a pause of 200 seconds is enough to cause the large SEQ gap. I experienced this during the initial exploration of this issue; however, further testing wasn’t able to reproduce it. It’s possible I was mistaken and the issue only occurs when the uploader’s HD stream has been suspended (i.e. only when no other participant is requesting the HD video).
Let me know if any additional details would be helpful.
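As a rough cross-check of the pause durations mentioned above, the following sketch estimates the packet rate implied by a given pause, assuming the only condition is that the skipped SEQ numbers exceed half the 16-bit space (2^15). The function names are hypothetical and the resulting rates are back-of-the-envelope figures, not measurements.

```javascript
// Half of the 16-bit RTP sequence-number space.
const HALF_SEQ_SPACE = 2 ** 15;

// Seconds a stream can stay paused before the skipped SEQ range exceeds 2^15,
// for a sender producing `packetsPerSecond` RTP packets.
function pauseSecondsUntilBreak(packetsPerSecond) {
  return HALF_SEQ_SPACE / packetsPerSecond;
}

// Inverse: the packet rate implied by an observed pause length.
function impliedPacketRate(pauseSeconds) {
  return HALF_SEQ_SPACE / pauseSeconds;
}

// Pause lengths taken from the thread: 200 s (HD) and 15-20 min (thumbnails).
console.log(impliedPacketRate(200).toFixed(0));      // roughly 164 packets/s
console.log(impliedPacketRate(15 * 60).toFixed(0));  // roughly 36 packets/s
console.log(impliedPacketRate(20 * 60).toFixed(0));  // roughly 27 packets/s
```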
I can also confirm that this issue occurs on our jitsi-meet installation with the latest packages.
Description
We use lib-jitsi-meet to do custom layout and utilize the new VideoConstraints. I found that after video streams have been suspended (by setting LastN to a small value) for a long time (e.g. 15-20 min), it becomes impossible to get the streams back by setting LastN to a large number or -1.
All those suspended streams stay suspended; even setting them as "on stage" doesn't work.
Other viewers who did not change LastN keep receiving video from everyone else just fine.
I have then tested against https://beta.meet.jit.si/ to verify it's not my setup or code.
Current behavior
Have 10+ participants sending video.
Set LastN to a small number - in my own setup, it's set to match what is on the screen, e.g. screenshare means 1 on-stage with 4 selected, so LastN is set to 5.
After 15-20 min (e.g. screenshare stops), LastN is set back to 20 (i.e. everyone).
Problem: the suspended streams don't come back.
Expected Behavior
All the video streams flow again once LastN is large again.
Possible Solution
Having the other participants turn their cameras off and on may restore the stream (not sure if that's always the case).
Steps to reproduce
APP.conference._room.setLastN(20)
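The sequence described under "Current behavior" could be scripted along these lines. This is only a sketch: `reproduceLastNPause` and its options are hypothetical names, the defaults (5, 20, 20 minutes) come from the description above, and the only real call assumed is `setLastN(n)` on the room object.

```javascript
// Hypothetical helper. `room` is any object exposing setLastN(n), such as
// APP.conference._room in the browser console on a Jitsi Meet page.
function reproduceLastNPause(room, options = {}) {
  const {
    lowLastN = 5,               // small LastN that suspends most streams
    highLastN = 20,             // large LastN that should bring them back
    pauseMs = 20 * 60 * 1000,   // how long to keep streams suspended
    schedule = (fn, ms) => setTimeout(fn, ms), // injectable for testing
  } = options;

  // Suspend: only `lowLastN` video streams are forwarded from now on.
  room.setLastN(lowLastN);

  // Resume after the pause and leave a marker in the console.
  schedule(() => {
    room.setLastN(highLastN);
    console.log(`LastN restored to ${highLastN}; check whether streams resume`);
  }, pauseMs);
}

// In the browser console (assumes the Jitsi Meet globals are present):
// reproduceLastNPause(APP.conference._room);
```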
Environment details
latest Jitsi unstable on beta.meet