Bugzilla

Comment 2

•

7 years ago

So Jitsi is not aware of any such reports or anything which they have deployed which could have caused this. I'll have a look.

Comment 3

•

7 years ago

I was able to repro the problem. Call initiator receiving video froze for at least 30s. Video on the other end froze also but only for 5s or so.

RTP stats on about:webrtc kept increasing through the freeze. So it's probably not a actual RTP transport issue itself. Maybe some RTCP problem?

Comment 4

•

7 years ago

Looking at some NSPR and webrtc.org log files I see that the first couple of seconds (in this case) everything looks normal. Then suddenly Firefox sends three RTCP on the receiving video stream. And from there on at least every second received video RTP packet gets "answered" by Firefox with an RTCP packet.

So that looks like Firefox is requesting some form of retransmission, but is not happy with what it gets from the SFU.

Comment 5

•

7 years ago

Looking at Wireshark trace with logs combined: a single missing RTP video packet appear to generate the picture freeze and RTCP storm.

Assignee

Comment 6

•

7 years ago

I wonder if this is https://bugs.chromium.org/p/webrtc/issues/detail?id=7143
You could try commenting out     bytes_sent += SendPadData(bytes - bytes_sent, false, 0, 0);
 in rtp_sender.cc in TimeToSendPadding(), and replace it with "bytes_sent = bytes".  Note: this is on the sending side...

Also: seeing the logs might be useful.

Comment 7

•

7 years ago

Yes I also wondered if bug 1339270 is the reason. Especially as I was only able to repro this with Nightly so far, but not with release (although missing a single packet is obviously not very deterministic way to repro something).

Comment 8

•

7 years ago

Verified that bug 1339270 is not causing this. Which would not have made too much sense as I test with a local build and release on the other end, but the video freezes on Nightly (which would be the one sending the padding in this case).

Comment 9

•

7 years ago

Attached file mozLogs.zip — Details

Here are logs from such a frozen call; local buid; I shut down Firefox while the video was frozen.
If you grep for ac2f53d5-b310-49df-bd72-1f339bedec02 in the child log file you will see that up to '2017-02-14 18:30:05.808549 UTC' we receive lots of video RTP packets and only send out occasionally an RTCP packet. But after that timestamp Firefox starts to send a lot of RTCP. Unfortunately error log messages indicate why.

Updated

•

7 years ago

status-firefox51: affected → wontfix

status-firefox52: affected → fix-optional

Comment 10

•

7 years ago

So I did verify today that Fx is sending out NACKs. But what the logging showed me is that it sends out the NACKs for some time and then stops sending NACKs even though the video is still frozen. Once the video recovers I saw NACK being send again.

Randel does that sound like a behavior to be expected?

Flags: needinfo?(rjesup)

Assignee

Comment 11

•

7 years ago

At first though yes, I would expect it to stop trying to use NACK to recover after a very short time (<1sec), since after that it's cheaper/faster/better to just get an iframe (send PLI or FIR).  I'll look into exactly what the trigger to stop sending NACKs is (I know when the queue of packets waiting for a NACK to unblock them overflows it will ask for an iframe)

Flags: needinfo?(rjesup)

Assignee

Updated

•

7 years ago

Flags: needinfo?(rjesup)

Comment 12

•

7 years ago

So even more logging statements show that at the time of the video freeze Fx starts to switch from sending NACK's to sending out PLI's. But apparently as the video stays frozen it never receives an I frame.

Comment 13

•

7 years ago

So the new RTP clear text logger from Bug 1343640 tells me that the Fx on both sides apparently send PLI's, but never receive any PLI from the SFU.

One things which sticks out about the PLIs is that the sender SSRC in the receiver report and the PLI is not used in any of the streams. Although the source SSRC correctly points to the stream with the missing packets.

Interestingly I can reproduce similar problems with in our SFU room on appear.in. But I haven't looked at the clear text RTCP for that case.

Assignee

Comment 14

•

7 years ago

Attached patch use sender SSRC for receivers (obsolete) — Details — Splinter Review

This appears to fix the wrong-SSRC for receviers - give this a try.   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea77d78b045337653d94a22b3268bd725eccee87

Attachment #8843546 - Flags: review?(drno)

Assignee

Updated

•

7 years ago

Assignee: drno → rjesup

Status: NEW → ASSIGNED

Comment 15

•

7 years ago

Thanks Randell. I'll give it a try on Monday to check if this makes a difference.

Assignee

Comment 16

•

7 years ago

Attached patch use sender SSRC for receivers — Details — Splinter Review

had some oranges on linux only; suspect it's uninitialized RTT values if we don't exchange RCTP before it's measured.  Also added a debug

Attachment #8843602 - Flags: review?(drno)

Assignee

Updated

•

7 years ago

Attachment #8843546 - Attachment is obsolete: true

Attachment #8843546 - Flags: review?(drno)

Assignee

Comment 17

•

7 years ago

Much happier with the last patch: https://treeherder.mozilla.org/#/jobs?repo=try&author=rjesup@wgate.com&selectedJob=81673271

Flags: needinfo?(rjesup)

Comment 18

•

7 years ago

Today I mostly did not get the video working reliably at all. As it turned out the problem was that when you use iptables to randomly drop UDP packets sometime NSPR will report that as a permission error, which then terminates the connection at the transportlayer.
With attachment #8843602 [details] [diff] [review] applied plus a hack to ignore the NSPR permission errors I no longer saw any video freezes. Just a few short hangs which quickly (within 1-2s) recovered.
I did not have the time to test again without attachment #8843602 [details] [diff] [review] to check if the combination fixed things ultimately. I'll try to do that soon, because last week the video freezes recovered after a long time, where todays video freezes never recovered (I gave up waiting after >= 30s).

Comment 19

•

7 years ago

Just verified that in the RTP logs from the run with all patches and hacks PLI's appear and stop after big video packets are send/received.

Updated

•

7 years ago

Attachment #8843602 - Flags: review?(drno) → review+

Pulsebot

Comment 20

•

7 years ago

Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cfe4b63ed4f7
use sender SSRC for receivers r=drno

Assignee

Comment 21

•

7 years ago

Reminder to nominate for 54 and 53

Flags: needinfo?(rjesup)

Updated

•

7 years ago

Updated

•

7 years ago

Blocks: 1335469

Carsten Book [:Tomcat]

Comment 22

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cfe4b63ed4f7

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Updated

•

7 years ago

status-firefox52: fix-optional → wontfix

status-firefox-esr52: --- → fix-optional

Updated

•

7 years ago

Blocks: 1343691

Iulia Cristescu, QA [:JuliaC] (away, please needinfo? cornel.ionce@softvision.ro)

Assignee

Comment 24

•

7 years ago

Comment on attachment 8843602 [details] [diff] [review]
use sender SSRC for receivers

Approval Request Comment
[Feature/Bug causing the regression]: webrtc 49 update (FF53)

[User impact if declined]: Some webrtc conference servers may fail (i.e.. jitsi, a large provider)

[Is this code covered by automated tests?]: No; we don't care about this value in our code or check it (nor does chrome).  

[Has the fix been verified in Nightly?]: yes

[Needs manual test from QE? If yes, steps to reproduce]: We verified the fix using jitsi (we were able to repro it easily, and could see the server not forwarding packets). QE could do the same tests we did, but I don't think that adds anything.

[List of other uplifts needed for the feature/fix]: None

[Is the change risky?]: No 

[Why is the change risky/not risky?]: very simple change (which also fixes a timing hole that could make tests orange).

[String changes made/needed]: none

Flags: needinfo?(rjesup)

Attachment #8843602 - Flags: approval-mozilla-beta?

Attachment #8843602 - Flags: approval-mozilla-aurora?

Gerry Chang [:gchang]

Comment 25

•

7 years ago

Comment on attachment 8843602 [details] [diff] [review]
use sender SSRC for receivers

Fix a webrtc regression issue and was verified locally. Aurora54+ & Beta53+.

Attachment #8843602 - Flags: approval-mozilla-beta?

Attachment #8843602 - Flags: approval-mozilla-beta+

Attachment #8843602 - Flags: approval-mozilla-aurora?

Attachment #8843602 - Flags: approval-mozilla-aurora+

Carsten Book [:Tomcat]

Comment 26

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/637d3f09f285

status-firefox54: affected → fixed

Carsten Book [:Tomcat]

Comment 27

•

7 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/01a36582b13a

status-firefox53: affected → fixed

Andrei Vaida [:avaida]

Comment 28

•

7 years ago

Iulia, could you please spot-check this on 53?

Flags: needinfo?(iulia.cristescu)

Reporter

Comment 29

•

7 years ago

I still manage to reproduce the issue on Windows 10 x64, Ubuntu 16.04 x64 and Mac OS X 10.11.6, using 53.0b5 build1 (20170320143328). The bug is also still encountered on 54.0a2 (2017-03-21) and 55.0a1 (2017-03-21), but the freeze lasts only for about 1 minute.

Flags: needinfo?(iulia.cristescu)

Assignee

Comment 30

•

7 years ago

Nils - please re-check ASAP on Nightly (and beta if nightly is working); check if the NACKs are getting through as well.

Status: RESOLVED → REOPENED

Flags: needinfo?(rjesup)

Resolution: FIXED → ---

Comment 31

•

7 years ago

Lulia can you get us a MOZ log file with Nightly with the following value:
  MOZ_LOG=timestamp,signaling:5,jsep:5,rtplogger:5

It would be ideal if you can get us the logs from both sides of the same identical call. Note: the logging from above only works in Nightly.
Then I'll have a look at the RTP packets in Wireshark from the log. That should hopefully give us quickly an idea of what kind of freeze you actually see there.

Flags: needinfo?(iulia.cristescu)

Iulia Cristescu, QA [:JuliaC] (away, please needinfo? cornel.ionce@softvision.ro)

Comment 32

•

7 years ago

I was able to repro short hangs in the video on Jitsi myself.

By looking at the new RTP logging I was able to find out that Firefox sends out NACK's for missing frames to the Jitsi server. These NACKs then appear to get forwarded to the sender. But the sender is using a completely different RTP sequence number range. So it is not surprising that it never sends the requested missing frame.

I contacted Jitsi and they already located and fixed the issue which caused the NACK's to be forwarded. The Jitsi server is suppose to terminate the NACKs and not forward them to the sender.
It is my understanding that they will also look into why their server is not responding to Firefox NACK's packets.

Flags: needinfo?(rjesup)

Reporter

Comment 33

•

7 years ago

(In reply to Nils Ohlmeier [:drno] from comment #31)
> Lulia can you get us a MOZ log file with Nightly with the following value:
>   MOZ_LOG=timestamp,signaling:5,jsep:5,rtplogger:5
> 
> It would be ideal if you can get us the logs from both sides of the same
> identical call. Note: the logging from above only works in Nightly.
> Then I'll have a look at the RTP packets in Wireshark from the log. That
> should hopefully give us quickly an idea of what kind of freeze you actually
> see there.

I held calls between 2 Windows 10 x64 stations, using 55.0a1 (2017-03-24) and having 

cd c:\
set MOZ_LOG=timestamp,signaling:5,jsep:5,rtplogger:5
set MOZ_LOG_FILE=%TEMP%\log.txt
cd "Program Files\Nightly"
.\firefox.exe

previously set. 

Once I've reproduced the problem and exited Firefox, looked for the generated log file, but this was empty every time. Is there anything wrong in the above environment settings?

Flags: needinfo?(iulia.cristescu) → needinfo?(drno)

Comment 34

•

7 years ago

Sorry I forgot to mention that rtp logging only work with e10s off (because the sandboxed content process can't write to file any more).

I got confirmation from Jitsi that remaining NACK issue got fixed on their end. And I just verified that NACKs now work as expected again. The fix on Jitis side is currently on beta.meet.jit.si, so probably not yet deployed on the production environment. I can check with Jitsi when the fix will be available in production.

Closing the bug since this was not a Firefox issue, and it's working now again.

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Flags: needinfo?(drno)

Resolution: --- → FIXED

Comment 35

•

7 years ago

Is this something we want to uplift to ESR52 as well?

Flags: needinfo?(rjesup)

Assignee

Comment 36

•

7 years ago

I believe this is fallout from bug 1250356 - Webrtc 49 update, landed in FF 53

Blocks: 1250356

Flags: needinfo?(rjesup)

Assignee

Comment 37

•

7 years ago

Or rather - the code changed in 53.  It may be incorrect in 52 as well (and the initial report indicates this).  If so, it will need a slightly different patch I suspect.  Let's keep it on the radar for 52ESR, given the XP population ending up there

Flags: needinfo?(rjesup)

Comment 38

•

7 years ago

If we want this on ESR52, we should probably get this nominated for approval soon.

Assignee

Comment 39

•

7 years ago

Attached patch ESR52 patch for setting receive SSRC same as send SSRC (obsolete) — Details — Splinter Review

possible fix for this issue in 52 -- 52 is very different in this area, so the fix would need to be verified before uplift

Attachment #8855625 - Flags: review?(drno)

Assignee

Comment 40

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed3c9da09f44db1d12507cd5ecb5d85b93643cd7

Flags: needinfo?(rjesup)

Assignee

Comment 41

•

7 years ago

Could you try the 52ESR Try run in comment 40?  Thanks!

Flags: needinfo?(iulia.cristescu)

Comment 42

•

7 years ago

Comment on attachment 8855625 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC

Review of attachment 8855625 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM

::: media/webrtc/signaling/src/peerconnection/MediaPipelineFactory.cpp
@@ +826,5 @@
> +    if (aTrackPair.mSending) {
> +      auto ssrcs = &aTrackPair.mSending->GetSsrcs();
> +      if (!ssrcs->empty()) {
> +        conduit->SetLocalSSRC(ssrcs->front());
> +      }

I think we should have MOZ_MTLOG() and return NS_ERROR_FAILURE for the else case here to prevent silent failure when not setting the local SSRC.

Attachment #8855625 - Flags: review?(drno) → review+

Assignee

Comment 43

•

7 years ago

Attached patch ESR52 patch for setting receive SSRC same as send SSRC — Details — Splinter Review

Attachment #8855939 - Flags: review?(drno)

Assignee

Updated

•

7 years ago

Attachment #8855625 - Attachment is obsolete: true

Comment 44

•

7 years ago

Comment on attachment 8855939 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC

Review of attachment 8855939 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM

Attachment #8855939 - Flags: review?(drno) → review+

Iulia Cristescu, QA [:JuliaC] (away, please needinfo? cornel.ionce@softvision.ro)

Assignee

Comment 45

•

7 years ago

NI to nils to test this on Monday; once verified I'll put it up for esr52

Flags: needinfo?(drno)

Reporter

Comment 46

•

7 years ago

(In reply to Randell Jesup [:jesup] from comment #41)
> Could you try the 52ESR Try run in comment 40?  Thanks!

The issue is not reproducible anymore on the the 52ESR Try run in comment 40.

Flags: needinfo?(iulia.cristescu)

Iulia Cristescu, QA [:JuliaC] (away, please needinfo? cornel.ionce@softvision.ro)

Reporter

Comment 47

•

7 years ago

Also, verified 55.0a1 (2017-04-10), 54.0a2 (2017-04-10) and 53.0b9 (20170403072723) builds.  Nightly and Aurora builds are verified fixed, but the issue is still reproducible on the beta build.

Iulia Cristescu, QA [:JuliaC] (away, please needinfo? cornel.ionce@softvision.ro)

Reporter

Updated

•

7 years ago

Status: RESOLVED → VERIFIED

status-firefox54: fixed → verified

status-firefox55: fixed → verified

Comment 48

•

7 years ago

I also tried the ESR try build for 5 min call. I had some very small video freezes, but they recovered within 1-3 seconds. After about 4min or so on side the remote video got greyed out with a message that the video got turned of due to bandwidth limitations. But that sounds like a Jitsi issue to me. So the ESR build seems to work fine.

Flags: needinfo?(drno)

Comment 49

•

7 years ago

Please request ESR52 approval on this when you get a chance.

Flags: needinfo?(rjesup)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Assignee

Comment 50

•

7 years ago

Comment on attachment 8855939 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Failure of a major WebRTC provider

User impact if declined: ESR users unable to use Jitsi (and maybe some others)

Fix Landed on Version: 55

Risk to taking this patch (and alternatives if risky): Very low risk, just changes where it gets the default SRRC value.

String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Flags: needinfo?(rjesup)

Attachment #8855939 - Flags: approval-mozilla-esr52?

Comment 51

•

7 years ago

Comment on attachment 8855939 [details] [diff] [review]
ESR52 patch for setting receive SSRC same as send SSRC

Video freezing isn't a good experience, the fix has been verified in a few places already, ESR52.2+

Attachment #8855939 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+