Closed Bug 1603474 Opened 4 years ago Closed 4 years ago

[Intermittent] Broken connectivity on herokuapp.com

Tracking


Status:

RESOLVED INCOMPLETE

Tracking Flags:

firefox72: tracking ---, status affected
firefox73: tracking ---, status affected

People

(Reporter: asoncutean, Assigned: dminor)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 1 obsolete file)

share not active.png 4 years ago Catalin Sasca, Desktop QA [:csasca] 33.67 KB, image/png
webrtc.png 4 years ago Catalin Sasca, Desktop QA [:csasca] 57.53 KB, image/png
Bug 1603474 - Pair reflex candidates when hostname obfuscation used; r=bwc! 4 years ago Dan Minor [:dminor] 47 bytes, text/x-phabricator-request

Anca Soncutean, Desktop QA

Reporter

Description

4 years ago

[Affected versions]:

Fx 72.0b5
Fx 73.0a1

[Affected platforms]:

I don’t see a general pattern here: the issue appears across different platform/browser combinations (Windows 10, Windows 7, Ubuntu 18.04, macOS 10.15 / Firefox, Chrome, Safari), regardless of which end initiates the call and which end receives it.

[Steps to reproduce]:

Open https://evening-thicket-98446.herokuapp.com/src/content/peerconnection/filetransfer-b2b/#callingId=192378 on one end
Copy-paste the same link on the other end
Initiate a call
Observe the Share button

[Expected result]:

The Share button is active

[Actual result]:

The Share button is inactive

[Regression range]:

Not sure if we can determine one since the issue is intermittent, but I will give it a try asap.

[Additional notes]:

Setting media.peerconnection.ice.obfuscate_host_addresses to false still appears to trigger this issue intermittently.

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 1

4 years ago

I'm not able to reproduce this on a MacBook Pro running Nightly (with Chrome on the other end).

Do you see any error messages in web console or browser console when the button is inactive? On either end? Any indication of failures in about:webrtc?

Are you only seeing this problem in Firefox?

Flags: needinfo?(anca.soncutean)

Catalin Sasca, Desktop QA [:csasca]

Comment 2

4 years ago

Attached image share not active.png

OK, I reproduced it on the second try, from Windows 10 [72.0b5] to macOS 10.15 [Chrome]. I'll attach a screenshot of the browser console here.

Flags: needinfo?(anca.soncutean)

Catalin Sasca, Desktop QA [:csasca]

Comment 3

4 years ago

Attached image webrtc.png

And here is a screenshot of the about:webrtc page on the Windows 10 side.

Dan Minor [:dminor]

Assignee

Comment 4

4 years ago

What I find strange is that Firefox never seems to use the srflx candidate even though it shows up in the local SDP. So if something goes wrong with the host candidate, the connection fails. According to comment 0, this seems to happen occasionally even with the mDNS stuff disabled.

When testing, you have to be very careful to only have two systems using the same callingId. I thought I had it reproducing 100% of the time between Firefox on Windows 10 and Chrome on OS X only to realize I had left a Chrome instance open on the Windows 10 machine using the same callingId.

There are a few timing things that might be at fault here. We shouldn't fail ICE if we have a pending mDNS query, but that might be happening. We might need to extend the ICE trickle timeout period to allow for time to resolve mDNS addresses. We might need to allow more time in the mDNS resolver itself. We might need to see why we're never using the local srflx address.
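As a hedged illustration of the first two timing hypotheses above (not nICEr's actual code), the decision to declare ICE failed could be gated on outstanding mDNS lookups and on a grace period for trickled candidates. The struct `ice_check_state`, its fields, the function `should_declare_ice_failed`, and the 5-second `TRICKLE_GRACE_MS` value are all hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one ICE component's checking state.
 * None of these names come from nICEr. */
typedef struct {
  int pending_mdns_queries;        /* remote .local names still being resolved */
  int pairs_in_progress;           /* candidate pairs still waiting or in progress */
  bool any_pair_succeeded;         /* at least one pair has succeeded */
  uint64_t ms_since_last_trickle;  /* time since the last remote candidate arrived */
} ice_check_state;

/* Hypothetical grace period for the remote side to trickle more candidates. */
#define TRICKLE_GRACE_MS 5000

/* Returns true only when nothing could still rescue the connection. */
bool should_declare_ice_failed(const ice_check_state *s) {
  assert(s != NULL);
  if (s->any_pair_succeeded) return false;
  if (s->pairs_in_progress > 0) return false;
  /* An unresolved mDNS query may still yield a usable remote candidate,
   * so it must block the failure decision. */
  if (s->pending_mdns_queries > 0) return false;
  /* Otherwise wait out the trickle grace period before giving up. */
  return s->ms_since_last_trickle >= TRICKLE_GRACE_MS;
}
```

Lengthening `TRICKLE_GRACE_MS` corresponds to the "extend the ICE trickle timeout" idea; the mDNS check corresponds to "don't fail ICE with a pending mDNS query".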

This didn't show up when QA was testing with Nightly a few weeks ago, but both Anca and I have reproduced it with Nightly builds from that timeframe. So it's possible that Chrome has tightened up their timing and that is why we're seeing this problem now.

Assignee: nobody → dminor

Dan Minor [:dminor]

Assignee

Comment 5

4 years ago

It looks to me like Firefox will never pair a srflx or prflx address [1]. So what we're seeing above is that the host candidate is failing for one reason or another, and then the connection fails. If I simulate this by commenting out host candidate generation here [2], the connection fails every single time. If I remove the "goto done" from [1], it succeeds using a srflx or prflx address.

There are likely some improvements to be made to the timing of the mDNS stuff to make failures on the local network less likely, but mDNS host candidates are never going to work across network boundaries, so it seems like we need to be using reflex candidates if the host candidates fail.

:bwc, do you know why we don't pair srflx and prflx addresses? Is there somewhere else where reflex candidates should be used that we're somehow missing with the mDNS stuff enabled? Thanks!

[1] https://searchfox.org/mozilla-central/rev/2f09184ec781a2667feec87499d4b81b32b6c48e/media/mtransport/third_party/nICEr/src/ice/ice_component.c#1082
[2] https://searchfox.org/mozilla-central/rev/2f09184ec781a2667feec87499d4b81b32b6c48e/media/mtransport/third_party/nICEr/src/ice/ice_component.c#239

Flags: needinfo?(docfaraday)

Byron Campen [:bwc]

Comment 6

4 years ago

The thought is that if we have a prflx or srflx, we have already paired the host candidate that they are based on, because we don't want to create redundant pairs. But I guess if we fail to gather the host candidate we end up in this weird situation where we can send packets and get srflx/prflx without ever learning what the host candidate is, and so there's never a pair.

Flags: needinfo?(docfaraday)

Dan Minor [:dminor]

Assignee

Comment 7

4 years ago

Attached file Bug 1603474 - Pair reflex candidates when hostname obfuscation used; r=bwc! (obsolete)

Under normal circumstances reflex candidates are not paired because they will
be redundant with host candidates. When hostname obfuscation is used, we can
get in a situation where host candidates fail but reflex candidates will
succeed so it makes sense to pair them in this case.
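The pairing rule described in this patch can be sketched as a simplified filter. This is a sketch under stated assumptions, not the patch itself: `cand_type`, `local_cand`, and `should_pair_local_candidate` are hypothetical names, and the `obfuscate_host` flag stands in for whether host addresses are being replaced by mDNS (.local) names, as controlled by media.peerconnection.ice.obfuscate_host_addresses. The "normally redundant" branch reflects the RFC 8445 idea that a reflexive local candidate shares its base with a host candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ICE candidate types (RFC 8445 names); this enum is hypothetical. */
typedef enum { CAND_HOST, CAND_SRFLX, CAND_PRFLX, CAND_RELAYED } cand_type;

/* Hypothetical, heavily simplified local candidate; not nICEr's struct. */
typedef struct {
  cand_type type;
} local_cand;

/* Returns true if this local candidate should be paired with remote
 * candidates. obfuscate_host is true when host addresses are hidden
 * behind mDNS names. */
bool should_pair_local_candidate(const local_cand *c, bool obfuscate_host) {
  assert(c != NULL);
  switch (c->type) {
    case CAND_HOST:
    case CAND_RELAYED:
      /* Paired unconditionally in this sketch. */
      return true;
    case CAND_SRFLX:
    case CAND_PRFLX:
      /* Normally skipped: the reflexive candidate shares its base with a
       * host candidate that is already paired, so its pairs are redundant.
       * With obfuscation the host pairs can fail (e.g. the peer cannot
       * resolve the .local name across a network boundary), so the
       * reflexive candidate is paired as a fallback. */
      return obfuscate_host;
  }
  return false; /* unreachable for valid enum values */
}
```

With `obfuscate_host` false this keeps the redundancy rule; with it true, a srflx candidate can still yield a working pair when the host pair fails.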

Mihai Boldan, Desktop QA [:mboldan]

Updated

4 years ago

Has Regression Range: --- → no

Keywords: regressionwindow-wanted

Anca Soncutean, Desktop QA

Reporter

Comment 8

4 years ago

It looks like this issue is not a regression: I’ve reproduced it as far back as Fx 45.0a1 (older builds are either broken, or the site doesn't work on those earlier versions). The Share button remains inactive regardless of where the call is initiated (Firefox or Chrome).

Has Regression Range: no → ---

Keywords: regressionwindow-wanted (removed)

Phabricator Automation

Updated

4 years ago

Attachment #9115902 - Attachment is obsolete: true

Michael Froman [:mjf]

Updated

4 years ago

Priority: -- → P2

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Comment 9

4 years ago

Based on the rise in prevalence of symptoms (which seems relevant to determining severity), should this perhaps be viewed as a regression from the introduction of mDNS concealment of host candidates?

Dan Minor [:dminor]

Assignee

Comment 10

4 years ago

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #9)

Based on the rise in prevalence of symptoms (which seems relevant to determining severity), should this perhaps be viewed as a regression from the introduction of mDNS concealment of host candidates?

Well, this is already marked as blocking the mDNS meta bug. I'm not even sure we can say there is a rise in prevalence of symptoms: this site was used for testing mDNS for Nightly 72, and no problems were noticed at that time.

Dan Minor [:dminor]

Assignee

Comment 11

4 years ago

One thing that just occurred to me is that we have the network/socket process enabled on Firefox Nightly but, as far as I know, not on Beta, and perhaps that is why things are behaving differently. I've been doing most of my testing on Nightly and have not seen consistent problems.

BugBot [:suhaib / :marco/ :calixte]

Comment 12

4 years ago

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Julien Cristau [:jcristau]

Updated

4 years ago

Keywords: regression (removed)

Dan Minor [:dminor]

Assignee

Updated

4 years ago

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → INCOMPLETE
