Closed Bug 1620844 Opened 4 years ago Closed 4 years ago

InvalidStateError: Cannot set remote offer in state have-local-offer

Categories

(Core :: WebRTC: Signaling, defect, P2)

75 Branch
Desktop
All
defect

Tracking

()

VERIFIED FIXED
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox74 --- unaffected
firefox75 --- verified
firefox76 --- verified

People

(Reporter: jib, Assigned: jib)

References

(Regression)

Details

(Keywords: regression)

Crash Data

Attachments

(1 file)

The above error should never happen according to the spec, which says SRD(offer) should roll back from state "have-local-offer".

STRs:

  1. Open https://jsfiddle.net/jib1/3ewa1vuj/14
  2. Hit JSFiddle's Run button to produce the error (happens 1st time for me, but try ~10 times)

Expected result:

  • No error

Actual result (shown outside frame):

213: MSG InvalidStateError: Cannot set remote offer in state have-local-offer
214: MSG InvalidStateError: Cannot add ICE candidate when there is no remote SDP
214: MSG InvalidStateError: Cannot add ICE candidate when there is no remote SDP
215: MSG InvalidStateError: Cannot add ICE candidate when there is no remote SDP

Regression range:

17:49.38 INFO: No more inbound revisions, bisection finished.
17:49.38 INFO: Last good revision: dd067bf8dec86029202bdef6ea7ab9bc36687f49
17:49.38 INFO: First bad revision: 61ca258cdbbfc126019450087db4c97b1c27d2f3
17:49.38 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=dd067bf8dec86029202bdef6ea7ab9bc36687f49&tochange=61ca258cdbbfc126019450087db4c97b1c27d2f3

I did some instrumentation locally in PeerConnection.jsm's SRD rollback code:

     return this._chain(async () => {
       const haveSetRemote = (async () => {
+        dump(`JIB1: in SRD chain. (type = ${type}, this.signalingState = ${this.signalingState})\n`);
         if (type == "offer" && this.signalingState == "have-local-offer") {
+          dump(`JIB1: rolling back...\n`);
           await new Promise((resolve, reject) => {
             this._onSetDescriptionSuccess = resolve;
             this._onSetDescriptionFailure = reject;
             this._impl.setLocalDescription(
               Ci.IPeerConnection.kActionRollback,
               ""
             );
           });
+          dump(`JIB1: ...rollback succeeding\n`);
           this._processTrackAdditionsAndRemovals();
           this._fireLegacyAddStreamEvents();
           this._transceivers = this._transceivers.filter(t => !t.shouldRemove);
           this._updateCanTrickle();
+          dump(`JIB1: SRD continuing.\n`);
         }
+        dump(`JIB1: regular SRD\n`);
         this._sanityCheckSdp(sdp);
         const p = this._getPermission();
         await new Promise((resolve, reject) => {
           this._onSetDescriptionSuccess = resolve;
           this._onSetDescriptionFailure = reject;
           this._impl.setRemoteDescription(this._actions[type], sdp);
         });
         await p;

..and here's the output I get on repro:

JIB1: in SRD chain. (type = offer, this.signalingState = stable)
JIB1: regular SRD

This is when it breakpoints in JsepSessionImpl here:

  switch (mState) {
    case kJsepStateStable:
      if (type != kJsepSdpOffer) {
        JSEP_SET_ERROR("Cannot set remote answer in state "
                       << GetStateStr(mState));
        return dom::PCError::InvalidStateError;
      }
      break;
    case kJsepStateHaveLocalOffer:
    case kJsepStateHaveRemotePranswer:
      if (type != kJsepSdpAnswer && type != kJsepSdpPranswer) {
-->     JSEP_SET_ERROR("Cannot set remote offer in state "
                       << GetStateStr(mState));
        return dom::PCError::InvalidStateError;

...which shows that mState is kJsepStateHaveLocalOffer while this.signalingState is "stable", inconsistent state, likely from a race.

Byron, thoughts about this?

Flags: needinfo?(docfaraday)

Nevermind, something is broken in the operations chain here. this._operations goes to 0 when it shouldn't... https://jsfiddle.net/jib1/buneL4dq/15/

I'll look into it.

Assignee: nobody → jib
Flags: needinfo?(docfaraday)

Found it using simpler https://jsfiddle.net/jib1/buneL4dq/15/ STRs. Our operations chain code is vulnerable to recursion, which now happens because we chain negotiationneeded. I'll rewrite it to match the spec better.

Pushed by jbruaroey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/22c1c148d495
Fix operations chain to withstand recursion by negotiationneeded. r=bwc
Failed to create upstream wpt PR due to merge conflicts. This requires fixup from a wpt sync admin.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
See Also: → 1621770

Comment on attachment 9131889 [details]
Bug 1620844 - Fix operations chain to withstand recursion by negotiationneeded.

Beta/Release Uplift Approval Request

  • User impact if declined: WebRTC calls may intermittently fail to connect or update with changes.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: Repro steps in comment 0.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Well-tested behaviors. New code follows spec verbatim, and is testing well.

Regression was caused by attempt to fine-tune Firefox's operations chain which deviated from the spec algorithm in a way that made it handle recursion wrong (and recursion became critical to handle right with code introduced in bug 1614803). More tests coming in bug 1621399.

  • String changes made/needed:
Attachment #9131889 - Flags: approval-mozilla-beta?

Comment on attachment 9131889 [details]
Bug 1620844 - Fix operations chain to withstand recursion by negotiationneeded.

webrtc regression fix, approved for 75.0b5

Attachment #9131889 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

I have reproduced the issue in Nightly v76.0a1 from 2020-03-15 and Beta v75.0b4 and verified the fix in Nightly v76.0a1 from 2020-03-18 and Beta v758.0b5 with the test case from comment 2 on Windows 10 and Ubuntu 18.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Hardware: Unspecified → Desktop
Crash Signature: [@ mozilla::JsepSessionImpl::CheckNegotiationNeeded]

(In reply to Web Platform Test Sync Bot (Matrix: #interop:mozilla.org) from comment #7)

Failed to create upstream wpt PR due to merge conflicts. This requires fixup
from a wpt sync admin.

Blocked on https://github.com/web-platform-tests/wpt/pull/22128

Flags: needinfo?(james)
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/22390 for changes under testing/web-platform/tests
Upstream PR merged by moz-wptsync-bot
Flags: needinfo?(james)
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: