Closed Bug 1138794 Opened 9 years ago Closed 9 years ago

crash-reports.m.c doesn't accept crash reports from Windows XP SP2

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla39
Tracking Status
firefox36 ? affected
firefox37 + verified
firefox38 --- verified
firefox39 --- verified

People

(Reporter: away, Assigned: away)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/668] )

Attachments

(1 file)

On my XPSP2 VM, I forced a crash with CrashMe and tried to send. The crash reporter app said "There was a problem submitting your report."

I debugged it down to a failed HttpSendRequest call. While debugging, I couldn't use .srcfix, because it would give me messages like: The file 'https://hg.mozilla.org/releases/mozilla-release/raw-file/88c5342693e3/toolkit/crashreporter/client/crashreporter_win.cpp' cannot be opened

I can't open that URL in IE on that machine. But I can if I take off the https. Which makes me wonder: is the crash reporter issue fallout from Poodle?
hg.mozilla.org enables only AES cipher suites, which are unavailable on WinXP (even with SP3).
Also, crash-stats.mozilla.com uses a SHA-2 certificate, so Breakpad couldn't send the crash report from SP2, I guess.
Oh, that's bad. We should fix that. We submit to https://crash-reports.mozilla.com, FWIW, so we only have to change the cert for that domain and not all of crash-stats.
To be clear--I think the fix here is going to be "change the cert we're using for crash-reports".
I'd like OpSec to weigh in on this; pinging :ulfr in the hopes that he can point us in the right direction. :)
Flags: needinfo?(jvehent)
:emk is correct

$ ./cipherscan hg.mozilla.org
.....
Target: hg.mozilla.org:443

prio  ciphersuite         protocols              pfs_keysize
1     DHE-RSA-AES128-SHA  TLSv1,TLSv1.1,TLSv1.2  DH,1024bits
2     DHE-RSA-AES256-SHA  TLSv1,TLSv1.1,TLSv1.2  DH,1024bits
3     AES128-SHA          TLSv1,TLSv1.1,TLSv1.2
4     AES256-SHA          TLSv1,TLSv1.1,TLSv1.2

Certificate: trusted, 2048 bit, sha1WithRSAEncryption signature
TLS ticket lifetime hint: None
OCSP stapling: supported
Server side cipher ordering


You should modify the ZLB conf for hg.m.o to use the "Old backward compatibility" [1] like www.mozilla.org does. Talk to :jakem.

[1] https://wiki.mozilla.org/Security/Server_Side_TLS#Old_backward_compatibility
Flags: needinfo?(jvehent)
(In reply to Julien Vehent [:ulfr] (use needinfo) from comment #5)
> You should modify the ZLB conf for hg.m.o to use the "Old backward
> compatibility" [1] like www.mozilla.org does. Talk to :jakem.
> 
> [1]
> https://wiki.mozilla.org/Security/Server_Side_TLS#Old_backward_compatibility

Calling Dr. Maul - Dr. Maul to the bug, please.
Flags: needinfo?(nmaul)
Tracking to keep it on our radar as we had a few XP issues lately!
So uh, no, I don't think we care if https://hg.mozilla.org/ works from XP SP2, those people can just suck it up (or use Firefox). The problem is https://crash-reports.mozilla.com/ not working from XP SP2, which means we can't submit crash reports from XP SP2.
(In reply to Daniel Maher [:phrawzty] from comment #6)
> (In reply to Julien Vehent [:ulfr] (use needinfo) from comment #5)
> > You should modify the ZLB conf for hg.m.o to use the "Old backward
> > compatibility" [1] like www.mozilla.org does. Talk to :jakem.
> > 
> > [1]
> > https://wiki.mozilla.org/Security/Server_Side_TLS#Old_backward_compatibility
> 
> Calling Dr. Maul - Dr. Maul to the bug, please.

jakem - per comment 8 we're not worried about hg.m.o but we do need to do this for crash-reports.mozilla.com

Not sure what module/component this should live in, feel free to move to somewhere more appropriate.
Flags: needinfo?(nmaul)
Summary: Can't send crash report from XPSP2 → crash-reports.m.c doesn't accept crash reports from Windows XP SP2
(In reply to Robert Helmer [:rhelmer] from comment #9)
> jakem - per comment 8 we're not worried about hg.m.o but we do need to do
> this for crash-reports.mozilla.com
> 
> Not sure what module/component this should live in, feel free to move to
> somewhere more appropriate.
Flags: needinfo?(nmaul)
I think something may be wrong with ulfr's test in comment 5. When I run the same tool (from people.mozilla.org), I do get a DES-CBC3-SHA cipher:

Target: crash-reports.mozilla.com:443

prio  ciphersuite         protocols              pfs_keysize
1     DHE-RSA-AES128-SHA  TLSv1,TLSv1.1,TLSv1.2  DH,1024bits
2     DHE-RSA-AES256-SHA  TLSv1,TLSv1.1,TLSv1.2  DH,1024bits
3     AES128-SHA          TLSv1,TLSv1.1,TLSv1.2
4     AES256-SHA          TLSv1,TLSv1.1,TLSv1.2
5     DES-CBC3-SHA        TLSv1,TLSv1.1,TLSv1.2

Certificate: trusted, 2048 bit, sha256WithRSAEncryption signature
TLS ticket lifetime hint: None
OCSP stapling: supported
Server side cipher ordering


The real problem here, I'm guessing, is one of two things:

1) No SSLv3 support.
2) The current certificate (issued Feb 11, *2014* - just over a year old now) is a SHA-2 cert.

Any idea which might be the bigger problem here?


I can fix both of them, but there are deeper ramifications to the latter. SHA-1 certs are increasingly looked-down-upon, and I don't know how the Firefox crash reporter would handle it these days. If it's like Chrome or the current Nightly, it will complain... and that might be far, far worse to crash collection than doing nothing.

The ramifications to SSLv3 are entirely security-related, and :ulfr is better positioned to address that than I am.


Who can give us some direction on what we should really be doing here? I'm concerned that making this change might be bad for crash collection from newer systems, and I don't want to hobble reporting for the majority of users so that we can better support a (shrinking) minority. Maybe the next step is for someone to chime in with what the crash reporter will actually do when faced with a reporting server using a SHA-1 cert.
Flags: needinfo?(nmaul)
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/668]
(In reply to Jake Maul [:jakem] from comment #11)
> Who can give us some direction on what we should really be doing here? I'm
> concerned that making this change might be bad for crash collection from
> newer systems, and I don't want to hobble reporting for the majority of
> users so that we can better support a (shrinking) minority. Maybe the next
> step is for someone to chime in with what the crash reporter will actually
> do when faced with a reporting server using a SHA-1 cert.

dmajor/ted, any thoughts here? How can we test that we're not breaking modern systems? Do we know how many users are on XP SP2 (I didn't see it in ADI data?)

We have a staging site (crash-reports.allizom.org), so if we can round up enough testing to feel comfortable we could use that.
Flags: needinfo?(ted)
Flags: needinfo?(dmajor)
We have enough users on XPSP2 that we need to keep supporting them for a while.
The solution is to make a special endpoint for XPSP2 users that points to a version of crash reports with SSLv3 and a SHA-1 cert.

The rest of crash reports should use the main crash report site with the intermediate TLS conf (TLS1, sha-256 cert, PFS, ...).

Is that doable?
After some more testing, I think this is related to the machine's IE version rather than the XP service pack (because the crash reporter uses wininet APIs). I have an XPSP3 VM that's still on IE6 and it gets the same error.

So instead of testing the service pack version, perhaps we could have a general fallback, where if the first send fails then try an alternate server. Not sure if there are security implications to that.
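A minimal sketch of that "try an alternate server if the first send fails" idea, in portable C++. The `SendFn` callback and the `SendWithFallback` name are hypothetical; the real client sends through wininet, which is not modeled here, and the security question raised above is left open.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical transport callback: returns true if the server accepted
// the report at the given URL.
using SendFn = std::function<bool(const std::string& url)>;

// Try each endpoint in order. Returns the URL that accepted the report,
// or an empty string if every attempt failed.
std::string SendWithFallback(const std::vector<std::string>& endpoints,
                             const SendFn& send) {
  for (const std::string& url : endpoints) {
    if (send(url)) {
      return url;
    }
  }
  return std::string();
}
```

The caller decides the order, so the stronger endpoint can always be tried first and the weaker one only after a failure.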
Flags: needinfo?(dmajor)
(In reply to Jake Maul [:jakem] from comment #11)
> Certificate: trusted, 2048 bit, sha256WithRSAEncryption signature

Could you please read my comment again?
Quoting here:
> Also, crash-stats.mozilla.com uses a SHA-2 certificate, so Breakpad
> couldn't send the crash report from SP2, I guess.

XPSP2 doesn't support SHA-2 certificates (Google it if you don't believe me).
(In reply to David Major [:dmajor] (UTC+13) from comment #15)
> After some more testing, I think this is related to the machine's IE version
> rather than the XP service pack (because the crash reporter uses wininet
> APIs). I have an XPSP3 VM that's still on IE6 and it gets the same error.

IE uses schannel, so the operating system version is relevant rather than IE. I don't know why XPSP3 fails.
More properly, IE uses wininet that uses schannel for secure connections.
> (In reply to David Major [:dmajor] (UTC+13) from comment #15)
> > After some more testing, I think this is related to the machine's IE version
> > rather than the XP service pack (because the crash reporter uses wininet
> > APIs). I have an XPSP3 VM that's still on IE6 and it gets the same error.
> 
> IE uses schannel, so the operating system version is relevant rather than
> IE. I don't know why XPSP3 fails.

I recalled WinXP with IE6 didn't enable TLS 1.0 by default. Did you enable TLS 1.0 on the VM?

That said, we can't assume every user will enable TLS 1.0. Users can search for Firefox on Google (SSLv3 enabled), then download Firefox from www.mozilla.org (SSLv3 enabled). Perhaps we will have to enable TLS 1.0 in the breakpad process.
We can override the system default by setting grbitEnabledProtocols in the SCHANNEL_CRED structure:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa379810%28v=vs.85%29.aspx

Unfortunately I don't know how to use this from wininet (if at all possible).
(In reply to Masatoshi Kimura [:emk] from comment #19)
> I recalled WinXP with IE6 didn't enable TLS 1.0 by default. Did you enable
> TLS 1.0 on the VM?

Good catch. After enabling TLS 1.0 on SP3, I can send the crash report. On SP2 it doesn't help.
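The observations so far can be put into a small capability model: a connection needs a common protocol, a common cipher, and a certificate signature the client can verify. This is only a sketch. The server values come from the cipherscan output above; the XP client sets are simplified assumptions drawn from this thread (no AES on XP, no SHA-2 certs on SP2, TLS 1.0 off by default with IE6), and `RC4-SHA` is an assumed extra cipher that nothing in this bug confirms.

```cpp
#include <set>
#include <string>

// Simplified view of what one side of a TLS connection supports.
struct TlsPeer {
  std::set<std::string> protocols;  // e.g. "SSLv3", "TLSv1"
  std::set<std::string> ciphers;    // OpenSSL-style suite names
  std::set<std::string> cert_sigs;  // server: served; client: verifiable
};

static bool Intersects(const std::set<std::string>& a,
                       const std::set<std::string>& b) {
  for (const std::string& item : a) {
    if (b.count(item) != 0) return true;
  }
  return false;
}

// Returns an empty string if a handshake looks possible, otherwise the
// first reason it would fail.
std::string HandshakeProblem(const TlsPeer& client, const TlsPeer& server) {
  if (!Intersects(client.protocols, server.protocols)) return "no common protocol";
  if (!Intersects(client.ciphers, server.ciphers)) return "no common cipher";
  if (!Intersects(client.cert_sigs, server.cert_sigs)) return "unsupported certificate signature";
  return "";
}

// crash-reports.mozilla.com as reported by cipherscan in this bug.
TlsPeer CrashReportsServer() {
  TlsPeer s;
  s.protocols = {"TLSv1", "TLSv1.1", "TLSv1.2"};
  s.ciphers = {"DHE-RSA-AES128-SHA", "DHE-RSA-AES256-SHA", "AES128-SHA",
               "AES256-SHA", "DES-CBC3-SHA"};
  s.cert_sigs = {"sha256WithRSAEncryption"};
  return s;
}

// Assumed XP schannel capabilities (see the caveats in the lead-in).
TlsPeer XpClient(bool tls10_enabled, bool sp3) {
  TlsPeer c;
  c.protocols = {"SSLv3"};
  if (tls10_enabled) c.protocols.insert("TLSv1");
  c.ciphers = {"DES-CBC3-SHA", "RC4-SHA"};
  c.cert_sigs = {"sha1WithRSAEncryption"};
  if (sp3) c.cert_sigs.insert("sha256WithRSAEncryption");
  return c;
}
```

Under these assumptions, `XpClient(true, true)` can connect, while `XpClient(true, false)` still fails on the certificate, which matches the SP3 and SP2 results reported above.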
[Tracking Requested - why for this release]:
Given that the latest comments sound like we might need product/Breakpad changes to get this resolved, I feel inclined to nominate this for 36 even. :-/
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #22)
> [Tracking Requested - why for this release]:
> Given that the latest comments sound like we might need product/Breakpad
> changes to get this resolved, I feel inclined to nominate this for 36 even.
> :-/

Another way is to enable SSLv3 on crash-reports.mozilla.com, just like www.mozilla.org. Also, crash-reports.mozilla.com will have to use a SHA-1 certificate to support SP2.
(In reply to Robert Helmer [:rhelmer] from comment #12)
> dmajor/ted, any thoughts here? How can we test that we're not breaking
> modern systems? Do we know how many users are on XP SP2 (I didn't see it in
> ADI data?)

We use OS-provided networking APIs for crash report submission, so it's a bit tricky to verify, but if we set up a test cert in staging we could certainly test submission on the latest versions of Windows/OS X/Linux.

Per other discussion above talking about falling back, a plausible way to fix that would be to note when the native crash reporter client fails to send, and just attempt to automatically resubmit the report from within Firefox when it restarts using the same code that about:crashes uses to submit. (This is basically how B2G submits crash reports.) Since that would use NSS and Necko it wouldn't hit the same problem as the native crash reporter. The only downside would be not being able to submit crashes for users whose browsers crash on startup on XP SP2.
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #24)
> The only downside would be not being able to submit crashes for
> users whose browsers crash on startup on XP SP2.

Bug 1137609 is the very example of those failures :(
(In reply to Masatoshi Kimura [:emk] from comment #23)
> Another way is to enable SSLv3 on crash-reports.mozilla.com, just like
> www.mozilla.org. Also, crash-reports.mozilla.com will have to use a SHA-1
> certificate to support SP2.

My understanding is that crash reports contain sensitive user data, and as such should benefit from the best available encryption possible. We shouldn't diminish the security provided to the majority of our users because of a few blocked on XPSP2.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> We have enough users on XPSP2 that we need to keep supporting them for a while.

If that is the ask, I propose that Firefox points to a special crash-report endpoint when running on XPSP2, such as crash-report-xp.mozilla.com, where a SHA-1 certificate and SSLv3 will be enabled.

If this solution is chosen, the next step would be to request such an endpoint with the IT Webops team.
Changing the URL based on the windows version is very difficult from the client.
Then the alternative is to diminish the security of crash-reports entirely, enable SSLv3 and use a SHA-1 cert. It's only going to work until clients block SHA-1 certs for good, at which point we will need to deprecate XPSP2 entirely, or engineer alternatives.

:bsmedberg - could you confirm that we should enable SSLv3 and set a SHA-1 cert on crash-reports today?
Flags: needinfo?(benjamin)
> Changing the URL based on the windows version is very difficult from the client.
Out of curiosity, what makes it difficult?
Currently the server URL is specified in an ini file:
https://hg.mozilla.org/mozilla-central/annotate/eab4a81e4457/build/application.ini#l50

which is compiled into a header file that gets included:
https://hg.mozilla.org/mozilla-central/annotate/eab4a81e4457/browser/app/nsBrowserApp.cpp#l8

which gets passed down to the crash reporter code during startup:
https://hg.mozilla.org/mozilla-central/annotate/eab4a81e4457/toolkit/xre/nsAppRunner.cpp#l3178

and gets written out to the .extra file as an annotation:
https://hg.mozilla.org/mozilla-central/annotate/eab4a81e4457/toolkit/crashreporter/nsExceptionHandler.cpp#l1919

and pulled out by the client for use:
https://hg.mozilla.org/mozilla-central/annotate/eab4a81e4457/toolkit/crashreporter/client/crashreporter.cpp#l611
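The last two steps of that chain can be sketched as follows. This assumes the .extra file is plain `Key=Value` text with one annotation per line and that the annotation is named `ServerURL`; neither detail is stated in this bug, and `ParseExtra` and `ServerUrlFromExtra` are hypothetical names rather than the client's real functions.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "Key=Value" lines into a map. Lines without '=' are skipped.
// Only the first '=' splits the line, so values may contain '='.
std::map<std::string, std::string> ParseExtra(const std::string& text) {
  std::map<std::string, std::string> out;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    // Tolerate Windows line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) continue;
    out[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return out;
}

// Returns the submission URL annotation, or an empty string if absent.
std::string ServerUrlFromExtra(const std::string& text) {
  const std::map<std::string, std::string> params = ParseExtra(text);
  const auto it = params.find("ServerURL");
  return it == params.end() ? std::string() : it->second;
}
```

Because the URL reaches the client only as data read at send time, the client is the one place where it can be rewritten without touching the earlier steps.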

So! I think bsmedberg's contention is simply that doing it in a clean way would be difficult, but we could certainly do a crappy hack in the crashreporter client, somewhere around here:
https://hg.mozilla.org/mozilla-central/annotate/eab4a81e4457/toolkit/crashreporter/client/crashreporter_win.cpp#l1301

like:
// wcsncmp returns 0 on a match, so compare against 0 explicitly.
if (wcsncmp(sendURL, L"https://crash-reports.mozilla.com/submit", 40) == 0 && isXPSP2()) {
  sendURL = L"https://crash-reports-xpsp2.mozilla.com/submit";
}
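For anyone who wants to exercise the idea outside the client, here is a self-contained sketch of the same hack. The OS check is passed in as a bool so the code stays portable; on Windows it could come from `IsWindowsXPSP3OrGreater()` in `<versionhelpers.h>`. `SelectSubmitUrl` is a hypothetical name, and the XP-specific host is the one proposed in this bug.

```cpp
#include <string>

// Prefix of the default submission URL and its XP SP2 replacement.
const std::wstring kDefaultPrefix = L"https://crash-reports.mozilla.com/";
const std::wstring kXpSp2Prefix = L"https://crash-reports-xpsp2.mozilla.com/";

// Rewrite the submit URL for clients older than XP SP3. URLs that do not
// start with the default prefix (custom servers) are returned unchanged.
std::wstring SelectSubmitUrl(const std::wstring& url, bool isXpSp3OrGreater) {
  if (isXpSp3OrGreater) {
    return url;
  }
  if (url.compare(0, kDefaultPrefix.size(), kDefaultPrefix) != 0) {
    return url;
  }
  // Keep the path and query string; swap only the scheme-and-host prefix.
  return kXpSp2Prefix + url.substr(kDefaultPrefix.size());
}
```

Matching on the prefix rather than the whole string keeps any query parameters on the URL intact.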
It may be a crappy hack, but I like that a lot better than enabling weak crypto for the entirety of our user base. Particularly for data as sensitive as crash reports.
We need to make a call on how to proceed and get the fix into 37.

Julien, Benjamin - Are you the right two people to work this out?
Flags: needinfo?(jvehent)
My understanding was that we were going to be able to have a TLS system for sites that need to support both WinXP and newer versions, one that negotiates the correct signing system based on the OS information.

If that's not true, then I guess we should try Ted's hack. It's pretty gross, but for lack of a better option...

dmajor, can you take this and run some tests?
Assignee: nobody → dmajor
Flags: needinfo?(benjamin)
As for the tracking status, this wasn't caused by a client regression but by a server-side change. We should revert the server config until the new client is in the field.
I'm *hoping* that we can do dual-stack on our existing ZLBs, but it isn't certain yet. I asked webops to test this. If crash-reports is ever moved to AWS though, and terminated by ELBs, we won't have a way to do dual stack.

r+ for the config rollback, we'll use the old conf on that site [1]. Jake: Is this something you can take care of?

[1] https://wiki.mozilla.org/Security/Server_Side_TLS#Old_backward_compatibility
Flags: needinfo?(jvehent) → needinfo?(nmaul)
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] (UTC+13) from comment #21)
> (In reply to Masatoshi Kimura [:emk] from comment #19)
> > I recalled WinXP with IE6 didn't enable TLS 1.0 by default. Did you enable
> > TLS 1.0 on the VM?
> 
> Good catch. After enabling TLS 1.0 on SP3, I can send the crash report. On
> SP2 it doesn't help.

Should we do anything for SP3-with-IE6 in this bug?
Flags: needinfo?(benjamin)
Well I can't exactly test this without crash-reports-xpsp2 existing, but I stepped through the version check and string replace in a debugger.

First reviewer feel free to cancel the other.
Flags: needinfo?(dmajor)
Attachment #8577079 - Flags: review?(ted)
Attachment #8577079 - Flags: review?(benjamin)
(In reply to David Major [:dmajor] (UTC+13) from comment #37)
> (In reply to David Major [:dmajor] (UTC+13) from comment #21)
> > (In reply to Masatoshi Kimura [:emk] from comment #19)
> > > I recalled WinXP with IE6 didn't enable TLS 1.0 by default. Did you enable
> > > TLS 1.0 on the VM?
> > 
> > Good catch. After enabling TLS 1.0 on SP3, I can send the crash report. On
> > SP2 it doesn't help.
> 
> Should we do anything for SP3-with-IE6 in this bug?

If the user doesn't enable TLS 1.0, the user will not be able to send a crash report. See the second paragraph of comment #19.
(In reply to David Major [:dmajor] (UTC+13) from comment #37)
> (In reply to David Major [:dmajor] (UTC+13) from comment #21)
> > (In reply to Masatoshi Kimura [:emk] from comment #19)
> > > I recalled WinXP with IE6 didn't enable TLS 1.0 by default. Did you enable
> > > TLS 1.0 on the VM?
> > 
> > Good catch. After enabling TLS 1.0 on SP3, I can send the crash report. On
> > SP2 it doesn't help.
> 
> Should we do anything for SP3-with-IE6 in this bug?

Good point. If IE6 is used and doesn't enable TLS1 by default, then maybe all versions of XP should be pointed to the special XP URL, with SSL3 and SHA-1 certs?
(In reply to Julien Vehent [:ulfr] (use needinfo) from comment #40)
> Good point. If IE6 is used and doesn't enable TLS1 by default, then maybe
> all versions of XP should be pointed to the special XP URL, with SSL3 and
> SHA-1 certs?

That seems pretty reasonable.
Comment on attachment 8577079 [details] [diff] [review]
Hack to use an alternate server on SP2

Review of attachment 8577079 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/client/crashreporter_win.cpp
@@ +1305,5 @@
>    gSendData.sendURL = UTF8ToWide(sendURL);
>  
> +  // Windows XP SP2 doesn't support the crash report server's crypto.
> +  // This is a hack to use an alternate server.
> +  if (!IsWindowsXPSP3OrGreater() &&

Per the comment above maybe we just want to do this on all WinXP? Otherwise this looks about as good as this hack can look.
Attachment #8577079 - Flags: review?(ted)
Attachment #8577079 - Flags: review?(benjamin)
Attachment #8577079 - Flags: review+
(In reply to Julien Vehent [:ulfr] (use needinfo) from comment #36)
> I'm *hoping* that we can do dual-stack on our existing ZLBs, but it isn't
> certain yet. I asked webops to test this. If crash-reports is ever moved to
> AWS though, and terminated by ELBs, we won't have a way to do dual stack.

We are literally in the process of moving the entire crash-stats stack to AWS and ELBs are absolutely part of the equation.  We expect to have a Staging environment up by the end of the month.  Production will occur within 2015.

Just putting that out there. :)
(In reply to Daniel Maher [:phrawzty] from comment #43)
> (In reply to Julien Vehent [:ulfr] (use needinfo) from comment #36)
> > I'm *hoping* that we can do dual-stack on our existing ZLBs, but it isn't
> > certain yet. I asked webops to test this. If crash-reports is ever moved to
> > AWS though, and terminated by ELBs, we won't have a way to do dual stack.
> 
> We are literally in the process of moving the entire crash-stats stack to
> AWS and ELBs are absolutely part of the equation.  We expect to have a
> Staging environment up by the end of the month.  Production will occur
> within 2015.
> 
> Just putting that out there. :)

If dual-stack is the right way to do this, we can have two ELBs - one with SSL configured to accept crashes from XP, and the other with a more standard modern configuration. The AWS docs say that we can register multiple load balancers with a single Auto Scaling group.
(In reply to Robert Helmer [:rhelmer] from comment #44)
> If dual-stack is the right way to do this, we can have two ELBs

By dual-stack, I meant having a single endpoint that handles both old and intermediate configurations. This is not possible with ELBs, because there is no way to run custom traffic scripts on them.

That reduces our options to dmajor's patch: using two crash-reports URLs that point to two separate ELB stacks, one with SSLv3/SHA-1/3DES, and one with TLS1+/SHA-256/PFS.
I'll point out that if we do include SP3 in the patch, then:
- We'll be reducing security for the TLS1-enabled SP3 users who can send reports successfully today (I have a suspicion that they may be the majority)
- The patch can't land until the new server (shall we call it "crash-reports-xp" in that case?) comes online. If it's just SP2 then I can land whenever since broken can't get worse :) So, let me know.
The patch already does this as far as I can tell, but I just wanted to mention it: on the affected XP versions, the native client needs to send (browser) crashes to that "insecure" XP-specific URL (through the OS network stack), while crash reports sent from within Firefox (plugin/content crashes, etc.) need to go to the normal, fully secure URL, since that path uses our own Necko-based stack.
Yes, that's the right behavior and that's how the patch implements it. (Thanks for the sanity check.)
Shyam: could webops create a ZLB endpoint for "crash-reports-xpsp2.mozilla.com", give it a valid SHA-1 cert and an OLD ssl configuration (same as mozilla.org), and point it to the same backends as crash-reports.mozilla.com ?
Flags: needinfo?(smani)
Flags: needinfo?(benjamin)
37 Beta 7 (the final Beta in this cycle) goes to build on Thu, Mar 19 (tomorrow). We need the client side change to land now. We also need the server side piece to be stood up really soon - early next week will allow us to do some testing before release.

David - Can you land your change on m-c, m-a, and m-b with a=lmandel. Please add an approval request to the bug that I can approve after the fact for the record.
Flags: needinfo?(dmajor)
I kept the change limited to SP2.

https://hg.mozilla.org/mozilla-central/rev/adb10dab9357
https://hg.mozilla.org/releases/mozilla-aurora/rev/b3f7cab19d3a
https://hg.mozilla.org/releases/mozilla-beta/rev/caf324dbb13f
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(dmajor)
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Comment on attachment 8577079 [details] [diff] [review]
Hack to use an alternate server on SP2

Approval Request Comment
[Feature/regressing bug #]: Server-side
[User impact if declined]: Can't submit crash reports on XP SP2
[Describe test coverage new/current, TreeHerder]: Green on m-c
[Risks and why]: Functionally, can't really get any worse. This will result in a lower level of security for SP2 crash uploads, but that's already a general cost of using SP2 nowadays.
[String/UUID change made/needed]: No UUID change, no l10n string impact. There is a server URL change so we'll need to bring up a server with that name.
Attachment #8577079 - Flags: approval-mozilla-beta?
Attachment #8577079 - Flags: approval-mozilla-aurora?
Comment on attachment 8577079 [details] [diff] [review]
Hack to use an alternate server on SP2

Thank you for getting this landed today. Beta+ Aurora+
Attachment #8577079 - Flags: approval-mozilla-beta?
Attachment #8577079 - Flags: approval-mozilla-beta+
Attachment #8577079 - Flags: approval-mozilla-aurora?
Attachment #8577079 - Flags: approval-mozilla-aurora+
Flags: qe-verify+
We have performed some exploratory testing today using Firefox 37 Beta 7 (BuildID: 20150319212106) to see that the Crash Reporter still works fine. We covered Windows 7 x64, Windows 8.1 x86, Mac OS X 10.9.5, and Ubuntu 14.04 x64. No issues were encountered, so it looks like this fix did not cause any major regressions.

Once we have the server part for this, then we can also validate the fix on a Windows XP SP2 x86 machine where we reproduce the issue.
crash-reports-xpsp2.mozilla.com is live and should be working as expected, serving an SSLv3 SHA1 EV cert for XP SP2 users.
Flags: needinfo?(smani)
Flags: needinfo?(nmaul)
Verified today on the Windows XP SP2 x86 machine where this was reproduced initially, with Firefox 37 Beta 7, and this time I did not encounter any issue: crashes were submitted without errors, and I was able to find and access them afterwards in about:crashes.

Given previous testing in comment 54, I'm marking this as verified for Firefox 37.
Depends on: 1154298
Verified on Firefox 38 Beta 8 (build2) and latest Firefox 39 DevEdition, on Windows XP x86, and the crashes are submitted without error. There is still the remaining issue tracked in bug 1154298. Closing this bug since the remaining issue is tracked separately.
Status: RESOLVED → VERIFIED