Closed Bug 887313 Opened 11 years ago Closed 10 years ago

Starring bug 696306 causes an "missing low surrogate character in surrogate pair" error on tbpl

Categories

(Tree Management Graveyard :: TBPL, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: RyanVM, Unassigned)

Details

Attachments

(1 file)

When I star bug 696306 on tbpl (usually on b2g18), I get the following error at the top:

submitBugzillaComment.php error: Content-Type application/json had a problem with your request. ***ERROR*** missing low surrogate character in surrogate pair, at character offset 1971 ["dc00b, expected ab\n..."] at /usr/local/share/perl/5.10.0/Catalyst/Action/Deserialize/JSON.pm line 26, <$fh> line 1.

It's erroring out on the ± character in the bug summary.
Any ideas about this?
Flags: needinfo?(glob)
the bzapi is throwing that error, not bugzilla.  redirecting the needinfo to gerv.
Flags: needinfo?(gerv)
Flags: needinfo?(glob)
This is dying in Deserialize, which means it doesn't like the JSON you are sending it. Is there any possibility of getting access to that? Then, we can make sure it's actually sending valid JSON. If it is, then I'll look at fixing BzAPI in some way.

Gerv
Flags: needinfo?(gerv)
I'm not sure if I can easily recreate it without finding somewhere where this test failure has occurred again.
Surely you can reproduce this by adding the ± character to the summary of a bug and starring it? So, file a test bug which is designed to match a common test failure, and add a ± character to its summary. Use a testing instance of tbpl if you don't want to mess with the production one.

Gerv
Just hit the error on this push (WinXP debug M4):
https://tbpl.mozilla.org/?tree=Mozilla-B2g18&rev=6107703a0cdc

I would think we could capture it on the dev instance of tbpl.
https://tbpl-dev.allizom.org/?tree=Mozilla-B2g18&rev=6107703a0cdc

Request URL: 	https://tbpl-dev.allizom.org/php/submitBugzillaComment.php
Request Method: 	POST
Request body:
{
id=696306&comment=edmorley%23tbpl-dev%0Ahttps%3A%2F%2Ftbpl.mozilla.org%2Fphp%2FgetParsedLog.php%3Fid%3D25240510%26tree%3DMozilla-B2g18%0ARev3+WINNT+5.1+mozilla-b2g18+debug+test+mochitest-4+on+2013-07-12+15%3A13%3A59%0Arevision%3A+6107703a0cdc%0Aslave%3A+talos-r3-xp-114%0A%0A119+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Fdom%2Ftests%2Fmochitest%2Fwhatwg%2Ftest_bug500328.html+%7C+uncaught+exception+-+NS_ERROR_FAILURE%3A+Component+returned+failure+code%3A+0x80004005+(NS_ERROR_FAILURE)+%5BnsIDOMHistory.pushState%5D+at+http%3A%2F%2Fmochi.test%3A8888%2Ftests%2Fdom%2Ftests%2Fmochitest%2Fwhatwg%2Ftest_bug500328.html%3A459%0A1433+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fbase%2Ftests%2Ftest_bug586662.html+%7C+Test+timed+out.%0A1434+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fbase%2Ftests%2Ftest_bug586662.html+%7C+%5BSimpleTest.finish()%5D+waitForFocus()+was+called+a+different+number+of+times+from+the+number+of+callbacks+run.++Maybe+the+test+terminated+prematurely+--+be+sure+to+use+SimpleTest.waitForExplicitFinish().+-+got+1%2C+expected+0%0A1476+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fbase%2Ftests%2Ftest_bug795785.html+%7C+Test+timed+out.%0A1477+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fbase%2Ftests%2Ftest_bug795785.html+%7C+%5BSimpleTest.finish()%5D+waitForFocus()+was+called+a+different+number+of+times+from+the+number+of+callbacks+run.++Maybe+the+test+terminated+prematurely+--+be+sure+to+use+SimpleTest.waitForExplicitFinish().+-+got+1%2C+expected+0%0A6355+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug332636.html+%7C+The+backspace+key+should+delete+the+UTF-16+surrogate+pair+correctly+-+got+axb%2C+expected+ab%0A6356+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug332636.html+%7C+The+backspace+key+should+delete+the+UTF-16+surrogate+pair+correctly+-+got+a%CC%88b%2C+expected+ab%0A6357+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug332636.html+%7C+The+backspace+key+should+delete+the+UTF-16+surrogate+pair+correctly+-+got+a%F0%90%90%80b%2C+expected+ab%0A6358+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug332636.html+%7C+The+backspace+key+should+delete+the+UTF-16+surrogate+pair+correctly+-+got+a%F0%90%A8%8Fb%2C+expected+ab%0A6415+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug410986.html+%7C+Test+timed+out.%0A6416+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug410986.html+%7C+%5BSimpleTest.finish()%5D+waitForFocus()+was+called+a+different+number+of+times+from+the+number+of+callbacks+run.++Maybe+the+test+terminated+prematurely+--+be+sure+to+use+SimpleTest.waitForExplicitFinish().+-+got+1%2C+expected+0%0A6422+ERROR+TEST-UNEXPECTED-FAIL+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug414526.html+%7C+Test+timed+out.%0A6423+ERROR+TEST-UNEXPECTED-FAIL+%7C+(SimpleTest%2FTestRunner.js)+%7C+4+test+timeouts%2C+giving+up.%0A6424+ERROR+TEST-UNEXPECTED-FAIL+%7C+(SimpleTest%2FTestRunner.js)+%7C+Skipping+466+remaining+tests.%0A6425+ERROR+TEST-UNEXPECTED-FAIL+%7C+(SimpleTest%2FTestRunner.js)+%7C+%5BSimpleTest.finish()%5D+waitForFocus()+was+called+a+different+number+of+times+from+the+number+of+callbacks+run.++Maybe+the+test+terminated+prematurely+--+be+sure+to+use+SimpleTest.waitForExplicitFinish().+-+got+1%2C+expected+0%0A6426+ERROR+TEST-UNEXPECTED-FAIL+%7C+(SimpleTest%2FTestRunner.js)+%7C+%2Ftests%2Feditor%2Flibeditor%2Fhtml%2Ftests%2Ftest_bug414526.html+finished+in+a+non-clean+fashion%2C+probably+because+it+didn't+call+SimpleTest.finish()
}

Response:
{
Content-Type application/json had a problem with your request. ***ERROR*** missing low surrogate character in surrogate pair, at character offset 1982 ["dc00b, expected ab\n..."] at /usr/local/share/perl/5.10.0/Catalyst/Action/Deserialize/JSON.pm line 26, <$fh> line 1. 
}

I'll derive the bzapi call now.
Reduced:
POST request to https://tbpl-dev.allizom.org/php/submitBugzillaComment.php
Content-Type:	application/x-www-form-urlencoded; charset=UTF-8
Body:
{
id=696306&comment=%F0%90%A8%8F
}

Will now try locally and annotate https://hg.mozilla.org/webtools/tbpl/file/991419b41511/php/submitBugzillaComment.php
At first glance those chars seem very odd to me as a UTF-8 sequence:
http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder?encoding=hex&bytes=F090A88F

Gerv
POST request to:
https://api-dev.bugzilla.mozilla.org/latest/bug/696306/comment?username=tbplbot%40gmail.com&password=<snip>

Accept: application/json
Content-Type: application/json

Post body:
{"text":"\ud802de0f"}

Response:
Status 400

Response body:
Content-Type application/json had a problem with your request.
***ERROR***
missing low surrogate character in surrogate pair, at character offset 15 ["de0f"}"] at /usr/local/share/perl/5.10.0/Catalyst/Action/Deserialize/JSON.pm line 26, <$fh> line 1.
Should we be sanitizing that more, or should bzapi handle it?
You are sending invalid JSON - json.org says the only valid Unicode escapes in JSON are exactly 4 hex chars, and this one is 8. So that's wrong to start with. Secondly, decoding \ud802de0f makes no sense:
http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder?encoding=hex&bytes=d802de0f

So something has got corrupted somewhere along the way. Note that the sequence there is _different_ to the one in the post to submitBugzillaComment.php, which is:

http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder?encoding=hex&bytes=F090A88F

That one is at least a valid Unicode character, although a very strange one: U+10A0F	KHAROSHTHI SIGN VISARGA. If you have a font for this, you are a better man than I am: " 
I manually reduced the string to get that in comment 11, so guess I just chopped off too much.
Oops, Bugzilla chopped my comment at the unicode char :-( The rest of it read:

Looking closer, this seems to be a test related to odd Unicode so perhaps this is in fact the character expected. Still, it's getting mangled by submitBugzillaComment.php into something which is definitely not right.

Only a few characters _have_ to be escaped in JSON; perhaps it would be easier to send the chars directly?

Gerv

(But maybe that's not such a great idea with the Bugzilla Unicode Chopping Bug still outstanding...)
To reiterate the story so far: it is the KHAROSHTHI SIGN VISARGA that we're mangling. This character is showing up in the test failure - its appearance is while the test failed, after all.

jQuery.ajax is sending UTF-8 - and KHAROSHTHI... is F090A88F in UTF-8 as Gerv noted in comment 13.

It looks like TBPL's PHP back end is trying to re-encode as UTF-16, and KARAOSHTHI in UTF-16 is low/high 0xd802 0xde0f (http://www.fileformat.info/info/unicode/char/10a0f/index.htm)

>Post body:
>{"text":"\ud802de0f"}
It almost got there, but for a missing \u in the middle there.

It looks like http://hg.mozilla.org/webtools/tbpl/file/991419b41511/php/inc/JSON.php#l337 is producing the incorrect output, not inserting a \u between the surrogate pairs.

I guess we can fix this, and the other cases, but what version of PHP are we running on tbpl.m.o? If >= 5.2.0 we could switch to PHP's native JSON support, which one would presume is less buggy. (Certainly on my machine:
php -r 'echo json_encode(chr(0xf0) . chr (0x90) . chr(0xa8). chr(0x8f));'
produces the correct "\ud802\ude0f").
I was always in favor of going with the native php function. I think ehsan had some objections, can’t remember exactly.
Attached patch Rip out inc/JSONSplinter Review
I pushed http://hg.mozilla.org/webtools/tbpl/rev/999b3ed2b3a3 to ask tbpl-dev what version it was running, and backed it out after the cron job ran and I got my answer.

Turns out I didn't need to. A grep for JSON reveals we are already using PHPs native JSON support all over the place, with the exception of 2 files, one of which we obviously know is submitBugzillaComment.php. It's nonsensical to have these files using this buggy library file. Let's switch.
Attachment #777277 - Flags: review?(emorley)
(In reply to Arpad Borsos (Swatinem) from comment #17)
> I was always in favor of going with the native php function. I think ehsan
> had some objections, can’t remember exactly.

Ah. Didn't see your comment until I posted the patch.
Comment on attachment 777277 [details] [diff] [review]
Rip out inc/JSON

Bug 561985 has the history
Attachment #777277 - Flags: review?(emorley)
The native PHP JSON function at least used to be pretty broken, not sure if things have improved since three years ago (I have pretty much stopped following PHP development.)  Specifically, please retest bug 561985 comment 2 at the very least.
Note that we can also fix the Services_JSON bug and upstream that! <http://pear.php.net/pepr/pepr-proposal-show.php?id=198>
:graememcc: do you need to ask someone to review your patch?

Gerv
Summary: Starring bug 696306 causes an error on tbpl → Starring bug 696306 causes an "missing low surrogate character in surrogate pair" error on tbpl
:graememcc: ping?

Gerv
(In reply to Gervase Markham [:gerv] from comment #23)
> :graememcc: do you need to ask someone to review your patch?
> 
> Gerv

I cancelled the initial review request as I was unaware of the previous history (this approach has been tried before and caused problems). I haven't yet had a chance to perform the suggested in comment 21. I'll try to get to this as soon as I can post-Summit, and re-request review or obsolete as appropriate.
Product: Webtools → Tree Management
Wontfix since TBPL is being replaced by Treeherder.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: