1309112 - Detect and linkify GitHub issue in comment

Reporter

Description

•

8 years ago

Nowadays many Mozilla projects are developed on GitHub. It would be helpful if Bugzilla can linkfy GitHub issues.

On GitHub, issues are referred in the form of "owner/repo#number", e.g. servo/servo#13134. Bugzilla can recognize this pattern and create a link to github.com/owner/repo/issues/number.

Pull requests share the number with issues, but if the target is a pull request, GitHub will automatically redirect to pull/number, so we don't need to distinguish between them from our side.

If there could be too many false positives of this patten (which I doubt), the owner part can be restricted to some known mozilla and other web-related orgs.

Sebastin Santy [:seban]

Assignee

Updated

•

8 years ago

Assignee: nobody → sebastinssanty

Sebastin Santy [:seban]

Assignee

Comment 1

•

8 years ago

Attached file patch — Details

Attachment #8799737 - Flags: review?(dylan)

Dylan Hardison [:dylan] (he/him)

Comment 2

•

8 years ago

Comment on attachment 8799737 [details] [review]
patch

Because we have over 11 million comments in BMO, we have to be careful about regexes that require continuous look-ahead.

Take a look at running this regex against the reporter's comment (this is using the lovely CPAN module Regexp::Debugger)
This ends up taking at least 1200 steps and a lot of back-tracking
and that's just to get to the first match.

https://www.youtube.com/watch?v=WzzHvtozzMg

However I think this is a useful feature. We should have a list of known github projects,
and change the first part of the RE to be  m{\b((?:servo|mozilla|etc)\/[A-Za-z0-9_\.-]+\#[0-9]+)\b}g;

To be efficient, we should use Data::Munge's list2re, which takes a list of static strings and produces an efficient regex.

I don't think we need all of Data::Munge, so for now you can just copy the function into the BMO extension:
https://metacpan.org/source/MAUKE/Data-Munge-0.096/lib/Data/Munge.pm#L59-64

Usage:

my $projects = list2re('servo', 'mozilla', 'mozilla-bteam', 'bugzilla');
m{\b($projects\/[A-Za-z0-9_\.-]+#[0-9]+)\b}g;

For now the list of projects should be a constant at the top of the BMO extension, but I can picture
adding "associated github repos" as metadata to Bugzilla components.

Probably this comment is not enough to go on, but we can chat about it in #bteam at your time allows.

Also, thank you, sebastin for taking on this bug! Good initiative.

Attachment #8799737 - Flags: review?(dylan) → review-

Sebastin Santy [:seban]

Assignee

Comment 3

•

8 years ago

Thanks Dylan,

Yeah, I didn't take the aspect of backtracking over all those comments. I'll have a look at the list2re, I think it will be a better idea too.

Xidorn: I guess the idea mentioned by dylan, would be much more feasible for BMO. Any opinion on this?

Flags: needinfo?(xidorn+moz)

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 4

•

8 years ago

I think that is fine, but I can imagine there may be requests for adding more orgs / users into that list.

If performance issue is mainly because it starts with a rather arbitrary string which takes too much time on trying matching unrelated text, I have a suggestion that, we can start the regex with "/", and once we find a pattern "/[A-Za-z0-9_\.-]+#\d+\b", we do a manual backtrack to find the owner part. That should be faster I guess.

Flags: needinfo?(xidorn+moz)

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 5

•

8 years ago

I just rethought about that regexp, and watched the video again. Then I realized that it is just that the regexp debugger is too threatening, and the regexp itself isn't that bad.

As a comparison, let's pick an innocent-looking regexp listed in the same function: qr/(^|\s)r(\d{4,})\b/. If you replace the match in comment 0 with a match of this regexp, how many steps do you think it would take before the first match? 1400+.

IIUC, most of the url matching regexp would take something about half of the steps of that regexp, and that's all.

In majority of cases, because of the existence of "\b" at the beginning, the naive regexp should just take 1x more steps than others, which isn't really harmful. And actually, the regexp you suggested in comment 2 (via list2re, which is something like /\b((?:mozilla\-bteam|bugzilla|mozilla|servo)\/[A-Za-z0-9_\.-]+\#[0-9]+)\b/) takes 1400+ steps to get to the first match.

There is one case that the naive regexp would be a bit worse than normal cases, which is long url. In that case, it could take 2x more steps because of the rematching of "/". But that is still O(n), so not too bad.

The worst case of that regexp is a O(n^2) regression when the string includes lots of "-" or ".", because those characters generates word boundary, but is matched by [A-Za-z0-9_\.-]. I don't think that's something very common, but if you want to avoid this worst case, you can just replace the \b with something like [^A-Za-z0-9_\.-], then it would take O(n) for any case. And if you include "/" in that, it would have the same performance in long urls as well.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 6

•

8 years ago

Given comment 5, please reconsider the original patch.

Flags: needinfo?(dylan)

Dylan Hardison [:dylan] (he/him)

Comment 7

•

8 years ago

Comment on attachment 8799737 [details] [review]
patch

Based on Xidorn's astute observations in comment 5, I revised this:

So what I did was run this over every comment in my copy of the BMO db (which is 10 million comments),
with and without this patch, and I benchmarked it. There is no significant difference in timings. We still might get some false positives but we can fix those.

Flags: needinfo?(dylan)

Attachment #8799737 - Flags: review- → review+

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 8

•

8 years ago

I think the regexp should just be /\s([A-Za-z0-9_\.-]+\/[A-Za-z0-9_\.-]+\#[0-9]+)\b/, and you should probably replace the svn matching to just use \s as prefix as well, since it seems a grouping "or" takes lots of extra steps in Perl's regexp engine, and it is just not worth for a "^".

If you want to make it exactly correct, it might be more performant to prepend a space to every comment before matching than applying "(^|\s)" to them. But I guess that issue is unimportant and isn't really worth this trouble.

Dylan Hardison [:dylan] (he/him)

Comment 9

•

8 years ago

Sebastin: do you want to change the regexps per comment 8? Thanks!


Also, 
Xidorn: Thank you very much for your insight into this bug. It is very much appreciated.

Flags: needinfo?(sebastinssanty)

Sebastin Santy [:seban]

Assignee

Comment 10

•

8 years ago

Dylan, Xidorn: I have changed the regex expression to include whitespace. But I am a bit unsure about it, because, as Xidorn had earlier mentioned about spaces, it won't detect 'See this:mozilla-bteam/bmo.'. Though I guess, these cases would be very less.

Flags: needinfo?(sebastinssanty)

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 11

•

8 years ago

Did I say that?

I'm not very worried about that case. I'm more concerned about the ending, which should be a "\b" rather than "\s", because people may add punctuation after that. The beginning shouldn't be a big issue.

Sebastin Santy [:seban]

Assignee

Comment 12

•

8 years ago

Right, Earlier you had mentioned about the space in the ending part. But I was thinking about the beginning, but that won't be a big issue, as you mentioned above.

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 13

•

8 years ago

It seems the GitHub PR is pending for your review. Could you have a look at that?

Flags: needinfo?(dylan)

Dylan Hardison [:dylan] (he/him)

Comment 14

•

8 years ago

Yes, I'll be taking a look at this now. I didn't notice as the review flag was still set, it should pretty good to go now.

Flags: needinfo?(dylan)

Dylan Hardison [:dylan] (he/him)

Updated

•

8 years ago

Attachment #8799737 - Flags: review+ → review?(dylan)

Sebastin Santy [:seban]

Assignee

Comment 15

•

8 years ago

The PR review is long pending, can you have a look at it? :)

Flags: needinfo?(dylan)

Dylan Hardison [:dylan] (he/him)

Comment 16

•

7 years ago

Comment on attachment 8799737 [details] [review]
patch

r=dylan

Finally got to this.

Flags: needinfo?(dylan)

Attachment #8799737 - Flags: review?(dylan) → review+

Dylan Hardison [:dylan] (he/him)

Comment 17

•

7 years ago

To git@github.com:mozilla-bteam/bmo.git
   4da65f2..d7dacb9  master -> master

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Dylan Hardison [:dylan] (he/him)

Comment 18

•

7 years ago

This is now live, mozilla-bteam/bmo#26 as an example. Xidorn's servo example also works.

Dylan Hardison [:dylan] (he/him)

Updated

•

7 years ago

Status: RESOLVED → VERIFIED

Xidorn Quan [:xidorn] UTC+11

Reporter

Comment 19

•

7 years ago

Great! Thanks for working on this.