Closed Bug 1247835 Opened 8 years ago Closed 8 years ago

Shrink nsEffectiveTLDService by using binary search instead of a hash table

Categories

(Core :: Networking: DNS, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: [MemShrink:P2][necko-active])

Attachments

(3 files, 2 obsolete files)

nsEffectiveTLDService is a three-level data structure:

(1) An array of tightly packed domain names (C strings).

(2) An array of tightly packed triples: (uint, bool, bool), where the uint is an index into the string table.

(3) A hash table of the same triples. This takes up a lot more space than (2) but because it's structured as a hash table it facilitates fast lookups of domain names. With 6271 entries, this takes up 16,384 * 8 = 128 KiB.

I propose removing (3), and instead doing lookups by using binary search on (2). This requires putting (2) in sorted order, which isn't hard. This will save 128 KiB per process.

This won't block the possibility of dynamic modifications to the table (bug 1083971); that would just require making (2) a dynamic array instead of a static array.

This isn't hard. The main question is whether binary search is slower than a hash table, and if so, whether lookups are frequent enough for that to matter.
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Attachment #8718695 - Flags: review?(luke) → review+
Any suggestions on how to measure the performance of binary search vs. hash
table lookup would be welcome!

kmckinley, FYI this could cause conflicts with your patch in bug 1196364. I
don't think the conflicts would be hard to resolve, though.
Attachment #8718700 - Flags: review?(jduell.mcbugs)
Comment on attachment 8718700 [details] [diff] [review]
(part 1) - Use binary search instead of a hash table in nsEffectiveTLDService

Review of attachment 8718700 [details] [diff] [review]:
-----------------------------------------------------------------

So this patch looks fine AFAICT.  I'd feel happier if I knew of a good set of tests for this stuff... The only one I know of is test_tldservice_nextsubdomain.js, which only has a few test cases.  Perhaps that's enough?

> Any suggestions on how to measure the performance of binary search vs. hash
> table lookup would be welcome!

We also need to figure this out before this can get r+.  There are around 10K entries in the eTLD database, so a log(2) algorithm is going to be around 13 times slower than a constant lookup (all things being equal, which of course they aren't).  I'm guessing it won't matter much, particularly if the entries can stay in cache, but we should test this. But maybe simply writing an xpcshell test that looks up a bunch of hostnames over and over with a timer (and seeing how the hash vs binary search does) would be a good first step.

The real "proof of the pudding" is how overall browser performance is affected.  I see there's some functionality in Talos for setting cookies (the most common use for eTLD AFAICT).  I'll find someone to ask about that.

::: netwerk/dns/nsEffectiveTLDService.cpp
@@ +120,5 @@
>  MOZ_DEFINE_MALLOC_SIZE_OF(EffectiveTLDServiceMallocSizeOf)
>  
> +// The amount of heap memory measured here is tiny. It used to be bigger when
> +// nsEffectiveTLDService used a separate hash table instead of binary search.
> +// Nonethelss, we keep this code here in anticipation of bug 1083971 which will

Nonetheless

@@ +276,5 @@
>      // embedded '..' sequence.
>      if (*currDomain == '.')
>        return NS_ERROR_INVALID_ARG;
>  
>      // perform the hash lookup.

change to "perform the lookup" (or nuke the comment entirely)
Attachment #8718700 - Flags: feedback+
Joel:  Do you know if Talos sets cookies in any tests?  We're trying to figure out if changing from a hashtable lookup to a binary search in some internal data structures will affect overall browser performance.  I don't expect it will, but I'm hoping we can use Talos to sniff that out.
Flags: needinfo?(jmaher)
Great question here.  I don't believe we do anything specifically to interact with cookies- maybe as a side effect of our tests we do.

What is great is you can push to try:
try: -b o -p all -u none -t all --rebuild 5

do that pre/post your patch (2 different try pushes), and then compare the differences:
https://treeherder.allizom.org/perf.html#/comparechooser

Alternatively, for ^, you can just retrigger the jobs on the base revision that your try push comes from, so you would do 1 try push and then retrigger something on inbound/fx-team.  I have a tool in treeherder for sheriffs that makes that a 2 click operation.

there might be some false positives in the compare, but looking at the graphs should help you get data.  Feel free to do the try push and ask me to look over the compare results as well.
Flags: needinfo?(jmaher)
Nicholas: I suppose you could try a Talos build with some arbitrary large delay (ex: sleep(1)) in the eTLD code, and if Talos hits it often enough to show a large perf difference that'll tell us whether Talos will be a useful tool here.  (There are a lot of other codepaths that hit eTLD besides cookies: I'm not sure which of them get called the most, hence the shotgun approach).
Flags: needinfo?(n.nethercote)
Attached patch xpcshell performance test (obsolete) — Splinter Review
I extracted a sequence of eTLD lookups during some basic browsing of
techcrunch.com and nytimes.com and put them into this xpcshell test. There are
2053 lookups and I put them in a loop that executes 100 times, for 205,300
lookups in total.

The binary search version was actually slightly *faster*, taking 1250--1350 ms.
The hash table version took 1300--1450 ms. It's possible that the eTLD lookup
only accounts for a small fraction of the execution time of this test, but if
that's true it means that eTLD lookup time isn't that important, because
regular browsing won't hammer the table anything like as much as this test.
Comment on attachment 8718700 [details] [diff] [review]
(part 1) - Use binary search instead of a hash table in nsEffectiveTLDService

Review of attachment 8718700 [details] [diff] [review]:
-----------------------------------------------------------------

Given the xpcshell test result, I'll re-request review. Thank you.
Attachment #8718700 - Flags: review?(jduell.mcbugs)
I addressed the review comments from comment 3.
Attachment #8722341 - Flags: review?(jduell.mcbugs)
Attachment #8718700 - Attachment is obsolete: true
(I forgot to |hg add| the test file.)
Attachment #8722340 - Attachment is obsolete: true
Flags: needinfo?(n.nethercote)
Whiteboard: [MemShrink] → [MemShrink:P2]
Comment on attachment 8722341 [details] [diff] [review]
(part 1) - Use binary search instead of a hash table in nsEffectiveTLDService

Ship it!
Attachment #8722341 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bc5e02dcb4015e4c3ae674fcf68e8a53e935b6c7
Bug 1247835 (part 0) - Minor comment and style tweaks in BinarySearch.h. r=luke.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d1a8a43006ec0b186429362e3cc241dacf39ae94
Bug 1247835 (part 1) - Use binary search instead of a hash table in nsEffectiveTLDService. r=jduell.
Whiteboard: [MemShrink:P2] → [MemShrink:P2][necko-active]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: