Closed Bug 1348454 Opened 7 years ago Closed 7 years ago

KeyPair Sign threads are leaked on every use of the feature

Categories

(Core Graveyard :: Identity, defect)

defect
Not set
normal

Tracking

(firefox-esr45 wontfix, firefox52 wontfix, firefox-esr5254+ fixed, firefox53+ fixed, firefox54+ fixed, firefox55+ fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr45 --- wontfix
firefox52 --- wontfix
firefox-esr52 54+ fixed
firefox53 + fixed
firefox54 + fixed
firefox55 + fixed

People

(Reporter: jesup, Assigned: froydnj)

References

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P2])

Attachments

(4 files, 1 obsolete file)

sheppy noted multiple "KeyPair Sign" threads after doing some example development.  Looking at the code, it's leaking every thread it creates - it provides a runnable to NS_NewNamedThread().  That runnable sends itself to MainThread to do it's callback -- but it doesn't do thread->Shutdown().
Note that this isn't a great solution, but it shouldn't leak.  A true singleton or a pool that shuts down idle threads would probably be better
Attachment #8848680 - Flags: review?(nfroyd)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Comment on attachment 8848680 [details] [diff] [review]
shut down KeyPair Sign threads after using them

Review of attachment 8848680 [details] [diff] [review]:
-----------------------------------------------------------------

The commit message says that this addresses signing...but the patch itself only looks at key generation.  So we have two issues to fix here.

Having a worker thread or a thread pool or virtually anything better than "spin up a separate thread for every operation" sounds like a better fix at this point.
Attachment #8848680 - Flags: review?(nfroyd) → review-
Tracking requested (no little prompt in the comment, weird) because leaking threads on every operation like this is no good.  Wild guessing says that ESR52 crypto things *might* see more widespread use in the ESR specifically than in general-purpose installations, so even though this isn't a security bug, it might be worth fixing on ESR for the memory leak.
(In reply to Nathan Froyd [:froydnj] from comment #4)
> Tracking requested (no little prompt in the comment, weird) because leaking
> threads on every operation like this is no good.  Wild guessing says that
> ESR52 crypto things *might* see more widespread use in the ESR specifically
> than in general-purpose installations, so even though this isn't a security
> bug, it might be worth fixing on ESR for the memory leak.

Tracking+ for all branches based on this comment.
Fernando, are you able to find somebody to fix this memory leak?
Flags: needinfo?(ferjmoreno)
Blocks: 753238
Drive-by comment: this code might benefit from using the preexisting CryptoTask infrastructure: https://dxr.mozilla.org/mozilla-central/source/security/manager/ssl/CryptoTask.h
Flags: needinfo?(rfkelly)
Flags: needinfo?(ferjmoreno)
Flags: needinfo?(ckarlof)
Mark (Sync Desktop Lead) or Ryan (FxA Lead), can you please investigate?
Flags: needinfo?(ckarlof) → needinfo?(markh)
I believe this code will be hit every 12 hours for Sync users, which is probably why we've never noticed the leak before. While I agree this sucks, ISTM that starting a thread and stopping it on every operation is better than starting a thread and *not* stopping it, so maybe it makes sense to land this and have a followup to improve it later, especially when there might be a risk that "later" means after the next train? I'm afraid I'm not going to have time to look at this for a couple of weeks at least...
Flags: needinfo?(markh)
Too late for esr52.0.x, updating tracking flag.
(In reply to Mark Hammond [:markh] from comment #10)
> I believe this code will be hit every 12 hours for Sync users, which is
> probably why we've never noticed the leak before. While I agree this sucks,
> ISTM that starting a thread and stopping it on every operation is better
> than starting a thread and *not* stopping it,

better in what sense?  If we started a single thread for the IdentityCryptoService, that would only be one thread for the life of the process, rather than N threads for the life of the process, some of which might have a hard time finding enough contiguous memory for the thread stack (though a couple searches at crash-stats isn't turning up any OOM issues).

My preference for a bandaid fix would be to start a single thread when IdentityCryptoService is instantiated, and use that for everything.  I think we could improve that later if we needed.
Flags: needinfo?(markh)
(In reply to Nathan Froyd [:froydnj] from comment #12)
> (In reply to Mark Hammond [:markh] from comment #10)
> > I believe this code will be hit every 12 hours for Sync users, which is
> > probably why we've never noticed the leak before. While I agree this sucks,
> > ISTM that starting a thread and stopping it on every operation is better
> > than starting a thread and *not* stopping it,
> 
> better in what sense?

Sorry I wasn't clear - I just meant better than the status quo.

> My preference for a bandaid fix would be to start a single thread when
> IdentityCryptoService is instantiated, and use that for everything.  I think
> we could improve that later if we needed.

I agree that's certainly better if it can get done this train. However, if resourcing means it can't get done this train, I believe the attached patch would be better than nothing.
Flags: needinfo?(markh)
I agree with Mark - land the bandaid and uplift it to 53.  Then land a persistent-thread patch (I'd prefer a SharedThreadPool with (perhaps) a maximum number of 1), in order to get the shutdown-when-idle behavior (esp. for mobile), but we can argue that one.
clearing my ni? since :markh appears to have this under control :-)
Flags: needinfo?(rfkelly)
Whiteboard: [memshrink] → [MemShrink:P2]
Rather than starting a single thread for every key generation and key
signing operation started by IdentityCryptoService, and then leaking
those threads, we can at least start a single thread to be used by all
operations and leak that instead.

If we want to go with the startup/shutdown thread routine for every operation,
I could do that too, but this wasn't much more code to write and is just about
as efficient.

Note that CryptoTask, suggested in comment 8, also starts a separate thread for
every operation.
Attachment #8851050 - Flags: review?(rjesup)
Attachment #8848680 - Attachment is obsolete: true
Comment on attachment 8851050 [details] [diff] [review]
use a single thread for IdentityCryptoService

Review of attachment 8851050 [details] [diff] [review]:
-----------------------------------------------------------------

I still dislike leaking threads when we don't need to.  I'll give r+ to the code here in that it's better than what we have now, but I'd prefer it with the (simple) mods below, which provides a singleton thread but doesn't leak.  (Patterned on MediaTimer, etc).

::: services/crypto/component/IdentityCryptoService.cpp
@@ +53,5 @@
>    NS_DECL_THREADSAFE_ISUPPORTS
>    NS_DECL_NSIIDENTITYKEYPAIR
>  
> +  KeyPair(SECKEYPrivateKey* aPrivateKey, SECKEYPublicKey* aPublicKey,
> +          nsIThread* aOperationThread);

nsIEventTarget

@@ +81,5 @@
>    }
>  
>    SECKEYPrivateKey * mPrivateKey;
>    SECKEYPublicKey * mPublicKey;
> +  nsCOMPtr<nsIThread> mThread;

nsIEventTarget

@@ +121,5 @@
>    const KeyType mKeyType; // in
>    nsMainThreadPtrHandle<nsIIdentityKeyGenCallback> mCallback; // in
>    nsresult mRv; // out
>    nsCOMPtr<nsIIdentityKeyPair> mKeyPair; // out
> +  nsCOMPtr<nsIThread> mThread;

nsIEventTarget

@@ +183,5 @@
>        = do_GetService("@mozilla.org/psm;1", &rv);
>      NS_ENSURE_SUCCESS(rv, rv);
>  
> +    rv = NS_NewNamedThread("IdentityCrypto", getter_AddRefs(mThread));
> +    NS_ENSURE_SUCCESS(rv, rv);

Instead:
  RefPtr<SharedThreadPool> threadPool(
    SharedTHreadPool::Get(NS_LITERAL_STRING("IdentityCrypto"), 1));
  mThread = threadPool.get();

If you want to be pedantic, put "if (!threadPool) { return NS_ERROR_FAILURE; }" inbetween those statements.

@@ +193,5 @@
>    ~IdentityCryptoService() = default;
>    IdentityCryptoService(const KeyPair &) = delete;
>    void operator=(const IdentityCryptoService &) = delete;
> +
> +  nsCOMPtr<nsIThread> mThread;

nsIEventTarget

@@ +228,5 @@
>      Base64URLEncodePaddingPolicy::Include, result);
>  }
>  
> +KeyPair::KeyPair(SECKEYPrivateKey * privateKey, SECKEYPublicKey * publicKey,
> +                 nsIThread* operationThread)

nsIEventTarget

@@ +322,5 @@
>  }
>  
>  KeyGenRunnable::KeyGenRunnable(KeyType keyType,
> +                               nsIIdentityKeyGenCallback * callback,
> +                               nsIThread* operationThread)

nsIEventTarget
Attachment #8851050 - Flags: review?(rjesup) → review+
Probably will need #include "mozilla/SharedThreadPool.h" as well
Returned to my desk after a day or so away from it, and found all my apps paused because my Mac had run out of memory. Took a look and found that Nightly was using an insane amount of memory (over 17 GB), as evidenced by the screenshot.
Here's a sample I took using activity monitor while my Mac had apps suspended due to OOM condition. I don't know for certain it's related to this bug, but that sure is a lot of KeyPair Sign processes...
This wouldn't be the cause of your 17GB Master Process, but it's certainly and example of the leak.
Let's land this, and uplift.  I'd love to see it in 53 and it's tracking+, but it's not critical.  Patch is a simple one (with or without the minor  mods I suggested).  Or I can land it, let me know
Flags: needinfo?(nfroyd)
(In reply to Randell Jesup [:jesup] from comment #22)
> Let's land this, and uplift.  I'd love to see it in 53 and it's tracking+,
> but it's not critical.  Patch is a simple one (with or without the minor 
> mods I suggested).  Or I can land it, let me know

I will land it without the shared thread pool, but with the nsIEventTarget* parameters; using shared thread pools causes test timeouts:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e19c1da85270b969518563eafb015bfc861bb8c&selectedJob=88332129

for reasons that I don't really understand...
Flags: needinfo?(nfroyd)
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ef3f769288f
use a single thread for IdentityCryptoService; r=jesup
Assignee: rjesup → nfroyd
Comment on attachment 8851050 [details] [diff] [review]
use a single thread for IdentityCryptoService

Asking ahead of merge to central so this can possibly get to Beta.

Approval Request Comment
[Feature/Bug causing the regression]: bug 753238
[User impact if declined]: Memory leaks for people using FxAccounts.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: We're changing where a thread for operations are spawned; no significant logic changes.
[String changes made/needed]: None.
Attachment #8851050 - Flags: approval-mozilla-beta?
Attachment #8851050 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/6ef3f769288f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8851050 [details] [diff] [review]
use a single thread for IdentityCryptoService

Memory leaks are bad, let's uplift this for the beta 10 build.
Attachment #8851050 - Flags: approval-mozilla-beta?
Attachment #8851050 - Flags: approval-mozilla-beta+
Attachment #8851050 - Flags: approval-mozilla-aurora?
Attachment #8851050 - Flags: approval-mozilla-aurora+
Comment on attachment 8851050 [details] [diff] [review]
use a single thread for IdentityCryptoService

Sounds important for ESR52 too.
Attachment #8851050 - Flags: approval-mozilla-esr52?
Setting qe-verify- based on Nathan's assessment on manual testing needs (see Comment 25) and the fact that this fix has automated coverage.
Flags: qe-verify-
I've now running 55.0a1 (2017-04-05) for a full day and have zero "KeyPair Sign" threads running. Hurray!
Comment on attachment 8851050 [details] [diff] [review]
use a single thread for IdentityCryptoService

Memory leak fix that was uplifted 3 weeks ago on 53 and so far no damage done with one good verification data point. Let's uplift to ESR52.2
Attachment #8851050 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Needs a bit of rebasing for ESR52.
Flags: needinfo?(nfroyd)
Attached patch patch for esr52Splinter Review
This is a straightforward port of the patch that compiles locally for me.
Flags: needinfo?(nfroyd)
Blocks: 1404936
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: