Closed Bug 1455597 Opened 6 years ago Closed 6 years ago

Crash: Didn't find a cached resource with that ID!

Categories

(Core :: Graphics: WebRender, defect, P2)

x86_64
All
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 --- disabled
firefox62 --- disabled

People

(Reporter: jan, Assigned: kats)

References

(Blocks 1 open bug)

Details

(Keywords: crash, nightly-community)

Crash Data

Attachments

(1 file)

Seen on Socorro. The fix for bug 1446588 didn't stop crashes with this reason.

[@ static void core::option::expect_failed | static union core::result::Result<T> webrender::resource_cache::ResourceCache::get_cached_image ]
bp-a5506d84-c971-4af5-aa1e-4c7ed0180420 20180418230818 Win10
> Didn't find a cached resource with that ID!

[@ static void std::panicking::rust_panic_with_hook ]
bp-c51d97f8-b0da-4554-80e7-ea5640180418 20180417100054 Win10
> Didn't find a cached resource with that ID!
Crash Signature: [@ static void core::option::expect_failed | static union core::result::Result<T> webrender::resource_cache::ResourceCache::get_cached_image ] [@ static void std::panicking::rust_panic_with_hook ] → [@ static void core::option::expect_failed | static union core::result::Result<T> webrender::resource_cache::ResourceCache::get_cached_image ] [@ static void std::panicking::rust_panic_with_hook ] [@ mozalloc_abort | abort | panic_abort::__rust_start_pa…
OS: Windows 10 → All
Crash Signature: panic_abort::__rust_start_panic::abort::h091e61b1e9ef8f82 | panic_abort::__rust_start_panic | core::option::expect_failed::h665835dead85fc51 | webrender::resource_cache::ResourceCache::get_cached_image::hfd1d2328a8cb4c77 ] → panic_abort::__rust_start_panic::abort::h091e61b1e9ef8f82 | panic_abort::__rust_start_panic | core::option::expect_failed::h665835dead85fc51 | webrender::resource_cache::ResourceCache::get_cached_image::hfd1d2328a8cb4c77 ] [@ mozalloc_abort | abort | co…
Crash Signature: core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::hffd646c4c32aa8bc ] → core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::hffd646c4c32aa8bc ] [@ mozalloc_abort | abort | core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_c…
Crash Signature: webrender::resource_cache::ResourceCache::get_cached_image::h6a69122e591884d8 ] [@ mozalloc_abort | abort | core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::hf33258fd364ddd14 ] → webrender::resource_cache::ResourceCache::get_cached_image::h6a69122e591884d8 ] [@ mozalloc_abort | abort | core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::hf33258fd364ddd14 ] [@ mozalloc_abo…
I'm seeing this crash pretty much every time I shut down or update WR-enabled nightly on macOS.
Crash Signature: mozalloc_abort | abort | core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image ] → mozalloc_abort | abort | core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image ] [@ mozalloc_abort | abort | core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::…
This is the #2 WR-specific topcrash. We should address this before turning on WR in nightly. I seem to be able to reproduce it so I'll take a look.
Assignee: nobody → bugmail
Blocks: stage-wr-nightly
No longer blocks: stage-wr-trains
This might actually be #1 after combining the signatures.
Crash Signature: webrender::resource_cache::ResourceCache::get_cached_image::h8b37fbc8303bff09 ] → webrender::resource_cache::ResourceCache::get_cached_image::h8b37fbc8303bff09 ] [@ mozalloc_abort | abort | core::option::expect_failed::hd528a7bd2cab115d | webrender::resource_cache::ResourceCache::get_cached_image::h4a711d8897b20ee7 ]
I have seen the crash 5 times within the last 60 minutes on Linux (Ubuntu 18.04) with an Intel GPU. It happens even when I just step away from my laptop.
And just to be clear, this started just after I upgraded from 20180530220105 to 20180531101452.
I'm wondering if there's any link with those kernel messages:
> Jun  1 11:39:44 portable-alex kernel: [1329561.799780] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
> Jun  1 11:40:03 portable-alex kernel: [1329581.395141] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.

At least, it looks like each time there's one, I get a crash very soon after (< 60s). I've already set the maximum video memory in the BIOS (512MB).
I haven't been able to reproduce it on a local build yet, so I haven't been able to confirm. But my theory is this: when a tab is closed, we run through WRBP::ClearResources. This sends a transaction to clear out the display list and remove the pipeline at [1]. This transaction gets sent to the scene builder thread where it may take a bit of time.

Meanwhile, a few lines down, at [2], we release the WebRenderAPI clone for that tab, which should trigger the ClearNamespace at [3]. This happens in a message sent directly to the RenderBackend thread, so it can jump in front of the DL/pipeline removal and put WR into a state where it still has the DL/pipeline but has already cleared all the cached images. If it tries to do a render in that state (there might be one in flight already), it might hit this error.

[1] https://searchfox.org/mozilla-central/rev/38bcf897f1fa19c1eba441a611cf309482e0d6e5/gfx/layers/wr/WebRenderBridgeParent.cpp#1607
[2] https://searchfox.org/mozilla-central/rev/38bcf897f1fa19c1eba441a611cf309482e0d6e5/gfx/layers/wr/WebRenderBridgeParent.cpp#1621
[3] https://searchfox.org/mozilla-central/rev/38bcf897f1fa19c1eba441a611cf309482e0d6e5/gfx/webrender_api/src/api.rs#988
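
To make the suspected ordering problem concrete, here is a minimal single-threaded sketch of it. Everything in it (`Backend`, `Msg`, `ImageKey`, `run`) is a hypothetical stand-in invented for illustration, not the real WebRender API, and the real code panics via `expect` where this model returns an `Err`. It only shows that processing ClearNamespace before the pipeline removal leaves a display list pointing at images that are no longer cached.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for an image key scoped to one tab's namespace.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ImageKey {
    namespace: u32,
    id: u32,
}

// Messages in the order the (modelled) render backend processes them.
#[derive(Debug)]
enum Msg {
    RemovePipeline(u32), // in the theory above, delayed by the scene builder thread
    ClearNamespace(u32), // in the theory above, sent directly to the render backend
    Render,
}

#[derive(Default)]
struct Backend {
    // pipeline id -> image keys referenced by its display list
    pipelines: HashMap<u32, Vec<ImageKey>>,
    cached_images: HashSet<ImageKey>,
}

impl Backend {
    fn handle(&mut self, msg: Msg) -> Result<(), String> {
        match msg {
            Msg::RemovePipeline(id) => {
                self.pipelines.remove(&id);
                Ok(())
            }
            Msg::ClearNamespace(ns) => {
                self.cached_images.retain(|k| k.namespace != ns);
                Ok(())
            }
            Msg::Render => {
                // Every image still referenced by a live pipeline must be cached.
                for key in self.pipelines.values().flatten() {
                    if !self.cached_images.contains(key) {
                        return Err(format!("no cached resource for {:?}", key));
                    }
                }
                Ok(())
            }
        }
    }
}

// Build a backend with one tab (pipeline 1, namespace 7) and replay `order`.
fn run(order: Vec<Msg>) -> Result<(), String> {
    let key = ImageKey { namespace: 7, id: 1 };
    let mut backend = Backend::default();
    backend.pipelines.insert(1, vec![key]);
    backend.cached_images.insert(key);
    for msg in order {
        backend.handle(msg)?;
    }
    Ok(())
}

fn main() {
    // ClearNamespace overtakes the pipeline removal; a render lands in between.
    let racy = run(vec![Msg::ClearNamespace(7), Msg::Render, Msg::RemovePipeline(1)]);
    // The pipeline removal is processed before the namespace is cleared.
    let ordered = run(vec![Msg::RemovePipeline(1), Msg::ClearNamespace(7), Msg::Render]);
    println!("racy: {:?}", racy);
    println!("ordered: {:?}", ordered);
}
```

In this model the first ordering fails the lookup and the second does not, which is why forcing the pipeline removal to complete before the namespace is cleared looks like a plausible direction for a fix.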
I made a try push with logging [1] and I was able to reproduce on that. Indeed, it seems like the namespace is being cleared and then we request the image after that. Will try to put together a fix.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0919751458bc2db546a9a3dc2aac55143e7d23a2
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> This might fix it:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd2323486caab76edd458c4b46c444f551d819c5

Thanks. So far, toggling the pref |gfx.webrender.async-scene-build| seems to have helped a lot: nearly three hours without any crash! I'm going to test with your binaries out of TaskCluster :)
Crash Signature: webrender::resource_cache::ResourceCache::get_cached_image::h8b37fbc8303bff09 ] [@ mozalloc_abort | abort | core::option::expect_failed::hd528a7bd2cab115d | webrender::resource_cache::ResourceCache::get_cached_image::h4a711d8897b20ee7 ] → webrender::resource_cache::ResourceCache::get_cached_image::h8b37fbc8303bff09 ] [@ mozalloc_abort | abort | core::option::expect_failed::hd528a7bd2cab115d | webrender::resource_cache::ResourceCache::get_cached_image::h4a711d8897b20ee7 ] [@ mozalloc_abo…
Crash Signature: mozalloc_abort | abort | core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::h8b37fbc8303bff09 ] → mozalloc_abort | abort | core::option::expect_failed::h7f635057bfba806a | webrender::resource_cache::ResourceCache::get_cached_image::h8b37fbc8303bff09 ] [@ mozalloc_abort | abort | core::option::expect_failed::hd528a7bd2cab115d | webrender::resource_ca…
Hm, for some reason the try push seems to indicate the Windows debug build is hanging on startup. Not sure what's going on there, will investigate.
(In reply to Alexandre LISSY :gerard-majax from comment #11)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> > This might fix it:
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd2323486caab76edd458c4b46c444f551d819c5
> 
> Thanks. So far, toggling the pref |gfx.webrender.async-scene-build| seems to
> have helped a lot: nearly three hours without any crash! I'm going to test
> with your binaries out of TaskCluster :)

So, this has been rock stable since my comment. Looks good, :kats !
Shoot, the patch introduces a deadlock situation because it blocks on WR while holding the indirect layer trees lock at https://searchfox.org/mozilla-central/rev/38bcf897f1fa19c1eba441a611cf309482e0d6e5/gfx/layers/ipc/CompositorBridgeParent.cpp#504
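
For readers unfamiliar with this kind of deadlock, the general shape can be sketched as follows. This is an illustration under loud assumptions: `LayerTrees`, `spawn_worker` and `clear_then_flush` are hypothetical names, and the sketch assumes that the thread being waited on needs the same lock before it can answer. It is not the code from the patch and does not claim to show how the patch was eventually changed. The pattern shown is to finish the work that needs the lock, drop the guard, and only then block on the other thread.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the table guarded by the indirect layer trees lock.
type LayerTrees = Arc<Mutex<Vec<u32>>>;
// A flush request carries the channel on which the worker should reply.
type FlushRequest = mpsc::Sender<usize>;

// Worker thread that must take the shared lock before it can acknowledge a flush.
fn spawn_worker(trees: LayerTrees) -> (mpsc::Sender<FlushRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<FlushRequest>();
    let handle = thread::spawn(move || {
        for reply in rx {
            // This would block forever if the requester still held the lock
            // while waiting for the reply.
            let count = trees.lock().unwrap().len();
            let _ = reply.send(count);
        }
    });
    (tx, handle)
}

// Do the locked work first, release the lock, then block on the worker.
fn clear_then_flush(trees: &LayerTrees, worker: &mpsc::Sender<FlushRequest>) -> usize {
    {
        let mut guard = trees.lock().unwrap();
        guard.clear();
    } // guard is dropped here, before any blocking wait

    let (reply_tx, reply_rx) = mpsc::channel();
    worker.send(reply_tx).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let trees: LayerTrees = Arc::new(Mutex::new(vec![1, 2, 3]));
    let (worker, handle) = spawn_worker(Arc::clone(&trees));
    let remaining = clear_then_flush(&trees, &worker);
    println!("worker saw {} layer trees after the flush", remaining);
    drop(worker); // closing the channel ends the worker loop
    handle.join().unwrap();
}
```

Moving the blocking `recv` inside the scope that holds `guard` would reproduce the hang in this model, because the worker could never acquire the lock to reply.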
That's looking good :)
Comment on attachment 8982595 [details]
Bug 1455597 - Flush the transaction to remove the pipeline before shutting down the WebRenderAPI.

https://reviewboard.mozilla.org/r/248570/#review254944

Looks good!
Attachment #8982595 - Flags: review?(sotaro.ikeda.g) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bcebfd54b4e1
Flush the transaction to remove the pipeline before shutting down the WebRenderAPI. r=sotaro
https://hg.mozilla.org/mozilla-central/rev/bcebfd54b4e1
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
See Also: → 1479912