Closed Bug 1695933 Opened 3 years ago Closed 3 years ago

Enable EGL by default for Mesa 21 and Nvidia driver 470

Categories

(Core :: Graphics, task, P3)

Desktop
Linux
task

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: aosmond, Assigned: aosmond)

Attachments

(1 file, 1 obsolete file)

Leveraging the blocklist changes, let's turn EGL on for most users in nightly.

Depends on: 1695997

FYI, swrast_dri.so leaks a small amount of heap memory when loaded/unloaded, so I'd expect this change to break ASan test runs (and suppressions won't work, because stack walking doesn't work, because the module that allocated the memory isn't present when leak checking happens at shutdown).

I ran into this problem while adapting the GLX backend to open/close its own display, and I have a workaround (finding the library and leaking a reference to it) that should work for EGL as well.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #2)

I ran into this problem while adapting the GLX backend to open/close its own display, and I have a workaround (finding the library and leaking a reference to it) that should work for EGL as well.

It would be great if you could open an issue pointing to that workaround :)
Odd that it hasn't been fixed in Mesa yet; maybe we can do that.

Flags: needinfo?(jld)

All blockers appear to be fixed now. Andrew, can we go forward with this?

Flags: needinfo?(jld) → needinfo?(aosmond)

It looks like we see X crashes similar to the earlier ones in other talos tests. It also seems to cause marionette test failures.

Depends on: 1709584
Depends on: 1709585
Depends on: 1709586
Depends on: 1712665

Are these crashes really blockers?
EGL bug 1709584: "But it happens on both, EGL and GLX"
EGL bug 1709586 <-> Existing non-EGL bug: bug 1707268
EGL bug 1709585 <-> Other intermittent [@ _g_log_abort] crashes already exist: https://bugzilla.mozilla.org/buglist.cgi?quicksearch=_g_log_abort&list_id=15803504

EGL on X11 has been reported to work well on both Mesa and the
proprietary Nvidia driver, but has been blocked by some CI failures.
As these could not be reproduced so far on recent driver versions,
require such very recent versions.

For Nvidia it's also important to note that the 470 series is
the first to support DmaBuf and thus benefits much more than
older drivers.
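The version gate described above can be sketched as follows. This is only an illustration in Python, not Firefox's blocklist code: the thresholds (Mesa 21, Nvidia 470) come from the bug summary, while the names `MIN_MAJOR`, `major_version`, and `egl_default_enabled` are hypothetical.

```python
import re

# Assumed minimum major versions, taken from the bug summary.
MIN_MAJOR = {"mesa": 21, "nvidia": 470}


def major_version(version):
    """Return the leading integer of a version string, or None if absent."""
    match = re.match(r"\s*(\d+)", version)
    return int(match.group(1)) if match else None


def egl_default_enabled(driver, version):
    """Return True if this driver/version pair meets the assumed minimum."""
    minimum = MIN_MAJOR.get(driver.lower())
    major = major_version(version)
    if minimum is None or major is None:
        # Unknown driver or unparsable version: keep EGL off by default.
        return False
    return major >= minimum
```

For instance, `egl_default_enabled("mesa", "20.3.5-1")` is false and `egl_default_enabled("nvidia", "470.63.01")` is true under these assumptions. Passing the gate only means EGL is attempted; it does not guarantee that context creation succeeds.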

Try run for the patch above: https://treeherder.mozilla.org/jobs?repo=try&revision=7eeab9fde139c8f1471c5d5cd1a8605464f15f19

As it only activates EGL on recent drivers, it should not trigger the CI failures still blocking this bug. None of them reproduce locally for me, so I suspect they are caused by bugs in other parts of the stack. Hopefully the required recent versions will ensure that no users are affected by them.

Depends on: 1670545

My Debian Testing doesn't have Mesa 21 yet because someone reported a Firefox regression for it:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994057

After upgrading libegl-mesa0 libgbm1 libgl1-mesa-dri libglapi-mesa libglx-mesa0 libllvm11 to 21.2.1-2, I have artifacts with firefox-esr. For example, when I right-click on a tab, I can see artifacts.
For the moment, I have not seen any problem with other software.

If I downgrade to 20.3.5-1, the problem is no longer present.

https://tracker.debian.org/pkg/mesa
https://packages.debian.org/testing/libegl-mesa0

I replied to the bug and asked for reporting to BMO with a screenshot.
My assumption: The user might have seen intermittent bug 1678804 (bug 1655924/bug 1635153 might be related).
EGL reduces the chance for it to occur, but if one uses slow llvmpipe, it's perfectly reproducible with EGL as well:
$ LIBGL_ALWAYS_SOFTWARE=1 MOZ_X11_EGL=1 firefox-esr

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Summary: Enable EGL on nightly by default → Enable EGL by default
No longer depends on: 1709584
No longer depends on: 1709585
No longer depends on: 1709586
Blocks: 1730671
Summary: Enable EGL by default → Enable EGL by default for Mesa 21 and Nvidia driver 470
Regressions: 1730822
Regressions: 1731172
Regressions: 1731251
See Also: → 1732002
Attachment #9206362 - Attachment is obsolete: true

This does not work for me with Mesa 21.2.3/AMDGPU. Firefox falls back to software rendering.

ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
[GFX1-]: Failed GL context creation for hardware WebRender: true
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
[GFX1-]: Failed GL context creation for hardware WebRender: true
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
[GFX1-]: Failed GL context creation for hardware WebRender: true
[GFX1-]: Failed to get shared GL context
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
[GFX1-]: Failed GL context creation for hardware WebRender: true
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Failed to create EGLConfig for WebRender!
[GFX1-]: Failed GL context creation for WebRender: 0
[GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[GFX1-]: Failed to connect WebRenderBridgeChild.
[GFX1-]: Fallback WR to SW-WR
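Because the log above repeats the same few messages many times, it can help to collapse it into distinct lines with counts before reading it. The following is a small Python sketch for that purpose; the function name `summarize_log` is hypothetical and nothing here is part of Firefox.

```python
from collections import Counter


def summarize_log(text):
    """Return (line, count) pairs for each distinct non-empty line.

    Pairs are listed in order of first appearance, so the sequence of
    failures stays readable.
    """
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if line:
            counts[line] += 1
    # Counter keeps insertion order, so this preserves first-seen order.
    return list(counts.items())
```

Applied to the output above, this would leave one entry per distinct message, which makes the progression from the EGLConfig failure to the final fallback to software WebRender easier to follow.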

First reported here.

Regressions: 1735045

Since last update, I am experiencing somewhat unusual amounts of CPU usage, on Linux / Mesa. eglinfo at the end shows

Device platform:
eglinfo: eglInitialize failed

that's why I assume this issue is to blame? If not, please redirect me. I don't know what my system's issue with egl is, but at least xwininfo -root |grep Depth reports 24.

It also happens on a pristine nightly build, in its own new profile folder:
See process information here: While htop reports relatively low cpu percentage, ps -eo pcpu=,comm:6= --sort=-pcpu |head reports a pretty serious value (15.8). The laptop fan goes wild even when setting all cores' cpu max freq to their minimum. In other words, this indeed looks like a renderer issue.

OpenGL renderer string: Mesa Intel(R) HD Graphics 530 (SKL GT2)
x86_64 Linux 5.4.150-1-MANJARO

I hope this helps you people out a bit.

(no idea why Nightly sets its language to German, even though my IP is German, my system's language is set to English. Worrying :-/)

(In reply to eochgls from comment #15)

Since last update, I am experiencing somewhat unusual amounts of CPU usage, on Linux / Mesa. eglinfo at the end shows

Device platform:
eglinfo: eglInitialize failed

The device platform is irrelevant. The X11 or Wayland platform is used.

that's why I assume this issue is to blame? If not, please redirect me. I don't know what my system's issue with egl is, but at least xwininfo -root |grep Depth reports 24.

I have the same on Gnome Xwayland.

It also happens on a pristine nightly build, in its own new profile folder:
See process information here: While htop reports relatively low cpu percentage, ps -eo pcpu=,comm:6= --sort=-pcpu |head reports a pretty serious value (15.8). The laptop fan goes wild even when setting all cores' cpu max freq to their minimum. In other words, this indeed looks like a renderer issue.

You can set gfx.x11-egl.force-disabled to true and restart Firefox to check how GLX behaves in comparison.
My worst problem was presumably caused by bug 1382886 (network.process.enabled, network.http.http3.enabled, network.http.http3.enable_0rtt).
Please try to capture such a moment of unusual CPU usage with https://profiler.firefox.com/, click to share a link, and file a separate bug about it. Thanks!

See Also: → 1739924
Regressions: 1739924
See Also: 1739924
See Also: → 1818992