Closed Bug 1750388 Opened 2 years ago Closed 2 years ago

VA-API 4320p (8k) video playback stutters

Categories

(Core :: Audio/Video: Playback, defect, P3)

Firefox 96
defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox96 --- disabled
firefox102 --- fixed

People

(Reporter: elfarto, Assigned: stransky)

References

(Blocks 1 open bug)

Details

Attachments

(7 files)

Steps to reproduce:

Use the nvidia-vaapi-driver[1] to play back an 8k video (either on YouTube or in a simple <video/> element from a local file).

[1] https://github.com/elFarto/nvidia-vaapi-driver

Actual results:

It does not result in smooth playback. I've traced this issue back to a lack of VSYNC compositing. A 4k video plays back fine, and I can see each frame being requested due to a VSYNC.

Doing anything like moving the mouse will cause new frames to be drawn, but as soon as you stop, it stops drawing new frames.

I've tried debugging the issue, but I can't find why it never enables VSYNC for only 8k videos. This only happens with VA-API, software decoding plays it back fine. I've tested my library on the 8k video separately through mpv, and it can play the video back without issue.

So the issue appears to be that large videos played through VA-API in Firefox never cause compositing on each VSYNC to be enabled.

I've uploaded a profile of the issue: https://share.firefox.dev/327Su0g

I realise this is a fairly niche issue, so I'm just looking for some pointers on where to look for the issue.

Expected results:

It should play back smoothly.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Severity: -- → S3
Priority: -- → P3

Is nvidia-vaapi-driver supposed to work on Wayland already? It would be great if somebody with >=495 driver capable hardware could check this also happens on the native Wayland backend.

Just got GNOME on Wayland set up on my machine, and I can confirm the same thing happens.

(In reply to Stephen from comment #2)

Just got GNOME on Wayland set up on my machine, and I can confirm the same thing happens.

Just to be sure: you started FF with the native Wayland backend (MOZ_ENABLE_WAYLAND=1), not Xwayland?

(In reply to Robert Mader [:rmader] from comment #3)

Just to be sure: you started FF with the native Wayland backend (MOZ_ENABLE_WAYLAND=1), not Xwayland?

I didn't specifically start it with that variable set, but I did check that the 'Window Protocol' field in about:support reported 'wayland'.
I've also just retested with that env var set, and I get the same issue.

Thanks! A next step may be to run with MOZ_LOG="Dmabuf:5, PlatformDecoderModule:5" to get lots of dmabuf related debug output.

Here's the log of an 8k local video with the following env vars set:

NVD_LOG=1 MOZ_LOG="Dmabuf:5, PlatformDecoderModule:5"

As far as I can tell this doesn't show anything useful. It still seems to be decoding correctly, it's just the display that's not working correctly.

Attachment #9259277 - Attachment mime type: application/octet-stream → text/plain

(In reply to Stephen from comment #0)

I've uploaded a profile of the issue: https://share.firefox.dev/327Su0g

This was made with Firefox 96. Does the problem still occur with https://nightly.mozilla.org?

(In reply to Darkspirit from comment #7)

This was made with Firefox 96. Does the problem still occur with https://nightly.mozilla.org?

Yes, this issue still occurs with nightly.

I think I've gotten a bit closer to the issue. I've worked my way to WebRenderBridgeParent::MaybeGenerateFrame, and noticed the first real difference between a 4k and 8k video.
On the 4k video, MaybeGenerateFrame is called with the reasons VSYNC | ASYNC_IMAGE, but on the 8k video, MaybeGenerateFrame is rarely called, and never with ASYNC_IMAGE.

Yes, I can reproduce it too with 8K/AV1/VA-API.

I've captured some logs with MOZ_LOG="MediaDecoder:5" set, showing that for 8k videos, it's just dropping the frame before it ever gets a chance to do anything with it. The other difference I've noticed is that it's only ever queuing 1 frame for 8k videos, but 3-4 for 4k.

Attached file log-4k
Attached file log-8k

Alastor, any idea here? I think you did some work in this area recently.
Thanks.

Flags: needinfo?(alwu)

I've been debugging it further, and managed to 'fix' it by effectively changing mMinVideoQueueSize to 1. I don't think this is the correct fix; it just works around another bug.

The other thing I noticed was the timestamps of the dropped frames versus the clock time. There's no actual audio track on the video I'm using, so I'm not sure where it's coming from, but the audio clock runs ahead of the video. This, combined with only 1 frame in the buffer, leads to every frame getting dropped.

Does the video you tested for the profile in comment 0 contain only a video track? Looking at the markers on the MediaDecoderStateMachine thread in that profile, the first video frame took a long time (1.2s), and all following video frames were discarded by VideoSink.

Does this only happen with 8K + VA-API, or does it also happen with 8K + SW decoding or 4K + VA-API? Would you mind capturing a log with MOZ_LOG=MediaDecoder:5,PlatformDecoderModule:5?

Thank you.

Flags: needinfo?(alwu) → needinfo?(elfarto)
Attached file log-4k-vaapi

Log of https://www.youtube.com/watch?v=linlz7-Pnvw playing at 4k with VA-API enabled.

Flags: needinfo?(elfarto)
Attached file log-8k-vaapi

Log of https://www.youtube.com/watch?v=linlz7-Pnvw playing at 8k with VA-API enabled.

Attached file log-8k-sw

Log of https://www.youtube.com/watch?v=linlz7-Pnvw playing at 8k with VA-API disabled.

I've attached the logs of the same video playing in different scenarios, with the specified MOZ_LOG set.

The original profile was with the same youtube video used for the new log files so it had audio, only my recent testing has been using a video without audio.

I've only seen this issue in 8k with VA-API enabled, everything else has been fine.

As I can reproduce that with any AV1 8K video played via VA-API I'll look at it. Also only VA-API playback is broken, SW decode is ok.

After several more hours debugging this issue, I've come to the unfortunate conclusion that the way Firefox drives VA-API, combined with how my VA-API over NVDECODE library operates and my GPU, means that Firefox can't play 8k video. Given that Martin is having this issue too, I would say the implementation has an inherent bottleneck that only 8k video exposes.

Firefox is calling vaBegin/Render/EndPicture then immediately calling vaExportSurfaceHandle. This causes the vaExportSurfaceHandle call itself to take 17ms (plus the decode step takes it to 20ms), which exceeds the frame time for 60fps video.
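To make the arithmetic explicit, here is a minimal sketch of the frame-budget comparison. The 17 ms and 20 ms figures are the ones reported above; the function and variable names are only illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Figures reported above for the serialised Firefox path.
export_ms = 17.0  # vaExportSurfaceHandle alone
total_ms = 20.0   # export plus the decode step

budget_ms = frame_budget_ms(60)  # roughly 16.67 ms at 60 fps
max_fps = 1000.0 / total_ms      # ceiling if every frame costs total_ms

print(f"budget {budget_ms:.2f} ms, cost {total_ms:.0f} ms, ceiling {max_fps:.0f} fps")
```

Even the export call on its own is over the 60 fps budget, and at 20 ms per frame a fully serialised path cannot exceed 50 fps.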

I had missed this initially because I didn't look directly at my library, as it operates correctly when mpv is driving it. When mpv drives my library, the export only takes 3ms. This is because mpv threads the decode: one thread calls Begin/Render/End, while another calls Export and keeps a few frames buffered.

The key thing here is that mpv allows multiple frames to be decoded in parallel, whereas Firefox causes the decoding to become serialised.
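As a rough illustration of the two-thread arrangement described for mpv, the sketch below uses a bounded queue between a submit thread and an export thread. It is not mpv's or Firefox's code: `run_pipeline`, `submit`, `export` and the `depth` of 3 are hypothetical stand-ins, and no real VA-API calls are made.

```python
import queue
import threading


def run_pipeline(frame_ids, submit, export, depth=3):
    """Run submit and export on separate threads with up to `depth` frames in flight."""
    in_flight = queue.Queue(maxsize=depth)
    exported = []

    def decode_worker():
        # Stands in for the thread calling Begin/Render/EndPicture.
        for fid in frame_ids:
            in_flight.put(submit(fid))  # blocks once `depth` frames are pending
        in_flight.put(None)  # sentinel: no more frames

    def export_worker():
        # Stands in for the thread calling the export step.
        while True:
            surface = in_flight.get()
            if surface is None:
                break
            exported.append(export(surface))

    workers = [
        threading.Thread(target=decode_worker),
        threading.Thread(target=export_worker),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return exported


result = run_pipeline(
    range(5),
    submit=lambda fid: ("surface", fid),     # placeholder for a decode submit
    export=lambda surf: ("dmabuf", surf[1]),  # placeholder for a surface export
)
```

With this shape, a slow export for frame N no longer prevents frame N+1 from being submitted, which is the difference from the serialised path.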

Great, thanks for the investigation! We may enable threaded decode or decode more frames if possible.

When testing I see more significant delays here, but they are not caused by VA-API. vaExportSurfaceHandle may be a factor, but from the log I see we even fail to submit frames for decoding: the decoder waits for frames to decode. I see ~4 fps while playing via VA-API, but SW decode runs at 60 fps on my box.

So it looks like we fail to even submit frames to the decoder when VA-API is enabled.

If you're using a different VA-API implementation, then you may find that the delay is introduced somewhere else. Due to the current design of my library, I end up waiting for the GPU in the vaExportSurfaceHandle call, which is not technically correct, as it's not required to call that method for each frame. I'm working on moving the blocking call to a background thread.

Well, it looks like we're really hitting some HW limitations here. I tested VA-API playback with a different player (mpv) and it behaves similarly to Firefox: there's a huge frame drop while playing 8K@60Hz clips, around 400 frames per 10 seconds.

OTOH an 8K@23Hz clip plays OK in both Firefox & mpv.
Also, SW playback of 8K@60Hz is fine in both Firefox & mpv.

I guess we may need to disable VA-API for 8K clips due to this limitation for now.

(In reply to Stephen from comment #22)

The key thing here is that mpv allows multiple frames to be decoded in parallel, whereas Firefox causes the decoding to become serialised.

Have you configured mpv for that or is it a default setup? Because I see huge frame drop even with mpv.

Flags: needinfo?(elfarto)

From my understanding of it, decoding multiple frames in parallel is the default for mpv, but I'm no expert on the matter.

I have seen the issue where Firefox can't play back an 8k/60Hz video but mpv via VA-API can on my GeForce 1060, so it's possible that your specific hardware isn't capable of it.

I'm not sure disabling 8k video decode via VA-API across the board is the best idea. It would be nicer to fall back to software if the frame decode keeps taking too long, although I understand that's a trickier fix to implement.

Flags: needinfo?(elfarto)

This seems to be a bug in how Firefox itself handles frame drops: we fail to show any pictures if the frame drop is significant.

The VA-API performance issue seems to be a problem with this specific video clip. I tested a different 8K AV1 clip and VA-API decoding takes ~0ms, but with this one it takes ~40ms. It may be related to the 'av1_frame_split' used there.

I wonder how to handle that on the FF side. We may switch back to SW decode in such a case, but we may also improve the decoding, as the frame drop also affects SW decode.

Btw., I see the same VA-API performance drop while playing via mpv/VA-API.

I think this bug has two parts:

  1. When video decode is slow, we don't display any frames because all of them are dropped. That applies to SW decode too.
  2. When VA-API decode is slow, we should switch back to SW decode.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #32)

I think this bug has two parts:

  1. When video decode is slow, we don't display any frames because all of them are dropped. That applies to SW decode too.

This can be easily fixed by setting media.ruin-av-sync.enabled to true. That pref ensures that the video queue always contains at least one frame we can display:
https://searchfox.org/mozilla-central/rev/dd404f43c7198b1076fe5d7e05b1e6b1a03bdfeb/dom/media/mediasink/VideoSink.cpp#505
When this pref is off (the default), we drop all out-of-time frames regardless of the result.
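A simplified model of that behaviour, assuming the pref maps to a minimum queue size of 1 (on) or 0 (off), might look like the sketch below. It is not the VideoSink code itself: `skip_late_frames`, the tuple layout and the timestamps are made up for illustration.

```python
from collections import deque


def skip_late_frames(frames, clock_time_us, min_queue_size):
    """Pop frames whose end time has passed, keeping at least min_queue_size queued."""
    dropped = []
    while len(frames) > min_queue_size and clock_time_us >= frames[0][1]:
        dropped.append(frames.popleft())
    return dropped


# Each frame is (start_us, end_us). A single frame is queued and it is already late.
late_frame = (0, 16_667)
clock_time_us = 50_000

queue_pref_off = deque([late_frame])  # models min queue size 0 (default)
queue_pref_on = deque([late_frame])   # models min queue size 1 (pref enabled)

dropped_pref_off = skip_late_frames(queue_pref_off, clock_time_us, min_queue_size=0)
dropped_pref_on = skip_late_frames(queue_pref_on, clock_time_us, min_queue_size=1)
```

With a minimum of 0 the only queued frame is discarded and nothing is left to paint; with a minimum of 1 the late frame stays in the queue and can still be shown, at the cost of A/V sync.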

As for media.ruin-av-sync.enabled pref:
https://bugzilla.mozilla.org/show_bug.cgi?id=1414759#c4

media.ruin-av-sync.enabled causes the decode path to not drop frames that are late. This means we'll paint frames that are late, so A/V sync will be out. When the decode can't keep up video frames often come out of the decode pipeline after they're supposed to keep painting. There's a talos test that counts how many frames we can render per second. When I fixed our A/V sync logic to drop those frames, we got a talos regression, as we don't paint as many frames anymore (but the ones we do paint, we paint in time with the audio). So to placate the people who care about such things, we added a pref so that the gfx team can continue to test how fast their pipeline is at painting (the wrong thing).

I think that radical approach does not apply any more. When we fail in such a radical way (nothing is visible), we should abandon A/V sync in favor of showing at least something.

Bugs relevant to media.ruin-av-sync.enabled heuristics may be Bug 1316571 and Bug 1616500.

Used Bug 1316571 for the general playback issue.

Depends on: 1768191

With Bug 1768191 we know whether HW decode is slow on Linux for a particular clip, so we may disable VA-API and switch back to SW decode in such a case. For instance, the 8k clips are decoded OK by the SW decoder on my box but fail to decode with VA-API.
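One way such a fallback decision could be sketched is shown below. This is a hypothetical heuristic, not the logic from Bug 1768191 or the landed patch: the class name, the consecutive-late-frame rule and the threshold of 10 are all assumptions chosen for illustration. The 40 ms sample is the decode time mentioned earlier for the problematic clip.

```python
class SlowDecodeDetector:
    """Flags HW decode as too slow after a run of frames that miss their deadline."""

    def __init__(self, frame_duration_ms: float, max_late_frames: int = 10):
        self.frame_duration_ms = frame_duration_ms
        self.max_late_frames = max_late_frames  # hypothetical threshold
        self.late_streak = 0

    def record(self, decode_ms: float) -> bool:
        """Record one decode time; return True when a SW fallback looks warranted."""
        if decode_ms > self.frame_duration_ms:
            self.late_streak += 1
        else:
            self.late_streak = 0  # an on-time frame resets the streak
        return self.late_streak >= self.max_late_frames


detector = SlowDecodeDetector(frame_duration_ms=1000.0 / 60)
decisions = [detector.record(40.0) for _ in range(10)]
use_software_decode = decisions[-1]
```

Requiring a streak rather than a single slow frame avoids switching decoders on a one-off stall.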

Assignee: nobody → stransky
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Pushed by stransky@redhat.com:
https://hg.mozilla.org/integration/autoland/rev/e48cb076df43
[Linux] Switch back to SW decode if HW decode is slow r=alwu
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch
Flags: qe-verify+