Closed Bug 599814 Opened 14 years ago Closed 10 years ago

Some pages like sinclairhoerspiele.de come up with all accessibles unknown and the windows having the "invisible" state

Categories: Core :: Disability Access APIs (defect)
Hardware: x86
OS: Windows 7
Type: defect
Priority: Not set
Severity: critical

Status: RESOLVED WORKSFORME
Tracking Status: blocking2.0 final+

Reporter: MarcoZ
Assignee: davidb

Keywords: regression
Whiteboard: [hardblocker][?workaround: disable IPC]

Attachments: 5 files (2 obsolete)

STR:
1. With NVDA 2010.2 running, open above URL

Expected: After the page loads, NVDA should start reading the page.
Actual: NVDA says "unknown", followed by the document name, and has no virtual buffer. Arrow keys don't produce any results.

Exploring the hierarchy with the object navigator shows that there are two invisible windows, with only 1 unknown child accessible.

This is a regression from the post-bug 130078 fixes.
Trying to bring up the same page with JAWS 11 and current Minefield freezes the browser completely. It can be killed via Task manager, doesn't generate a crash, just goes away.
blocking2.0: --- → ?
Yep, blocking. The main window seems to change (?) when this site is the active tab and I see a lot of 'unqueued messages' shell output. Eventually I can get more of an a11y tree in accprobe... needs more investigation.
blocking2.0: ? → final+
Here is a page that might show a related, though less drastic, symptom, with these STR:
1. With NVDA running, go to http://www.mtv.de/charts/germany
2. After the page loads, press the letter t to move the virtual cursor to the table with the music chart listing.
3. Now, down arrow line by line and let NVDA read each table cell.

Observation: After a few lines, once it has finished reading the table cell text, NVDA suddenly announces the document title, as if focus had shifted to that accessible. The virtual buffer is then gone, and arrow keys no longer move any caret, virtual or otherwise.

4. Press Tab.

Result: NVDA will most likely announce "unknown".

5. Press Tab again.

Result: This time, NVDA will recover and show a virtual buffer representation again, with the virtual cursor on the item that now has focus visually, too.

Steps 3 onward can be repeated in this table over and over.
Marco, it appears I can't reproduce the bug following the bug description with wip7 from bug 570275 applied. Could you confirm, please?
Assigning to Alexander because I think bug 570275 will fix this and all the other bugs in the universe.
Assignee: nobody → surkov.alexander
(In reply to comment #4)
> Marco, it appears I can't reproduce the bug following the bug description
> with wip7 from bug 570275 applied. Could you confirm, please?

I think I can still reproduce, although it is behaving a bit differently. The deciding factor may be that I also have braille enabled, but I definitely get a browser hang on www.sinclairhoerspiele.de still, and the virtual buffer still doesn't come up for me.
(In reply to comment #1)
> Trying to bring up the same page with JAWS 11 and current Minefield freezes the
> browser completely. It can be killed via Task manager, doesn't generate a
> crash, just goes away.

I'm not sure whether comment #6 is about JAWS, but JAWS with bug 570275 doesn't hang for me and I can arrow through the whole page. Perhaps that's because this is a debug build. I'll try a release one.
(In reply to comment #7)
> (In reply to comment #1)
> > Trying to bring up the same page with JAWS 11 and current Minefield freezes the
> > browser completely. It can be killed via Task manager, doesn't generate a
> > crash, just goes away.
> 
> I'm not sure whether comment #6 is about JAWS, but JAWS with bug 570275
> doesn't hang for me and I can arrow through the whole page. Perhaps that's
> because this is a debug build. I'll try a release one.

That must be a Flash Player hang. Once I installed it, Firefox hangs with JAWS on this page.
(In reply to comment #8)

> That must be a Flash Player hang. Once I installed it, Firefox hangs with JAWS
> on this page.

And only the release build hangs.
The problem is likely that we don't answer WM_GETOBJECT messages properly (Jamie confirmed this on IRC). I think the reason is that when Firefox is busy, it defers window messages to handle them later and calls the default window procedure instead (http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp). That works nicely for notification messages, but not for WM_GETOBJECT, where we must return our accessible object (for reference, here's the expected WM_GETOBJECT processing: http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#5251).

Cc'ing Ben to get his opinion.
bumping to critical since it makes Firefox unusable on web pages with AT.
Severity: major → critical
Cc'ing more people in the hope of clearing things up.

When and why is the window procedure neutered? I see a lot of WM_GETOBJECT events handled by that stub window procedure, which means AT can't get access to Firefox. Is there a way to get the presshell associated with the HWND that the window message is handled for? Is it safe to do so? What kinds of operations are allowed at this point?
(In reply to comment #12)
> Cc'ing more people in the hope of clearing things up.
> 
> When and why is the window procedure neutered? I see a lot of WM_GETOBJECT
> events handled by that stub window procedure, which means AT can't get access
> to Firefox. Is there a way to get the presshell associated with the HWND that
> the window message is handled for? Is it safe to do so? What kinds of
> operations are allowed at this point?

WindowsMessageLoop.cpp intercepts windowing events while we're waiting on an IPC message sent to another process. We have to trap events that might loop around and trigger additional IPC messages, which can cause deadlocks.

We're not currently handling wm_getobject here and we might be able to pass it through so that it gets default behavior. It really depends on what 3rd party code that sends this is doing. You can test to see what happens by adding this event to ProcessOrDeferMessage, grabbing the old window procedure using GetProp -

WNDPROC oldWndProc = (WNDPROC)GetProp(hwnd, kOldWndProcProp);

and pass it on.

I'm not sure about retrieving pres shell, but I imagine that's not possible for 3rd party apps.
It is not safe to dispatch WM_GETOBJECT while there is a paint pending, although async-painting might relieve that particular problem. The third-party code is JAWS, requesting the IAccessible for the browser.
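
The deferral problem and the suggested pass-through can be sketched as a small, portable C++ model. This is not the actual WindowsMessageLoop.cpp code: `NeuteredProc`, `Msg`, `WndProc` and the per-window `oldProcs_` map (standing in for the `GetProp(hwnd, kOldWndProcProp)` lookup) are hypothetical. With `forwardGetObject` off, it behaves like the reported behavior: notifications are queued for replay and WM_GETOBJECT gets only default processing. With it on, WM_GETOBJECT is forwarded to the original window procedure, except while a paint is pending, per the caveat above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Portable model only: hypothetical stand-ins for Win32 types.
using WindowId = int;  // plays the role of HWND
using Result = long;   // plays the role of LRESULT

// Numeric values mirror WM_SETFOCUS, WM_PAINT and WM_GETOBJECT.
enum MsgId { kSetFocus = 0x0007, kPaint = 0x000F, kGetObject = 0x003D };

struct Msg {
  WindowId hwnd;
  MsgId id;
};

using WndProc = std::function<Result(const Msg&)>;

// What the caller sees on default processing. For WM_GETOBJECT this means
// the system hands the AT a generic stub accessible instead of ours.
constexpr Result kDefaultResult = 0;

class NeuteredProc {
 public:
  explicit NeuteredProc(bool forwardGetObject)
      : forwardGetObject_(forwardGetObject) {}

  // Analogue of storing the original proc under kOldWndProcProp.
  void RememberOldProc(WindowId hwnd, WndProc proc) {
    oldProcs_[hwnd] = std::move(proc);
  }
  void SetPaintPending(bool pending) { paintPending_ = pending; }

  // Handles a message that arrives while we block on an IPC reply.
  Result Handle(const Msg& msg) {
    if (msg.id == kGetObject) {
      // A sent message's answer is its return value, so deferring it is
      // pointless: either answer now via the original proc or fall back.
      if (forwardGetObject_ && !paintPending_) {
        auto it = oldProcs_.find(msg.hwnd);  // analogue of GetProp
        if (it != oldProcs_.end()) return it->second(msg);
      }
      return kDefaultResult;
    }
    deferred_.push_back(msg);  // notifications can safely wait
    return kDefaultResult;
  }

  // Once the IPC reply has arrived, replay deferred messages in order.
  std::size_t ReplayDeferred() {
    std::size_t replayed = 0;
    while (!deferred_.empty()) {
      const Msg m = deferred_.front();
      deferred_.pop_front();
      auto it = oldProcs_.find(m.hwnd);
      if (it != oldProcs_.end()) it->second(m);
      ++replayed;
    }
    return replayed;
  }

  std::size_t DeferredCount() const { return deferred_.size(); }

 private:
  bool forwardGetObject_;
  bool paintPending_ = false;
  std::map<WindowId, WndProc> oldProcs_;
  std::deque<Msg> deferred_;
};
```

For example, `NeuteredProc(false)` answers WM_GETOBJECT with `kDefaultResult`, while `NeuteredProc(true)` returns whatever the original procedure returns. The model deliberately leaves open what the forwarded call may safely do (DOM changes, layout flushes), which is the real risk raised in this bug.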
(In reply to comment #13)

> WindowsMessageLoop.cpp intercepts windowing events while we're waiting
> on an IPC message sent to another process. 

It sounds like this may take forever, and it happens so often that the screen reader doesn't read anything on the web page, or reads inconsistent things after a huge delay.

> We're not currently handling wm_getobject here and we might be able to pass it
> through so that it gets default behavior. It really depends on what 3rd party
> code that sends this is doing.

A screen reader can do anything the Gecko accessibility API allows. I suspect the following may be dangerous: changing the DOM, editor operations, and flushing layout.

On the other hand, we can't use DefWindowProc, because it returns a system-generated accessible object; the screen reader is fooled into thinking it got Firefox's accessible object, which leads to inconsistent results.

I was thinking of Gecko internals: we would need the presshell to return the accessible object for it. With oldWndProc, that's not necessary.
What would happen if WM_GETOBJECT occasionally failed? Would screen readers retry?
(In reply to comment #16)
> What would happen if WM_GETOBJECT occasionally failed? Would screen readers
> retry?

I don't think they will retry, because if we don't return our accessible object, Windows returns a stub accessible object and the AT thinks everything is OK when it's not.
I pushed a try server build that calls oldWndProc on WM_GETOBJECT. Marco, could you please try it to see whether it improves things? - http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/surkov.alexander@gmail.com-aececc47b761
The try server build gives me the following results:
1. On http://www.amazon.de, the behavior is unchanged. NVDA's virtual buffer doesn't come up until I alt-tab away and back to Firefox. NVDA announces that the document is in a "busy" state before I alt-tab away; it appears as if it never finishes loading, yet when I alt-tab away and back, everything works. If I activate links from this main page or execute a search, the subsequent pages load fine.

2. On http://www.sinclairhoerspiele.de, I now do get a virtual buffer, but the virtual buffer navigation is EXTREMELY sluggish, like taking a second or two to move from line to line. Also when closing the tab, Firefox freezes NVDA. I have to kill the Firefox process manually to get NVDA speech back. So the closing of the tab may not even work right. It is the only tab open.
With JAWS, the results are also not better at all, more to the contrary:
1. When launching this try-server build, the initial focus to the awesome bar is not seen by JAWS at all. It reports the previous window (the Windows explorer window from where I launched the try-server build).
2. After alt-tabbing away and back to Minefield once, JAWS reports "lost focus" when pressing INSERT+T for the window title.
3. After a second alt-tab away and back, it finally gets the right focus and reports automatically that we're on the awesome bar.

When trying to launch http://www.sinclairhoerspiele.de, there's an immediate freeze. I have to kill the Firefox process via Task manager to get JAWS speech back. This is with JAWS 11 release, not a 12 beta.
http://hg.mozilla.org/try/rev/aececc47b761

The freezes are likely caused by this. A stack from one of the freezes would confirm it: the delivery of WM_GETOBJECT would be on the stack.

Can we try testing a nightly with ipc disabled? I'm curious if we have multiple bugs here or if all these issues are related directly to WM_GETOBJECT.
(In reply to comment #21)

> Can we try testing a nightly with ipc disabled? I'm curious if we have multiple
> bugs here or if all these issues are related directly to WM_GETOBJECT.

The option ac_add_options --disable-ipc should be added to .mozconfig, right?

Marco, could you try?
(In reply to comment #22)
> (In reply to comment #21)
> 
> > Can we try testing a nightly with ipc disabled? I'm curious if we have multiple
> > bugs here or if all these issues are related directly to WM_GETOBJECT.
> 
> The option ac_add_options --disable-ipc should be added to .mozconfig, right?
> 
> Marco, could you try?

Just disable it via the pref 'dom.ipc.plugins.enabled'.
I disabled IPC and tested both www.sinclairhoerspiele.de and www.amazon.de with both JAWS and NVDA, with a regular nightly build. Here are the results:

1. JAWS + www.sinclairhoerspiele.de: No freezing. Works as expected.
2. JAWS + www.amazon.de: Normal behavior.
3. NVDA + www.sinclairhoerspiele.de: Sluggish, but no freezes.
4. NVDA + www.amazon.de: Same as with IPC enabled: The document never appears to finish loading, and I have to alt-tab away and back to Minefield to get a virtual buffer.
It sounds like we have two different issues here. 

Jamie, would you get some time to look at issues with NVDA with ipc disabled?
(In reply to comment #24)

> 3. NVDA + www.sinclairhoerspiele.de: Sluggish, but no freezes.
> 4. NVDA + www.amazon.de: Same as with IPC enabled: The document never appears
> to finish loading, and I have to alt-tab away and back to Minefield to get a
> virtual buffer.

Marco, could you check earlier NVDA versions?
(In reply to comment #26)
> Marco, could you check earlier NVDA versions?
He can't. Earlier versions won't work with Firefox 4.
(In reply to comment #25)
> It sounds like we have two different issues here.
I agree. However, it's hard to determine whether the WM_GETOBJECT issue is influencing the other.

> Jamie, would you get some time to look at issues with NVDA with ipc disabled?
sinclairhoerspiele.de with IPC disabled works fine for me here, if a little sluggish from time to time. I suspect this sluggishness is due to the Flash content. In fact, disabling Flash but leaving IPC enabled also works, which makes sense, since Flash is a plugin which would be doing IPC if enabled.

I think amazon.de is a different issue. I suspect the document isn't firing a documentLoadComplete event, but haven't confirmed that yet.

One question: does disabling IPC have any effect if no pages are using any plugins? I assume not, as only plugins do IPC, but I want to be sure. If so, this issue should only occur on pages that use plugins (which is probably quite a lot these days).
(In reply to comment #28)
> I think amazon.de is a different issue. I suspect the document isn't firing a
> documentLoadComplete event
Not quite. It *is* firing the event, but the document still has the busy state when it gets fired, so NVDA ignores it. Here's the entry from AccProbe's event monitor:
IA2_EVENT_DOCUMENT_LOAD_COMPLETE Accessible Name: Amazon.de: Günstige Preise bei Elektronik & Foto, DVD, Musik, Bücher, Games, Sp, Accessible Role: document, Accessible State: [focusable, readOnly, focused, busy], Event Data: hwnd=1700236; objectId=-4; childId=-154822576; windowClass=MozillaWindowClass;
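
The check described here can be sketched as follows, assuming (per Jamie's explanation) that the screen reader ignores IA2_EVENT_DOCUMENT_LOAD_COMPLETE while the document still reports the busy state. This is a hypothetical model, not NVDA's code; `DocumentEvent` and `ShouldBuildVirtualBuffer` are illustrative names, while the state bit values are the MSAA constants from oleacc.h.

```cpp
#include <cassert>
#include <cstdint>

// MSAA state bits; values match the STATE_SYSTEM_* constants in oleacc.h.
constexpr std::uint32_t kStateFocused   = 0x00000004;  // STATE_SYSTEM_FOCUSED
constexpr std::uint32_t kStateReadOnly  = 0x00000040;  // STATE_SYSTEM_READONLY
constexpr std::uint32_t kStateBusy      = 0x00000800;  // STATE_SYSTEM_BUSY
constexpr std::uint32_t kStateFocusable = 0x00100000;  // STATE_SYSTEM_FOCUSABLE

// Hypothetical snapshot of what the AT sees when the event is handled.
struct DocumentEvent {
  bool isLoadComplete;   // was this IA2_EVENT_DOCUMENT_LOAD_COMPLETE?
  std::uint32_t states;  // the document's accState at that moment
};

// Hypothetical AT policy: build the virtual buffer only for a load-complete
// event on a document that no longer reports the busy state.
bool ShouldBuildVirtualBuffer(const DocumentEvent& ev) {
  return ev.isLoadComplete && (ev.states & kStateBusy) == 0;
}
```

With the states AccProbe logged for amazon.de ([focusable, readOnly, focused, busy]), the check fails and the event is dropped, which is consistent with the virtual buffer only appearing after an alt-tab.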
I filed bug 613146 for the amazon.de issue.
Whiteboard: [harblocker]
Whiteboard: [harblocker] → [hardblocker]
So for the IPC part of this bug my guess is that the dll the screen reader injects into our main and plugin container processes is doing IPC that conflicts with the messaging between the main and plugin processes? I don't know enough about how our IPC is implemented on Windows to be sure, but a hunch is that this might be related to bug 580644.

I've gotten a lot of twitter feedback about hangs and crashes with Jaws and flash.

Jimm, would love your thoughts here or maybe a phone call/IRC.
Whiteboard: [hardblocker] → [hardblocker][?workaround: disable IPC]
(In reply to comment #31)
> So for the IPC part of this bug my guess is that the dll the screen reader
> injects into our main and plugin container processes is doing IPC that
> conflicts with the messaging between the main and plugin processes?
I don't think so. Once the buffer is rendered, NVDA's in-process DLLs don't do much that is time consuming. Also, we use RPC to communicate with NVDA's main process, not window messages, which is where the problem lies.

> I've gotten a lot of twitter feedback about hangs and crashes with Jaws and
> flash.
Anything about NVDA and Flash? I can't speak for how JAWS does IPC.
(In reply to comment #32)
> (In reply to comment #31)
> Anything about NVDA and Flash?

Not specifically. Any flash bugs open on NVDA side?
(In reply to comment #18)
> I pushed a try server build that calls oldWndProc on WM_GETOBJECT. Marco,
> could you please try it to see whether it improves things? -
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/surkov.alexander@gmail.com-aececc47b761

Alexander can you attach the WIP for this?
What is the significance of returning PR_FALSE from nsWindow::ProcessMessage? I notice we never do this in the WM_GETOBJECT case?
Assignee: surkov.alexander → bolterbugz
Note related investigation is also happening on bug 624786.
Attachment #503373 - Attachment is obsolete: true
I'm not sure how to proceed here.
Whiteboard: [hardblocker][?workaround: disable IPC] → [hardblocker][?workaround: disable IPC][help needed]
Does disabling ipc.plugins require a restart?
Whiteboard: [hardblocker][?workaround: disable IPC][help needed] → [hardblocker][?workaround: disable IPC]
So why doesn't this affect Firefox 3.6? If I'm not mistaken, IPC is also used for plugins there.
(In reply to comment #41)
> So why doesn't this affect Firefox 3.6? If I'm not mistaken, IPC is also used
> for plugins there.

Windowless widgetry in FX4.
(In reply to comment #40)
> Does disabling ipc.plugins require a restart?

Yes, if plugins have already been loaded before changing it.
(In reply to comment #42)
> (In reply to comment #41)
> > So why doesn't this affect Firefox 3.6? If I'm not mistaken, IPC is also used
> > for plugins there.
> 
> Windowless widgetry in FX4.

That and 3.6 only enabled oopp for certain plugins. (flash, silverlight and maybe qt I believe.)
Report from FS (Jaws) is that they hang on get_accState which they are calling on the unlabeled video play button.
(In reply to comment #45)
> Report from FS (Jaws) is that they hang on get_accState which they are calling
> on the unlabeled video play button.

It appears the problem is Flash's (assuming the video is rendered by Flash); perhaps Flash calls into Firefox, causing JAWS to hang. In other words, the problem may not be related to WM_GETOBJECT handling.
Attached file: a hanging call stack (obsolete)
(In reply to comment #46)
> In other
> words, the problem may not be related to WM_GETOBJECT handling.

Agreed.
(In reply to comment #47)
> Created attachment 506389 [details]
> a hanging call stack

Perhaps JAWS is doing a cross-process call:
FSDomNodeFlash.dll!0cb418dd()
FsDomNodeFirefox.dll!0cae34ea()

and I don't see the calls into firefox or flash in the stack.
(In reply to comment #48)
> Created attachment 506390 [details]
> perhaps more interesting stack

The same, except it appears JAWS is called from JS, though I have no idea how that could happen.
Yeah probably can't trust those stack frames.
(In reply to comment #49)
> (In reply to comment #47)
> > Created attachment 506389 [details]
> > a hanging call stack
> 
> Perhaps JAWS is doing a cross-process call:

Almost certainly.
I'm trying to connect Adobe with FS on this (via email).
I'll prepare the prefs workaround for jaws only.
Attachment #506390 - Attachment description: perhaps more interesting stack → FF main process stack
Attachment #506389 - Attachment is obsolete: true
I could see us taking this route for FF4. I tested it locally.
Attachment #506420 - Flags: review?(surkov.alexander)
Attachment #506420 - Flags: review?(marco.zehe)
(In reply to comment #56)
> Created attachment 506420 [details] [diff] [review]
> Jaws Workaround
> 
> I could see us taking this route for FF4. I tested it locally.

How does it work, taking comment #43 into account?
(In reply to comment #44)
> > > So why doesn't this affect Firefox 3.6? If I'm not mistaken, IPC is also used
> > > for plugins there.
...
> That and 3.6 only enabled oopp for certain plugins. (flash, silverlight and
> maybe qt I believe.)
Note that I think Flash is actually the major culprit in this bug, but I've never seen this issue with Flash in 3.6.
Comment on attachment 506420 [details] [diff] [review]
Jaws Workaround V1

This is straight-forward. r=me
Attachment #506420 - Flags: review?(marco.zehe) → review+
(In reply to comment #57)
> (In reply to comment #56)
> > Created attachment 506420 [details] [diff] [review]
> > Jaws Workaround
> > 
> > I could see us taking this route for FF4. I tested it locally.
> 
> How does it work, taking comment #43 into account?

It doesn't protect against all first run situations but I think this is the best we can (or want to) do.
(In reply to comment #60)

> It doesn't protect against all first run situations but I think this is the
> best we can (or want to) do.

Could we detect an installed JAWS in the installer instead?
(In reply to comment #61)
> (In reply to comment #60)
> 
> > It doesn't protect against all first run situations but I think this is the
> > best we can (or want to) do.
> 
> Could we detect an installed JAWS in the installer instead?

I wouldn't prefer that approach.
Ok, can we do that on Firefox startup (check running JAWS or installed JAWS)?
(In reply to comment #63)
> Ok, can we do that on Firefox startup (check running JAWS or installed JAWS)?

1. We wouldn't want to turn off IPC if NVDA is running, but JAWS is also installed on the machine.

And that brings me to a question: When we turn off IPC like in the current patch, does that get saved to the preferences? In other words, is it turned off and does it STAY turned off even if, for example, NVDA is running instead of JAWS on the next run? We wouldn't want that, especially for those who have multiple screen readers installed.

Note I only thought about this implication after my initial review.
(In reply to comment #64)
It would stay off until the user turned it back on. I could ensure it is turned on if NVDA is detected, but that seems strange. I would also worry about it complicating working with multiple screen reader end-users to resolve issues. Do you agree?

(In reply to comment #63)
I don't want to get into the startup code path for FF4 this late in the game but I would support investigating doing this post FF4. Does this sound reasonable?
(In reply to comment #65)

> I don't want to get into the startup code path for FF4 this late in the game
> but I would support investigating doing this post FF4. Does this sound
> reasonable?

Note that the term "startup" can be stretched here: it can be any point before the plugin is loaded. The changes you suggest making inside a11y wouldn't be any more dangerous if they were made outside a11y. Given that your patch is a half solution, I think it's worth doing.

Following this approach, you can restore the pref value when a11y shuts down, to avoid affecting NVDA on the next run.
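
The save-and-restore suggestion can be sketched as a scope guard tied to the lifetime of a11y. Only the pref name `dom.ipc.plugins.enabled` comes from this bug; `PrefStore` and `ScopedPrefOverride` are hypothetical stand-ins for the preferences service, assuming a simple in-memory store. The guard records the user's original value (or its absence) on construction and puts it back on destruction.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for the preferences service.
class PrefStore {
 public:
  std::optional<bool> GetBool(const std::string& name) const {
    auto it = prefs_.find(name);
    if (it == prefs_.end()) return std::nullopt;
    return it->second;
  }
  void SetBool(const std::string& name, bool value) { prefs_[name] = value; }
  void Clear(const std::string& name) { prefs_.erase(name); }

 private:
  std::map<std::string, bool> prefs_;
};

// Overrides a boolean pref while this object lives (i.e. while a11y is
// active) and restores the user's original value, or its absence, afterwards.
class ScopedPrefOverride {
 public:
  ScopedPrefOverride(PrefStore& store, std::string name, bool value)
      : store_(store), name_(std::move(name)), saved_(store_.GetBool(name_)) {
    store_.SetBool(name_, value);
  }
  ~ScopedPrefOverride() {
    if (saved_) {
      store_.SetBool(name_, *saved_);
    } else {
      store_.Clear(name_);
    }
  }
  ScopedPrefOverride(const ScopedPrefOverride&) = delete;
  ScopedPrefOverride& operator=(const ScopedPrefOverride&) = delete;

 private:
  PrefStore& store_;
  std::string name_;            // declared before saved_, so initialized first
  std::optional<bool> saved_;
};
```

One limitation: if the browser exits without a11y shutting down cleanly (for example, a crash) after the overridden value has been written to disk, the override could persist into the next session, the outcome Marco was worried about.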
(In reply to comment #66)
I'll take a look.
I could add a check inside nsNPAPIPlugin::RunPluginOOP that returns PR_FALSE directly if jhook is detected, bypassing the pref altogether. I'm not sure yet whether I prefer it, and I welcome feedback.
Attachment #506420 - Attachment description: Jaws Workaround → Jaws Workaround V1
Attachment #506420 - Flags: review?(surkov.alexander)
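
The alternative of checking directly in nsNPAPIPlugin::RunPluginOOP can be sketched as a pure decision function. The names `ShouldRunPluginOOP` and `ScreenReaderEnvironment` are hypothetical, and how jhook would actually be detected is not specified in this bug. The sketch also encodes Marco's point that merely having JAWS installed should not disable out-of-process plugins when NVDA is the screen reader in use.

```cpp
#include <cassert>

// Hypothetical inputs; in Gecko these would come from the pref service and
// from detecting the JAWS hook ("jhook") loaded into the current process.
struct ScreenReaderEnvironment {
  bool jawsInstalled;      // JAWS is present on disk (deliberately unused)
  bool jawsHookInProcess;  // JAWS has injected its hook into this process
};

// Sketch of the proposed early return in nsNPAPIPlugin::RunPluginOOP:
// returning false plays the role of PR_FALSE (run the plugin in-process).
bool ShouldRunPluginOOP(bool oopPrefEnabled, const ScreenReaderEnvironment& env) {
  // Key on the injected hook, not on installation, so a user running NVDA
  // with JAWS merely installed keeps out-of-process plugins.
  if (env.jawsHookInProcess) return false;  // bypasses the pref altogether
  return oopPrefEnabled;
}
```

Because the decision is made per plugin launch rather than stored in a pref, nothing persists into a later session where a different screen reader is running.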
Note the plugin stack stops here (in SyncChannel::WaitForNotify):
  while (1) {
    MSG msg = { 0 };
    // Don't get wrapped up in here if the child connection dies.
    {
      MutexAutoLock lock(mMutex);
      if (!Connected()) {
        break;
      }
    }

    // Wait until we have a message in the queue. MSDN docs are a bit unclear
    // but it seems that windows from two different threads (and it should be
    // noted that a thread in another process counts as a "different thread")
    // will implicitly have their message queues attached if they are parented
    // to one another. This wait call, then, will return for a message
    // delivered to *either* thread.
    DWORD result = MsgWaitForMultipleObjects(1, &mEvent, FALSE, INFINITE,
                                             QS_ALLINPUT);
    if (result == WAIT_OBJECT_0) {  // (marked line: the plugin stack stops here)
      // ... (remainder of the loop not quoted)
    }
  }


How does MsgWaitForMultipleObjects work?
(In reply to comment #69)
>       DWORD result = MsgWaitForMultipleObjects(1, &mEvent, FALSE, INFINITE,
>                                                QS_ALLINPUT);
> How does MsgWaitForMultipleObjects work?
http://msdn.microsoft.com/en-us/library/ms684242%28v=vs.85%29.aspx
In short, this call waits for a window message or mEvent to become signaled and returns as soon as either of those happens.

Note this gotcha:
MsgWaitForMultipleObjects does not return if there is unread input of the specified type in the message queue after the thread has called a function to check the queue. This is because functions such as PeekMessage, GetMessage, GetQueueStatus, and WaitMessage check the queue and then change the state information for the queue so that the input is no longer considered new. A subsequent call to MsgWaitForMultipleObjects will not return until new input of the specified type arrives. The existing unread input (received prior to the last time the thread checked the queue) is ignored.

So, I guess if an in-process dll called PeekMessage, etc. without removing the message from the queue and Gecko doesn't handle that case by consuming pending messages before calling MsgWaitForMultipleObjects, this could cause a problem.
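
The gotcha can be modeled without Win32. In this sketch (all names hypothetical, not Gecko or Win32 code), `MessageQueue` tracks whether input is "new" since the thread last checked the queue: `Peek` mimics PeekMessage with PM_NOREMOVE, `Remove` mimics PeekMessage with PM_REMOVE, and `WaitWouldReturn` mimics MsgWaitForMultipleObjects with QS_ALLINPUT, which wakes only for a signaled event or new input. Draining the queue before waiting avoids the stall. On real Windows, MsgWaitForMultipleObjectsEx with the MWMO_INPUTAVAILABLE flag is documented to return if any input is in the queue, even input already seen; whether that fits SyncChannel is untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Portable model of a thread's message queue and the "new input"
// bookkeeping that MsgWaitForMultipleObjects relies on.
class MessageQueue {
 public:
  void Post(int msg) {
    queue_.push_back(msg);
    hasNewInput_ = true;  // arrival after the last check counts as new
  }

  // Like PeekMessage(..., PM_NOREMOVE): look but leave the message queued.
  // Checking the queue clears the "new input" state.
  bool Peek(int* out) {
    hasNewInput_ = false;
    if (queue_.empty()) return false;
    *out = queue_.front();
    return true;
  }

  // Like PeekMessage(..., PM_REMOVE): take the message out of the queue.
  bool Remove(int* out) {
    hasNewInput_ = false;
    if (queue_.empty()) return false;
    *out = queue_.front();
    queue_.pop_front();
    return true;
  }

  // Like MsgWaitForMultipleObjects(1, &event, FALSE, INFINITE, QS_ALLINPUT):
  // wakes only for a signaled event or input that is new since the last check.
  bool WaitWouldReturn(bool eventSignaled) const {
    return eventSignaled || hasNewInput_;
  }

  std::size_t Pending() const { return queue_.size(); }

 private:
  std::deque<int> queue_;
  bool hasNewInput_ = false;
};

// Defensive pattern: handle everything already queued before waiting, so no
// "seen but unread" message is stranded behind an INFINITE wait.
std::size_t DrainBeforeWait(MessageQueue& q) {
  std::size_t handled = 0;
  int msg = 0;
  while (q.Remove(&msg)) ++handled;  // real code would translate/dispatch msg
  return handled;
}
```

The model treats all input alike; real Windows distinguishes posted input from messages sent by other threads (QS_SENDMESSAGE), which this sketch does not attempt to capture.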
Just hanging a WIP for a more surgical approach. I haven't been able to easily test this yet as I'm blocked by and attending to higher priority bug 629137.
I can't recreate this bug right now; confirming with FS.
Confirmed! We are confident this was fixed via bug 626667. Woot!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
So, it wasn't a plugin issue? JAWS doesn't hang and NVDA doesn't announce unknown anymore? Marco, do you confirm?
Blocks: 677883
I'm not sure if there is another bug for this. Flash intensive sites (such as Flash YouTube) still cause NVDA to lose focus and report unknown a lot. This is caused by Firefox not responding to WM_GETOBJECT. This is a big complaint from a lot of Firefox + NVDA users, though I'm not sure if all cases are due to Flash/IPC.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
I'm starting to see this quite a lot and it makes browsing on many sites very frustrating.

I assume the patch from bug 677883 doesn't apply for non-e10s builds? Can that patch somehow be used as a base to fix this bug?
David can you start the issue tracking again please?
I can no longer reproduce this bug. I wasn't able to for quite a while, and the page has since also been re-done a bit so it may no longer show. I can, however, now reliably reproduce bug 781971, which seems to be related. However, I see an immediate hang, no unknown accessibles as described here.
(In reply to James Teh [:Jamie] from comment #76)
> I'm starting to see this quite a lot and it makes browsing on many sites
> very frustrating.
> 
> I assume the patch from bug 677883 doesn't apply for non-e10s builds? Can
> that patch somehow be used as a base to fix this bug?

I'm not seeing this in FF nightly on my windows 7 VM. Jamie can you attach your about:support contents (or email them to me)?

If the bug Marco mentions is closer to what you are experiencing please reclose this one.
I see the unknowns as described here, not a hang, so this is different to bug 781971.
Certain sites seem to reproduce this more regularly than others. I don't have any URLs right now, but I'll comment here when I find one.
(In reply to James Teh [:Jamie] from comment #81)
> Certain sites seem to reproduce this more regularly than others. I don't
> have any URLs right now, but I'll comment here when I find one.

Hi Jamie, any updates here?
Flags: needinfo?(jamie)
I haven't managed to find a page that reproduces it more regularly. I still see it often on any sites that have Flash, but it's not predictable. I wonder if we can put together a page that embeds a huge number of Flash objects in an attempt to make it more probable.
Flags: needinfo?(jamie)
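
A minimal sketch of the stress page Jamie suggests, generating static HTML with many Flash embeds. The SWF URL, the embed count, and the `MakeFlashStressPage` name are hypothetical placeholders; whether more embeds actually make the problem more probable is exactly what such a page would be testing.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Builds an HTML page with `count` Flash <embed> elements interleaved with
// text paragraphs, so a screen reader has content to navigate between
// plugin instances.
std::string MakeFlashStressPage(int count, const std::string& swfUrl) {
  std::ostringstream html;
  html << "<!DOCTYPE html>\n<html><head><title>Flash stress test</title>"
       << "</head><body>\n";
  for (int i = 0; i < count; ++i) {
    html << "<p>Paragraph " << i << "</p>\n"
         << "<embed type=\"application/x-shockwave-flash\" src=\"" << swfUrl
         << "\" width=\"200\" height=\"150\">\n";
  }
  html << "</body></html>\n";
  return html.str();
}
```

The resulting string can be written to a local file (for example with std::ofstream) and opened in Firefox with NVDA running.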
Bug 1014673 should hopefully have fixed this. Certainly, I haven't seen it occurring lately. I'll close this for now; it can always be reopened if we get further reports.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 10 years ago
Resolution: --- → WORKSFORME