Closed Bug 1149415 Opened 9 years ago Closed 11 months ago

Empty embedded object (HyperTextAccessible child) breaks line and word offsets

Categories

(Core :: Disability Access APIs, defect, P3)

x86_64
Windows 8.1
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox113 --- fixed
firefox114 --- fixed
firefox115 --- fixed

People

(Reporter: Jamie, Assigned: MarcoZ)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

More text issues. :(

Str:
1. Open this URL: data:text/html,<a name="foo"></a>b
2. Get the accessible for the document.
3. Get the line offsets at offset 0.
Expected: (0, 2)
Actual: (0, 0)
Jamie, maybe we should push harder on new text interface rather than putting effort here?
We could consider that, but with respect, the reality is that we discussed the new text interface over 6 months ago and there has been no further progress. Even if we moved forward on it now, it would still be quite some time before we have anything remotely workable. Meanwhile, it's 2015 and we still can't get decent rich text editing support. :)

Also, while I don't presume to understand Gecko's text accessibility code, I'm not convinced the new interface would solve problems like this. There are still going to be nodes with no text that have to be taken into account, so the underlying problem will still need to be solved. In fact, the problem might even be worse. If inline nodes just become part of the text, how will a node with no text be represented? The node is still important because it might have a label, be interactive, etc.
I'm with Jamie on this one regarding the time-frame on having something workable.

And Orca is trying to use the accessible text interface for non-editable text too. So it would be super if these issues could be fixed.
Joanie, the fact that WebKit doesn't use embedded objects much aside, can you please tell me what it does for the following? (Yes, it's insane, but so is the world...)
<p>a<a href="foo" style="padding: 1px;"></a>b</p>
(Normally, that padding would be a background image or something.)
The point is that a user might need to get at that link, but it has no text, which makes it difficult to expose in our rather text-centric APIs.
Jamie! I've so missed our <strike>arguments</strike> happy chats about embedded objects. ;) Jokes aside, here's what I'm seeing:

1. For the accessible paragraph:
   a. text interface, line at offset 0: ('ab', 0, 2)
   b. hypertext interface, link count: 1, getting link 0 returns a valid hyperlink iface object
   c. hyperlink iface object returned by 1b: startIndex: 1, endIndex: 1, getObject: the accessible link
2. For the accessible link:
   a. text interface, line at offset 0: ('', 0, 0)
   b. hyperlink interface, same results as in 1c
Hmm. Do you have a hyperlinkIndex method on your Hypertext interface; i.e. get the index of the hyperlink at character offset o? What happens if you call that for offsets 0, 1 and 2? I'm guessing it returns index 0 for offset 1.

The reason I asked is that I was wondering whether the text alternatives we discussed last year would actually solve this problem. I guess the way WebKit exposes this could work, though it still seems weird that the hyperlink range is collapsed even though it reports as having an offset (not that there's any other way you could do it).
Yes, I have a getLinkIndex() method on the hypertext interface which takes a character offset as its argument. If there's no associated hyperlink for the specified offset, it returns -1. For your particular edge case, getLinkIndex() returns -1 for offset 0 and offset 1. One could make a case for that return value being correct. The character at offset 0 is "a". Is it a link? No. The character at offset 1 is "b". Is it a link? No. Of course, one could also make a case for that return value being incorrect, with index 0 being the expected return value for the link at offset 1 as you guessed would be the case. And with a 0-sized range, the engine wouldn't be lying to us. <insert shrug here>
Whether it's strictly correct is probably irrelevant, but the point is that an AT needs to get at this and walking through all the hyperlinks even when hyperlinkIndex doesn't return it (on the off chance that there *might* be a 0-length hyperlink) is horrible. It's certainly an edge case, but unfortunately, I've seen stuff like this on a few occasions.

Long story short: we don't have a good solution for this yet; even the proposed new text interface doesn't account for this. Gecko's embedded object stuff would work if it exposed the correct offsets, but as we all know, embedded objects are horrible (arguments about whether they're necessary aside).
This is actually worse than i had originally anticipated, though I guess it's logical. Expanding on my example, if you do:
data:text/html,<div><div><div><a name="foo"></a>b</div></div></div>
getting the line offsets at offset 0 for the document and all of the divs returns (0, 0). So it's broken for all ancestors, not just the embedding object.
(In reply to James Teh [:Jamie] from comment #2)
> We could consider that, but with respect, the reality is that we discussed
> the new text interface over 6 months ago and there has been no further
> progress. 

there's lack of focus from all of us, that's a primary reason I think why we keep stumbling

> Even if we moved forward on it now, it would still be quite some
> time before we have anything remotely workable. Meanwhile, it's 2015 and we
> still can't get decent rich text editing support. :)

that's much true for non AT users as well :) and that's the root problem I'd said.

> Also, while I don't presume to understand Gecko's text accessibility code,
> I'm not convinced the new interface would solve problems like this. There
> are still going to be nodes with no text that have to be taken into account,
> so the underlying problem will still need to be solved.

This certainly is true: the base of our text problems is broken Gecko's layout navigation. If we do the proposed approach then it will have all Gecko's bugs. There's implementation difference between current approach and proposed one though. Currently we do several movements in text to detect text lexems and chances to hit a problem are just higher. If we have fixed those Gecko's layout bugs then the current approach worked pretty much correctly and we would probably lived ok with it longer, but that's a whole thing to fix them and much out of scope of sole accessibility.

> In fact, the problem
> might even be worse. If inline nodes just become part of the text, how will
> a node with no text be represented? The node is still important because it
> might have a label, be interactive, etc.

maybe we should have something like content traversal API that would allow you move through text and content both.
As per duplicate bug 1493824, the embedded object just has to be empty; whether it's inline is irrelevant.
Summary: Inline embedded object with no text breaks line offsets → Empty embedded object (HyperTextAccessible child) breaks line offsets
Priority: -- → P3
Summary: Empty embedded object (HyperTextAccessible child) breaks line offsets → Empty embedded object (HyperTextAccessible child) breaks line and word offsets

Note to self: Reevaluate the inlineSpansWithBrs test case introduced in bug 1525631 when dealing with this bug.

(In reply to Marco Zehe (:MarcoZ) from comment #13)

Note to self: Reevaluate the inlineSpansWithBrs test case introduced in bug 1525631 when dealing with this bug.

Turns out the line offset fix for this bug has no adverse effect on the above mentioned test case at all. This one strictly deals with NextLine getting stuck on an empty or otherwise weird embedded character. "Otherwise weird" in this case refers to a center element that has two form elements inside, as the patch for this bug will fix the offsets for the embedded character for that example, too, see the mochitest patch in bug 1052000.

Jamie, am I right in assuming that the word offsets for the original test case are (0, 1, kEmbedChar) for offset 0, and (1, 2, "b") for offset 1?

Flags: needinfo?(jteh)

On some embedded characters, like ones that are empty, or may contain interactive elements such as form fields, the initial move to the next line from the offset fails. The returned tmpOffset is either the starting offset, or even less than, most likely 0. In this case, and if there are more characters in the HyperText, move to the next character and down. This alone fixes the bit where querying the start and end offsets of line_test_1 for the embed returned the same start and end offsets. The subsequent press of the Home key moves to the start of the next line, and all is well.

If that new tmpOffset and the subsequent press of the Home key result in landing on an embedded character, as is the case for the line starting with an empty embedded character, and querying that offset, pressing End from that offset will also result in that thisLineEndOffset to be less or equal the offset. Return the final nextLineBeginOffset for the embedded character only if that offset is greater or equal to the start offset, and the line end offset is greater than the start offset.

In all other cases, find the real line end, as introduced in bug 1525631, but adjust for the fact that the new tmpOffset might point to the very last position in the HyperText, represented by characterCount. For that case, there is no need to go into the loop, since we know that pressing Home from this position brought us the current calamity in the first place.

Assignee: nobody → mzehe
Status: NEW → ASSIGNED

(In reply to Marco Zehe (:MarcoZ) from comment #15)

Jamie, am I right in assuming that the word offsets for the original test case are (0, 1, kEmbedChar) for offset 0, and (1, 2, "b") for offset 1?

It's debatable. It could either be that, or alternatively, both offsets 0 and 1 could return (0, 2, kEmbedChar + "b"). I lean towards the latter because if you position the cursor on the "b" and press shift+control+rightArrow then delete, it removes the empty embed. So offset 1 is already correct, but offset 0 is wrong.

This test case I just came up with is even more fun:

data:text/html,<div contenteditable>w1 w2 <a name="foo"></a>w3

With this, not only does the embed return broken offsets, but the entire word before it (w2) breaks as well.

Flags: needinfo?(jteh)
Attachment #9183936 - Attachment description: Bug 1149415 Part 1: Return correct line boundaries for embedded characters where going to the next line fails, r?jamie! → Bug 1149415: Return correct end offsets for embedded characters where going to the next line fails, r?jamie!
Attachment #9183936 - Attachment description: Bug 1149415: Return correct end offsets for embedded characters where going to the next line fails, r?jamie! → Bug 1149415: Return correct offsets for lines or words if they contain embedded characters, r?jamie!

Unassigning myself from this bug to not block others from picking it up. Not expecting to work on this anytime soon. Feel free to use and finish the patch and make it your own.

Assignee: marco.zehe → nobody
Status: ASSIGNED → NEW
Assignee: nobody → marco.zehe
Status: NEW → ASSIGNED

Fixed by TextLeafPoint, bug 1729407 (currently behind the CTW pref).

Depends on: 1782873
Severity: normal → S3

This is resolved by Cache the World, which is enabled by default in Firefox 113.

Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: