Closed Bug 919508 Opened 11 years ago Closed 5 years ago

Incorrect results when get_text_at_offset() is used to get the last word on the line in wrapped text

Categories

(Core :: Disability Access APIs, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: jdiggs, Assigned: samuel.thibault)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 10 obsolete files)

857 bytes, text/plain
Details
283 bytes, text/html
Details
2.30 KB, patch
Details | Diff | Splinter Review
15.67 KB, patch
samuel.thibault
: review+
Details | Diff | Splinter Review
17.40 KB, patch
samuel.thibault
: review+
Details | Diff | Splinter Review
4.82 KB, patch
samuel.thibault
: review+
Details | Diff | Splinter Review
Attached file Event listener
1. Resize the Firefox window so that the attached test case's text wraps into the following four lines:

  View this text in a Firefox window so that the text wraps.
  Having done so, use get_text_at_offset() to get the word at
  the offset at the beginning of a line. Then repeat the test
  for the offset immediately before that word's start offset.

2. Start the attached event listener in a terminal.

3. Enable caret navigation in Firefox (F7).

4. Position the caret at the start of line 2, line 3, and line 4.

Expected results: The terminal output would show the current word and the previous word.

Actual results: The terminal output shows the current word twice.

Impact: Orca, which has to implement its own caret navigation for Gecko, gets stuck at the beginning of each line of wrapped text when moving backwards word by word.

Notes:

1. I repeated the test in Gedit and Epiphany/WebKitGtk. Both of them work as expected. Output for all three tests provided below.

2. This bug is present in at least versions 23 and 27.0a1 (2013-09-23).

Results from performing the steps described above:

==============================================
Firefox:

Word at offset 59: <Having > Start: 59 End: 66
Word at offset 58: <Having > Start: 59 End: 66

Word at offset 119: <the > Start: 119 End: 123
Word at offset 118: <the > Start: 119 End: 123

Word at offset 179: <for > Start: 179 End: 183
Word at offset 178: <for > Start: 179 End: 183

==============================================
Gedit:

Word at offset 59: <Having > Start: 59 End: 66
Word at offset 58: <wraps. > Start: 52 End: 59

Word at offset 119: <the > Start: 119 End: 123
Word at offset 118: <at > Start: 116 End: 119

Word at offset 179: <for > Start: 179 End: 183
Word at offset 178: <test > Start: 174 End: 179

==============================================
Epiphany:

Word at offset 59: <Having > Start: 59 End: 66
Word at offset 58: <wraps. > Start: 52 End: 59

Word at offset 119: <the > Start: 119 End: 123
Word at offset 118: <at > Start: 116 End: 119

Word at offset 179: <for > Start: 179 End: 183
Word at offset 178: <test > Start: 174 End: 179

==============================================
Attached file test case
must be a dupe of bug 880159
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
one thing, Joanie, can you answer bug 880159 comment #10 please? It doesn't seem like your example relies on -2 magic offset.
I'm not convinced this is a duplicate unless I'm misunderstanding something. Bug 880159 covers the case where the caret is at the end of a wrapping line, so the caret seems to be at the start of the word. Joanie is retrieving the word for the offset *before* the start of the word, so it should return the previous word, no questions asked. Strangely, in Windows, this one seems to work as expected.
right, I missed that given offsets are line end and line end - 1 offsets
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Attached patch patch (obsolete) — Splinter Review

I have dived into this one, here is what happens, taking the case when we have the text wrapped this way:

Having done so, use get_text_at_offset() to get the word at
the offset at the beginning of a line. Then repeat the test

and we request the word at offset of the space just after "at", with the start-of-word boundary:

In nsIFrame::PeekOffset's eSelectWord case, a first call is made to current->PeekOffsetWord (i.e. nsTextFrame::PeekOffsetWord) with the content of the first line. This terminates immediately because we are already at the end of line. nsIFrame::PeekOffset then calls it a second time with the content of the second line, after setting state.SetSawBeforeType(); to account for the line jump equivalent to a whitespace. In nsTextFrame::PeekOffsetWord, aState->mAtStart is still true, and thus it believes it should continue within word "the". This is because state.Update() was in the end never called.

In the attached patch, I make nsIFrame::PeekOffset call state.Update() along state.SetSawBeforeType() to account for the fact that it is like we had processed a whitespace, and we should thus now stop at the next start of word.

It is being tested on https://treeherder.mozilla.org/#/jobs?repo=try&revision=cafad117118bba9e432f3b221fc4490ee1cec266

Attachment #9040105 - Flags: review?(surkov.alexander)

So it unbreaks some cases, but breaks others, notably words which get split because they just can't fit on a single line. In that case we indeed do not want to have such a linebreak to be taken as a whitespace. But do we have a way to know, from nsIFrame::GetFrameFromDirection, to know whether the line jump is from a split word (thus not whitespace), or from an expected line break (thus whitespace)? Will we just always have a frame with IsBrFrame()?

Actually, looking more into details in the "at" word case with the requested offset on the 'a', in nsTextFrame::PeekOffsetWord it looks at 'a', then 'n', then gets out of the loop without looking at the space after "at". when requesting from offset of the space character after "at", the loop exits immediately (like mentioned in previous comment) because of the same reason, and the next call directly starts with the next line, and thus the space was missed. In the case of "from" the space character is looked at. I'd say to have proper behavior the space character should also be looked at, because the linebreak only comes from lack of space, and in the split word case we don't actually want to consider it as a whitespace. So the ClusterIterator in that case shouldn't actually trim the whitespaces when used from nsIFrame::PeekOffset through PeekOffsetWord, perhaps we can just add a parameter to nsPeekOffsetStruct for HyperTextAccessible::FindOffset to specify it?

Attachment #9040105 - Flags: review?(surkov.alexander)
Flags: needinfo?(surkov.alexander)

Jamie, do you have any opinion on my latest comment?

Flags: needinfo?(jteh)

(In reply to Samuel Thibault from comment #7)

Actually, looking more into details in the "at" word case with the requested offset on the 'a', in nsTextFrame::PeekOffsetWord it looks at 'a', then 'n', then gets out of the loop without looking at the space after "at". when requesting from offset of the space character after "at", the loop exits immediately (like mentioned in previous comment) because of the same reason, and the next call directly starts with the next line, and thus the space was missed. In the case of "from" the space character is looked at. I'd say to have proper behavior the space character should also be looked at, because the linebreak only comes from lack of space, and in the split word case we don't actually want to consider it as a whitespace. So the ClusterIterator in that case shouldn't actually trim the whitespaces when used from nsIFrame::PeekOffset through PeekOffsetWord, perhaps we can just add a parameter to nsPeekOffsetStruct for HyperTextAccessible::FindOffset to specify it?

Daniel, how does the suggestion look ok from layout perspective? It seems it has no effect on keyboard navigation but unblocks a11y on text extraction issue.

Flags: needinfo?(surkov.alexander) → needinfo?(dholbert)

This is the first I've seen this PeekOffset code, so I don't really know what its invariants are & can't comment on the correctness. (Looking at the hg blame surrounding this line, it looks pretty old, though it seems jfkthame has reviewed at least one patch that touched it in the past couple years (Bug 1375825) -- so I'm going to punt the needinfo to him in the hopes that he's got a bit more orientation for this code than I do.)

Flags: needinfo?(dholbert) → needinfo?(jfkthame)

Mmm, Jamie, Joanmarie, in the case where a word gets split at end of line because it cannot fit in one line anyway, do we want to make the accessibility layer think that this is a word break, or not? I'd tend to say not since that's just a visual rendering, but the tests in accessible/tests/mochitest/text/test_atcaretoffset.html say yes (but we can change that).

Flags: needinfo?(jdiggs)

(In reply to Samuel Thibault from comment #11)

Mmm, Jamie, Joanmarie, in the case where a word gets split at end of line because it cannot fit in one line anyway, do we want to make the accessibility layer think that this is a word break, or not? I'd tend to say not since that's just a visual rendering, but the tests in accessible/tests/mochitest/text/test_atcaretoffset.html say yes (but we can change that).

I would tend to lean toward yes BECAUSE it's a visual rendering. That said, ask your colleagues who are blind, and perhaps on the Orca list too, what they want and expect. Then, if you make Gecko do that, and Orca just trusts Gecko, everyone will be happy. :)

Flags: needinfo?(jdiggs)
Attached patch patch (obsolete) — Splinter Review

Here is what it looks like, https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c64b47fd864c8ee90aa614ee174642a192b6292 shows no regression, only unrelated intermitent-marked failures.

Attachment #9040105 - Attachment is obsolete: true
Attachment #9043360 - Flags: review?(jfkthame)

We really should have test coverage for this, so that we can verify the behavior (and detect any future regressions). Could you add appropriate tests (that fail currently, and pass with the fix) in a second patch?

Flags: needinfo?(jfkthame)
Attached patch testcase (obsolete) — Splinter Review
Attachment #9043847 - Flags: review?(jfkthame)

The try is positive.

Comment on attachment 9043360 [details] [diff] [review]
patch

Review of attachment 9043360 [details] [diff] [review]:
-----------------------------------------------------------------

This is probably OK, but I'd be a lot happier with it if we could update the GetTrimmedOffsets method as suggested below.

Also, AFAICT this patch will change behavior in cases that don't even involve any line-breaks. If you add a test element to test_wordboundary.html that has extra spaces that will be collapsed by normal white-space processing, e.g. as a variation on id="d6":

    <div id="d6a">  hello   all  </div>

and then set up the corresponding tests, this patch will alter the results for it even though it's a single-line test.

That may be fine -- I'm not sure the current behavior here is particularly intended or necessary -- but please look into this example and confirm whether the changed results are acceptable (and if so, let's add it to the test as well).

::: layout/generic/nsFrameSelection.h
@@ +121,5 @@
>    bool mJumpLines;
>  
> +  // mJumpLinesBreakWord: Whether jumping across line is to be considered as a
> +  // word break
> +  bool mJumpLinesBreakWord;

What does "be considered as a word break" mean here? It's not clear to me from this comment how `mJumpLinesBreakWord == true` would differ from `mJumpLines == false` when using eSelectWord; either one of them sounds like it would make the line-boundary act as a break.

Would it make sense to give this a name more closely linked to the space-trimming behavior that it affects? `bool mTrimSpaces` or something like that? The connection between "jumping across lines" and the flags eventually passed to GetTrimmedOffsets isn't obvious at the moment.

::: layout/generic/nsTextFrame.h
@@ +575,5 @@
>      int32_t GetEnd() const { return mStart + mLength; }
>    };
>    TrimmedOffsets GetTrimmedOffsets(const nsTextFragment* aFrag, bool aTrimAfter,
> +                                   bool aPostReflow = true,
> +                                   bool aTrimBefore = true) const;

This GetTrimmedOffsets API is kinda questionable already, having two trailing bool parameters (one of them with a default); it's awfully easy to get parameters mixed up and inadvertently pass the wrong bools to a method like this. Adding a third bool param, again with a default, makes it even worse. :(

So although it makes for a bit more work, I'd be much happier if we revise this to take strongly-typed flags instead of multiple "interchangeable" bools. Rather than three separate typed flags, I'd be inclined to go with a single TrimmedOffsetFlags enum that can hold the three flags here, with suitably named values; and update all the callsites to clearly spell out which flags they're passing.

(Looks like there are about 15 call sites to be updated, so this is fairly manageable. In fact, what I'd probably suggest would be to start with a patch that converts the existing two flags to a single typed-enum flags param, and update callers accordingly, with no change in behavior; and then a new version of this patch can add its flag to that enum.)

Given that 'true' seems to be the more common value for all these flags, and in particular the new flag wants to default to true, you could choose to define them inversely in the flags enum, something like

    enum class TrimmedOffsetFlags : uint8_t {
      kDefaultTrimFlags = 0,
      kNoPostReflow = 1 << 0,
      kNoTrimAfter = 1 << 1,
      kNoTrimBefore = 1 << 2
    };

so that the new flag for 'before' doesn't have to be explicitly added to all the existing callsites in order to maintain their behavior when it is introduced.
Attached patch patch (obsolete) — Splinter Review

I'd be much happier if we revise this to take strongly-typed flags instead of multiple "interchangeable" bools.

Right, I was wondering about it too. I have made such a rework in the attached patch, successfully tested on https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b86dad68ba187bf56fecd4ec2f1436cd5907a3f&selectedJob=229258587

Attachment #9045159 - Flags: review?(jfkthame)
Attached patch testcase (obsolete) — Splinter Review

AFAICT this patch will change behavior in cases that don't even involve any line-breaks. If you add a test element to test_wordboundary.html that has extra spaces that will be collapsed by normal white-space processing, e.g. as a variation on id="d6":

<div id="d6a">  hello   all  </div>

My patch does not actually change the behavior here. With the current firefox, a trailing space in d6 would already show up in the gettext results. The heading space is eaten both by the current firefox and with my change. And intermediate spaces are collapsed in both cases.

I am currently testing by changes with the attached test changes.

::: layout/generic/nsFrameSelection.h

  • // mJumpLinesBreakWord: Whether jumping across line is to be considered as a
  • // word break
  • bool mJumpLinesBreakWord;

What does "be considered as a word break" mean here?
[...]
Would it make sense to give this a name more closely linked to the space-trimming behavior that it affects? bool mTrimSpaces or something like that?

Mmm, this sounds like a remnant of my previous thinking. Like Joanmarie said, my change lets the linebreak act as a word break, it just doesn't eat spaces. I'll fix that once we have the flag rework in.

Attachment #9043847 - Attachment is obsolete: true
Attachment #9043847 - Flags: review?(jfkthame)
Comment on attachment 9045159 [details] [diff] [review]
patch

Review of attachment 9045159 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks; although it's a bit verbose, I think this will help us keep track of what those flags mean much better than just true/false, especially once you add another flag in the main patch here.

::: layout/generic/nsTextFrame.h
@@ +578,5 @@
> +    kNoPostReflow = 1 << 0,
> +    kNoTrimAfter = 1 << 1
> +  };
> +  TrimmedOffsets GetTrimmedOffsets(const nsTextFragment* aFrag,
> +                                   TrimmedOffsetFlags aFlags) const;

At this point, what do you think about giving aFlags a default value of kDefaultTrimFlags? I count around 6 callsites where we unconditionally pass kDefaultTrimFlags, so making it a default value here would allow those places to be more concise, while having explicit flag names wherever we want non-default behavior.
Attachment #9045164 - Attachment is patch: true
Attachment #9045164 - Attachment mime type: application/octet-stream → text/plain

(In reply to Samuel Thibault from comment #19)

AFAICT this patch will change behavior in cases that don't even involve any line-breaks. If you add a test element to test_wordboundary.html that has extra spaces that will be collapsed by normal white-space processing, e.g. as a variation on id="d6":

<div id="d6a">  hello   all  </div>

My patch does not actually change the behavior here. With the current firefox, a trailing space in d6 would already show up in the gettext results. The heading space is eaten both by the current firefox and with my change. And intermediate spaces are collapsed in both cases.

Are you sure about this? I tried adding this version of test d6 with extra (collapsible) spaces. It passes as written with current code, but if I apply your patch, it starts failing, which indicates that behavior does change. I didn't look into it deeply but I think the handling of the trailing space must have changed.

Attached patch flags-patch (obsolete) — Splinter Review

Here is an updated version. The try server seems to have a hard time finding a host to run a few of the tests, but the result seems positive:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=10ac0f920f34be53f6298ecabe1c563d8082f1bc

Attachment #9045159 - Attachment is obsolete: true
Attachment #9045159 - Flags: review?(jfkthame)
Attachment #9045255 - Flags: review?(jfkthame)
Attachment #9045255 - Attachment is patch: true
Attached patch testcases (obsolete) — Splinter Review

My patch does not actually change the behavior here. With the current firefox, a trailing space in d6 would already show up in the gettext results. The heading space is eaten both by the current firefox and with my change. And intermediate spaces are collapsed in both cases.

Are you sure about this? I tried adding this version of test d6 with extra (collapsible) spaces. It passes as written with current code, but if I apply your patch, it starts failing, which indicates that behavior does change. I didn't look into it deeply but I think the handling of the trailing space must have changed.

Ah, indeed.

So the change is when the passed offset is beyond the end of the last word, i.e. after the space after the last word. My change makes getTextAtOffset return the space in that case, while it was returning the last word previously. Actually it wasn't just the last word, but " all ", which looks very odd for an end-of-word boundary. I'd say the new behavior is more coherent, actually. I have here attached the resulting testcases.

Attachment #9045164 - Attachment is obsolete: true

Yes, this seems reasonable to me, I just wanted to be sure we're aware of what's changing (and add test coverage so that any future changes in this area are more likely to be noticed, too).

Comment on attachment 9045255 [details] [diff] [review]
flags-patch

Review of attachment 9045255 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks. Re-reading the patch, I did wonder if it would be clearer to rename kNoPostReflow as kNotPostReflow? If you think that makes sense, feel free to rename it and carry over the r+.
Attachment #9045255 - Flags: review?(jfkthame) → review+
Attached patch flags-patch (obsolete) — Splinter Review
Attachment #9045255 - Attachment is obsolete: true
Attachment #9045361 - Flags: review+

Thanks!

Ah, I can't add the checkin-needed keyword myself, because the bug is not assigned to me.

Flags: needinfo?(jfkthame)

That we can fix!

However, while we could safely check in the first patch by itself, I'd actually suggest waiting until the full set is ready (which looks like it shouldn't take much now), and landing all together. It tends to complicate things if we land patches one by one within a single bug, especially if any uplifts or backouts start to get involved.

Assignee: nobody → samuel.thibault
Flags: needinfo?(jfkthame)
Attached patch patch (obsolete) — Splinter Review

Here is the reworked patch on top of the flags-patch

Attachment #9043360 - Attachment is obsolete: true
Attachment #9043360 - Flags: review?(jfkthame)
Attachment #9045912 - Flags: review?(jfkthame)
Attached patch testcases (obsolete) — Splinter Review
Attachment #9045913 - Flags: review?(jfkthame)
Attachment #9045293 - Attachment is obsolete: true
Attachment #9045912 - Flags: review?(jfkthame) → review+
Attachment #9045913 - Flags: review?(jfkthame) → review+
Keywords: checkin-needed

While trying to apply any of the two patches in my console to land in on the integration trees i get this error:

applying patch1
patching file layout/generic/nsTextFrame.cpp
Hunk #1 FAILED at 2960
Hunk #3 FAILED at 7950
2 out of 4 hunks FAILED -- saving rejects to file layout/generic/nsTextFrame.cpp.rej
patching file layout/generic/nsTextFrame.h
Hunk #2 FAILED at 571
1 out of 2 hunks FAILED -- saving rejects to file layout/generic/nsTextFrame.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and qrefresh patch1

Can you take a look into it?

Flags: needinfo?(samuel.thibault)
Keywords: checkin-needed

Sorry, I forgot to mention that flags-patch (patch0) needs to be applied first, then patch (patch1), then testcases (patch2)

Flags: needinfo?(samuel.thibault)
Keywords: checkin-needed

Hi Samuel. I've tried every combination to land these patches, always get hunks failed because they are not applied in their order. Please take a look and try to fix it and then re-add chekin-needed.

Examples: https://irccloud.mozilla.com/file/uZNuG6OG/image.png
https://irccloud.mozilla.com/file/CKVeQIl0/image.png
https://irccloud.mozilla.com/file/w1a7kEUY/image.png

Also, please update the commit messages for each patch so we know which one is which.

Flags: needinfo?(samuel.thibault)
Keywords: checkin-needed
Attached patch flags-patchSplinter Review
Attachment #9045361 - Attachment is obsolete: true
Attachment #9046039 - Flags: review+
Attached patch patchSplinter Review
Attachment #9046040 - Flags: review+
Attachment #9045912 - Attachment is obsolete: true
Flags: needinfo?(samuel.thibault)
Attached patch testcasesSplinter Review
Attachment #9045913 - Attachment is obsolete: true
Attachment #9046042 - Flags: review+

Ok, here it is respinned:

  • first flags-patch
  • then patch
  • then testcases
Keywords: checkin-needed

Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2d905ec0a76
nsTextFrame::GetTrimmedOffsets: Rework flag parameters r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c9191097079
layout: Do not trim spaces when inspected from accessibility layer r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/4dee747dc404
layout: add more tests for text inspection from accessibility layer r=jfkthame

Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 11 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Flags: needinfo?(jteh)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: