Word Frequency fails to position cursor #628

okrick · 2024-12-31T22:37:40Z

Ran the Alpha/Num check. Except for 000.jpg, selecting any item on the list automatically positions the cursor at the beginning of the item in the document.

travels.zip


srjfoo · 2024-12-31T23:20:16Z

@okrick, what was the expected result?

okrick · 2024-12-31T23:49:29Z

I expected that 000.jpg would be selected, highlighted, and the item positioned in the editing window. Instead, I got a search failed warning beep.

srjfoo · 2025-01-01T00:12:28Z

> I expected that 000.jpg would be selected, highlighted, and the item positioned in the editing window. Instead, I got a search failed warning beep.

Ah, I see. When I cmd+left-clicked on it in WF, the search window came up, but I had to turn off "whole word only" in the search form before Find All found it. I wonder if WF is confused by the prefix; I didn't see anything in the log.

okrick · 2025-01-01T01:08:20Z

Yet, 001.png etc. are selectable. The only one on the list not selectable is 000.jpg.

srjfoo · 2025-01-01T01:48:48Z

> Yet, 001.png etc. are selectable. The only one on the list not selectable is 000.jpg.

000.jpg is also the only one that has a prefix that would cause the assumption that it's not a "full word". The "whole word" checkbox appears to be checked by default when a search form is brought up using cmd/ctrl+left-click (presumably to omit partial matches).

I can cmd+left-click on 000.jpg, and the search form comes up with that filled in. If I leave "whole word" checked, there are no results. If I uncheck "whole word", I get two results.
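The behaviour described above can be reproduced with a small sketch. It assumes, as the fix description later in this thread says, that the pre-fix "whole word" search wrapped the search string in `\b...\b`. The `find_all` helper and the sample text are hypothetical stand-ins, not the tool's actual code or the project file.

```python
import re

# Hypothetical sample text: the illustration name only appears with the "i_" prefix.
text = "[Illustration: i_000.jpg]\n<img src='images/i_000.jpg' alt=''>"

def find_all(needle: str, haystack: str, whole_word: bool) -> list[int]:
    """Return start offsets of needle, optionally wrapped in \\b...\\b."""
    pattern = re.escape(needle)  # "." in the file name must match literally
    if whole_word:
        pattern = rf"\b{pattern}\b"
    return [m.start() for m in re.finditer(pattern, haystack)]

print(find_all("000.jpg", text, whole_word=True))   # []
print(find_all("000.jpg", text, whole_word=False))  # two offsets, one per occurrence
```

Searching for the full name `i_000.jpg` with `whole_word=True` finds both occurrences, which is why grabbing the whole illustration name in WF would sidestep the problem.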

Now the question is: should WF be grabbing the whole illustration name, i.e. i_000.jpg? I think if it had, you wouldn't be seeing this issue.

okrick · 2025-01-01T05:57:31Z

Yes. Either the i_ should be included in the WF or the Search & Replace tool should not consider the i_ as part of the word. They should have the same definition.

Because regexes include the underscore in their idea of word characters, an occurrence of a WF word that was preceded by an underscore would not be found by a "whole word" search. So the word would be listed in WF, but when the user clicked it, the word would not be found. Fixed by changing search's "whole word" to wrap the search string with a more complex regex, rather than `\b...\b`. Also changed WF's idea of a word to include embedded underscores (in the same way as embedded periods are included), e.g. "i_001.jpg", but to strip leading or trailing underscores, which signify italics, e.g. "_dog_". Fixes DistributedProofreaders#628
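The fix description only says the search string is wrapped in "a more complex regex"; the actual pattern used in #634 is not shown in this thread. One hypothetical way to do it is with lookarounds that count letters and digits, but not the underscore, as word characters. The helper name `whole_word_pattern` is an illustration only.

```python
import re

# Hypothetical "whole word" wrapper: unlike \b...\b, an underscore next to the
# match counts as a separator. [^\W_] means "\w minus underscore", i.e. letters
# and digits, so the match must not be glued to a letter or digit on either side.
def whole_word_pattern(needle: str) -> re.Pattern:
    word_char = r"[^\W_]"
    return re.compile(rf"(?<!{word_char}){re.escape(needle)}(?!{word_char})")

print(whole_word_pattern("dog").search("_dog!") is not None)   # True
print(whole_word_pattern("dog").search("dogma") is not None)   # False
```

A trade-off of this particular sketch is that a whole-word search for "dog" would also match inside "i_dog", since the underscore is now treated like a space.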

windymilla · 2025-01-01T14:59:21Z

All the above comments from @okrick & @srjfoo are completely fair: there was a problem with word boundaries and underscores.

Because regexes have their origins in computing, the characters that count as part of a word include the underscore. So when you do a "whole word" search, it tries to find "<non-word char><some word chars><non-word char>". If the character before the "real" word, like dog, is an underscore, e.g. _dog!, it won't be found: the character before dog is an underscore, so (in the regex's mind) it's a word character, not a boundary.
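To make the point about underscores concrete, here is a short sketch using Python's `re` module purely for illustration (the thread does not say which regex engine the search uses). It lists the offsets where `\b` matches in `_dog!`: there is no boundary between the underscore and the "d".

```python
import re

# "\w" includes the underscore, so "_" is a word character to the regex.
print(bool(re.fullmatch(r"\w", "_")))  # True

# Offsets where \b matches in "_dog!": before "_" and after "g",
# but not at offset 1 where "dog" starts, so \bdog\b cannot match.
print([m.start() for m in re.finditer(r"\b", "_dog!")])  # [0, 4]
print(re.search(r"\bdog\b", "_dog!"))                    # None
print(re.search(r"\bdog\b", "(dog!") is not None)        # True
```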

However, certainly as far as WF is concerned, we don't want a leading or trailing underscore to be included in a "word", mostly because we use them to mark italics, e.g. _dog_. So WF's definition of a "word" is essentially a run of characters separated by spaces, with things like punctuation removed.
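A rough sketch of the WF word definition described here: split on whitespace, then strip punctuation (including underscores) only from the ends of each token, so embedded periods and underscores survive. The function `wf_words` is hypothetical, and stripping every character in `string.punctuation` is an assumption; the real WF rules may treat hyphens, apostrophes and so on differently.

```python
import string

# Hypothetical WF tokenizer: whitespace-separated tokens with leading/trailing
# punctuation removed. string.punctuation contains "_" and ".", so italic markers
# like "_dog_" lose their underscores while "i_001.jpg" stays one word.
def wf_words(text: str) -> list[str]:
    words = []
    for token in text.split():
        word = token.strip(string.punctuation)
        if word:  # skip tokens that were pure punctuation
            words.append(word)
    return words

print(wf_words("_dog_ saw i_001.jpg, and [i_000.jpg]."))
# ['dog', 'saw', 'i_001.jpg', 'and', 'i_000.jpg']
```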

I have implemented a solution which tackles this issue - see #634 for details.


windymilla mentioned this issue in "Use consistent definition of word in WF #634" (Merged) on Jan 1, 2025

windymilla added the bug Something isn't working label Jan 1, 2025

windymilla closed this as completed in #634 Jan 2, 2025
