
Word Frequency fails to position cursor #628

Closed
okrick opened this issue Dec 31, 2024 · 7 comments · Fixed by #634
Labels
bug Something isn't working

Comments

@okrick

okrick commented Dec 31, 2024

[Screenshot: Capture]

Ran the Alpha/Num check. Except for 000.jpg, selecting any item on the list automatically positions the cursor at the beginning of the item in the document.

travels.zip

@srjfoo
Member

srjfoo commented Dec 31, 2024

@okrick, What was the expected result?

@okrick
Author

okrick commented Dec 31, 2024

I expected that 000.jpg would be selected, highlighted, and the item positioned in the editing window. Instead, I got a search failed warning beep.

@srjfoo
Member

srjfoo commented Jan 1, 2025

> I expected that 000.jpg would be selected, highlighted, and the item positioned in the editing window. Instead, I got a search failed warning beep.

Ah, I see. When I cmd+left-clicked on it in WF, the search window came up, but Find All only found it after I turned off "whole word only" in the search form. I wonder if WF is confused by the prefix; I didn't see anything in the log.

@okrick
Author

okrick commented Jan 1, 2025

Yet, 001.png etc. are selectable. The only one on the list not selectable is 000.jpg.

@srjfoo
Member

srjfoo commented Jan 1, 2025

> Yet, 001.png etc. are selectable. The only one on the list not selectable is 000.jpg.

000.jpg is also the only one with a prefix that would cause it to be treated as not a "full word". The "whole word" checkbox appears to be checked by default when the search form is brought up with cmd/ctrl+left-click (presumably to omit partial matches).

I can cmd-left-click on 000.jpg, and the search form is brought up with that filled in. If I leave "whole word" checked, there are no results. If I un-check "whole word", I get two results.
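A minimal Python sketch reproducing the symptom, assuming "whole word" behaves like wrapping the escaped search string in `\b...\b` (the form the later fix moved away from). The sample line is hypothetical; the contents of travels.zip are not reproduced here.

```python
import re

# Hypothetical line of text containing an illustration filename.
text = "[Illustration: i_000.jpg]"
target = "000.jpg"

# "Whole word" emulated as \b...\b around the escaped search string.
whole_word = re.compile(r"\b" + re.escape(target) + r"\b")
# Plain search, as with "whole word" unchecked.
plain = re.compile(re.escape(target))

whole_word_hits = whole_word.findall(text)
plain_hits = plain.findall(text)

print(whole_word_hits)  # [] because "_" and "0" are both word characters
print(plain_hits)       # ['000.jpg']
```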

Now, the question is -- should WF be grabbing the whole illustration name, i.e. i_000.jpg? I think if it had, you wouldn't be seeing this issue.

@okrick
Author

okrick commented Jan 1, 2025

Yes. Either the i_ should be included in the WF or the Search & Replace tool should not consider the i_ as part of the word. They should have the same definition.

windymilla added a commit to windymilla/guiguts-py that referenced this issue Jan 1, 2025
Because regexes include underscore in their idea of word
characters, an occurrence preceded by underscore of a
word found by WF would not be found by "whole word"
search. So, the word would be listed in WF, but when the
user clicked it, the word would not be found.

Fixed by changing search's "whole word" to wrap the
search string with a more complex regex, rather than
`\b...\b`

Also changed WF's idea of a word to include embedded
underscores (in the same way as embedded periods are
included), e.g. "i_001.jpg", but to strip leading or trailing
underscores, which signify italics, e.g. "_dog_".

Fixes DistributedProofreaders#628
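The commit message doesn't show the replacement regex, so the following is only a sketch of one way to get that behaviour in Python: use lookarounds that reject an adjacent letter or digit but not an adjacent underscore. The helper `whole_word_pattern` is hypothetical, not guiguts' actual code.

```python
import re

# Letters and digits, but NOT the underscore: [^\W_] excludes both
# non-word characters (\W) and "_" itself.
WORD_CHAR = r"[^\W_]"


def whole_word_pattern(search: str) -> re.Pattern:
    """Hypothetical helper: match `search` only when it is not directly
    preceded or followed by a letter or digit. An adjacent underscore
    (italic markup) does not block the match."""
    escaped = re.escape(search)
    return re.compile(rf"(?<!{WORD_CHAR}){escaped}(?!{WORD_CHAR})")


dog = whole_word_pattern("dog")
in_italics = dog.search("The _dog_ barked.") is not None
inside_word = dog.search("a hotdog stand") is not None
after_prefix = whole_word_pattern("000.jpg").search("i_000.jpg") is not None

print(in_italics, inside_word, after_prefix)  # True False True
```

Compared with `\b`, these lookarounds only refuse a match when a letter or digit is adjacent, so `_dog_` satisfies a "whole word" search for `dog` while `hotdog` still does not.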
@windymilla
Collaborator

windymilla commented Jan 1, 2025

All the above comments from @okrick & @srjfoo are completely fair: there was a problem with word boundaries and underscores.

Because regexes have their origins in computing, the characters that count as part of a word include the underscore. So, when you do a "whole word" search, it tries to find
"<non-word char><some word chars><non-word char>",
and if the character before the "real" word (like dog) is an underscore, e.g. _dog!, it won't be found, because (in the regex's mind) that underscore is a word character.
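A quick check of this point using Python's `re` module, which, like most regex flavours, counts `_` as part of `\w`:

```python
import re

# \w includes the underscore.
underscore_is_word_char = re.fullmatch(r"\w", "_") is not None

# So there is no \b boundary between "_" and "d" in "_dog!".
found_after_underscore = re.search(r"\bdog\b", "_dog!") is not None
found_after_space = re.search(r"\bdog\b", " dog!") is not None

print(underscore_is_word_char)  # True
print(found_after_underscore)   # False
print(found_after_space)        # True
```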

However, certainly as far as WF is concerned, we don't want a leading or trailing underscore to be included in a "word", mostly because we use them to mark italics, e.g. _dog_. So for WF, a "word" is essentially a space-separated token with things like punctuation removed.
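A rough Python sketch of that word definition, as also described in the commit message: split on whitespace, strip punctuation and underscores from the ends of each token, and keep embedded underscores and periods. The function `wf_words` and the choice of `string.punctuation` as the strip set are assumptions; guiguts' real WF code handles more cases.

```python
import string

# Characters stripped from the ends of each space-separated token.
# string.punctuation includes "_", so leading/trailing underscores
# (italic markup) are removed, while embedded ones survive.
STRIP_CHARS = string.punctuation


def wf_words(text: str) -> list[str]:
    """Hypothetical helper: split on whitespace, then strip punctuation
    and underscores from both ends of each token."""
    words = []
    for token in text.split():
        word = token.strip(STRIP_CHARS)
        if word:
            words.append(word)
    return words


words = wf_words("The _dog_ barked. [Illustration: i_001.jpg]")
print(words)  # ['The', 'dog', 'barked', 'Illustration', 'i_001.jpg']
```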

I have implemented a solution which tackles this issue - see #634 for details.

windymilla added the bug label Jan 1, 2025
windymilla added a commit that referenced this issue Jan 2, 2025
3 participants