-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposed wordcheck API #1371
Proposed wordcheck API #1371
Conversation
$this->assertEquals($expected, $response); | ||
} | ||
/* | ||
public function test_wordcheck_site_bad_word() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works on the test server but I don't know how to make it work in the Github ci. Perhaps not worth the effort.
pinc/ProofProject.inc
Outdated
$this->append_good($wordIndex); | ||
if (in_array($word, $badWords)) { | ||
$this->append_word_to_array($word, $badWordHash[$word]); | ||
} elseif (in_array($word, $accepted_words)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to mark words the user has suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good way to test this PR via the API, or is the main testing just to make sure that in the proofreading interface and/or PM's WC interface, everything works as it should?
It would be possible to test it using curl. It could be possible to test it from inside the Swagger editor although the server may not be set up so that this will work from my sandbox. I tested it by using it with a basic proofreading interface connected to it. |
Sandbox updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of things that give me pause about this implementation:
- We're duplicating a lot of logic -- some of it literal copy/paste -- from
spellcheck_text.inc
without replacing the old one. Now we have two places to fix bugs and keep in sync. I'd rather we have a new class (like you're building) that does the logic the API needs and update the current functionality to include it. - The current WordChecker class merges WordCheck (which is a word-centric feature) with highlighting (punctuation and scripts) so we're cramming the data and the UI layers together somehow. The current code does this too when creating the HTML, but the API is a data layer. I need to think about this some more because I see why you're doing this but it feels wrong.
Rather than address the above yet, I suggest let's start by having you address my comment about documenting the returned structure and how you see the UI / API caller using it to make sure we're addressing the key use-case.
pinc/ProofProject.inc
Outdated
$this->puncArray = make_grapheme_array($puncCharacters); | ||
} | ||
|
||
public function analyse($project, $page_text, $languages, $accepted_words): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some documentation here for what the array returned from this function looks like. API callers need it, and frankly I the PR reviewer needs it. Please add a comment here with a very small example for reviewers and future devs and we can figure out how to best get that available to the API docs.
Thanks Casey. |
so it can be used in spellcheck_text
Sandbox updated |
* an array of languages to use for dictionaries | ||
* @param array $accepted_words | ||
* the words the user wishes to be accepted as good | ||
* TODO consider how to handle characters not enabled for the project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove validate_text() from api_v1_project_wordcheck() then they will be highlit as uncommon scripts. We could adjust the way highlighting works to reserve a code/colour for them. An exception would still be thrown on saving the page.
I've put my finger on what's bugging me about this PR -- we're munging the data and presentation layers together in the API. The API should be a data layer first and foremost. The primary WordCheck API should return the list of words and what set they belong in (bad world, site, project, page). A PM or PP (or UI) could query this endpoint with some text and get back the list of words to do something with them. Similarly, we could have an endpoint that returns words in uncommon scripts for a similar use by PM, PP, or UI. eg: $router->add_route("PUT", "v1/projects/:projectid/wordcheck", "api_v1_project_wordcheck");
$router->add_route("PUT", "v1/projects/:projectid/uncommon_scripts", "api_v1_project_uncommon_scripts"); That doesn't mean we can't or shouldn't have server-side APIs that facilitate the presentation layer like the code in this PR does, but we should think about why we're not just putting them in the UI code to begin with and keep all of the presentation bits together. Just because we're doing things server-side now doesn't mean we should be doing it server-side in the future. I'd like to see the first WordCheck API be useful for the general case and decide if we want this data+presentation logic also as an API like you have here. |
I see what you are saying. |
I don't know that we need to pass around the word pattern. If we send the text blob to the API (like you have now) and the API returns a list of words, the caller should be able to treat those as opaque strings, search for them in the text it sent, and do something with them (highlight them, etc). I think in that case the caller doesn't need to know anything about "words" at all (not that it couldn't, just that I don't see that as a prereq). |
I don't think you can do it quite like that because a bad word can be a substring of a good word. |
Yup, good point. So the server and the UI need to agree on what a "word" is. It's probably sufficient that we define it / document it and not try to have the client request it from the server. |
I've tried using a similar regular expression in the client to find words. One thing I came across is that in unicode mode \w works differntly in php and javascript. In php it matches the ṭ character (in projectID5690905894029) but not in js. Changing js to use \p{L} matches it. Perhaps we should change the php expression in the same way so they are more likely to agree in all cases. |
Changing to returning an array of bad words. |
There are two endpoints:
Sandbox at: https://www.pgdp.org/~rp31/c.branch/api_wc1