-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with lexicon.randomWord("nns") #409
Comments
Why don't we just do this in the code? |
hmm, yes, that sounds better. |
We should also check which other parts-of-speech cause problems |
Currently, only 'nns' returns zero result, as we remove them completely. |
Are you checking the full list? When I run the code below (in RiPos.java), I get other errors.
|
I believe these can also be removed from PosTagger as they are no longer used (at least from JS):
|
That is because some of the PENN tag is not in rita dictionary at all
|
Isn't n, v, r, a for simplified WordNet tag set? https://rednoise.org/rita/reference/RiTa/RiTa.getPosTags/index.php |
yes, however, i dont think the constants are used anywhere, are they (at least not in js)? |
Yes, they could be removed from PosTagger. |
I think we don't need to support randomWord for the following three penn Tags, they are rare cases, and it doesn't make much sense to support them anyway. For rp - particle I think we can add the tag to a list of words in the dictionary: Ex:
I'm not sure what to do with predeterminer though, any thought on this? |
Yes, I suppose so (though we can still delete the constants). Can you add a low-priority ticket for this to Java/JS? |
Agreed, but we need to catch the error and output a message: 'No words in the lexicon with this POS'. Not that we also need to consider the 1000-word lexicon, where this might happen more often.
Agreed. Lets use these:
This is part of a phrase, so not relevant for a word. Should be treated as the first case above and get a warning (anytime we dont find the POS requested, should be this same warning) |
I think we can also delete all the PosTagger Constants in RitaJS?
|
When I check randomWord, we actually have rp and pdt in the dictionary, but because we are only checking the first posTag from the posTag array of a word, rp and pdt are never identified. Is there any particular reason that we only check the first posTag? If I change the code to check the whole posTagArray instead of the first posTag only, it can pass the test for rp and pdt, however, it will fail this test, which makes sure that the matching tag for randomWord is always the first posTag from the array... |
It doesn't break anything after deleting them, so I think, yes... |
Ok. I will make these changes in java... |
Lets leave as is (we want words that are clearly the pos requested), and just issue the warning for those cases (rp, pdt) |
Note: for getRandomWord() in Java, I throw an Exception with the following message: 'No words with pos='+pos+ ' found' |
Hi! For the record, I just downloaded Rita for Processing for the first time, and the first thing I tried was getRandomWord( "nns" ). Instead of getting zero results, I get "negroes" over and over again. That's, uh, a bit weird? Needless to say, I'm glad I found this note about using pluralize(). |
That is strange. It is quite unlikely that you would get the same result twice, let alone over and over. Can you include your test code? I just downloaded the current RiTa from NPM, and ran a simple test, which gave me the following output (code follows):
var rita = require('rita');
for (var i = 0; i < 20; i++) {
var rs = rita.randomWord( "nns" );
console.log(i+') '+rs);
}
|
Sorry, I should have specified that this is the Java version, running in Processing 3.3.6. Here's sample code:
|
Confirmed and fixed in 1479f46. This will be in the next release (v1.3), scheduled for Jan 5, 2018. |
See also #408, #487
Proposed message:
All plural nouns have been removed from the latest rita dictionary. To random generate plural nouns, please use
RiTa.pluralize(lexicon.randomWord("nn"))
instead.The text was updated successfully, but these errors were encountered: