Lemma: peoples vs people #53
"People" has an ethnic or national group sense as well as a 'persons' sense. I think "the original people of the islands" is ambiguous: it could refer to the individuals (persons) who originally inhabited the islands, in which case it is plural, or it could refer to a group, in which case it is singular. Does verb agreement resolve this?
Ah, good point, in this case it is clearly a plural noun based on the verb in the sentence. One issue that arises in EWT is that "people" always has the lemma "people", even in the case of multiple persons.
This was always an issue with WordNet-based lemmatizers that didn't have morphological subtypes of nouns. But we have number information so I don't see why we couldn't lemmatize people/NNS to person.
So, update EWT (and CoreNLP)?
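The rule suggested here could be sketched roughly as follows. This is only an illustration, not the EWT or CoreNLP implementation: the function name `lemmatize_people` and the idea of keying purely on the PTB tag are assumptions made for the example, and whether people/NNS should map to "person" at all is exactly what the rest of this thread debates.

```python
def lemmatize_people(form: str, xpos: str) -> str:
    """Hypothetical tag-based lemma rule for "people" / "peoples"."""
    lower = form.lower()
    if lower == "people" and xpos == "NNS":
        # Plural use ("many people were ..."): treat as the plural of "person".
        return "person"
    if lower == "peoples" and xpos == "NNS":
        # Plural of the group sense ("the peoples of the islands").
        return "people"
    if lower == "people" and xpos == "NN":
        # Singular group sense ("a people").
        return "people"
    # Anything else is outside this sketch; return the lowercased form unchanged.
    return lower
```

With this rule the lemma depends on the tag, so the quality of the NN/NNS decision for "people" (for instance via verb agreement) determines the lemma.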
Alright, I submitted another PR for EWT which changes the lemma of most instances of "people" to "person".
So CGEL (p. 345) says there are two senses of "people", one of which is plural-only and one of which is singular, pluralized as "peoples":
Semantically, I feel like "the American people" is closer to the second sense than to a plural of "person", because it is talking about Americans as a national body, but I suppose plural agreement ("the American people were...") indicates it should be interpreted as the first. But note that CGEL is not claiming that the first sense of "people" is a plural of "person": they say "person being an ordinary noun with both singular and plural forms. Persons is then in competition with people1 [which is more common]". So I guess the CGEL point of view is that "people" should never be lemmatized to "person". But in practice, "people" is most often used in place of "persons". Will users of our corpus thus expect "person" as the lemma? And if so, what is the right criterion for cases like "the American people"?
I think we have a good argument from https://twitter.com/complingy/status/1550730255433928704 regarding whether "the American people" is more like "those American people" or "this American people": "the American and German people" would most likely not refer to "a people" (an established social unit) but rather to an amalgamation of Americans and Germans. So this is the plural-only "people", not the singular, and by analogy "the American people" should not be considered singular "people", even though the members of a nationality are being referred to generically and in a way that makes it hard to substitute a transparent plural like "citizens". (Maybe this is a formula/construction: "the DemonymAdj people" used in political oratory.)
How does this argument affect the "people" PR I filed? For example...
That last one, btw, yikes... sometimes people wonder how deep learning models wind up racist. A two for one:
Maybe each of these examples should stay with the lemma "person"?
In the following context, "peoples" becomes "people":
This is pretty similar to "people of the ..."
In the second case, it's a single group made up of multiple persons, and in the first case, it's multiple groups made of multiple persons. I think either the first case should have a lemma of "person" as well, or the second case should have a lemma of "people". It doesn't quite feel consistent otherwise.
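One way to audit this kind of consistency is to tally which lemmas a treebank file assigns to "people" and "peoples". The sketch below assumes the standard 10-column CoNLL-U layout (FORM in column 2, LEMMA in column 3, XPOS in column 5); the function name `tally_people_lemmas` is hypothetical.

```python
from collections import Counter

def tally_people_lemmas(conllu_text: str) -> Counter:
    """Count (form, xpos, lemma) triples for the forms "people" and "peoples"."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges, empty nodes, malformed rows
        form, lemma, xpos = cols[1].lower(), cols[2], cols[4]
        if form in {"people", "peoples"}:
            counts[(form, xpos, lemma)] += 1
    return counts
```

If the resulting counter shows, say, both ("people", "NNS", "person") and ("people", "NNS", "people"), those tokens are the ones worth reviewing by hand against whichever criterion is adopted.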