Lemma: peoples vs people #53

Open
AngledLuffa opened this issue Jun 15, 2022 · 8 comments

Comments

@AngledLuffa

In the following context, "peoples" is lemmatized as "people":

# sent_id = GUM_speech_albania-2
6       the     the     DET     DT      Definite=Def|PronType=Art       7       det     7:det   Entity=(6-person-new-cf1-2-coref
7       peoples people  NOUN    NNS     Number=Plur     5       obj     5:obj|12:nsubj:xsubj|14:nsubj:xsubj     _
8       of      of      ADP     IN      _       10      case    10:case _
9       the     the     DET     DT      Definite=Def|PronType=Art       10      det     10:det  Entity=(7-place-new-cf7-2-coref
10      world   world   NOUN    NN      Number=Sing     7       nmod    7:nmod:of       Entity=7)6)

This is pretty similar to "people of the ...", where the lemma is "person":

# sent_id = GUM_voyage_chatham-11
1       The     the     DET     DT      Definite=Def|PronType=Art       3       det     3:det   Discourse=context-background:18->21:1|Entity=(24-person-new-cf1-3-coref-Moriori
2       original        original        ADJ     JJ      Degree=Pos      3       amod    3:amod  _
3       people  person  NOUN    NNS     Number=Plur     9       nsubj   9:nsubj _
4       of      of      ADP     IN      _       6       case    6:case  _
5       the     the     DET     DT      Definite=Def|PronType=Art       6       det     6:det   Entity=(1-place-giv:act-cf2*-2-coref-Chatham_Islands
6       islands island  NOUN    NNS     Number=Plur     3       nmod    3:nmod:of       Entity=1)24)

In the second case, it's a single group made up of multiple persons, and in the first case, it's multiple groups made of multiple persons. I think either the first case should have a lemma of "person" as well, or the second case should have a lemma of "people". It doesn't quite feel consistent otherwise.

@nschneid

"People" has an ethnic or national group sense as well as a 'persons' sense. I think "the original people of the islands" is ambiguous—it could refer to the individuals (persons) who originally inhabited the island, in which case it is plural, or it could be referring to a group, in which case it is singular. Does verb agreement resolve this?
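A rough sketch of how the verb agreement question could be checked against the annotations themselves. It assumes each sentence has already been reduced to `(id, feats, head, deprel)` tuples; the helper names `parse_feats` and `subject_agreement` are hypothetical, not part of any existing tool. It only reports a number when the governing verb, or an `aux`/`cop` dependent of the predicate, is finite and carries a `Number` feature, so it returns `None` in the many cases (modals, most past-tense verbs) where English agreement is not visible.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Number=Plur|VerbForm=Fin' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def subject_agreement(tokens, subj_id):
    """Return 'Sing' or 'Plur' if a finite verb agreeing with the subject shows it.

    tokens: list of (id, feats, head, deprel) tuples for one sentence
            (hypothetical simplified representation).
    Returns None when the token is not a subject or agreement is not marked.
    """
    by_id = {t[0]: t for t in tokens}
    subj = by_id[subj_id]
    if not subj[3].startswith("nsubj"):
        return None
    head_id = subj[2]
    # Look at the predicate first, then at its auxiliaries and copulas.
    candidates = [by_id[head_id]] if head_id in by_id else []
    candidates += [
        t for t in tokens
        if t[2] == head_id and t[3].split(":")[0] in ("aux", "cop")
    ]
    for _, feats, _, _ in candidates:
        f = parse_feats(feats)
        if f.get("VerbForm") == "Fin" and "Number" in f:
            return f["Number"]
    return None
```

A `None` result means the annotation alone cannot settle the reading, and the sentence needs to be looked at by hand.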

@AngledLuffa
Author

Ah, good point, in this case it is clearly a plural noun based on the verb in the sentence.

One issue that arises in EWT is that "people" always has the lemma "people", even in the case of multiple persons.
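One way to see how a treebank currently handles this is to tally the lemmas assigned to each form. The sketch below is only an illustration: `lemma_counts` is a hypothetical helper, and it assumes standard tab-separated, 10-column CoNLL-U input (the excerpts in this thread are shown with the tabs expanded to spaces).

```python
from collections import Counter


def lemma_counts(conllu_text, forms=("people", "peoples")):
    """Count (form, xpos, lemma) triples for the given lowercase word forms."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges (1-2), empty nodes (1.1)
        form, lemma, xpos = cols[1], cols[2], cols[4]
        if form.lower() in forms:
            counts[(form.lower(), xpos, lemma)] += 1
    return counts
```

Running it over the text of a treebank file gives a table such as `("people", "NNS", "person")` versus `("people", "NNS", "people")`, which makes inconsistent lemmatization easy to spot.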

@nschneid

nschneid commented Jun 15, 2022

This was always an issue with WordNet-based lemmatizers that didn't have morphological subtypes of nouns. But we have number information so I don't see why we couldn't lemmatize people/NNS to person.
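As a minimal sketch of that idea, the rule below keys the lemma on the form together with the PTB tag. The function name `lemmatize_people` is hypothetical, and the mapping is just one reading of the proposal in this comment, not an agreed guideline.

```python
def lemmatize_people(form, xpos):
    """Return a lemma for 'people'/'peoples', or None for any other input.

    Assumption: xpos is a PTB tag, where NNS marks a plural noun
    and NN a singular one.
    """
    word = form.lower()
    if word == "peoples" and xpos == "NNS":
        return "people"      # plural of the group sense: "the peoples of the world"
    if word == "people":
        if xpos == "NNS":
            return "person"  # treated here as the plural of "person"
        if xpos == "NN":
            return "people"  # singular group sense: "a people"
    return None
```

The rule depends entirely on the tag being right: a group-sense "people" mistagged as NNS would come out as "person".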

@AngledLuffa
Author

So, update EWT (and CoreNLP)?

@AngledLuffa
Author

Alright, I submitted another PR for EWT which changes the lemma of most instances of "people" to "person".

@nschneid

So CGEL (p. 345) says there are two senses of "people", one of which is plural-only and one of which is singular, pluralized as "peoples".

Semantically, I feel like "the American people" is closer to the second sense than to a plural of "person", because it is talking about Americans as a national body, but I suppose plural agreement ("the American people were...") indicates it should be interpreted as the first. But note that CGEL is not claiming that the first sense of "people" is a plural of "person": they say "person being an ordinary noun with both singular and plural forms. Persons is then in competition with people1 [which is more common]".

So I guess the CGEL point of view is that "people" should never be lemmatized to "person". But in practice, "people" is most often used in place of "persons". Will users of our corpus thus expect "person" as the lemma? And if so, what is the right criterion for cases like "the American people"?

@nschneid

I think we have a good argument from https://twitter.com/complingy/status/1550730255433928704 regarding whether "the American people" is more like "those American people" or "this American people": "the American and German people" would most likely not refer to "a people" (an established social unit) but rather to an amalgamation of Americans and Germans. So this is the plural-only "people", not the singular, and by analogy "the American people" should not be considered singular "people", even though the members of a nationality are being referred to generically and in a way that makes it hard to substitute a transparent plural like "citizens". (Maybe this is a formula/construction: "the DemonymAdj people" used in political oratory.)
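To pull out candidates for the "the DemonymAdj people" pattern for manual review, something like the following could be used. It is a sketch under simplifying assumptions: sentences are given as `(id, form, upos, head, deprel)` tuples, `the_adj_people` is a hypothetical name, and the check accepts any adjective, so it will also return non-demonym cases such as "the original people" that have to be filtered by hand.

```python
def the_adj_people(tokens):
    """Return ids of 'people' tokens that have both a 'the' determiner and an ADJ amod.

    tokens: list of (id, form, upos, head, deprel) tuples for one sentence
            (hypothetical simplified representation).
    """
    hits = []
    for tid, form, _upos, _head, _deprel in tokens:
        if form.lower() != "people":
            continue
        deps = [t for t in tokens if t[3] == tid]  # direct dependents of this token
        has_the = any(d[4] == "det" and d[1].lower() == "the" for d in deps)
        has_adj = any(d[4] == "amod" and d[2] == "ADJ" for d in deps)
        if has_the and has_adj:
            hits.append(tid)
    return hits
```

For "the Venezuelan people" this returns the id of "people"; for "the people of Pakistan" it returns nothing, since there is no adjectival modifier.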

@AngledLuffa
Author

How does this argument affect the "people" PR I filed? For example...

# sent_id = weblog-blogspot.com_dakbangla_20041028153019_ENG_20041028_153019-0009
14      the     the     DET     DT      Definite=Def|PronType=Art       15      det     15:det  _
15      people  person  NOUN    NNS     Number=Plur     10      nmod    10:nmod:for     _
16      of      of      ADP     IN      _       17      case    17:case _
17      Pakistan        Pakistan        PROPN   NNP     Number=Sing     15      nmod    15:nmod:of      SpaceAfter=No
# sent_id = weblog-blogspot.com_rigorousintuition_20050518101500_ENG_20050518_101500-0027
9       and     and     CCONJ   CC      _       14      cc      14:cc   _
10      the     the     DET     DT      Definite=Def|PronType=Art       12      det     12:det  _
11      Venezuelan      Venezuelan      ADJ     JJ      Degree=Pos      12      amod    12:amod _
12      people  person  NOUN    NNS     Number=Plur     14      nsubj   14:nsubj        _
13      will    will    AUX     MD      VerbForm=Fin    14      aux     14:aux  _
14      ensure  ensure  VERB    VB      VerbForm=Inf    7       conj    7:conj:and      _
# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0067
1       "       "       PUNCT   ``      _       26      punct   26:punct        SpaceAfter=No
2       The     the     DET     DT      Definite=Def|PronType=Art       4       det     4:det   _
3       black   black   ADJ     JJ      Degree=Pos      4       amod    4:amod  _
4       race    race    NOUN    NN      Number=Sing     7       nsubj   7:nsubj _
5       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   7       cop     7:cop   _
6       the     the     DET     DT      Definite=Def|PronType=Art       7       det     7:det   _
7       people  person  NOUN    NNS     Number=Plur     26      ccomp   15:obl|26:ccomp _
8       through through ADP     IN      _       9       case    9:case  _
...

That last one, btw, yikes... sometimes people wonder how deep learning models wind up racist.

A two for one:

# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0294
32      it      it      PRON    PRP     Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs  34      expl    34:expl _
33      does    do      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   34      aux     34:aux  _
34      bother  bother  VERB    VB      VerbForm=Inf    0       root    0:root  _
35      me      I       PRON    PRP     Case=Acc|Number=Sing|Person=1|PronType=Prs      34      obj     34:obj  _
36      when    when    SCONJ   WRB     PronType=Int    38      mark    38:mark _
37      people  person  NOUN    NNS     Number=Plur     38      nsubj   38:nsubj        _
38      single  single  VERB    VBP     Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   34      csubj   34:csubj        _
39      out     out     ADP     RP      _       38      compound:prt    38:compound:prt _
40      a       a       DET     DT      Definite=Ind|PronType=Art       42      det     42:det  _
41      specific        specific        ADJ     JJ      Degree=Pos      42      amod    42:amod _
42      group   group   NOUN    NN      Number=Sing     38      obj     38:obj  _
43      of      of      ADP     IN      _       44      case    44:case _
44      people  person  NOUN    NNS     Number=Plur     42      nmod    42:nmod:of      _
45      to      to      PART    TO      _       46      mark    46:mark _
46      pin     pin     VERB    VB      VerbForm=Inf    42      acl     42:acl:to       _
47      the     the     DET     DT      Definite=Def|PronType=Art       48      det     48:det  _
48      blame   blame   NOUN    NN      Number=Sing     46      obj     46:obj  _
49      on      on      ADP     IN      _       46      obl     46:obl  SpaceAfter=No
# sent_id = weblog-blogspot.com_alaindewitt_20060924104100_ENG_20060924_104100-0020
12      butcher butcher VERB    VB      VerbForm=Inf    5       conj    5:conj:and      _
13      his     he      PRON    PRP$    Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs  15      nmod:poss       15:nmod:poss    _
14      own     own     ADJ     JJ      Degree=Pos      15      amod    15:amod _
15      people  person  NOUN    NNS     Number=Plur     12      obj     12:obj  _

Maybe each of these examples should stay with the lemma "person"?
