general MWE POS tag missing #78

Open
Tracked by #385
livyreal opened this issue Nov 1, 2016 · 7 comments

Comments

@livyreal

livyreal commented Nov 1, 2016

Each MWE must have a general MWE POS tag.

Example from Bosque 7.3 made by Dan:


# orig_file_sentence 087#39
1	Tel.	tel.	NOUN	n|M|S	Gender=Masc|Number=Sing	0	root	_	_
2	(011)	\(011\)	NUM	num|M|S	Gender=Masc|Number=Sing	1	nummod	_	**MWE=(011)_253-1588|MWEPOS=NUM**
3	253-1588	253-1588	NUM	NUM	NumForm=Digit|NumType=Card	2	compound	_	_
4	.	.	PUNCT	punc	_	1	punct	_	_

All the general POS tags are missing in our version.

<s id="2730" ref="CF654-1" source="CETENFolha n=654 cad=Caderno Especial sec=nd sem=94a" forest="1" text="«Eles são a razão de tudo o que faço, de tudo aquilo em que acredito», disse, com grandes espaços entre as palavras.">

7	tudo	tudo	PRON	<rel>|<quant>|INDP|M|S|@ACC>	PronType=Rel|Gender=Masc|Number=Sing	10	dobj	_	**MWE:tudo=o=que**
8	o	o	DET	<dem>|DET|M|S|@>N	PronType=Dem|Gender=Masc|Number=Sing	9	mwe	_	_
9	que	que	PRON	<dem>|INDP|M|S|@N<	PronType=Dem|Gender=Masc|Number=Sing	7	mwe	_	_

We should have something like: MWE:tudo=o=que MWEPOS=PRON (or DET).

@claudiafreitas, what do you think the POS of "tudo_o_que" should be?
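
To see how widespread the gap is, the check described above can be sketched in a few lines of Python. This is only an illustration, not the project's tooling: it assumes 10 tab-separated CoNLL-U columns, that the MWE label sits in the last (MISC) column as either `MWE:...` or `MWE=...`, and that the general tag is written as `MWEPOS=...`. The function name `missing_mwepos` and the sample rows (with XPOS and FEATS simplified to `_`) are made up for the example.

```python
import re


def missing_mwepos(conllu_text):
    """Return (id, form, misc) for tokens that carry an MWE label but no MWEPOS."""
    hits = []
    for line in conllu_text.splitlines():
        # Skip blank lines and sentence-level comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a regular token line under our assumption
        misc = cols[9]
        has_mwe = re.search(r"(^|\|)MWE[:=]", misc) is not None
        has_pos = re.search(r"(^|\|)MWEPOS=", misc) is not None
        if has_mwe and not has_pos:
            hits.append((cols[0], cols[1], misc))
    return hits


# Simplified sample rows (hypothetical; XPOS and FEATS replaced by "_").
sample = "\n".join([
    "7\ttudo\ttudo\tPRON\t_\t_\t10\tdobj\t_\tMWE:tudo=o=que",
    "8\to\to\tDET\t_\t_\t9\tmwe\t_\t_",
    "2\t(011)\t(011)\tNUM\t_\t_\t1\tnummod\t_\tMWE=(011)_253-1588|MWEPOS=NUM",
])
print(missing_mwepos(sample))
```

On these rows only the "tudo" token is reported, since the "(011)" token already has its MWEPOS.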

@arademaker
Collaborator

Related to #72.

@claudiafreitas

claudiafreitas commented Nov 2, 2016

@livyreal, originally it is PRON.
Ideally, we would copy the original MWE POS from the pre-MWE-splitting version:

12  mudar   mudar   VERB    V_INF_@ICL-<SUBJ    VerbForm=Inf    10  csubj       
13  tudo_o_que  tudo_o_que  PRON    INDP_M_S_@SUBJ> PronType=Indp|PronType=Rel|Number=Sing|Gender=Masc  15  nsubj   
14  está   estar   VERB    V_PR_3S_IND_@FS-<ACC    Mood=Ind|Tense=Pres|Person=3|Number=Sing    

However, there are other occurrences of "tudo o que" that are not annotated as MWEs... I'm not sure whether they are really different cases:

<s id="106" ref="CF27-5" source="CETENFolha n=27 cad=Mais! sec=nd sem=94a" forest="1" text="E, assim, **tudo o que** os afro-americanos faziam bem teve de....
1   E   e   CONJ    KC_@CO  _   15  cc      
2   ,   ,   PUNCT   PU_@PU  _   3   punct       
3   assim   assim   ADV ADV_@ADVL>  _   15  advmod      
4   ,   ,   PUNCT   PU_@PU  _   3   punct       
5   tudo    tudo    DET DET_M_S_@>N _   6   det     
6   o   o   PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    15  nsubj       
7   que que PRON    INDP_M_S_@ACC>  PronType=Indp|PronType=Rel|Number=Sing|Gender=Masc  10  dobj        
8   os  o   DET ART_M_P_@>N PronType=Art|Number=Plur|Gender=Masc    9   det     
9   afro-americanos afro-americano  NOUN    N_M_P_@SUBJ>    Number=Plur|Gender=Masc         
<s id="809" ref="CF191-3" source="CETENFolha n=191 cad=Brasil sec=pol sem=94b" forest="1" text="A equipe econômica considera que cedeu **tudo o que** podia durante a votação da emenda no primeiro turno.">

6   cedeu   ceder   VERB    V_PS_3S_IND_@FS-<ACC    Mood=Ind|Tense=Pret|Person=3|Number=Sing    4   ccomp       
7   tudo    tudo    PRON    DET_M_S_@<ACC   Number=Sing|Gender=Masc 6   dobj        
8   o   o   DET ART_M_S_@<SUBJ  PronType=Art|Number=Sing|Gender=Masc    6   nsubj       
9   que que PRON    INDP_M_P_@SUBJ> PronType=Indp|PronType=Rel|Number=Plur|Gender=Masc  10  nsubj       
10  podia   poder   VERB    V_IMPF_3S_IND_@FS-N<    Mood=Ind|Tense=Imp|Person=3|Number=Sing 8   acl:relcl       

@livyreal
Author

livyreal commented Nov 3, 2016

They seem the same to me, or at least I cannot easily think of criteria to distinguish them. This MWE "tudo o que" is tricky... I would not consider it an MWE, but Bosque treats it as one, and since MWEs are so difficult to define, I did not question it. What should we do about the expression "tudo o que"? I think it is compositional and not an MWE.

For now, we agree that the best way to keep the general lemma of an MWE is to look at an older version (before the MWE split) and take the MWE's general label from it.
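
A rough sketch of that recovery step, under several assumptions that would need checking against the real files: the pre-split version joins MWE parts with `_` in the FORM column (as in `tudo_o_que`), the split version marks the first token with `MWE:part=part=...` in the MISC column, and the result is written as `|MWEPOS=...` in the style of Dan's example. The helper names `mwe_pos_table` and `add_mwepos` are hypothetical.

```python
def mwe_pos_table(presplit_rows):
    """Map joined MWE forms (e.g. 'tudo_o_que') to their UPOS in the pre-split rows."""
    table = {}
    for cols in presplit_rows:
        form, upos = cols[1], cols[3]
        # Assumption: any form containing "_" (other than a bare "_") is a joined MWE.
        if "_" in form and form != "_":
            table[form] = upos
    return table


def add_mwepos(split_rows, table):
    """Append MWEPOS to tokens whose MISC starts with an 'MWE:' label."""
    out = []
    for cols in split_rows:
        cols = list(cols)  # copy so the input rows are not modified
        misc = cols[9]
        if misc.startswith("MWE:") and "MWEPOS=" not in misc:
            # 'MWE:tudo=o=que' -> 'tudo_o_que'
            key = misc[len("MWE:"):].split("|")[0].replace("=", "_")
            if key in table:
                cols[9] = misc + "|MWEPOS=" + table[key]
        out.append(cols)
    return out


# Simplified rows (hypothetical; XPOS and FEATS replaced by "_").
presplit = [["13", "tudo_o_que", "tudo_o_que", "PRON", "_", "_", "15", "nsubj", "_", "_"]]
split = [
    ["7", "tudo", "tudo", "PRON", "_", "_", "10", "dobj", "_", "MWE:tudo=o=que"],
    ["8", "o", "o", "DET", "_", "_", "9", "mwe", "_", "_"],
    ["9", "que", "que", "PRON", "_", "_", "7", "mwe", "_", "_"],
]
restored = add_mwepos(split, mwe_pos_table(presplit))
print(restored[0][9])
```

Looking up by form alone ignores the possibility that the same expression has different POS tags in different sentences; a more careful version would pair rows by sentence id first.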

@vcvpaiva

vcvpaiva commented Nov 3, 2016

@livyreal, agreed that "tudo o que" should not be an MWE. I also agree that the best course of action is to bring back the original MWE tags, if possible, with their original POS.

@livyreal
Author

livyreal commented Nov 24, 2016

We agree that "tudo o que" is not an MWE. It should be split into "DET PRON PRON":

tudo DET
o PRON
que PRON

Although other analyses are possible (as in "cedeu tudo aquilo o que podia" -> tudo PRON, since "cedeu tudo" is fine), the "DET PRON PRON" analysis is always possible.

On "tudo o que": what do we do about the dependency relations? We don't know yet.
How will this be implemented? We don't know yet either.

We can do it by hand or automatically.

If automatically:

"a equipe considera que cedeu tudo o que podia" -> det(o, tudo); acl:relcl(considera, podia). "o" attaches to the main verb (the first element of the acl:relcl relation), and "que" attaches to the verb of the relative clause. What will the dependency relation be between "o" and the head of the main clause, and between "que" and the head of the relative clause? We don't know; this has to be annotated by hand. Likely types: nsubj, nsubjpass, dobj.

Next step: do a few cases by hand to check whether the rule works.

@livyreal
Author

livyreal commented Nov 24, 2016

We also agree that "o que" is not an MWE. What is needed is to delete the label MWE:o=que from all lines and change the POS of "o": it is not DET, but PRON. The dependency relations between "o" and "que" are already correct.

it is:

7 o o DET DET|M|S|@>N Gender=Masc|Number=Sing 8 det _ MWE:o=que
8 que que PRON |INDP|M|S|@acc> Gender=Masc|Number=Sing|PronType=Rel 9 dobj _ _

it should be:

7 o o PRON N Gender=Masc|Number=Sing 8 det _
8 que que PRON |INDP|M|S|@acc> Gender=Masc|Number=Sing|PronType=Rel 9 dobj _ _

new issue in #90
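
The "o que" change described above is mechanical enough to sketch. The snippet below is only an illustration under assumptions: rows are already split into the 10 CoNLL-U columns, the label appears in the MISC column exactly as `MWE:o=que`, and only UPOS is changed (the XPOS column is left untouched because the intended value is not settled here). The function name `fix_o_que` is hypothetical.

```python
def fix_o_que(rows):
    """Drop the MWE:o=que label and retag the labelled 'o' from DET to PRON."""
    out = []
    for cols in rows:
        cols = list(cols)  # copy so the input rows are not modified
        labels = [m for m in cols[9].split("|") if m != "_"]
        if "MWE:o=que" in labels:
            labels.remove("MWE:o=que")
            cols[9] = "|".join(labels) if labels else "_"
            # Only touch UPOS; HEAD and DEPREL stay as they are.
            if cols[1].lower() == "o" and cols[3] == "DET":
                cols[3] = "PRON"
        out.append(cols)
    return out


rows = [
    ["7", "o", "o", "DET", "DET|M|S|@>N", "Gender=Masc|Number=Sing", "8", "det", "_", "MWE:o=que"],
    ["8", "que", "que", "PRON", "|INDP|M|S|@acc>", "Gender=Masc|Number=Sing|PronType=Rel", "9", "dobj", "_", "_"],
]
for cols in fix_o_que(rows):
    print("\t".join(cols))
```

Tokens without the label pass through unchanged, so the function can be run over whole sentences.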

@livyreal
Author

General lemmas we decided on (by e-mail, with @claudiafreitas and @MCGoes):

"ou seja", "isto é" and "por exemplo" are CONJ; it remains to be decided whether they are subordinating conjunctions or not.

http://universaldependencies.org/u/pos/CONJ.html
http://universaldependencies.org/u/pos/SCONJ.html

Are "por=assim dizer / deste=modo / por=outro=lado / sendo=assim" ADV or CONJ? (to be decided)

@livyreal livyreal removed the decidido label Nov 29, 2016
@arademaker arademaker mentioned this issue Nov 4, 2021
19 tasks

4 participants