potential bug report #19

Open
alephpi opened this issue May 21, 2023 · 4 comments

Comments


alephpi commented May 21, 2023

image
Hi, I'm just curious: does the first "aurai" really exist in French?

Owner

chrplr commented May 21, 2023 via email

alephpi changed the title from "aurai" to "potential bug report" on May 21, 2023
Author

alephpi commented May 21, 2023

I'll keep reporting the potential bugs I find in this issue. It's just a side task, since I'm doing some data processing for my project.

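import re
# df is the lexicon table, already loaded as a pandas DataFrame.
# Select the invariant parts of speech: adverbs, conjunctions, prepositions.
invari = re.compile('ADV|CON|PRE')
df_invari = df.loc[df.cgram.str.contains(invari)]
df_invari[df_invari['ortho'] != df_invari['lemme']]  # spelling differs from lemma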

gives me

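| ortho | phon | lemme | cgram | genre | nombre | freqlemlivres | freqlivres | infover |
|---|---|---|---|---|---|---|---|---|
| aujourd'hui | oZuRd8i | aujourd'huie | ADV | | | 0.14 | 0.14 | |
| bons-cadeaux | b§kado | bon-cadeaux | ADV | | | 0.00 | 0.00 | |
| c'est-à-dire | sEtadiR | c'est-à-diree | ADV | | | 0.07 | 0.07 | |
| d'emblée | d@ble | d'embléee | ADV | | | 0.07 | 0.07 | |
| n | n | ne | ADV | | | 13841.89 | 5.68 | |
| n' | n | ne | ADV | | | 13841.89 | 6084.12 | |
| re | R2 | r | ADV | | | 7.50 | 7.50 | |
| y | i | yu | ADV | | | 0.27 | 0.27 | |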

These lemmas do not look correct. (I assume the lemma of an invariant word is the word itself.)
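For anyone who wants to reproduce the check without the full lexicon, here is a minimal, self-contained sketch of the same filter. The three mismatching rows are copied from the output above. The `hier` and `chats` rows, and the helper name `invariant_lemma_mismatches`, are only assumptions added for illustration. A plain string pattern is used instead of a compiled one, which should behave the same way here.

```python
import pandas as pd

# Toy rows: the first three come from the output above; the last two
# ("hier", "chats") are made-up, well-formed rows for illustration only.
df = pd.DataFrame(
    {
        "ortho": ["aujourd'hui", "d'emblée", "y", "hier", "chats"],
        "lemme": ["aujourd'huie", "d'embléee", "yu", "hier", "chat"],
        "cgram": ["ADV", "ADV", "ADV", "ADV", "NOM"],
    }
)


def invariant_lemma_mismatches(table: pd.DataFrame) -> pd.DataFrame:
    """Return ADV/CON/PRE rows whose lemma differs from their spelling."""
    # na=False treats a missing cgram as "not invariant" instead of failing.
    is_invariant = table["cgram"].str.contains("ADV|CON|PRE", na=False)
    return table[is_invariant & (table["ortho"] != table["lemme"])]


suspects = invariant_lemma_mismatches(df)
print(suspects[["ortho", "lemme", "cgram"]])
```

The noun row `chats` is ignored because its category is not in the invariant set, and `hier` is ignored because its lemma equals its spelling.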

Author

alephpi commented May 21, 2023

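| ortho | phon | lemme | cgram | genre | nombre | freqlemlivres | freqlivres | infover |
|---|---|---|---|---|---|---|---|---|
| e | 2 | 2e | ADJ | | | 0.00 | 0.00 | |
| e | 2 | 58e | ADJ | | | 0.00 | 0.00 | |
| e | 2 | 7e | ADJ | | | 0.07 | 0.07 | |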
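The comment does not say how these rows were found. One way similar entries could be surfaced, as a rough sketch, is to flag lemmas that contain a digit. The three `e` rows are copied from the table above; the `petite` row is a made-up control row.

```python
import pandas as pd

# Rows copied from the table above, plus one ordinary adjective
# ("petite") that is assumed here purely as a control.
df = pd.DataFrame(
    {
        "ortho": ["e", "e", "e", "petite"],
        "lemme": ["2e", "58e", "7e", "petit"],
        "cgram": ["ADJ", "ADJ", "ADJ", "ADJ"],
    }
)

# Flag every row whose lemma contains at least one digit.
has_digit = df["lemme"].str.contains(r"\d", regex=True, na=False)
digit_lemmas = df[has_digit]
print(digit_lemmas[["ortho", "lemme", "cgram"]])
```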

Author

alephpi commented May 21, 2023

bug.csv
Here is a table of words whose lemma's cgram differs from the word's own cgram. (I think lemmatization should be a closed operation with respect to cgram, right?)
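How `bug.csv` was produced is not shown, so the following is only one possible reading of the check: a row is treated as consistent when its lemma also appears as an `ortho` entry with the same `cgram`. The toy lexicon and the function name `cgram_mismatches` are hypothetical; none of these rows come from `bug.csv`.

```python
import pandas as pd

# Hypothetical toy lexicon, for illustration only.
df = pd.DataFrame(
    {
        "ortho": ["chats", "chat", "mange", "manger", "belle", "beau"],
        "lemme": ["chat", "chat", "manger", "manger", "beau", "beau"],
        "cgram": ["NOM", "NOM", "VER", "VER", "ADJ", "NOM"],
    }
)


def cgram_mismatches(table: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose lemma has no entry with the same cgram."""
    # Every (spelling, category) pair that exists as an entry.
    known = set(zip(table["ortho"], table["cgram"]))
    # True when the row's lemma exists as an entry in the row's own category.
    consistent = pd.Series(
        [(lemma, cgram) in known
         for lemma, cgram in zip(table["lemme"], table["cgram"])],
        index=table.index,
    )
    return table[~consistent]


mismatches = cgram_mismatches(df)
print(mismatches)
```

In this toy data only `belle` is flagged, because its lemma `beau` is listed as a NOM but not as an ADJ. A lookup on (spelling, category) pairs rather than on spelling alone keeps homographs with several categories from masking each other.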
