
Poor F1 results of FinBERT for NER #9

Open
jakelin212 opened this issue Jan 19, 2024 · 4 comments

@jakelin212

Hi, thanks for releasing FinBERT. We are using FinBERT (cased) for NER on some unstructured Finnish medical records and have noticed poor results (F1 < 0.50) on the negated entity label, e.g. 'not lonely' texts containing 'ei ', while the 'lonely' labels get good F1 (~0.80). I was wondering if you have any experience or advice. The problem is not class imbalance, since we also tried a labeling run with only the negative entity.
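To make the setup concrete, a minimal sketch of what the labels look like (the BIO spans here are simplified for illustration, not our exact scheme):

```python
# Illustrative examples only: the negation cue "ei" flips the
# entity label from Lonely to NotLonely on the same surface word.
examples = [
    # positive mention: "yksinäisyys" -> Lonely
    (["yksinäisyys", "on", "ongelma"],
     ["B-Lonely", "O", "O"]),
    # negated mention -> NotLonely
    (["ei", "koe", "yksinäisyyttä"],
     ["B-NotLonely", "I-NotLonely", "I-NotLonely"]),
]
```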

@jouniluoma

Hi, I have used FinBERT for NER with good results. Without knowing more about your dataset, it is really hard to give any advice.

@jakelin212
Author

Hi, thank you for the response, and my apologies for the inaccurate issue title (I wish I could change it). You are right that for the most part FinBERT produces very good NER results. It struggles in cases where the negative-category entries are very close to the positive ones. For example, I have Lonely and NotLonely labels: yksinaisyys = lonely, but yksinaisyys ei ole ongelmia ('loneliness is not a problem') is NotLonely, as is ei koe yksinäisyyttä ('does not feel lonely'). On these entries NotLonely performs badly, with recall and precision both ~0.50, while Lonely reaches ~0.75-0.80. Yes, Lonely entries are much more common, but I tried a project with only NotLonely (negative) entries and saw the same effect.

I think it could be the tokenisation, where the non-O labels are assigned to individual words, and the use of 'strict' mode when computing metrics makes the scores worse compared to 'partial'. I think we will only use BERT for positive NER and then apply regular expressions to assign the negative categories.
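For what it's worth, a minimal sketch of that fallback idea (the cue pattern, window size, and function are illustrative placeholders, not a tested recipe):

```python
import re

# Sketch of the planned fallback: keep BERT NER for the positive class,
# then flip a predicted mention to the negative class when a Finnish
# negation cue appears near it. The cue list and window size below are
# illustrative guesses.
NEGATION_CUES = re.compile(r"\bei(\s+(ole|koe))?\b", re.IGNORECASE)

def assign_polarity(text: str, span: tuple, window: int = 30) -> str:
    """Return 'NotLonely' if a negation cue occurs within `window`
    characters of the predicted 'Lonely' span, else 'Lonely'."""
    start, end = span
    context = text[max(0, start - window):end + window]
    return "NotLonely" if NEGATION_CUES.search(context) else "Lonely"

# "yksinäisyyttä" occupies characters 15-28 in this example sentence
print(assign_polarity("potilas ei koe yksinäisyyttä", (15, 28)))  # -> NotLonely
```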

@jouniluoma

jouniluoma commented Jan 23, 2024

Named entities are usually nouns or noun phrases (something that has a name), or something that can be handled in a similar fashion. I have not really tested NER on adjectives, which is why I was asking about the dataset. Perhaps there is another way than NER to solve your problem with FinBERT? Is there some evidence of this kind of approach working, e.g. in other languages?
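For example (just a sketch of one such alternative, not something I have tested on your data): treat the polarity as sentence-level classification. The checkpoint name below is the public TurkuNLP FinBERT; the label set is a placeholder and the classifier head needs fine-tuning on your records before it is meaningful.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Sketch: classify whole sentences mentioning loneliness as Lonely vs
# NotLonely instead of encoding polarity in NER spans.
MODEL = "TurkuNLP/bert-base-finnish-cased-v1"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=2, id2label={0: "Lonely", 1: "NotLonely"}
)

inputs = tokenizer("ei koe yksinäisyyttä", return_tensors="pt")
logits = model(**inputs).logits  # untrained head: fine-tune first
print(logits.argmax(-1))
```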

@jakelin212
Author

Thanks for your feedback. I have read that BERT does not handle negation well in English either. Feel free to close the ticket.
Best!

https://aclanthology.org/2023.blackboxnlp-1.23.pdf

Allyson Ettinger. 2020. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics, 8:34–48.
