BertLang

BertLang is a web app that collects information about language-specific BERT models.

How to Contribute

This is a collaborative resource to help researchers find and understand the best BERT model for a given dataset, task, and language. The numbers here rely on self-reported performance, so we cannot guarantee their accuracy. In the future, we hope to independently verify each of the models.

We currently store all the information in a JSON file, static/data/data_example.json. We keep this structure because it is easy to parse and to check. Do you want to add a new model or suggest updates? Send us a pull request! Please note that we aim to use one consistent performance metric per task (e.g. Sentiment Analysis -> Accuracy).

See the following example for the Italian BERT model, ALBERTO.

 {
     "name": "ALBERTO",
     "language": "Italian",
     "tasks": [
       {
         "name": "SA",
         "source": "http://ceur-ws.org/Vol-2481/paper57.pdf",
         "code": "https://github.com/marcopoli/AlBERTo-it",
         "dataset": {
           "name": "SENTIPOLC 2016",
           "link": "http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html",
           "domain": "twitter"
         },
         "measure": "F1 (test)",
         "performance": 72.23,
         "multi_lingual": "nan",
         "multi_difference": "nan"
       },
       {
         "name": "SC",
         "source": "http://ceur-ws.org/Vol-2481/paper57.pdf",
         "code": "https://github.com/marcopoli/AlBERTo-it",
         "dataset": {
           "name": "SENTIPOLC 2016",
           "link": "http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html",
           "domain": "twitter"
         },
         "measure": "F1 (test)",
         "performance": 79.06,
         "multi_lingual": "nan",
         "multi_difference": "nan"
       },
       {
         "name": "ID",
         "source": "http://ceur-ws.org/Vol-2481/paper57.pdf",
         "code": "https://github.com/marcopoli/AlBERTo-it",
         "dataset": {
           "name": "SENTIPOLC 2016",
           "link": "http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html",
           "domain": "twitter"
         },
         "measure": "F1 (test)",
         "performance": 60.9,
         "multi_lingual": "nan",
         "multi_difference": "nan"
       }
     ]
 }
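
Before sending a pull request, it can help to check that a new entry has the same shape as the ALBERTO example above. The following is a minimal sketch, not part of the BertLang code base: the function name `check_entry` is hypothetical, and treating every key seen in the example as required is an assumption. The top-level layout of static/data/data_example.json is not shown here, so the sketch checks a single entry passed in directly.

```python
import json

# Keys observed in the ALBERTO example; treating all of them as required
# is an assumption of this sketch, not a documented project rule.
MODEL_KEYS = {"name", "language", "tasks"}
TASK_KEYS = {"name", "source", "code", "dataset", "measure",
             "performance", "multi_lingual", "multi_difference"}
DATASET_KEYS = {"name", "link", "domain"}


def check_entry(entry):
    """Return a list of human-readable problems found in one model entry."""
    problems = []
    missing = MODEL_KEYS - set(entry)
    if missing:
        problems.append(f"model is missing keys: {sorted(missing)}")
    for i, task in enumerate(entry.get("tasks", [])):
        missing = TASK_KEYS - set(task)
        if missing:
            problems.append(f"task {i} is missing keys: {sorted(missing)}")
        missing = DATASET_KEYS - set(task.get("dataset", {}))
        if missing:
            problems.append(f"task {i} dataset is missing keys: {sorted(missing)}")
        # The example stores performance as a JSON number, not a string.
        if not isinstance(task.get("performance"), (int, float)):
            problems.append(f"task {i} performance is not a number")
    return problems


# One task copied from the ALBERTO example, used as a well-formed entry.
example = json.loads("""
{
  "name": "ALBERTO",
  "language": "Italian",
  "tasks": [
    {
      "name": "SA",
      "source": "http://ceur-ws.org/Vol-2481/paper57.pdf",
      "code": "https://github.com/marcopoli/AlBERTo-it",
      "dataset": {
        "name": "SENTIPOLC 2016",
        "link": "http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html",
        "domain": "twitter"
      },
      "measure": "F1 (test)",
      "performance": 72.23,
      "multi_lingual": "nan",
      "multi_difference": "nan"
    }
  ]
}
""")

print(check_entry(example))  # []
```

An empty list means the entry matches the shape of the example; each string in a non-empty list names one missing key or mistyped value.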

NLP Task Acronyms

Please use the table below to pick the correct NLP task acronym.

Acronym	NLP task
POS	Part of Speech Tagging
DP	Dependency Parsing
NER	Named Entity Recognition
NLI	Natural Language Inference
PI	Paraphrase Identification
STS	Semantic Textual Similarity
WSD	Word Sense Disambiguation
TC	Text Classification
CP	Constituency Parsing
SA	Sentiment Analysis
SRL	Semantic Role Labeling
STR	Spatio-Temporal Relation
LPR	Linguistic Properties Recognition
OLI	Offensive Language Identification
DP-UAS	Unlabeled Attachment Score
DP-LAS	Labeled Attachment Score
VSD	Verb Sense Disambiguation
NSD	Noun Sense Disambiguation
SC	Subjectivity Classification
ID	Irony Detection
DDD	Die/Dat Disambiguation
MRC	Machine Reading Comprehension
SPM	Sentence Pair Matching
POS (coarse)	Part of Speech Tagging
POS (fine-grained)	Part of Speech Tagging
XPOS	Language-specific POS tagging
Morph	Morphological tagging
LA	Linguistic Acceptability
TER	Textual Entailment Recognition
QA	Question Answering
CI	Commonsense Inference
RC	Reading Comprehension
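
The acronyms in the table above can also be checked mechanically. The sketch below is illustrative only: the function name `unknown_acronyms` is hypothetical, exact case-sensitive matching is an assumption, and the task named "Sentiment" is a made-up example of a name that is not in the table.

```python
# Acronyms copied from the table above.
KNOWN_ACRONYMS = {
    "POS", "DP", "NER", "NLI", "PI", "STS", "WSD", "TC", "CP", "SA", "SRL",
    "STR", "LPR", "OLI", "DP-UAS", "DP-LAS", "VSD", "NSD", "SC", "ID", "DDD",
    "MRC", "SPM", "POS (coarse)", "POS (fine-grained)", "XPOS", "Morph", "LA",
    "TER", "QA", "CI", "RC",
}


def unknown_acronyms(entry):
    """Return the task names in a model entry that are not in the table."""
    return [
        task.get("name")
        for task in entry.get("tasks", [])
        if task.get("name") not in KNOWN_ACRONYMS
    ]


# "Sentiment" is a deliberately wrong task name used for illustration.
entry = {
    "name": "ALBERTO",
    "language": "Italian",
    "tasks": [{"name": "SA"}, {"name": "SC"}, {"name": "ID"}, {"name": "Sentiment"}],
}

print(unknown_acronyms(entry))  # ['Sentiment']
```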

Contributors

Debora Nozza - Twitter | Personal Website | debora.nozza@unibocconi.it
Federico Bianchi - Twitter | Personal Website | federico.bianchi@unibocconi.it
Dirk Hovy - Twitter | Personal Website | dirk.hovy@unibocconi.it

Copyright and License

Built with Start Bootstrap.

Start Bootstrap is an open source library of free Bootstrap templates and themes. All of the free templates and themes on Start Bootstrap are released under the MIT license, which means you can use them for any purpose, even for commercial projects. Copyright 2013-2019 Blackrock Digital LLC. Code released under the MIT license.
