[Research / Analysis] Fine-tune embedding model on tweet dataset #2

Open
AbrahamSanders opened this issue Apr 11, 2020 · 1 comment
Labels
research needs investigation or trial of one or more approaches

Comments

@AbrahamSanders
Collaborator

Currently we are using the pre-trained Universal Sentence Encoder (large) from TensorFlow Hub.

Open area for investigation:
The model parameters are marked trainable, so it should be possible to fine-tune on our own COVID tweet dataset.
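A minimal sketch of what that fine-tuning setup could look like, assuming a pairwise similarity objective over labeled tweet pairs (the similarity head, hub URL version, and hyperparameters below are placeholders, not anything already decided):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load USE (large) with trainable=True so its weights update during training.
use_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-large/5",
    trainable=True,
)

# Two raw-string tweet inputs; USE does its own tokenization internally.
tweet_a = tf.keras.Input(shape=(), dtype=tf.string)
tweet_b = tf.keras.Input(shape=(), dtype=tf.string)

emb_a = use_layer(tweet_a)
emb_b = use_layer(tweet_b)

# Cosine similarity of the two embeddings as the output (placeholder head).
similarity = tf.keras.layers.Dot(axes=1, normalize=True)([emb_a, emb_b])

model = tf.keras.Model(inputs=[tweet_a, tweet_b], outputs=similarity)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5), loss="mse")
# model.fit([pairs_a, pairs_b], gold_scores, epochs=1, batch_size=16)
```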

Alternatively, explore fine-tuning other models such as BERT on a semantic similarity task, as done in Sentence-BERT.
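A rough sketch of that route with the sentence-transformers library (the base checkpoint, example pairs, and hyperparameters are all placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Any BERT checkpoint can serve as the base encoder; mean pooling is added on top.
model = SentenceTransformer("bert-base-uncased")

# Placeholder tweet pairs with similarity labels in [0, 1].
train_examples = [
    InputExample(texts=["tweet one", "tweet two"], label=0.8),
    InputExample(texts=["tweet three", "tweet four"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=100)
```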

Comparison of the base pre-trained vs. fine-tuned Universal Sentence Encoder (USE) can be done quantitatively or qualitatively; see #1. The same goes for comparing USE vs. BERT or any other model.
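One possible quantitative protocol (an illustration, not something specified in this issue): embed a held-out set of labeled tweet pairs with each model and report the Spearman correlation between cosine similarities and the gold scores.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    # Row-wise cosine similarity between two [n, d] embedding matrices.
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

def similarity_correlation(embed_fn, pairs, gold_scores):
    # embed_fn: maps a list of strings to an [n, d] array (hypothetical helper).
    emb_a = embed_fn([a for a, _ in pairs])
    emb_b = embed_fn([b for _, b in pairs])
    return spearmanr(cosine(emb_a, emb_b), gold_scores).correlation

# e.g. compare base vs. fine-tuned USE on the same held-out pairs:
# base_corr  = similarity_correlation(base_use_embed, eval_pairs, eval_scores)
# tuned_corr = similarity_correlation(tuned_use_embed, eval_pairs, eval_scores)
```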

@AbrahamSanders AbrahamSanders added the research needs investigation or trial of one or more approaches label Apr 12, 2020
@AbrahamSanders
Collaborator Author

A good candidate for a pre-trained BERT model is covid-twitter-bert.
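If we go that way, it could plug into the Sentence-BERT setup above as the base encoder. A sketch, assuming the Hugging Face model id digitalepidemiologylab/covid-twitter-bert:

```python
from sentence_transformers import SentenceTransformer, models

# Wrap covid-twitter-bert as the word-embedding module, then mean-pool to get
# one fixed-size vector per tweet.
word_embedding = models.Transformer("digitalepidemiologylab/covid-twitter-bert")
pooling = models.Pooling(word_embedding.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding, pooling])

# Fine-tune as in the Sentence-BERT sketch above (CosineSimilarityLoss, etc.).
```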
