
Has the work of adding BERT been done? #14

Open
SparkJiao opened this issue Jan 10, 2019 · 4 comments

Comments

@SparkJiao

Hi,
have you finished the work of adding BERT? Could you please share the results?
Thank you very much!

@matthew-z
Owner

matthew-z commented Jan 10, 2019

I have added BERT, but my setup is quite different from the original one:

  1. I only use the BERT embedding of the last word piece of each word, as I have not converted the labels to be word-piece based.

  2. Question and context are not encoded together by BERT in my implementation (the BERT paper concatenates them as a single input). Currently, they are encoded separately.

As a result, this BERT version only achieved about 78 F1. Due to hardware limitations, I only trained it with mixed precision (I am not sure whether that is also a factor).
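
For the first point, here is a minimal sketch of the "last word piece per word" pooling (my own illustration, not this repository's code; it assumes the Hugging Face `transformers` library and `bert-base-uncased`). The joint encoding from the BERT paper would instead pass question and context to the tokenizer as a single pair.

```python
# Sketch only: keep the hidden state of the *last* word piece of each word.
# Assumes the Hugging Face `transformers` library, not this repo's AllenNLP code.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

words = ["Relational", "networks", "for", "reading", "comprehension"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]        # (num_pieces, hidden_size)

# Map each word to the index of its last word piece, then gather those rows.
word_ids = enc.word_ids(0)                            # piece index -> word index (None for [CLS]/[SEP])
last_piece = {}
for piece_idx, w in enumerate(word_ids):
    if w is not None:
        last_piece[w] = piece_idx                     # later pieces overwrite earlier ones

word_reprs = hidden[[last_piece[w] for w in range(len(words))]]   # (num_words, hidden_size)
```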

@SparkJiao
Author

Sorry to say, I got similarly poor performance with BiDAF++ using AllenNLP.
I think there are two problems here.
The first is that we need to average the hidden states of all the word pieces of a word instead of taking only the last piece. This method comes from here: https://arxiv.org/abs/1812.03593. It should be easy to modify, but I don't have much time recently.
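
For illustration, a minimal sketch of what I mean by averaging the word pieces (my own, assuming the Hugging Face `transformers` API rather than the older `pytorch-pretrained-BERT` one):

```python
# Sketch only: average the hidden states of *all* word pieces of each word,
# instead of keeping just the last piece.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

words = ["Averaging", "word", "pieces", "instead"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]        # (num_pieces, hidden_size)

word_ids = enc.word_ids(0)                            # piece index -> word index
word_reprs = torch.stack([
    hidden[[i for i, wid in enumerate(word_ids) if wid == w]].mean(dim=0)
    for w in range(len(words))
])                                                    # (num_words, hidden_size)
```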
The second is that, because of the maximum sequence length limit, a sentence may be split into many segments; for details see google-research/bert#66 (comment). But I think this may be difficult to implement with AllenNLP.
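
Roughly, the splitting would follow the sliding-window (doc_stride) trick from that linked discussion; a sketch of just the windowing step, under my own naming:

```python
# Sketch only: split a long word-piece sequence into overlapping windows so
# every window fits within BERT's maximum sequence length.
def sliding_windows(piece_ids, max_len=384, stride=128):
    """Yield (start_offset, window) pairs that cover piece_ids with overlap."""
    if len(piece_ids) <= max_len:
        yield 0, piece_ids
        return
    start = 0
    while start < len(piece_ids):
        yield start, piece_ids[start:start + max_len]
        if start + max_len >= len(piece_ids):
            break
        start += stride

# Example: a 1000-piece context becomes windows starting at 0, 128, 256, ...
windows = list(sliding_windows(list(range(1000))))
# Each window is encoded separately and answer spans are mapped back via the start offsets.
```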
I have also used BERT embeddings like https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/examples/extract_features.py, and they outperformed ELMo in another reading comprehension model. But due to some difficulties, I didn't use the sliding window mentioned in the google-research repo.
Hope this helps!
Once I have finished my model, I will come back and contribute something ~

@matthew-z
Owner

Thank you for sharing your experience! It is very helpful.

I will also try to improve it and will let you know when I have any good news.

@jind11

jind11 commented Apr 14, 2020

Any updates from you on the BERT-related experiments? Thank you for sharing your experience!
