-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
how to run BERT #12
Comments
The demo code from keras-bert should still work if you rename a few dependencies. Delete the period in "from .layers import" in bert.py and try running this: from tensorflow import keras
from bert import get_base_dict, get_model, gen_batch_inputs
# A toy input example
sentence_pairs = [
[['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
[['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
[['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]
# Build token dictionary
token_dict = get_base_dict() # A dict that contains some special tokens
for pairs in sentence_pairs:
for token in pairs[0] + pairs[1]:
if token not in token_dict:
token_dict[token] = len(token_dict)
token_list = list(token_dict.keys()) # Used for selecting a random word
# Build & train the model
model = get_model(
token_num=len(token_dict),
head_num=5,
transformer_num=12,
embed_dim=25,
feed_forward_dim=100,
seq_len=20,
pos_num=20,
dropout_rate=0.05,
)
model.summary()
def _generator():
while True:
yield gen_batch_inputs(
sentence_pairs,
token_dict,
token_list,
seq_len=20,
mask_rate=0.3,
swap_sentence_rate=1.0,
)
model.fit_generator(
generator=_generator(),
steps_per_epoch=1000,
epochs=100,
validation_data=_generator(),
validation_steps=100,
callbacks=[
keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
],
)
# Use the trained model
inputs, output_layer = get_model(
token_num=len(token_dict),
head_num=5,
transformer_num=12,
embed_dim=25,
feed_forward_dim=100,
seq_len=20,
pos_num=20,
dropout_rate=0.05,
training=False, # The input layers and output layer will be returned if `training` is `False`
trainable=False, # Whether the model is trainable. The default value is the same with `training`
output_layer_num=4, # The number of layers whose outputs will be concatenated as a single output.
# Only available when `training` is `False`.
) I'm pushing a main.py for lesson29-BERT soon so hopefully that'll make it more convenient to use. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Could you please give some instructions on how to run lesson29-BERT?
I tried to do "pip3 install keras-bert" which was from the link to the other work that you are based on, then run "python3 tokenizer.py" but got: from .bert import TOKEN_CLS, TOKEN_SEP, TOKEN_UNK
ModuleNotFoundError: No module named '__ main__.bert'; '__ main__' is not a package
or if I tried "python3 bert.py" but got:
from .layers import (get_inputs, get_embedding, TokenEmbedding, EmbeddingSimilarity, Masked, Extract)
ModuleNotFoundError: No module named '__ main__.layers'; '__ main__' is not a package
The text was updated successfully, but these errors were encountered: