Fix tokenization (fixes facebookresearch#926) (facebookresearch#929)
Summary:
Fixes facebookresearch#926
Pull Request resolved: facebookresearch#929

Differential Revision: D16560281

Pulled By: myleott

fbshipit-source-id: 751051bcdbf25207315bb05f5bee0235d21be627
myleott authored and facebook-github-bot committed Jul 30, 2019
1 parent 138dc8e commit c132b9b
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions fairseq/models/roberta/hub_interface.py

@@ -36,8 +36,8 @@ def device(self):
     def encode(self, sentence: str, *addl_sentences) -> torch.LongTensor:
         bpe_sentence = '<s> ' + self.bpe.encode(sentence) + ' </s>'
         for s in addl_sentences:
-            bpe_sentence += ' </s> ' + self.bpe.encode(s)
-        tokens = self.task.source_dictionary.encode_line(bpe_sentence, append_eos=True)
+            bpe_sentence += ' </s> ' + self.bpe.encode(s) + ' </s>'
+        tokens = self.task.source_dictionary.encode_line(bpe_sentence, append_eos=False)
         return tokens.long()
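The effect of the change can be sketched without fairseq. Below is a hypothetical, string-only stand-in for the two code paths: the raw sentence stands in for `self.bpe.encode`'s output, and appending `' </s>'` at the end mimics `encode_line(..., append_eos=True)`. With the old code, a single sentence picked up a spurious second `</s>`; the new code emits exactly one `</s>` per segment while keeping the `</s> </s>` separator between sentence pairs.

```python
def encode_old(sentence, *addl_sentences):
    # Pre-fix path: separator only, then append_eos=True adds one more </s>.
    s = '<s> ' + sentence + ' </s>'
    for extra in addl_sentences:
        s += ' </s> ' + extra
    return s + ' </s>'

def encode_new(sentence, *addl_sentences):
    # Post-fix path: each extra sentence closes itself; append_eos=False.
    s = '<s> ' + sentence + ' </s>'
    for extra in addl_sentences:
        s += ' </s> ' + extra + ' </s>'
    return s

print(encode_old('hello'))   # '<s> hello </s> </s>'  -- doubled </s>
print(encode_new('hello'))   # '<s> hello </s>'
print(encode_new('a', 'b'))  # '<s> a </s> </s> b </s>'  -- pair format kept
```

For sentence pairs the two paths happen to agree, which is why the bug only surfaced when encoding a single sentence.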
