bilstm_tokenizer tokenizes text files or input sentences. Use tokenize_file_bilstm to tokenize a text file and tokenize_sentences_bilstm to tokenize a list of input sentences.
git clone https://github.com/nakanyseth-vuth/git
cd segmentation
pip install -r requirements.txt
Import the functions into your code:
from bilstm_tokenizer import tokenize_sentences_bilstm, tokenize_file_bilstm
input_sents = ["ខ្ញុំទៅសាលា", "សាលារៀនខ្ញុំនៅព្រែកលាប។"]
res = tokenize_sentences_bilstm(input_sents)
print(res)
To append the pos value returned by decode to each token in the output, replace the following code:
seq,pos = decode(pred_sent, lines[i])
result = [s for s in seq ]
with this:
seq,pos = decode(pred_sent, lines[i])
result = [s+"/"+p for s,p in zip(seq,pos) ]
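The effect of that change on the output format can be sketched without loading the model. In the example below, `seq` and `pos` are hand-written stand-ins for the values returned by `decode`; the segmentation and the tag names `TAG1`, `TAG2`, `TAG3` are placeholders chosen for illustration, not the tokenizer's actual output or tag set.

```python
# Hypothetical stand-ins for the two values returned by decode().
# The real tokens and tags come from the model; these are placeholders.
seq = ["ខ្ញុំ", "ទៅ", "សាលា"]
pos = ["TAG1", "TAG2", "TAG3"]

# Original line: keep the tokens only.
tokens_only = [s for s in seq]

# Replacement line: pair each token with its tag as "token/tag".
tagged = [s + "/" + p for s, p in zip(seq, pos)]

print(" ".join(tokens_only))
print(" ".join(tagged))
```

Note that `zip` stops at the shorter of the two lists, so if `seq` and `pos` ever differ in length the extra items are dropped silently.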
@Created by Nakanyseth VUTH