KNLP | BI-LSTM Word Segmentation and POS Tagging tool

`bilstm_tokenizer` segments (tokenizes) Khmer text, read either from a text file or from a list of input sentences. Use `tokenize_file_bilstm` to tokenize a text file and `tokenize_sentences_bilstm` to tokenize input sentences.


Installation

```shell
git clone https://github.com/nakanyseth-vuth/git
cd segmentation
pip install -r requirements.txt
```

Usage 😮 🔑

Import the functions into your code:

```python
from bilstm_tokenizer import tokenize_sentences_bilstm, tokenize_file_bilstm

input_sents = ["ខ្ញុំទៅសាលា", "សាលារៀនខ្ញុំនៅព្រែកលាប។"]
res = tokenize_sentences_bilstm(input_sents)

print(res)
```

To include POS tags in the results: 😎

Replace this code:

```python
seq, pos = decode(pred_sent, lines[i])
result = [s for s in seq]
```

with this:

```python
seq, pos = decode(pred_sent, lines[i])
# Join each segmented word with its POS tag as "word/TAG"
result = [s + "/" + p for s, p in zip(seq, pos)]
```
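The sketch below illustrates how the two list comprehensions differ and how the resulting `word/TAG` strings can be split back into pairs. The values of `seq` and `pos` are hypothetical stand-ins for what `decode` returns, and the tag names are placeholders, not the tool's actual tagset; `split_tagged` is a hypothetical helper, not part of `bilstm_tokenizer`.

```python
# Hypothetical output of decode(): segmented words and their POS tags.
# The tag names are placeholders, not the tool's actual tagset.
seq = ["ខ្ញុំ", "ទៅ", "សាលា"]
pos = ["PRON", "VERB", "NOUN"]

# zip() silently stops at the shorter list, so check the lengths match.
assert len(seq) == len(pos)

# Original behaviour: words only.
words_only = [s for s in seq]

# Modified behaviour: each word joined to its tag as "word/TAG".
tagged = [s + "/" + p for s, p in zip(seq, pos)]


def split_tagged(tokens):
    """Split "word/TAG" strings back into (word, tag) tuples.

    rsplit with maxsplit=1 splits only on the last "/", so a "/" that
    belongs to the word itself is kept in the word.
    """
    return [tuple(t.rsplit("/", 1)) for t in tokens]


print(words_only)
print(tagged)
print(split_tagged(tagged))
```

Splitting on the last `/` keeps the round trip safe for tokens that themselves contain a slash, since the tag is always the final component.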

@Created by Nakanyseth VUTH