KNLP | BI-LSTM Word Segmentation and POS Tagging tool

bilstm_tokenizer tokenizes either a text file or a list of input sentences. Use tokenize_file_bilstm for a text file and tokenize_sentences_bilstm for input sentences.

Installation

git clone https://github.com/nakanyseth-vuth/git
cd segmentation
pip install -r requirements.txt

Usage 😮 🔑

Import the functions into your code:

from bilstm_tokenizer import tokenize_sentences_bilstm, tokenize_file_bilstm

input_sents = ["ខ្ញុំទៅសាលា", "សាលារៀនខ្ញុំនៅព្រែកលាប។"]
res = tokenize_sentences_bilstm(input_sents)

print(res)

To include POS Tagging in the results: 😎

Replace the following code:

seq,pos = decode(pred_sent, lines[i])
result = [s for s in seq ]

to this:

seq,pos = decode(pred_sent, lines[i])
result = [s + "/" + p for s, p in zip(seq, pos)]
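
The effect of that change can be illustrated in isolation. The sketch below is a minimal, hedged example: the seq and pos lists are made-up stand-ins for what decode() returns, the tag names are illustrative only and are not taken from the tool's tag set, and the helper names join_tokens_with_tags and split_tagged are hypothetical and not part of bilstm_tokenizer.

```python
# Stand-in values for the output of decode(); the real lists come from the model.
seq = ["ខ្ញុំ", "ទៅ", "សាលា"]
pos = ["PRO", "VB", "NN"]  # illustrative tag names only (assumption)


def join_tokens_with_tags(seq, pos):
    """Pair each token with its tag as 'token/TAG', as in the replacement line."""
    return [s + "/" + p for s, p in zip(seq, pos)]


def split_tagged(tagged):
    """Recover (token, tag) pairs; split on the last '/' so tokens may contain '/'."""
    return [tuple(item.rsplit("/", 1)) for item in tagged]


tagged = join_tokens_with_tags(seq, pos)
print(" ".join(tagged))
print(split_tagged(tagged))
```

Note that zip stops at the shorter list, so a token without a matching tag would be dropped silently rather than raising an error.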

@Created by Nakanyseth VUTH
