Skip to content

IcyTempest/segmentation

 
 

Repository files navigation

KNLP | BI-LSTM Word Segmentation and POS Tagging tool

bilstm_tokenizer is used for tokenizing text file or input sentences. tokenize_file_bilstm is for tokenizing text file and tokenize_sentences_bilstm is for tokenizing input sentences.


Installation

git clone https://github.com/nakanyseth-vuth/git
cd segmentation
pip install -r requirements.txt

Usage 😮 🔑

Import the funtions to your code:

from bilstm_tokenizer import tokenize_sentences_bilstm, tokenize_file_bilstm

input_sents = ["ខ្ញុំទៅសាលា", "សាលារៀនខ្ញុំនៅព្រែកលាប។"]
res = tokenize_sentences_bilstm(input_sents)

print(res)

To include POS Tagging in the results: 😎

Replace the below code from:

seq,pos = decode(pred_sent, lines[i])
result = [s for s in seq ]

to this:

seq,pos = decode(pred_sent, lines[i])
result = [s+"/"+p for s,p in zip(seq,pos) ]

@Created by Nakanyseth VUTH

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%