Best engines to choose from #816
-
Hello! Thanks for your hard work on this library, it looks really interesting. I'm learning the Thai language and I thought of using your library to automate flashcard creation through word tokenization and transliteration. Basically, I want to input any Thai word or sentence and receive the correct transliteration with tone marks. There are two methods that can help me achieve this.
But the issue is that both methods support multiple engines, and as a non-native speaker it is hard for me to judge their output. Do you have any recommendations on which engines are better for my specific case? Resource usage is not important to me, so any neural network model will work too; at this stage I only care about the correctness of the output. Thank you!
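Concretely, here is the kind of pipeline I have in mind (a sketch only; I'm assuming the library exposes `word_tokenize` and `romanize` with these engine names, based on my reading of the docs, so please correct me if the signatures are wrong):

```python
# Hedged sketch: tokenize a Thai string, then romanize each token.
# Assumes pythainlp is installed and that word_tokenize/romanize accept
# an `engine` keyword, as I understand from the project documentation.
def flashcard_entry(text):
    """Return (token, romanization) pairs for a Thai string."""
    from pythainlp.tokenize import word_tokenize
    from pythainlp.transliterate import romanize

    tokens = word_tokenize(text, engine="newmm")
    return [(tok, romanize(tok, engine="thai2rom")) for tok in tokens]

try:
    print(flashcard_entry("สวัสดีครับ"))
except ImportError:
    print("pythainlp is not installed")
```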
Replies: 3 comments
-
Hello! Thai does not have an official word segmentation (word tokenization) standard from the body that plans and regulates the Thai language, so segmentation depends on the standard each tool follows. If you have the resources, you should use a deep-learning-based engine, but it can suffer from out-of-domain problems (see Handling Cross- and Out-of-Domain Samples in Thai Word Segmentation). If your text is out of domain for the deep learning models and you don't have the resources to hire a Thai linguist, you can use newmm and improve its Thai dictionary.
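To illustrate why improving the dictionary helps, here is a minimal pure-Python sketch of dictionary-based longest matching, the general idea behind dictionary engines such as newmm (this is a toy for illustration, not newmm's actual maximal-matching algorithm):

```python
# Toy dictionary-based segmenter: at each position, greedily take the
# longest dictionary word, falling back to a single character.
def longest_match_tokenize(text, dictionary):
    tokens = []
    i = 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):  # try longest substring first
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: emit it on its own
        tokens.append(match)
        i += len(match)
    return tokens

# The ambiguous string "ตากลม" segments differently depending on the
# dictionary, which is why curating the word list changes the output.
print(longest_match_tokenize("ตากลม", {"ตาก", "ลม", "ตา", "กลม"}))  # ['ตาก', 'ลม']
print(longest_match_tokenize("ตากลม", {"ตา", "กลม"}))               # ['ตา', 'กลม']
```

The point is that a segmentation you consider wrong can often be fixed simply by adding or removing entries in the custom dictionary, with no model retraining.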
-
For transliteration, you should use a machine-learning-based model to get the best results (thai2rom, thai2rom_onnx, or tltk).
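A quick way to judge the engines for a given word is to romanize it with each and compare the outputs by eye (a hedged sketch, assuming `romanize` takes an `engine` keyword; including the rule-based `royin` default for contrast is my own assumption):

```python
# Hedged sketch: romanize one word with several engines side by side.
# Engine names are taken from this reply plus "royin", which I believe
# is pythainlp's rule-based default (assumption on my part).
def compare_romanizations(word, engines=("royin", "thai2rom", "tltk")):
    from pythainlp.transliterate import romanize
    return {eng: romanize(word, engine=eng) for eng in engines}

try:
    print(compare_romanizations("แมว"))
except ImportError:
    print("pythainlp is not installed")
```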
-
@wannaphong thank you for the fast response! I'm familiar with running deep learning models, and I will check the options you provided.